From nnorwitz at gmail.com  Thu Jun  1 06:19:25 2006
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Wed, 31 May 2006 21:19:25 -0700
Subject: [Python-3000] Using a list for *args (was: Type annotations:
	annotating generators)
In-Reply-To: <43aa6ff70605311233i6f8195fdye2ed52fc559830ea@mail.gmail.com>
References: <43aa6ff70605271348y352921f6he107ba1f40a0393a@mail.gmail.com>
	<ca471dc20605292030q4f13b9fv7df3397edaab537@mail.gmail.com>
	<43aa6ff70605311233i6f8195fdye2ed52fc559830ea@mail.gmail.com>
Message-ID: <ee2a432c0605312119vb53236xcbdf1a6d33acbc1d@mail.gmail.com>

On 5/31/06, Collin Winter <collinw at gmail.com> wrote:
>
> All in all, the tuple->list change was minimally invasive.
>
> Overall, I've chosen to keep the external interfaces of the changed
> modules/packages the same; if there's a desire to change them later,
> this SVN commit can be used to figure out where adjustments should be
> made. Most of the changes involve the test suite, primarily where
> higher-order functions are concerned.
>
> I've submitted a patch to implement this change as SF #1498441
> (http://python.orf/sf/1498441); it's assigned to Guido.

.org that is :-)

Could you run a benchmark before and after this patch?  I'd like to
know the speed diff.  Something like:

./python.exe -mtimeit 'def foo(*args): pass' 'foo()'
./python.exe -mtimeit 'def foo(*args): pass' 'foo(1)'
./python.exe -mtimeit 'def foo(*args): pass' 'foo(1, 2)'
./python.exe -mtimeit 'def foo(*args): pass' 'foo(1, 2, 3)'
./python.exe -mtimeit 'def foo(*args): pass' 'foo(*range(10))'

You can post the speeds in the patch.

Thanks,
n

From ncoghlan at gmail.com  Thu Jun  1 12:04:00 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 01 Jun 2006 20:04:00 +1000
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <bbaeab100605311212s40b1372bt4e91d929ca1d8186@mail.gmail.com>
References: <44716940.9000300@acm.org>
	<4472B196.7070506@acm.org>	<ca471dc20605230817x331241e6r45e63c4c1c0eb8ed@mail.gmail.com>	<447BC126.8050107@acm.org>	<bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com>	<1149080922.5718.20.camel@fsol>	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>	<1149095977.5718.51.camel@fsol>	<bbaeab100605311131h37f7f231k1c30ebb20cf873f0@mail.gmail.com>	<1149102266.5718.62.camel@fsol>
	<bbaeab100605311212s40b1372bt4e91d929ca1d8186@mail.gmail.com>
Message-ID: <447EBB90.20104@gmail.com>

Brett Cannon wrote:
>     So perhaps there is a way to create some kind of "virtual packages" or
>     "categories" in which existing modules could register themselves. This
>     could allow third-party modules (e.g. "gtk") to register themselves in
>     stdlib-supplied virtual packages (e.g. "gui"), for documentation and
>     findability purposes. "import gui; help(gui)" would give you the list of
>     available modules.
> 
> 
> I see possible problems with this because then we run into your issue 
> with packaging; where do things go?  At least with the stdlib we have 
> sufficient control to make sure things follow a standard in terms of 
> where thing should end up.
> 
> I would rather do an all-or-nothing solution to the whole package 
> hierarchy for the stdlib.  Does anyone else have an opinion on any of 
> this, since this is ending up as just fundamental differences in how two 
> people like to organize modules?

Hmm, much as I hate to jump on a Web bandwagon, this just rang the 'tagging' 
bell in my head.

XML, for instance, would fit in the "filefmt" category, but it also fits in 
categories like "datastruct", "markup", "parsing", "net" and "protocol".

If Py3k improves the ability to access metadata associated with packages and 
modules (name, docstring, '__all__', etc) without actually importing them, 
then it would be possible for "help(tag='gui')" to return one-line 
descriptions for all modules that are marked as being 'gui' related.
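
In today's Python that idea could be sketched roughly as below; the registry and function names (TAG_REGISTRY, help_by_tag) are invented for illustration, and the one-line summary is read from the module's source so nothing actually gets imported:

```python
import ast
import importlib.util

TAG_REGISTRY = {}  # tag -> set of registered module names (hypothetical)

def register(module_name, *tags):
    """Register a module under one or more category tags."""
    for tag in tags:
        TAG_REGISTRY.setdefault(tag, set()).add(module_name)

def help_by_tag(tag):
    """Map each module tagged `tag` to the first line of its docstring,
    pulled from source without importing the module itself."""
    summaries = {}
    for name in sorted(TAG_REGISTRY.get(tag, ())):
        spec = importlib.util.find_spec(name)
        if spec is None or not spec.origin or not spec.origin.endswith(".py"):
            summaries[name] = "(no source available)"
            continue
        with open(spec.origin, encoding="utf-8") as f:
            doc = ast.get_docstring(ast.parse(f.read()))
        summaries[name] = doc.splitlines()[0] if doc else ""
    return summaries
```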

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From p.f.moore at gmail.com  Thu Jun  1 13:29:28 2006
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 1 Jun 2006 12:29:28 +0100
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>
References: <44716940.9000300@acm.org> <4472B196.7070506@acm.org>
	<ca471dc20605230817x331241e6r45e63c4c1c0eb8ed@mail.gmail.com>
	<447BC126.8050107@acm.org>
	<bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com>
	<1149080922.5718.20.camel@fsol>
	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>
	<1149095977.5718.51.camel@fsol>
	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>
	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>
Message-ID: <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>

On 5/31/06, Brett Cannon <brett at python.org> wrote:
> Why would a 3rd-party module be installed into the stdlib namespace?
> net.jabber wouldn't exist unless it was in the stdlib or the module's author
> decided to be snarky and inject their module into the stdlib namespace.

Do you really want the stdlib to "steal" all of the simple names (like
net, gui, data, ...)? While I don't think it's a particularly good
idea for 3rd party modules to use such names, I'm not too keen on
having them made effectively "reserved", either.

And if there was a "net" package which contained all the networking
modules in the stdlib, then yes I would expect a 3rd party developer
of a jabber module to want to take advantage of the hierarchy and
inject itself into the "net" namespace. Which would actually make name
collisions worse rather than better. [Although, evidence from the
current os module seems to imply that this is less of an issue than
I'm claiming, in practice...]

Paul.

From ronaldoussoren at mac.com  Thu Jun  1 16:51:03 2006
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Thu, 1 Jun 2006 16:51:03 +0200
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
References: <44716940.9000300@acm.org> <4472B196.7070506@acm.org>
	<ca471dc20605230817x331241e6r45e63c4c1c0eb8ed@mail.gmail.com>
	<447BC126.8050107@acm.org>
	<bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com>
	<1149080922.5718.20.camel@fsol>
	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>
	<1149095977.5718.51.camel@fsol>
	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>
	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>
	<79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
Message-ID: <477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com>


On 1-jun-2006, at 13:29, Paul Moore wrote:

> On 5/31/06, Brett Cannon <brett at python.org> wrote:
>> Why would a 3rd-party module be installed into the stdlib namespace?
>> net.jabber wouldn't exist unless it was in the stdlib or the  
>> module's author
>> decided to be snarky and inject their module into the stdlib  
>> namespace.
>
> Do you really want the stdlib to "steal" all of the simple names (like
> net, gui, data, ...)? While I don't think it's a particularly good
> idea for 3rd party modules to use such names, I'm not too keen on
> having them made effectively "reserved", either.

That was my feeling too, except that I haven't made my mind up on the  
merit of having 3rd-party modules inside such packages. I don't think  
the risk of name clashes would be greater than it is now; there's  
already an implicit naming convention, or rather several of  
them ;-), for naming modules in the standard library.

The main problem I have with excluding 3rd-party libraries from such  
generic toplevel packages in the standard library is that this  
increases the separation between stdlib and other code. I'd rather  
see a lean&mean standard library with a standard mechanism for adding  
more libraries and perhaps a central list of good libraries.

>
> And if there was a "net" package which contained all the networking
> modules in the stdlib, then yes I would expect a 3rd party developer
> of a jabber module to want to take advantage of the hierarchy and
> inject itself into the "net" namespace. Which would actually make name
> collisions worse rather than better. [Although, evidence from the
> current os module seems to imply that this is less of an issue than
> I'm claiming, in practice...]

I suppose that's at least partially not an issue at the moment  
because you can only add stuff to existing packages through hacks. I  
wouldn't touch libraries that inject themselves into existing  
packages through .pth hackery because of the yuckiness of it [*].

Ronald

From brett at python.org  Thu Jun  1 17:44:02 2006
From: brett at python.org (Brett Cannon)
Date: Thu, 1 Jun 2006 08:44:02 -0700
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com>
References: <44716940.9000300@acm.org> <447BC126.8050107@acm.org>
	<bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com>
	<1149080922.5718.20.camel@fsol>
	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>
	<1149095977.5718.51.camel@fsol>
	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>
	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>
	<79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
	<477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com>
Message-ID: <bbaeab100606010844s552e7918i481301082e706ac6@mail.gmail.com>

On 6/1/06, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>
>
> On 1-jun-2006, at 13:29, Paul Moore wrote:
>
> > On 5/31/06, Brett Cannon <brett at python.org> wrote:
> >> Why would a 3rd-party module be installed into the stdlib namespace?
> >> net.jabber wouldn't exist unless it was in the stdlib or the
> >> module's author
> >> decided to be snarky and inject their module into the stdlib
> >> namespace.
> >
> > Do you really want the stdlib to "steal" all of the simple names (like
> > net, gui, data, ...)? While I don't think it's a particularly good
> > idea for 3rd party modules to use such names, I'm not too keen on
> > having them made effectively "reserved", either.
>
> That was my feeling too, except that I haven't made my mind up on the
> merit of having 3rd-party modules inside such packages. I don't think
> the risk of name clashes would be greater than it is now; there's
> already an implicit naming convention, or rather several of
> them ;-), for naming modules in the standard library.


Right.  And as Paul said in his email, the os module has shown this is not
an issue.  As long as the names are known ahead of time there is not much of
a problem.

The main problem I have with excluding 3rd-party libraries from such
> generic toplevel packages in the standard library is that this
> increases the separation between stdlib and other code. I'd rather
> see a lean&mean standard library with a standard mechanism for adding
> more libraries and perhaps a central list of good libraries.


Well, personally I would like to clean up the stdlib, but I don't want to
make it too lean since the whole "Batteries Included" thing is handy.  As
for sanctioned libraries that don't come included, that could be possible,
but the politics of picking the libraries could be nasty.

>
> > And if there was a "net" package which contained all the networking
> > modules in the stdlib, then yes I would expect a 3rd party developer
> > of a jabber module to want to take advantage of the hierarchy and
> > inject itself into the "net" namespace. Which would actually make name
> > collisions worse rather than better. [Although, evidence from the
> > current os module seems to imply that this is less of an issue than
> > I'm claiming, in practice...]
>
> I suppose that's at least partially not an issue at the moment
> because you can only add stuff to existing packages through hacks. I
> wouldn't touch libraries that inject themselves into existing
> packages through .pth hackery because of the yuckiness of it [*].


Yeah, something better than .pth files would be good.

-Brett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060601/5b596939/attachment.htm 

From jcarlson at uci.edu  Thu Jun  1 18:12:34 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Thu, 01 Jun 2006 09:12:34 -0700
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
References: <bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>
	<79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
Message-ID: <20060601090456.6993.JCARLSON@uci.edu>


"Paul Moore" <p.f.moore at gmail.com> wrote:
> 
> On 5/31/06, Brett Cannon <brett at python.org> wrote:
> > Why would a 3rd-party module be installed into the stdlib namespace?
> > net.jabber wouldn't exist unless it was in the stdlib or the module's author
> > decided to be snarky and inject their module into the stdlib namespace.
> 
> Do you really want the stdlib to "steal" all of the simple names (like
> net, gui, data, ...)? While I don't think it's a particularly good
> idea for 3rd party modules to use such names, I'm not too keen on
> having them made effectively "reserved", either.

This is one reason why I was suggesting the 'py' (or other) top level
package; then we would really have py.net, py.gui, py.data, etc., which
would presumably avoid name collisions, and wouldn't reserve the generic
names.

As for 3rd party modules, that is, those modules that would (or should)
go into site-packages right now, I'm not sure I like the idea of
having them inject themselves into the "package hierarchy" of the
standard library, though it wouldn't be too terribly difficult with an
import hook combined with a setup hook*.

 - Josiah

* The setup hook creates and/or modifies a special "3rd party packages"
bit of metadata (presumably in XML).  This metadata describes two
things: where the module lies in the hierarchy registry, and where it
actually lies in the filesystem.  The import hook would adjust the
__all__ or module/package dictionary on import to include the names of
the modules that are importable, as known by the metadata registry.
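
A rough sketch of the import-hook half of that in modern terms (the registry contents, names, and the on-disk location are all hypothetical; the registry is just a dict here rather than XML):

```python
import sys
import importlib.abc
import importlib.util

REGISTRY = {
    # virtual dotted name -> actual file on disk (both hypothetical)
    "py.net.jabberlib": "/opt/thirdparty/jabberlib.py",
}

class RegistryFinder(importlib.abc.MetaPathFinder):
    """Resolve imports of registered virtual names to real files."""

    def find_spec(self, fullname, path=None, target=None):
        location = REGISTRY.get(fullname)
        if location is None:
            return None  # not ours; let the normal finders handle it
        return importlib.util.spec_from_file_location(fullname, location)

sys.meta_path.append(RegistryFinder())
```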


From bingham at cenix-bioscience.com  Thu Jun  1 18:03:16 2006
From: bingham at cenix-bioscience.com (Aaron Bingham)
Date: Thu, 01 Jun 2006 18:03:16 +0200
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
References: <44716940.9000300@acm.org>
	<4472B196.7070506@acm.org>	<ca471dc20605230817x331241e6r45e63c4c1c0eb8ed@mail.gmail.com>	<447BC126.8050107@acm.org>	<bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com>	<1149080922.5718.20.camel@fsol>	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>	<1149095977.5718.51.camel@fsol>	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>
	<79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
Message-ID: <447F0FC4.1030906@cenix-bioscience.com>

Paul Moore wrote:

>On 5/31/06, Brett Cannon <brett at python.org> wrote:
>  
>
>>Why would a 3rd-party module be installed into the stdlib namespace?
>>net.jabber wouldn't exist unless it was in the stdlib or the module's author
>>decided to be snarky and inject their module into the stdlib namespace.
>>    
>>
>
>Do you really want the stdlib to "steal" all of the simple names (like
>net, gui, data, ...)? While I don't think it's a particularly good
>idea for 3rd party modules to use such names, I'm not too keen on
>having them made effectively "reserved", either.
>  
>
I'm confused.  As far as I can see, a reserved prefix (the "py" or 
"stdlib" package others have mentioned) is the only reliable way to 
avoid naming conflicts between 3rd-party packages and a growing standard 
library.  I suspect we will be going round and round in circles here as 
long as a reserved prefix is ruled out.  IMO, multiple reserved prefixes 
("net", "gui", etc.) are much worse than one.  Could someone please 
explain for my sake why a single reserved prefix is not acceptable? 

Thanks,

-- 
--------------------------------------------------------------------
Aaron Bingham
Senior Software Engineer
Cenix BioScience GmbH
--------------------------------------------------------------------


From brett at python.org  Thu Jun  1 18:33:59 2006
From: brett at python.org (Brett Cannon)
Date: Thu, 1 Jun 2006 09:33:59 -0700
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <447F0FC4.1030906@cenix-bioscience.com>
References: <44716940.9000300@acm.org> <447BC126.8050107@acm.org>
	<bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com>
	<1149080922.5718.20.camel@fsol>
	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>
	<1149095977.5718.51.camel@fsol>
	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>
	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>
	<79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
	<447F0FC4.1030906@cenix-bioscience.com>
Message-ID: <bbaeab100606010933j1918ffe2mf44154995879013c@mail.gmail.com>

On 6/1/06, Aaron Bingham <bingham at cenix-bioscience.com> wrote:
>
> Paul Moore wrote:
>
> >On 5/31/06, Brett Cannon <brett at python.org> wrote:
> >
> >
> >>Why would a 3rd-party module be installed into the stdlib namespace?
> >>net.jabber wouldn't exist unless it was in the stdlib or the module's
> author
> >>decided to be snarky and inject their module into the stdlib namespace.
> >>
> >>
> >
> >Do you really want the stdlib to "steal" all of the simple names (like
> >net, gui, data, ...)? While I don't think it's a particularly good
> >idea for 3rd party modules to use such names, I'm not too keen on
> >having them made effectively "reserved", either.
> >
> >
> I'm confused.  As far as I can see, a reserved prefix (the "py" or
> "stdlib" package others have mentioned) is the only reliable way to
> avoid naming conflicts between 3rd-party packages and a growing standard
> library.  I suspect we will be going round and round in circles here as
> long as a reserved prefix is ruled out.  IMO, multiple reserved prefixes
> ("net", "gui", etc.) are much worse than one.  Could someone please
> explain for my sake why a single reserved prefix is not acceptable?



Guido doesn't like it.  =)  And he said he is probably going to ignore this
topic until we get a good consensus on what we want.  If we can get
almost everyone for it, we may be able to convince him to change his mind.

That being said, I don't think the root name is needed if we keep the
hierarchy flat.  We have done fine so far without it.  But if we do have one
level of package organization then I think the root 'py' would be good.

-Brett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060601/ac079110/attachment-0001.html 

From tjreedy at udel.edu  Thu Jun  1 19:41:40 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 1 Jun 2006 13:41:40 -0400
Subject: [Python-3000] packages in the stdlib
References: <44716940.9000300@acm.org>
	<447BC126.8050107@acm.org><bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com><1149080922.5718.20.camel@fsol><bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com><1149095977.5718.51.camel@fsol><430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com><bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com><79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com><477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com>
	<bbaeab100606010844s552e7918i481301082e706ac6@mail.gmail.com>
Message-ID: <e5n8sk$a0n$1@sea.gmane.org>


"Brett Cannon" <brett at python.org> wrote in message 
news:bbaeab100606010844s552e7918i481301082e706ac6 at mail.gmail.com...

>Well, personally I would like to clean up the stdlib, but I don't want to 
>make it >too lean since the whole "Batteries Included" thing is handy.

Definitely as to both.

> As for sanctioned libraries that don't come included, that could be 
> possible,
> but the politics of picking the libraries could be nasty.

Sanctioning possibly multiple libraries (with non-clashing names) in a 
category should be less nasty than picking just one to include and 
distribute with the library.  The criteria for sanction should be similar 
to those for inclusion -- such as maturity and commitment -- but without 
having to be 'the best'.

As a user, I think I would want to be able to plug things in.

Terry Jan Reedy




From tjreedy at udel.edu  Thu Jun  1 20:13:08 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 1 Jun 2006 14:13:08 -0400
Subject: [Python-3000] packages in the stdlib
References: <44716940.9000300@acm.org><4472B196.7070506@acm.org>	<ca471dc20605230817x331241e6r45e63c4c1c0eb8ed@mail.gmail.com>	<447BC126.8050107@acm.org>	<bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com>	<1149080922.5718.20.camel@fsol>	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>	<1149095977.5718.51.camel@fsol>	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com><79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
	<447F0FC4.1030906@cenix-bioscience.com>
Message-ID: <e5nank$h7r$1@sea.gmane.org>


"Aaron Bingham" <bingham at cenix-bioscience.com> wrote in message 
news:447F0FC4.1030906 at cenix-bioscience.com...
> I'm confused.  As far as I can see, a reserved prefix (the "py" or
> "stdlib" package others have mentioned) is the only reliable way to
> avoid naming conflicts between 3rd-party packages and a growing standard
> library.

True, but..

> I suspect we will be going round and round in circles here as
> long as a reserved prefix is ruled out.  IMO, multiple reserved prefixes
> ("net", "gui", etc.) are much worse than one.

But much better than a hundred or more ;-)

>  Could someone please
> explain for my sake why a single reserved prefix is not acceptable?

Because you have to type it over and over.  Because it is a pure nuisance for 
simple usage of Python with imports only, or almost only, from the standard 
lib.  Because it does nothing to organize the standard lib.  Because it 
would be in addition to any set of organizing prefixes such as 'net', 
'gui', etc., which are much more informative from a user viewpoint.

There are two separate issues being discussed here:
1) reducing/eliminating name clashes between stdlib and other modules;
2) organizing the stdlib with a shallow hierarchy.

For the former, yes, a prefix on stdlib modules would work, but this most 
common case could/should be the default.  Requiring instead a prefix on all 
*other* imports would accomplish the same.  For instance, 's' for imports 
from site-packages and 'l' for imports of local modules on sys.path (which 
would then not have lib and lib/site-packages on it).

But the problem I see with this approach is that it says that the most 
important thing about a module is where it comes from, rather than what it 
does.

For the latter (2 above), I think those who want such a hierarchy mostly 
agree in principle on a roughly two-level scheme with about 10-20 short 
names for the top level, using the lib docs as a starting point for the 
categories.

The top level files should have nice doc strings so that "import xyz; 
help(xyz)" gives a nice list of the contents of xyz.  To deal with the 
problem of cross-categorization, this doc could also have a 'See Also' 
section listing modules that might have been put in xyz and might be sought 
in xyz but which were actually put elsewhere.
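
As a sketch, such a top-level file might look like this; the 'net' package and the filing of its contents are hypothetical:

```python
# Hypothetical net/__init__.py under this scheme: the docstring doubles
# as the help() listing, with a 'See Also' section for near-miss modules.
"""Networking protocols and client libraries.

Modules:
    httplib  -- HTTP protocol client
    ftplib   -- FTP protocol client
    smtplib  -- SMTP sending support

See Also:
    markup.htmllib -- HTML parsing, filed under 'markup'
    data.mimetools -- MIME handling, filed under 'data'
"""
```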

Up in the air is the question of plugging in other modules not included in 
the stdlib.  With useful categories, this strikes me as a useful thing to 
do.  From a usage viewpoint, what a module does is more important than who 
wrote it and who distributes it.  When it becomes trivial to grab and 
install non-stdlib modules, then the distinction between stdlib and not 
becomes even less important.  If there is an approved list of plugins for 
each top level package, that can be included in the doc as well.  What 
would be really nice is if trying to import an uninstalled approved module 
would trigger an attempt to download and install it (in the appropriate 
package).
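
The "trigger an attempt to download and install" part is the speculative bit; a safe sketch stops at telling the user where an approved-but-uninstalled module lives (the APPROVED table and its entries are invented for illustration):

```python
import sys
import importlib.abc

APPROVED = {
    # approved add-on name -> where to get it (both hypothetical)
    "gui.wxwidgets": "http://example.org/wxwidgets",
}

class ApprovedModuleHint(importlib.abc.MetaPathFinder):
    """Last-chance finder: explain how to get approved, missing modules."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname in APPROVED:
            raise ImportError(
                "%s is an approved add-on; fetch it from %s"
                % (fullname, APPROVED[fullname]))
        return None  # unknown name; fall through to normal failure

# Appended, so it only fires after the real finders have given up.
sys.meta_path.append(ApprovedModuleHint())
```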

Terry Jan Reedy




From collinw at gmail.com  Thu Jun  1 21:32:39 2006
From: collinw at gmail.com (Collin Winter)
Date: Thu, 1 Jun 2006 21:32:39 +0200
Subject: [Python-3000] Using a list for *args (was: Type annotations:
	annotating generators)
In-Reply-To: <ee2a432c0605312119vb53236xcbdf1a6d33acbc1d@mail.gmail.com>
References: <43aa6ff70605271348y352921f6he107ba1f40a0393a@mail.gmail.com>
	<ca471dc20605292030q4f13b9fv7df3397edaab537@mail.gmail.com>
	<43aa6ff70605311233i6f8195fdye2ed52fc559830ea@mail.gmail.com>
	<ee2a432c0605312119vb53236xcbdf1a6d33acbc1d@mail.gmail.com>
Message-ID: <43aa6ff70606011232v7e415faax52c0e900b03164f@mail.gmail.com>

On 6/1/06, Neal Norwitz <nnorwitz at gmail.com> wrote:
> Could you run a benchmark before and after this patch?  I'd like to
> know speed diff.

(Sorry you got this twice, Neal.)

I've attached the benchmarks as a comment on the patch, but I'll
repeat them here. All times are usecs per loop.

./python -mtimeit 'def foo(*args): pass' 'foo()'
As tuple: 1.56
As list:  1.7

./python -mtimeit 'def foo(*args): pass' 'foo(1)'
As tuple: 1.75
As list:  2.04

./python -mtimeit 'def foo(*args): pass' 'foo(1, 2)'
As tuple: 1.87
As list:  2.15

./python -mtimeit 'def foo(*args): pass' 'foo(1, 2, 3)'
As tuple: 1.95
As list:  2.3

./python -mtimeit 'def foo(*args): pass' 'foo(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)'
As tuple: 2.67
As list:  2.97

Collin Winter

From mcherm at mcherm.com  Thu Jun  1 22:19:36 2006
From: mcherm at mcherm.com (Michael Chermside)
Date: Thu, 01 Jun 2006 13:19:36 -0700
Subject: [Python-3000] Using a list for *args (was:
	Type	annotations:annotating generators)
Message-ID: <20060601131936.srakny7zvu3lwok0@login.werra.lunarpages.com>

Collin Winter writes:
> I've attached the benchmarks as a comment on the patch, but I'll
> repeat them here. All times are usecs per loop.
      [statistics showing list is about 15% slower]

My memory is fuzzy here. Can someone repeat for me the reasons
why we wanted to use list? Were we just trying it out to see how
it worked, or was there a desire to change? Was the desire to
change because it improved some uses of the C api, or was it
just for "purity" in use of tuples vs lists?

I'm not a "need for speed" kind of guy, but I can't remember what
the advantages of the list approach were supposed to be.

-------

By the way I'm curious about the following also:

# interpolating a list (I presume there's no advantage, but just checking)
./python -mtimeit 'def foo(*args): pass' 'foo(*range(10))'

# calling a function that doesn't use *args
./python -mtimeit 'def foo(): pass' 'foo()'
./python -mtimeit 'def foo(x): pass' 'foo(1)'
./python -mtimeit 'def foo(x,y): pass' 'foo(1,2)'
./python -mtimeit 'def foo(x,y,z): pass' 'foo(1,2,3)'

-- Michael Chermside

PS: Thanks, Collin, for trying this. I have to admit, I'm surprised
at how well-contained the changes turned out to be.

From mike.klaas at gmail.com  Thu Jun  1 23:16:25 2006
From: mike.klaas at gmail.com (Mike Klaas)
Date: Thu, 1 Jun 2006 14:16:25 -0700
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <e5nank$h7r$1@sea.gmane.org>
References: <44716940.9000300@acm.org>
	<bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com>
	<1149080922.5718.20.camel@fsol>
	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>
	<1149095977.5718.51.camel@fsol>
	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>
	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>
	<79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
	<447F0FC4.1030906@cenix-bioscience.com> <e5nank$h7r$1@sea.gmane.org>
Message-ID: <3d2ce8cb0606011416t30333f4aq641c00d557760eae@mail.gmail.com>

Terry Reedy wrote:
> Because you have to type it over and over.

Hmm, with the right context manager:

import py
with py as py:
    from gui import tkinter
    import net
    with net as net:
        import httplib
        import urllib
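
Joking aside, something close to this could be made to work by temporarily putting a package's own directories at the front of sys.path; a sketch only, with the helper name invented and real concerns (name shadowing, nested packages, module caching) waved away:

```python
import sys
from contextlib import contextmanager

@contextmanager
def flattened(package):
    """Temporarily expose `package`'s submodules as top-level imports."""
    dirs = list(package.__path__)
    sys.path[:0] = dirs  # push the package's directories to the front
    try:
        yield package
    finally:
        for d in dirs:
            sys.path.remove(d)
```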

-Mike

From ronaldoussoren at mac.com  Fri Jun  2 00:08:14 2006
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 2 Jun 2006 00:08:14 +0200
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <bbaeab100606010844s552e7918i481301082e706ac6@mail.gmail.com>
References: <44716940.9000300@acm.org> <447BC126.8050107@acm.org>
	<bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com>
	<1149080922.5718.20.camel@fsol>
	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>
	<1149095977.5718.51.camel@fsol>
	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>
	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>
	<79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
	<477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com>
	<bbaeab100606010844s552e7918i481301082e706ac6@mail.gmail.com>
Message-ID: <E206EA98-9A18-41E2-BB40-42D93B108320@mac.com>


On 1-jun-2006, at 17:44, Brett Cannon wrote:

>
>
> On 6/1/06, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
> On 1-jun-2006, at 13:29, Paul Moore wrote:
>
> > On 5/31/06, Brett Cannon <brett at python.org> wrote:
> >> Why would a 3rd-party module be installed into the stdlib  
> namespace?
> >> net.jabber wouldn't exist unless it was in the stdlib or the
> >> module's author
> >> decided to be snarky and inject their module into the stdlib
> >> namespace.
> >
> > Do you really want the stdlib to "steal" all of the simple names  
> (like
> > net, gui, data, ...)? While I don't think it's a particularly good
> > idea for 3rd party modules to use such names, I'm not too keen on
> > having them made effectively "reserved", either.
>
> That was my feeling too, except that I haven't made my mind up on the
> merit of having 3rd-party modules inside such packages. I don't think
> the risk of name clashes would be greater than it is now; there's
> already an implicit naming convention, or rather several of
> them ;-), for naming modules in the standard library.
>
> Right.  And as Paul said in his email, the os module has shown this  
> is not an issue.  As long as the names are known ahead of time  
> there is not much of a problem.

And as I noted, that's probably because the only ways to transparently  
patch the os module are evil .pth tricks and replacing files in the  
standard library. Neither is considered good style, and therefore it's not  
something you'd do if you want someone to use your library. How would  
you react to a library on the cheeseshop that claims to add some  
useful functions to the os module? I'd be very, very hesitant to use  
such a (hypothetical) library.

There is however nothing wrong with naming a library tftplib. That  
would be the obvious name for a library that supports TFTP and blends  
nicely with the standard library naming convention for network  
libraries. It would be nice if 3rd-party libraries could blend in  
even with a more structured standard library, even if that would only  
be possible for carefully selected portions of the standard library.

Not that this is a really big issue; the range of libraries on the  
cheeseshop is much, much larger than the functionality covered by the  
standard library ;-). There are of course also good reasons for not  
wanting to follow the standard library conventions. Even if the  
stdlib would contain a 'gui' package for gui libraries and you could  
extend that from 3rd-party code, I'd not use that convention in PyObjC  
because its package structure explicitly mirrors that of the  
Objective-C libraries it wraps.

>
> The main problem I have with excluding 3rd-party libraries from such
> generic toplevel packages in the standard library is that this
> increases the separation between stdlib and other code. I'd rather
> see a lean&mean standard library with a standard mechanism for adding
> more libraries and perhaps a central list of good libraries.
>
> Well, personally I would like to clean up the stdlib, but I don't  
> want to make it too lean since the whole "Batteries Included" thing  
> is handy.  As for sanctioned libraries that don't come included,  
> that could be possible, but the politics of picking the libraries  
> could be nasty.

How would that be more nasty than picking libraries to be included in  
the standard library?   A sanctioned library list would be a level  
between the standard library and random third-party code, basically to  
avoid tying the release cycle of "obviously useful" software to that  
of python itself.

>
> >
> > And if there was a "net" package which contained all the networking
> > modules in the stdlib, then yes I would expect a 3rd party developer
> > of a jabber module to want to take advantage of the hierarchy and
> > inject itself into the "net" namespace. Which would actually make  
> name
> > collisions worse rather than better. [Although, evidence from the
> > current os module seems to imply that this is less of an issue than
> > I'm claiming, in practice...]
>
> I suppose that's at least partially not an issue at the moment
> because you can only add stuff to existing packages through hacks. I
> wouldn't touch libraries that inject themselves into existing
> packages through .pth hackery because of the yuckiness of it [*].
>
> Yeah, something better than .pth files would be good.

There's nothing wrong with .pth files per se and I use them  
regularly. The yuckiness is in .pth files that contain lines that  
start with 'import'; those can do very scary things such as  
hot-patching the standard library during Python startup. That's  
something I don't like to see in production code.
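For concreteness, here is a sketch of how `site.py` treats the two kinds of lines a .pth file may contain; any line beginning with `import` is executed verbatim at interpreter startup, which is the scary part Ronald objects to. (The file paths below are made up, and the real logic lives in `site.addpackage` in the stdlib; this is a simplified illustration.)

```python
# Simplified model of site.py's handling of .pth file lines.
import sys

pth_lines = [
    "/opt/mylib",  # a plain line: simply appended to sys.path
    # a line starting with 'import': executed at startup, so it can
    # hot-patch anything, e.g. reorder sys.path before your code runs
    "import sys; sys.path.insert(0, '/opt/patched')",
]

for line in pth_lines:
    if line.startswith(("import ", "import\t")):
        exec(line)              # arbitrary code runs during startup
    else:
        sys.path.append(line)   # the benign, intended use of .pth files

print("/opt/mylib" in sys.path)       # True
print(sys.path[0] == "/opt/patched")  # True
```

The path-only lines are harmless; it is only the `import` lines that enable the kind of transparent stdlib patching discussed above.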

Ronald


From brett at python.org  Fri Jun  2 00:15:08 2006
From: brett at python.org (Brett Cannon)
Date: Thu, 1 Jun 2006 15:15:08 -0700
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <E206EA98-9A18-41E2-BB40-42D93B108320@mac.com>
References: <44716940.9000300@acm.org> <1149080922.5718.20.camel@fsol>
	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>
	<1149095977.5718.51.camel@fsol>
	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>
	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>
	<79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
	<477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com>
	<bbaeab100606010844s552e7918i481301082e706ac6@mail.gmail.com>
	<E206EA98-9A18-41E2-BB40-42D93B108320@mac.com>
Message-ID: <bbaeab100606011515x5421531ay7ac93a8bc92c8a06@mail.gmail.com>

On 6/1/06, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>
>
> On 1-jun-2006, at 17:44, Brett Cannon wrote:
>
> >
> >
> > On 6/1/06, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
> > On 1-jun-2006, at 13:29, Paul Moore wrote:
> >
> > > On 5/31/06, Brett Cannon <brett at python.org> wrote:
> > >> Why would a 3rd-party module be installed into the stdlib
> > namespace?
> > >> net.jabber wouldn't exist unless it was in the stdlib or the
> > >> module's author
> > >> decided to be snarky and inject their module into the stdlib
> > >> namespace.
> > >
> > > Do you really want the stdlib to "steal" all of the simple names
> > (like
> > > net, gui, data, ...)? While I don't think it's a particularly good
> > > idea for 3rd party modules to use such names, I'm not too keen on
> > > having them made effectively "reserved", either.
> >
> > That was my feeling too, except that I haven't made my mind up on the
> > merit of having 3th-party modules inside such packages. I don't think
> > the risk of nameclashes would be greater than it is now, there's
> > already an implicit naming convention, or rather several of
> > them ;-), for naming modules in the standard library.
> >
> > Right.  And as Paul said in his email, the os module has shown this
> > is not an issue.  As long as the names are known ahead of time
> > there is not much of a problem.
>
> And as I noted that's probably because the only ways to transparently
> patch the os module are evil .pth tricks and replacing files in the
> standard library. Neither is considered good style and therefore not
> something you'd do if you want someone to use your library. How would
> you react to a library on the cheeseshop that claims to add some
> useful functions to the os module? I'd be very, very hesitant to use
> such a (hypothetical) library.


Exactly; I wouldn't touch it.  Which is why I don't like this idea of having
third-party modules add themselves to some stdlib package.

> There is however nothing wrong with naming a library tftplib. That
> would be the obvious name for a library that supports TFTP and blends
> nicely with the standard library naming convention for network
> libraries. It would be nice if third-party libraries could blend in
> even with a more structured standard library, even if that would only
> be possible for carefully selected portions of the standard library.


No, there is no problem.  If we stick with a flat stdlib this is what I
would push for.

> Not that this is a really big issue; the range of libraries on the
> cheeseshop is much, much larger than the functionality covered by the
> standard library ;-). There are of course also good reasons for not
> wanting to follow the standard library conventions. Even if the
> stdlib would contain a 'gui' package for gui libraries and you could
> extend that from third-party code, I'd not use that convention in PyObjC
> because its package structure explicitly mirrors that of the
> Objective-C libraries it wraps.
>
> >
> > The main problem I have with excluding third-party libraries from such
> > generic toplevel packages in the standard library is that this
> > increases the separation between stdlib and other code. I'd rather
> > see a lean&mean standard library with a standard mechanism for adding
> > more libraries and perhaps a central list of good libraries.
> >
> > Well, personally I would like to clean up the stdlib, but I don't
> > want to make it too lean since the whole "Batteries Included" thing
> > is handy.  As for sanctioned libraries that don't come included,
> > that could be possible, but the politics of picking the libraries
> > could be nasty.
>
> How would that be more nasty than picking libraries to be included in
> the standard library?   A sanctioned library list would be a level
> between the standard library and random third-party code, basically to
> avoid tying the release cycle of "obviously useful" software to that
> of python itself.


They are both nasty, and there is a reason why modules that have competitors
don't get added easily.  Look at all of the discussion it took to get
pysqlite added; it took two tries and a lot of emails to get that cleared.

-Brett

>
> > >
> > > And if there was a "net" package which contained all the networking
> > > modules in the stdlib, then yes I would expect a 3rd party developer
> > > of a jabber module to want to take advantage of the hierarchy and
> > > inject itself into the "net" namespace. Which would actually make
> > name
> > > collisions worse rather than better. [Although, evidence from the
> > > current os module seems to imply that this is less of an issue than
> > > I'm claiming, in practice...]
> >
> > I suppose that's at least partially not an issue at the moment
> > because you can only add stuff to existing packages through hacks. I
> > wouldn't touch libraries that inject themselves into existing
> > packages through .pth hackery because of the yuckiness of it [*].
> >
> > Yeah, something better than .pth files would be good.
>
> There's nothing wrong with .pth files per se and I use them
> regularly. The yuckiness is in .pth files that contain lines that
> start with 'import'; those can do very scary things such as
> hot-patching the standard library during Python startup. That's
> something I don't like to see in production code.
>
> Ronald
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060601/45e00ef9/attachment.html 

From greg.ewing at canterbury.ac.nz  Fri Jun  2 03:03:07 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 02 Jun 2006 13:03:07 +1200
Subject: [Python-3000] Wild idea: Deferred Evaluation & Implicit Lambda
In-Reply-To: <200605301243.57103.tdickenson@geminidataloggers.com>
References: <447C1AB3.5000602@acm.org>
	<200605301243.57103.tdickenson@geminidataloggers.com>
Message-ID: <447F8E4B.6030205@canterbury.ac.nz>

Toby Dickenson wrote:

> The ?? operator first evaluates its left operand. If that succeeds, its value 
> is returned. If it raises an exception, it evaluates and returns its right 
> operand. That allowed your example to be written:
> 
> 	value = a[key] ?? b[key] ?? 0

That wouldn't make as much sense in Python, because you
don't usually want to catch all exceptions, only particular
ones. So the operator would need to be parameterised
somehow with the exception to catch, which would make
it much less concise.

A more Pythonic way would be something like

   value = a.get(key) or b.get(key) or 0

If some of your legitimate values can be false, you
might need to use functions that attempt to get a
value and return it wrapped somehow.

The thought has just occurred that what *might* be
useful here is an operator that works like "or",
except that the only value it recognises as "false"
is None. Not sure what to call it, though...
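Pending such an operator, the behaviour Greg describes can be approximated today with a small helper function (the name `first_not_none` is made up for this sketch; unlike `or`, it only treats None as "missing"):

```python
def first_not_none(*values):
    """Return the first argument that is not None, like a chained
    'or' whose only false value is None."""
    for v in values:
        if v is not None:
            return v
    return None

a = {"x": 0}   # 0 is falsy, but a perfectly legitimate value
b = {"x": 99}

# Plain 'or' discards the legitimate 0; the helper keeps it.
print(a.get("x") or b.get("x"))                    # 99
print(first_not_none(a.get("x"), b.get("x"), 0))   # 0
```

This avoids the wrapping dance for falsy-but-valid values, at the cost of evaluating every argument eagerly, whereas the hypothetical operator would short-circuit.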

--
Greg



From greg.ewing at canterbury.ac.nz  Fri Jun  2 03:14:32 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 02 Jun 2006 13:14:32 +1200
Subject: [Python-3000] weakrefs and cyclic references
In-Reply-To: <1d85506f0605311102la44fb40j65954db9bad9a29c@mail.gmail.com>
References: <1d85506f0605311102la44fb40j65954db9bad9a29c@mail.gmail.com>
Message-ID: <447F90F8.6060000@canterbury.ac.nz>

tomer filiba wrote:

> you can solve the problem using
> weakref.proxy
> ...
> so why not do this automatically?

I would *not* want to have some of my references chosen
at random and automatically made into weak ones. I may
temporarily create a cycle and later break it by removing
one of the references. If the other remaining reference had
been picked for auto-conversion into a weak reference, I
would lose the last reference to my object.

(Besides being undesirable, it would also be extremely
difficult to implement efficiently.)

What might be useful is an easier way of *explicitly*
creating and using weak references.

We already have WeakKeyDictionary and WeakValueDictionary
which behave just like ordinary dicts except that they
weakly reference things. I'm thinking it would be nice
to have a way of declaring any attribute to be a weak
reference. Then it could be read and written in the
usual way, without all the code that uses it having
to know about its weakness.

This could probably be done fairly easily with a suitable
property descriptor.

--
Greg

From bingham at cenix-bioscience.com  Fri Jun  2 11:16:20 2006
From: bingham at cenix-bioscience.com (Aaron Bingham)
Date: Fri, 02 Jun 2006 11:16:20 +0200
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <e5nank$h7r$1@sea.gmane.org>
References: <44716940.9000300@acm.org><4472B196.7070506@acm.org>	<ca471dc20605230817x331241e6r45e63c4c1c0eb8ed@mail.gmail.com>	<447BC126.8050107@acm.org>	<bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com>	<1149080922.5718.20.camel@fsol>	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>	<1149095977.5718.51.camel@fsol>	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com><79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>	<447F0FC4.1030906@cenix-bioscience.com>
	<e5nank$h7r$1@sea.gmane.org>
Message-ID: <448001E4.5070003@cenix-bioscience.com>

Terry Reedy wrote:

>"Aaron Bingham" <bingham at cenix-bioscience.com> wrote in message 
>news:447F0FC4.1030906 at cenix-bioscience.com...
>  
>
>>I'm confused.  As far as I can see, a reserved prefix (the "py" or
>>"stdlib" package others have mentioned) is the only reliable way to
>>avoid naming conflicts with 3rd-party packages with a growing standard
>>library.
>>    
>>
>
>True, but..
>
>  
>
>>I suspect we will be going round and round in circles here as
>>long as a reserved prefix is ruled out.  IMO, multiple reserved prefixes
>>("net", "gui", etc.) is much worse than one.
>>    
>>
>
>But much better than a hundred or more ;-)
>  
>
The fewer the better of course.

>> Could someone please
>>explain for my sake why a single reserved prefix is not acceptable?
>>    
>>
>
>Because you have to type it over and over.
>
A tiny amount of pain for everyone to save a large amount of pain for a 
few (when their name gets used by a new stdlib package).

>There are two separate issues being discussed here:
>1) reducing/eliminating name clashes between stdlib and other modules;
>2) organizing the stdlib with a shallow hierarchy.
>
>For the former, yes, a prefix on stdlib modules would work, but this most 
>common case could/should be the default.  
>
What the most common case is depends on what you are doing.  For people 
writing one-off scripts, stdlib imports will dominate; in the code I 
work on, stdlib imports are only a small fraction (I'd guess about 10%) 
of all imports.

>Requiring instead a prefix on all 
>*other* imports would accomplish the same.  For instance, 's' for imports 
>from site-packages and 'l' for imports of local modules on sys.path (which 
>would then not have lib and lib/site-packages on it).
>  
>
True, but having the name of a module depend on how you choose to 
install it on a particular machine seems dangerous.

>But the problem I see with this approach is that it says that the most 
>important thing about a module is where it comes from, rather than what it 
>does.
>  
>
Which is more important depends on what you are thinking about.  If I am 
just trying to get something working quickly, what the module does is 
most important; if I am trying to minimize external dependencies, where 
the module comes from is most important.

>For the latter (2 above), I think those who want such mostly agree in 
>principle on a mostly two-level hierarchy with about 10-20 short names for 
>the top-level, using the lib docs as a starting point for the categories
>  
>
That's fine with me, but I still think we need a top-level prefix.

>Up in the air is the question of plugging in other modules not included in 
>the stdlib.  With useful categories, this strikes me as a useful thing to 
>do.  From a usage viewpoint, what a module does is more important than who 
>wrote it and who distributes it.  
>
This strikes me as asking for naming conflicts.  An alternative approach 
would be to have a system of categories for documentation purposes that 
are not related to the package names.  Python could include support for 
searching by package category.

>When it becomes trivial to grab and 
>install non-stdlib modules, then the distinction between stdlib and not 
>becomes even less important.  
>
The distinction is still very important if I want my code to run with minimal fuss on anyone's machine.

Cheers,

-- 
--------------------------------------------------------------------
Aaron Bingham
Senior Software Engineer
Cenix BioScience GmbH
--------------------------------------------------------------------


From gmccaughan at synaptics-uk.com  Fri Jun  2 11:12:32 2006
From: gmccaughan at synaptics-uk.com (Gareth McCaughan)
Date: Fri, 2 Jun 2006 10:12:32 +0100
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <3d2ce8cb0606011416t30333f4aq641c00d557760eae@mail.gmail.com>
References: <44716940.9000300@acm.org> <e5nank$h7r$1@sea.gmane.org>
	<3d2ce8cb0606011416t30333f4aq641c00d557760eae@mail.gmail.com>
Message-ID: <200606021012.33299.gmccaughan@synaptics-uk.com>

On Thursday 2006-06-01 22:16, Mike Klaas wrote:
> Terry Reedy wrote:
> > Because you have to type it over and over.
> 
> hmm,  With the right context manager:
> 
> import py
> with py as py:
>     from gui import tkinter
>     import net
>     with net as net:
>         import httplib
>         import urllib

That's neat, but I think it's worse in both brevity and clarity
than

    import tkinter
    import httplib, urllib

or even (though here things would change as the number of
imported libraries in each category tends to infinity, which
in practice it probably doesn't) than

    import gui.tkinter
    import net.httplib
    import net.urllib

-- 
g


From ncoghlan at gmail.com  Fri Jun  2 12:53:39 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 02 Jun 2006 20:53:39 +1000
Subject: [Python-3000] weakrefs and cyclic references
In-Reply-To: <447F90F8.6060000@canterbury.ac.nz>
References: <1d85506f0605311102la44fb40j65954db9bad9a29c@mail.gmail.com>
	<447F90F8.6060000@canterbury.ac.nz>
Message-ID: <448018B3.7050102@gmail.com>

Greg Ewing wrote:
> What might be useful is an easier way of *explicitly*
> creating and using weak references.
> 
> We already have WeakKeyDictionary and WeakValueDictionary
> which behave just like ordinary dicts except that they
> weakly reference things. I'm thinking it would be nice
> to have a way of declaring any attribute to be a weak
> reference. Then it could be read and written in the
> usual way, without all the code that uses it having
> to know about its weakness.
> 
> This could probably be done fairly easily with a suitable
> property descriptor.

Something like the following? (although you could do a simpler version without 
the callback support) (untested!)

import weakref

class WeakAttr(object):
     """Descriptor to define weak instance attributes

     name is the name of the attribute
     callback is an optional callback function

     If supplied, the callback function is called with the
     instance and the attribute name as arguments after a currently
     referenced object is finalized.
     """
     def __init__(self, name, callback=None):
         self._name = name
         self._callback = callback

     def __get__(self, obj, cls):
         if obj is None:
             return self
         attr_ref = getattr(obj, self._name)
         if attr_ref is not None:
             return attr_ref()
         return None

     def __set__(self, obj, value):
         name = self._name
         if value is None:
             setattr(obj, name, None)
         else:
             cb = self._callback
             if cb is not None:
                 _cb = cb
                 def cb(dead_ref):
                     if dead_ref is getattr(obj, name):
                         # Object that went away is still
                         # the one referred to by the
                         # attribute, so invoke the callback
                         _cb(obj, name)
             attr_ref = weakref.ref(value, cb)
             setattr(obj, self._name, attr_ref)

     def __delete__(self, obj):
         delattr(obj, self._name)
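To show how the attribute reads and writes stay transparent, here is a usage sketch. It condenses the descriptor to a self-contained minimal variant without the callback support; the class `Node` and the storage name '_parent_ref' are invented for the example, and the instance must initialise the storage slot before the first read:

```python
import weakref

class WeakAttr(object):
    """Minimal variant of the descriptor above: stores a weakref
    under a separate instance attribute named by 'name'."""
    def __init__(self, name):
        self._name = name

    def __get__(self, obj, cls):
        if obj is None:
            return self
        ref = getattr(obj, self._name)
        return ref() if ref is not None else None

    def __set__(self, obj, value):
        ref = weakref.ref(value) if value is not None else None
        setattr(obj, self._name, ref)

class Node(object):
    # parent is weak, so parent<->child cycles don't keep trees alive
    parent = WeakAttr('_parent_ref')
    def __init__(self):
        self._parent_ref = None
        self.children = []
    def add(self, child):
        child.parent = self   # reads/writes look like a normal attribute
        self.children.append(child)

root = Node()
leaf = Node()
root.add(leaf)
print(leaf.parent is root)   # True
del root                     # on CPython the weakly-referenced parent dies
print(leaf.parent)           # None
```

Code using `leaf.parent` never sees the weakref machinery, which is exactly the transparency Greg asked for.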




-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From tomerfiliba at gmail.com  Fri Jun  2 14:54:21 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Fri, 2 Jun 2006 14:54:21 +0200
Subject: [Python-3000] weakrefs and cyclic references
In-Reply-To: <448018B3.7050102@gmail.com>
References: <1d85506f0605311102la44fb40j65954db9bad9a29c@mail.gmail.com>
	<447F90F8.6060000@canterbury.ac.nz> <448018B3.7050102@gmail.com>
Message-ID: <1d85506f0606020554v3b478434s4b5233f50e1010cc@mail.gmail.com>

dang, you posted before me :)

anyway, please check my implementation as well
http://sebulba.wikispaces.com/recipe+weakattr

i also included some demos.

anyway, i'd like to have this or the other weakattr implementation
included in weakref.py. it's a pretty useful feature to have in the stdlib,
for example:

from weakref import weakattr

class blah(object):
    someattr = weakattr()
    def __init__(self):
        self.someattr = self

just like properties.


-tomer

On 6/2/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Greg Ewing wrote:
> > What might be useful is an easier way of *explicitly*
> > creating and using weak references.
> >
> > We already have WeakKeyDictionary and WeakValueDictionary
> > which behave just like ordinary dicts except that they
> > weakly reference things. I'm thinking it would be nice
> > to have a way of declaring any attribute to be a weak
> > reference. Then it could be read and written in the
> > usual way, without all the code that uses it having
> > to know about its weakness.
> >
> > This could probably be done fairly easily with a suitable
> > property descriptor.
>
> Something like the following? (although you could do a simpler version without
> the callback support) (untested!)
>
> class WeakAttr(object):
>      """Descriptor to define weak instance attributes
>
>      name is the name of the attribute
>      callback is an optional callback function
>
>      If supplied, the callback function is called with the
>      instance and the attribute name as arguments after a currently
>      referenced object is finalized.
>      """
>      def __init__(self, name, callback=None):
>          self._name = name
>          self._callback = callback
>
>      def __get__(self, obj, cls):
>          if obj is None:
>              return self
>          attr_ref = getattr(obj, self._name)
>          if attr_ref is not None:
>              return attr_ref()
>          return None
>
>      def __set__(self, obj, value):
>          name = self._name
>          if value is None:
>              setattr(obj, name, None)
>          else:
>              cb = self._callback
>              if cb is not None:
>                  _cb = cb
>                  def cb(dead_ref):
>                      if dead_ref is getattr(obj, name):
>                          # Object that went away is still
>                          # the one referred to by the
>                          # attribute, so invoke the callback
>                          _cb(obj, name)
>              attr_ref = weakref.ref(value, cb)
>              setattr(obj, self._name, attr_ref)
>
>      def __delete__(self, obj):
>          delattr(obj, self._name)
>
>
>
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> ---------------------------------------------------------------
>              http://www.boredomandlaziness.org
>

From collinw at gmail.com  Fri Jun  2 18:05:40 2006
From: collinw at gmail.com (Collin Winter)
Date: Fri, 2 Jun 2006 18:05:40 +0200
Subject: [Python-3000] Using a list for *args (was: Type
	annotations:annotating generators)
In-Reply-To: <20060601131936.srakny7zvu3lwok0@login.werra.lunarpages.com>
References: <20060601131936.srakny7zvu3lwok0@login.werra.lunarpages.com>
Message-ID: <43aa6ff70606020905n5b4472cexfc61d4edfca396c2@mail.gmail.com>

On 6/1/06, Michael Chermside <mcherm at mcherm.com> wrote:
> Collin Winter writes:
> > I've attached the benchmarks as a comment on the patch, but I'll
> > repeat them here. All times are usecs per loop.
>       [statistics showing list is about 15% slower]
>
> My memory is fuzzy here. Can someone repeat for me the reasons
> why we wanted to use list? Were we just trying it out to see how
> it worked, or was there a desire to change? Was the desire to
> change because it improved some uses of the C api, or was it
> just for "purity" in use of tuples vs lists?
>
> I'm not a "need for speed" kind of guy, but I can't remember what
> the advantages of the list approach were supposed to be.

The main reason (in my mind, at least) was tuple/list purity.

> By the way I'm curious about the following also:
>
> # interpolating a list (I presume there's no advantage, but just checking)
> ./python -mtimeit 'def foo(*args): pass' 'foo(*range(10))'
Tuple: 4.22
List: 4.57

> # calling a function that doesn't use *args
> ./python -mtimeit 'def foo(): pass' 'foo()'
Tuple: 1.5
List: 1.51

> ./python -mtimeit 'def foo(x): pass' 'foo(1)'
Tuple: 1.62
List: 1.59

> ./python -mtimeit 'def foo(x,y): pass' 'foo(1,2)'
Tuple: 1.7
List: 1.7

> ./python -mtimeit 'def foo(x,y,z): pass' 'foo(1,2,3)'
Tuple: 1.84
List: 1.83

Collin Winter

From talin at acm.org  Fri Jun  2 19:42:03 2006
From: talin at acm.org (Talin)
Date: Fri, 02 Jun 2006 10:42:03 -0700
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <E206EA98-9A18-41E2-BB40-42D93B108320@mac.com>
References: <44716940.9000300@acm.org>
	<447BC126.8050107@acm.org>	<bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com>	<1149080922.5718.20.camel@fsol>	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>	<1149095977.5718.51.camel@fsol>	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>	<79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>	<477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com>	<bbaeab100606010844s552e7918i481301082e706ac6@mail.gmail.com>
	<E206EA98-9A18-41E2-BB40-42D93B108320@mac.com>
Message-ID: <4480786B.3060400@acm.org>

Ronald Oussoren wrote:
> On 1-jun-2006, at 17:44, Brett Cannon wrote:
>>I suppose that's at least partially not an issue at the moment
>>because you can only add stuff to existing packages through hacks. I
>>wouldn't touch libraries that inject themselves into existing
>>packages through .pth hackery because of the juckyness of it [*].
>>
>>Yeah, something better than .pth files would be good.
> 
> 
> There's nothing wrong with .pth files per-se and I use them  
> regularly. The juckyness is in .pth files that contain lines that  
> start with 'import', those can do very scary things such as hot- 
> patching the standard library during python startup. That's something  
> I don't like to see in production code.

Reading over this thread, it seems to me that there is a cross-linkage 
between the "reorganize standard library" task and the "refactor import 
machinery" task - in that much of the arguments about the stdlib names 
seem to hinge on policy decisions as to (a) whether 3rd party libs 
should be allowed to co-mingle with the stdlib modules, and (b) what 
kinds of co-mingling should be allowed ('monkeypatching', et al), and 
(c) what specific import mechanisms should these 3rd-party modules have 
access to in order to do this co-mingling.

Moreover, past threads on the topic of import machinery have given me 
the vague sense that there is a lot of accumulated cruft in the way that 
packages are built, distributed, and imported; that a lot of features 
and additions have been made to the various distutils / setuptools / 
import tools in order to solve various problems that have cropped up 
from time to time, and that certain people are rather dissatisfied with 
the overall organization (or lack thereof) and inelegance of these 
additions, in particular their lack of an OOWTDI.

I say 'vague sense' because even after reading all these threads, I only 
have a murky idea of what actual *problems* all of these various 
improvements are trying to solve.

Given the cruft-disposal-themed mission statement of Py3000, it seems to 
me that it would make a lot of sense for someone to actually write down 
what all this stuff is actually trying to accomplish, and from there 
perhaps open the discussion as to whether there is some other, more 
sublimely beautiful and obviously simpler way to accomplish the same thing.

As for the specific case of .pth files, the general concept, as far 
as I can tell, is that having to modify environment variables to include 
additional packages sucks, and it particularly sucks on non-Unixy 
platforms such as Windows and Jython. That in itself seems like a 
laudable goal, assuming of course that one has also listed the various 
use cases for why a package wouldn't simply be dumped in 'site-packages' 
with no need to modify anything.

So before starting the work of sketching out broad categories of package 
names, it seems to me that steps 1 and 2 are (1) identifying a set of 
requirements for package creation/distribution/location/etc, and (2) 
identifying how the design of (1) will impact the conventions of package 
organization.

-- Talin

From brett at python.org  Fri Jun  2 20:20:19 2006
From: brett at python.org (Brett Cannon)
Date: Fri, 2 Jun 2006 11:20:19 -0700
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <4480786B.3060400@acm.org>
References: <44716940.9000300@acm.org>
	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>
	<1149095977.5718.51.camel@fsol>
	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>
	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>
	<79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
	<477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com>
	<bbaeab100606010844s552e7918i481301082e706ac6@mail.gmail.com>
	<E206EA98-9A18-41E2-BB40-42D93B108320@mac.com>
	<4480786B.3060400@acm.org>
Message-ID: <bbaeab100606021120k46c8b53at3630c93405e2dfd9@mail.gmail.com>

On 6/2/06, Talin <talin at acm.org> wrote:
>
> Ronald Oussoren wrote:
> > On 1-jun-2006, at 17:44, Brett Cannon wrote:
> >>I suppose that's at least partially not an issue at the moment
> >>because you can only add stuff to existing packages through hacks. I
> >>wouldn't touch libraries that inject themselves into existing
> >>packages through .pth hackery because of the juckyness of it [*].
> >>
> >>Yeah, something better than .pth files would be good.
> >
> >
> > There's nothing wrong with .pth files per-se and I use them
> > regularly. The juckyness is in .pth files that contain lines that
> > start with 'import', those can do very scary things such as hot-
> > patching the standard library during python startup. That's something
> > I don't like to see in production code.
>
> Reading over this thread, it seems to me that there is a cross-linkage
> between the "reorganize standard library" task and the "refactor import
> machinery" task - in that much of the arguments about the stdlib names
> seem to hinge on policy decisions as to (a) whether 3rd party libs
> should be allowed to co-mingle with the stdlib modules, and (b) what
> kinds of co-mingling should be allowed ('monkeypatching', et al), and
> (c) what specific import mechanisms should these 3rd-party modules have
> access to in order to do this co-mingling.


Personally, I am not advocating any change in imports nor any mingling of
third-party code with the stdlib.

> Moreover, past threads on the topic of import machinery have given me
> the vague sense that there is a lot of accumulated cruft in the way that
> packages are built, distributed, and imported; that a lot of features
> and additions have been made to the various distutils / setuptools /
> import tools in order to solve various problems that have cropped up
> from time to time, and that certain people are rather dissatisfied with
> the overall organization (or lack thereof) and inelegance of these
> additions, in particular their lack of an OOWTDI.
>
> I say 'vague sense' because even after reading all these threads, I only
> have a murky idea of what actual *problems* all of these various
> improvements are trying to solve.
>
> Given the cruft-disposal-themed mission statement of Py3000, it seems to
> me that it would make a lot of sense for someone to actually write down
> what all this stuff is actually trying to accomplish, and from there
> perhaps open the discussion as to whether there is some other, more
> sublimely beautiful and obviously simpler way to accomplish the same
> thing.


 Well, for me, the reorganization is to help make finding the module you
want easier, both in the docs and at the interpreter.  This includes
grouping and renaming modules to be more reasonable and follow a consistent
naming scheme.

-Brett

> As for the specific case of .pth files, the general concept, as far
> as I can tell, is that having to modify environment variables to include
> additional packages sucks, and it particularly sucks on non-Unixy
> platforms such as Windows and Jython. That in itself seems like a
> laudable goal, assuming of course that one has also listed the various
> use cases for why a package wouldn't simply be dumped in 'site-packages'
> with no need to modify anything.
>
> So before starting the work of sketching out broad categories of package
> names, it seems to me that step 1 and 2 are (1) identifying a set of
> requirements for package creation/distribution/location/etc, and (2)
> identify how the design of (1) will impact on the conventions of package
> organization.
>
> -- Talin
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/brett%40python.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060602/e4067afb/attachment.html 

From tjreedy at udel.edu  Fri Jun  2 20:53:15 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 2 Jun 2006 14:53:15 -0400
Subject: [Python-3000] packages in the stdlib
References: <44716940.9000300@acm.org><4472B196.7070506@acm.org>	<ca471dc20605230817x331241e6r45e63c4c1c0eb8ed@mail.gmail.com>	<447BC126.8050107@acm.org>	<bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com>	<1149080922.5718.20.camel@fsol>	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>	<1149095977.5718.51.camel@fsol>	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com><79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>	<447F0FC4.1030906@cenix-bioscience.com><e5nank$h7r$1@sea.gmane.org>
	<448001E4.5070003@cenix-bioscience.com>
Message-ID: <e5q1er$to0$1@sea.gmane.org>


"Aaron Bingham" <bingham at cenix-bioscience.com> wrote in message 
news:448001E4.5070003 at cenix-bioscience.com...
>[me]
>>For the latter (2 above), I think those who want such mostly agree in
>>principle on a mostly two-level hierarchy with about 10-20 short names 
>>for
>>the top-level, using the lib docs as a starting point for the categories

> That's fine with me, but I still think we need a top-level prefix.

I think that 10-20 reserved names is hardly such a burden that we would 
need anything more on top to avoid collisions -- especially if the list is 
fixed.  The current problem is that modules can be added to the stdlib 
that clash with existing 3rd party modules.  That would no longer happen 
under my variation of the classification proposal, which would include a 
misc package.


>>When it become trivial to grab and
>>install non-stdlib modules, then the distinction between stdlib and not
>>becomes even less important.
>>
> The distinction is still very important if I want my code to run with 
> minimal fuss on anyone's machine.

Under the hypothesis 'trivial to install...', the extra fuss would be 
small.  Do you really consider 'little extra fuss' to be the same as 'lots 
of extra fuss'?

Terry Jan Reedy





From jimjjewett at gmail.com  Fri Jun  2 21:49:32 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 2 Jun 2006 15:49:32 -0400
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <4480786B.3060400@acm.org>
References: <44716940.9000300@acm.org>
	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>
	<1149095977.5718.51.camel@fsol>
	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>
	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>
	<79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
	<477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com>
	<bbaeab100606010844s552e7918i481301082e706ac6@mail.gmail.com>
	<E206EA98-9A18-41E2-BB40-42D93B108320@mac.com>
	<4480786B.3060400@acm.org>
Message-ID: <fb6fbf560606021249w211bd492oc88f1e5bd0635d9c@mail.gmail.com>

On 6/2/06, Talin <talin at acm.org> wrote:

> ... it seems to me that there is a cross-linkage
> between the "reorganize standard library" task and the "refactor import
> machinery" task

Eventually, yes.  As Brett pointed out, "reorganize the standard
library" stands on its own, and is intended to make finding modules
easier.

The tasks get linked when the library again grows, or when 3rd-party
packages try to replace (or superset) the functionality.  Then we
might start caring that package X is exactly the stdlib package X
(which was sufficient and tested against), or that it be Xplus (which
the sysadmin or user says is a faster and bugfixed superset).

>- in that much of the arguments about the stdlib names
> seem to hinge on policy decisions as to (a) whether 3rd party libs
> should be allowed to co-mingle with the stdlib modules,

By default, yes, but it should be easy to tell which you have if you do care.
So (pretending that wx is in the stdlib, because it has a short name)

    import UI.wx     # Import a module claiming to implement the wx interface

    import py.UI.wx    # Import exactly the wx that was installed with
the standard lib.


> and (b) what
> kinds of co-mingling should be allowed ('monkeypatching', et al), and
> (c) what specific import mechanisms should these 3rd-party modules have
> access to in order to do this co-mingling.

These are not related to the stdlib reorg.  The only catch is that
with a deeper namespace, some 3rd party packages will know where they
belong, and it makes sense to let them say so.  Namespace packages
(let alone tags) are not in the standard library now, so this can't be
done as cleanly.

> Moreover, past threads on the topic of import machinery have given me
> the vague sense that there is a lot of accumulated cruft in the way that
> packages are
...
> I say 'vague sense' because even after reading all these threads, I only
> have a murky idea of what actual *problems* all of these various
> improvements are trying to solve.

Those that I'm vaguely aware of:

(1)  It is hard to split packages.  The idiom of module.py importing
_module sort of works for a two-way split of a single module, but
splitting modules and subpackages across different locations doesn't
work so nicely.  Putting .pyc in one place and .py in another is a
recurring minor itch.

(2)  Every import extension reinvents the wheel, and only one wheel at
a time.  Whether the file is in a zip archive or not should be
unrelated to whether it is a .pyo file or a cheetah template --- but
currently isn't.

(3)  As these one-off wheels build up, it becomes difficult to know
where something really came from (and how), so it is harder to find
the "real" package and easier to test (or ship) the wrong version.

-jJ

From tomerfiliba at gmail.com  Fri Jun  2 22:06:59 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Fri, 2 Jun 2006 22:06:59 +0200
Subject: [Python-3000] a slight change to __[get|set|del]item__
Message-ID: <1d85506f0606021306k32ad723bx826d7cd7debae4dd@mail.gmail.com>

Guido wrote:
> Because the (...) in a function call isn't a tuple.
>
> I'm with Oleg -- a[x, y] is *intentionally* the same as a[(x, y)].
> This is a feature; you can write
>
>    t = x, y    # or t = (x, y)
>
> and later
>
>   a[t]

well is func((1,2,3)) the same as func(1,2,3)? no.
so why should container[1, 2, 3] be the same as container[(1,2,3)]?
you say it's a feature. is it intentionally *ambiguous*?

what you'd want in that case is
    t = (1, 2, 3)
    container[*t]
or something like that.
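for the record, the current behaviour is easy to see with a toy container
(just a sketch to show what __getitem__ actually receives):

```python
class Container:
    def __getitem__(self, key):
        # the subscript arrives as a single object: a tuple for c[1, 2, 3]
        return key

c = Container()
assert c[1, 2, 3] == (1, 2, 3)
assert c[1, 2, 3] == c[(1, 2, 3)]  # the two spellings are indistinguishable
```

so unlike a call, where f(1, 2, 3) and f((1, 2, 3)) differ, the subscript
syntax never distinguishes the two.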

i guess it's a dead subject, but i wanted to have that clarified.


-tomer

From steven.bethard at gmail.com  Fri Jun  2 22:50:30 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Fri, 2 Jun 2006 14:50:30 -0600
Subject: [Python-3000] have iter(mapping) generate (key, value) pairs
Message-ID: <d11dcfba0606021350l656ea865u18a65a859783c178@mail.gmail.com>

I'd like to suggest that we (at least briefly) re-consider the
decision that iterating over a mapping generates the keys, not the
(key, value) pairs.  This was addressed somewhat in `PEP 234`_, with
the pros and cons basically being:

* From a purity standpoint, iterating over keys keeps the symmetry
between ``if x in y`` and ``for x in y``
* From a practicality standpoint, iterating over keys means that most
of the time, you'll also have to do a ``mapping[key]``, since most
iterations access both the keys and values.

I only bring this up now because Python 3000 is our opportunity to
review old decisions, and I think there's one more argument for
iterating over (key, value) pairs that was not discussed.  Iterating
over (key, value) pairs allows functions like dict() and dict.update()
to accept both mappings and (key, value) iterables, without having to
check for a .keys() function.  Just to clarify the point, here's the
code in UserDict.DictMixin::

    def update(self, other=None, **kwargs):
        # Make progressively weaker assumptions about "other"
        if other is None:
            pass
        elif hasattr(other, 'iteritems'):  # iteritems saves memory and lookups
            for k, v in other.iteritems():
                self[k] = v
        elif hasattr(other, 'keys'):
            for k in other.keys():
                self[k] = other[k]
        else:
            for k, v in other:
                self[k] = v
        if kwargs:
            self.update(kwargs)

Note that even though the `Language Reference`_ defines mappings in
terms of __len__, __getitem__, __setitem__, __delitem__ and __iter__,
UserDict.DictMixin.update has to assume that all mappings have a
.keys() method.

For comparison, here's what it would look like if mappings iterated
over (key, value) pairs::

    def update(self, other=None, **kwargs):
        if other is not None:
            for k, v in other:
                self[k] = v
        if kwargs:
            self.update(kwargs)

As far as backwards compatibility is concerned, if you need to write
code that works in both Python 2.X and Python 3000, you just need to
be explicit, e.g. using dict.iteritems() or dict.iterkeys() as
necessary.  (Yes, I know that .iter* is going to be dropped, but
that's a backward compatibility concern for another PEP, not this
one.)


.. _PEP 234: http://www.python.org/dev/peps/pep-0234/
.. _Language Reference: http://docs.python.org/ref/sequence-types.html

STeVe
-- 
Grammar am for people who can't think for myself.
        --- Bucky Katt, Get Fuzzy

From guido at python.org  Fri Jun  2 23:28:13 2006
From: guido at python.org (Guido van Rossum)
Date: Fri, 2 Jun 2006 14:28:13 -0700
Subject: [Python-3000] have iter(mapping) generate (key, value) pairs
In-Reply-To: <d11dcfba0606021350l656ea865u18a65a859783c178@mail.gmail.com>
References: <d11dcfba0606021350l656ea865u18a65a859783c178@mail.gmail.com>
Message-ID: <ca471dc20606021428n5363d3am1d50719038a7bb1@mail.gmail.com>

This was already considered and rejected. See PEP 3099.

On 6/2/06, Steven Bethard <steven.bethard at gmail.com> wrote:
> I'd like to suggest that we (at least briefly) re-consider the
> decision that iterating over a mapping generates the keys, not the
> (key, value) pairs.  This was addressed somewhat in `PEP 234`_, with
> the pros and cons basically being:
>
> * From a purity standpoint, iterating over keys keeps the symmetry
> between ``if x in y`` and ``for x in y``
> * From a practicality standpoint, iterating over keys means that most
> of the time, you'll also have to do a ``mapping[key]``, since most
> iterations access both the keys and values.
>
> I only bring this up now because Python 3000 is our opportunity to
> review old decisions, and I think there's one more argument for
> iterating over (key, value) pairs that was not discussed.  Iterating
> over (key, value) pairs allows functions like dict() and dict.update()
> to accept both mappings and (key, value) iterables, without having to
> check for a .keys() function.  Just to clarify the point, here's the
> code in UserDict.DictMixin::
>
>     def update(self, other=None, **kwargs):
>         # Make progressively weaker assumptions about "other"
>         if other is None:
>             pass
>         elif hasattr(other, 'iteritems'):  # iteritems saves memory and lookups
>             for k, v in other.iteritems():
>                 self[k] = v
>         elif hasattr(other, 'keys'):
>             for k in other.keys():
>                 self[k] = other[k]
>         else:
>             for k, v in other:
>                 self[k] = v
>         if kwargs:
>             self.update(kwargs)
>
> Note that even though the `Language Reference`_ defines mappings in
> terms of __len__, __getitem__, __setitem__, __delitem__ and __iter__,
> UserDict.DictMixin.update has to assume that all mappings have a
> .keys() method.
>
> For comparison, here's what it would look like if mappings iterated
> over (key, value) pairs::
>
>     def update(self, other=None, **kwargs):
>         if other is not None:
>             for k, v in other:
>                 self[k] = v
>         if kwargs:
>             self.update(kwargs)
>
> As far as backwards compatibility is concerned, if you need to write
> code that works in both Python 2.X and Python 3000, you just need to
> be explicit, e.g. using dict.iteritems() or dict.iterkeys() as
> necessary.  (Yes, I know that .iter* is going to be dropped, but
> that's a backward compatibility concern for another PEP, not this
> one.)
>
>
> .. _PEP 234: http://www.python.org/dev/peps/pep-0234/
> .. _Language Reference: http://docs.python.org/ref/sequence-types.html
>
> STeVe
> --
> Grammar am for people who can't think for myself.
>         --- Bucky Katt, Get Fuzzy
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mcherm at mcherm.com  Fri Jun  2 23:30:05 2006
From: mcherm at mcherm.com (Michael Chermside)
Date: Fri, 02 Jun 2006 14:30:05 -0700
Subject: [Python-3000] have iter(mapping) generate (key, value) pairs
Message-ID: <20060602143005.mzagusmhwv5cow08@login.werra.lunarpages.com>

Steven Bethard writes:
> I'd like to suggest that we (at least briefly) re-consider the
> decision that iterating over a mapping generates the keys, not the
> (key, value) pairs.

I agree, now is the best time for reconsidering the decision.

My opinion on the matter itself is that I was unsure before we did
it, but that use has convinced me that iter() returning the keys
turns out to be very natural. Since I write "for x in myDict" a LOT
this outweighs any minor implementation details in dict() and
dict.update(). I say the original decision was a Python success
story: it's one of those examples that I look back on whenever my
confidence in Guido's intuition on syntax needs shoring up.

-- Michael Chermside


From ncoghlan at gmail.com  Sat Jun  3 01:54:33 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 03 Jun 2006 09:54:33 +1000
Subject: [Python-3000] have iter(mapping) generate (key, value) pairs
In-Reply-To: <d11dcfba0606021350l656ea865u18a65a859783c178@mail.gmail.com>
References: <d11dcfba0606021350l656ea865u18a65a859783c178@mail.gmail.com>
Message-ID: <4480CFB9.6020301@gmail.com>

Steven Bethard wrote:
> Note that even though the `Language Reference`_ defines mappings in
> terms of __len__, __getitem__, __setitem__, __delitem__ and __iter__,
> UserDict.DictMixin.update has to assume that all mappings have a
> .keys() method.

A slightly different proposal:

Add an iteritems() builtin with the following definition:

     def iteritems(obj):
         # Check for mapping first
         try:
             items = obj.items      # or __items__ if you prefer
         except AttributeError:
             pass
         else:
             return iter(items())
         # Check for sequence next
         if hasattr(obj, "__getitem__"):
             return enumerate(obj)
         # Fall back on normal iteration
         return iter(obj)

Then update the language reference so that the presence of an items() (or 
__items__()) method is the defining characteristic that makes something a 
mapping instead of a sequence. After all, we've been trying to think of a way 
to denote that anyway.
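A quick sketch of how that builtin would behave (repeating the definition so
the snippet stands alone, and treating items() as the mapping marker):

```python
def iteritems(obj):
    # Mapping: defined by the presence of an items() method
    try:
        items = obj.items
    except AttributeError:
        pass
    else:
        return iter(items())
    # Sequence: has __getitem__ but no items()
    if hasattr(obj, "__getitem__"):
        return enumerate(obj)
    # Anything else: assume it already iterates over (key, value) pairs
    return iter(obj)

assert sorted(iteritems({"a": 1, "b": 2})) == [("a", 1), ("b", 2)]
assert list(iteritems(["x", "y"])) == [(0, "x"), (1, "y")]
assert list(iteritems(iter([("k", "v")]))) == [("k", "v")]
```

With this, DictMixin.update collapses to a single "for k, v in
iteritems(other)" loop regardless of what kind of object it is given.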

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From tomerfiliba at gmail.com  Sat Jun  3 22:51:57 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Sat, 3 Jun 2006 22:51:57 +0200
Subject: [Python-3000] iostack and sock2
Message-ID: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>

hi all

some time ago i wrote this huge post about stackable IO and the
need for a new socket module. i've made some progress with
those, and i'd like to receive feedback.

* a working alpha version of the new socket module (sock2) is
available for testing and tweaking with at
http://sebulba.wikispaces.com/project+sock2

* i'm working on a version of iostack... but i don't expect to make
a public release until mid july. in the meanwhile, i started a wiki
page on my site for it (motivation, plans, design):
http://sebulba.wikispaces.com/project+iostack
with lots of pretty-formatted info. i remember people saying
that stating `read(n)` returns exactly `n` bytes is problematic,
can you elaborate?

btw, Guido said he'd review it, but he's too busy, and i'd like to
receive comments from other people as well. thanks.


-tomer

From ncoghlan at gmail.com  Sun Jun  4 05:52:19 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 04 Jun 2006 13:52:19 +1000
Subject: [Python-3000] iostack and sock2
In-Reply-To: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
Message-ID: <448258F3.3070808@gmail.com>

tomer filiba wrote:
> hi all
> 
> some time ago i wrote this huge post about stackable IO and the
> need for a new socket module. i've made some progress with
> those, and i'd like to receive feedback.
> 
> * a working alpha version of the new socket module (sock2) is
> available for testing and tweaking with at
> http://sebulba.wikispaces.com/project+sock2
> 
> * i'm working on a version of iostack... but i don't expect to make
> a public release until mid july. in the meanwhile, i started a wiki
> page on my site for it (motivation, plans, design):
> http://sebulba.wikispaces.com/project+iostack

Nice, very nice.

Some things that don't appear to have been considered in the iostack design yet:
  - non-blocking IO and timeouts (e.g. on NetworkStreams)
  - interaction with (replacement of?) the select module

Some other random thoughts about the current writeup:

The design appears to implicitly assume that it is best to treat all streams 
as IO streams, and raise an exception if an output operation is accessed on an 
input-only stream (or vice versa). This seems like a reasonable idea to me, 
but it should be mentioned explicitly (e.g an alternative approach would be to 
define InputStream and OutputStream, and then have an IOStream that inherited 
from both of them).

The common Stream API should include a flush() write method, so that 
application code doesn't need to care whether or not it is dealing with 
buffered IO when forcing output to be displayed.

Any operations that may touch the filesystem or network shouldn't be 
properties - attribute access should never raise IOError (this is a guideline 
that came out of the Path discussion). (e.g. the 'position' property is 
probably a bad idea, because x.position may then raise an IOError)

The stream layer hierarchy needs to be limited to layers that both expose and 
use the normal bytes-based Stream API. A separate stream interface concept is 
needed for something that can be used by the application, but cannot have 
other layers stacked on top of it. Additionally, any "bytes-in-bytes-out" 
transformation operation can be handled as a single codec layer that accepts 
an encoding function and a decoding function. This can then be used for 
compression layers, encryption layers, Golay encoding, A-law companding, AV 
codecs, etc. . .

   StreamLayer
     * ForwardingLayer - forwards all data written or read to another stream
     * BufferingLayer - buffers data using given buffer size
     * CodecLayer - encodes data written, decodes data read

   StreamInterface
     * TextInterface - text oriented interface to a stream
     * BytesInterface - byte oriented interface to a stream
     * RecordInterface - record (struct) oriented interface to a stream
     * ObjectInterface - object (pickle) oriented interface to a stream

The key point about the stream interfaces is that while they will provide a 
common mechanism for getting at the underlying stream, their interfaces are 
otherwise unconstrained. The BytesInterface differs from a normal low-level 
stream primarily in the fact that it *is* line-iterable.

On the topic of line buffering, the Python 2.x IO stack treats binary files as 
line iterable, using '\n' as a line separator (well, more strictly it's a 
record separator, since we're talking about binary files).

There's actually an RFE on SF somewhere about making the record separator 
configurable in the 2.x IO stack (I raised the tracker item ages ago when 
someone else made the suggestion).

However, the streams produced by iostack's 'file' helper are not currently 
line-iterable. Additionally, the 'textfile' helper tries to handle line 
terminators while the data is still bytes, while Unicode defines line endings 
in terms of characters. As I understand it, "\x0A" (LF), "\x0D" (CR), 
"\x0D\x0A" (CRLF), "\x85" (NEL), "\x0C" (FF), "\u2028" (LS), "\u2029" (PS) 
should all be treated as line terminators as far as Unicode is concerned.
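(For what it's worth, the splitlines() method of strings already recognises
this full set of terminators, so a text layer could lean on it rather than
scanning for "\n" by hand:)

```python
# splitlines() knows the Unicode line terminators; split("\n") does not
text = "a\nb\rc\r\nd\x85e\x0cf\u2028g\u2029h"
assert text.splitlines() == ["a", "b", "c", "d", "e", "f", "g", "h"]
assert text.split("\n") != text.splitlines()  # split only sees "\n"
```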

So I think line buffering and making things line iterable should be left to 
the TextInterface and BytesInterface layers. TextInterface would be most 
similar to the current file interface, only working on Unicode strings 
instead of 8-bit strings (as well as using the Unicode definition of what 
constitutes a line ending). BytesInterface would work with binary files, 
returning a bytes object for each record.

So I'd tweak the helper functions to look like:

def file(filename, mode = "r", bufsize = -1, line_sep="\n"):
     f = FileStream(filename, mode)
     # a bufsize of 0 or None means unbuffered
     if bufsize:
         f = BufferingLayer(f, bufsize)
     # Use bytes interface to make file line-iterable
     return BytesInterface(f, line_sep)

def textfile(filename, mode = "r", bufsize = -1, encoding = None):
     f = FileStream(filename, mode)
     # a bufsize of 0 or None means unbuffered
     if bufsize:
         f = BufferingLayer(f, bufsize)
     # Text interface deals with line terminators correctly
     return TextInterface(f, encoding)

> with lots of pretty-formatted info. i remember people saying
> that stating `read(n)` returns exactly `n` bytes is problematic,
> can you elaborate?

I can see that behaviour being seriously annoying when you get to the end of 
the stream. I'd far prefer for the stream to just give me the last bit when I 
ask for it and then tell me *next* time that there isn't anything left. This 
has worked well for a long time with the existing read method of file objects. 
If you want a method with the other behaviour, add a "readexact" API, rather 
than changing the semantics of "read" (although I'd be really curious to hear 
the use case for the other behaviour).

(Take a look at the s3.recv(100) line in your Sock2 example - how irritating 
would it be for that to raise EOFError because you only got a few bytes?)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ronaldoussoren at mac.com  Sun Jun  4 10:45:28 2006
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Sun, 4 Jun 2006 10:45:28 +0200
Subject: [Python-3000] packages in the stdlib
In-Reply-To: <e5q1er$to0$1@sea.gmane.org>
References: <44716940.9000300@acm.org> <4472B196.7070506@acm.org>
	<ca471dc20605230817x331241e6r45e63c4c1c0eb8ed@mail.gmail.com>
	<447BC126.8050107@acm.org>
	<bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com>
	<1149080922.5718.20.camel@fsol>
	<bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com>
	<1149095977.5718.51.camel@fsol>
	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>
	<bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com>
	<79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
	<447F0FC4.1030906@cenix-bioscience.com>
	<e5nank$h7r$1@sea.gmane.org>
	<448001E4.5070003@cenix-bioscience.com>
	<e5q1er$to0$1@sea.gmane.org>
Message-ID: <63094894-36B0-4487-9992-F913D9749F08@mac.com>


On 2-jun-2006, at 20:53, Terry Reedy wrote:

>
> "Aaron Bingham" <bingham at cenix-bioscience.com> wrote in message
> news:448001E4.5070003 at cenix-bioscience.com...
>> [me]
>>> For the latter (2 above), I think those who want such mostly  
>>> agree in
>>> principle on a mostly two-level hierarchy with about 10-20 short  
>>> names
>>> for
>>> the top-level, using the lib docs as a starting point for the  
>>> categories
>
>> That's fine with me, but I still think we need a top-level prefix.
>
> I think that 10-20 reserved names is hardly such a burden that we  
> would
> need anything more on top to avoid collisions -- especially if the  
> list is
> fixed.  The currently problem is that modules can be added to the  
> stdlib
> that clash with existing 3rd party modules.  That would no longer  
> happen
> under my variation of the classification proposal, which would  
> include a
> misc package.

I'm -lots on a package named "misc". That's really poor naming,  
almost as bad as "util". Misc is the "we don't know what to do with  
these"-category and completely unobvious for anyone that doesn't  
already know where to look. It seems to me that misc would end up  
containing all modules and packages that don't fit in one of the  
preconceived toplevel packages and don't have enough peers in the  
misc package to move them to their own toplevel package.

Ronald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2157 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20060604/c031b399/attachment.bin 

From tjreedy at udel.edu  Sun Jun  4 21:18:15 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 4 Jun 2006 15:18:15 -0400
Subject: [Python-3000] packages in the stdlib
References: <44716940.9000300@acm.org>
	<4472B196.7070506@acm.org><ca471dc20605230817x331241e6r45e63c4c1c0eb8ed@mail.gmail.com><447BC126.8050107@acm.org><bbaeab100605300925k151a1437gea18eeaafe5c8068@mail.gmail.com><1149080922.5718.20.camel@fsol><bbaeab100605310957u4f49bcbbwb2512dd195ba4b49@mail.gmail.com><1149095977.5718.51.camel@fsol><430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com><bbaeab100605311209n28ce4f07qcbe97d928610edcb@mail.gmail.com><79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com><447F0FC4.1030906@cenix-bioscience.com><e5nank$h7r$1@sea.gmane.org><448001E4.5070003@cenix-bioscience.com><e5q1er$to0$1@sea.gmane.org>
	<63094894-36B0-4487-9992-F913D9749F08@mac.com>
Message-ID: <e5vbln$nsp$1@sea.gmane.org>


"Ronald Oussoren" <ronaldoussoren at mac.com> wrote in message 
news:63094894-36B0-4487-9992-F913D9749F08 at mac.com...
>I'm -lots on a package named "misc". That's really poor naming,
>almost as bad as "util". Misc is the "we don't know what to do with
>these"-category and completely unobvious for anyone that doesn't
>already know where to look. It seems to me that misc would end up
>containing all modules and packages that don't fit in one of the
>preconceived toplevel packages and don't have enough peers in the
>misc package to move them to their own toplevel package.
---

Without a misc package, we either need to have an all-inclusive set of top 
level categories (difficult) or else put the oddballs at top level, which 
counteracts the purpose of having categories.  The latter improperly 
highlights the oddballs and increases the chances of name-clashes --  
especially when more are added.  I have found catch-all categories very 
useful in other contexts.

Terry Jan Reedy





From tomerfiliba at gmail.com  Sun Jun  4 21:45:24 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Sun, 4 Jun 2006 12:45:24 -0700
Subject: [Python-3000] iostack and sock2
In-Reply-To: <448258F3.3070808@gmail.com>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<448258F3.3070808@gmail.com>
Message-ID: <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>

you certainly have good points there.

i'll start with the easy ones:
>Some things that don't appear to have been considered in the iostack
design yet:
> - non-blocking IO and timeouts (e.g. on NetworkStreams)

NetworkStreams have a readavail() method, which reads all the available
in-queue data, as well as may_read and may_write properties.

besides, because of the complexity of sockets (so many different
options, protocols, etc), i'd leave the timeout to the socket itself.
i.e.

s = TcpSocket(...)
s.timeout = 2
ns = NetworkStream(s)
ns.read(100)

> - interaction with (replacement of?) the select module

well, it's too hard to design for a nonexistent module. select is all there
is that's platform independent.

random idea:
* select is virtually platform independent
* improved polling is inconsistent
    * kqueue is BSD-only
    * epoll is linux-only
    * windows has none of those

maybe introduce a new select module that has select-objects, like
the Poll() class, that will default to using select(), but could use
kqueue/epoll when possible?

s = Select((sock1, "r"), (sock2, "rw"), (sock3, "x"))
res = s.wait(timeout = 1)
for sock, events in res:
    ....

- - - - -

> The common Stream API should include a flush() write method, so that
> application code doesn't need to care whether or not it is dealing with
> buffered IO when forcing output to be displayed.

i object. it would soon lead to things like today's StringIO, that defines
isatty and flush, although it's completely meaningless. having to
implement functions "just because" is ugly.

i would suggest a different approach -- PseudoLayers. these are
mockup layers that provide a do-nothing function only for interface
consistency. each layer would define its own pseudo layer, for
example:

class BufferingLayer(Layer):
    def flush(self):
       <implementation>

class PseudoBufferingLayer(Layer):
    def flush(self):
       pass

when you pass an unbuffered stream to a function that expects
it to be buffered (requires flush, etc), you would just wrap it
with the pseudo-layer. this would allow arbitrary mockup APIs
to be defined by users (why should flush be that special?)

- - - - -

> e.g an alternative approach would be to
> define InputStream and OutputStream, and then have an IOStream that inherited
> from both of them).

hrrm... i need to think about this more. one problem i already see:

class InputStream:
   def close(self):....
   def read(self, count): ...

class OutputStream:
   def close(self):....
   def write(self, data)...

class NetworkStream(InputStream, OutputStream):
   ...

which version of close() gets called?
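(for reference, the MRO already gives a well-defined answer here: only the
first close() found is called, unless each one cooperates via super(). a
sketch, using py3k-style no-argument super():)

```python
closed = []

class Stream:
    def close(self):
        pass  # end of the cooperative chain

class InputStream(Stream):
    def close(self):
        closed.append("input")
        super().close()  # pass the call along the MRO

class OutputStream(Stream):
    def close(self):
        closed.append("output")
        super().close()

class NetworkStream(InputStream, OutputStream):
    pass

NetworkStream().close()
# MRO is NetworkStream -> InputStream -> OutputStream -> Stream,
# so both close() implementations run, in that order
assert closed == ["input", "output"]
```

without the super() calls, only InputStream.close would run, and the
output side would never be flushed or shut down.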

- - - - -

> e.g. the 'position' property is
> probably a bad idea, because x.position may then raise an IOError

i guess it's a reasonable approach, but i'm a "usability beats purity" guy.
f.position = 0
or
f.position += 10

is so much more convenient than seek()ing and tell()ing. we can also
optimize += by defining a Position type where __iadd__(n) uses
seek(n, "curr") instead of seek(n + tell(), "start")

btw, you can first test the "seekable" attribute, to see if positioning
would work.

and in the worst case, i'd vote for converting IOErrors to ValueErrors...

def _set_pos(self, n):
    try:
       self.seek(n)
    except IOError:
       raise ValueError("invalid position value", n)

so that
f.position = -10
raises a ValueError, which is logical
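a sketch of the whole position property with that conversion (PositionMixin is a hypothetical name; BytesIO stands in for a seekable stream, and it raises ValueError rather than IOError for a bad seek, so both are caught):

```python
import io

class PositionMixin:
    # sketch only: wraps a seekable file-like object as self.f
    def __init__(self, f):
        self.f = f

    @property
    def position(self):
        return self.f.tell()

    @position.setter
    def position(self, n):
        try:
            self.f.seek(n)
        except (IOError, ValueError):
            # real files raise IOError; BytesIO raises ValueError
            raise ValueError("invalid position value", n)

s = PositionMixin(io.BytesIO(b"abcdefghij"))
s.position = 3
assert s.f.read(2) == b"de"
s.position += 2          # tell() + seek(); the __iadd__ optimization not shown
assert s.f.read(1) == b"h"
```

the Position type with a relative-seek __iadd__ would replace the plain int returned by the getter, but the property above already gives the convenient surface syntax.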

- - - - -

> The stream layer hierarchy needs to be limited to layers that both expose and
> use the normal bytes-based Stream API. A separate stream interface concept is
> needed for something that can be used by the application, but cannot have
> other layers stacked on top of it.

yeah, i wanted to do so myself, but couldn't find a good definition to what
is stackable and what's not. but i like the idea. i'll think some more
about that as well.

> The BytesInterface differs from a normal low-level
> stream primarily in the fact that it *is* line-iterable.

but what's a line in a binary file? how does that make sense? binary files
are usually made of records, headers, pointers, arrays of records (tables)...
think of what ELF32 looks like, or a database, or core dumps -- those are
binary files. what would a "line" mean in a .tar.bz2 file?

- - - - -

> Additionally, the 'textfile' helper tries to handle line
> terminators while the data is still bytes, while Unicode defines line endings
> in terms of characters. As I understand it, "\x0A" (LF), "\x0D" (CR),
> [...]

well, currently, the TextLayer reads the stream character by character,
until it finds "\n"... the specific encoding of "\n" depends on the
layer's encoding, but i don't deal with all the weird cases you mentioned.

- - - - -

random idea:
when compiled with universal line support, python unicode should
equate "\n" to any of the aforementioned characters.
i.e.

u"\n" == u"\u2028" # True

the fact that unicode is stupid shouldn't make programming with unicode
as stupid: a newline is a newline!

but then again, it could be solved with a isnewline(ch) function
instead, without messing the internals of the unicode type...
so that's clearly (-1). i just write it "for the record".
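a sketch of that isnewline() alternative (the terminator set follows the unicode line-ending list discussed elsewhere in the thread; the name is hypothetical):

```python
# Unicode line terminators: LF, CR, NEL, FF, LS, PS
UNICODE_NEWLINES = frozenset("\n\r\x85\x0c\u2028\u2029")

def isnewline(ch):
    # test a single character against the terminator set,
    # instead of redefining unicode equality itself
    return ch in UNICODE_NEWLINES

assert isnewline("\u2028")
assert isnewline("\n")
assert not isnewline("a")
assert "\n" != "\u2028"   # ordinary equality stays untouched
```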

- - - - -

> I can see that behaviour being seriously annoying when you get to the end of
> the stream. I'd far prefer for the stream to just give me the last bit when I
> ask for it and then tell me *next* time that there isn't anything left.

well, today it's done like so:

while True:
   x = f.read(100)
   if not x:
      break

in iostack, that would be done like so:

try:
    while True:
        x = f.read(100)
except EOFError:
    last_x = f.readall() # read all the leftovers (0 <= leftovers < 100)

a little longer, but not illogical

> If you want a method with the other behaviour, add a "readexact" API, rather
> than changing the semantics of "read" (although I'd be really curious to hear
> the use case for the other behaviour).

well, when i work with files/sockets, i tend to send data structures over them,
like records, frames, protocols, etc. if a record is said to be x bytes long,
and read(x) returns less than x bytes, my code has to loop until it gets
enough bytes.

for example, a record-codec:

class RecordCodec:
    ....
    def read(self):
        raw = self.substream.read(struct.calcsize(self.format))
        return struct.unpack(self.format, raw)

if substream.read() returns less than the expected number of bytes,
as is the case with sockets, the RecordCodec would have to perform
its own buffering... and it happens in so many places today.
imho, any framework must follow the DRY principle... i wish i could
expand this acronym, but then i'd repeat myself ;)

since the normal use-case for read(n) is expecting n bytes, read(n)
is the standard API, while readany(n) can be used for unknown lengths.
and when your IO library will be packed with useful things like
FramingLayer, or SerializingLayer, you would just use such frames or
whatever to transfer arbitrary lengths of data, without thinking twice.
it would just become the natural way of doing that.

imagine how cool it could be -- SerializingLayer could mean the end of
specialized protocols and state machines. you just send an object that
can take care of itself (a ChatMessage would have a .show() method,
etc.)

- - - - -

and you still have readany

>>> my_netstream.readany(100)
"hello"

perhaps it should be renamed readupto(n)

as for code that interacts with ugly protocols like HTTP, you could use:

s = TextInterface(my_netstream, "ascii")
header = []
for line in s:
    if not line:
       break
    header.append(line)

- - - - -

thanks for the ideas.


-tomer

On 6/3/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> tomer filiba wrote:
> > hi all
> >
> > some time ago i wrote this huge post about stackable IO and the
> > need for a new socket module. i've made some progress with
> > those, and i'd like to receive feedback.
> >
> > * a working alpha version of the new socket module (sock2) is
> > available for testing and tweaking with at
> > http://sebulba.wikispaces.com/project+sock2
> >
> > * i'm working on a version of iostack... but i don't expect to make
> > a public release until mid july. in the meanwhile, i started a wiki
> > page on my site for it (motivation, plans, design):
> > http://sebulba.wikispaces.com/project+iostack
>
> Nice, very nice.
>
> Some things that don't appear to have been considered in the iostack design yet:
>   - non-blocking IO and timeouts (e.g. on NetworkStreams)
>   - interaction with (replacement of?) the select module
>
> Some other random thoughts about the current writeup:
>
> The design appears to implicitly assume that it is best to treat all streams
> as IO streams, and raise an exception if an output operation is accessed on an
> input-only stream (or vice versa). This seems like a reasonable idea to me,
> but it should be mentioned explicitly (e.g an alternative approach would be to
> define InputStream and OutputStream, and then have an IOStream that inherited
> from both of them).
>
> The common Stream API should include a flush() write method, so that
> application code doesn't need to care whether or not it is dealing with
> buffered IO when forcing output to be displayed.
>
> Any operations that may touch the filesystem or network shouldn't be
> properties - attribute access should never raise IOError (this is a guideline
> that came out of the Path discussion). (e.g. the 'position' property is
> probably a bad idea, because x.position may then raise an IOError)
>
> The stream layer hierarchy needs to be limited to layers that both expose and
> use the normal bytes-based Stream API. A separate stream interface concept is
> needed for something that can be used by the application, but cannot have
> other layers stacked on top of it. Additionally, any "bytes-in-bytes-out"
> transformation operation can be handled as a single codec layer that accepts
> an encoding function and a decoding function. This can then be used for
> compression layers, encryption layers, Golay encoding, A-law companding, AV
> codecs, etc. . .
>
>    StreamLayer
>      * ForwardingLayer - forwards all data written or read to another stream
>      * BufferingLayer - buffers data using given buffer size
>      * CodecLayer - encodes data written, decodes data read
>
>    StreamInterface
>      * TextInterface - text oriented interface to a stream
>      * BytesInterface - byte oriented interface to a stream
>      * RecordInterface - record (struct) oriented interface to a stream
>      * ObjectInterface - object (pickle) oriented interface to a stream
>
> The key point about the stream interfaces is that while they will provide a
> common mechanism for getting at the underlying stream, their interfaces are
> otherwise unconstrained. The BytesInterface differs from a normal low-level
> stream primarily in the fact that it *is* line-iterable.
>
> On the topic of line buffering, the Python 2.x IO stack treats binary files as
> line iterable, using '\n' as a line separator (well, more strictly it's a
> record separator, since we're talking about binary files).
>
> There's actually an RFE on SF somewhere about making the record separator
> configurable in the 2.x IO stack (I raised the tracker item ages ago when
> someone else made the suggestion).
>
> However, the streams produced by iostack's 'file' helper are not currently
> line-iterable. Additionally, the 'textfile' helper tries to handle line
> terminators while the data is still bytes, while Unicode defines line endings
> in terms of characters. As I understand it, "\x0A" (LF), "\x0D" (CR),
> "\x0D\x0A" (CRLF), "\x85" (NEL), "\x0C" (FF), "\u2028" (LS), "\u2029" (PS)
> should all be treated as line terminators as far as Unicode is concerned.
>
> So I think line buffering and making things line iterable should be left to
> the TextInterface and BytesInterface layers. TextInterface would be most
> similar to the currently file interface, only working on Unicode strings
> instead of 8-bit strings (as well as using the Unicode definition of what
> constitutes a line ending). BytesInterface would work with binary files,
> returning a bytes object for each record.
>
> So I'd tweak the helper functions to look like:
>
> def file(filename, mode = "r", bufsize = -1, line_sep="\n"):
>      f = FileStream(filename, mode)
>      # a bufsize of 0 or None means unbuffered
>      if bufsize:
>          f = BufferingLayer(f, bufsize)
>      # Use bytes interface to make file line-iterable
>      return BytesInterface(f, line_sep)
>
> def textfile(filename, mode = "r", bufsize = -1, encoding = None):
>      f = FileStream(filename, mode)
>      # a bufsize of 0 or None means unbuffered
>      if bufsize:
>          f = BufferingLayer(f, bufsize)
>      # Text interface deals with line terminators correctly
>      return TextInterface(f, encoding)
>
> > with lots of pretty-formatted info. i remember people saying
> > that stating `read(n)` returns exactly `n` bytes is problematic,
> > can you elaborate?
>
> I can see that behaviour being seriously annoying when you get to the end of
> the stream. I'd far prefer for the stream to just give me the last bit when I
> ask for it and then tell me *next* time that there isn't anything left. This
> has worked well for a long time with the existing read method of file objects.
> If you want a method with the other behaviour, add a "readexact" API, rather
> than changing the semantics of "read" (although I'd be really curious to hear
> the use case for the other behaviour).
>
> (Take a look at the s3.recv(100) line in your Sock2 example - how irritating
> would it be for that to raise EOFError because you only got a few bytes?)
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> ---------------------------------------------------------------
>              http://www.boredomandlaziness.org
>

From jcarlson at uci.edu  Sun Jun  4 22:25:58 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 04 Jun 2006 13:25:58 -0700
Subject: [Python-3000] iostack and sock2
In-Reply-To: <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
References: <448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
Message-ID: <20060604131031.69CF.JCARLSON@uci.edu>


"tomer filiba" <tomerfiliba at gmail.com> wrote:
[snip]
> > - interaction with (replacement of?) the select module
> 
> well, it's too hard to design for a nonexisting module. select is all there
> is that's platform independent.

It is /relatively/ platform independent.

> random idea:
> * select is virtually platform independent
> * improved polling is inconsistent
>     * kqueue is BSD-only
>     * epoll is linux-only
>     * windows has none of those

Windows doesn't currently have a module designed to do this kind of
thing, but it is possible to have a higher-performance method for
Windows using various bits from the win32file module from pywin32 (I
have been contemplating writing one, but I haven't had the time).

[snip]

> - - - - -
> 
> > e.g an alternative approach would be to
> > define InputStream and OutputStream, and then have an IOStream that inherited
> > from both of them).
> 
> hrrm... i need to think about this more. one problem i already see:
> 
> class InputStream:
>    def close(self):....
>    def read(self, count): ...
> 
> class OutputStream:
>    def close(self):....
>    def write(self, data)...
> 
> class NetworkStream(InputStream, OutputStream):
>    ...
> 
> which version of close() gets called?

Both, you use super().
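A sketch of the cooperative pattern (hypothetical class names; a common base class ends the super() chain so every close() along the MRO runs exactly once):

```python
class StreamBase:
    def close(self):
        pass  # end of the cooperative chain

class InputStream(StreamBase):
    def close(self):
        self.log.append("input closed")
        super().close()  # pass the call along the MRO

class OutputStream(StreamBase):
    def close(self):
        self.log.append("output closed")
        super().close()

class NetworkStream(InputStream, OutputStream):
    def __init__(self):
        self.log = []

ns = NetworkStream()
ns.close()
# MRO is NetworkStream -> InputStream -> OutputStream -> StreamBase,
# so both close() bodies run, in that order
assert ns.log == ["input closed", "output closed"]
```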

> - - - - -
> 
> > e.g. the 'position' property is
> > probably a bad idea, because x.position may then raise an IOError
> 
> i guess it's reasonable approach, but i'm a "usability beats purity" guy.
> f.position = 0
> or
> f.position += 10
> 
> is so much more convenient than seek()ing and tell()ing. we can also
> optimize += by defining a Position type where __iadd__(n) uses
> seek(n, "curr") instead of seek(n + tell(), "start")
> 
> btw, you can first test the "seakable" attribute, to see if positioning
> would work.
> 
> and in the worst case, i'd vote for converting IOErrors to ValueErrors...
> 
> def _set_pos(self, n)
>     try:
>        self.seek(n)
>     except IOError:
>        raise ValueError("invalid position value", n)
> 
> so that
> f.position = -10
> raises a ValueError, which is logical

Raising a ValueError on an unseekable stream would be confusing.

[snip]
> - - - - -
> 
> random idea:
> when compiled with universal line support, python unicode should
> equate "\n" to any of the forementioned characters.
> i.e.
> 
> u"\n" == u"\u2028" # True

I'm glad that you later decided for yourself that such a thing would be
utterly and completely foolish.

> - - - - -
> 
> > I can see that behaviour being seriously annoying when you get to the end of
> > the stream. I'd far prefer for the stream to just give me the last bit when I
> > ask for it and then tell me *next* time that there isn't anything left.
> 
> well, today it's done like so:
> 
> while True:
>    x = f.read(100)
>    if not x:
>       break
> 
> in iostack, that would be done like so:
> 
> try:
>     while True:
>         x = f.read(100)
> except EOFError:
>     last_x = f.readall() # read all the leftovers (0 <= leftovers < 100)
> 
> a little longer, but not illogical
> 
> > If you want a method with the other behaviour, add a "readexact" API, rather
> > than changing the semantics of "read" (although I'd be really curious to hear
> > the use case for the other behaviour).
> 
> well, when i work with files/sockets, i tend to send data structures over them,
> like records, frames, protocols, etc. if a record is said to be x bytes long,
> and read(x) returns less than x bytes, my code has to loop until it gets
> enough bytes.

Rather than changing what people expect with the current .read() method,
why not offer a different method called .readexact(n), which will read
exactly n bytes, performing buffering as necessary.  You can then
optimize by using cStringIOs, lists of strings, resizable bytes, or
whatever other method you want (but be careful never to .read(bignum)
unless you change the underlying .read() implementation; right now it
allocates a buffer of size bignum, which can cause huge amounts of
malloc/realloc thrashing, and generally causes MemoryErrors).
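A sketch of that approach (readexact as a free function; the name and the chunk cap are illustrative): the underlying read() is never asked for more than a capped chunk, so no giant up-front buffer is allocated, and the pieces are joined once at the end.

```python
import io

def readexact(stream, n, max_chunk=1 << 16):
    # accumulate capped chunks until exactly n bytes have arrived;
    # short reads (as from sockets) just mean another loop iteration
    chunks = []
    remaining = n
    while remaining:
        chunk = stream.read(min(remaining, max_chunk))
        if not chunk:
            raise EOFError("stream ended %d bytes short" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

f = io.BytesIO(b"0123456789")
assert readexact(f, 4) == b"0123"
assert readexact(f, 6, max_chunk=2) == b"456789"
```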

[snip]

 - Josiah


From jcarlson at uci.edu  Sun Jun  4 22:42:41 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 04 Jun 2006 13:42:41 -0700
Subject: [Python-3000] iostack and sock2
In-Reply-To: <448258F3.3070808@gmail.com>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<448258F3.3070808@gmail.com>
Message-ID: <20060604132632.69D2.JCARLSON@uci.edu>


Nick Coghlan <ncoghlan at gmail.com> wrote:
[snip]
> Any operations that may touch the filesystem or network shouldn't be 
> properties - attribute access should never raise IOError (this is a guideline 
> that came out of the Path discussion). (e.g. the 'position' property is 
> probably a bad idea, because x.position may then raise an IOError)

I agree completely.

> The stream layer hierarchy needs to be limited to layers that both expose and 
> use the normal bytes-based Stream API. A separate stream interface concept is 
> needed for something that can be used by the application, but cannot have 
> other layers stacked on top of it. Additionally, any "bytes-in-bytes-out" 
> transformation operation can be handled as a single codec layer that accepts 
> an encoding function and a decoding function. This can then be used for 
> compression layers, encryption layers, Golay encoding, A-law companding, AV 
> codecs, etc. . .
> 
>    StreamLayer
>      * ForwardingLayer - forwards all data written or read to another stream
>      * BufferingLayer - buffers data using given buffer size
>      * CodecLayer - encodes data written, decodes data read
> 
>    StreamInterface
>      * TextInterface - text oriented interface to a stream
>      * BytesInterface - byte oriented interface to a stream
>      * RecordInterface - record (struct) oriented interface to a stream
>      * ObjectInterface - object (pickle) oriented interface to a stream

I think these are generally OK.

[snip]

As I've been reading the updated IO stack discussions since Tomer
brought it up months ago, I've been generally -1 on the idea of
rewriting the IO stack.  I didn't know why at first, but I've figured
out that it is a combination of "I enjoy writing wire protocols" and "it
would be very nice if my old socket/file software continued to work in
py3k".

Obviously the first part will generally not be an issue (and wouldn't
be sufficiently compelling to refuse the change) with the updated IO
stack, but the second will be. That is, if we switched from the current
IO methods to the stack, all old socket and file handling software seems
as though it will break.  This sounds to me like gratuitous breakage.

On the other hand, I wouldn't mind a new IO stack module or package that
defined wrappers and such for files, sockets, etc., along with the
StreamLayer and StreamInterface bits somewhere.

One could then add an interface to the previously mentioned module for
asynchronous IO on *nix (I can't remember its name), with a (hopefully)
updated implementation for Windows, falling back to an implementation
that uses select on platforms where an updated method is not available. 
Whether or not we would want to make this updated select-like framework
available to old sockets, files, etc., is a separate discussion.

 - Josiah


From greg.ewing at canterbury.ac.nz  Mon Jun  5 00:52:07 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 05 Jun 2006 10:52:07 +1200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
Message-ID: <44836417.5080209@canterbury.ac.nz>

tomer filiba wrote:

> NetworkStreams have a readavail() method, which reads all the available
> in-queue data, as well as a may_read and a may_write properties

I'm -1 on having multiple kinds of read methods which
are available only on some kinds of streams. The
basic interface of a stream should be dirt simple.

Given a read-up-to-n-bytes method, it's easy to implement
read-exactly-n-bytes on top of it in a completely
generic way. So provide it as a function that operates
on a stream, or a method inherited from a generic base
class.

> maybe introduce a new select module that has select-objects, like
> the Poll() class, that will default to using select(), but could use
> kqueue/epoll when possible?

My current opinion on select-like functionality is
that you shouldn't need to import a module for it at
all. Rather, you should be able to attach a callback
directly to a stream. Then there just needs to be
a wait_for_something_to_happen() function somewhere
(perhaps with a timeout).

Underneath, the implementation would use select,
poll, or whatever is most fun on the platform
concerned.

--
Greg

From mcherm at mcherm.com  Mon Jun  5 15:52:07 2006
From: mcherm at mcherm.com (Michael Chermside)
Date: Mon, 05 Jun 2006 06:52:07 -0700
Subject: [Python-3000] a slight change to __[get|set|del]item__
Message-ID: <20060605065207.93v8tbf3rx5w84sg@login.werra.lunarpages.com>

Tomer writes:
> well is func((1,2,3)) the same as func(1,2,3)? no.
> so why should container[1, 2, 3] be the same as container[(1,2,3)]?
> you say it's a feature. is it intentionally *ambiguous*?
>
> what you'd want in that case is
>     t = (1, 2, 3)
>     container[*t]
> or something like that.
>
> i guess it's a dead subject, but i wanted to have that clarified.

There's no ambiguity, the rule is like this:

Parentheses are a piece of syntax that is used for grouping
everywhere *except* in function/method argument lists (both
function declarations and invocations). Empty parentheses are also
used to indicate an empty tuple. The comma is a piece of syntax
that has special meaning in function declarations, function/method
invocations, list literals, and dictionary literals (I think
that's the full list of exceptions). Everywhere else it indicates
tuple creation.

Admittedly, it's slightly odd that a special exception to the
meaning of parentheses is made for the syntax of functions, but
there is a LONG and powerful historical convention that makes this
the most widely accepted syntax for function invocation. Both
Smalltalk and Lisp were brilliant languages whose popularity was
(IMO) severely wounded by failing to maintain this syntax for
function invocation.

Using the comma to separate items in collections makes good sense
too. List and dictionary literals obviously fall into this
category. Making tuple be "the collection with no syntax" was a
clever syntactical trick that allows things like these:

     return a, b
     x, y = y, x
     for i, x in enumerate(aList):

to feel completely natural yet still be regular in syntax.
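The rule can be checked directly (a small illustrative sketch): a comma-separated subscript arrives at __getitem__ as one tuple key, while a function call keeps the arguments separate.

```python
class Container:
    def __getitem__(self, key):
        # commas in a subscript build a single tuple key
        return key

c = Container()
assert c[1, 2, 3] == (1, 2, 3)
assert c[1, 2, 3] == c[(1, 2, 3)]   # the parentheses change nothing here

# by contrast, function calls distinguish the two spellings:
def func(*args):
    return args

assert func(1, 2, 3) == (1, 2, 3)
assert func((1, 2, 3)) == ((1, 2, 3),)   # one argument: a tuple
```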

-- Michael Chermside


From tomerfiliba at gmail.com  Mon Jun  5 18:36:30 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Mon, 5 Jun 2006 18:36:30 +0200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <44836417.5080209@canterbury.ac.nz>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<44836417.5080209@canterbury.ac.nz>
Message-ID: <1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com>

> I'm -1 on having multiple kinds of read methods which
> are available only on some kinds of streams. The
> basic interface of a stream should be dirt simple.

it's a convenience method. instead of doing it yourself every time,
readavail() returns all the available data in the socket's buffers.

the basic interface should be simple and spartan, but does that
mean a deriving class may not extend it? from personal experience,
of myself and others i've worked with, i can tell you readavail() would
be very useful. for reference, .NET sockets have it. of course .NET
is NOT a model of great design, but it does show you the trends and
needs of programmers.

> My current opinion on select-like functionality is
> that you shouldn't need to import a module for it at
> all. Rather, you should be able to attach a callback
> directly to a stream. Then there just needs to be
> a wait_for_something_to_happen() function somewhere
> (perhaps with a timeout).

yes, that's how i'd do it, but then how would you wait for
multiple streams?

compare
select([sock1, sock2, sock3], [], [])
to
sock1.async_read(100, callback)

how can you block/wait for multiple streams?
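a sketch of one way the callback model could still multiplex (names hypothetical; select is used underneath, where a real implementation could swap in kqueue/epoll):

```python
import select
import socket

def wait_for_events(read_map, timeout=None):
    # read_map: {stream: callback}; block until any stream is
    # readable, then dispatch its callback
    rlist, _, _ = select.select(list(read_map), [], [], timeout)
    for stream in rlist:
        read_map[stream](stream)

# demo with a connected socket pair standing in for two streams
a, b = socket.socketpair()
results = []
a.send(b"x")
wait_for_events({b: lambda s: results.append(s.recv(1))}, timeout=5)
assert results == [b"x"]
a.close(); b.close()
```

so attaching callbacks per stream and blocking in one shared wait function are not mutually exclusive; the wait function is where the multi-stream select lives.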


-tomer

On 6/5/06, Greg Ewing < greg.ewing at canterbury.ac.nz> wrote:
>
> tomer filiba wrote:
>
> > NetworkStreams have a readavail() method, which reads all the available
> > in-queue data, as well as a may_read and a may_write properties
>
> I'm -1 on having multiple kinds of read methods which
> are available only on some kinds of streams. The
> basic interface of a stream should be dirt simple.
>
> Given a read-up-to-n-bytes method, it's easy to implement
> read-exactly-n-bytes on top of it in a completely
> generic way. So provide it as a function that operates
> on a stream, or a method inherited from a generic base
> class.
>
> > maybe introduce a new select module that has select-objects, like
> > the Poll() class, that will default to using select(), but could use
> > kqueue/epoll when possible?
>
> My current opinion on select-like functionality is
> that you shouldn't need to import a module for it at
> all. Rather, you should be able to attach a callback
> directly to a stream. Then there just needs to be
> a wait_for_something_to_happen() function somewhere
> (perhaps with a timeout).
>
> Underneath, the implementation would use select,
> poll, or whatever is most fun on the platform
> concerned.
>
> --
> Greg
>

From tomerfiliba at gmail.com  Mon Jun  5 19:16:40 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Mon, 5 Jun 2006 19:16:40 +0200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <20060604131031.69CF.JCARLSON@uci.edu>
References: <448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<20060604131031.69CF.JCARLSON@uci.edu>
Message-ID: <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>

> > well, it's too hard to design for a nonexisting module. select is all there
> > is that's platform independent.
>
> It is /relatively/ platform independent.

if it runs on windows, linux, *bsd, and solaris, it's virtually platform
independent.
i don't consider the nokia N60 or whatever the name was, as well as other
esoteric environments, as "platforms", at least not such that should be taken
into consideration when designing APIs and standard modules.

> I didn't know why at first, but I've figured
> out that it is a combination of "I enjoy writing wire protocols" and "it
> would be very nice if my old socket/file software continued to work in
> py3k".
[...]
> Rather than changing what people expect with the current .read() method,
> why not offer a different method called .readexact(n), which will read
> exactly n bytes, performing buffering as necessary.

okay, i give up on read(n) returning n bytes. that being said, and taking into
account the "helpers" i suggested (a function named file/open that is
API-compliant to today's file) -- i'd assume 80% of the code would be
compatible.

after all, the major use-cases of IO are files and sockets. if we keep
those looking the same, at least the core APIs, most code should
work fine.

again, don't forget sock2 is separate from iostack, and can be used by
itself. it has send/recv like normal sockets, is select()able, etc... the
only adaptation needed for legacy code is converting "import socket"
to "import sock2" (which would be unnecessary if it became the
standard socket module), as well as converting
s = socket.socket()
s.connect(...)
to
s = socket.TcpSocket(...)

grepping through the source can pinpoint these locations.

> > random idea:
> > when compiled with universal line support, python unicode should
> > equate "\n" to any of the forementioned characters.
> > i.e.
> >
> > u"\n" == u"\u2028" # True
>
> I'm glad that you later decided for yourself that such a thing would be
> utterly and completely foolish.

it's not foolish, it's bad. these are different things (foolish being "lacking
a proper rationale", and bad being "destroying the very foundations of
python"). but again, it was kept "for the record".

> > f.position = -10
> > raises a ValueError, which is logical
>
> Raising a ValueError on an unseekable stream would be confusing.

true, but so are TypeErrors for ArgumentErrors, or TypeErrors for HashErrors,
etc. besides, why shouldn't attributes raise IOError? after all you are working
with *IO*, so "s.position = -10" raising an IOError isn't all too strange.
anyway, that's a technicality and the rest of the framework can suffer delaying
that decision for later.

> > class NetworkStream(InputStream, OutputStream):
> >    ...
> >
> > which version of close() gets called?
>
> Both, you use super().

if InputStream and OutputStream are just interfaces, that's fine,
but still, i don't find it acceptable for one method to be defined by
two interfaces and then intersected in a deriving class.

perhaps the hierarchy should be

class Stream:
    def close
    property closed
    def seek
    def tell

class InputStream(Stream):
    def read
    def readexact
    def readall

class OutputStream(Stream):
    def write

but then, most of the streams, like files, pipes and sockets,
would need to derive from both InputStream and OutputStream.

another issue:

class InputFile(InputStream):
    ...
class OutputFile(OutputStream):
    ...
class File(InputStream, OutputStream):
    ...

i think there's gonna be much duplication of code, because File can't
inherit from InputFile and OutputFile, as they are each a separate stream,
while File is a single InOutStream.
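one conventional way around that duplication would be mixins -- a hypothetical sketch, not a proposal from the thread: the concrete read/write logic lives once, and InputFile, OutputFile and File just compose it.

```python
class FileBase:
    def __init__(self, data=b""):
        self.buf = bytearray(data)
        self.pos = 0

class ReadMixin:
    def read(self, n):
        # read up to n bytes from the in-memory buffer
        chunk = bytes(self.buf[self.pos:self.pos + n])
        self.pos += len(chunk)
        return chunk

class WriteMixin:
    def write(self, data):
        # overwrite/extend at the current position
        self.buf[self.pos:self.pos + len(data)] = data
        self.pos += len(data)

class InputFile(ReadMixin, FileBase): pass
class OutputFile(WriteMixin, FileBase): pass
class File(ReadMixin, WriteMixin, FileBase): pass

f = File()
f.write(b"abc")
f.pos = 0
assert f.read(3) == b"abc"
assert not hasattr(InputFile(), "write")   # read-only stays read-only
```

whether that is nicer than a flat Stream/InputStream/OutputStream hierarchy is exactly the open question here; the sketch only shows the duplication is avoidable.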

and a huge class hierarchy makes attribute lookups slower.


-tomer

On 6/4/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "tomer filiba" <tomerfiliba at gmail.com> wrote:
> [snip]
> > > - interaction with (replacement of?) the select module
> >
> > well, it's too hard to design for a nonexisting module. select is all there
> > is that's platform independent.
>
> It is /relatively/ platform independent.
>
> > random idea:
> > * select is virtually platform independent
> > * improved polling is inconsistent
> >     * kqueue is BSD-only
> >     * epoll is linux-only
> >     * windows has none of those
>
> Windows doesn't currently have a module designed to do this kind of
> thing, but it is possible to have a higher-performance method for
> Windows using various bits from the win32file module from pywin32 (I
> have been contemplating writing one, but I haven't had the time).
>
> [snip]
>
> > - - - - -
> >
> > > e.g an alternative approach would be to
> > > define InputStream and OutputStream, and then have an IOStream that inherited
> > > from both of them).
> >
> > hrrm... i need to think about this more. one problem i already see:
> >
> > class InputStream:
> >    def close(self):....
> >    def read(self, count): ...
> >
> > class OutputStream:
> >    def close(self):....
> >    def write(self, data)...
> >
> > class NetworkStream(InputStream, OutputStream):
> >    ...
> >
> > which version of close() gets called?
>
> Both, you use super().
>
> > - - - - -
> >
> > > e.g. the 'position' property is
> > > probably a bad idea, because x.position may then raise an IOError
> >
> > i guess it's reasonable approach, but i'm a "usability beats purity" guy.
> > f.position = 0
> > or
> > f.position += 10
> >
> > is so much more convenient than seek()ing and tell()ing. we can also
> > optimize += by defining a Position type where __iadd__(n) uses
> > seek(n, "curr") instead of seek(n + tell(), "start")
> >
> > btw, you can first test the "seakable" attribute, to see if positioning
> > would work.
> >
> > and in the worst case, i'd vote for converting IOErrors to ValueErrors...
> >
> > def _set_pos(self, n)
> >     try:
> >        self.seek(n)
> >     except IOError:
> >        raise ValueError("invalid position value", n)
> >
> > so that
> > f.position = -10
> > raises a ValueError, which is logical
>
> Raising a ValueError on an unseekable stream would be confusing.
>
> [snip]
> > - - - - -
> >
> > random idea:
> > when compiled with universal line support, python unicode should
> > equate "\n" to any of the forementioned characters.
> > i.e.
> >
> > u"\n" == u"\u2028" # True
>
> I'm glad that you later decided for yourself that such a thing would be
> utterly and completely foolish.
>
> > - - - - -
> >
> > > I can see that behaviour being seriously annoying when you get to the end of
> > > the stream. I'd far prefer for the stream to just give me the last bit when I
> > > ask for it and then tell me *next* time that there isn't anything left.
> >
> > well, today it's done like so:
> >
> > while True:
> >    x = f.read(100)
> >    if not x:
> >       break
> >
> > in iostack, that would be done like so:
> >
> > try:
> >     while True:
> >         x = f.read(100)
> > except EOFError:
> >     last_x = f.readall() # read all the leftovers (0 <= leftovers < 100)
> >
> > a little longer, but not illogical
> >
> > > If you want a method with the other behaviour, add a "readexact" API, rather
> > > than changing the semantics of "read" (although I'd be really curious to hear
> > > the use case for the other behaviour).
> >
> > well, when i work with files/sockets, i tend to send data structures over them,
> > like records, frames, protocols, etc. if a record is said to be x bytes long,
> > and read(x) returns less than x bytes, my code has to loop until it gets
> > enough bytes.
>
> Rather than changing what people expect with the current .read() method,
> why not offer a different method called .readexact(n), which will read
> exactly n bytes, performing buffering as necessary.  You can then
> optimize by using cStringIOs, lists of strings, resizable bytes, or
> whatever other method you want (but be careful never to .read(bignum)
> unless you change the underlying .read() implementation; right now it
> allocates a buffer of size bignum, which can cause huge amounts of
> malloc/realloc thrashing, and generally causes MemoryErrors).
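
A minimal sketch of such a readexact() helper, layered on top of today's
read() (the function name and the 64 KiB per-call cap are illustrative, not
part of any proposed API):

```python
import io

def readexact(stream, n):
    # Read exactly n bytes from stream, looping over short reads.
    # Chunks are collected in a list and joined once, avoiding
    # quadratic string concatenation; each underlying read() is
    # capped so we never request one huge buffer up front.
    chunks = []
    remaining = n
    while remaining:
        data = stream.read(min(remaining, 65536))
        if not data:
            raise EOFError("stream ended %d bytes short" % remaining)
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)
```

With something like this available, record-oriented code can ask for exactly
one record's worth of bytes and never loop over short reads itself.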
>
> [snip]
>
>  - Josiah
>
>

From rasky at develer.com  Mon Jun  5 20:00:41 2006
From: rasky at develer.com (Giovanni Bajo)
Date: Mon, 5 Jun 2006 20:00:41 +0200
Subject: [Python-3000] iostack and sock2
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
Message-ID: <016c01c688c9$f5b212d0$bf03030a@trilan>

tomer filiba wrote:

> some time ago i wrote this huge post about stackable IO and the
> need for a new socket module. i've made some progress with
> those, and i'd like to receive feedback.
>
> * a working alpha version of the new socket module (sock2) is
> available for testing and tweaking with at
> http://sebulba.wikispaces.com/project+sock2
>
> * i'm working on a version of iostack... but i don't expect to make
> a public release until mid july. in the meanwhile, i started a wiki
> page on my site for it (motivation, plans, design):
> http://sebulba.wikispaces.com/project+iostack
> with lots of pretty-formatted info. i remember people saying
> that stating `read(n)` returns exactly `n` bytes is problematic,
> can you elaborate?

Hi Tomer, this is great stuff you're doing! It's something that's really
needed, in my opinion. Basically, right now there's only a convention of
passing around duck-typed things which have a "read" method, and that's all!
It's nice to better define this duck-typed interface, and it seems you're
making very good progress on that. I hope to have more time to properly
comment on this later (I'll wait for the first iteration of comments).

One thing I would like to raise is the issue of KeyboardInterrupt. I find
it very inconvenient that a normal application doing a very simple blocking
read from a socket can't be interrupted by a CTRL+C sequence. Usually, what
I do is set up a timeout on the sockets (e.g. 0.4 seconds) and then simply
retry if the data has not arrived yet. But this changes the code from:

data = sock.recv(10)

to:

while 1:
   try:
      data = sock.recv(10)
   except socket.timeout:
      # just so that CTRL+C is processed
      continue
   else:
      break

which is IMO counter-intuitive and un-pythonic. It's such convoluted code
that it once happened to me that another programmer collapsed this back into
the bare sock.recv(), because he couldn't immediately see why that complexity
was required (of course, comments might have helped, but I guess you see my
point).

I believe that this kind of thing ought to work by default with the minimum
possible amount of code. Specifically, I think that the new iostack should
allow blocking mode without trapping CTRL+C by default (which is the
behaviour normally expected). I'm not sure whether it's worth doing the
auto-retry trick internally (bleah), or implementing blocking calls with a
call to select() so that you can also wait on signals, or something like
that; I don't have a suggestion at this point, but I thought the issue was
worth raising.
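
The timeout-and-retry dance can at least be hidden behind a helper; here is
a select()-based sketch (interruptible_recv is a made-up name; polling in
short slices is what lets CTRL+C through on platforms where a fully blocking
recv() masks it):

```python
import select
import socket

def interruptible_recv(sock, n, poll_interval=0.4):
    # Wait for readability in short select() slices, so that a
    # KeyboardInterrupt raised between slices is delivered promptly;
    # only call recv() once data is known to be ready.
    while True:
        ready, _, _ = select.select([sock], [], [], poll_interval)
        if ready:
            return sock.recv(n)
```
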
-- 
Giovanni Bajo


From jcarlson at uci.edu  Mon Jun  5 20:44:15 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Mon, 05 Jun 2006 11:44:15 -0700
Subject: [Python-3000] iostack and sock2
In-Reply-To: <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>
References: <20060604131031.69CF.JCARLSON@uci.edu>
	<1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>
Message-ID: <20060605110457.69DD.JCARLSON@uci.edu>


"tomer filiba" <tomerfiliba at gmail.com> wrote:
> 
> > > well, it's too hard to design for a nonexisting module. select is all there
> > > is that's platform independent.
> >
> > It is /relatively/ platform independent.
> 
> if it runs on windows, linux, *bsd, solaris, it's virtually platform
> independent.
> i don't consider the nokia N60 or whatever the name was, as well as other
> esoteric environments, as "platforms", at least not such that should be taken
> into consideration when designing APIs and standard modules.

[the following snipped from a different reply of yours]
> compare
> select([sock1, sock2, sock3], [], [])
> to
> sock1.async_read(100, callback)
> 
> how can you block/wait for multiple streams?

Depending on the constants defined during compile time, the file handle
limit can be lower or higher than expected (I once used a version with a
32 handle limit; was a bit frustrating).  Also, as discussed in the
'epoll implementation' thread on python-dev, an IOCP implementation for
Windows could perhaps be written in such a way to be compatible with the
libevent-python project.  Visiting the libevent-python example script (
http://python-hpio.net/trac/browser/Projects/libevent-python/trunk/examples/echo_server.py)
shows us how you can do such things.

[snip]
> > > random idea:
> > > when compiled with universal line support, python unicode should
> > > equate "\n" to any of the aforementioned characters.
> > > i.e.
> > >
> > > u"\n" == u"\u2028" # True
> >
> > I'm glad that you later decided for yourself that such a thing would be
> > utterly and completely foolish.
> 
> it's not foolish, it's bad. these are different things (foolish being "lacking
> a proper rationale", and bad being "destroying the very foundations of
> python"). but again, it was kept "for the record".

I don't believe it would "[destroy] the very foundations of python"
(unicode is not the very foundation of Python, and it wouldn't destroy
unicode, only change its comparison semantics), but I do believe it
"[lacks] a proper rationale".  That is; unicode.split() should work as
expected (if not, it should be fixed), and it seems as though line
iteration over files with an encoding specified should deal with those
other line endings - though its behavior in regards to universal
newlines should probably be discussed.


> > > f.position = -10
> > > raises a ValueError, which is logical
> >
> > Raising a ValueError on an unseekable stream would be confusing.
> 
> true, but so are TypeErrors for ArgumentErrors, or TypeErrors for HashErrors,
> etc. besides, why shouldn't attributes raise IOError? after all you are working
> with *IO*, so "s.position = -10" raising an IOError isn't all too strange.
> anyway, that's a technicality and the rest of the framework can suffer delaying
> that decision for later.

What other properties on other classes do is their own business.  We are
talking about this particular implementation of this particular feature
on this particular class (or set of related classes).  If given the
choice of a ValueError or an IOError on f.position failure, I would opt
for IOError; but I would prefer f.seek() and f.tell(), because with
f.seek() you can use the "whence" parameter to get absolute or relative
seeking.


> > > class NetworkStream(InputStream, OutputStream):
> > >    ...
> > >
> > > which version of close() gets called?
> >
> > Both, you use super().
> 
> if an InputStream and OutputStream are just interfaces, that's fine,
> but still, i don't find it acceptable for one method to be defined by
> two interfaces, and then have it intersected in a deriving class.

So have both InputStream and OutputStream use super to handle other
possible .close() calls, rather than making their subclasses do so.
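
For the record, the cooperative pattern looks like this (a sketch using
py3k's argument-free super(); the attribute names are illustrative). The MRO
linearizes NetworkStream -> InputStream -> OutputStream -> Stream, so a
single f.close() runs each close() exactly once:

```python
class Stream:
    def close(self):
        self.closed = True

class InputStream(Stream):
    def close(self):
        self.input_closed = True   # release input-side resources
        super().close()            # continue along the MRO

class OutputStream(Stream):
    def close(self):
        self.output_closed = True  # flush/release output-side resources
        super().close()

class NetworkStream(InputStream, OutputStream):
    pass

s = NetworkStream()
s.close()  # runs InputStream.close, OutputStream.close, Stream.close
```
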


> perhaps the hierarchy should be
> 
> class Stream:
>     def close
>     property closed
>     def seek
>     def tell
> 
> class InputStream(Stream):
>     def read
>     def readexact
>     def readall
> 
> class OutputStream(Stream):
>     def write
> 
> but then, most of the streams, like files, pipes and sockets,
> would need to derive from both InputStream and OutputStream.

But then there are other streams where you want to call two *different* 
.close() methods, and the above would only allow for 1.  Closing
multiple times shouldn't be a problem for most streams, but not closing
enough could be a problem.


> another issue:
> 
> class  InputFile(InputStream)
>     ...
> class OutputFile(OutputStream):
>     ...
> class File(InputStream, OutputStream):
>     ....
> 
> i think there's gonna be much duplication of code, because File can't
> inherit from InputFile and OutputFile, as they are each a separate stream,
> while File is a single InOutStream.
> 
> and a huge class hierarchy makes attribute lookups slower.

Have you tried to measure this?  In my tests (with Python 2.3), it's
somewhere on the order of .2 microseconds per operation of difference
between the original class and a 7th level subclass (i subclasses h,
which subclasses g, which subclasses f, which subclasses, e, ...).


 - Josiah


From tomerfiliba at gmail.com  Mon Jun  5 21:06:48 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Mon, 5 Jun 2006 21:06:48 +0200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <016c01c688c9$f5b212d0$bf03030a@trilan>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<016c01c688c9$f5b212d0$bf03030a@trilan>
Message-ID: <1d85506f0606051206q20d9663am41bc47832c2486d4@mail.gmail.com>

hey

> One thing I would like to raise is the issue of KeyboardInterrupt. I find
> very inconvenient that a normal application doing a very simple blocking
> read from a socket can't be interrupted by a CTRL+C sequence. Usually, what
> I do is to setup a timeout on the sockets (eg. 0.4 seconds) and then simply
> retry if the data has not arrived yet. But this changes the code from:

from my experience with linux and solaris, this CTRL+C problem only
happens on windows machines. but then again, windows can't select()
on anything but sockets, so there's not gonna be a generic solution.
setting timeouts has some issues (inefficiency, platform dependency,
etc.). but it's a good point to take into account. i'll see where that fits.


-tomer


From tomerfiliba at gmail.com  Mon Jun  5 21:26:21 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Mon, 5 Jun 2006 21:26:21 +0200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <20060605110457.69DD.JCARLSON@uci.edu>
References: <20060604131031.69CF.JCARLSON@uci.edu>
	<1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>
	<20060605110457.69DD.JCARLSON@uci.edu>
Message-ID: <1d85506f0606051226q7fa13a54jc69442fe029f5993@mail.gmail.com>

> I don't believe it would "[destroy] the very foundations of python"
> (unicode is not the very foundation of Python, and it wouldn't destroy
> unicode, only change its comparison semantics), but I do believe it
> "[lacks] a proper rationale"

no, it would break the basic rules of comparison. all of a sudden,
0x0a == 0x2028 == 0x85, etc., so you can't tell whether you got a
"\x0a" character or a "\x85" one... it would make python inconsistent.
but this discussion is silly, let's quit it. we are both -1 on it.

> That is; unicode.split() should work as
> expected (if not, it should be fixed), and it seems as though line
> iteration over files with an encoding specified should deal with those
> other line endings - though its behavior in regards to universal
> newlines should probably be discussed.

unicode being native to python is gonna be one big pain to implement :)

> If given the
> choice of a ValueError or an IOError on f.position failure, I would opt
> for IOError;

so would i.

> but I would prefer f.seek() and f.tell(), because with
> f.seek() you can use the "whence" parameter to get absolute or relative
> seeking.

yes, but +=/-= can be overridden to provide "efficient seeking". and, just
thought about it: just like negative indexes of sequences, negative positions
should be relative to the end of the stream. for example:

f.position = 4     # absolute -- seek(4, "start")
f.position += 6   # relative to current -- seek(6, "curr")
f.position = -7    # relative to end of stream -- seek(-7, "end")

that's easy to implement and easy AND efficient to work with.
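
A sketch of how that could look as a property over seek()/tell()
(PositionMixin and BufferStream are invented names; note that position = 0
always means the start, so seeking to exactly the end still needs an
explicit seek(0, "end"), since -0 == 0):

```python
import io
import os

class PositionMixin:
    # Assumes the concrete class supplies seek() and tell().
    @property
    def position(self):
        return self.tell()

    @position.setter
    def position(self, n):
        if n < 0:
            self.seek(n, os.SEEK_END)   # negative: relative to end
        else:
            self.seek(n, os.SEEK_SET)   # non-negative: absolute

class BufferStream(PositionMixin, io.BytesIO):
    pass
```

Here f.position += 6 works through the getter and setter pair: it reads
tell(), adds 6, and assigns the absolute result.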

> But then there are other streams where you want to call two *different*
> .close() methods, and the above would only allow for 1.  Closing
> multiple times shouldn't be a problem for most streams, but not closing
> enough could be a problem.

hrrm... what do you mean by "closing multiple times"? like socket.shutdown
for reading or for writing? but other than sockets, what else can be closed
in multiple ways? you can't close the "reading" of a file, while keeping it
open for writing.

f = FileStream(...)
InputStream.close(f)
f.write(...) # exception: stream closed

> Visiting the libevent-python example script (
> http://python-hpio.net/trac/browser/Projects/libevent-python/trunk/examples/echo_server.py)
> shows us how you can do such things.

i didn't see this before. i'll look into it.

> > and a huge class hierarchy makes attribute lookups slower.
> Have you tried to measure this?  In my tests (with Python 2.3), it's
> somewhere on the order of .2 microseconds per operation of difference

no, i guess i fell for the common urban legend. sorry.

thanks for the feedback.


-tomer


From greg.ewing at canterbury.ac.nz  Tue Jun  6 02:26:17 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 06 Jun 2006 12:26:17 +1200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<44836417.5080209@canterbury.ac.nz>
	<1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com>
Message-ID: <4484CBA9.6070507@canterbury.ac.nz>

tomer filiba wrote:
> I wrote:
>  > My current opinion on select-like functionality is
>  > that you shouldn't need to import a module for it at
>  > all. Rather, you should be able to attach a callback
>  > directly to a stream. Then there just needs to be
>  > a wait_for_something_to_happen() function somewhere
>  > (perhaps with a timeout).
> 
> yes, that's how i'd do it, but then how would you wait for
> multiple streams?

   # somewhere in the program

   stream1.on_readable = handle_stream1

   # somewhere else

   stream2.on_readable = handle_stream2

   # and the main loop says

   while program_is_running():
     wait_for_streams()

The wait_for_streams function waits for activity on any
stream which has a callback, and calls it.

(BTW, I actually think this sort of functionality should
be part of the OS kernel, with event-driven programs and
libraries being so important nowadays. Sort of like being
able to define signal handlers for file descriptors instead
of having a small, fixed number of signals.)

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Jun  6 02:32:51 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 06 Jun 2006 12:32:51 +1200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>
References: <448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<20060604131031.69CF.JCARLSON@uci.edu>
	<1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>
Message-ID: <4484CD33.10400@canterbury.ac.nz>

tomer filiba wrote:

> okay, i give up on read(n) returning n bytes.

An idea I had about this some time ago was that read()
could be callable with two arguments:

   f.read(min_bytes, max_bytes)

The two variations we're considering would then be special
cases of this:

   f.read(0, num_bytes)         # current read() behaviour

   f.read(num_bytes, num_bytes) # record-oriented read() behaviour
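
Expressed on top of the current API, the two-argument form might look like
this (read_range is a stand-in name; a real implementation would live on the
stream itself, and this sketch returns short at EOF rather than raising):

```python
import io

def read_range(stream, min_bytes, max_bytes):
    # Return between min_bytes and max_bytes bytes: keep reading
    # until the minimum is met or EOF, never exceeding the maximum.
    data = stream.read(max_bytes)
    while len(data) < min_bytes:
        more = stream.read(max_bytes - len(data))
        if not more:
            break  # EOF before min_bytes: return what we have
        data += more
    return data

# read_range(f, 0, n)  -> current read() behaviour
# read_range(f, n, n)  -> record-oriented exact read
```
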

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Jun  6 02:57:07 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 06 Jun 2006 12:57:07 +1200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <1d85506f0606051226q7fa13a54jc69442fe029f5993@mail.gmail.com>
References: <20060604131031.69CF.JCARLSON@uci.edu>
	<1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>
	<20060605110457.69DD.JCARLSON@uci.edu>
	<1d85506f0606051226q7fa13a54jc69442fe029f5993@mail.gmail.com>
Message-ID: <4484D2E3.20402@canterbury.ac.nz>

tomer filiba wrote:

> yes, but +=/-= can be overridden to provide "efficient seeking". and, just
> thought about it: just like negative indexes of sequences, negative positions
> should be relative to the end of the stream. for example:
> 
> f.position = 4     # absolute -- seek(4, "start")
> f.position += 6   # relative to current -- seek(6, "curr")
> f.position = -7    # relative to end of stream -- seek(-7, "end")

How would you seek to exactly the end of the file,
without introducing signed integer zeroes to Python?-)

--
Greg

From jimjjewett at gmail.com  Tue Jun  6 02:57:24 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 5 Jun 2006 20:57:24 -0400
Subject: [Python-3000] [Python-Dev] Stdlib Logging questions (PEP 337
	SoC)
In-Reply-To: <5.1.1.6.0.20060604231709.02f07700@mail.telecommunity.com>
References: <5.1.1.6.0.20060604231709.02f07700@mail.telecommunity.com>
Message-ID: <fb6fbf560606051757x6ea829e2r88dd9de3c9717a12@mail.gmail.com>

On 6/4/06, Phillip J. Eby <pje at telecommunity.com> wrote:
> can we please delay the import until it's actually needed?  i.e.,
> until after some logging option is enabled?

I have asked her to make this change.

I don't like the extra conditional dance it causes, but I agree that
not wanting to log is a valid use case.

On the other hand, the one-time import cost is pretty low for a
long-running process, and eventually gets paid if any other module
calls logging.  Would it make more sense to offer a null package that
can be installed earlier in the search path if you want to truly
disable logging?

-jJ

From jimjjewett at gmail.com  Tue Jun  6 03:05:15 2006
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 5 Jun 2006 21:05:15 -0400
Subject: [Python-3000] [Python-Dev] Stdlib Logging questions (PEP 337
	SoC)
In-Reply-To: <fb6fbf560606051757x6ea829e2r88dd9de3c9717a12@mail.gmail.com>
References: <5.1.1.6.0.20060604231709.02f07700@mail.telecommunity.com>
	<fb6fbf560606051757x6ea829e2r88dd9de3c9717a12@mail.gmail.com>
Message-ID: <fb6fbf560606051805l269d9b6fq5fa327f804520df5@mail.gmail.com>

oops -- this was meant for python-dev, not python-3000.

From lunz at falooley.org  Tue Jun  6 03:44:36 2006
From: lunz at falooley.org (Jason Lunz)
Date: Tue, 6 Jun 2006 01:44:36 +0000 (UTC)
Subject: [Python-3000] iostack and sock2
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<44836417.5080209@canterbury.ac.nz>
	<1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com>
	<4484CBA9.6070507@canterbury.ac.nz>
Message-ID: <e62mm4$1ba$1@sea.gmane.org>

greg.ewing at canterbury.ac.nz said:
> (BTW, I actually think this sort of functionality should be part of
> the OS kernel, with event-driven programs and libraries being so
> important nowadays. Sort of like being able to define signal handlers
> for file descriptors instead of having a small, fixed number of
> signals.)

do you mean that hypothetically? That's supported on linux, but I don't
know how portable it is. You *can* define signal handlers for file
descriptors. See F_SETSIG in fcntl(2), and sigaction(2).

Jason


From talin at acm.org  Tue Jun  6 09:53:38 2006
From: talin at acm.org (Talin)
Date: Tue, 06 Jun 2006 00:53:38 -0700
Subject: [Python-3000] String formatting: Conversion specifiers
Message-ID: <44853482.3050103@acm.org>

I've been slowly working on PEP 3101, specifically fleshing out the 
details, and there's a couple of issues that I wanted to run by the 
group mind here.

Originally, I decided to punt on the issue of field conversion 
specifiers (i.e. %2.2s etc.) and simply say that they were unchanged 
from the existing implementation.

However, I've been looking over the source for PyString_Format, and I'm 
thinking that the code for handling field conversions is a lot more 
complicated than what we really need here.

Here is a list of the conversion types that are currently supported by 
the % operator. First thing you notice is an eerie similarity between 
this and the documentation for 'sprintf'. :)

Conversion  Meaning                                              Notes
d           Signed integer decimal.
i           Signed integer decimal.
o           Unsigned octal.                                      (1)
u           Unsigned decimal.
x           Unsigned hexadecimal (lowercase).                    (2)
X           Unsigned hexadecimal (uppercase).                    (2)
e           Floating point exponential format (lowercase).
E           Floating point exponential format (uppercase).
f           Floating point decimal format.
F           Floating point decimal format.
g           Same as "e" if exponent is greater than -4 or less
            than precision, "f" otherwise.
G           Same as "E" if exponent is greater than -4 or less
            than precision, "F" otherwise.
c           Single character (accepts integer or single
            character string).
r           String (converts any python object using repr()).    (3)
s           String (converts any python object using str()).     (4)
%           No argument is converted, results in a "%" character
            in the result.

Now, unlike C, in Python we already know the type of the thing we're 
going to print. So there's no need to tell the system 'this is a float' 
or 'this is an integer'. The only way I could see this being useful is 
if you had a type and wanted it to print out as some different type - 
but is that really the proper role of the string formatter?

Similarly, what does it mean to have an 'unsigned' quantity in Python? 
If you say "print this negative number as unsigned", what does that 
mean? Does it take the absolute value, or does it do what C does and 
takes the number modulo 2^32? Neither seems particularly correct or 
intuitive to me.

So I decided to sit down and rethink the whole conversion specifier 
system. I looked at the docs for the '%' operator, and some other 
languages, and here is what I came up with (this is an excerpt from the 
revised PEP.)

Oh, and I should mention that I have a working implementation of what is 
described below.
--------------------------

Standard Conversion Specifiers

     Most built-in types will support a standard set of conversion
     specifiers. These are similar in concept to the conversion
     specifiers used by the existing '%' operator, however there are
     also a number of significant differences.

     The general form of the standard conversion specifier is:

         [flags][length][.precision][type]

     The brackets ([]) indicate an optional field.

     The flags can be one of the following:

         '+' - indicates that a sign should be used for both
               positive as well as negative numbers (normally only
               negative numbers will have a sign.)

         '<' - Forces the field to be left-aligned within the available
               space (This is the default.)

         '>' - Forces the field to be right-aligned within the
               available space.

         '0' - Causes any leftover space in the field to be filled
               with leading zeros. Note that this option also implies
               that the field is right-aligned.

         ' ' - Causes the leftover space in the field to be filled
               with spaces.

     'length' is the minimum field width. If not specified, then the
     field width will be determined by the content.

     For a numeric value, 'precision' is the number of digits after
     the decimal point that should be displayed.

     Finally, the 'type' determines how the data should be presented.
     It is generally only used for numeric types - string types do
     not need to indicate a type.

     The available types are:

         'b' - Binary. Outputs the number in base 2.
         'c' - Character. Converts the integer to the corresponding
               unicode character before printing.
         'd' - Decimal Integer. Prints only the whole-number portion
               of the number.
         'e' - Exponent notation. Prints the number in scientific
               notation using the letter 'e' to indicate the exponent.
         'E' - Exponent notation. Same as 'e' except it uses an upper
               case 'E' as the separator character.
         'f' - Fixed point. Displays the number as a fixed-point
               number.
         'F' - Fixed point. Same as 'f'.
         'g' - General format. This prints the number as a fixed-point
               number, unless the number is too large, in which case
               it switches to exponent notation.
         'G' - General format. Same as 'g' except switches to 'E'
               if the number gets too large.
         'n' - Number. This is the same as 'g', except that it uses the
               current locale setting to insert the appropriate
               number separator characters.
         'o' - Octal format. Outputs the number in base 8.
         'r' - Repr format. Outputs the value in a format which is
               likely to be readable by the interpreter. Also works
               with non-numeric fields.
         'x' - Hex format. Outputs the number in base 16, using lower-
               case letters for the upper digits.
         'X' - Hex format. Outputs the number in base 16, using upper-
               case letters for the upper digits.
         '%' - Percentage. Multiplies the number by 100 and displays
               in fixed ('f') format, followed by a percent sign.

     For non-built-in types, the conversion specifiers will be specific
     to that type.  An example is the 'datetime' class, whose
     conversion specifiers might look something like the arguments
     to the strftime() function:

         "Today is: {0:a b d H:M:S Y}".format(datetime.now())


-- Talin

From rasky at develer.com  Tue Jun  6 10:34:11 2006
From: rasky at develer.com (Giovanni Bajo)
Date: Tue, 6 Jun 2006 10:34:11 +0200
Subject: [Python-3000] iostack and sock2
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com><016c01c688c9$f5b212d0$bf03030a@trilan>
	<1d85506f0606051206q20d9663am41bc47832c2486d4@mail.gmail.com>
Message-ID: <009301c68943$fc37fe60$3db72997@bagio>

tomer filiba <tomerfiliba at gmail.com> wrote:

>> One thing I would like to raise is the issue of KeyboardInterrupt. I
>> find very inconvenient that a normal application doing a very simple
>> blocking read from a socket can't be interrupted by a CTRL+C
>> sequence. Usually, what I do is to setup a timeout on the sockets
>> (eg. 0.4 seconds) and then simply retry if the data has not arrived
>> yet. But this changes the code from:
>
> from my experience with linux and solaris, this CTRL+C problem only
> happens on windows machines. but then again, windows can't select()
> on anything but sockets, so there's not gonna be a generic solution.

Windows has WaitForMultipleObjects() which can be used to multiplex between
sockets and other handles.

Giovanni Bajo


From ncoghlan at gmail.com  Tue Jun  6 11:51:32 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 06 Jun 2006 19:51:32 +1000
Subject: [Python-3000] iostack and sock2
In-Reply-To: <4484CD33.10400@canterbury.ac.nz>
References: <448258F3.3070808@gmail.com>	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>	<20060604131031.69CF.JCARLSON@uci.edu>	<1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>
	<4484CD33.10400@canterbury.ac.nz>
Message-ID: <44855024.1010408@gmail.com>

Greg Ewing wrote:
> tomer filiba wrote:
> 
>> okay, i give up on read(n) returning n bytes.
> 
> An idea I had about this some time ago was that read()
> could be callable with two arguments:
> 
>    f.read(min_bytes, max_bytes)
> 
> The two variations we're considering would then be special
> cases of this:
> 
>    f.read(0, num_bytes)         # current read() behaviour
> 
>    f.read(num_bytes, num_bytes) # record-oriented read() behaviour

You can even make this backwards compatible by having the min_bytes argument 
default to 0. (Whether or not the order of the two arguments should be 
reversed in that case is debatable, though.)
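For what it's worth, the combined behaviour is easy to sketch over an in-memory buffer (BufferedStream is a made-up name, purely illustrative, not part of any proposal):

```python
class BufferedStream:
    """Toy stream illustrating the proposed read(min_bytes, max_bytes).

    read(n) keeps the current at-most-n behaviour; read(n, n) gives the
    record-oriented variant that fails rather than return a short read.
    """
    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read(self, min_bytes=0, max_bytes=None):
        if max_bytes is None:              # one-argument form: read(n)
            min_bytes, max_bytes = 0, min_bytes
        chunk = self._data[self._pos:self._pos + max_bytes]
        if len(chunk) < min_bytes:
            raise EOFError('wanted at least %d bytes, got %d'
                           % (min_bytes, len(chunk)))
        self._pos += len(chunk)
        return chunk
```

So f.read(0, n) and f.read(n, n) are the two special cases from upthread, and the one-argument call stays backwards compatible.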

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Tue Jun  6 11:47:28 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 06 Jun 2006 19:47:28 +1000
Subject: [Python-3000] iostack and sock2
In-Reply-To: <4484D2E3.20402@canterbury.ac.nz>
References: <20060604131031.69CF.JCARLSON@uci.edu>	<1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>	<20060605110457.69DD.JCARLSON@uci.edu>	<1d85506f0606051226q7fa13a54jc69442fe029f5993@mail.gmail.com>
	<4484D2E3.20402@canterbury.ac.nz>
Message-ID: <44854F30.60500@gmail.com>

Greg Ewing wrote:
> tomer filiba wrote:
> 
>> yes, but +=/-= can be overridden to provide "efficient seeking". and, just
>> thought about it: just like negative indexes of sequences, negative positions
>> should be relative to the end of the stream. for example:
>>
>> f.position = 4     # absolute -- seek(4, "start")
>> f.position += 6   # relative to current -- seek(6, "curr")
>> f.position = -7    # relative to end of stream -- seek(-7, "end")
> 
> How would you seek to exactly the end of the file,
> without introducing signed integer zeroes to Python?-)

Since it doesn't mean anything else, you could define a position of "None" as 
meaning 'just past the last valid byte in the file' (i.e., right at the end).

Then "f.position = None" would seek to the end. This actually matches the way 
None behaves when it is used as the endpoint of a slice: range(3)[0:None] 
returns [0, 1, 2].

If that's not intuitive enough for your tastes, then a class attribute would 
also work:

f.position = f.END

(f.END would be serving as 'signed zero', since f.position -=1 and f.position 
= -1 would do the same thing)
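A rough sketch of how such a property could sit on top of an ordinary seekable object (PositionedStream and its END attribute are hypothetical names, not a proposal):

```python
import io

class PositionedStream:
    """Illustrative wrapper: seeking via a 'position' property."""
    END = None           # sentinel: 'just past the last valid byte'

    def __init__(self, fileobj):
        self._f = fileobj

    @property
    def position(self):
        return self._f.tell()

    @position.setter
    def position(self, value):
        if value is None:                  # f.position = f.END
            self._f.seek(0, io.SEEK_END)
        elif value < 0:                    # relative to end of stream
            self._f.seek(value, io.SEEK_END)
        else:                              # absolute
            self._f.seek(value)
```

Note that f.position += 6 falls out of the plain getter/setter pair, although a computed position that goes negative would silently turn end-relative, which is the signed-zero problem wearing another hat.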

FWIW, I also realised my objection to properties raising IOError doesn't apply 
for IO streams - unlike path objects, an IO stream *only* makes sense if you 
have access to the underlying IO layer. So documenting certain properties as 
potentially raising IOError seems legitimate in this case.

I'm glad I thought of that, since I like this API a lot better than mucking 
around with passing strings to seek() ;)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ronaldoussoren at mac.com  Tue Jun  6 12:06:19 2006
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Tue, 6 Jun 2006 12:06:19 +0200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <44855024.1010408@gmail.com>
References: <448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<20060604131031.69CF.JCARLSON@uci.edu>
	<1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>
	<4484CD33.10400@canterbury.ac.nz> <44855024.1010408@gmail.com>
Message-ID: <F68D4BBC-92BD-4323-A544-20B2F42FAC47@mac.com>


On 6-jun-2006, at 11:51, Nick Coghlan wrote:

> Greg Ewing wrote:
>> tomer filiba wrote:
>>
>>> okay, i give up on read(n) returning n bytes.
>>
>> An idea I had about this some time ago was that read()
>> could be callable with two arguments:
>>
>>    f.read(min_bytes, max_bytes)
>>
>> The two variations we're considering would then be special
>> cases of this:
>>
>>    f.read(0, num_bytes)         # current read() behaviour
>>
>>    f.read(num_bytes, num_bytes) # record-oriented read() behaviour
>
> You can even make this backwards compatible by having the  
> min_bytes argument
> default to 0. (whether or not the order of the two arguments should be
> reversed in that case is debatable, though)

I'm slightly worried about this thread. Async I/O and "read exactly N  
bytes" don't really match up. I don't know about the other  
mechanisms, but at least with select and poll when the system says  
you can read from a file descriptor you're only guaranteed that one  
call to read(2)/recv(2)/... won't block. The implementation of a  
python read method that returns exactly the number of bytes that you  
requested will have to call the read system call in a loop and hence  
might block.

There's also the issue of error handling: what happens when the first  
call to the read system call doesn't return enough data and the  
second call fails? Does this raise an exception  (I suppose it does)  
and if so, what happens with the data that was returned by the first  
call to the read system call?

All in all I'm not too thrilled by having this behaviour. It is handy  
when implementing record-oriented I/O, but not when doing line- 
oriented I/O.
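The loop being discussed would look roughly like this; read_exactly is a sketch, and attaching the partial data to the exception is just one possible answer to the error-handling question, not an established convention:

```python
def read_exactly(sock, n):
    """Keep calling recv() until exactly n bytes have arrived.

    Each recv() can block even if select()/poll() reported the fd
    readable once, which is the first concern above. If a later recv()
    fails, the bytes already read are attached to the exception so
    they are not silently lost (an illustrative choice only).
    """
    chunks = []
    remaining = n
    while remaining:
        try:
            data = sock.recv(remaining)
        except OSError as exc:
            exc.partial = b''.join(chunks)   # hypothetical attribute
            raise
        if not data:
            raise EOFError('connection closed %d bytes early' % remaining)
        chunks.append(data)
        remaining -= len(data)
    return b''.join(chunks)
```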

BTW. Has anyone looked at the consequences of the new iostack and  
sock2 for libraries like Twisted?

Ronald


From ncoghlan at gmail.com  Tue Jun  6 12:45:17 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 06 Jun 2006 20:45:17 +1000
Subject: [Python-3000] String formatting: Conversion specifiers
In-Reply-To: <44853482.3050103@acm.org>
References: <44853482.3050103@acm.org>
Message-ID: <44855CBD.3010507@gmail.com>

Talin wrote:
> So I decided to sit down and rethink the whole conversion specifier 
> system. I looked at the docs for the '%' operator, and some other 
> languages, and here is what I came up with (this is an excerpt from the 
> revised PEP.)

Generally nice, but I'd format the writeup a bit differently (see below) and 
reorder the elements so that an arbitrary character can be supplied as the 
fill character and the old ' ' sign flag behaviour remains available.

I'd also design it so that the standard conversion specifiers are available 
'for free' (i.e., they work for any class, unless the class author 
deliberately replaces them with something else).

Cheers,
Nick.

--------------------------------

Standard Conversion Specifiers

      If an object does not define its own conversion specifiers, a standard
      set of conversion specifiers are used. These are similar in concept to
      the conversion specifiers used by the existing '%' operator, however
      there are also a number of significant differences. The standard
      conversion specifiers fall into three major categories: string
      conversions, integer conversions and floating point conversions.

      The general form of a string conversion specifier is:

          [[fill][align]width][type]

      The brackets ([]) indicate an optional field.

      'width' is a decimal integer defining the minimum field width.
      If not specified, then the field width will be determined by
      the content.

      If the minimum field width is defined, then the optional align
      flag can be one of the following:

          '<' - Forces the field to be left-aligned within the available
                space (This is the default.)
          '>' - Forces the field to be right-aligned within the
                available space.

      The optional 'fill' character defines the character to be used to
      pad the field to the minimum width. The alignment flag must be
      supplied if the fill character is a digit other than '0' (otherwise the
      character would be interpreted as part of the field width specifier).

      Finally, the 'type' determines how the data should be presented.

      The available string conversion types are:

          's' - String format. Invokes str() on the object.
                This is the default conversion specifier type.
          'r' - Repr format. Invokes repr() on the object.


      The general form of an integer conversion specifier is:

          [[fill][align]width][sign]type

      The 'fill', 'align' and 'width' fields are as for string conversion
      specifiers.

      The 'sign' field can be one of the following:

          '+'  - indicates that a sign should be used for both
                 positive as well as negative numbers
          '-'  - indicates that a sign should be used only for negative
                 numbers (this is the default behaviour)
          ' '  - indicates that a leading space should be used on
                 positive numbers
          '()' - indicates that negative numbers should be surrounded
                 by parentheses

      There are several integer conversion types. All invoke int() on the
      object before attempting to format it.

      The available integer conversion types are:

          'b' - Binary. Outputs the number in base 2.
          'c' - Character. Converts the integer to the corresponding
                unicode character before printing.
          'd' - Decimal Integer. Outputs the number in base 10.
          'o' - Octal format. Outputs the number in base 8.
          'x' - Hex format. Outputs the number in base 16, using lower-
                case letters for the digits above 9.
          'X' - Hex format. Outputs the number in base 16, using upper-
                case letters for the digits above 9.

      The general form of a floating point conversion specifier is:

          [[fill][align]width][.precision][sign]type

      The 'fill', 'align', 'width' and 'sign' fields are as for
      integer conversion specifiers.

      The 'precision' field is a decimal number indicating how many digits
      should be displayed after the decimal point.

      There are several floating point conversion types. All invoke float() on
      the object before attempting to format it.

      The available floating point conversion types are:

          'e' - Exponent notation. Prints the number in scientific
                notation using the letter 'e' to indicate the exponent.
          'E' - Exponent notation. Same as 'e' except it uses an upper
                case 'E' as the separator character.
          'f' - Fixed point. Displays the number as a fixed-point
                number.
          'F' - Fixed point. Same as 'f'.
          'g' - General format. This prints the number as a fixed-point
                number, unless the number is too large, in which case
                it switches to 'e' exponent notation.
          'G' - General format. Same as 'g' except switches to 'E'
                if the number gets too large.
          'n' - Number. This is the same as 'g', except that it uses the
                current locale setting to insert the appropriate
                number separator characters.
          '%' - Percentage. Multiplies the number by 100 and displays
                in fixed ('f') format, followed by a percent sign.

      Objects are able to define their own conversion specifiers to replace
      the standard ones.  An example is the 'datetime' class, whose
      conversion specifiers might look something like the arguments
      to the strftime() function:

          "Today is: {0:a b d H:M:S Y}".format(datetime.now())



-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From qrczak at knm.org.pl  Tue Jun  6 12:49:34 2006
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Tue, 06 Jun 2006 12:49:34 +0200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <4484CD33.10400@canterbury.ac.nz> (Greg Ewing's message of
	"Tue, 06 Jun 2006 12:32:51 +1200")
References: <448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<20060604131031.69CF.JCARLSON@uci.edu>
	<1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>
	<4484CD33.10400@canterbury.ac.nz>
Message-ID: <87fyiih83l.fsf@qrnik.zagroda>

Greg Ewing <greg.ewing at canterbury.ac.nz> writes:

> The two variations we're considering would then be special
> cases of this:
>
>    f.read(0, num_bytes)         # current read() behaviour
>
>    f.read(num_bytes, num_bytes) # record-oriented read() behaviour

Current read() reads at least 1 byte.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From mcherm at mcherm.com  Tue Jun  6 14:23:17 2006
From: mcherm at mcherm.com (Michael Chermside)
Date: Tue, 06 Jun 2006 05:23:17 -0700
Subject: [Python-3000] String formatting: Conversion specifiers
Message-ID: <20060606052317.yrbvse4311q80go0@login.werra.lunarpages.com>

Talin writes:
> So I decided to sit down and rethink the whole conversion specifier
> system.

+1: good idea!

Nick Coghlan writes:
> Generally nice, but I'd format the writeup a bit differently (see below) and
> reorder the elements so that an arbitrary character can be supplied as the
> fill character and the old ' ' sign flag behaviour remains available.

+1: nice tweak

-- Michael Chermside


From greg.ewing at canterbury.ac.nz  Tue Jun  6 15:20:02 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Jun 2006 01:20:02 +1200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <e62mm4$1ba$1@sea.gmane.org>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<44836417.5080209@canterbury.ac.nz>
	<1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com>
	<4484CBA9.6070507@canterbury.ac.nz> <e62mm4$1ba$1@sea.gmane.org>
Message-ID: <44858102.30306@canterbury.ac.nz>

Jason Lunz wrote:
> greg.ewing at canterbury.ac.nz said:
> 
> > Sort of like being able to define signal handlers
> > for file descriptors instead of having a small, fixed number of
> > signals.)
> 
> That's supported on linux, but I don't
> know how portable it is. See F_SETSIG in fcntl(2), and sigaction(2).

According to the man page, it's Linux-specific.

It's not quite the same thing, anyway. What I had in
mind was attaching the handler itself directly to the
file descriptor, rather than going through a signal
number. That way, different pieces of code can use the
mechanism independently on different file descriptors
without having to coordinate over sharing a signal
handler.

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Jun  6 15:27:44 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Jun 2006 01:27:44 +1200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <F68D4BBC-92BD-4323-A544-20B2F42FAC47@mac.com>
References: <448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<20060604131031.69CF.JCARLSON@uci.edu>
	<1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>
	<4484CD33.10400@canterbury.ac.nz> <44855024.1010408@gmail.com>
	<F68D4BBC-92BD-4323-A544-20B2F42FAC47@mac.com>
Message-ID: <448582D0.7080506@canterbury.ac.nz>

Ronald Oussoren wrote:

> I'm slighly worried about this thread. Async I/O and "read exactly N  
> bytes" don't really match up. I don't know about the other  mechanisms, 
> but at least with select and poll when the system says  you can read 
> from a file descriptor you're only guaranteed that one  call to 
> read(2)/recv(2)/... won't block. The implementation of a  python read 
> method that returns exactly the number of bytes that you  requested will 
> have to call the read system call in a loop and hence  might block.

This is one case where the callback model of async i/o may
help. If there were a way to say "don't call me until you've
got n bytes ready", the descriptor could become ready
multiple times and multiple reads performed behind the
scenes, then when enough bytes are there, your callback
is called.

--
Greg



From greg.ewing at canterbury.ac.nz  Tue Jun  6 15:30:55 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Jun 2006 01:30:55 +1200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <87fyiih83l.fsf@qrnik.zagroda>
References: <448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<20060604131031.69CF.JCARLSON@uci.edu>
	<1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>
	<4484CD33.10400@canterbury.ac.nz> <87fyiih83l.fsf@qrnik.zagroda>
Message-ID: <4485838F.50407@canterbury.ac.nz>

Marcin 'Qrczak' Kowalczyk wrote:
> Greg Ewing <greg.ewing at canterbury.ac.nz> writes:

>>      f.read(0, num_bytes)         # current read() behaviour
> 
> Current read() reads at least 1 byte.

Except if EOF is reached before getting any bytes.
In that case, if min_bytes is 0, the call simply
returns 0 bytes. If min_bytes is greater than
0, it raises EOFError.

--
Greg

From lunz at falooley.org  Tue Jun  6 16:58:55 2006
From: lunz at falooley.org (Jason Lunz)
Date: Tue, 6 Jun 2006 10:58:55 -0400
Subject: [Python-3000] iostack and sock2
In-Reply-To: <44858102.30306@canterbury.ac.nz>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<44836417.5080209@canterbury.ac.nz>
	<1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com>
	<4484CBA9.6070507@canterbury.ac.nz>
	<e62mm4$1ba$1@sea.gmane.org> <44858102.30306@canterbury.ac.nz>
Message-ID: <20060606145855.GC14823@knob.reflex>

On Wed, Jun 07, 2006 at 01:20:02AM +1200, Greg Ewing wrote:
> It's not quite the same thing, anyway. What I had in mind was
> attaching the handler itself directly to the file descriptor, rather
> than going through a signal number. That way, different pieces of code
> can use the mechanism independently on different file descriptors
> without having to coordinate over sharing a signal handler.

I imagine if one were going to do this, that would be hidden in the
stdlib. The OS gives you the primitive needed to implement it, but yes,
there would have to be some infrastructure attached to the signal
handler to look up the fd on each SIGIO and multiplex the event out to
that fd's registered handler.

From the point of view of the code attaching the handler to the fd, that
would all be pretty transparent, though.

This all reminds me of something I've been wondering about - how do
people feel about beefing up the stdlib's support for os primitives?
There are plenty of things in the os module that are unix-only, but at
the same time I've had to code things myself in C (like file descriptor
passing over a unix socket, for example).

One of the things I like about python is that it doesn't take a
java-like approach of only exposing lowest-common-denominator OS
facilities. os.select() is a good example - it's far more useful on unix
than on Windows because of the platforms' respective implementations.
But precisely because of that, it strikes me that python ought to expose
the windows equivalent of nonblocking i/o and select/poll - i think
that's overlapped i/o? Something like win32all, iow, could have a place
in the standard distribution.

Jason

From janssen at parc.com  Tue Jun  6 18:29:27 2006
From: janssen at parc.com (Bill Janssen)
Date: Tue, 6 Jun 2006 09:29:27 PDT
Subject: [Python-3000] String formatting: Conversion specifiers
In-Reply-To: Your message of "Tue, 06 Jun 2006 00:53:38 PDT."
	<44853482.3050103@acm.org> 
Message-ID: <06Jun6.092931pdt."58641"@synergy1.parc.xerox.com>

> Here is a list of the conversion types that are currently supported by 
> the % operator. First thing you notice is an eerie similarity between 
> this and the documentation for 'sprintf'. :)

Yes.  This is (or was) a significant advantage to the system.  Many
people already had mastered the C/C++ printf system of specifiers, and
could use Python's with no mental upgrades.  Is that no longer thought
to be an advantage?

> So there's no need to tell the system 'this is a float' 
> or 'this is an integer'.

Except that the type specifier can affect the interpretation of the
rest of the format string.  For example, %.3f means to print three
fractional digits.

> The only way I could see this being useful is 
> if you had a type and wanted it to print out as some different type - 
> but is that really the proper role of the string formatter?

Isn't that exactly what the string formatter does?  I've got a binary
value and want to express it as a different type, a string?  Type
punning at the low levels is often a useful debugging tool.

Bill



From jcarlson at uci.edu  Tue Jun  6 18:38:05 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 06 Jun 2006 09:38:05 -0700
Subject: [Python-3000] iostack and sock2
In-Reply-To: <448582D0.7080506@canterbury.ac.nz>
References: <F68D4BBC-92BD-4323-A544-20B2F42FAC47@mac.com>
	<448582D0.7080506@canterbury.ac.nz>
Message-ID: <20060606091601.6A02.JCARLSON@uci.edu>


Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> 
> Ronald Oussoren wrote:
> 
> > I'm slighly worried about this thread. Async I/O and "read exactly N  
> > bytes" don't really match up. I don't know about the other  mechanisms, 
> > but at least with select and poll when the system says  you can read 
> > from a file descriptor you're only guaranteed that one  call to 
> > read(2)/recv(2)/... won't block. The implementation of a  python read 
> > method that returns exactly the number of bytes that you  requested will 
> > have to call the read system call in a loop and hence  might block.
> 
> This is one case where the callback model of async i/o may
> help. If there were a way to say "don't call me until you've
> got n bytes ready", the descriptor could become ready
> multiple times and multiple reads performed behind the
> scenes, then when enough bytes are there, your callback
> is called.

class ReadExactly:
    def __init__(self, callback, current_count):
        self.remaining = current_count
        self.callback = callback
        self.buffer = []
    def __call__(self, data):
        while data:
            if len(data) >= self.remaining:
                b, self.buffer = self.buffer, []
                b.append(data[:self.remaining])
                data = data[self.remaining:]
                self.remaining = 0
                self.callback(''.join(b), reader=self)
            else:
                self.buffer.append(data)
                self.remaining -= len(data)
                break

Generally though, it's a bit easier to handle the piecewise reading, etc.,
as part of the async socket class.  The asynchat module uses
handle_read(), collect_incoming_data(data), and found_terminator(); a
semantic I've borrowed for my own asynchronous socket classes and have
been fairly happy with.  asynchat implements the handle_read() portion,
which knows about line-terminated protocols (pop, smtp, http, ...) as
well as protocols using the 'read X bytes semantic', where X can be
fixed or variable.  (read 4 bytes, decode, read X bytes, decode, read 4...)

If the new asynchronous class had some equivalent functionality, and
some reasonable set of default behavior (overridable via subclass or
flags), you could get arbitrarily desired behavior; from "call this
thing whenever you get data", to "call this thing when you have gotten X
bytes", to "call this thing when you have found this ''line'' terminator
X", etc.


 - Josiah


From tomerfiliba at gmail.com  Tue Jun  6 19:43:15 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Tue, 6 Jun 2006 19:43:15 +0200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <009301c68943$fc37fe60$3db72997@bagio>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<016c01c688c9$f5b212d0$bf03030a@trilan>
	<1d85506f0606051206q20d9663am41bc47832c2486d4@mail.gmail.com>
	<009301c68943$fc37fe60$3db72997@bagio>
Message-ID: <1d85506f0606061043wef3f0d7jd25cb11a5f3123ef@mail.gmail.com>

WaitForMultipleObjects doesn't work on sockets or files...

On 6/6/06, Giovanni Bajo <rasky at develer.com> wrote:
> tomer filiba <tomerfiliba at gmail.com> wrote:
>
> >> One thing I would like to raise is the issue of KeyboardInterrupt. I
> >> find very inconvenient that a normal application doing a very simple
> >> blocking read from a socket can't be interrupted by a CTRL+C
> >> sequence. Usually, what I do is to setup a timeout on the sockets
> >> (eg. 0.4 seconds) and then simply retry if the data has not arrived
> >> yet. But this changes the code from:
> >
> > from my experience with linux and solaris, this CTRL+C problem only
> > happens on windows machines. but then again, windows can't select()
> > on anything but sockets, so there's not gonna be a generic solution.
>
> Windows has WaitForMultipleObjects() which can be used to multiplex between
> sockets and other handles.
>
> Giovanni Bajo
>
>

From talin at acm.org  Tue Jun  6 20:07:09 2006
From: talin at acm.org (Talin)
Date: Tue, 06 Jun 2006 11:07:09 -0700
Subject: [Python-3000] String formatting: Conversion specifiers
In-Reply-To: <44855CBD.3010507@gmail.com>
References: <44853482.3050103@acm.org> <44855CBD.3010507@gmail.com>
Message-ID: <4485C44D.70007@acm.org>


Nick Coghlan wrote:
> Talin wrote:
> 
>> So I decided to sit down and rethink the whole conversion specifier 
>> system. I looked at the docs for the '%' operator, and some other 
>> languages, and here is what I came up with (this is an excerpt from 
>> the revised PEP.)
> 
> 
> Generally nice, but I'd format the writeup a bit differently (see below) 
> and reorder the elements so that an arbitrary character can be supplied 
> as the fill character and the old ' ' sign flag behaviour remains 
> available.

Looks good - thanks for the feedback.

My only comment is that I think that I would still like to have the sign 
field before the width. I'm pretty sure that this can be parsed 
unambiguously.

> I'd also design it so that the standard conversion specifiers are 
> available 'for free' (i.e., they work for any class, unless the class 
> author deliberately replaces them with something else).
> 
> Cheers,
> Nick.
> 
> --------------------------------
> 
> Standard Conversion Specifiers
> 
>      If an object does not define its own conversion specifiers, a standard
>      set of conversion specifiers are used. These are similar in concept to
>      the conversion specifiers used by the existing '%' operator, however
>      there are also a number of significant differences. The standard
>      conversion specifiers fall into three major categories: string
>      conversions, integer conversions and floating point conversions.
> 
>      The general form of a string conversion specifier is:
> 
>          [[fill][align]width][type]
> 
>      The brackets ([]) indicate an optional field.
> 
>      'width' is a decimal integer defining the minimum field width.
>      If not specified, then the field width will be determined by
>      the content.
> 
>      If the minimum field width is defined, then the optional align
>      flag can be one of the following:
> 
>          '<' - Forces the field to be left-aligned within the available
>                space (This is the default.)
>          '>' - Forces the field to be right-aligned within the
>                available space.
> 
>      The optional 'fill' character defines the character to be used to
>      pad the field to the minimum width. The alignment flag must be
>      supplied if the character is a number other than 0 (otherwise the
>      character would be interpreted as part of the field width specifier).
> 
>      Finally, the 'type' determines how the data should be presented.
> 
>      The available string conversion types are:
> 
>          's' - String format. Invokes str() on the object.
>                This is the default conversion specifier type.
>          'r' - Repr format. Invokes repr() on the object.
> 
> 
>      The general form of an integer conversion specifier is:
> 
>          [[fill][align]width][sign]type
> 
>      The 'fill', 'align' and 'width' fields are as for string conversion
>      specifiers.
> 
>      The 'sign' field can be one of the following:
> 
>          '+'  - indicates that a sign should be used for both
>                 positive as well as negative numbers
>          '-'  - indicates that a sign should be used only for negative
>                 numbers (this is the default behaviour)
>          ' '  - indicates that a leading space should be used on
>                 positive numbers
>          '()' - indicates that negative numbers should be surrounded
>                 by parentheses
> 
>      There are several integer conversion types. All invoke int() on the
>      object before attempting to format it.
> 
>      The available integer conversion types are:
> 
>          'b' - Binary. Outputs the number in base 2.
>          'c' - Character. Converts the integer to the corresponding
>                unicode character before printing.
>          'd' - Decimal Integer. Outputs the number in base 10.
>          'o' - Octal format. Outputs the number in base 8.
>          'x' - Hex format. Outputs the number in base 16, using lower-
>                case letters for the digits above 9.
>          'X' - Hex format. Outputs the number in base 16, using upper-
>                case letters for the digits above 9.
> 
>      The general form of a floating point conversion specifier is:
> 
>          [[fill][align]width][.precision][sign]type
> 
>      The 'fill', 'align', 'width' and 'sign' fields are as for
>      integer conversion specifiers.
> 
>      The 'precision' field is a decimal number indicating how many digits
>      should be displayed after the decimal point.
> 
>      There are several floating point conversion types. All invoke 
> float() on
>      the object before attempting to format it.
> 
>      The available floating point conversion types are:
> 
>          'e' - Exponent notation. Prints the number in scientific
>                notation using the letter 'e' to indicate the exponent.
>          'E' - Exponent notation. Same as 'e' except it uses an upper
>                case 'E' as the separator character.
>          'f' - Fixed point. Displays the number as a fixed-point
>                number.
>          'F' - Fixed point. Same as 'f'.
>          'g' - General format. This prints the number as a fixed-point
>                number, unless the number is too large, in which case
>                it switches to 'e' exponent notation.
>          'G' - General format. Same as 'g' except switches to 'E'
>                if the number gets too large.
>          'n' - Number. This is the same as 'g', except that it uses the
>                current locale setting to insert the appropriate
>                number separator characters.
>          '%' - Percentage. Multiplies the number by 100 and displays
>                in fixed ('f') format, followed by a percent sign.
> 
>      Objects are able to define their own conversion specifiers to replace
>      the standard ones.  An example is the 'datetime' class, whose
>      conversion specifiers might look something like the arguments
>      to the strftime() function:
> 
>          "Today is: {0:a b d H:M:S Y}".format(datetime.now())
> 
> 
> 

From rasky at develer.com  Tue Jun  6 20:15:07 2006
From: rasky at develer.com (Giovanni Bajo)
Date: Tue, 6 Jun 2006 20:15:07 +0200
Subject: [Python-3000] iostack and sock2
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<016c01c688c9$f5b212d0$bf03030a@trilan>
	<1d85506f0606051206q20d9663am41bc47832c2486d4@mail.gmail.com>
	<009301c68943$fc37fe60$3db72997@bagio>
	<1d85506f0606061043wef3f0d7jd25cb11a5f3123ef@mail.gmail.com>
Message-ID: <01e401c68995$2480c8b0$3db72997@bagio>

> On 6/6/06, Giovanni Bajo <rasky at develer.com> wrote:
>> tomer filiba <tomerfiliba at gmail.com> wrote:
>>
>>>> One thing I would like to raise is the issue of KeyboardInterrupt.
>>>> I find very inconvenient that a normal application doing a very
>>>> simple blocking read from a socket can't be interrupted by a CTRL+C
>>>> sequence. Usually, what I do is to setup a timeout on the sockets
>>>> (eg. 0.4 seconds) and then simply retry if the data has not arrived
>>>> yet. But this changes the code from:
>>>
>>> from my experience with linux and solaris, this CTRL+C problem only
>>> happens on windows machines. but then again, windows can't select()
>>> on anything but sockets, so there's not gonna be a generic solution.
>>
>> Windows has WaitForMultipleObjects() which can be used to multiplex
>> between sockets and other handles.
>>
> WaitForMultipleObjects doesn't work on sockets or files...

You can use WSAAsyncSelect to activate message notification for socket events,
and then wait with MsgWaitForMultipleObjects.

Qt has a very good portable "reactor" implementation (QEventLoop in Qt3, could
be renamed in Qt4) where you can register various events, including socket
notifications and of course normal window messages. The implementation in Qt4
is GPL so you can have a look (src/corelib/kernel/qeventdispatcher_win.cpp).

It *is* possible to have a single point of event dispatching under Windows too,
and it is even possible to have it wrapped portably as Qt did. This is why I do
expect Python to be able to handle this kind of things. Whatever portable
poll/epoll/kqueue-kind of thing we end up with Py3k, it should use a similar
technique under Windows to make sure normal messages are still processed. You
really don't want to wait on sockets only and ignore Window messages altogether
anyway.

Giovanni Bajo
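The timeout-and-retry workaround described in the quoted message can be
sketched as follows (an illustrative sketch in modern Python; the helper name
is mine, and 0.4s is the interval mentioned in the message):

```python
import socket

def interruptible_recv(sock, nbytes, poll_interval=0.4):
    """Blocking recv() that stays responsive to CTRL+C.

    A short socket timeout means control returns to Python between
    retries, so a KeyboardInterrupt can be delivered even on platforms
    where a fully blocking recv() cannot be interrupted.
    """
    sock.settimeout(poll_interval)
    while True:
        try:
            return sock.recv(nbytes)
        except socket.timeout:
            continue  # no data yet; retry, giving CTRL+C a chance
```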


From tomerfiliba at gmail.com  Tue Jun  6 20:24:47 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Tue, 6 Jun 2006 20:24:47 +0200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <01e401c68995$2480c8b0$3db72997@bagio>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<016c01c688c9$f5b212d0$bf03030a@trilan>
	<1d85506f0606051206q20d9663am41bc47832c2486d4@mail.gmail.com>
	<009301c68943$fc37fe60$3db72997@bagio>
	<1d85506f0606061043wef3f0d7jd25cb11a5f3123ef@mail.gmail.com>
	<01e401c68995$2480c8b0$3db72997@bagio>
Message-ID: <1d85506f0606061124t411c5b44l8b067b46812160a4@mail.gmail.com>

> You can use WSAAsyncSelect to activate message notification for socket events,
> and then wait with MsgWaitForMultipleObjects.

i remember reading in the winsock manual that these two methods are slower,
and not suitable for servers.

> It *is* possible to have a single point of event dispatching under Windows too,
> and it is even possible to have it wrapped portably as Qt did. This is why I do
> expect Python to be able to handle this kind of things

i agree, but it needs much more thinking and research, and i will look into it.
but i'm not sure it should be part of the iostack. this kind of sync/async io
needs more thought anyway.



-tomer


From rasky at develer.com  Tue Jun  6 20:30:01 2006
From: rasky at develer.com (Giovanni Bajo)
Date: Tue, 6 Jun 2006 20:30:01 +0200
Subject: [Python-3000] iostack and sock2
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<016c01c688c9$f5b212d0$bf03030a@trilan>
	<1d85506f0606051206q20d9663am41bc47832c2486d4@mail.gmail.com>
	<009301c68943$fc37fe60$3db72997@bagio>
	<1d85506f0606061043wef3f0d7jd25cb11a5f3123ef@mail.gmail.com>
	<01e401c68995$2480c8b0$3db72997@bagio>
	<1d85506f0606061124t411c5b44l8b067b46812160a4@mail.gmail.com>
Message-ID: <01f001c68997$3935ec20$3db72997@bagio>

tomer filiba <tomerfiliba at gmail.com> wrote:

>> You can use WSAAsyncSelect to activate message notification for
>> socket events, and then wait with MsgWaitForMultipleObjects.
>
> i remember reading in the winsock manual that these two methods are
> slower, and not suitable for servers.

Might be FUD or outdated: have a link? Anyway, you can't have a long-running
process which does not poll messages under Windows (the task is immediately
marked as "not responding"), so I'm not sure what you compare it to, when you
say "slower". What is the other "faster" method? WSAAsyncEvent? The only other
way to go I can think of is multithreading (that is exactly how I do write
servers in Python nowadays).

>> It *is* possible to have a single point of event dispatching under
>> Windows too, and it is even possible to have it wrapped portably as
>> Qt did. This is why I do expect Python to be able to handle this
>> kind of things
>
> i agree, but it needs much more thinking and research, and i will
> look into it. but i'm not sure it should be part of the iostack. this
> kind of sync/async io needs more thought anyway.


Agreed. Thanks!

Giovanni Bajo


From jcarlson at uci.edu  Tue Jun  6 21:46:49 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 06 Jun 2006 12:46:49 -0700
Subject: [Python-3000] iostack and sock2
In-Reply-To: <01f001c68997$3935ec20$3db72997@bagio>
References: <1d85506f0606061124t411c5b44l8b067b46812160a4@mail.gmail.com>
	<01f001c68997$3935ec20$3db72997@bagio>
Message-ID: <20060606124525.6A0E.JCARLSON@uci.edu>


"Giovanni Bajo" <rasky at develer.com> wrote:
> tomer filiba <tomerfiliba at gmail.com> wrote:
> 
> >> You can use WSAAsyncSelect to activate message notification for
> >> socket events, and then wait with MsgWaitForMultipleObjects.
> >
> > i remember reading in the winsock manual that these two methods are
> > slower, and not suitable for servers.
> 
> Might be FUD or outdated: have a link? Anyway, you can't have a long-running
> process which does not poll messages under Windows (the task is immediately
> marked as "not responding"), so I'm not sure what you compare it to, when you
> say "slower". What is the other "faster" method? WSAAsyncEvent? The only other
> way to go I can think of is multithreading (that is exactly how I do write
> servers in Python nowadays).

If I've read the reports correctly, WSA* is technically limited to 512
file handles, and practically limited to fewer (you start getting a
particular kind of exception).  As a suggested replacement, they offer
IO Completion Ports.

 - Josiah


From tomerfiliba at gmail.com  Tue Jun  6 22:14:58 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Tue, 6 Jun 2006 22:14:58 +0200
Subject: [Python-3000] iostack, continued
Message-ID: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>

the old thread was getting too nested, so i made a summary
of the key points raised during that discussion:

http://sebulba.wikispaces.com/project+iostack+todo

is there anything else i missed? any more comments to add
to the summary?

i'll have time to incorporate part of these issues on the weekend,
not before. i'll also update the design document on my site accordingly.

thanks for all the comments so far, they have already proved
very helpful and furtile.


-tomer

From tjreedy at udel.edu  Wed Jun  7 00:20:39 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 6 Jun 2006 18:20:39 -0400
Subject: [Python-3000] iostack, continued
References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>
Message-ID: <e64v3n$tpv$1@sea.gmane.org>


"tomer filiba" <tomerfiliba at gmail.com> wrote in message 
news:1d85506f0606061314x615f07e0g748dbdba6ef97aae at mail.gmail.com...
> thanks for all the comments so far, they have already proved
> very helpful and furtile.

I think you meant fertile (as opposed to futile ;-)




From rasky at develer.com  Wed Jun  7 02:31:07 2006
From: rasky at develer.com (Giovanni Bajo)
Date: Wed, 7 Jun 2006 02:31:07 +0200
Subject: [Python-3000] iostack, continued
References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>
Message-ID: <028301c689c9$aac61d60$3db72997@bagio>

tomer filiba <tomerfiliba at gmail.com> wrote:

> the old thread was getting too nested, so i made a summary
> of the key points raised during that discussion:
>
> http://sebulba.wikispaces.com/project+iostack+todo
>
> is there anything else i missed? any more comments to add
> to the summary?


About this part: "properties raising IOError", I would like to point out that
Guido pronounced on The Way properties should be used in Py3k. It should
already be written down as part of a PEP 3000+ (I don't remember which). Part
of the pronouncement was that reading/writing properties should never have
side-effects. I guess this kills the argument about "position" being a
property?

Giovanni Bajo


From rasky at develer.com  Wed Jun  7 02:32:37 2006
From: rasky at develer.com (Giovanni Bajo)
Date: Wed, 7 Jun 2006 02:32:37 +0200
Subject: [Python-3000] iostack, continued
References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>
Message-ID: <028701c689c9$e070f9d0$3db72997@bagio>

tomer filiba <tomerfiliba at gmail.com> wrote:

> the old thread was getting too nested, so i made a summary
> of the key points raised during that discussion:
>
> http://sebulba.wikispaces.com/project+iostack+todo
>
> is there anything else i missed? any more comments to add
> to the summary?


About this:

> Streams be line-iterable (like today's file)? But what does a line mean to a
binary file? Only text-files have a notion of lines.

Maybe iterate over records, at least for those wrapped streams that have a
notion of records.

Giovanni Bajo


From greg.ewing at canterbury.ac.nz  Wed Jun  7 02:45:20 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Jun 2006 12:45:20 +1200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <20060606145855.GC14823@knob.reflex>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<44836417.5080209@canterbury.ac.nz>
	<1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com>
	<4484CBA9.6070507@canterbury.ac.nz> <e62mm4$1ba$1@sea.gmane.org>
	<44858102.30306@canterbury.ac.nz> <20060606145855.GC14823@knob.reflex>
Message-ID: <448621A0.7050900@canterbury.ac.nz>

Jason Lunz wrote:

> I imagine if one were going to do this, that would be hidden in the
> stdlib.

Having it in libc would be okay. The important thing
is that the implementation should allow your handlers
to get called even if some library call is blocked
and not being cooperative.

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Jun  7 03:05:10 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Jun 2006 13:05:10 +1200
Subject: [Python-3000] iostack, continued
In-Reply-To: <028301c689c9$aac61d60$3db72997@bagio>
References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>
	<028301c689c9$aac61d60$3db72997@bagio>
Message-ID: <44862646.7020403@canterbury.ac.nz>

Giovanni Bajo wrote:

> About this part: "properties raising IOError", I would like to point out that
> Guido pronounced on The Way properties should be used in Py3k.  Part of the
> pronouncement was that reading/writing properties should never have
> side-effects.

That's meaningless without a definition of what counts as a
"side effect". Writing to a property must have *some* effect
on the state of something, otherwise it's pointless.

I'm guessing he meant it shouldn't affect the state of anything
outside that object. But then we need to decide what counts
as part of the state of a file object. Does it include the
value of the file position of the underlying file descriptor?
If it does, then file.position = foo is a legitimate usage
of a property.

--
Greg

From jcarlson at uci.edu  Wed Jun  7 06:41:03 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 06 Jun 2006 21:41:03 -0700
Subject: [Python-3000] iostack, continued
In-Reply-To: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>
References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>
Message-ID: <20060606213547.6A27.JCARLSON@uci.edu>


"tomer filiba" <tomerfiliba at gmail.com> wrote:
> 
> the old thread was getting too nested, so i made a summary
> of the key points raised during that discussion:
> 
> http://sebulba.wikispaces.com/project+iostack+todo
> 
> is there anything else i missed? any more comments to add
> to the summary?

* """But then there are other streams where you want to call two
*different* .close() methods, and the above would only allow for 1.
Closing multiple times shouldn't be a problem for most streams, but not
closing enough could be a problem."""

    * """hrrm... what do you mean by "closing multiple times"? like
socket.shutdown for reading or for writing? but other than sockets, what
else can be closed in multiple ways? you can't close the "reading" of a
file, while keeping it open for writing.


That's not what I meant.  What I meant was that most streams don't care
if you close them twice.  That is, a = file(...); a.close(); a.close() is
OK.  However, not all streams are robust against *not* closing.  From my
perspective, having each of the reading and writing stream classes
include their own .close() method is perfectly reasonable, and if they
happen to refer to the same stream (say a file opened in r+ or w+ mode),
then .close()ing it twice is fine.

But if they refer to different files, sockets, what have you (I'm sure
someone will have a use case for these), and you don't .close() one of
the streams, then that could be a problem.
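A quick check of the double-close point, runnable in modern Python (the temp
file is just scaffolding for the example):

```python
import os
import tempfile

# Most streams don't care if you close them twice: the second .close()
# on the same file object is a no-op.
fd, path = tempfile.mkstemp()
os.close(fd)
a = open(path, "r+")
a.close()
a.close()  # harmless; no exception is raised
assert a.closed
os.remove(path)
print("double close OK")
```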

 - Josiah


From talin at acm.org  Wed Jun  7 07:36:49 2006
From: talin at acm.org (Talin)
Date: Tue, 06 Jun 2006 22:36:49 -0700
Subject: [Python-3000] String formatting: Conversion specifiers
In-Reply-To: <44855CBD.3010507@gmail.com>
References: <44853482.3050103@acm.org> <44855CBD.3010507@gmail.com>
Message-ID: <448665F1.5060501@acm.org>

Nick Coghlan wrote:
> Talin wrote:
> 
>> So I decided to sit down and rethink the whole conversion specifier 
>> system. I looked at the docs for the '%' operator, and some other 
>> languages, and here is what I came up with (this is an excerpt from 
>> the revised PEP.)
> 
> 
> Generally nice, but I'd format the writeup a bit differently (see below) 
> and reorder the elements so that an arbitrary character can be supplied 
> as the fill character and the old ' ' sign flag behaviour remains 
> available.
> 
> I'd also design it so that the standard conversion specifiers are 
> available 'for free' (i.e., they work for any class, unless the class 
> author deliberately replaces them with something else).
> 
> Cheers,
> Nick.
> 

I've taken your proposal as a base, and made some additional changes to 
it. In addition, I've gone ahead and implemented a prototype of the 
built-in formatter based on the revised text.

Note that I decided not to have a different specifier syntax for each 
data type: a single parser handles the conversion specifier, and it 
always parses precision, sign, etc., even if they are not used by that 
particular format type. So it is simply the case that some specifier 
options aren't used for some format types.

Here is the new text for the section:

Standard Conversion Specifiers

     If an object does not define its own conversion specifiers, a
     standard set of conversion specifiers are used.  These are similar
     in concept to the conversion specifiers used by the existing '%'
     operator, however there are also a number of significant
     differences.  The standard conversion specifiers fall into three
     major categories: string conversions, integer conversions and
     floating point conversions.

     The general form of a standard conversion specifier is:

         [[fill]align][sign][width][.precision][type]

     The brackets ([]) indicate an optional field.

     The optional 'align' flag can be one of the following:

         '<' - Forces the field to be left-aligned within the available
               space (This is the default.)
         '>' - Forces the field to be right-aligned within the
               available space.
         '=' - Forces the padding to be placed immediately after
               the sign, if any. This is used for printing fields
               in the form '+000000120'.

     Note that unless a minimum field width is defined, the field
     width will always be the same size as the data to fill it, so
     that the alignment option has no meaning in this case.

     The optional 'fill' character defines the character to be used to
     pad the field to the minimum width.  The alignment flag must be
     supplied if the character is a number other than 0 (otherwise the
     character would be interpreted as part of the field width
     specifier). A '0' fill character without an alignment flag
     implies an alignment type of '='.

     The 'sign' field can be one of the following:

         '+'  - indicates that a sign should be used for both
                positive as well as negative numbers
         '-'  - indicates that a sign should be used only for negative
                numbers (this is the default behaviour)
         ' '  - indicates that a leading space should be used on
                positive numbers
         '()' - indicates that negative numbers should be surrounded
                by parentheses

     'width' is a decimal integer defining the minimum field width. If
     not specified, then the field width will be determined by the
     content.

     The 'precision' field is a decimal number indicating how many
     digits should be displayed after the decimal point.

     Finally, the 'type' determines how the data should be presented.
     If the type field is absent, an appropriate type will be assigned
     based on the value to be formatted ('d' for integers and longs,
     'g' for floats, and 's' for everything else.)

     The available string conversion types are:

         's' - String format. Invokes str() on the object.
               This is the default conversion specifier type.
         'r' - Repr format. Invokes repr() on the object.

     There are several integer conversion types. All invoke int() on
     the object before attempting to format it.

     The available integer conversion types are:

         'b' - Binary. Outputs the number in base 2.
         'c' - Character. Converts the integer to the corresponding
               unicode character before printing.
         'd' - Decimal Integer. Outputs the number in base 10.
         'o' - Octal format. Outputs the number in base 8.
         'x' - Hex format. Outputs the number in base 16, using lower-
               case letters for the digits above 9.
         'X' - Hex format. Outputs the number in base 16, using upper-
               case letters for the digits above 9.

     There are several floating point conversion types. All invoke
     float() on the object before attempting to format it.

     The available floating point conversion types are:

         'e' - Exponent notation. Prints the number in scientific
               notation using the letter 'e' to indicate the exponent.
         'E' - Exponent notation. Same as 'e' except it uses an upper
               case 'E' as the separator character.
         'f' - Fixed point. Displays the number as a fixed-point
               number.
         'F' - Fixed point. Same as 'f'.
         'g' - General format. This prints the number as a fixed-point
               number, unless the number is too large, in which case
               it switches to 'e' exponent notation.
         'G' - General format. Same as 'g' except switches to 'E'
               if the number gets too large.
         'n' - Number. This is the same as 'g', except that it uses the
               current locale setting to insert the appropriate
               number separator characters.
         '%' - Percentage. Multiplies the number by 100 and displays
               in fixed ('f') format, followed by a percent sign.

     Objects are able to define their own conversion specifiers to
     replace the standard ones.  An example is the 'datetime' class,
     whose conversion specifiers might look something like the
     arguments to the strftime() function:

         "Today is: {0:a b d H:M:S Y}".format(datetime.now())
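For concreteness, the grammar above can be exercised with a small parser
sketch (the regex and the helper name are mine, not part of the proposal; the
special case of a bare '0' fill implying '=' alignment is left out for
brevity):

```python
import re

# [[fill]align][sign][width][.precision][type], per the text above.
# A fill character is only recognized when followed by an align flag,
# so that a digit is not mistaken for part of the width field.
SPEC_RE = re.compile(
    r"^(?:(?P<fill>.)?(?P<align>[<>=]))?"
    r"(?P<sign>[-+ ]|\(\))?"
    r"(?P<width>\d+)?"
    r"(?:\.(?P<precision>\d+))?"
    r"(?P<type>[srbcdoxXeEfFgGn%])?$"
)

def parse_spec(spec):
    """Split a conversion specifier into its named fields."""
    m = SPEC_RE.match(spec)
    if m is None:
        raise ValueError("invalid conversion specifier: %r" % spec)
    return m.groupdict()

# e.g. fill='*', align='>', sign='+', width='8', precision='3', type='f'
print(parse_spec("*>+8.3f"))
```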


Finally, I have two questions:

1) Where would be a good place to stick the rough prototype? I don't 
want to post it here; it's rather long.

2) I'd like to know if anyone out there wants to take over the task of 
implementing 3102 so that I can focus my attention on 3101. I have 
fairly limited bandwidth at the moment, and 3101 is by far the more 
complex proposal.

-- Talin

From rasky at develer.com  Wed Jun  7 11:37:39 2006
From: rasky at develer.com (Giovanni Bajo)
Date: Wed, 7 Jun 2006 11:37:39 +0200
Subject: [Python-3000] iostack, continued
References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>
	<028301c689c9$aac61d60$3db72997@bagio>
	<44862646.7020403@canterbury.ac.nz>
Message-ID: <03b601c68a16$05352380$3db72997@bagio>

Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

>> About this part: "properties raising IOError", I would like to
>> point out that Guido pronounced on The Way properties should be used
>> in Py3k.  Part of the pronouncement was that reading/writing
>> properties should never have side-effects.
>
> That's meaningless without a definition of what counts as a
> "side effect". Writing to a property must have *some* effect
> on the state of something, otherwise it's pointless.
>
> I'm guessing he meant it shouldn't affect the state of anything
> outside that object. But then we need to decide what counts
> as part of the state of a file object. Does it include the
> value of the file position of the underlying file descriptor?
> If it does, then file.position = foo is a legitimate usage
> of a property.


I believe what he meant was that a property change should not affect the state
of anything but the *Python* object itself.

Giovanni Bajo


From ncoghlan at gmail.com  Wed Jun  7 14:56:46 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 07 Jun 2006 22:56:46 +1000
Subject: [Python-3000] String formatting: Conversion specifiers
In-Reply-To: <448665F1.5060501@acm.org>
References: <44853482.3050103@acm.org> <44855CBD.3010507@gmail.com>
	<448665F1.5060501@acm.org>
Message-ID: <4486CD0E.6010602@gmail.com>

Talin wrote:
> I've taken your proposal as a base, and made some additional changes to 
> it. In addition, I've gone ahead and implemented a prototype of the 
> built-in formatter based on the revised text.

I like it!

As for somewhere to put the prototype, a patch or RFE tracker item isn't too 
bad for holding a single Python file. We can always figure out a better place 
(such as somewhere in the SVN sandbox) later.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Wed Jun  7 15:21:35 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 07 Jun 2006 23:21:35 +1000
Subject: [Python-3000] String formatting: Conversion specifiers
In-Reply-To: <06Jun6.092931pdt."58641"@synergy1.parc.xerox.com>
References: <06Jun6.092931pdt."58641"@synergy1.parc.xerox.com>
Message-ID: <4486D2DF.3080405@gmail.com>

Bill Janssen wrote:
>> Here is a list of the conversion types that are currently supported by 
>> the % operator. First thing you notice is an eerie similarity between 
>> this and the documentation for 'sprintf'. :)
> 
> Yes.  This is (or was) a significant advantage to the system.  Many
> people already had mastered the C/C++ printf system of specifiers, and
> could use Python's with no mental upgrades.  Is that no longer thought
> to be an advantage?

It's still to be preferred. Talin's latest version is still close enough to 
printf that using printf style formatters will 'do the right thing'. {0:s}, 
{0:.3f}, {0:5d}, {0:+8x} are all equivalent to their printf counterparts %s, 
%.3f, %5d, %+8x. (I thought doing it that way would be ambiguous, but Talin 
was able to figure out a way to preserve the compatibility while still adding 
the features we wanted).

The proposed Py3k version just adds some enhancements:
   - choose an arbitrary fill character
   - choose left or right alignment in the filled field
   - choose to have the sign before or after the field padding
   - choose to use () to denote negative numbers
   - choose to output integers as binary numbers

It also allows a class to override the handling of the formatting string 
entirely (so things like datetime can be first-class citizens in the 
formatting world).
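The equivalence claimed above is easy to check in today's Python, which
adopted a close descendant of this proposal as PEP 3101 (str.format):

```python
# Each printf-style result should match its brace-style counterpart.
pairs = [
    ("%s" % "abc",     "{0:s}".format("abc")),
    ("%.3f" % 3.14159, "{0:.3f}".format(3.14159)),
    ("%5d" % 42,       "{0:5d}".format(42)),
    ("%+8x" % 255,     "{0:+8x}".format(255)),
]
for old_style, new_style in pairs:
    assert old_style == new_style, (old_style, new_style)
print("all equivalent")
```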

>> So there's no need to tell the system 'this is a float' 
>> or 'this is an integer'.
> 
> Except that the type specifier can affect the interpretation of the
> rest of the format string.  For example, %.3f means to print three
> fractional digits.

It's possible to define the format string independently of the type specifier 
- it's just that some of the fields only have an effect when certain type 
specifiers are used (e.g. precision is ignored for string and integer type 
specifiers).

Talin's point is that because Python objects know their own type the 
formatting system can figure out a reasonable default type specifier (f for 
floats, d for integers and s for everything else). This means the whole 
conversion specifier can be made optional, including the type specifier.

>> The only way I could see this being useful is 
>> if you had a type and wanted it to print out as some different type - 
>> but is that really the proper role of the string formatter?
> 
> Isn't that exactly what the string formatter does?  I've got a binary
> value and want to express it as a different type, a string?  Type
> punning at the low levels is often a useful debugging tool.

The latest version of the proposal explicitly states which builtin (str(), 
repr(), int() or float()) will be called before the value is formatted for 
each of the standard type specifiers.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Wed Jun  7 15:32:45 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 07 Jun 2006 23:32:45 +1000
Subject: [Python-3000] iostack, continued
In-Reply-To: <03b601c68a16$05352380$3db72997@bagio>
References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>	<028301c689c9$aac61d60$3db72997@bagio>	<44862646.7020403@canterbury.ac.nz>
	<03b601c68a16$05352380$3db72997@bagio>
Message-ID: <4486D57D.20005@gmail.com>

Giovanni Bajo wrote:
> Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>> I'm guessing he meant it shouldn't affect the state of anything
>> outside that object. But then we need to decide what counts
>> as part of the state of a file object. Does it include the
>> value of the file position of the underlying file descriptor?
>> If it does, then file.position = foo is a legitimate usage
>> of a property.
> 
> 
> I believe what he meant was that property change should not affect the state of
> anything but the *Python*'s object.

I believe the original context where the question came up was for Path objects 
- Guido (rightly) objected to touching the file system as a side effect of 
accessing the attributes of a conceptual object like a path string.

With a position attribute on actual file IO objects, it should be possible to 
set it up so that the file object only invokes tell() when you try to *change* 
the position. When you simply access the attribute, it will return the answer 
from an internal variable (it needs to do this anyway in order to take 
buffering into account).

And having attribute modification on a file object touch the file system 
really doesn't seem particularly unreasonable.
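[Editor's note: a minimal sketch of the caching idea described above --
not the actual iostack code. The wrapper tracks the position itself, so
reading .position costs no system call; only assignment seeks:]

```python
import io

class PositionedFile:
    def __init__(self, raw):
        self._raw = raw
        self._pos = raw.tell()

    @property
    def position(self):
        return self._pos                     # answered from the cache

    @position.setter
    def position(self, value):
        self._pos = self._raw.seek(value)    # only here do we seek

    def read(self, n=-1):
        data = self._raw.read(n)
        self._pos += len(data)               # keep the cache in sync
        return data

f = PositionedFile(io.BytesIO(b"hello world"))
f.position = 6
assert f.read() == b"world"
assert f.position == 11
```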

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From tanzer at swing.co.at  Wed Jun  7 15:56:24 2006
From: tanzer at swing.co.at (Christian Tanzer)
Date: Wed, 07 Jun 2006 15:56:24 +0200
Subject: [Python-3000] String formatting: Conversion specifiers
In-Reply-To: Your message of "Wed, 07 Jun 2006 23:21:35 +1000."
	<4486D2DF.3080405@gmail.com>
Message-ID: <E1FnyWH-0000Fb-0c@swing.co.at>


Nick Coghlan <ncoghlan at gmail.com> wrote:

> Bill Janssen wrote:
> >> Here is a list of the conversion types that are currently supported by
> >> the % operator. First thing you notice is an eerie similarity between
> >> this and the documentation for 'sprintf'. :)
> >
> > Yes.  This is (or was) a significant advantage to the system.  Many
> > people already had mastered the C/C++ printf system of specifiers, and
> > could use Python's with no mental upgrades.  Is that no longer thought
> > to be an advantage?
(snip)
> It's possible to define the format string independently of the type specifier
> - its just that some of the fields only have an effect when certain type
> specifiers are used (e.g. precision is ignored for string and integer type
> specifiers).

For strings, it isn't (not by Python at least):

    Python 2.4.2 (#1, May 30 2006, 13:47:24)
    [GCC 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> "%3.3s" % "abcdef"
    'abc'


-- 
Christian Tanzer                                    http://www.c-tanzer.at/


From tomerfiliba at gmail.com  Wed Jun  7 18:54:29 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Wed, 7 Jun 2006 18:54:29 +0200
Subject: [Python-3000] iostack, continued
In-Reply-To: <03b601c68a16$05352380$3db72997@bagio>
References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>
	<028301c689c9$aac61d60$3db72997@bagio>
	<44862646.7020403@canterbury.ac.nz>
	<03b601c68a16$05352380$3db72997@bagio>
Message-ID: <1d85506f0606070954q50a7c7bau66fdabfe690cdb7@mail.gmail.com>

> I believe what he meant was that property change should not affect the state of
> anything but the *Python*'s object.

for reference, in sock2 i use properties to change the socket
options of sockets.

instead of doing
if not s.getsockopt(SOL_SOCK, SOCK_REBINDADDR):
    s.setsockopt(SOL_SOCK, SOCK_REBINDADDR, 1)

you can just do
if not s.rebind_addr:
    s.rebind_addr = True

which is much easier (both to maintain and read). these property-
options also take care of platform dependent options (like the
linger struct, which is different between winsock and bsd sockets)
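[Editor's note: a sketch of the options-via-properties idea. The names
in the mail (SOL_SOCK, SOCK_REBINDADDR, rebind_addr) are illustrative;
this version uses the real SO_REUSEADDR option instead:]

```python
import socket

class SocketOptions:
    """Wrap a socket, exposing one option as a plain attribute."""
    def __init__(self, sock):
        self._sock = sock

    @property
    def reuse_addr(self):
        return bool(self._sock.getsockopt(socket.SOL_SOCKET,
                                          socket.SO_REUSEADDR))

    @reuse_addr.setter
    def reuse_addr(self, flag):
        self._sock.setsockopt(socket.SOL_SOCKET,
                              socket.SO_REUSEADDR, int(flag))

s = socket.socket()
opts = SocketOptions(s)
if not opts.reuse_addr:
    opts.reuse_addr = True
assert opts.reuse_addr
s.close()
```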

i can't speak for Guido now, but at first, when i proposed this
options-via-properties mechanism, Guido was in favor. he agreed
setsockopt is a highly non-pythonic way of doing things.

besides, the context is different. a path object is not a stream
object. they stand for different things. so you can't generalize
like that -- the decision must be made on a per-case basis

another key issue to consider here is convenience. it's much
more convenient to use .position than .seek and tell. for example:

original_pos = f.position
try:
    ... do something with f
except IOError:
    f.position = original_pos

seek and tell are much more cumbersome. they will remain there,
of course, if only for backwards compatibility.


-tomer

On 6/7/06, Giovanni Bajo <rasky at develer.com> wrote:
> Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>
> >> About this part: "properties raising IOError", I would like to
> >> remember that Guido pronounced on The Way properties should be used
> >> in Py3k.  Part of the pronouncement was that reading/writing
> >> properties should never have side-effects.
> >
> > That's meaningless without a definition of what counts as a
> > "side effect". Writing to a property must have *some* effect
> > on the state of something, otherwise it's pointless.
> >
> > I'm guessing he meant it shouldn't affect the state of anything
> > outside that object. But then we need to decide what counts
> > as part of the state of a file object. Does it include the
> > value of the file position of the underlying file descriptor?
> > If it does, then file.position = foo is a legitimate usage
> > of a property.
>
>
> I believe what he meant was that property change should not affect the state of
> anything but the *Python*'s object.
>
> Giovanni Bajo
>
>

From tjreedy at udel.edu  Wed Jun  7 20:17:49 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 7 Jun 2006 14:17:49 -0400
Subject: [Python-3000] iostack, continued
References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com><028301c689c9$aac61d60$3db72997@bagio><44862646.7020403@canterbury.ac.nz><03b601c68a16$05352380$3db72997@bagio>
	<1d85506f0606070954q50a7c7bau66fdabfe690cdb7@mail.gmail.com>
Message-ID: <e6758d$8v0$1@sea.gmane.org>


"tomer filiba" <tomerfiliba at gmail.com> wrote in message 
news:1d85506f0606070954q50a7c7bau66fdabfe690cdb7 at mail.gmail.com...
> instead of doing
> if not s.getsockopt(SOL_SOCK, SOCK_REBINDADDR):
>    s.setsockopt(SOL_SOCK, SOCK_REBINDADDR, 1)
>
> you can just do
> if not s.rebind_addr:
>    s.rebind_addr = True
>
> which is much easier (both to maintain and read). these property-
> options also take care of platform dependent options (like the
> linger struct, which is different between winsock and bsd sockets)

Very nice.  Much more 'pythonic'.

> i can't speak for Guido now, but at first, when i proposed this
> options-via-properties mechanism, Guido was in favor. he agreed
> setsockopt is a highly non-pythonic way of doing things.
>
> besides, the context is different. a path object is not a stream
> object. they stand for different things. so you can't generalize
> like that -- the decision must be made on a per-case basis

I agreed with Guido's pronouncement in its context.  I also don't see it as 
applying to f.position (unless he explicitly says so).  The Python file 
object is supposed to be a fairly direct proxy for the OS'es file object.

> another key issue to consider here is convenience. it's much
> more convenient to use .position than .seek and tell. for example:

So I also like this.

> original_pos = f.position
> try:
>    ... do something with f
> except IOError:
>    f.position = original_pos
>
> seek and tell are much more cumbersome. they will remain there,
> of course, if only for backwards compatibility.

Tell and seek go back half a century to manipulations of serial media like 
magnetic tape;-)  For random access disks, their meaning is somewhat 
virtual or metaphorical rather than actual.  To me, it's time to let go of 
them, as much as possible, and use a more modern API.

Terry Jan Reedy




From greg.ewing at canterbury.ac.nz  Thu Jun  8 01:34:05 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 08 Jun 2006 11:34:05 +1200
Subject: [Python-3000] iostack, continued
In-Reply-To: <03b601c68a16$05352380$3db72997@bagio>
References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>
	<028301c689c9$aac61d60$3db72997@bagio>
	<44862646.7020403@canterbury.ac.nz>
	<03b601c68a16$05352380$3db72997@bagio>
Message-ID: <4487626D.1080200@canterbury.ac.nz>

Giovanni Bajo wrote:

> I believe what he meant was that property change should not affect the state of
> anything but the *Python*'s object.

Then what counts as part of the Python object? If the
object is wrapping a C struct from some library, is it
okay for a property to change a member of that struct?

--
Greg

From greg.ewing at canterbury.ac.nz  Thu Jun  8 01:42:40 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 08 Jun 2006 11:42:40 +1200
Subject: [Python-3000] iostack, continued
In-Reply-To: <4486D57D.20005@gmail.com>
References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>
	<028301c689c9$aac61d60$3db72997@bagio>
	<44862646.7020403@canterbury.ac.nz>
	<03b601c68a16$05352380$3db72997@bagio> <4486D57D.20005@gmail.com>
Message-ID: <44876470.7060806@canterbury.ac.nz>

Nick Coghlan wrote:

> With a position attribute on actual file IO objects, it should be 
> possible to set it up so that the file object only invokes tell() when 
> you try to *change* the position. When you simply access the attribute, 
> it will return the answer from an internal variable (it needs to do this 
> anyway in order to take buffering into account).

Be careful -- in Unix it's possible for different file
descriptors to share the same position pointer. For
unbuffered streams at least, this should be reflected
in the relevant properties or whatever is being used.

--
Greg

From greg.ewing at canterbury.ac.nz  Thu Jun  8 12:14:34 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 08 Jun 2006 22:14:34 +1200
Subject: [Python-3000] Assignment decorators, anyone?
Message-ID: <4487F88A.7080307@canterbury.ac.nz>

I think I've come across a use case for @decorators
on assignment statements.

I have a function which is used like this:

   my_property = overridable_property('my_property', "This is my property.")

However, it sucks a bit to have to write the name of
the property twice. I just got bitten by changing the
name of one of my properties and forgetting to change
it in both places.

If decorators could be applied to assignment statements,
I'd be able to write it as something like

@overridable_property
my_property = "This is my property."

(This would require the semantics of assignment
decoration to be defined so that the assigned name
is passed to the decorator function as well as the
value being assigned.)
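[Editor's note: under today's semantics the name can already be recovered
once, at class-creation time, via a metaclass. The overridable_property
below is a stand-in sketch, not the function from the mail:]

```python
class overridable_property:
    """Stand-in descriptor; its name is filled in by the metaclass."""
    def __init__(self, doc, name=None):
        self.name = name
        self.__doc__ = doc
    def __get__(self, obj, cls):
        if obj is None:
            return self
        return obj.__dict__[self.name]
    def __set__(self, obj, value):
        obj.__dict__[self.name] = value

class NamedProps(type):
    def __init__(cls, name, bases, ns):
        super().__init__(name, bases, ns)
        for key, val in ns.items():
            if isinstance(val, overridable_property):
                val.name = key    # name taken from the binding, once

class Thing(metaclass=NamedProps):
    my_property = overridable_property("This is my property.")

t = Thing()
t.my_property = 3
assert t.my_property == 3
assert Thing.__dict__["my_property"].name == "my_property"
```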

On the other hand, maybe this is a use case for
the "make" statement that was proposed earlier.

--
Greg

From tomerfiliba at gmail.com  Sat Jun 10 10:55:17 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Sat, 10 Jun 2006 10:55:17 +0200
Subject: [Python-3000] enhanced descriptors
Message-ID: <1d85506f0606100155v4e1c63d1jc66b21806055add4@mail.gmail.com>

disclaimer: i'm not sure this suggestion is feasible to
implement, because of the way descriptors work, but
it's something we should consider adding.
----

as you may remember, in iostack, we said the position
property should act like the following:

f.position = <non-negative int> # absolute seek
f.position = <negative int> # relative-to-end seek
f.position += <int> # relative-to-current seek

so i wrote a data descriptor. implementing the first two is
easy, but the third version is tricky.
doing x.y += z on a data descriptor translates to
x.__set__(y, x.__get__(y) + z)

in my case, it means first tell()ing, adding the offset, and
then seek()ing to the new position. this works, of course,
but it requires two system calls instead of one. what i
wished i had was x.__iadd__(y, z)
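[Editor's note: the two calls can be made visible with a small logging
descriptor -- a sketch of current behaviour, not iostack code:]

```python
calls = []

class Recorded:
    """Data descriptor that records each __get__ and __set__."""
    def __get__(self, obj, cls):
        if obj is None:
            return self
        calls.append("get")
        return obj._x
    def __set__(self, obj, value):
        calls.append("set")
        obj._x = value

class C:
    x = Recorded()

c = C()
c._x = 5
c.x += 1                          # one augmented assignment...
assert calls == ["get", "set"]    # ...costs a __get__ plus a __set__
assert c._x == 6
```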

so my suggestion is as follows:
data descriptors must define __get__ and __set__. if they
also define one of the inplace-operators (__iadd___, etc),
it will be called instead of first __get__()ing and then
__set__()ing.

however, the inplace operators would have to use a different
signature than the normal operators -- instead of
__iadd__(self, other)
they would be defined as
__iadd__(self, obj, value).

therefore, i suggest adding __set_iadd__, or something in
that spirit, to solve the ambiguity.

for example, my position descriptor would look like:

class PositionDesc(object):
    def __get__(self, obj, cls):
        if obj is None:
            return self
        return obj.tell()

    def __set__(self, obj, value):
        if value >= 0:
            obj.seek(value, "start")
        else:
            obj.seek(value, "end")

    def __set_iadd__(self, obj, value):
         obj.seek(value, "curr")

...

p = f.position      # calls __get__
f.position = 5      # calls __set__
f.position = -5     # calls __set__
f.position += 10    # calls __set_iadd__

now there are two issues:
* is it even possible to implement (without overcomplicating the
descriptors mechanism)?
* is it generally useful?

i can't answer the first question, but it would surely be useful
in iostack; and besides, for symmetry's sake, if x += y calls
x.__iadd__(y), it should be optimized for descriptors as well.
i'd hate having to do two system calls for something that can
be done with one using seek().


-tomer

From greg.ewing at canterbury.ac.nz  Sat Jun 10 13:25:50 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 10 Jun 2006 23:25:50 +1200
Subject: [Python-3000] enhanced descriptors
In-Reply-To: <1d85506f0606100155v4e1c63d1jc66b21806055add4@mail.gmail.com>
References: <1d85506f0606100155v4e1c63d1jc66b21806055add4@mail.gmail.com>
Message-ID: <448AAC3E.7020001@canterbury.ac.nz>

tomer filiba wrote:

> so my suggestion is as follows:
> data descriptors must define __get__ and __set__. if they
> also define one of the inplace-operators (__iadd___, etc),
> it will be called instead of first __get__()ing and then
> __set__()ing.
> 
> however, the inplace operators would have to use a different
> signature than the normal operators -- instead of
> __iadd__(self, other)
> they would be defined as
> __iadd__(self, obj, value).

This could be done, although it would require some large
changes to the way things work. Currently the attribute
access and inplace operation are done by separate bytecodes,
so by the time the += gets processed, the whole descriptor
business is finished with.

What would be needed is to combine the attribute access
and += operator into a single "add to attribute" operation.
So there would be an ADD_TO_ATTRIBUTE bytecode, and a
corresponding __iaddattr__ method or some such implementing
it.

Then of course you'd want corresponding methods for all
the other inplace operators applied to attributes. And
probably a third set for obj[index] += value etc.

That's getting to be a ridiculously large set of methods.
It could be cut down considerably by having just one
in-place method of each kind, parameterised by a code
indicating the arithmetic operation (like __richcmp__):

    Syntax                Method
    obj.attr OP= value    obj.__iattr__(op, value)
    obj[index] OP= value  obj.__iitem__(op, value)

It might be worth writing a PEP about this.

Getting back to the problem at hand, there's another way
it might be handled using current Python. Instead of a
normal int, the position descriptor could return an
instance of an int subclass with an __iadd__ method that
manipulates the file position.

There's one further problem with all of this, though.
Afterwards, the result of the += is going to get assigned
back to the position property. If you want to avoid
making another redundant system call, you'll somehow
have to detect when the value being assigned is the
result of a += and ignore it.

--
Greg

From tomerfiliba at gmail.com  Sat Jun 10 16:43:53 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Sat, 10 Jun 2006 16:43:53 +0200
Subject: [Python-3000] enhanced descriptors
In-Reply-To: <448AAC3E.7020001@canterbury.ac.nz>
References: <1d85506f0606100155v4e1c63d1jc66b21806055add4@mail.gmail.com>
	<448AAC3E.7020001@canterbury.ac.nz>
Message-ID: <1d85506f0606100743g5cb7a803yb9674a2058141cac@mail.gmail.com>

well, adding bytecodes is out-of-the-question for me.

i did think of doing a position-proxy class, but it has lots of
drawbacks as well:
* lots of methods to implement (to make it look like an int)

* lazy evaluation -- should only perform tell() when requested,
not before. for example, calling __repr__ or __add__ would have
to tell(), while __iadd__ would not... nasty code

* it would be slower: adding logic to __set__, and an int-like
object (never as fast as a real int), etc.

* and, worst of all, it would have unavoidable undesired
behavior:

desired behavior:
    f.position += 2

undesired behavior:
    p = f.position
    p += 2     # this would seek()!!!

any good solution would require lots of magic, so i guess
i'm just gonna drop the += optimization. two system calls
are not worth writing such ugly code.

the solution must come from "enhanced descriptors".

- - - - - - - - -

> What would be needed is to combine the attribute access
> and += operator into a single "add to attribute" operation.
> So there would be an ADD_TO_ATTRIBUTE bytecode, and a
> corresponding __iaddattr__ method or some such implementing
> it.

i'm afraid that's not possible, because the compiler can't tell
that x.y+=z is a descriptor assignment.

> Then of course you'd want corresponding methods for all
> the other inplace operators applied to attributes. And
> probably a third set for obj[index] += value etc.

no, i don't think so. indexing should first __get__ the object,
and then index it. these are two separate operations. only the
inplace operators should be optimized into one function.

- - - - - - - - -

> It might be worth writing a PEP about this.

well, you asked for it :)

preliminary pep: STORE_INPLACE_ATTR

today, x.y += z is translated to
x.__setattr__("y", x.__getattr__("y").__iadd__(z))
(falling back to __add__ if the value of y does not support __iadd__)

the proposed change is to replace this scheme by
__setiattr__ - set inplace attr. it takes three arguments:
name, operation, and value. it is invoked by the new
bytecode instruction: STORE_INPLACE_ATTR.

the new instruction's layout looks like this:
TOS+2: value
TOS+1: operation code (1=add, 2=sub, 3=mul, ...)
TOS: object
STORE_INPLACE_ATTR <nameindex>

the need of this new special method is to optimize the
inplace operators, for both normal attributes and descriptors.

examples:
for normal assignment, the normal behavior is retained
x.y = 5 ==> x.__setattr__("y", 5)

for augmented assignment, the inplace version (__setiattr__)
is used instead:
x.y += 5  ==> x.__setiattr__("y", operator.add, 5)

the STORE_INPLACE_ATTR instruction would convert the
operation code into the corresponding function from the
`operator` module, to make __setiattr__ simpler.

descriptors:
the descriptor protocol is also extended with the __iset__
method -- inplace __set__.
if the attribute is a data descriptor, __setiattr__ will try
to call __iset__; if it does not exist, it would default to
__get__ and then __set__.

sketch implementation:

def __setiattr__(self, name, op, value):
    # look the descriptor up on the class -- getattr() on the
    # instance would already have invoked __get__
    descr = type(self).__dict__.get(name)

    # descriptors
    if hasattr(descr, "__iset__"):
        descr.__iset__(self, op, value)
        return

    if hasattr(descr, "__set__"):
        result = op(descr.__get__(self, type(self)), value)
        descr.__set__(self, result)
        return

    # normal attributes
    attr = getattr(self, name)
    inplace_op_name = "__i%s__" % (op.__name__,) # ugly!!
    if hasattr(attr, inplace_op_name):
        setattr(self, name, getattr(attr, inplace_op_name)(value))
    else:
        setattr(self, name, op(attr, value))

issues:
should it be just one special method (__setiattr__) or a
method per-operation (__setiadd__, __setisub__)?
multiple methods mean each method is simpler, but they also
cause code duplication, and lots of new method slots.

notes:
if the STORE_INPLACE_ATTR instruction does not find
__setiattr__, it can always default to __setattr__, the same
way it's done today.


-tomer


From greg.ewing at canterbury.ac.nz  Sun Jun 11 01:50:09 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 11 Jun 2006 11:50:09 +1200
Subject: [Python-3000] enhanced descriptors
In-Reply-To: <1d85506f0606100743g5cb7a803yb9674a2058141cac@mail.gmail.com>
References: <1d85506f0606100155v4e1c63d1jc66b21806055add4@mail.gmail.com>
	<448AAC3E.7020001@canterbury.ac.nz>
	<1d85506f0606100743g5cb7a803yb9674a2058141cac@mail.gmail.com>
Message-ID: <448B5AB1.8050309@canterbury.ac.nz>

tomer filiba wrote:

> i did think of doing a position-proxy class, but it has lots of
> drawbacks as well:
> * lots of methods to implement (to make it look like an int)

Not if you subclass it from int, and inherit all its
behaviour. The only things you'd need to add are a
reference to the base file object, and __iadd__ and
__isub__ methods.
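[Editor's note: a sketch of the int-subclass idea, assuming only a
file-like object with seek()/tell(); it also exhibits the aliasing
pitfall raised earlier in the thread:]

```python
import io

class Position(int):
    """Behaves as an int, but += performs a single relative seek."""
    def __new__(cls, value, f):
        self = super().__new__(cls, value)
        self._file = f
        return self
    def __iadd__(self, offset):
        self._file.seek(offset, 1)            # one system call
        return Position(self._file.tell(), self._file)

f = io.BytesIO(b"0123456789")
pos = Position(f.tell(), f)
pos += 3
assert f.tell() == 3

p2 = pos         # any copy still holds a reference to the file...
p2 += 2          # ...so this also moves the file position!
assert f.tell() == 5
```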

> * lazy evaluation -- should only perform tell() when requested,
> not before. for example, calling __repr__ or __add__ would have
> to tell(), while __iadd__ would not... nasty code

Yes, I hadn't thought of that. Quite nasty, especially
since code that got the position of a file, did something
else that changed the file position, and *then* used the
position it got before, would get unexpected results.

> any good solution would require lots of magic, so i guess
> i'm just gonna pull off the += optimization. two system calls
> are not worth writing such an ugly code.

Yes, I'm coming to the same conclusion. Without changing
the language, the desired behaviour isn't reasonably
attainable.

> i'm afraid that's not possible, because the compiler can't tell
> that x.y+=z is a descriptor assignment.

It wouldn't operate at the descriptor level, it would
operate on the object itself, i.e. it would call
x.__iattr__('y', '+=', z) or some such.

So in your case you wouldn't use a descriptor for this
part, but give the file object an __iattr__ method.

> no, i don't think so. indexing should first __get__ the object,
> and then index it. these are two separate operations. only the
> inplace operators should be optimized into one function.

I'm talking about using an in-place operator on the
result of an indexing operation, e.g.

   x[i] += y

which is a closely analogous situation. Not needed for
this particular use case -- I'm just thinking ahead.

--
Greg

From talin at acm.org  Sun Jun 11 03:07:05 2006
From: talin at acm.org (Talin)
Date: Sat, 10 Jun 2006 18:07:05 -0700
Subject: [Python-3000] PEP 3101 update
Message-ID: <448B6CB9.9050601@acm.org>

Here's the latest PEP 3101 - I've incorporated changes based on 
suggestions from a lot of folks. This version incorporates:

   -- a detailed specification for conversion type fields
   -- description of error handling behavior
   -- 'strict' vs. 'lenient' error handling flag
   -- compound field names
   -- braces are now escaped using {{ instead of \{

--------------------------------------------------------------------
PEP: 3101
Title: Advanced String Formatting
Version: $Revision: 46845 $
Last-Modified: $Date: 2006-06-10 17:59:06 -0700 (Sat, 10 Jun 2006) $
Author: Talin <talin at acm.org>
Status: Draft
Type: Standards
Content-Type: text/plain
Created: 16-Apr-2006
Python-Version: 3.0
Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2006


Abstract

     This PEP proposes a new system for built-in string formatting
     operations, intended as a replacement for the existing '%' string
     formatting operator.


Rationale

     Python currently provides two methods of string interpolation:

     - The '%' operator for strings. [1]

     - The string.Template module. [2]

     The scope of this PEP will be restricted to proposals for built-in
     string formatting operations (in other words, methods of the
     built-in string type).

     The '%' operator is primarily limited by the fact that it is a
     binary operator, and therefore can take at most two arguments.
     One of those arguments is already dedicated to the format string,
     leaving all other variables to be squeezed into the remaining
     argument.  The current practice is to use either a dictionary or a
     tuple as the second argument, but as many people have commented
     [3], this lacks flexibility.  The "all or nothing" approach
     (meaning that one must choose between only positional arguments,
     or only named arguments) is felt to be overly constraining.

     While there is some overlap between this proposal and
     string.Template, it is felt that each serves a distinct need,
     and that one does not obviate the other.  In any case,
     string.Template will not be discussed here.


Specification

     The specification will consist of the following parts:

     - Specification of a new formatting method to be added to the
       built-in string class.

     - Specification of a new syntax for format strings.

     - Specification of a new set of class methods to control the
       formatting and conversion of objects.

     - Specification of an API for user-defined formatting classes.

     - Specification of how formatting errors are handled.

     Note on string encodings: Since this PEP is being targeted
     at Python 3.0, it is assumed that all strings are unicode strings,
     and that the use of the word 'string' in the context of this
     document will generally refer to a Python 3.0 string, which is
     the same as Python 2.x unicode object.

     If it should happen that this functionality is backported to
     the 2.x series, then it will be necessary to handle both regular
     string as well as unicode objects.  All of the function call
     interfaces described in this PEP can be used for both strings
     and unicode objects, and in all cases there is sufficient
     information to be able to properly deduce the output string
     type (in other words, there is no need for two separate APIs).
     In all cases, the type of the template string dominates - that
     is, the result of the conversion will always result in an object
     that contains the same representation of characters as the
     input template string.


String Methods

     The built-in string class will gain a new method, 'format',
     which takes an arbitrary number of positional and keyword
     arguments:

         "The story of {0}, {1}, and {c}".format(a, b, c=d)

     Within a format string, each positional argument is identified
     with a number, starting from zero, so in the above example, 'a' is
     argument 0 and 'b' is argument 1.  Each keyword argument is
     identified by its keyword name, so in the above example, 'c' is
     used to refer to the third argument.


Format Strings

     Brace characters ('curly braces') are used to indicate a
     replacement field within the string:

         "My name is {0}".format('Fred')

     The result of this is the string:

         "My name is Fred"

     Braces can be escaped by doubling:

         "My name is {0} :-{{}}".format('Fred')

     Which would produce:

         "My name is Fred :-{}"

     The element within the braces is called a 'field'.  Fields consist
     of a 'field name', which can either be simple or compound, and an
     optional 'conversion specifier'.


Simple and Compound Field Names

     Simple field names are either names or numbers. If numbers, they
     must be valid base-10 integers; if names, they must be valid
     Python identifiers.  A number is used to identify a positional
     argument, while a name is used to identify a keyword argument.

     A compound field name is a combination of multiple simple field
     names in an expression:

         "My name is {0.name}".format(file('out.txt'))

     This example shows the use of the 'getattr' or 'dot' operator
     in a field expression. The dot operator allows an attribute of
     an input value to be specified as the field value.

     The types of expressions that can be used in a compound name
     have been deliberately limited in order to prevent potential
     security exploits resulting from the ability to place arbitrary
     Python expressions inside of strings. Only two operators are
     supported, the '.' (getattr) operator, and the '[]' (getitem)
     operator.

     An example of the 'getitem' syntax:

         "My name is {0[name]}".format(dict(name='Fred'))

     It should be noted that the use of 'getitem' within a string is
     much more limited than its normal use. In the above example, the
     string 'name' really is the literal string 'name', not a variable
     named 'name'. The rules for parsing an item key are the same as
     for parsing a simple name - in other words, if it looks like a
     number, then it's treated as a number; if it looks like an
     identifier, then it is used as a string.

     It is not possible to specify arbitrary dictionary keys from
     within a format string.
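Both compound-name operators can be sketched as follows; the Config class stands in for the file object in the draft's example, since these are just illustrative values:

```python
class Config:
    name = "out.txt"  # stand-in attribute for the draft's file example

# The '.' (getattr) operator reads an attribute of the argument.
assert "My name is {0.name}".format(Config()) == "My name is out.txt"

# The '[]' (getitem) operator: 'name' is the literal string key.
assert "My name is {0[name]}".format(dict(name="Fred")) == "My name is Fred"

# A numeric-looking key is treated as an integer index.
assert "Item: {0[1]}".format(["a", "b"]) == "Item: b"
```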


Conversion Specifiers

     Each field can also specify an optional set of 'conversion
     specifiers' which can be used to adjust the format of that field.
     Conversion specifiers follow the field name, with a colon (':')
     character separating the two:

         "My name is {0:8}".format('Fred')

     The meaning and syntax of the conversion specifiers depends on the
     type of object that is being formatted, however many of the
     built-in types will recognize a standard set of conversion
     specifiers.

     Conversion specifiers can themselves contain replacement fields.
     For example, a field whose field width it itself a parameter
     could be specified via:

         "{0:{1}}".format(a, b, c)

     Note that the doubled '}' at the end, which would normally be
     escaped, is not escaped in this case.  The reason is that
     the '{{' and '}}' syntax for escapes is only applied when used
     *outside* of a format field. Within a format field, the brace
     characters always have their normal meaning.
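Nested replacement fields work as described in the implementation that eventually shipped; a parameterized field width looks like this:

```python
# The width of the first field is supplied by the second argument.
assert "{0:{1}}".format("Fred", 10) == "Fred      "    # '<' is the default for strings
assert "{0:>{1}}".format("Fred", 10) == "      Fred"   # explicit right-alignment
```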

     The syntax for conversion specifiers is open-ended, since except
     than doing field replacements, the format() method does not
     attempt to interpret them in any way; it merely passes all of the
     characters between the first colon and the matching brace to
     the various underlying formatter methods.


Standard Conversion Specifiers

     If an object does not define its own conversion specifiers, a
     standard set of conversion specifiers is used.  These are similar
     in concept to the conversion specifiers used by the existing '%'
     operator, however there are also a number of significant
     differences.  The standard conversion specifiers fall into three
     major categories: string conversions, integer conversions and
     floating point conversions.

     The general form of a standard conversion specifier is:

         [[fill]align][sign][width][.precision][type]

     The brackets ([]) indicate an optional field.

     The optional align flag can be one of the following:

         '<' - Forces the field to be left-aligned within the available
               space (This is the default.)
         '>' - Forces the field to be right-aligned within the
               available space.
         '=' - Forces the padding to be placed between immediately
               after the sign, if any. This is used for printing fields
               in the form '+000000120'.

     Note that unless a minimum field width is defined, the field
     width will always be the same size as the data to fill it, so
     that the alignment option has no meaning in this case.

     The optional 'fill' character defines the character to be used to
     pad the field to the minimum width.  The alignment flag must be
     supplied if the fill character is a digit other than '0'
     (otherwise it would be interpreted as part of the field width
     specifier). A zero fill character without an alignment flag
     implies an alignment type of '='.
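The fill, align, and sign elements behave as eventually implemented; a few sketches, including the '+000000120' form mentioned above:

```python
assert "{0:*>8}".format("Fred") == "****Fred"   # fill '*', right-align, width 8
assert "{0:<8}!".format("Fred") == "Fred    !"  # left-align (default for strings)
assert format(120, "0=+10") == "+000000120"     # zero fill placed after the sign
assert format(120, "+010") == "+000000120"      # a leading zero fill implies '='
```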

     The 'sign' field can be one of the following:

         '+'  - indicates that a sign should be used for both
                positive as well as negative numbers
         '-'  - indicates that a sign should be used only for negative
                numbers (this is the default behaviour)
         ' '  - indicates that a leading space should be used on
                positive numbers
         '()' - indicates that negative numbers should be surrounded
                by parentheses

     'width' is a decimal integer defining the minimum field width. If
     not specified, then the field width will be determined by the
     content.

     The 'precision' field is a decimal number indicating how many
     digits should be displayed after the decimal point.

     Finally, the 'type' determines how the data should be presented.
     If the type field is absent, an appropriate type will be assigned
     based on the value to be formatted ('d' for integers and longs,
     'g' for floats, and 's' for everything else.)

     The available string conversion types are:

         's' - String format. Invokes str() on the object.
               This is the default conversion specifier type.
         'r' - Repr format. Invokes repr() on the object.

     There are several integer conversion types. All invoke int() on
     the object before attempting to format it.

     The available integer conversion types are:

         'b' - Binary. Outputs the number in base 2.
         'c' - Character. Converts the integer to the corresponding
               unicode character before printing.
         'd' - Decimal Integer. Outputs the number in base 10.
         'o' - Octal format. Outputs the number in base 8.
         'x' - Hex format. Outputs the number in base 16, using lower-
               case letters for the digits above 9.
         'X' - Hex format. Outputs the number in base 16, using upper-
               case letters for the digits above 9.

     There are several floating point conversion types. All invoke
     float() on the object before attempting to format it.

     The available floating point conversion types are:

         'e' - Exponent notation. Prints the number in scientific
               notation using the letter 'e' to indicate the exponent.
         'E' - Exponent notation. Same as 'e' except it uses an upper
               case 'E' as the separator character.
         'f' - Fixed point. Displays the number as a fixed-point
               number.
         'F' - Fixed point. Same as 'f'.
         'g' - General format. This prints the number as a fixed-point
               number, unless the number is too large, in which case
               it switches to 'e' exponent notation.
         'G' - General format. Same as 'g' except switches to 'E'
               if the number gets too large.
         'n' - Number. This is the same as 'g', except that it uses the
               current locale setting to insert the appropriate
               number separator characters.
         '%' - Percentage. Multiplies the number by 100 and displays
               in fixed ('f') format, followed by a percent sign.
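The type characters above behave as in the implementation that eventually shipped; note that the '()' sign option described earlier did not survive into that implementation, so it is not shown here:

```python
assert format(10, "b") == "1010"               # binary
assert format(97, "c") == "a"                  # character
assert format(255, "x") == "ff"                # lower-case hex
assert format(255, "X") == "FF"                # upper-case hex
assert format(1234.5678, ".2e") == "1.23e+03"  # exponent notation
assert format(1234.5678, ".2f") == "1234.57"   # fixed point
assert format(0.255, ".1%") == "25.5%"         # percentage
```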

     Objects are able to define their own conversion specifiers to
     replace the standard ones.  An example is the 'datetime' class,
     whose conversion specifiers might look something like the
     arguments to the strftime() function:

         "Today is: {0:a b d H:M:S Y}".format(datetime.now())


Controlling Formatting

     A class that wishes to implement a custom interpretation of its
     conversion specifiers can implement a __format__ method:

     class AST:
         def __format__(self, specifiers):
             ...

     The 'specifiers' argument will be either a string object or a
     unicode object, depending on the type of the original format
     string.  The __format__ method should test the type of the
     specifiers parameter to determine whether to return a string or
     unicode object.  It is the responsibility of the __format__ method
     to return an object of the proper type.

     string.format() will format each field using the following steps:

      1) See if the value to be formatted has a __format__ method.  If
         it does, then call it.

      2) Otherwise, check the internal formatter within string.format
         that contains knowledge of certain builtin types.

      3) Otherwise, call str() or unicode() as appropriate.
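A minimal sketch of such a __format__ method, using a hypothetical Celsius class. (The implementation that eventually shipped differs slightly from the three steps above: it always dispatches through __format__, passing an empty specifier for the default case.)

```python
class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # 'spec' is whatever followed the colon in the field.
        if spec == "F":                       # hypothetical custom specifier
            return "%.1f\u00b0F" % (self.degrees * 9 / 5 + 32)
        return "%.1f\u00b0C" % self.degrees   # default rendering

t = Celsius(21.5)
assert "{0:F}".format(t) == "70.7\u00b0F"   # custom specifier handled
assert "{0}".format(t) == "21.5\u00b0C"     # no specifier: default path
```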


User-Defined Formatting Classes

     There will be times when customizing the formatting of fields
     on a per-type basis is not enough.  An example might be an
     accounting application, which displays negative numbers in
     parentheses rather than using a negative sign.

     The string formatting system facilitates this kind of application-
     specific formatting by allowing user code to directly invoke
     the code that interprets format strings and fields.  User-written
     code can intercept the normal formatting operations on a per-field
     basis, substituting their own formatting methods.

     For example, in the aforementioned accounting application, there
     could be an application-specific number formatter, which reuses
     the string.format templating code to do most of the work. The
     API for such an application-specific formatter is up to the
     application; here are several possible examples:

         cell_format("The total is: {0}", total)

         TemplateString("The total is: {0}").format(total)

     Creating an application-specific formatter is relatively straight-
     forward.  The string and unicode classes will have a class method
     called 'cformat' that does all the actual work of formatting; the
     built-in format() method is just a wrapper that calls cformat.

     The type signature for the cformat function is as follows:

         cformat(template, format_hook, args, kwargs)

     The parameters to the cformat function are:

         -- The format template string.
         -- A callable 'format hook', which is called once per field.
         -- A tuple containing the positional arguments.
         -- A dict containing the keyword arguments.

     The cformat function will parse all of the fields in the format
     string, and return a new string (or unicode) with all of the
     fields replaced with their formatted values.

     The format hook is a callable object supplied by the user, which
     is invoked once per field, and which can override the normal
     formatting for that field.  For each field, the cformat function
     will attempt to call the field format hook with the following
     arguments:

        format_hook(value, conversion)

     The 'value' field corresponds to the value being formatted, which
     was retrieved from the arguments using the field name.

     The 'conversion' argument is the conversion spec part of the
     field, which will be either a string or unicode object, depending
     on the type of the original format string.

     The format hook may take one of two actions:

         1) Return a string or unicode object that is the result
            of the formatting operation.

         2) Return None, indicating that the format hook will not
            process this field and the default formatting should be
            used.  This decision should be based on the type of the
            value object and the contents of the conversion string.
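The cformat described here was ultimately superseded by string.Formatter, but the hook protocol can be sketched in pure Python. This sketch handles simple field names only (no '{{' escapes or nested fields), and accounting_hook is a hypothetical example hook:

```python
import re

def cformat(template, format_hook, args, kwargs):
    """Minimal sketch of the proposed cformat."""
    def replace(match):
        name, _, conversion = match.group(1).partition(":")
        # Digits name a positional argument; anything else, a keyword.
        value = args[int(name)] if name.isdigit() else kwargs[name]
        if format_hook is not None:
            result = format_hook(value, conversion)
            if result is not None:        # the hook handled this field
                return result
        return format(value, conversion)  # fall back to default formatting

    return re.sub(r"\{([^{}]+)\}", replace, template)

# A hook that renders negative numbers in parentheses, accounting-style.
def accounting_hook(value, conversion):
    if isinstance(value, (int, float)) and value < 0:
        return "(%s)" % format(-value, conversion)
    return None  # let default formatting handle everything else

out = cformat("Total: {0:.2f}, tax: {t:.2f}", accounting_hook,
              (-12.5,), {"t": 3.0})
print(out)  # Total: (12.50), tax: 3.00
```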


Error handling

     The string formatting system has two error handling modes, which
     are controlled by the value of a class variable:

        string.strict_format_errors = True

     The 'strict_format_errors' flag defaults to False, or 'lenient'
     mode. Setting it to True enables 'strict' mode. The current mode
     determines how errors are handled, depending on the type of the
     error.

     The types of errors that can occur are:

     1) Reference to a missing or invalid argument from within a
     field specifier. In strict mode, this will raise an exception.
     In lenient mode, this will cause the value of the field to be
     replaced with the string '?name?', where 'name' will be the
     type of error (KeyError, IndexError, or AttributeError).

     So for example:

         >>> string.strict_format_errors = False
         >>> print 'Item 2 of argument 0 is: {0[2]}'.format([0, 1])
         Item 2 of argument 0 is: ?IndexError?

     2) Unused argument. In strict mode, this will raise an exception.
     In lenient mode, this will be ignored.

     3) Exception raised by underlying formatter. These exceptions
     are always passed through, regardless of the current mode.
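The strict_format_errors flag did not ship in this form, but lenient behaviour can be sketched with string.Formatter (the extension hook that did ship):

```python
import string

class LenientFormatter(string.Formatter):
    """Replace bad field references with '?ErrorName?' instead of raising."""
    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, IndexError, AttributeError) as exc:
            return "?%s?" % type(exc).__name__, field_name

f = LenientFormatter()
print(f.format("Item 2 of argument 0 is: {0[2]}", [0, 1]))
# Item 2 of argument 0 is: ?IndexError?
```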


Alternate Syntax

     Naturally, one of the most contentious issues is the syntax of the
     format strings, and in particular the markup conventions used to
     indicate fields.

     Rather than attempting to exhaustively list all of the various
     proposals, I will cover the ones that are most widely used
     already.

     - Shell variable syntax: $name and $(name) (or in some variants,
       ${name}).  This is probably the oldest convention out there, and
       is used by Perl and many others.  When used without the braces,
       the length of the variable is determined by lexically scanning
       until an invalid character is found.

       This scheme is generally used in cases where interpolation is
       implicit - that is, in environments where any string can contain
       interpolation variables, and no special substitution function
       need be invoked.  In such cases, it is important to prevent the
       interpolation behavior from occurring accidentally, so the '$'
       (which is otherwise a relatively uncommonly-used character) is
       used to signal when the behavior should occur.

       It is the author's opinion, however, that in cases where the
       formatting is explicitly invoked, less care needs to be
       taken to prevent accidental interpolation, in which case a
       lighter and less unwieldy syntax can be used.

     - Printf and its cousins ('%'), including variations that add a
       field index, so that fields can be interpolated out of order.

     - Other bracket-only variations.  Various MUDs (Multi-User
       Dungeons) such as MUSH have used brackets (e.g. [name]) to do
       string interpolation.  The Microsoft .Net libraries use braces
       ({}), and a syntax which is very similar to the one in this
       proposal, although the syntax for conversion specifiers is quite
       different. [4]

     - Backquoting.  This method has the benefit of minimal syntactical
       clutter, however it lacks many of the benefits of a function
       call syntax (such as complex expression arguments, custom
       formatters, etc.).

     - Other variations include Ruby's #{}, PHP's {$name}, and so
       on.

     Some specific aspects of the syntax warrant additional comments:

     1) Backslash character for escapes.  The original version of
     this PEP used backslash rather than doubling to escape a bracket.
     This worked because backslashes in Python string literals that
     don't conform to a standard backslash sequence such as '\n'
     are left unmodified. However, this caused a certain amount
     of confusion, and led to potential situations of multiple
     recursive escapes, i.e. '\\\\{' to place a literal backslash
     in front of a bracket.

     2) The use of the colon character (':') as a separator for
     conversion specifiers.  This was chosen simply because that's
     what .Net uses.


Sample Implementation

     A rough prototype of the underlying 'cformat' function has been
     coded in Python, however it needs much refinement before being
     submitted.


Backwards Compatibility

     Backwards compatibility can be maintained by leaving the existing
     mechanisms in place.  The new system does not collide with any of
     the method names of the existing string formatting techniques, so
     both systems can co-exist until it comes time to deprecate the
     older system.


References

     [1] Python Library Reference - String formatting operations
     http://docs.python.org/lib/typesseq-strings.html

     [2] Python Library Reference - Template strings
     http://docs.python.org/lib/node109.html

     [3] [Python-3000] String formating operations in python 3k
         http://mail.python.org/pipermail/python-3000/2006-April/000285.html

     [4] Composite Formatting - [.Net Framework Developer's Guide]
     http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true


Copyright

     This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

From ncoghlan at gmail.com  Sun Jun 11 07:31:18 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 11 Jun 2006 15:31:18 +1000
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <448B6CB9.9050601@acm.org>
References: <448B6CB9.9050601@acm.org>
Message-ID: <448BAAA6.5060407@gmail.com>

Talin wrote:
> Conversion Specifiers
> 
>      Each field can also specify an optional set of 'conversion
>      specifiers' which can be used to adjust the format of that field.
>      Conversion specifiers follow the field name, with a colon (':')
>      character separating the two:
> 
>          "My name is {0:8}".format('Fred')
> 
>      The meaning and syntax of the conversion specifiers depends on the
>      type of object that is being formatted, however many of the
>      built-in types will recognize a standard set of conversion
>      specifiers.

Given the changes below, this paragraph should now read something like,

      The meaning and syntax of the conversion specifiers depends on the
      type of object that is being formatted, however there is a standard set
      of conversion specifiers used for any object that does not override
      them.

> 
>      Conversion specifiers can themselves contain replacement fields.
>      For example, a field whose field width it itself a parameter
>      could be specified via:

Typo: s/width it itself/width is itself/

>      The syntax for conversion specifiers is open-ended, since except
>      than doing field replacements, the format() method does not
>      attempt to interpret them in any way; it merely passes all of the
>      characters between the first colon and the matching brace to
>      the various underlying formatter methods.

Again, this paragraph has been overtaken by events.

       The syntax for conversion specifiers is open-ended, since a class can
       override the standard conversion specifiers. In such cases, the format()
       method merely passes all of the characters between the first colon and
       the matching brace to the relevant underlying formatting method.

> Standard Conversion Specifiers

It's probably worth avoiding describing the elements of the conversion 
specifier as fields - something neutral like 'element' should do.


>          '=' - Forces the padding to be placed between immediately
>                after the sign, if any. This is used for printing fields
>                in the form '+000000120'.

Typo: s/placed between immediately/placed immediately/

>      The 'precision' field is a decimal number indicating how many
>      digits should be displayed after the decimal point.

Someone pointed out that for string conversions ('s' & 'r'), this field should 
determine how many characters are displayed.

      The 'precision' is a decimal number indicating how many digits should be
      displayed after the decimal point in a floating point conversion. In a
      string conversion the field indicates how many characters will be used
      from the field content. The precision is ignored for integer conversions.

>      There are several integer conversion types. All invoke int() on
>      the object before attempting to format it.

Having another look at existing str-% behaviour, this should instead say:

       There are several integer conversion types. All will raise TypeError
       if the supplied object does not have an __index__ method.

>      There are several floating point conversion types. All invoke
>      float() on the object before attempting to format it.

Similar to integers, this should instead say:

       There are several floating point conversion types. All will raise
       TypeError if the supplied object is not a float or decimal instance.

> Controlling Formatting

I'm becoming less and less satisfied with the idea that to get a string 
version of a float, I do this:

   x = str(val)

But if I want to control the precision, I have to write:

   x = "{0:.3}".format(val)  # Even worse than the current "%.3f" % val!!

Why can't I instead write:

   x = str(val, ".3")

IOW, why don't we change the signature of 'str' to accept a conversion 
specifier as an optional second argument?

Then the interpretation of conversion specifiers in format strings is 
straightforward - the conversion specifier becomes the second argument to str().

Then it would be str() that does the dispatch of the standard conversion 
specifiers as described above if __format__ is not provided.

Here's the description of controlling formatting in that case:

------------------------------------------------
Controlling Formatting
      A class that wishes to implement a custom interpretation of its
      conversion specifier can implement a __format__ method:

      class AST:
          def __format__(self, specifier):
              ...

     str.format() will always format each field by invoking str() with two
     arguments: the value to be formatted and the conversion specifier. If the
     field does not include a conversion specifier then it defaults to None.

     The signature of str() is updated to accept a conversion specifier as the
     second argument (defaulting to None). When the conversion specifier is
     None, the __str__() method of the passed in object is invoked (if present)
     falling back to __repr__() otherwise (aside from using unicode instead of
     8-bit strings, this is unchanged from Python 2.x).

     If the conversion specifier is not None, then the object's __format__()
     method is invoked if present. Otherwise, the standard conversion
     specifiers described above are used.

     This means that where, in Python 2.x, controlling the precision of a
     float's string output required switching from the str() builtin to string
     formatting, Python 3k permits the conversion specifier to be added to the
     call to the builtin.

       x = str(val)         # Unformatted
       x = str(val, '.3')   # Limited to 3 decimal places

     This works for types with custom format specifiers, too:

          today = str(datetime.now(), 'a b d H:M:S Y')
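For comparison, str() never grew this second argument; the spelling that ultimately shipped with PEP 3101 is the format() builtin, which reads much the same:

```python
from datetime import datetime

val = 1234.5678
assert format(val, ".3f") == "1234.568"  # three decimal places

# Custom specifiers work too; datetime's shipped specifiers are
# strftime-style rather than the draft syntax shown above.
assert format(datetime(2006, 6, 11), "%Y-%m-%d") == "2006-06-11"
```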

> User-Defined Formatting Classes
> 
>      There will be times when customizing the formatting of fields
>      on a per-type basis is not enough.  An example might be an
>      accounting application, which displays negative numbers in
>      parentheses rather than using a negative sign.

This is now a bad example, because we moved it into the standard conversion 
specifiers :)

>      The format hook is a callable object supplied by the user, which
>      is invoked once per field, and which can override the normal
>      formatting for that field.  For each field, the cformat function
>      will attempt to call the field format hook with the following
>      arguments:
> 
>         format_hook(value, conversion)

With my str() proposal above, the default format hook becomes 'str' itself.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From talin at acm.org  Sun Jun 11 08:40:08 2006
From: talin at acm.org (Talin)
Date: Sat, 10 Jun 2006 23:40:08 -0700
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <448BAAA6.5060407@gmail.com>
References: <448B6CB9.9050601@acm.org> <448BAAA6.5060407@gmail.com>
Message-ID: <448BBAC8.4090403@acm.org>

Nick Coghlan wrote:
>> Conversion Specifiers

By the way, good feedback. I've incorporated most of the text changes 
into the PEP. I'd like to discuss a few of your suggestions in more 
detail before proceeding.

>>      There are several integer conversion types. All invoke int() on
>>      the object before attempting to format it.
> 
> 
> Having another look at existing str-% behaviour, this should instead say:
> 
>       There are several integer conversion types. All will raise TypeError
>       if the supplied object does not have an __index__ method.

This is a new 2.5 feature, correct?

>>      There are several floating point conversion types. All invoke
>>      float() on the object before attempting to format it.
> 
> 
> Similar to integers, this should instead say:
> 
>       There are several floating point conversion types. All will raise
>       TypeError if the supplied object is not a float or decimal instance.

This seems to close off opportunities for type-punning, which some on 
this list have asked for. If you want to print an int as a float, well, 
why not?

>> Controlling Formatting
> 
> 
> I'm becoming less and less satisfied with the idea that to get a string 
> version of a float, I do this:
> 
>   x = str(val)
> 
> But if I want to control the precision, I have to write:
> 
>   x = "{0:.3}".format(val)  # Even worse than the current "%.3f" % val!!

I've been thinking about this very issue. A lot.

I noticed that PyString_Format has a lot of internal functionality which 
has no pure-python equivalent. Like you say, it would be nice to have a 
simple method to format a single scalar value. I wasn't thinking about 
adding a parameter to str(), but instead some newly-named function (e.g. 
'format'), although I realize that has compatibility problems. I think 
that a second param to str() is better; however, you will have to 
compete against any other possible claims for that valuable second 
parameter.

As an aside, have a look at this function which I was just now working 
on. I'm not entirely sure it is correct, but I think you can see the 
motivation behind it:

# Pure python implementation of the C printf 'e' format specifier
from math import floor, log
def sci(val, precision, letter='e'):
     sign = ''
     if val < 0:
         sign = '-'
         val = -val
     exp = int(floor(log(val,10)))
     val *= 10**-exp
     if val == floor(val):
         val = int(val)
     else:
         val = round(val,precision)
         if val >= 10.0:
             exp += 1
             val = val * 0.1
     esign = '+'
     if exp < 0:
         exp = -exp
         esign = '-'
     if exp < 10: exp = '0' + str(exp)
     else: exp = str(exp)
     return sign + str(val) + letter + esign + exp


> Why can't I instead write:
> 
>   x = str(val, ".3")
> 
> IOW, why don't we change the signature of 'str' to accept a conversion 
> specifier as an optional second argument?
> 
> Then the interpretation of conversion specifiers in format strings is 
> straightforward - the conversion specifier becomes the second argument 
> to str().

So my only question then is: What about classes that override __str__? 
Do they get the conversion specifier or not?

One way to resolve this would be to go even further and bury the call to 
__format__ inside str! In other words, if you pass a second argument to 
str(), it will first check to see if there's a __format__ function, and 
if not, then it will fall back to __str__.

> Then it would be str() that does the dispatch of the standard conversion 
> specifiers as described above if __format__ is not provided.

My biggest concern about this is that PEP 3101 is getting kind of large, 
because we keep thinking of new issues related to string formatting. I'm 
wondering if maybe this idea of yours could be split off into a separate 
PEP.

Other than that, I think it's a pretty good idea.

-- Talin

From collinw at gmail.com  Sun Jun 11 14:00:57 2006
From: collinw at gmail.com (Collin Winter)
Date: Sun, 11 Jun 2006 14:00:57 +0200
Subject: [Python-3000] Third-party annotation libraries vs the stdlib
Message-ID: <43aa6ff70606110500n616b3f4cya30d114417ecc36e@mail.gmail.com>

In working on the annotations PEP, I've run into more issues
concerning the balance of responsibility between third-party libraries
and the stdlib.

So far, the trend has been to push responsibility for manipulating and
interpreting annotations into libraries, keeping core Python free from
any built-in semantics for the annotation expressions. However, nearly
all the issues that have been discussed on this list go against the
flow: the proposed Function() and Generator() classes, used for
expressing higher-order functions and generator functions,
respectively; type operations, like "T1 & T2" or "T1 | T2"; and the
type parameterisation mechanism.

Shipping any of these things with Python raises a number of other
issues/questions that would need to be dealt with:

1. If Function() and Generator() ship in the stdlib, where do they go?
In types? In a new module?

Also, if Function() and Generator() come with Python, how do we make
sure that third-party libraries can use them with minimal extra
overhead (e.g., wrapping layers to make the shipped Function() and
Generator() objects compatible with the library's internal
architecture)?

2. If "T1 & T2" is possible with core Python (ie, no external
libraries), what does "type(T1 & T2)" return? Is "type(T1 & T2)" the
same as "type(T1 | T2)"?

What can you do with these objects in core Python? Can you subclass
from "T1 & T2"? Does "issubclass(T1, T1 | T2)" return True? What about
"isinstance(5, int | dict)"?

Are "T1 & T2" and "T1 | T2" the only defined operations? What about xor or not?

3. Similar questions are raised by having the "T1[x, y, z]"
parameterisation method present in core Python: what is the type of
"tuple[int, int]"? What can you do with it? Does "isinstance((5, 6,
7), tuple[int, int, int])" return True? Do they have the same & and |
operations as other built-in types? What happens when you mix
parameterised types and non-parameterised types, e.g., "tuple[int,
(int, int)]"?


Based on the complexity involved in specifying all of these issues, I
say we punt: let the third-party libraries handle this. Addressing the
above issues from this perspective:

1. Shipping Function() and Generator() objects is a (relative) piece of cake.

2. In my own experience with this kind of stuff, there's very little
need to express and-ing and or-ing of type expressions. Third-party
libraries can provide this on their own via And() and Or()
classes/functions/whatevers.

If some particular library absolutely insists on using the & and |
operators, there might be some metaclass wizardry that could
accomplish this, but I'm not saying I know what it is.

3. The questions raised by the special type parameterisation mechanism
can be removed by simply omitting the mechanism. In particular, using
regular tuples/lists/dicts/etc instead of the tuple[]/list[]/dict[]
spelling completely removes the issue of mixing parameterised and
non-parameterised expressions.
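The "metaclass wizardry" alluded to in point 2 might look roughly like this (all class names are hypothetical; much later, PEP 604 added a native `int | str` along these lines):

```python
class Union:
    """Result of T1 | T2; supports isinstance() via __instancecheck__."""
    def __init__(self, *members):
        self.members = members

    def __instancecheck__(self, value):
        return any(isinstance(value, m) for m in self.members)

class TypeOps(type):
    """Metaclass giving annotation classes a '|' operator."""
    def __or__(cls, other):
        return Union(cls, other)

class T1(metaclass=TypeOps): pass
class T2(metaclass=TypeOps): pass

assert isinstance(T1(), T1 | T2)
assert not isinstance(5, T1 | T2)
```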

To sum up: I propose that -- to combat these issues -- I limit the PEP
to discussing how to supply annotations (the annotation syntax and C
API) and how to read them back later (via __signature__).

Collin Winter

From mcherm at mcherm.com  Mon Jun 12 14:58:59 2006
From: mcherm at mcherm.com (Michael Chermside)
Date: Mon, 12 Jun 2006 05:58:59 -0700
Subject: [Python-3000] iostack, continued
Message-ID: <20060612055859.eocfygr98rg0scoo@login.werra.lunarpages.com>

Greg Ewing writes:
> Be careful -- in Unix it's possible for different file
> descriptors to share the same position pointer.

Really? I had no idea.

How does one invoke this behavior? How does current python (2.4)
behave when subjected to this?

-- Michael Chermside

From steven.bethard at gmail.com  Mon Jun 12 20:03:21 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 12 Jun 2006 12:03:21 -0600
Subject: [Python-3000] Assignment decorators, anyone?
In-Reply-To: <4487F88A.7080307@canterbury.ac.nz>
References: <4487F88A.7080307@canterbury.ac.nz>
Message-ID: <d11dcfba0606121103t340e63betbf32832539e65a12@mail.gmail.com>

On 6/8/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> I think I've come across a use case for @decorators
> on assignment statements.
>
> I have a function which is used like this:
>
>    my_property = overridable_property('my_property', "This is my property.")
>
> However, it sucks a bit to have to write the name of
> the property twice. I just got bitten by changing the
> name of one of my properties and forgetting to change
> it in both places.
>
> If decorators could be applied to assignment statements,
> I'd be able to write it as something like
>
> @overridable_property
> my_property = "This is my property."
>
> (This would require the semantics of assignment
> decoration to be defined so that the assigned name
> is passed to the decorator function as well as the
> value being assigned.)
>
> On the other hand, maybe this is a use case for
> the "make" statement that was proposed earlier.

Yes, `PEP 359`_ provided functionality like this, but since it's
withdrawn, another option for you is something like::

    class my_property:
        __metaclass__ = overridable_property
        text = "This is my property."

where overridable_property looks something like::

    def overridable_property(name, bases, namespace):
        text = namespace.pop('text')
        # do whatever you normally do with name and text

(This is basically all the "make" statement was doing under the covers
anyway.)  Of course, the end result is that you use a class statement
to create something that isn't a class, but at least you manage to
avoid writing "my_property" twice.

.. _PEP 359: http://www.python.org/dev/peps/pep-0359/
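The trick works because the class statement hands the metaclass callable the
class name, so the name is written only once. A minimal runnable sketch, using
the Python 3 spelling and a hypothetical make_named stand-in::

```python
def make_named(name, bases, namespace):
    # Called by the class statement as make_named('my_property', (), {...});
    # the name arrives for free, so it need not be repeated in the body.
    return (name, namespace.get('text'))

class my_property(metaclass=make_named):
    text = "This is my property."
```

After the class statement runs, my_property is whatever make_named returned
-- here a (name, text) pair -- not a class.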

STeVe
-- 
Grammar am for people who can't think for myself.
        --- Bucky Katt, Get Fuzzy

From tomerfiliba at gmail.com  Mon Jun 12 21:06:38 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Mon, 12 Jun 2006 21:06:38 +0200
Subject: [Python-3000] enhanced descriptors, part 2
Message-ID: <1d85506f0606121206l1a7fb3abq76d4a916b36a764@mail.gmail.com>

hrrrmpff.

there really is a need for "enhanced descriptors", or something like
that. i'm having serious trouble implementing the position property,
as python is currently limited in this area.

the rules were:
* when you assign a positive integer, it's an absolute position to
seek to
* when you assign a negative integer, it's relative to the end, as
is the case with slices
* when you assign None, it's the ultimate last position --
seek(0, "end"), although you would use f.END instead of None
directly
* when you use the +=/-= operators, it's relative to the current
position (if optimized via __iadd__, can reduce one unnecessary
system call)

but descriptors don't support augmented __set__()ing. one solution
would be to return an int-like object, where __iadd__ would seek
relative to the current position.

aside from being slower than expected, complicating __set__, and
requiring position caching, this has a major drawback:
f.position += 4 # this assignment seeks
p = f.position
p += 4 # and this assignment seeks as well!

so that's out of the question, and we'll have to suffer two system
calls, at least in the experimental branch. maybe the C-branch
could utilize under-the-hood tricks to avoid that.
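the two-assignments pitfall can be reproduced with io.BytesIO standing in
for the file object (PositionProxy here is just a sketch, not iostack code):

```python
import io

class PositionProxy(int):
    """An int-like position whose += seeks the underlying file."""
    def __new__(cls, value, f):
        self = int.__new__(cls, value)
        self._file = f
        return self

    def __iadd__(self, delta):
        self._file.seek(delta, 1)  # a single relative seek
        return PositionProxy(self._file.tell(), self._file)

f = io.BytesIO(b"x" * 100)
pos = PositionProxy(f.tell(), f)
pos += 4   # seeks to 4, as intended
p = pos
p += 4     # surprise: this assignment seeks too
```

after the last line the file is at offset 8, even though only p was
"assigned" -- exactly the drawback described above.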

the current code of the descriptor looks like this:

class PositionDesc(object):
    def __get__(self, obj, cls):
        if obj is None:
            return self
        return obj.tell()

    def __set__(self, obj, value):
        if value is None:
            obj.seek(0, "end")
        elif value < 0:
            obj.seek(value, "end")
        else:
            obj.seek(value, "start")

but now we come to another problem... files became cyclic!
or sorta... if f.position < x, then f.position -= x would assign a
negative value to f.position. this, in turn, seeks relative to the
end of the file, thus making the file behave like a semi-cyclic
entity with a not-so-intuitive behavior... for example, assuming
a file size of 100, and a current position of 70:

pos = 70
pos -= 71 ==> pos = -1 ==> pos = (100 - 1) ==> pos = 99

baaaaaah!

in the original design, i wanted to raise an exception if seeking
relative to the current position got negative... but due to the
aforementioned technical limitations, it's not possible.

the whole issue could be solved if the descriptor protocol
supported augmented assignment -- but it requires, of course,
a drastic change to the language, something like the suggested
STORE_INPLACE_ATTR or the __iattr__ suggested by Greg.

will Guido pronounce on his choice (or discard it altogether)?



-tomer
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060612/c36705e9/attachment.htm 

From brett at python.org  Mon Jun 12 22:41:14 2006
From: brett at python.org (Brett Cannon)
Date: Mon, 12 Jun 2006 13:41:14 -0700
Subject: [Python-3000] We should write a PEP on what goes into the stdlib
Message-ID: <bbaeab100606121341h3e95d35em21f4124dade3b2b3@mail.gmail.com>

Right now a discussion is going on in python-dev about what is reasonable
for special needs of developers who bring in modules to the stdlib.  This of
course brings up the idea of slimming down the stdlib, having sumo releases,
etc.

That makes me think perhaps we should start thinking about collectively
coming up with guidelines (which end up in a PEP; and yes, I am volunteering
to write it) on deciding what is needed to accept a module into the stdlib.
We can then use this to go through what is already there and trim out the
fluff already there and get a list going of what will end up disappearing
early on so people can know long in advance.

Now this has nothing to do with a stdlib renaming or anything.  This is
purely about figuring out what is required for accepting a module and for
pruning out what we don't want that we currently have.


So, to start this discussion, here are my ideas...

First, the modules must have been in the wild and used by the community.
This has worked well so far by making sure the code is stable and that the
API is good.

Second, the code must follow Python coding guidelines.  This means not just
proper formatting and naming, but also that good unit tests are included as
well.  It also means that the module name might need to be renamed.
Documentation must also be provided in the proper format before acceptance.
All of this must be done *before* anything is checked in (use a branch if
needed to hold the work on the transition).

Third, a PEP discussing why the module should go in.  Basically, a
documented case for why the module should be distributed in Python.  It also
gives python-dev a central document to read and refer to when voting on
whether something should be let into the stdlib.  Can also document
differences between the public version and the one in the stdlib.

Fourth, the contributor must have signed a contribution agreement.

Fifth, contributors realize that Python developers have any and all rights
to check in changes to the code.  They can do something like how Barry
maintains external email releases and document that in the PEP.  This is
probably one of the more contentious ideas laid out here.  But python-dev
takes on responsibility for code once it's checked in, so we need to keep
the stdlib easily maintained and this process as simple as possible.
Basically this eliminates PEP 360 for Py3K.


Now, another thing is backwards compatibility.  Do we worry about
portability to older versions like we do now with PEP 291, or do all new
modules checked in give up the right to force developers to keep the code
compatible to a certain version?  This is another ease of maintenance/nice
to external release issue.

And that external release/ease of maintenance is going to be the sticky
point in all of this.  We need to find a good balance or we risk scaring
away people from contributing code.

-Brett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060612/6e5b2c70/attachment.htm 

From rhettinger at ewtllc.com  Mon Jun 12 23:44:27 2006
From: rhettinger at ewtllc.com (Raymond Hettinger)
Date: Mon, 12 Jun 2006 14:44:27 -0700
Subject: [Python-3000] We should write a PEP on what goes into the stdlib
In-Reply-To: <bbaeab100606121341h3e95d35em21f4124dade3b2b3@mail.gmail.com>
References: <bbaeab100606121341h3e95d35em21f4124dade3b2b3@mail.gmail.com>
Message-ID: <448DE03B.2050205@ewtllc.com>

Brett Cannon wrote:

> This is purely about figuring out what is required for accepting a 
> module and for pruning out what we don't want that we currently have.


Well intentioned, but futile.   Each case ultimately gets decided on its 
merits.  Any one reason for inclusion or exclusion can be outweighed by 
some other reason.  There isn't a consistent ruleset that explains 
clearly why decimal, elementtree, email, and textwrap were included 
while Cheetah, Twisted, numpy, and BeautifulSoup were not. 

Overly general rules are likely to be rife with exceptions and amount to 
useless administrivia.  I don't think these contentious issues can be 
decided in advance.  The specifics of each case are more relevant than a 
laundry list of generalizations.


> First, the modules must have been in the wild and used by the 
> community.  This has worked well so far by making sure the code is 
> stable and that the API is good.


Nice guideline, but the decimal module did not meet that test.  For AST, 
the stability criterion was tossed and the ultimate API is still in 
limbo.  Itertools went in directly.  However, the tried and true mxTools 
never went in, and the venerable bytecodehacks never had a chance.


>
> Second, the code must follow Python coding guidelines.

We already have a PEP for that.


>
> Third, a PEP discussing why the module should go in.

We don't need a PEP for every module.  If the python-dev discussion says 
we want it and Guido approves, then it is a done deal.


>
> Now, another thing is backwards compatibility.

Isn't there already a PEP where people can add portability restrictions
(i.e., having decimal continue to work on Py2.3)?



From brett at python.org  Tue Jun 13 00:23:08 2006
From: brett at python.org (Brett Cannon)
Date: Mon, 12 Jun 2006 15:23:08 -0700
Subject: [Python-3000] We should write a PEP on what goes into the stdlib
In-Reply-To: <448DE03B.2050205@ewtllc.com>
References: <bbaeab100606121341h3e95d35em21f4124dade3b2b3@mail.gmail.com>
	<448DE03B.2050205@ewtllc.com>
Message-ID: <bbaeab100606121523l2baf95c1jd4ca6309acd1189d@mail.gmail.com>

One thing I forgot to say in the initial email was that I am being
intentionally heavy-handed with restrictions on people, to get some dialog
going and see where people think things are okay and where they are not.

On 6/12/06, Raymond Hettinger <rhettinger at ewtllc.com> wrote:
>
> Brett Cannon wrote:
>
> > This is purely about figuring out what is required for accepting a
> > module and for pruning out what we don't want that we currently have.
>
>
> Well intentioned, but futile.   Each case ultimately gets decided on its
> merits.  Any one reason for inclusion or exclusion can be outweighed by
> some other reason.  There isn't a consistent ruleset that explains
> clearly why decimal, elementtree, email, and textwrap were included
> while Cheetah, Twisted, numpy, and BeautifulSoup were not.


True.  And notice none of my points say that some package must have been
used in the community for X number of months or have Y number of users
across Z operating systems.  That is not the point of the PEP.

The points I laid out are not that rigid and are basically what we follow,
but centralized in a single place.  Plus it codifies how we want to handle
contributed code in terms of how flexible we want to be for handling
people's wants on how we touch their code in the repository.  A PEP on this
would give us something to point to when people email the list saying, "I
want to get this module added to the stdlib" and prevent ourselves from
repeating the same lines over and over and let people know what we expect.

Overly general rules are likely to be rife with exceptions and amount to
> useless administrivia.  I don't think these contentious issues can be
> decided in advance.  The specifics of each case are more relevant than a
> laundry list of generalizations.


I don't think the points made are that unreasonable.  Following formatting
guidelines, signing a contributor agreement, etc. are not useless
administrivia.  The PEP requirement maybe.  And stating what python-dev is
willing to do in terms of maintenance I think is totally reasonable to state
up front.

> First, the modules must have been in the wild and used by the
> > community.  This has worked well so far by making sure the code is
> > stable and that the API is good.
>
>
> Nice guideline, but the decimal module did not meet that test.


Right, so?  The decimal module would have most likely been picked up
eventually; maybe not 2.3 but at some point.  Having it available during dev
would have counted as use in the community anyway.

  For AST,
> the stability criterion was tossed and the ultimate API is still in
> limbo.


The AST is not a stdlib thing, in my opinion.  That was back-end stuff.
Plus you can't provide AST access directly without mucking with the
internals anyway, so that basically requires dev within at least a branch.

  Itertools went in directly.


Once again, fine, but would that have prevented it from ever going in?  I
doubt that.  I know you did a lot of asking the community for what to
include and such.  Had you done that externally while working on it and then
propose it to python-dev once you were satisfied with the implementation it
probably would have gone right in.

  However, the tried and true mxTools
> never went in, and the venerable bytecodehacks never had a chance.
>
>
> >
> > Second, the code must follow Python coding guidelines.
>
> We already have a PEP for that.


Yeah, and yet we still accept stuff that does not necessarily follow those
PEPs.  I am not saying we need to write those PEPs again; I am saying that
those PEPs *must* be followed.

>
> > Third, a PEP discussing why the module should go in.
>
> We don't need a PEP for every module.  If the python-dev discussion says
> we want it and Guido approves, then it is a done deal.


Look at pysqlite.  We went through that discussion twice.  Most module
discussions end up being rather long and having a single place where stuff
is written would be nice.

But I don't view this as a necessary step.

>
> > Now, another thing is backwards compatibility.
>
> Isn't there already a PEP where people can add portability restrictions
> (i.e. having decimal continue to work on Py2.3?
>
>
>

Yep, PEP 291.  What I am asking here is whether contributors should be able
to request compatibility restrictions on the source code at all.  As I said,
I purposely went heavy-handed in this to get feedback from people.  The
points I made are all very python-dev friendly and not external developer
friendly.  But we need to discuss that to get an idea of what python-dev is
willing to do to get external contributions for the stdlib.

-Brett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060612/2a122c84/attachment.htm 

From rudyrudolph at excite.com  Tue Jun 13 00:49:45 2006
From: rudyrudolph at excite.com (Rudy Rudolph)
Date: Mon, 12 Jun 2006 18:49:45 -0400 (EDT)
Subject: [Python-3000] PEP 3101 update
Message-ID: <20060612224945.8AF9B2F5C3@xprdmxin.myway.com>


Is it possible to support two additional string formatting
features without overly complicating the whole thing?
It would be nice to have align centered and align on decimal point.

Centered would add fill chars both before and after the value.
If an odd number of fill chars must be added, the extra char is
after the value.
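The fill-distribution rule is easy to pin down in a few lines (a sketch of
the centering logic only; the format-spec symbol itself is still open):

```python
def center_field(value, width, fill=' '):
    """Center value in width; the extra fill char (odd padding) goes after."""
    text = str(value)
    pad = max(width - len(text), 0)
    before = pad // 2      # rounds down
    after = pad - before   # gets the extra char when pad is odd
    return fill * before + text + fill * after
```

For example, center_field('42', 7, '*') gives '**42***': two fill chars
before, three after.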

Align on decimal is not necessary with 'f' formatting because you can
right align the fractional part and get the same effect. However, with
'g' formatting the number of digits after the point may vary and there
may not even be a decimal point. In this case, a column of numbers
should be aligned at the last digit of the integer part.

If these are desirable, we would need to choose suitable symbols for
align. Either '><' or '|' seems appropriate for centered.

Rudy

_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!



From greg.ewing at canterbury.ac.nz  Tue Jun 13 02:45:51 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 13 Jun 2006 12:45:51 +1200
Subject: [Python-3000] iostack, continued
In-Reply-To: <20060612055859.eocfygr98rg0scoo@login.werra.lunarpages.com>
References: <20060612055859.eocfygr98rg0scoo@login.werra.lunarpages.com>
Message-ID: <448E0ABF.7040704@canterbury.ac.nz>

Michael Chermside wrote:
> Greg Ewing writes:
> 
>> Be careful -- in Unix it's possible for different file
>> descriptors to share the same position pointer.
> 
> Really? I had no idea.
> 
> How does one invoke this behavior?

It happens every time you fork, and the child process
inherits copies of the stdin/out/err descriptors. If
e.g. stdin is coming from a disk file, and the child
reads part of the file, and then the parent reads
some more, it will start reading where the child
left off.

Another way is to use dup() or dup2() to make a
copy of a file descriptor.
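The dup() case is easy to demonstrate from Python (a sketch using the os
module on a POSIX system; the fork() case behaves the same way):

```python
import os
import tempfile

# Create a small file to read through two descriptors.
fd1, path = tempfile.mkstemp()
os.write(fd1, b"abcdefgh")
os.lseek(fd1, 0, os.SEEK_SET)

fd2 = os.dup(fd1)          # fd2 shares fd1's position pointer

first = os.read(fd1, 3)    # b"abc" -- advances the shared offset
second = os.read(fd2, 3)   # b"def" -- continues where fd1 left off

os.close(fd1)
os.close(fd2)
os.remove(path)
```

Reading through fd2 picks up where fd1 stopped, because both descriptors
refer to one open-file description with one offset.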

> How does current python (2.4)
> behave when subjected to this?

Calls in the os module behave the same as their
underlying system calls. File objects behave
however the platform's C stdio library behaves.

Buffering makes things a bit messy. Usually it's
not a problem, because normally parent and child
processes don't both read or write the same
disk file. If they do, some flushing calls might
be necessary.

--
Greg



From greg.ewing at canterbury.ac.nz  Tue Jun 13 04:05:57 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 13 Jun 2006 14:05:57 +1200
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <20060612224945.8AF9B2F5C3@xprdmxin.myway.com>
References: <20060612224945.8AF9B2F5C3@xprdmxin.myway.com>
Message-ID: <448E1D85.2050409@canterbury.ac.nz>

Rudy Rudolph wrote:
> However, with
> 'g' formatting the number of digits after the point may vary and there
> may not even be a decimal point. In this case, a column of numbers
> should be aligned at the last digit of the integer part.

How do you do that when you're formatting one
string at a time?

--
Greg

From tony at printra.net  Tue Jun 13 05:06:40 2006
From: tony at printra.net (Tony Lownds)
Date: Mon, 12 Jun 2006 20:06:40 -0700
Subject: [Python-3000] Third-party annotation libraries vs the stdlib
In-Reply-To: <43aa6ff70606110500n616b3f4cya30d114417ecc36e@mail.gmail.com>
References: <43aa6ff70606110500n616b3f4cya30d114417ecc36e@mail.gmail.com>
Message-ID: <9C7E76CA-F4A0-4C94-B312-891F5A9B93BB@printra.net>


On Jun 11, 2006, at 5:00 AM, Collin Winter wrote:

> In working on the annotations PEP, I've run into more issues
> concerning the balance of responsibility between third-party libraries
> and the stdlib.
>
> So far, the trend has been to push responsibility for manipulating and
> interpreting annotations into libraries, keeping core Python free from
> any built-in semantics for the annotation expressions. However, nearly
> all the issues that have been discussed on this list go against the
> flow: the proposed Function() and Generator() classes, used for
> expressing higher-order functions and generator functions,
> respectively; type operations, like "T1 & T2" or "T1 | T2"; and the
> type parameterisation mechanism.
>
> Shipping any of these things with Python raises a number of other
> issues/questions that would need to be dealt with:
>
> 1. If Function() and Generator() ship in the stdlib, where do they go?
> In types? In a new module?

The types module seems like a decent place.

> Also, if Function() and Generator() come with Python, how do we make
> sure that third-party libraries can use them with minimal extra
> overhead (e.g., wrapping layers to make the shipped Function() and
> Generator() objects compatible with the library's internal
> architecture)?
>

That's an issue for third-party libraries.

> 2. If "T1 & T2" is possible with core Python (ie, no external
> libraries), what does "type(T1 & T2)" return? Is "type(T1 & T2)" the
> same as "type(T1 | T2)"?
>

These operations could return objects that describe the types and  
nothing else.
It doesn't make sense for the result of T1 | T2 to be a type object.

class TypeUnion:
    def __init__(self, *types):
        self.types = types

    def __repr__(self):
        return '(%s)' % ' | '.join(map(repr, self.types))

    def __or__(self, other):
        return TypeUnion(*(self.types + (other,)))


> What can you do with these objects in core Python? Can you subclass
> from "T1 & T2"? Does "issubclass(T1, T1 | T2)" return True? What about
> "isinstance(5, int | dict)"?

isinstance could be extended to work with TypeUnion objects. Only type
objects are sensible for issubclass; it makes more sense for another
predicate to determine subtype relationships.
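One way to let the built-in isinstance() accept a TypeUnion is an
__instancecheck__ hook (a sketch; it relies on isinstance consulting that
hook for non-type second arguments, which current Python supports):

```python
class TypeUnion:
    def __init__(self, *types):
        self.types = types

    def __instancecheck__(self, obj):
        # isinstance(obj, union) dispatches here because the second
        # argument is not a type object.
        return isinstance(obj, self.types)

union = TypeUnion(int, dict)
isinstance(5, union)       # True
isinstance("five", union)  # False
```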

And I think it makes more sense for third-party packages to provide the
specific definitions and mechanisms for determining subtype relationships.
Coming to a common and usable definition would be too difficult otherwise.

I wanted to suggest that core Python's isinstance be extended to work with
types while subtyping definitions are left up to third-party packages, but
that won't work: you can't tell whether a given callable is a valid
instance of a Function() without a subtype predicate.

> Are "T1 & T2" and "T1 | T2" the only defined operations? What about  
> xor or not?
>

I can't think of any useful semantics for this.

> 3. Similar questions are raised by having the "T1[x, y, z]"
> parameterisation method present in core Python: what is the type of
> "tuple[int, int]"? What can you do with it?

It could be a type object that is a subclass of tuple. It could also be an
object that describes the type, like TypeUnion above.

> Does "isinstance((5, 6,
> 7), tuple[int, int, int])" return True?

For new-style classes, isinstance(obj, T) is roughly equivalent to
issubclass(type(obj), T). Let's say tuple[int, int, int] is a subclass of
tuple. The result of type((5, 6, 7)) won't change -- it's the tuple type
object. So isinstance((5, 6, 7), tuple[int, int, int]) would return False.

That is misleading. I think it would be better if tuple[int, int] returned
something that isn't a type, so that uses of isinstance are not misleading.

Another idea would be to provide a different way to check that an instance
is a valid member of a type. I bet this would get rejected quickly.

 >>> int.ismember(5)
True
 >>> (int | dict).ismember(5)
True

> Do they have the same & and | operations as other built-in types?

Sure, why not.

> What happens when you mix
> parameterised types and non-parameterised types, e.g., "tuple[int,
> (int, int)]"?

Is the question whether the parameterization mechanism should enforce
that its parameters are valid types?

> Based on the complexity involved in specifying all of these issues, I
> say we punt: let the third-party libraries handle this.
[...]
> To sum up: I propose that -- to combat these issues -- I limit the PEP
> to discussing how to supply annotations (the annotation syntax and C
> API) and how to read them back later (via __signature__).

+1

I think the annotations PEP should definitely punt on this, and also punt
on definitions of And(), Function(), Generator(), etc. -- unless those are
what is returned by __signature__? The annotation syntax and the
__signature__ object API could also be independent PEPs.

It would be really nice to have a common language for type annotations.
This lets authors of type-annotated code use the same annotations with a
variety of third-party packages. From the issues above, this seems hard to
accomplish in a way that integrates well with the rest of Python.

-Tony



From thomas at python.org  Tue Jun 13 09:39:40 2006
From: thomas at python.org (Thomas Wouters)
Date: Tue, 13 Jun 2006 09:39:40 +0200
Subject: [Python-3000] [Python-Dev] xrange vs. int.__getslice__
In-Reply-To: <448E6A74.3010409@renet.ru>
References: <448E6A74.3010409@renet.ru>
Message-ID: <9e804ac0606130039o29ce1f39neff8af92e8faeff7@mail.gmail.com>

On 6/13/06, Vladimir 'Yu' Stepanov <vys at renet.ru> wrote:
>
> Are you bothered by the xrange function yet? :) I suggest replacing it.


http://www.python.org/dev/peps/pep-0204/

(If you must really discuss this, which would probably be futile and
senseless, please do it on python-3000 only.)
-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060613/0625c63e/attachment.html 

From vys at renet.ru  Tue Jun 13 09:34:12 2006
From: vys at renet.ru (Vladimir 'Yu' Stepanov)
Date: Tue, 13 Jun 2006 11:34:12 +0400
Subject: [Python-3000] xrange vs. int.__getslice__
Message-ID: <448E6A74.3010409@renet.ru>

Are you bothered by the xrange function yet? :) I suggest replacing it.

---------------------------------------------
        for i in xrange(100): pass
vs.
        for i in int[:100]: pass
---------------------------------------------

---------------------------------------------
        for i in xrange(1000, 1020): pass
vs.
        for i in int[1000:1020]: pass
---------------------------------------------

---------------------------------------------
        for i in xrange(200, 100, -2): pass
vs.
        for i in int[200:100:-2]: pass
---------------------------------------------
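The spelling can be prototyped with a metaclass defining __getitem__ (a
sketch with a stand-in Int class, since the built-in int type itself can't
be patched):

```python
class Sliceable(type):
    def __getitem__(cls, s):
        # Int[a:b:c] arrives here as a slice object.
        return iter(range(s.start or 0, s.stop, s.step or 1))

class Int(int, metaclass=Sliceable):
    pass

list(Int[:5])           # [0, 1, 2, 3, 4]
list(Int[200:100:-2])   # counts down by 2, like the example above
```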

From vys at renet.ru  Tue Jun 13 10:11:17 2006
From: vys at renet.ru (Vladimir 'Yu' Stepanov)
Date: Tue, 13 Jun 2006 12:11:17 +0400
Subject: [Python-3000] [Python-Dev] xrange vs. int.__getslice__
In-Reply-To: <9e804ac0606130039o29ce1f39neff8af92e8faeff7@mail.gmail.com>
References: <448E6A74.3010409@renet.ru>
	<9e804ac0606130039o29ce1f39neff8af92e8faeff7@mail.gmail.com>
Message-ID: <448E7325.4010000@renet.ru>

Thomas Wouters wrote:
> http://www.python.org/dev/peps/pep-0204/
>
> (If you must really discuss this, which would probably be futile and 
> senseless, please do it on python-3000 only.)

Certainly it looks very similar. PEP 204 demanded a parser change and
proposed new syntax as a replacement for the range function. My proposal
can be seen as a replacement for the xrange function; no change to the
language's syntax is necessary.

Thanks.

From mcherm at mcherm.com  Tue Jun 13 15:28:30 2006
From: mcherm at mcherm.com (Michael Chermside)
Date: Tue, 13 Jun 2006 06:28:30 -0700
Subject: [Python-3000] We should write a PEP on what goes into the	stdlib
Message-ID: <20060613062830.55hi3ynpppdd8gc4@login.werra.lunarpages.com>

Brett writes:
> That makes me think perhaps we should start thinking about collectively
> coming up with guidelines [...] on deciding what is needed to accept a
> module into the stdlib.

Raymond replies:
> Each case ultimately gets decided on its merits.  Any one reason for  
> inclusion or exclusion can be outweighed by some other reason. [...]  
> Overly general rules are likely to be rife with
> exceptions and amount to useless administrivia.  I don't think these
> contentious issues can be decided in advance.  The specifics of each case
> are more relevant than a laundry list of generalizations.

I agree. If we have a PEP with rules for acceptance, then every time we
don't follow those rules exactly we will be accused of favoritism. If
we have informal rules like today and decide things on a case-by-case
basis, then everything is fine.

Rather than a formal PEP, how about a wiki page (which is necessarily
less of a formal "rule") that describes a good process to get your
module accepted. The obvious things (release to the community, get wide
usage, be recognized as best-of-breed, agree to donate code, agree to
support for some time) could all be listed there. It's just as easy to
refer someone to a wiki page as it is to refer them to a PEP, but it
doesn't make it seem like we're bound to follow a particular process.

-- Michael Chermside


From mcherm at mcherm.com  Tue Jun 13 15:39:27 2006
From: mcherm at mcherm.com (Michael Chermside)
Date: Tue, 13 Jun 2006 06:39:27 -0700
Subject: [Python-3000] enhanced descriptors, part 2
Message-ID: <20060613063927.cpbh2nslum2owgsw@login.werra.lunarpages.com>

tomer writes:
> there really is a need for "enhanced descriptors", or something like
> that. i'm having serious trouble implementing the position property,  
> as python is currently limited in this area.

No, this doesn't necessarily imply that we need "enhanced descriptors",
an alternative solution is to change the intended API for file
positions. After all, the original motivation for using a property
was that it was (a) nice to use, (b) easy to read, and (c) possible to
implement. If (c) isn't true then perhaps we rethink the API.

After all, how bad would it be to use the following:

     f.position    -- used to access the current position
     f.seek_to(x)  -- seek to an absolute position (may be relative to end)
     f.seek_by(x)  -- seek by a relative amount

Or even go half-way:

     f.position      -- used to access the current position
     f.position = x  -- seek to an absolute position (may be relative to end)
     f.seek_by(x)    -- seek by a relative amount

Properties are nice, but there's nothing wrong with methods either. If
we went with the second approach, people might foolishly use
"f.position += 4" where they intended "f.seek_by(4)", and it would still
work fine; it just wouldn't be optimized. That's really not so bad.
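The half-way variant is straightforward to sketch over any seekable object
(io.BytesIO stands in for the file; the names are illustrative, not
iostack's):

```python
import io

class PositionedFile:
    """Half-way API: a position property plus an explicit seek_by()."""
    def __init__(self, raw):
        self._raw = raw

    @property
    def position(self):
        return self._raw.tell()

    @position.setter
    def position(self, value):
        if value < 0:
            self._raw.seek(value, 2)   # negative: relative to end
        else:
            self._raw.seek(value)      # absolute position

    def seek_by(self, delta):
        self._raw.seek(delta, 1)       # relative to current position

f = PositionedFile(io.BytesIO(b"x" * 100))
f.position = 10
f.seek_by(5)
mid = f.position    # 15
f.position = -1
end = f.position    # 99, one before the end of the 100-byte file
```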

-- Michael Chermside


From bborcic at gmail.com  Tue Jun 13 16:34:18 2006
From: bborcic at gmail.com (Boris Borcic)
Date: Tue, 13 Jun 2006 16:34:18 +0200
Subject: [Python-3000] xrange vs. int.__getslice__
In-Reply-To: <448E6A74.3010409@renet.ru>
References: <448E6A74.3010409@renet.ru>
Message-ID: <e6mido$c0p$1@sea.gmane.org>

Vladimir 'Yu' Stepanov wrote:
> Are you bothered by the xrange function yet? :) I suggest replacing it.
> 
> ---------------------------------------------
>         for i in xrange(100): pass
> vs.
>         for i in int[:100]: pass
> ---------------------------------------------

in a similar vein (slices on types)

-----------------------------------------------
           (slice(1,10),Ellipsis,slice(1,10))
vs
           slice[1:10,...,1:10]
-----------------------------------------------

Boris
--
"On na?t tous les m?tres du m?me monde"


From rrr at ronadam.com  Tue Jun 13 19:32:08 2006
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 13 Jun 2006 12:32:08 -0500
Subject: [Python-3000] We should write a PEP on what goes into the stdlib
In-Reply-To: <bbaeab100606121341h3e95d35em21f4124dade3b2b3@mail.gmail.com>
References: <bbaeab100606121341h3e95d35em21f4124dade3b2b3@mail.gmail.com>
Message-ID: <e6msv9$n0r$1@sea.gmane.org>

Brett Cannon wrote:

> So, to start this discussion, here are my ideas...
> 
> First, the modules must have been in the wild and used by the 
> community.  This has worked well so far by making sure the code is 
> stable and that the API is good.


Those modules and packages that necessary parts of Python depend on
should probably be near the top of your list.

Just what is included as necessary parts could be discussed.  Possibly a 
short list would include...


* Modules needed to manage, test and document the python installation.

* Modules needed to run, edit and test python programs.

* Modules needed to document programs.

* Modules needed to package, install and distribute programs.

* Modules needed for platform compatibility.


After including these, plus the modules and packages they depend on, 
there might not be all that much left to remove.  That would leave...

* Modules and packages that are so popular that it doesn't make sense 
not to install them.


All else could probably either be an optionally installed package 
included in the distribution or as an easy to install egg package.


I don't think determining what goes into the stdlib is as difficult as 
people think.  It all seems pretty practical to me (although not 
trivial when taken as a whole).

Maybe adding a few guidelines on what should *not* be in the standard 
library would be a good way to prevent it from growing too large, e.g. 
modules that haven't been tested sufficiently in the wild or by 
python-dev, or modules rarely needed or used.  Listing the inverse of 
these as reasons for inclusion seems to be the approach suggested here, 
but that seems to me to be working from the wrong end, in my humble 
opinion.


Ron



From rudyrudolph at excite.com  Tue Jun 13 19:33:30 2006
From: rudyrudolph at excite.com (Rudy Rudolph)
Date: Tue, 13 Jun 2006 13:33:30 -0400 (EDT)
Subject: [Python-3000] PEP 3101 update
Message-ID: <20060613173330.9388899E4A@xprdmxin.myway.com>


Rudy Rudolph wrote:
>It would be nice to have align centered and align on decimal point.
>However, with 'g' formatting the number of digits after the point may
>vary and there may not even be a decimal point. In this case, a column
>of numbers should be aligned at the last digit of the integer part.

Greg Ewing wrote:
>How do you do that when you're formatting one string at a time?

I thought the whole idea of alignment specifications was so that we
could print one line at a time but get everything to line up.

To right align, we use '>' for align and specify the last position
relative to what came before. That relative position is known
as the field width.

To decimal align, we use a different char for align and specify the
decimal position relative to what came before. One possible way
to do this is with, for example, '9.3g' which means fill chars and
digits in the first 5 positions, a decimal point if necessary in the
sixth position, and digits and fill chars in the last three positions.
If the same format string is used for every line printed, the decimals
line up, just like the right sides line up with right alignment. Well,
actually the last digits of the integer portions line up even if there
isn't a decimal point, just like with a decimal tab in MS Word.

There are certainly other ways to specify the same thing and I don't
much care what the format is. It should be easy enough both to settle
on a format and to implement, and it would certainly be useful.
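[Editorial note: a rough sketch of the proposed behavior -- dec_align is a made-up helper, not part of any PEP 3101 draft, and since 'g' counts significant digits rather than fractional digits, sufficiently large values would still fall back to exponent form and break the alignment:]

```python
# Proposed "decimal align": %g-style formatting, but with the
# fractional part padded with spaces so the last digit of the
# integer part always lands in the same column.
def dec_align(value, int_width, frac_digits):
    s = "%.*g" % (frac_digits, value)
    if "." in s:
        head, tail = s.split(".", 1)
        tail = "." + tail
    else:
        head, tail = s, ""
    # integer part right-aligned, fractional part space-filled
    return head.rjust(int_width) + tail.ljust(frac_digits + 1)

for v in (3.14159, 12.5, 100.0, 7.0):
    print(dec_align(v, 5, 3))
```

With the same format parameters on every line, the decimal points (or the last integer digits, when the point is suppressed) line up, just as described above.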

Rudy

_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!



From tomerfiliba at gmail.com  Tue Jun 13 19:48:25 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Tue, 13 Jun 2006 19:48:25 +0200
Subject: [Python-3000] enhanced descriptors, part 2
In-Reply-To: <20060613063927.cpbh2nslum2owgsw@login.werra.lunarpages.com>
References: <20060613063927.cpbh2nslum2owgsw@login.werra.lunarpages.com>
Message-ID: <1d85506f0606131048i49ee0982jf8729e130b2a8a41@mail.gmail.com>

>      f.position      -- used to access the current position
>      f.position = x  -- seek to an absolute position (may be relative to end)
>      f.seek_by(x)    -- seek by a relative amount
>
> Properties are nice, but there's nothing wrong with methods either. If
> we went with the second approach, people might foolishly use
> "f.position += 4" where they intended "f.seek_by(x)" and it would still
> work fine, it just wouldn't be optimized. That's really not so bad.

Okay, I'm fine with that. But I'm not happy with the fact that it's not
*possible* to implement such things in Python. Perhaps with time more
use-cases will show it's needed. Until then... ;)


-tomer

On 6/13/06, Michael Chermside <mcherm at mcherm.com> wrote:
> tomer writes:
> > there really is a need for "enhanced descriptors", or something like
> > that. i'm having serious trouble implementing the position property,
> > as python is currently limited in this area.
>
> No, this doesn't necessarily imply that we need "enhanced descriptors",
> an alternative solution is to change the intended API for file
> positions. After all, the original motivation for using a property
> was that it was (a) nice to use, (b) easy to read, and (c) possible to
> implement. If (c) isn't true then perhaps we rethink the API.
>
> After all, how bad would it be to use the following:
>
>      f.position    -- used to access the current position
>      f.seek_to(x)  -- seek to an absolute position (may be relative to end)
>      f.seek_by(x)  -- seek by a relative amount
>
> Or even go half-way:
>
>      f.position      -- used to access the current position
>      f.position = x  -- seek to an absolute position (may be relative to end)
>      f.seek_by(x)    -- seek by a relative amount
>
> Properties are nice, but there's nothing wrong with methods either. If
> we went with the second approach, people might foolishly use
> "f.position += 4" where they intended "f.seek_by(x)" and it would still
> work fine, it just wouldn't be optimized. That's really not so bad.
>
> -- Michael Chermside
>
>

From greg.ewing at canterbury.ac.nz  Wed Jun 14 02:39:58 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 14 Jun 2006 12:39:58 +1200
Subject: [Python-3000] enhanced descriptors, part 2
In-Reply-To: <20060613063927.cpbh2nslum2owgsw@login.werra.lunarpages.com>
References: <20060613063927.cpbh2nslum2owgsw@login.werra.lunarpages.com>
Message-ID: <448F5ADE.2090400@canterbury.ac.nz>

Michael Chermside wrote:

>      f.position = x  -- seek to an absolute position (may be relative to end)

although the "relative to end" part would still admit
the circularity problem (if it's considered to be a
problem - personally I'm not too worried what happens
if you're silly enough to try to seek before the
beginning of a file).

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Jun 14 02:51:41 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 14 Jun 2006 12:51:41 +1200
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <20060613173330.9388899E4A@xprdmxin.myway.com>
References: <20060613173330.9388899E4A@xprdmxin.myway.com>
Message-ID: <448F5D9D.3040702@canterbury.ac.nz>

Rudy Rudolph wrote:
> '9.3g' which means fill chars and
> digits in the first 5 positions, a decimal point if necessary in the
> sixth position, and digits and fill chars in the last three positions.

So what you're really asking for is an option for
suppressing trailing zeroes after a decimal point
(and replacing them with spaces).

That makes sense, although I think calling it
"decimal align" would be confusing. It confused
me, because I was thinking of what this means in
a word processor, where you're aligning decimal
points with some predetermined absolute position.

--
Greg

From rudyrudolph at excite.com  Wed Jun 14 21:45:14 2006
From: rudyrudolph at excite.com (Rudy Rudolph)
Date: Wed, 14 Jun 2006 15:45:14 -0400 (EDT)
Subject: [Python-3000] PEP 3101 update
Message-ID: <20060614194514.3F76B8B354@xprdmxin.myway.com>


Greg Ewing wrote:
>So what you're really asking for is an option for
>suppressing trailing zeroes after a decimal point
>(and replacing them with spaces).
>That makes sense, although I think calling it
>"decimal align" would be confusing. It confused
>me, because I was thinking of what this means in
>a word processor, where you're aligning decimal
>points with some predetermined absolute position.

Formatting with 'g' instead of 'f' already suppresses
trailing zeroes (and the decimal point if there is no
fractional part). Calling it "decimal align" is just as
valid as your "right align" and "left align". That is,
they align the current piece relative to what was printed
before; none gives you absolute positioning. However, if
a) the same format string is used for every line, b) all
fields specify a width, and c) nothing exceeds its format
width, then all the columns will align left, right, decimal,
or whatever. That's one of the main uses of format strings,
using relative positioning one line at a time to achieve a
poor-man's table with the fields in all lines aligned in
columns as if the positions had been specified absolutely.
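[Editorial note: conditions (a)-(c) are easy to see with today's %-formatting; one format string reused per line already yields aligned columns:]

```python
# Reusing one format string per line lines up the columns,
# as long as every field fits its declared width.
rows = [("x", 1.0), ("total", 123.25), ("epsilon", 0.07)]
for name, value in rows:
    print("%-8s %10.2f" % (name, value))
```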

Center- and decimal-align nicely round out the left-align,
right-align, and pad-after-sign formatting already proposed,
and are easy to implement. I therefore ask that they be added
to the PEP. BTW, I very much like the proposal in the PEP.

Issue to consider: Can decimal alignment be specified together
with pad-after-sign?

Rudy

_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!



From greg.ewing at canterbury.ac.nz  Thu Jun 15 03:06:36 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 Jun 2006 13:06:36 +1200
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <20060614194514.3F76B8B354@xprdmxin.myway.com>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>
Message-ID: <4490B29C.5050402@canterbury.ac.nz>

Rudy Rudolph wrote:

> Calling it "decimal align" is just as
> valid as your "right align" and "left align".

But "decimal align" raises the question "align with
*what*?" The answer to that is far less obvious than
it is with "left" and "right", IMO.

Also, the output of %f is *already* "decimal aligned"
in this sense. The only difference between the
current behaviour of %f and your suggested "decimal
align" is that trailing zeroes would be suppressed.
So it would make a lot more sense to me to call it
"suppress trailing zeroes" instead.

--
Greg

From talin at acm.org  Thu Jun 15 06:57:26 2006
From: talin at acm.org (Talin)
Date: Wed, 14 Jun 2006 21:57:26 -0700
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <20060614194514.3F76B8B354@xprdmxin.myway.com>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>
Message-ID: <4490E8B6.7010601@acm.org>

Rudy Rudolph wrote:
> Center- and decimal-align nicely round out the left-align,
> right-align, and pad-after-sign formatting already proposed,
> and are easy to implement. I therefore ask that they be added
> to the PEP. BTW, I very much like the proposal in the PEP.

The basic idea of a decimal align option sounds good to me. I even 
started working up an implementation, but I haven't had the time to 
finish it -- I decided to go with the '^' character as an alignment symbol.

Greg Ewing's point that this is effectively the same as "pad with 
spaces" is correct; however, I don't think that's the way most people 
think of it - in other words, generally what people ask for is "line up 
all the decimal points".

Here's my concern however: PEP 3101 is getting rather large, because of 
all of these little details that are ancillary to the primary proposal 
of a 'format' method for string objects. I've already pushed back on 
Nick Coghlan's otherwise excellent suggestion of allowing the same set 
of conversion specifiers to be used as a second argument to str() for 
this reason.

(I thought about breaking out the conversion specifiers into a separate 
PEP, but since they aren't meaningful by themselves it makes no sense to 
accept one PEP and reject another, and also because then I'd have 3 PEPs 
in the Python-3000 queue [including 3102], and right now 2 is as much as 
I want to deal with.)

Because 3101 is targeted at Python-3000, and because 3000 is scheduled 
for release in the distant future, I have no sense as to what the 
timetable is for acceptance or adoption of this PEP; As far as I know, 
it could be a year or more before a decision is made, and the PEP might 
be rejected at the end of that time. So from my point of view, I am 
faced with the prospect of an ever-expanding PEP as people continue to 
think of new suggestions over the course of the next year, all of which 
may come to naught.

My feeling is that a good PEP should contain a limited number of BDFL 
decisions - that is, it should be possible for Guido to go down the 
checklist and accept / reject / suggest changes to a small number of 
essential bullet points. I fear that PEP 3101 is going to turn into a 
kind of omnibus bill with all kinds of little amendments to deal with.

For this reason, I'd like to put some sort of limit on lower-level 
details of the PEP, and let that detail be filled in via the normal 
feature request / patch submission process once the PEP has actually 
been accepted.

I guess what I also need to do is find some place to post my prototype 
so that people can criticize it and submit patches to it. What would be 
ideal for my purposes would be if there was a "research" branch in the 
Python svn so that wild-eyed radicals such as myself could check in code 
that is still being discussed by the community and is not yet intended 
for inclusion in the main tree. This would also allow people who have 
suggestions to submit patches to the prototype, rather than having to 
ask me to do it for them.

-- Talin

From talin at acm.org  Fri Jun 16 07:17:49 2006
From: talin at acm.org (Talin)
Date: Thu, 15 Jun 2006 22:17:49 -0700
Subject: [Python-3000] We should write a PEP on what goes into the stdlib
In-Reply-To: <20060613062830.55hi3ynpppdd8gc4@login.werra.lunarpages.com>
References: <20060613062830.55hi3ynpppdd8gc4@login.werra.lunarpages.com>
Message-ID: <44923EFD.50804@acm.org>

Michael Chermside wrote:
> I agree. If we have a PEP with rules for acceptance, then every time we
> don't follow those rules exactly we will be accused of favoritism. If
> we have informal rules like today and decide things on a case-by-case
> basis, then everything is fine.

Let me make a suggestion that might help resolve the disagreement.

One of my favorite podcasts is "Life of a Law Student", 
(http://www.lifeofalawstudent.com/) in which a first year law student 
named Neil Wehneman makes a daily podcast of what he learned in law 
school that day. One of the ideas that he talks about (Intro to the Law 
#2) is the difference between a "Rule" and a "Standard":

A 'rule' is a definitive test, intended to provide certainty. An example 
is the speed limit - you are either exceeding the speed limit, or you 
aren't.

A 'standard', on the other hand (at least, in its legal definition) is a 
set of factors to be weighed by a judge when making a decision. Its 
purpose is to provide flexibility, allowing human judgement to stay in 
the loop, but at the same time giving a framework for making those 
judgements in a consistent way.

An example of a standard is fair use under copyright law. When a judge 
decides whether something is fair use, they use a standard consisting of 
a number of factors, including the amount of the work copied, the 
commercial or non-commercial use of the work, and so on.

Note that none of these factors are a simple "yes/no" decision - 
instead, a judgement must be made as to how much a particular case fits 
the standard. A use of a work can be completely commercial, completely 
noncommercial, or something in between. To the extent that it is 
noncommercial, that weighs in favor of it being declared fair use; to 
the extent that it is commercial, that weighs against.

So what I would suggest, then, is the creation of a standard (in this 
legal sense) for what factors should be considered in deciding whether 
to include something in the stdlib.

Moreover, the standard should be clearly labeled as such - to prevent 
people from interpreting the document as a set of hard rules that they 
can use to beat other people over the head with.

So for example, it might say something like: "To the extent that the 
module has enjoyed widespread adoption and use within the Python 
community, this weighs in favor of inclusion." and so on.

-- Talin

From brett at python.org  Fri Jun 16 19:01:46 2006
From: brett at python.org (Brett Cannon)
Date: Fri, 16 Jun 2006 10:01:46 -0700
Subject: [Python-3000] We should write a PEP on what goes into the stdlib
In-Reply-To: <44923EFD.50804@acm.org>
References: <20060613062830.55hi3ynpppdd8gc4@login.werra.lunarpages.com>
	<44923EFD.50804@acm.org>
Message-ID: <bbaeab100606161001v3af8ec61v5db24a6e585a40eb@mail.gmail.com>

On 6/15/06, Talin <talin at acm.org> wrote:
>
> Michael Chermside wrote:
> > I agree. If we have a PEP with rules for acceptance, then every time we
> > don't follow those rules exactly we will be accused of favoritism. If
> > we have informal rules like today and decide things on a case-by-case
> > basis, then everything is fine.
>
> Let me make a suggestion that might help resolve the disagreement.
>
> One of my favorite podcasts is "Life of a Law Student",
> (http://www.lifeofalawstudent.com/) in which a first year law student
> named Neil Wehneman makes a daily podcast of what he learned in law
> school that day. One of the ideas that he talks about (Intro to the Law
> #2) is the difference between a "Rule" and a "Standard":
>
> A 'rule' is a definitive test, intended to provide certainty. An example
> is the speed limit - you are either exceeding the speed limit, or you
> aren't.
>
> A 'standard', on the other hand (at least, in its legal definition) is a
> set of factors to be weighed by a judge when making a decision. Its
> purpose is to provide flexibility, allowing human judgement to stay in
> the loop, but at the same time giving a framework for making those
> judgements in a consistent way.
>
> An example of a standard is fair use under copyright law. When a judge
> decides whether something is fair use, they use a standard consisting of
> a number of factors, including the amount of the work copied, the
> commercial or non-commercial use of the work, and so on.
>
> Note that none of these factors are a simple "yes/no" decision -
> instead, a judgement must be made as to how much a particular case fits
> the standard. A use of a work can be completely commercial, completely
> noncommercial, or something in between. To the extent that it is
> noncommercial, that weighs in favor of it being declared fair use; to
> the extent that it is commercial, that weighs against.
>
> So what I would suggest, then, is the creation of a standard (in this
> legal sense) for what factors should be considered in deciding whether
> to include something in the stdlib.
>
> Moreover, the standard should be clearly labeled as such - to prevent
> people from interpreting the document as a set of hard rules that they
> can use to beat other people over the head with.
>
> So for example, it might say something like: "To the extent that the
> module has enjoyed widespread adoption and use within the Python
> community, this weighs in favor of inclusion." and so on.


At this point, I am dropping the PEP idea and I am going to make it a
general doc at python.org/dev/ when I take my intro doc (
http://www.python.org/dev/intro/) and break it out into individual docs for
bugs, patches, committing, and getting things into the stdlib or language.

So basically I am going with the Standards approach.  =)

-Brett

From guido at python.org  Mon Jun 19 19:22:38 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 19 Jun 2006 10:22:38 -0700
Subject: [Python-3000] We should write a PEP on what goes into the stdlib
In-Reply-To: <bbaeab100606161001v3af8ec61v5db24a6e585a40eb@mail.gmail.com>
References: <20060613062830.55hi3ynpppdd8gc4@login.werra.lunarpages.com>
	<44923EFD.50804@acm.org>
	<bbaeab100606161001v3af8ec61v5db24a6e585a40eb@mail.gmail.com>
Message-ID: <ca471dc20606191022iee0302dkbf808f043e2b6c0d@mail.gmail.com>

I'm coming late to this, and am folding all my comments in a single
email. Short version: Brett, please go ahead! Here are some comments.

[Raymond]
> There isn't a consistent ruleset that explains
> clearly why decimal, elementtree, email, and textwrap were included
> while Cheetah, Twisted, numpy, and BeautifulSoup were not.

Oh yes there is.  Just look at the names alone.  Also release cycles.

> Overly general rules are likely to be rife with exceptions and amount to
> useless administrivia.  I don't think these contentious issues can be
> decided in advance.  The specifics of each case are more relevant than a
> laundry list of generalizations.

It still makes sense to have a list of guidelines.  (a) This tells
potential contributors how high the bar is set.  (b) This helps the
discussion if "fairness" is invoked (why did module X get accepted?).

> We don't need a PEP for every module.  If the python-dev discussion says
> we want it and Guido approves, then it is a done deal.

Agreed (this is the only part where I disagree with Brett).  This is
not to say that a PEP wouldn't be helpful in some cases; but it's not
a requirement.  A PEP is helpful when it is likely that the discussion
will be long or contentious.

[Michael Chermside]
> I agree. If we have a PEP with rules for acceptance, then every time we
> don't follow those rules exactly we will be accused of favoritism. If
> we have informal rules like today and decide things on a case-by-case
> basis, then everything is fine.

I don't think that not having rules avoids accusations of favoritism.
There are many rules and guidelines that are being applied quite
consistently when something is proposed for stdlib inclusion (Brett
didn't even enumerate all of them; for example an oft-cited rule is
that the contributor must commit to maintaining the code for several
years).  It only makes sense to write these up in one place.  Of
course we shouldn't create the expectation that anything that matches
the rules is automatically accepted (that would be insane).

> Rather than a formal PEP, how about a wiki page

Absolutely not!  Wikis have no official status.  Not only can anybody
edit them; there's no process in place to remove them when they are
outdated.

[Talin]
> So what I would suggest, then, is the creation of a standard (in this
> legal sense) for what factors should be considered in deciding whether
> to include something in the stdlib.
>
> Moreover, the standard should be clearly labeled as such - to prevent
> people from interpreting the document as a set of hard rules that they
> can use to beat other people over the head with.

Sounds like a good idea.  Not so different from what I said above
about automatic acceptance based on matching the rules.

[Brett]
> At this point, I am dropping the PEP idea and I am going to make it a
> general doc at python.org/dev/ when I take my intro doc
> (http://www.python.org/dev/intro/) and break it out into individual
> docs for bugs, patches, committing, and getting things into the stdlib
> or language.

I think this is fine; but I don't think it would be wrong to do it as
a PEP.  Having a PEP makes it a bit easier for the community to
participate in discussing its contents, so I think I would have
favored a PEP, but the important thing is that the standard we apply
is documented somewhere.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Jun 19 22:59:12 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 19 Jun 2006 13:59:12 -0700
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <4490E8B6.7010601@acm.org>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>
	<4490E8B6.7010601@acm.org>
Message-ID: <ca471dc20606191359x49abced6ka8251aa0c21e8b2d@mail.gmail.com>

Hi Talin,

Here's how I see it.

The probability of this PEP being accepted doesn't really depend on
whether that particular proposed feature is present. Given all
possible proposed features, it's probably better to err on the side of
exclusion -- a PEP like this is more likely to be rejected due to
excessive baggage than due to lack of functionality, as long as it
covers all the functionality it's replacing. So, I'm with you: try to
get the PEP implemented and accepted before adding too many new
features. Once there's an accepted framework, it's easier to add
features.

(Perhaps there's one use for this particular proposed feature; since
it requires adding yet another parameter to certain formatting
functions, it would be a good test for the generality of the API.
Personally, I wonder if at some point we'll want to pass an arbitrary
argument list, and/or keyword args? That would be a more useful
feature to add to consider for the PEP than a specific decimal
alignment, since it is a feature in support of extensibility.)
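[Editorial note: to make the extensibility idea concrete, a hook taking extra arguments might look something like the sketch below -- format_field and __format_ex__ are pure invention, not anything in PEP 3101:]

```python
# Hypothetical extensibility hook: a formatter that forwards extra
# positional/keyword arguments to an object-defined formatting method,
# falling back to the ordinary format() builtin.
def format_field(value, spec, *args, **kwargs):
    custom = getattr(value, "__format_ex__", None)
    if custom is not None:
        return custom(spec, *args, **kwargs)
    return format(value, spec)

class Money:
    def __init__(self, amount):
        self.amount = amount
    def __format_ex__(self, spec, currency="$"):
        return currency + format(self.amount, spec or ".2f")

print(format_field(Money(3.5), "", currency="$"))  # $3.50
```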

--Guido

On 6/14/06, Talin <talin at acm.org> wrote:
> Rudy Rudolph wrote:
> > Center- and decimal-align nicely round out the left-align,
> > right-align, and pad-after-sign formatting already proposed,
> > and are easy to implement. I therefore ask that they be added
> > to the PEP. BTW, I very much like the proposal in the PEP.
>
> The basic idea of a decimal align option sounds good to me. I even
> started working up an implementation, but I haven't had the time to
> finish it -- I decided to go with the '^' character as an alignment symbol.
>
> Greg Ewing's point that this is effectively the same as "pad with
> spaces" is correct; however, I don't think that's the way most people
> think of it - in other words, generally what people ask for is "line up
> all the decimal points".
>
> Here's my concern however: PEP 3101 is getting rather large, because of
> all of these little details that are ancillary to the primary proposal
> of a 'format' method for string objects. I've already pushed back on
> Nick Coghlan's otherwise excellent suggestion of allowing the same set
> of conversion specifiers to be used as a second argument to str() for
> this reason.
>
> (I thought about breaking out the conversion specifiers into a separate
> PEP, but since they aren't meaningful by themselves it makes no sense to
> accept one PEP and reject another, and also because then I'd have 3 PEPs
> in the Python-3000 queue [including 3102], and right now 2 is as much as
> I want to deal with.)
>
> Because 3101 is targeted at Python-3000, and because 3000 is scheduled
> for release in the distant future, I have no sense as to what the
> timetable is for acceptance or adoption of this PEP; As far as I know,
> it could be a year or more before a decision is made, and the PEP might
> be rejected at the end of that time. So from my point of view, I am
> faced with the prospect of an ever-expanding PEP as people continue to
> think of new suggestions over the course of the next year, all of which
> may come to naught.
>
> My feeling is that a good PEP should contain a limited number of BDFL
> decisions - that is, it should be possible for Guido to go down the
> checklist and accept / reject / suggest changes to a small number of
> essential bullet points. I fear that PEP 3101 is going to turn into a
> kind of omnibus bill with all kinds of little amendments to deal with.
>
> For this reason, I'd like to put some sort of limit on lower-level
> details of the PEP, and let that detail be filled in via the normal
> feature request / patch submission process once the PEP has actually
> been accepted.
>
> I guess what I also need to do is find some place to post my prototype
> so that people can criticize it and submit patches to it. What would be
> ideal for my purposes would be if there was a "research" branch in the
> Python svn so that wild-eyed radicals such as myself could check in code
> that is still being discussed by the community and is not yet intended
> for inclusion in the main tree. This would also allow people who have
> suggestions to submit patches to the prototype, rather than having to
> ask me to do it for them.
>
> -- Talin
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From talin at acm.org  Tue Jun 20 20:21:49 2006
From: talin at acm.org (Talin)
Date: Tue, 20 Jun 2006 11:21:49 -0700
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <ca471dc20606191359x49abced6ka8251aa0c21e8b2d@mail.gmail.com>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>	
	<4490E8B6.7010601@acm.org>
	<ca471dc20606191359x49abced6ka8251aa0c21e8b2d@mail.gmail.com>
Message-ID: <44983CBD.3000703@acm.org>

Guido van Rossum wrote:
> Hi Talin,
> 
> Here's how I see it.
> 
> The probability of this PEP being accepted doesn't really depend on
> whether that particular proposed feature is present. Given all
> possible proposed features, it's probably better to err on the side of
> exclusion -- a PEP like this is more likely to be rejected due to
> excessive baggage than due to lack of functionality, as long as it
> covers all the functionality it's replacing. So, I'm with you: try to
> get the PEP implemented and accepted before adding too many new
> features. Once there's an accepted framework, it's easier to add
> features.
> 
> (Perhaps there's one use for this particular proposed feature; since
> it requires adding yet another parameter to certain formatting
> functions, it would be a good test for the generality of the API.
> Personally, I wonder if at some point we'll want to pass an arbitrary
> argument list, and/or keyword args? That would be a more useful
> feature to add to consider for the PEP than a specific decimal
> alignment, since it is a feature in support of extensibility.)

Well, one of the design goals for conversion specifiers is conciseness. 
I suspect you would get a lot of complaints if the conversion specifiers 
grew much longer than they currently are. So it's a balancing act 
between conciseness and readability.

There are two reasons for this: First, TOOWTDI. Anything you can do with 
a conversion specifier can be done by pre-processing the parameter that 
you pass into the format function. Allowing arbitrary conversion syntax 
would essentially mean creating a new language-within-a-language that 
would duplicate functionality that is better expressed by function calls.

Secondly, the conversion specifiers should not visually dominate or 
distract from the format string. That is, when reading the format 
string, you should be able to mentally skip over the conversion strings 
without too much trouble. This is much easier if they are short.

So in other words, I'm not trying to make the most general API 
possible; instead I'm looking for "low-hanging fruit" - that is, useful 
features that can be expressed in one or two characters without 
sacrificing overall readability. Anything that requires more than that 
should be done by function calls.
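[Editorial note: for instance, the zero-suppression request discussed earlier in the thread can be met by a plain helper applied to the value before formatting, with no new specifier syntax -- strip_zeros is a hypothetical function, not part of any proposal:]

```python
# A plain function replaces a would-be spec extension: strip trailing
# zeros (and a dangling decimal point) before handing the string off
# to an ordinary width specifier.
def strip_zeros(s):
    if "." in s:
        return s.rstrip("0").rstrip(".")
    return s

for v in (2.500, 3.000, 123.456):
    print("%10s" % strip_zeros("%.3f" % v))
```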

While you are here, I'd like to ask a couple questions:

1) Do you have any reaction to Brett Cannon's idea that we add a second, 
optional argument to str() that accepts exactly the same conversion 
specifier syntax? Should I incorporate that into the PEP, or should that 
be a separate PEP?

2) What's your feeling (and this isn't just directed at you) about 
having a sandbox area in the svn repository that's open to general 
modification, kind of like the code version of a wiki? Or, to put it 
another way, what's the best place to put my code so that people have 
the ability to hack on it?

-- Talin

From guido at python.org  Tue Jun 20 20:39:44 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Jun 2006 11:39:44 -0700
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <44983CBD.3000703@acm.org>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>
	<4490E8B6.7010601@acm.org>
	<ca471dc20606191359x49abced6ka8251aa0c21e8b2d@mail.gmail.com>
	<44983CBD.3000703@acm.org>
Message-ID: <ca471dc20606201139g5fef5dd0w5dbfa67cf2e3ce93@mail.gmail.com>

On 6/20/06, Talin <talin at acm.org> wrote:
> While you are here, I'd like to ask a couple questions:
>
> 1) Do you have any reaction to Brett Cannon's idea that we add a second,
> optional argument to str() that accepts exactly the same conversion
> specifier syntax? Should I incorporate that into the PEP, or should that
> be a separate PEP?

Not so keen. This seems to be a completely different use of str(). If
we want that API it should be called something else. I don't see an
advantage of overloading str().

> 2) What's your feeling (and this isn't just directed at you) about
> having a sandbox area in the svn repository that's open to general
> modification, kind of like the code version of a wiki? Or, to put it
> another way, what's the best place to put my code so that people have
> the ability to hack on it?

The svn access controls make this impossible AFAIK (but I know very
little about them). I suggest you use one of the more distributed
alternatives, e.g. Mercurial (I keep hearing good things about it).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From talin at acm.org  Thu Jun 22 04:07:59 2006
From: talin at acm.org (Talin)
Date: Wed, 21 Jun 2006 19:07:59 -0700
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <ca471dc20606201139g5fef5dd0w5dbfa67cf2e3ce93@mail.gmail.com>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>	
	<4490E8B6.7010601@acm.org>	
	<ca471dc20606191359x49abced6ka8251aa0c21e8b2d@mail.gmail.com>	
	<44983CBD.3000703@acm.org>
	<ca471dc20606201139g5fef5dd0w5dbfa67cf2e3ce93@mail.gmail.com>
Message-ID: <4499FB7F.9030604@acm.org>

Guido van Rossum wrote:
> On 6/20/06, Talin <talin at acm.org> wrote:
> 
>> While you are here, I'd like to ask a couple questions:
>>
>> 1) Do you have any reaction to Brett Cannon's idea that we add a second,
>> optional argument to str() that accepts exactly the same conversion
>> specifier syntax? Should I incorporate that into the PEP, or should that
>> be a separate PEP?
> 
> 
> Not so keen. This seems to be a completely different use of str(). If
> we want that API it should be called something else. I don't see an
> advantage of overloading str().

Before we dismiss that too quickly, let me do a better job of explaining 
the general idea.

The motivation for this is converting an arbitrary value to string form 
- which is exactly what str() does. Only in this case, we want to be 
able to have some control over the formatting of that string.

Converting single values to strings using operator % looks something 
like this:

    s = "%2.2g" % f

With str.format(), the single-conversion case gets a bit more wordy:

    s = "{0:2.2g}".format( f )

Instead of all that, why not allow the conversion to be passed to the 
str() constructor directly:

    s = str( f, "2.2g" )

It doesn't actually have to be called "str", you could say, for example:

    s = str.convert( f, "2.2g" )

However, the str() form is more concise and more readable than any of 
the alternatives presented here. I think it's pretty clear what is 
intended (especially since C# has a similar syntax for its "ToString()" 
method.)

In any case, I think there's a pretty good argument that one ought to be 
able to convert single values without having to embed them as fields 
within a string template.

Now, my personal motive for this is that it allows me to cut my PEP in 
half - because the logic of the conversion specifiers can be isolated 
and used directly, without having to go through format().

-- Talin

From greg.ewing at canterbury.ac.nz  Thu Jun 22 09:42:08 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 22 Jun 2006 19:42:08 +1200
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <4499FB7F.9030604@acm.org>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>
	<4490E8B6.7010601@acm.org>
	<ca471dc20606191359x49abced6ka8251aa0c21e8b2d@mail.gmail.com>
	<44983CBD.3000703@acm.org>
	<ca471dc20606201139g5fef5dd0w5dbfa67cf2e3ce93@mail.gmail.com>
	<4499FB7F.9030604@acm.org>
Message-ID: <449A49D0.8020108@canterbury.ac.nz>

Talin wrote:

>     s = str.convert( f, "2.2g" )

If format is a string method, then you will already be
able to do

   s = str.format("2.2g", f)

if you want.

--
Greg

From ncoghlan at gmail.com  Thu Jun 22 11:49:34 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 22 Jun 2006 19:49:34 +1000
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <449A49D0.8020108@canterbury.ac.nz>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>	<4490E8B6.7010601@acm.org>	<ca471dc20606191359x49abced6ka8251aa0c21e8b2d@mail.gmail.com>	<44983CBD.3000703@acm.org>	<ca471dc20606201139g5fef5dd0w5dbfa67cf2e3ce93@mail.gmail.com>	<4499FB7F.9030604@acm.org>
	<449A49D0.8020108@canterbury.ac.nz>
Message-ID: <449A67AE.3090002@gmail.com>

Greg Ewing wrote:
> Talin wrote:
> 
>>     s = str.convert( f, "2.2g" )
> 
> If format is a string method, then you will already be
> able to do
> 
>    s = str.format("2.2g", f)
> 
> if you want.

Nope. Given the current PEP, it'd have to be one of the following:

   s = "{0:2.2g}".format(f)
   s = str.format("{0:2.2g}", f)

However, I realised that there's an approach that is aesthetically pleasing 
and doesn't require using str() for this - simply consider the leading '{0:' 
and trailing '}' to be implicit if there are no braces at all in the supplied 
format string.

Then you could do things like:

 >>> "b".format(10)
1010
 >>> "o".format(10)
12
 >>> "x".format(10)
a
 >>> "X".format(10)
A
 >>> "2.2g".format(10)
10.00

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From murman at gmail.com  Thu Jun 22 15:43:50 2006
From: murman at gmail.com (Michael Urman)
Date: Thu, 22 Jun 2006 08:43:50 -0500
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <449A67AE.3090002@gmail.com>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>
	<4490E8B6.7010601@acm.org>
	<ca471dc20606191359x49abced6ka8251aa0c21e8b2d@mail.gmail.com>
	<44983CBD.3000703@acm.org>
	<ca471dc20606201139g5fef5dd0w5dbfa67cf2e3ce93@mail.gmail.com>
	<4499FB7F.9030604@acm.org> <449A49D0.8020108@canterbury.ac.nz>
	<449A67AE.3090002@gmail.com>
Message-ID: <dcbbbb410606220643u615a5d18pbd1993b35b67e6bf@mail.gmail.com>

On 6/22/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> However, I realised that there's an approach that is aesthetically pleasing
> and doesn't require using str() for this - simply consider the leading '{0:'
> and trailing '}' to be implicit if there are no braces at all in the supplied
> format string.
>
> Then you could do things like:
[examples with missing quotes omitted]

And
>>> "The implicit braces scare me, for I am weak".format(10)
'ValueError'

(Assuming lenient mode, and that str.format raises ValueError for such a case)

Michael
-- 
Michael Urman  http://www.tortall.net/mu/blog

From guido at python.org  Thu Jun 22 18:54:51 2006
From: guido at python.org (Guido van Rossum)
Date: Thu, 22 Jun 2006 09:54:51 -0700
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <449A67AE.3090002@gmail.com>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>
	<4490E8B6.7010601@acm.org>
	<ca471dc20606191359x49abced6ka8251aa0c21e8b2d@mail.gmail.com>
	<44983CBD.3000703@acm.org>
	<ca471dc20606201139g5fef5dd0w5dbfa67cf2e3ce93@mail.gmail.com>
	<4499FB7F.9030604@acm.org> <449A49D0.8020108@canterbury.ac.nz>
	<449A67AE.3090002@gmail.com>
Message-ID: <ca471dc20606220954n13da682en61d3b2e40d477abf@mail.gmail.com>

On 6/22/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> However, I realised that there's an approach that is aesthetically pleasing
> and doesn't require using str() for this - simply consider the leading '{0:'
> and trailing '}' to be implicit if there are no braces at all in the supplied
> format string.

-1. Explicit is better than implicit. It would encourage Python to
guess when there are no braces but there is a format argument, instead
of throwing an exception.

To Talin: I'm all for a way to say blah(x, "2.2g") instead of the more
verbose "{2.2g}".format(x). In fact it would probably be great if the
latter was officially defined as a way to spell the former combined
with literal text:

  "foo{2.2g}bar{3.3f}spam".format(x, y)

is shorter and more readable than

  "foo" + blah(x, "2.2g") + "bar" + blah(y, "3.3f") + "spam"

What I object to is only the spelling of blah(x, f) as str(x, f).
Perhaps a static string method; but probably better some other
built-in or something in a new stdlib module.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From talin at acm.org  Fri Jun 23 08:14:01 2006
From: talin at acm.org (Talin)
Date: Thu, 22 Jun 2006 23:14:01 -0700
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <ca471dc20606220954n13da682en61d3b2e40d477abf@mail.gmail.com>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>	<4490E8B6.7010601@acm.org>	<ca471dc20606191359x49abced6ka8251aa0c21e8b2d@mail.gmail.com>	<44983CBD.3000703@acm.org>	<ca471dc20606201139g5fef5dd0w5dbfa67cf2e3ce93@mail.gmail.com>	<4499FB7F.9030604@acm.org>
	<449A49D0.8020108@canterbury.ac.nz>	<449A67AE.3090002@gmail.com>
	<ca471dc20606220954n13da682en61d3b2e40d477abf@mail.gmail.com>
Message-ID: <449B86A9.8010009@acm.org>

Guido van Rossum wrote:
> To Talin: I'm all for a way to say blah(x, "2.2g") instead of the more
> verbose "{2.2g}".format(x). In fact it would probably be great if the
> latter was officially defined as a way to spell the former combined
> with literal text:
> 
>   "foo{2.2g}bar{3.3f}spam".format(x, y)
> 
> is shorter and more readable than
> 
>   "foo" + blah(x, "2.2g") + "bar" + blah(y, "3.3f") + "spam"
> 
> What I object to is only the spelling of blah(x, f) as str(x, f).
> Perhaps a static string method; but probably better some other
> built-in or something in a new stdlib module.

OK, how about this:

    y.tostr("3.3f")

Essentially I'm proposing adding an overridable method named 'tostr' (or 
some better name if you can think of one) to class 'object'.

Advantages over a builtin:

   -- Doesn't add another global name
   -- Easily overridable by subclasses (gets rid of the need for a 
__format__ call in my PEP.)
   -- If we make the conversion argument optional, it could eventually 
replace the magic __str__ method.
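A toy sketch of how such an overridable method could behave (every name 
here is hypothetical; none of this is part of any PEP):

```python
class Formattable:
    """Toy stand-in for putting a 'tostr' method on class object."""
    def tostr(self, spec=""):
        # Default behaviour: defer to built-in formatting; subclasses
        # override this to interpret the conversion spec themselves.
        return format(self, spec) if spec else str(self)

class Celsius(float, Formattable):
    def tostr(self, spec=".1f"):
        return format(float(self), spec) + " C"

print(Celsius(36.6666).tostr())        # '36.7 C'
```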

-- Talin

From guido at python.org  Fri Jun 23 19:27:21 2006
From: guido at python.org (Guido van Rossum)
Date: Fri, 23 Jun 2006 10:27:21 -0700
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <449B86A9.8010009@acm.org>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>
	<4490E8B6.7010601@acm.org>
	<ca471dc20606191359x49abced6ka8251aa0c21e8b2d@mail.gmail.com>
	<44983CBD.3000703@acm.org>
	<ca471dc20606201139g5fef5dd0w5dbfa67cf2e3ce93@mail.gmail.com>
	<4499FB7F.9030604@acm.org> <449A49D0.8020108@canterbury.ac.nz>
	<449A67AE.3090002@gmail.com>
	<ca471dc20606220954n13da682en61d3b2e40d477abf@mail.gmail.com>
	<449B86A9.8010009@acm.org>
Message-ID: <ca471dc20606231027g5c1c1445p4bf5c17e571ed3e3@mail.gmail.com>

On 6/22/06, Talin <talin at acm.org> wrote:
> Guido van Rossum wrote:
> > To Talin: I'm all for a way to say blah(x, "2.2g") instead of the more
> > verbose "{2.2g}".format(x). In fact it would probably be great if the
> > latter was officially defined as a way to spell the former combined
> > with literal text:
> >
> >   "foo{2.2g}bar{3.3f}spam".format(x, y)
> >
> > is shorter and more readable than
> >
> >   "foo" + blah(x, "2.2g") + "bar" + blah(y, "3.3f") + "spam"
> >
> > What I object to is only the spelling of blah(x, f) as str(x, f).
> > Perhaps a static string method; but probably better some other
> > built-in or something in a new stdlib module.
>
> OK, how about this:
>
>     y.tostr("3.3f")
>
> Essentially I'm proposing adding an overridable method named 'tostr' (or
> some better name if you can think of one) to class 'object'.
>
> Advantages over a builtin:
>
>    -- Doesn't add another global name
>    -- Easily overridable by subclasses (gets rid of the need for a
> __format__ call in my PEP.)
>    -- If we make the conversion argument optional, it could eventually
> replace the magic __str__ method.

I'm not sure that every object should have this method.

Please consider making it just a method in a stdlib module.

Perhaps it could use overloaded functions.

IMO the PEP would do best not to add new builtins or object methods.
(A __format__ method is OK since it just mimics the standard idiom for
providing overridable type-specific operations; but perhaps
overloadable functions are a better alternative.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ncoghlan at gmail.com  Sat Jun 24 04:47:40 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 24 Jun 2006 12:47:40 +1000
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <ca471dc20606231027g5c1c1445p4bf5c17e571ed3e3@mail.gmail.com>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>	<4490E8B6.7010601@acm.org>	<ca471dc20606191359x49abced6ka8251aa0c21e8b2d@mail.gmail.com>	<44983CBD.3000703@acm.org>	<ca471dc20606201139g5fef5dd0w5dbfa67cf2e3ce93@mail.gmail.com>	<4499FB7F.9030604@acm.org>
	<449A49D0.8020108@canterbury.ac.nz>	<449A67AE.3090002@gmail.com>	<ca471dc20606220954n13da682en61d3b2e40d477abf@mail.gmail.com>	<449B86A9.8010009@acm.org>
	<ca471dc20606231027g5c1c1445p4bf5c17e571ed3e3@mail.gmail.com>
Message-ID: <449CA7CC.6040506@gmail.com>

Guido van Rossum wrote:
 >  "foo{2.2g}bar{3.3f}spam".format(x, y)

Getting a format string like that to work would be tricky. With the current 
PEP, it would need to be:

   "foo{0:2.2g}bar{1:3.3f}spam".format(x, y)

It should be possible to simplify that without ambiguity to:

   "foo{:2.2g}bar{:3.3f}spam".format(x, y)

by having an internal counter in the format function that kept track of how 
many fields had been encountered that didn't refer to a specific position or 
name. That is, either the braces were empty ('{}'), or there was nothing 
before the conversion specifier ('{:<spec>}').

To get shorter than that, however, you'd be getting into territory where the 
interpreter is trying to guess the programmer's intent (e.g. is '{0f}' 
intentionally just a conversion specifier, or is it a typo for '{0:f}'?). So I 
think going that far falls foul of EIBTI, the same way my idea of an implicit 
"{0:" and "}" did.

I like the internal counter concept though - it means that purely positional 
stuff can be written without any additional mental overhead, and with the 
braces being the only additional typing when compared to the status quo.

That way, if you didn't have any field formatting you wanted to do, you could 
just write:

   "{} picks up the {}. It is {}!".format(person, thing, adjective)

> I'm not sure that every object should have this method.
> 
> Please consider making it just a method in a stdlib module.
> 
> Perhaps it could use overloaded functions.
> 
> IMO the PEP would do best not to add new builtins or object methods.
> (A __format__ method is OK since it just mimics the standard idiom for
> providing overridable type-specific operations; but perhaps
> overloadable functions are a better alternative.)

Since the PEP calls "2.2g" and friends conversion specifiers, how about we use 
an overloaded function "string.convert"?

   # In string.py
   @overloaded
   def convert(obj, spec):
       """Converts an object to a string using a conversion specifier"""
       # Default handling is to convert as per PEP 3101
       #   (AKA the "format_builtin_type" function in Talin's prototype)


Objects with alternate conversion specifiers (like datetime objects) would 
simply overload the function:

   # In datetime.py
   @atimport("string")
   def _string_overloads(module):
       """Register function overloads in string module"""
       overload(module.convert, time)(time.strftime)
       overload(module.convert, date)(date.strftime)
       overload(module.convert, datetime)(datetime.strftime)

The "cformat" function from Talin's prototype could then be named 
"string.format", with the signature:

   # In string.py
   def format(fmt, positional=(), named=None, field_hook=None):
       """Create a formatted string from positional and named values"""
       # Format as per PEP 3101
       #   (AKA the "cformat" function in Talin's prototype)

Finally, the format method of str objects would use the above:

   # Method of str objects
   def format(self, *args, **kwds):
       from string import format
       return format(self, args, kwds)

So if you had an existing tuple and/or dictionary, you could do "from string 
import format" and use the function directly in order to save creation of an 
unnecessary copy of the containers.
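For anyone who wants to play with the dispatch idea before any 
@overloaded machinery exists, here's a minimal self-contained sketch 
using a plain dict as the registry (all names hypothetical):

```python
import datetime

# Hypothetical stand-in for the proposed overloading machinery:
# a dict mapping types to conversion functions.
_converters = {}

def register(tp, func):
    """Register func as the converter for instances of tp."""
    _converters[tp] = func

def convert(obj, spec):
    """Convert obj to a string using a conversion specifier."""
    for tp in type(obj).__mro__:       # honour subclass registrations
        if tp in _converters:
            return _converters[tp](obj, spec)
    return format(obj, spec)           # default: built-in formatting

# Example overload: dates interpret the spec as a strftime format.
# Since datetime inherits from date, the MRO walk makes this single
# registration cover both types.
register(datetime.date, lambda d, spec: d.strftime(spec))

print(convert(10, "x"))                            # 'a'
print(convert(datetime.date(2006, 6, 24), "%Y-%m-%d"))
```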

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From talin at acm.org  Sat Jun 24 08:17:05 2006
From: talin at acm.org (Talin)
Date: Fri, 23 Jun 2006 23:17:05 -0700
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <ca471dc20606201139g5fef5dd0w5dbfa67cf2e3ce93@mail.gmail.com>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>	
	<4490E8B6.7010601@acm.org>	
	<ca471dc20606191359x49abced6ka8251aa0c21e8b2d@mail.gmail.com>	
	<44983CBD.3000703@acm.org>
	<ca471dc20606201139g5fef5dd0w5dbfa67cf2e3ce93@mail.gmail.com>
Message-ID: <449CD8E1.7030306@acm.org>

Guido van Rossum wrote:
> On 6/20/06, Talin <talin at acm.org> wrote:
> The svn access controls make this impossible AFAIK (but I know very
> little about them). I suggest you use one of the more distributed
> alternatives, e.g. Mercurial (I keep hearing good things about it).

All right, I spent some time playing around with Mercurial and so far I 
am pretty impressed. Particularly with the fact that it can be used as a 
.cgi script. Within an hour after downloading the Mercurial source I was 
able to:

   -- Compile and install the package on my laptop
   -- Create an initial repository
   -- Check in my prototype
   -- Compile and install Mercurial on the web server machine (I have an 
account at bluehost.com)
   -- Propagate the changes from my laptop to the server
   -- Set up Mercurial to function as a .cgi script
   -- Write a .htaccess file to tell Apache to use it

You can see the result here:

    http://www.viridia.org/hg/python/string_format

(My god, there's even an RSS feed. Sheesh!)

I'd invite all interested parties to take a look at the code. Some of 
it's pretty experimental and I am sure that there are better ways to do 
it. But right now it's primarily intended as a proof of concept.

-- Talin

From tomerfiliba at gmail.com  Sat Jun 24 16:25:28 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Sat, 24 Jun 2006 16:25:28 +0200
Subject: [Python-3000] sock2 v0.6
Message-ID: <1d85506f0606240725g4702c7bfw37021a1297c197e2@mail.gmail.com>

I updated the sock2 package. This release:
* added all the socket options that are defined in socketmodule.c
* redesigned the DNS module
* updated the design docs on the site

http://sebulba.wikispaces.com/project+sock2

Please download and mess with it a little, and send back your
comments. Thanks.


-tomer
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060624/a62d59c1/attachment.html 

From murman at gmail.com  Sat Jun 24 17:55:09 2006
From: murman at gmail.com (Michael Urman)
Date: Sat, 24 Jun 2006 10:55:09 -0500
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <449CA7CC.6040506@gmail.com>
References: <20060614194514.3F76B8B354@xprdmxin.myway.com>
	<44983CBD.3000703@acm.org>
	<ca471dc20606201139g5fef5dd0w5dbfa67cf2e3ce93@mail.gmail.com>
	<4499FB7F.9030604@acm.org> <449A49D0.8020108@canterbury.ac.nz>
	<449A67AE.3090002@gmail.com>
	<ca471dc20606220954n13da682en61d3b2e40d477abf@mail.gmail.com>
	<449B86A9.8010009@acm.org>
	<ca471dc20606231027g5c1c1445p4bf5c17e571ed3e3@mail.gmail.com>
	<449CA7CC.6040506@gmail.com>
Message-ID: <dcbbbb410606240855j624faff8q5f41b8d710002b5c@mail.gmail.com>

On 6/23/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> I like the internal counter concept though - it means that purely positional
> stuff can be written without any additional mental overhead, and with the
> braces being the only additional typing when compared to the status quo.
>
> That way, if you didn't have any field formatting you wanted to do, you could
> just write:
>
>    "{} picks up the {}. It is {}!".format(person, thing, adjective)

I don't like this, as it makes it easy to fall into a localization
trap which the original programmer may have no reason to predict.
While it would be possible to add indices to the translated format
string (unlike the usual C/C++ equivalent), it would make things much
more confusing, possibly tempting constructs like "{1} ... {0} ...
{}". I doubt most translators would be intimately familiar with
Python's format specification rules.

I would much prefer the consistent explicit counter (or lookup-key) in
the format specifier.

Michael
-- 
Michael Urman  http://www.tortall.net/mu/blog