From ncoghlan at gmail.com  Wed Apr  1 00:03:05 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 01 Apr 2009 08:03:05 +1000
Subject: [Python-Dev] And the winner is...
In-Reply-To: <87y6ulvdb4.fsf@xemacs.org>
References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com>	<49D20FB3.9050400@gmail.com>
	<87y6ulvdb4.fsf@xemacs.org>
Message-ID: <49D29319.2010902@gmail.com>

Stephen J. Turnbull wrote:
> Nick Coghlan writes:
> 
>  > Every single git command line example I have seen gives me exactly the
>  > same gut reaction I get whenever I have to read Perl code.
> 
> Every single one?  Sounds to me like the cause is probably something
> you ate, not anything you read.  In the examples in the PEP, about 80%
> of the commands were syntactically identical across VCSes.

What, hyperbole on the internets? ;)

The non-trivial examples are the ones I was talking about - as you say,
for trivial tasks, the only difference is typically going to be in the
exact name of the command.

> I hope nobody is put off either git or bzr by the result of this PEP.
> If there's anything striking about the PEP's examples, it's how
> similar the usage of the VCSes would be in the context of Python's
> workflow.  There are important differences, and I agree with Guido's
> choice, for Python, on March 30, 2009.  But all three are capable
> VCSes, with advantages and disadvantages, and were this PEP started
> next June rather than last December, the result could have been very
> different.

Indeed! (although I doubt git's CLI will ever evolve into anything I
could claim to love)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From eduardo.padoan at gmail.com  Wed Apr  1 00:20:49 2009
From: eduardo.padoan at gmail.com (Eduardo O. Padoan)
Date: Tue, 31 Mar 2009 19:20:49 -0300
Subject: [Python-Dev] And the winner is...
In-Reply-To: <3c6c07c20903311104i6b50a9eeg3362ade5cf981c5c@mail.gmail.com>
References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com>
	<874oxaw95q.fsf@xemacs.org>
	<3c6c07c20903311104i6b50a9eeg3362ade5cf981c5c@mail.gmail.com>
Message-ID: <dea92f560903311520n6495ec27w4b5c701a1c2fabcb@mail.gmail.com>

On Tue, Mar 31, 2009 at 3:04 PM, Mike Coleman <tutufan at gmail.com> wrote:
> It looks like there might be a Python clone sprouting here:
>
>    http://gitorious.org/projects/git-python/

AFAIK, git-python is just a lib to manipulate git repos from python,
not a git clone. Dulwich is more like it:

http://samba.org/~jelmer/dulwich/

-- 
    Eduardo de Oliveira Padoan
http://importskynet.blogspot.com
http://djangopeople.net/edcrypt/

"Distrust those in whom the desire to punish is strong."
   -- Goethe, Nietzsche, Dostoevsky

From martin at v.loewis.de  Wed Apr  1 00:44:29 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 31 Mar 2009 17:44:29 -0500
Subject: [Python-Dev] Test failures under Windows?
In-Reply-To: <m21vsdxuye.fsf@valheru.db3l.homeip.net>
References: <loom.20090324T134451-902@post.gmane.org>	<m2zlfa241x.fsf@valheru.db3l.homeip.net>	<930F189C8A437347B80DF2C156F7EC7F05068D5486@exchis.ccp.ad.local>	<m2vdpy21u0.fsf@valheru.db3l.homeip.net>	<m2ljqu1pzh.fsf@valheru.db3l.homeip.net>	<49C9EEB5.2090804@gmail.com>	<930F189C8A437347B80DF2C156F7EC7F056D526790@exchis.ccp.ad.local>	<d2155e360903250553u2e134d61rc005e8817ad82a60@mail.gmail.com>	<930F189C8A437347B80DF2C156F7EC7F056D5272C8@exchis.ccp.ad.local>
	<m21vsdxuye.fsf@valheru.db3l.homeip.net>
Message-ID: <49D29CCD.9000701@v.loewis.de>

> I guess I'll stop asking after this note, but can anyone give a final
> verdict on whether the older "-n" option can be restored to the
> buildbot test.bat (from the revision history I'm not actually sure it
> was intentionally removed in the first place)? 

I have now restored it; it was removed by an unintentional merge
from the trunk.

Notice, however, that the feature was never present in the trunk.

Regards,
Martin

From greg.ewing at canterbury.ac.nz  Wed Apr  1 00:47:23 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 01 Apr 2009 10:47:23 +1200
Subject: [Python-Dev] Broken import?
In-Reply-To: <49D20B54.1010108@gmail.com>
References: <ca471dc20903301544l3749c80i3d166fbb8780f502@mail.gmail.com>
	<ca471dc20903301617ydbb4bc9pfb221c2cba8df7e4@mail.gmail.com>
	<ca471dc20903301652w63dac59dn3275620c3b220fd3@mail.gmail.com>
	<gqrqv0$8v2$1@ger.gmane.org> <gqrs4i$b1l$1@ger.gmane.org>
	<49D20B54.1010108@gmail.com>
Message-ID: <49D29D7B.7000002@canterbury.ac.nz>

Nick Coghlan wrote:

> Jim Fulton's example in that tracker issue shows that with a bit of
> creativity you can provoke this behaviour *without* using a from-style
> import. Torsten Bronger later brought up the same issue that Fredrik did
> - it prevents some kinds of explicit relative import that look like they
> should be fine.

I haven't been following this very closely, but if there's
something that's making absolute and relative imports
behave differently, I think it should be fixed. The only
difference between an absolute and relative import of the
same module should be the way you specify the module.
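
A minimal sketch of that expectation, assuming a Python 3 interpreter and using the stdlib json package purely as a stand-in (it is not the package from the tracker issue). An absolute and a relative spelling of the same module should resolve to the very same module object:

```python
import importlib

# Absolute spelling: the full dotted path.
absolute = importlib.import_module("json.decoder")

# Relative spelling: a leading dot, resolved against an anchor package.
relative = importlib.import_module(".decoder", package="json")

# Only the way the module is named differs; the object is shared.
same_module = absolute is relative
print(same_module)
```

Any case where the two spellings end up with different objects, or where only one of them succeeds, would be the kind of discrepancy discussed here.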

-- 
Greg

From kristjan at ccpgames.com  Wed Apr  1 01:28:38 2009
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Tue, 31 Mar 2009 23:28:38 +0000
Subject: [Python-Dev] Test failures under Windows?
In-Reply-To: <4222a8490903311459m68e7b9f8m8cfcf27aa71b92ac@mail.gmail.com>
References: <loom.20090324T134451-902@post.gmane.org>
	<930F189C8A437347B80DF2C156F7EC7F05068D5486@exchis.ccp.ad.local>
	<m2vdpy21u0.fsf@valheru.db3l.homeip.net>
	<m2ljqu1pzh.fsf@valheru.db3l.homeip.net> <49C9EEB5.2090804@gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F056D526790@exchis.ccp.ad.local>
	<d2155e360903250553u2e134d61rc005e8817ad82a60@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F056D5272C8@exchis.ccp.ad.local>
	<4222a8490903311431u351b8d7kfb1e6cca716b1976@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F056D527435@exchis.ccp.ad.local>
	<4222a8490903311459m68e7b9f8m8cfcf27aa71b92ac@mail.gmail.com>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F056D527439@exchis.ccp.ad.local>

Revision 70843.
I don't know when this crept in.  I didn't go and check if it applies to other branches too.

Also, I'm sorry for just checking this in without warning.  But I had just spent what amounts to a full day tracking this down, which was tricky because it happens in a subprocess and those are hard to debug on Windows.  My eagerness got the best of me.

But again, it shows how useful assertions can be and why we ought not to disable them.

Cheers, 
Kristján
-----Original Message-----
From: Jesse Noller [mailto:jnoller at gmail.com] 
Sent: 31. mars 2009 22:00
To: Kristján Valur Jónsson
Cc: Curt Hagenlocher; mhammond at skippinet.com.au; David Bolen; python-dev at python.org
Subject: Re: [Python-Dev] Test failures under Windows?

Does it need to be backported? I wonder when that was introduced.
Also, what CL was it so I can review it?

2009/3/31 Kristján Valur Jónsson <kristjan at ccpgames.com>:
> I found a different problem in multiprocessing, for the py3k.
> In import.c, get_file.c, it was knowingly leaking FILE objects, while the underlying fh was being correctly closed.  This caused the CRT to assert when cleaning up FILE pointers on subprocess exit.
> I fixed this this afternoon in a submission to the py3k branch.
>
> K

From greg.ewing at canterbury.ac.nz  Wed Apr  1 01:50:40 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 01 Apr 2009 11:50:40 +1200
Subject: [Python-Dev] And the winner is...
In-Reply-To: <3c6c07c20903311104i6b50a9eeg3362ade5cf981c5c@mail.gmail.com>
References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com>
	<874oxaw95q.fsf@xemacs.org>
	<3c6c07c20903311104i6b50a9eeg3362ade5cf981c5c@mail.gmail.com>
Message-ID: <49D2AC50.5040107@canterbury.ac.nz>

Mike Coleman wrote:

> I mentioned this once on the git list and Linus' response was
> something like "C lets me see exactly what's going on".  I'm not
> unsympathetic to this point of view--I'm really growing to loathe C++
> partly because it *doesn't* let me see exactly what's going on--but
> I'm not convinced, either.

I think Python lets you see exactly what's going on
too, at the level of abstraction you're working with.

The problem with C++ is that it indiscriminately mixes
up wildly different levels of abstraction, so that it's
hard to look at a piece of code and decide whether it's
doing something high-level or low-level.

Python takes a uniformly high-level view of everything,
which is fine for the vast majority of application
programming, I think -- VCSes included.

-- 
Greg

From tseaver at palladion.com  Wed Apr  1 02:42:41 2009
From: tseaver at palladion.com (Tres Seaver)
Date: Tue, 31 Mar 2009 20:42:41 -0400
Subject: [Python-Dev] And the winner is...
In-Reply-To: <874oxaw95q.fsf@xemacs.org>
References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com>
	<874oxaw95q.fsf@xemacs.org>
Message-ID: <gquda2$v73$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Stephen J. Turnbull wrote:

> I also just wrote a long post about the comparison of bzr to hg
> responding to a comment on bazaar at canonical.com.  I won't recap it
> here but it might be of interest.

Thank you very much for your writeups on that thread:  both in tone and
in content I found them extremely helpful.

Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJ0riB+gerLs4ltQ4RAir2AJ4rXedI4gfkaZxP5LRiOSonAI/csQCgqkpb
CY6QHmE8VHpGYGaENeUMnXQ=
=t/1R
-----END PGP SIGNATURE-----

From db3l.net at gmail.com  Wed Apr  1 02:50:44 2009
From: db3l.net at gmail.com (David Bolen)
Date: Tue, 31 Mar 2009 20:50:44 -0400
Subject: [Python-Dev] Test failures under Windows?
References: <loom.20090324T134451-902@post.gmane.org>
	<m2zlfa241x.fsf@valheru.db3l.homeip.net>
	<930F189C8A437347B80DF2C156F7EC7F05068D5486@exchis.ccp.ad.local>
	<m2vdpy21u0.fsf@valheru.db3l.homeip.net>
	<m2ljqu1pzh.fsf@valheru.db3l.homeip.net>
	<49C9EEB5.2090804@gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F056D526790@exchis.ccp.ad.local>
	<d2155e360903250553u2e134d61rc005e8817ad82a60@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F056D5272C8@exchis.ccp.ad.local>
	<m21vsdxuye.fsf@valheru.db3l.homeip.net>
	<49D29CCD.9000701@v.loewis.de>
Message-ID: <m2r60dw61n.fsf@valheru.db3l.homeip.net>

"Martin v. Löwis" <martin at v.loewis.de> writes:

> Notice, however, that the feature was never present in the trunk.

Yep - would be nice if it were to get backported to trunk at some
point but that's a separate discussion ... presumably at some point
py3k will be the trunk anyway, and for better or worse (perhaps due to
the sorts of changes made) the assertions seem to have hit the py3k
branch more than others.

Thanks for the test.bat change.

-- David

From db3l.net at gmail.com  Wed Apr  1 02:51:57 2009
From: db3l.net at gmail.com (David Bolen)
Date: Tue, 31 Mar 2009 20:51:57 -0400
Subject: [Python-Dev] Test failures under Windows?
References: <loom.20090324T134451-902@post.gmane.org>
	<930F189C8A437347B80DF2C156F7EC7F05068D5486@exchis.ccp.ad.local>
	<m2vdpy21u0.fsf@valheru.db3l.homeip.net>
	<m2ljqu1pzh.fsf@valheru.db3l.homeip.net>
	<49C9EEB5.2090804@gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F056D526790@exchis.ccp.ad.local>
	<d2155e360903250553u2e134d61rc005e8817ad82a60@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F056D5272C8@exchis.ccp.ad.local>
	<4222a8490903311431u351b8d7kfb1e6cca716b1976@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F056D527435@exchis.ccp.ad.local>
	<4222a8490903311459m68e7b9f8m8cfcf27aa71b92ac@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F056D527439@exchis.ccp.ad.local>
Message-ID: <m2myb1w5zm.fsf@valheru.db3l.homeip.net>

Kristján Valur Jónsson <kristjan at ccpgames.com> writes:

> But again, it shows how useful assertions can be and why we ought
> not to disable them.

Note that just to be clear, I'm certainly not advocating the disabling
of CRT assertions - just the redirection of them so they don't prevent
unattended test runs from completing.

-- David

From aleaxit at gmail.com  Wed Apr  1 03:24:56 2009
From: aleaxit at gmail.com (Alex Martelli)
Date: Tue, 31 Mar 2009 18:24:56 -0700
Subject: [Python-Dev] And the winner is...
In-Reply-To: <gquda2$v73$1@ger.gmane.org>
References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com>
	<874oxaw95q.fsf@xemacs.org> <gquda2$v73$1@ger.gmane.org>
Message-ID: <e8a0972d0903311824lb692f3fhc5352f6b38ade4b8@mail.gmail.com>

On Tue, Mar 31, 2009 at 5:42 PM, Tres Seaver <tseaver at palladion.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Stephen J. Turnbull wrote:
>
> > I also just wrote a long post about the comparison of bzr to hg
> > responding to a comment on bazaar at canonical.com.  I won't recap it
> > here but it might be of interest.
>
> Thank you very much for your writeups on that thread:  both in tone and
> in content I found them extremely helpful.

I'd like to read that thread for my edification -- might there be a URL for
it perhaps...?

Thanks,

Alex

From aleaxit at gmail.com  Wed Apr  1 03:26:42 2009
From: aleaxit at gmail.com (Alex Martelli)
Date: Tue, 31 Mar 2009 18:26:42 -0700
Subject: [Python-Dev] And the winner is...
In-Reply-To: <gquda2$v73$1@ger.gmane.org>
References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com>
	<874oxaw95q.fsf@xemacs.org> <gquda2$v73$1@ger.gmane.org>
Message-ID: <e8a0972d0903311826p778c1681g5e021a299ce1232c@mail.gmail.com>

On Tue, Mar 31, 2009 at 5:42 PM, Tres Seaver <tseaver at palladion.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Stephen J. Turnbull wrote:
>
> > I also just wrote a long post about the comparison of bzr to hg
> > responding to a comment on bazaar at canonical.com.  I won't recap it
> > here but it might be of interest.
>
> Thank you very much for your writeups on that thread:  both in tone and
> in content I found them extremely helpful.

I'd like to read that thread for my edification -- might there be a URL for
it perhaps...?

Thanks,

Alex

From alexandre at peadrop.com  Wed Apr  1 03:33:42 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Tue, 31 Mar 2009 21:33:42 -0400
Subject: [Python-Dev] And the winner is...
In-Reply-To: <e8a0972d0903311824lb692f3fhc5352f6b38ade4b8@mail.gmail.com>
References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com> 
	<874oxaw95q.fsf@xemacs.org> <gquda2$v73$1@ger.gmane.org>
	<e8a0972d0903311824lb692f3fhc5352f6b38ade4b8@mail.gmail.com>
Message-ID: <acd65fa20903311833i2ed64442s8cfeec88ba4419f6@mail.gmail.com>

2009/3/31 Alex Martelli <aleaxit at gmail.com>:
> On Tue, Mar 31, 2009 at 5:42 PM, Tres Seaver <tseaver at palladion.com> wrote:
>> Stephen J. Turnbull wrote:
>>
>> > I also just wrote a long post about the comparison of bzr to hg
>> > responding to a comment on bazaar at canonical.com.  I won't recap it
>> > here but it might be of interest.
>>
>> Thank you very much for your writeups on that thread:  both in tone and
>> in content I found them extremely helpful.
>
> I'd like to read that thread for my edification -- might there be a URL for
> it perhaps...?
>

https://lists.ubuntu.com/archives/bazaar/2009q1/055850.html
https://lists.ubuntu.com/archives/bazaar/2009q1/055872.html

-- Alexandre

From aleaxit at gmail.com  Wed Apr  1 03:39:27 2009
From: aleaxit at gmail.com (Alex Martelli)
Date: Tue, 31 Mar 2009 18:39:27 -0700
Subject: [Python-Dev] And the winner is...
In-Reply-To: <acd65fa20903311833i2ed64442s8cfeec88ba4419f6@mail.gmail.com>
References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com>
	<874oxaw95q.fsf@xemacs.org> <gquda2$v73$1@ger.gmane.org>
	<e8a0972d0903311824lb692f3fhc5352f6b38ade4b8@mail.gmail.com>
	<acd65fa20903311833i2ed64442s8cfeec88ba4419f6@mail.gmail.com>
Message-ID: <e8a0972d0903311839m1fc8970ey1236a3c6aa69e0ca@mail.gmail.com>

On Tue, Mar 31, 2009 at 6:33 PM, Alexandre Vassalotti <alexandre at peadrop.com
> wrote:
   ...

> https://lists.ubuntu.com/archives/bazaar/2009q1/055850.html
> https://lists.ubuntu.com/archives/bazaar/2009q1/055872.html
>

Perfect, thanks!

Alex

From aleaxit at gmail.com  Wed Apr  1 03:41:20 2009
From: aleaxit at gmail.com (Alex Martelli)
Date: Tue, 31 Mar 2009 18:41:20 -0700
Subject: [Python-Dev] And the winner is...
In-Reply-To: <acd65fa20903311833i2ed64442s8cfeec88ba4419f6@mail.gmail.com>
References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com>
	<874oxaw95q.fsf@xemacs.org> <gquda2$v73$1@ger.gmane.org>
	<e8a0972d0903311824lb692f3fhc5352f6b38ade4b8@mail.gmail.com>
	<acd65fa20903311833i2ed64442s8cfeec88ba4419f6@mail.gmail.com>
Message-ID: <e8a0972d0903311841y82e90f0uf983a20d8032db6b@mail.gmail.com>

On Tue, Mar 31, 2009 at 6:33 PM, Alexandre Vassalotti <alexandre at peadrop.com
> wrote:
   ...

> https://lists.ubuntu.com/archives/bazaar/2009q1/055850.html
> https://lists.ubuntu.com/archives/bazaar/2009q1/055872.html
>

Perfect, thanks!

Alex

From fijall at gmail.com  Wed Apr  1 05:15:29 2009
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Wed, 1 Apr 2009 05:15:29 +0200
Subject: [Python-Dev] issue5578 - explanation
Message-ID: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>

So. The issue was closed, and I suppose it was closed without entirely
understanding the problem (or I didn't get it completely).

The question is: what should the following code do?

def f():
  a = 2
  class C:
    exec 'a = 42'
    abc = a
  return C

print f().abc

(quick answer: on Python 2.5 it returns 42, on Python 2.6 and up it
returns 2; the patch changes it to a syntax error).

I would say that returning 2 is the less obvious thing to do. The
reason why IMO this should be a syntax error is this code:

def f():
  a = 2
  def g():
    exec 'a = 42'
    abc = a

which raises a syntax error.
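
For comparison, a hedged Python 3 translation of the first example. It is only a sketch and not part of the issue or the patch: exec is a function in Python 3, so no syntax error can be raised for it at compile time. On CPython 3.4 and later, a free variable in a class body is looked up in the class namespace before the enclosing function's cell, so the binding made by exec should win and the script should print 42:

```python
def f():
    a = 2

    class C:
        # With no explicit namespaces, exec() runs in the class body's namespace.
        exec("a = 42")
        # The compiler sees 'a' as a free variable of f(); in a class body the
        # class namespace is consulted first, then the enclosing cell.
        abc = a

    return C


result = f().abc
print(result)
```

The exec'd name also ends up as a class attribute, so f().a is set as well.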

Cheers,
fijal

From guido at python.org  Wed Apr  1 05:25:01 2009
From: guido at python.org (Guido van Rossum)
Date: Tue, 31 Mar 2009 20:25:01 -0700
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
Message-ID: <ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>

Well hold on for a minute, I remember we used to have an exec
statement in a class body in the standard library, to define some file
methods in socket.py IIRC.  It's a totally different case than exec in
a nested function, and I don't believe it should be turned into a
syntax error at all. An exec in a class body is probably meant to
define some methods or other class attributes. I actually think the
2.5 behavior is correct, and I don't know why it changed in 2.6.
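
A rough sketch of that kind of use, written for Python 3 where exec is a function. The class name FileLike, the _target attribute and the method list are made up for illustration; this is not the historical socket.py code:

```python
import io


class FileLike:
    """Toy wrapper whose forwarding methods are generated with exec()."""

    def __init__(self, target):
        self._target = target

    # Each exec() call runs in the class body's namespace, so the generated
    # functions become ordinary methods of the class.
    for _name in ("read", "write", "close"):
        exec(
            f"def {_name}(self, *args):\n"
            f"    return getattr(self._target, {_name!r})(*args)\n"
        )
    del _name  # do not leave the loop variable behind as a class attribute


wrapped = FileLike(io.StringIO("hello"))
first = wrapped.read(2)
print(first)
```

Here the exec'd code only adds names to the class namespace and never touches variables of an enclosing function, which is why it differs from the nested-function case.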

--Guido

On Tue, Mar 31, 2009 at 8:15 PM, Maciej Fijalkowski <fijall at gmail.com> wrote:
> So. The issue was closed and I suppose it was closed by not entirely
> understanding
> the problem (or I didn't get it completely).
>
> The question is - what the following code should do?
>
> def f():
>   a = 2
>   class C:
>     exec 'a = 42'
>     abc = a
>   return C
>
> print f().abc
>
> (quick answer - on python2.5 it return 42, on python 2.6 and up it
> returns 2, the patch changes
> it to syntax error).
>
> I would say that returning 2 is the less obvious thing to do. The
> reason why IMO this should
> be a syntax error is this code:
>
> def f():
>   a = 2
>   def g():
>     exec 'a = 42'
>     abc = a
>
> which throws syntax error.
>
> Cheers,
> fijal
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Apr  1 05:34:15 2009
From: guido at python.org (Guido van Rossum)
Date: Tue, 31 Mar 2009 20:34:15 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <49D26BB1.8050108@hastings.org>
References: <49D26BB1.8050108@hastings.org>
Message-ID: <ca471dc20903312034s78240531w6a91761156806bce@mail.gmail.com>

Can you get Jim Fulton's feedback? ISTR he originated this.

On Tue, Mar 31, 2009 at 12:14 PM, Larry Hastings <larry at hastings.org> wrote:
>
> The CObject API has two flaws.
>
> First, there is no usable type safety mechanism.  You can store a void
> *object, and a void *description.  There is no established schema for
> the description; it could be an integer cast to a pointer, or it could
> point to memory of any configuration, or it could be NULL.  Thus users
> of the CObject API generally ignore it--thus working without any type
> safety whatsoever.  A programmer could crash the interpreter from pure
> Python by mixing and matching CObjects from different modules (e.g. give
> "curses" a CObject from "_ctypes").
>
> Second, the destructor callback is defined as taking *either* one *or*
> two parameters, depending on whether the "descr" pointer is non-NULL. One
> can debate the finer points of what is and isn't defined behavior in
> C, but at its heart this is a sloppy API.
>
> MvL and I discussed this last night and decided to float a revision of
> the API.  I wrote the patch, though, so don't blame Martin if you don't
> like my specific approach.
>
> The color of this particular bike shed is:
> * The PyCObject is now a private data structure; you must use accessors.
>   I added accessors for all the members.
> * The constructors and the main accessor (PyCObject_AsVoidPtr) now all
>   *require* a "const char *type" parameter, which must be a non-NULL C
>   string of non-zero length.  If you call that accessor and the "type"
>   is invalid *or doesn't match*, it fails.
> * The destructor now takes the PyObject *, not the PyCObject *.  You
>   must use accessors to get your hands on the data inside to free it.
>
> Yes, you can easily skip around the "matching type" restriction by
> calling PyCObject_AsVoidPtr(cobj, PyCObject_GetType(cobj)).  The point
> of my API changes is to *encourage* correct use.
>
> I've posted a patch implementing this change in the 3.1 trunk to the
> bug tracker:
>
>   http://bugs.python.org/issue5630
>
> I look forward to your comments!
>
>
> /larry/
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/guido%40python.org
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fijall at gmail.com  Wed Apr  1 05:36:24 2009
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Wed, 1 Apr 2009 05:36:24 +0200
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
Message-ID: <693bc9ab0903312036w175b21b2ued5e5f7631e82123@mail.gmail.com>

Because classes now have their own local scope (according to Martin)

It's not about exec in class, it's about exec in class in nested function.

On Wed, Apr 1, 2009 at 5:25 AM, Guido van Rossum <guido at python.org> wrote:
> Well hold on for a minute, I remember we used to have an exec
> statement in a class body in the standard library, to define some file
> methods in socket.py IIRC.  It's a totally different case than exec in
> a nested function, and I don't believe it should be turned into a
> syntax error at all. An exec in a class body is probably meant to
> define some methods or other class attributes. I actually think the
> 2.5 behavior is correct, and I don't know why it changed in 2.6.
>
> --Guido
>
> On Tue, Mar 31, 2009 at 8:15 PM, Maciej Fijalkowski <fijall at gmail.com> wrote:
>> So. The issue was closed and I suppose it was closed by not entirely
>> understanding
>> the problem (or I didn't get it completely).
>>
>> The question is - what the following code should do?
>>
>> def f():
>>   a = 2
>>   class C:
>>     exec 'a = 42'
>>     abc = a
>>   return C
>>
>> print f().abc
>>
>> (quick answer - on python2.5 it return 42, on python 2.6 and up it
>> returns 2, the patch changes
>> it to syntax error).
>>
>> I would say that returning 2 is the less obvious thing to do. The
>> reason why IMO this should
>> be a syntax error is this code:
>>
>> def f():
>>   a = 2
>>   def g():
>>     exec 'a = 42'
>>     abc = a
>>
>> which throws syntax error.
>>
>> Cheers,
>> fijal
>
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From guido at python.org  Wed Apr  1 05:38:13 2009
From: guido at python.org (Guido van Rossum)
Date: Tue, 31 Mar 2009 20:38:13 -0700
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <693bc9ab0903312036w175b21b2ued5e5f7631e82123@mail.gmail.com>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
	<693bc9ab0903312036w175b21b2ued5e5f7631e82123@mail.gmail.com>
Message-ID: <ca471dc20903312038l67bc96f8u232efc7491bfb305@mail.gmail.com>

OK that might change matters. Shame on you though for posting a patch
without any explanation of the issue.

On Tue, Mar 31, 2009 at 8:36 PM, Maciej Fijalkowski <fijall at gmail.com> wrote:
> Because classes now have their own local scope (according to Martin)
>
> It's not about exec in class, it's about exec in class in nested function.
>
> On Wed, Apr 1, 2009 at 5:25 AM, Guido van Rossum <guido at python.org> wrote:
>> Well hold on for a minute, I remember we used to have an exec
>> statement in a class body in the standard library, to define some file
>> methods in socket.py IIRC.  It's a totally different case than exec in
>> a nested function, and I don't believe it should be turned into a
>> syntax error at all. An exec in a class body is probably meant to
>> define some methods or other class attributes. I actually think the
>> 2.5 behavior is correct, and I don't know why it changed in 2.6.
>>
>> --Guido
>>
>> On Tue, Mar 31, 2009 at 8:15 PM, Maciej Fijalkowski <fijall at gmail.com> wrote:
>>> So. The issue was closed and I suppose it was closed by not entirely
>>> understanding
>>> the problem (or I didn't get it completely).
>>>
>>> The question is - what the following code should do?
>>>
>>> def f():
>>>   a = 2
>>>   class C:
>>>     exec 'a = 42'
>>>     abc = a
>>>   return C
>>>
>>> print f().abc
>>>
>>> (quick answer - on python2.5 it return 42, on python 2.6 and up it
>>> returns 2, the patch changes
>>> it to syntax error).
>>>
>>> I would say that returning 2 is the less obvious thing to do. The
>>> reason why IMO this should
>>> be a syntax error is this code:
>>>
>>> def f():
>>>   a = 2
>>>   def g():
>>>     exec 'a = 42'
>>>     abc = a
>>>
>>> which throws syntax error.
>>>
>>> Cheers,
>>> fijal
>>
>>
>>
>> --
>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>>
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fijall at gmail.com  Wed Apr  1 06:16:30 2009
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Wed, 1 Apr 2009 06:16:30 +0200
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <ca471dc20903312038l67bc96f8u232efc7491bfb305@mail.gmail.com>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
	<693bc9ab0903312036w175b21b2ued5e5f7631e82123@mail.gmail.com>
	<ca471dc20903312038l67bc96f8u232efc7491bfb305@mail.gmail.com>
Message-ID: <693bc9ab0903312116o78d31684t8bafbb4b80587047@mail.gmail.com>

Shame on me indeed.

On Wed, Apr 1, 2009 at 5:38 AM, Guido van Rossum <guido at python.org> wrote:
> OK that might change matters. Shame on you though for posting a patch
> without any explanation of the issue.
>
> On Tue, Mar 31, 2009 at 8:36 PM, Maciej Fijalkowski <fijall at gmail.com> wrote:
>> Because classes now have their own local scope (according to Martin)
>>
>> It's not about exec in class, it's about exec in class in nested function.
>>
>> On Wed, Apr 1, 2009 at 5:25 AM, Guido van Rossum <guido at python.org> wrote:
>>> Well hold on for a minute, I remember we used to have an exec
>>> statement in a class body in the standard library, to define some file
>>> methods in socket.py IIRC.  It's a totally different case than exec in
>>> a nested function, and I don't believe it should be turned into a
>>> syntax error at all. An exec in a class body is probably meant to
>>> define some methods or other class attributes. I actually think the
>>> 2.5 behavior is correct, and I don't know why it changed in 2.6.
>>>
>>> --Guido
>>>
>>> On Tue, Mar 31, 2009 at 8:15 PM, Maciej Fijalkowski <fijall at gmail.com> wrote:
>>>> So. The issue was closed and I suppose it was closed by not entirely
>>>> understanding
>>>> the problem (or I didn't get it completely).
>>>>
>>>> The question is - what the following code should do?
>>>>
>>>> def f():
>>>>   a = 2
>>>>   class C:
>>>>     exec 'a = 42'
>>>>     abc = a
>>>>   return C
>>>>
>>>> print f().abc
>>>>
>>>> (quick answer - on python2.5 it return 42, on python 2.6 and up it
>>>> returns 2, the patch changes
>>>> it to syntax error).
>>>>
>>>> I would say that returning 2 is the less obvious thing to do. The
>>>> reason why IMO this should
>>>> be a syntax error is this code:
>>>>
>>>> def f():
>>>>   a = 2
>>>>   def g():
>>>>     exec 'a = 42'
>>>>     abc = a
>>>>
>>>> which throws syntax error.
>>>>
>>>> Cheers,
>>>> fijal
>>>
>>>
>>>
>>> --
>>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>>>
>>
>
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From rdmurray at bitdance.com  Wed Apr  1 07:17:05 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 1 Apr 2009 01:17:05 -0400 (EDT)
Subject: [Python-Dev] 3.1a2
In-Reply-To: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0904010109190.26362@kimball.webabinitio.net>

On Tue, 31 Mar 2009 at 14:09, Benjamin Peterson wrote:
> I haven't looked at #4847 in depth, but appears that the csv module
> will need some API changes to deal with encodings. Perhaps somebody
> would like to sprint on it?

First we have to figure out what should be done.

http://bugs.python.org/4847

Having read through the ticket, it seems that a CSV file must be (and
2.6 was) treated as a binary file, and part of the CSV module's job
is to convert that binary data to and from strings.  That is, the CSV
module is at the same layer of the input stack as the TextIOWrapper.
So IMO it should have an encoding parameter, and the defaults should be
handled the same way they are for TextIOBase.

_csv as indicated by the initial error report is in py3k expecting to read
strings from the iterator passed to it, which IMO is wrong.  It should
be expecting bytes.  The problem with this solution is that those people
currently passing it string iterators would have to change their code.

The documentation says "If csvfile is a file object, it must be opened
with the 'b' flag on platforms where that makes a difference."
With the advent of unicode strings, it now makes a difference on all
platforms.

--
R. David Murray             http://www.bitdance.com

From p.f.moore at gmail.com  Wed Apr  1 10:57:34 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 1 Apr 2009 09:57:34 +0100
Subject: [Python-Dev] And the winner is...
In-Reply-To: <gquda2$v73$1@ger.gmane.org>
References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com>
	<874oxaw95q.fsf@xemacs.org> <gquda2$v73$1@ger.gmane.org>
Message-ID: <79990c6b0904010157p25ac7212v77e1b85947e364da@mail.gmail.com>

2009/4/1 Tres Seaver <tseaver at palladion.com>:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Stephen J. Turnbull wrote:
>
>> I also just wrote a long post about the comparison of bzr to hg
>> responding to a comment on bazaar at canonical.com.  I won't recap it
>> here but it might be of interest.
>
> Thank you very much for your writeups on that thread:  both in tone and
> in content I found them extremely helpful.

Agreed.
Paul

From solipsis at pitrou.net  Wed Apr  1 12:07:15 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 1 Apr 2009 10:07:15 +0000 (UTC)
Subject: [Python-Dev] CSV, bytes and encodings
References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
	<Pine.LNX.4.64.0904010109190.26362@kimball.webabinitio.net>
Message-ID: <loom.20090401T100027-754@post.gmane.org>

R. David Murray <rdmurray <at> bitdance.com> writes:
> 
> Having read through the ticket, it seems that a CSV file must be (and
> 2.6 was) treated as a binary file, and part of the CSV module's job
> is to convert that binary data to and from strings.

IMO this interpretation is flawed.
In 2.6 there is no tangible difference between "binary" and "text" files, except
for newline handling. Also, as a matter of fact, if you want the 2.x CSV module
to read a file with Windows line endings, you have to open the file in "rU" mode
(that is, the closest we have to a moral equivalent of the 3.x text files).

Therefore, I don't think 2.x is of any guidance to us for what 3.x should do.

I see three possible practical cases that, ideally, the 3.x CSV module should be
able to handle:
1. be handed a binary file (yielding bytes) without an encoding: in this case,
the CSV module should return lists of bytes objects
2. be handed a text file (yielding str) without an encoding: in this case, the
CSV module should return lists of str objects
3. be handed a binary file (yielding bytes) with an encoding: in this case, the
CSV module should also return lists of str objects

I think 2 and 3 both /should/ be supported (for 3, it's probably enough to wrap
the binary file in a TextIOWrapper ;-)). 1 would be convenient too, but perhaps
more work than it deserves (since it means the CSV module must be able to deal
internally with two different datatypes: bytes and str).
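
A minimal sketch of case 3, for illustration only. It assumes the 3.x io
module; the sample bytes, the latin-1 encoding and the helper name
read_csv_bytes are made up for the example. newline='' is passed so that
the csv module sees the line endings untouched.

```python
import csv
import io


def read_csv_bytes(binary_file, encoding):
    """Hypothetical helper: decode a binary stream and parse it as CSV."""
    # newline='' disables newline translation, leaving it to the csv module.
    text_file = io.TextIOWrapper(binary_file, encoding=encoding, newline='')
    return list(csv.reader(text_file))


# Made-up sample data: a header row and one row with a non-ASCII name.
raw = "name,city\r\nAndr\u00e9,Paris\r\n".encode("latin-1")
rows = read_csv_bytes(io.BytesIO(raw), "latin-1")
print(rows)
```

Case 2 is the same call without the wrapper: the caller hands csv.reader an
already-decoded text file.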

> The documentation says "If csvfile is a file object, it must be opened
> with the 'b' flag on platforms where that makes a difference."

The documentation is, IMO, wrong even in 2.x. Just yesterday I had to open a CSV
file in 'rU' mode because it had Windows line endings and I'm under Linux....

Regards

Antoine.

From skip at pobox.com  Wed Apr  1 12:37:38 2009
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 1 Apr 2009 05:37:38 -0500
Subject: [Python-Dev] CSV, bytes and encodings
In-Reply-To: <loom.20090401T100027-754@post.gmane.org>
References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
	<Pine.LNX.4.64.0904010109190.26362@kimball.webabinitio.net>
	<loom.20090401T100027-754@post.gmane.org>
Message-ID: <18899.17394.455907.841425@montanaro.dyndns.org>

    >> Having read through the ticket, it seems that a CSV file must be (and
    >> 2.6 was) treated as a binary file, and part of the CSV module's job
    >> is to convert that binary data to and from strings.

    Antoine> IMO this interpretation is flawed.  In 2.6 there is no tangible
    Antoine> difference between "binary" and "text" files, except for
    Antoine> newline handling. Also, as a matter of fact, if you want the
    Antoine> 2.x CSV module to read a file with Windows line endings, you
    Antoine> have to open the file in "rU" mode (that is, the closest we
    Antoine> have to a moral equivalent of the 3.x text files).

The problem is that fields in CSV files, at least those produced by Excel,
can contain embedded newlines.  You are welcome to decide that *all* CRLF
pairs should be translated to LF, but that is not the decision the original
authors (mostly Andrew MacNamara) made.  The contents of the fields was
deemed to be separate from the newline convention, so the csv module needed
to do its own newline processing, and thus required files to be opened in
binary mode.
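
The difference can be sketched in 3.x terms, under assumptions: the sample
bytes are made up, and io.TextIOWrapper stands in for "text mode with" and
"text mode without" newline translation. With newline='' the csv module
does its own newline processing and the embedded CRLF survives; with
newline=None (universal newlines) every CRLF is turned into LF before the
csv module sees it.

```python
import csv
import io

# Made-up Excel-style data: the quoted field contains an embedded CRLF.
raw = b'id,comment\r\n1,"first line\r\nsecond line"\r\n'


def parse(newline):
    """Parse the sample bytes using the given TextIOWrapper newline mode."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)
    return list(csv.reader(text))


untranslated = parse("")    # csv handles the line endings itself
translated = parse(None)    # universal newlines applied before csv runs

print(repr(untranslated[1][1]))
print(repr(translated[1][1]))
```

Both calls return two rows; only the content of the quoted field differs.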

This case arises rarely, but it does turn up every now and again.  If you
are comfortable with translating all CRLF pairs into LF, no matter if they
are true end-of-line markers or embedded content, that's fine.  (It
certainly simplifies the implementation.)  However, a) I would run it past
the folks on csv at python.org first, and b) put a big fat note in the module
docs about the transformation.

    Antoine> Therefore, I don't think 2.x is of any guidance to us for what
    Antoine> 3.x should do.

I suspect we will disagree on this.  I believe the behavior of the 2.x
version of the module is easily defensible and should be a useful guide to
how the 3.x version of the module behaves.

    >> The documentation says "If csvfile is a file object, it must be
    >> opened with the 'b' flag on platforms where that makes a difference."

    Antoine> The documentation is, IMO, wrong even in 2.x. Just yesterday I
    Antoine> had to open a CSV file in 'rU' mode because it had Windows line
    Antoine> endings and I'm under Linux....

See above.  You almost certainly didn't have fields containing CRLF pairs or
didn't care that while reading the file your data values were silently
altered.

Skip

From ncoghlan at gmail.com  Wed Apr  1 12:45:26 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 01 Apr 2009 20:45:26 +1000
Subject: [Python-Dev] Broken import?
In-Reply-To: <49D29D7B.7000002@canterbury.ac.nz>
References: <ca471dc20903301544l3749c80i3d166fbb8780f502@mail.gmail.com>	<ca471dc20903301617ydbb4bc9pfb221c2cba8df7e4@mail.gmail.com>	<ca471dc20903301652w63dac59dn3275620c3b220fd3@mail.gmail.com>	<gqrqv0$8v2$1@ger.gmane.org>
	<gqrs4i$b1l$1@ger.gmane.org>	<49D20B54.1010108@gmail.com>
	<49D29D7B.7000002@canterbury.ac.nz>
Message-ID: <49D345C6.2050507@gmail.com>

Greg Ewing wrote:
> Nick Coghlan wrote:
> 
>> Jim Fulton's example in that tracker issue shows that with a bit of
>> creativity you can provoke this behaviour *without* using a from-style
>> import. Torsten Bronger later brought up the same issue that Fredrik did
>> - it prevents some kinds of explicit relative import that look like they
>> should be fine.
> 
> I haven't been following this very closely, but if there's
> something that's making absolute and relative imports
> behave differently, I think it should be fixed. The only
> difference between an absolute and relative import of the
> same module should be the way you specify the module.

That's exactly the problem though. Because of the difference in the way
the target module is specified, the way it is looked up is different:

'import a.b.c' will look in sys.modules for "a.b.c", succeed and work,
even if "a.b.c" is in the process of being imported.

'from a.b import c' (or 'from . import c' in a subpackage of "a.b") will
only look in sys.modules for "a.b", and then look on that object for a
"c" attribute. The cached "a.b.c" module in sys.modules is ignored.

It doesn't appear to be an impossible problem to solve, but it probably
isn't going to be easy to fix in a backwards compatible way.
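
A rough sketch of the two lookups, for illustration only. It models them
by hand with made-up module names (demo_a.demo_b and demo_a.demo_b.demo_c)
and two hypothetical helper functions; it does not run the import statement
itself, whose exact behaviour depends on the Python version.

```python
import sys
import types

# Hypothetical module names, registered by hand to mimic a circular import
# caught part-way through: the submodule is already cached in sys.modules,
# but it has not yet been bound as an attribute on its parent package.
parent = types.ModuleType("demo_a.demo_b")
child = types.ModuleType("demo_a.demo_b.demo_c")
sys.modules["demo_a.demo_b"] = parent
sys.modules["demo_a.demo_b.demo_c"] = child


def lookup_by_full_name(dotted_name):
    """Models 'import a.b.c': consult the sys.modules cache directly."""
    return sys.modules[dotted_name]


def lookup_by_attribute(package_name, attr):
    """Models 'from a.b import c': find the package, then getattr on it."""
    return getattr(sys.modules[package_name], attr)


found = lookup_by_full_name("demo_a.demo_b.demo_c")
try:
    lookup_by_attribute("demo_a.demo_b", "demo_c")
    attribute_lookup_failed = False
except AttributeError:
    attribute_lookup_failed = True

print(found is child, attribute_lookup_failed)

# Tidy up the fake entries.
del sys.modules["demo_a.demo_b"], sys.modules["demo_a.demo_b.demo_c"]
```

The first lookup succeeds because the cache entry exists; the second fails
only because the attribute has not been set on the parent yet.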

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net  Wed Apr  1 12:53:19 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 1 Apr 2009 10:53:19 +0000 (UTC)
Subject: [Python-Dev] CSV, bytes and encodings
References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
	<Pine.LNX.4.64.0904010109190.26362@kimball.webabinitio.net>
	<loom.20090401T100027-754@post.gmane.org>
	<18899.17394.455907.841425@montanaro.dyndns.org>
Message-ID: <loom.20090401T104816-798@post.gmane.org>

<skip <at> pobox.com> writes:
> 
>     Antoine> The documentation is, IMO, wrong even in 2.x. Just yesterday I
>     Antoine> had to open a CSV file in 'rU' mode because it had Windows line
>     Antoine> endings and I'm under Linux....
> 
> See above.  You almost certainly didn't have fields containing CRLF pairs or
> didn't care that while reading the file your data values were silently
> altered.

Perhaps. But without using 'rU' the file couldn't be read at all.
(I'm not sure it was Windows line endings by the way; perhaps Macintosh ones;
anyway, it didn't work using 'rb')

I have to add that if individual fields really can contain newlines, then the
CSV module ought to be smarter when /saving/ those fields. I've inadvertently
tried to produce a CSV file with such fields and it ended up wrong when opened
as a spreadsheet (text after the newlines was ignored in Gnumeric and in
OpenOffice, while Excel displayed a spurious additional row containing only the
text after the newline).
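
For reference, a small sketch of what the 3.x csv writer emits for such a
field, assuming the default "excel" dialect and made-up sample rows. It
only shows the bytes-on-disk side; how a given spreadsheet then displays
the quoted field is a separate question that this does not answer.

```python
import csv
import io

# newline='' keeps the buffer from translating what the writer emits.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)  # default "excel" dialect, minimal quoting
writer.writerow(["id", "comment"])
writer.writerow(["1", "first line\nsecond line"])  # made-up sample row

written = buffer.getvalue()
# Read the text back to check that the embedded newline round-trips.
round_trip = list(csv.reader(io.StringIO(written, newline="")))
print(repr(written))
```

The field with the embedded newline is wrapped in quotes, and reading the
text back through csv.reader yields the original two rows.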

Regards

Antoine.

From greg.ewing at canterbury.ac.nz  Wed Apr  1 13:11:10 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 01 Apr 2009 23:11:10 +1200
Subject: [Python-Dev] Broken import?
In-Reply-To: <49D345C6.2050507@gmail.com>
References: <ca471dc20903301544l3749c80i3d166fbb8780f502@mail.gmail.com>
	<ca471dc20903301617ydbb4bc9pfb221c2cba8df7e4@mail.gmail.com>
	<ca471dc20903301652w63dac59dn3275620c3b220fd3@mail.gmail.com>
	<gqrqv0$8v2$1@ger.gmane.org> <gqrs4i$b1l$1@ger.gmane.org>
	<49D20B54.1010108@gmail.com> <49D29D7B.7000002@canterbury.ac.nz>
	<49D345C6.2050507@gmail.com>
Message-ID: <49D34BCE.4050401@canterbury.ac.nz>

Nick Coghlan wrote:

> 'import a.b.c' will look in sys.modules for "a.b.c", succeed and work,
> even if "a.b.c" is in the process of being imported.
> 
> 'from a.b import c' (or 'from . import c' in a subpackage of "a.b") will
> only look in sys.modules for "a.b", and then look on that object for a
> "c" attribute. The cached "a.b.c" module in sys.modules is ignored.

Hasn't 'from a.b import c' always been that way, though?
Is the problem just that relative imports make it easier
to run into this behaviour, or has something about the
way imports work changed?

-- 
Greg

From ncoghlan at gmail.com  Wed Apr  1 13:50:07 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 01 Apr 2009 21:50:07 +1000
Subject: [Python-Dev] Broken import?
In-Reply-To: <49D34BCE.4050401@canterbury.ac.nz>
References: <ca471dc20903301544l3749c80i3d166fbb8780f502@mail.gmail.com>	<ca471dc20903301617ydbb4bc9pfb221c2cba8df7e4@mail.gmail.com>	<ca471dc20903301652w63dac59dn3275620c3b220fd3@mail.gmail.com>	<gqrqv0$8v2$1@ger.gmane.org>
	<gqrs4i$b1l$1@ger.gmane.org>	<49D20B54.1010108@gmail.com>
	<49D29D7B.7000002@canterbury.ac.nz>	<49D345C6.2050507@gmail.com>
	<49D34BCE.4050401@canterbury.ac.nz>
Message-ID: <49D354EF.1010300@gmail.com>

Greg Ewing wrote:
> Nick Coghlan wrote:
> 
>> 'import a.b.c' will look in sys.modules for "a.b.c", succeed and work,
>> even if "a.b.c" is in the process of being imported.
>>
>> 'from a.b import c' (or 'from . import c' in a subpackage of "a.b") will
>> only look in sys.modules for "a.b", and then look on that object for a
>> "c" attribute. The cached "a.b.c" module in sys.modules is ignored.
> 
> Hasn't 'from a.b import c' always been that way, though?
> Is the problem just that relative imports make it easier
> to run into this behaviour, or has something about the
> way imports work changed?

The former - while a few things have obviously changed in this area due
to PEP 328 and PEP 366, I don't believe any of that affected this aspect
of the semantics (the issue I linked dates from 2004!).

Instead, I'm pretty sure implicit relative imports use the 'import
a.b.c' rules and hence work in situations where explicit relative
imports now fail.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From chris at simplistix.co.uk  Wed Apr  1 14:12:41 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Wed, 01 Apr 2009 13:12:41 +0100
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
Message-ID: <49D35A39.7020507@simplistix.co.uk>

Guido van Rossum wrote:
> Well hold on for a minute, I remember we used to have an exec
> statement in a class body in the standard library, to define some file
> methods in socket.py IIRC. 

But why an exec?! Surely there must be some other way to do this than an 
exec?

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From skip at pobox.com  Wed Apr  1 14:51:28 2009
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 1 Apr 2009 07:51:28 -0500
Subject: [Python-Dev] CSV, bytes and encodings
In-Reply-To: <loom.20090401T104816-798@post.gmane.org>
References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
	<Pine.LNX.4.64.0904010109190.26362@kimball.webabinitio.net>
	<loom.20090401T100027-754@post.gmane.org>
	<18899.17394.455907.841425@montanaro.dyndns.org>
	<loom.20090401T104816-798@post.gmane.org>
Message-ID: <18899.25424.820832.462451@montanaro.dyndns.org>

    Antoine> Perhaps. But without using 'rU' the file couldn't be read at
    Antoine> all.  (I'm not sure it was Windows line endings by the way;
    Antoine> perhaps Macintosh ones; anyway, it didn't work using 'rb')

Please file a bug report and assign to me.  Does it work in 2.x?  What was
the source of the file?

    Antoine> I have to add that if individual fields really can contain
    Antoine> newlines, then the CSV module ought to be smarter when /saving/
    Antoine> those fields. I've inadvertently tried to produce a CSV file
    Antoine> with such fields and it ended up wrong when opened as a
    Antoine> spreadsheet (text after the newlines was ignored in Gnumeric
    Antoine> and in OpenOffice, while Excel displayed a spurious additional
    Antoine> row containing only the text after the newline).

Sounds like you have a budding test case.

Of course, the problem with CSV files is that there is no standard.  In the
above paragraph you named three.  The CSV authors chose Excel's behavior as
the measuring stick.  Still, that's not written down anywhere.  You have to
read the tea leaves.

Skip

From rdmurray at bitdance.com  Wed Apr  1 16:54:19 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 1 Apr 2009 10:54:19 -0400 (EDT)
Subject: [Python-Dev] CSV, bytes and encodings
In-Reply-To: <18899.17394.455907.841425@montanaro.dyndns.org>
References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
	<Pine.LNX.4.64.0904010109190.26362@kimball.webabinitio.net>
	<loom.20090401T100027-754@post.gmane.org>
	<18899.17394.455907.841425@montanaro.dyndns.org>
Message-ID: <Pine.LNX.4.64.0904011042310.26362@kimball.webabinitio.net>

On Wed, 1 Apr 2009 at 05:37, skip at pobox.com wrote:
> This case arises rarely, but it does turn up every now and again.  If you

For some definition of "rarely".  I don't handle CSV files generated by
Windows very often, but I've run into it at least a couple of times.  That
says to me that it isn't all that rare in the wild.  (One out of
fifty?  But I'm sure it depends on your data sources; some people
will run into it often, others almost never.)

Of course, on unix it doesn't help much having those newlines preserved,
since there are few tools on unix other than the CSV module that even
attempt to deal with newlines inside quoted strings being data, but on
Windows it makes a difference.

It would actually be nice if the CSV module had an option for turning
those quoted newlines into spaces, but that's a feature request and
is out of scope for this discussion :)

>    Antoine> The documentation is, IMO, wrong even in 2.x. Just yesterday I
>    Antoine> had to open a CSV file in 'rU' mode because it had Windows line
>    Antoine> endings and I'm under Linux....

That sounds like a bug, IMO.  From the source code it looks like the
2.6 _csv module should be handling that, and certainly intended to
handle it.

--David

From peck at spss.com  Wed Apr  1 15:45:25 2009
From: peck at spss.com (Peck, Jon)
Date: Wed, 1 Apr 2009 08:45:25 -0500
Subject: [Python-Dev] Python 2.6 64-bit Mac Release
Message-ID: <6CD9B6A6B6CCBA4FA497F07182F4EE83014B6EAB@MIAEMAILEVS1.spss.com>

Apparently the Mac Python 2.6.1 Installer image does not include 64-bit binaries.  Is this going to change?  Is there some technical reason why these are not included?  We are hoping to support that in our next 64-bit release.

Thanks for your help.

Jon K. Peck

SPSS Inc.

peck at spss.com

(ip) phone 312-651-3435


From cto at whiz.com  Wed Apr  1 18:01:01 2009
From: cto at whiz.com (Frank The Extruder Lomax)
Date: April 1, 2009 12:01:01 EDT
Subject: [Python-Dev] All Hail the FLUFL
Message-ID: <cheez@whiz.com.2009.12.01.01.0400>

On behalf of the entire Python community and as CTO of Cheez Whiz Global
Conglomerates, Inc. I would like to extend My Thanks to our BDEVIL's many
Selfless and Dedicated years of service.  I can say without remorse that the
State of the Art in the Gas Propelled Cheezical Sciences would be 11 years
behind schedule if it weren't for the BDEVIL and Python.

However, as we at CWGC have been Contemplating an upgrade to our Pythonic
Orange Oscillation Process for many years - we're still on Python 1.5.1 and
due to our Centuries-Old corporate culture, only upgrade to multiples of two -
we have been dismayed by recent Unfortunate Decisions negatively impacting our
hope of world-wide long term acceptance of Python 3.0.2.  Thus I applaud you
for your chosen New Path, wish you a Speedy Sherpa Assisted Ascent, and I
welcome our new FLUFL's similarly Lofty Ascent to Pythonic Overlordhood.

You may not realize that the technology behind our groundbreaking Subsonic
Plasticine Extrusion of Coagulated Flavor Oil utilizes inequalities to a vast
degree.  In fact, our many Published Papers confirm our mathematical
leadership in the Algebraic States of Less Than and More Than Simultaneity.
Were it not for the Diamond Operator, billions, nay! trillions of crackers
would have languished Unadorned, Unenjoyed, and Unloved.

For this reason, the sole choice of the Evil Hash-Equal was enough to force us
to Seriously Investigate a switch to Stenographic Non-deterministic SQL (a
promising new scripting language somewhat similar to Ruby).  It is with an
Overboiling Pot of Joy that we fully support Official FLUFL Act 2.  Trust me
when I say that with this single reversion, the world of High Velocity
Extremely High Pressure Milkyish Orange Goop Delivery Devices will never be
the same.

I am also Ecstatic at the reversal of DVCS decision.  It is with no Small
Irony that I admit the mere utterance of any derivative of the root word
"Mercury" is a firing offence in my Establishment.

In honor of our new FLUFL, I am directing our CFO Timmy "The Larch" Lomax (no
direct relation to myself) to donate the sum of USA $23,250 to the PSU in
furtherance of their mission.  If there is a PyCon sponsorship level Above
Diamond (may we suggest "Orange"?) we would be honored to claim that Pinnacle
for 2010.  Atlanta is located very near our Secret Manufacturing Facility and
I would be remiss if I did not direct additional PyCon branded delivery of
5000 cans of our Premium Velvet Brand Cheez Whiz Lunchables with Detachable
Shooters.  I think the 2010 conference attendees will appreciate the
diversion and hope this will entice people to join our Sprint next year.

foolish-ly y'rs,
frank

From rdmurray at bitdance.com  Wed Apr  1 17:00:06 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 1 Apr 2009 11:00:06 -0400 (EDT)
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <49D35A39.7020507@simplistix.co.uk>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
	<49D35A39.7020507@simplistix.co.uk>
Message-ID: <Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net>

On Wed, 1 Apr 2009 at 13:12, Chris Withers wrote:
> Guido van Rossum wrote:
>>  Well hold on for a minute, I remember we used to have an exec
>>  statement in a class body in the standard library, to define some file
>>  methods in socket.py IIRC. 
>
> But why an exec?! Surely there must be some other way to do this than an 
> exec?

Maybe, but this sure is gnarly code:

     _s = ("def %s(self, *args): return self._sock.%s(*args)\n\n"
           "%s.__doc__ = _realsocket.%s.__doc__\n")
     for _m in _socketmethods:
         exec _s % (_m, _m, _m, _m)
     del _m, _s

Guido's memory is good, that's from the _socketobject class in
socket.py.
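
One exec-free way to get the same effect, as a sketch only and written
for 3.x. RealSocket and SocketWrapper are hypothetical stand-ins, not the
classes in socket.py, and _make_delegate is a made-up helper. Whether this
would have been acceptable for socket.py (speed, introspection) is not
something the sketch settles.

```python
class RealSocket:
    """Hypothetical stand-in for the underlying socket type."""

    def send(self, data):
        """Pretend to send data."""
        return len(data)

    def recv(self, size):
        """Pretend to receive data."""
        return b"x" * size


def _make_delegate(name):
    """Build a method forwarding to self._sock.<name>, copying its docstring."""
    def method(self, *args):
        return getattr(self._sock, name)(*args)
    method.__name__ = name
    method.__doc__ = getattr(RealSocket, name).__doc__
    return method


class SocketWrapper:
    """Hypothetical wrapper that delegates selected methods to self._sock."""

    def __init__(self, sock):
        self._sock = sock


# Attach the delegating methods with setattr instead of exec-ing source text.
for _m in ("send", "recv"):
    setattr(SocketWrapper, _m, _make_delegate(_m))
del _m

wrapped = SocketWrapper(RealSocket())
print(wrapped.send(b"abc"), SocketWrapper.send.__doc__)
```

The closure costs an extra getattr per call compared with the generated
source, which may or may not matter for the use case.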

--David

From jeremy at alum.mit.edu  Wed Apr  1 17:20:32 2009
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Wed, 1 Apr 2009 11:20:32 -0400
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <693bc9ab0903312036w175b21b2ued5e5f7631e82123@mail.gmail.com>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
	<693bc9ab0903312036w175b21b2ued5e5f7631e82123@mail.gmail.com>
Message-ID: <e8bf7a530904010820k601beee8j298ec9b16d8b9c15@mail.gmail.com>

I posted in the bug report, but repeating here:  I don't remember why
exec in a nested function changed either.  It would help if someone
could summarize why we made the change.  (I hope I didn't do it <0.2
wink>.)

Jeremy

On Tue, Mar 31, 2009 at 11:36 PM, Maciej Fijalkowski <fijall at gmail.com> wrote:
> Because classes now have their own local scope (according to Martin)
>
> It's not about exec in class, it's about exec in class in nested function.
>
> On Wed, Apr 1, 2009 at 5:25 AM, Guido van Rossum <guido at python.org> wrote:
>> Well hold on for a minute, I remember we used to have an exec
>> statement in a class body in the standard library, to define some file
>> methods in socket.py IIRC.  It's a totally different case than exec in
>> a nested function, and I don't believe it should be turned into a
>> syntax error at all. An exec in a class body is probably meant to
>> define some methods or other class attributes. I actually think the
>> 2.5 behavior is correct, and I don't know why it changed in 2.6.
>>
>> --Guido
>>
>> On Tue, Mar 31, 2009 at 8:15 PM, Maciej Fijalkowski <fijall at gmail.com> wrote:
>>> So. The issue was closed and I suppose it was closed by not entirely
>>> understanding
>>> the problem (or I didn't get it completely).
>>>
>>> The question is - what the following code should do?
>>>
>>> def f():
>>>  a = 2
>>>  class C:
>>>    exec 'a = 42'
>>>    abc = a
>>>  return C
>>>
>>> print f().abc
>>>
>>> (quick answer - on python2.5 it returns 42, on python 2.6 and up it
>>> returns 2, the patch changes it to a syntax error).
>>>
>>> I would say that returning 2 is the less obvious thing to do. The
>>> reason why IMO this should
>>> be a syntax error is this code:
>>>
>>> def f():
>>>  a = 2
>>>  def g():
>>>    exec 'a = 42'
>>>    abc = a
>>>
>>> which throws syntax error.
>>>
>>> Cheers,
>>> fijal
>>>
>>
>>
>>
>> --
>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>>
>

From brett at python.org  Wed Apr  1 17:21:48 2009
From: brett at python.org (Brett Cannon)
Date: Wed, 1 Apr 2009 08:21:48 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <ca471dc20903312034s78240531w6a91761156806bce@mail.gmail.com>
References: <49D26BB1.8050108@hastings.org>
	<ca471dc20903312034s78240531w6a91761156806bce@mail.gmail.com>
Message-ID: <bbaeab100904010821u60fd02cdxf53f9c5b1e1833dd@mail.gmail.com>

On Tue, Mar 31, 2009 at 20:34, Guido van Rossum <guido at python.org> wrote:

> Can you get Jim Fulton's feedback? ISTR he originated this.
>

I thought Neal started this idea?

-Brett

>
> On Tue, Mar 31, 2009 at 12:14 PM, Larry Hastings <larry at hastings.org>
> wrote:
> >
> > The CObject API has two flaws.
> >
> > First, there is no usable type safety mechanism.  You can store a void
> > *object, and a void *description.  There is no established schema for
> > the description; it could be an integer cast to a pointer, or it could
> > point to memory of any configuration, or it could be NULL.  Thus users
> > of the CObject API generally ignore it--thus working without any type
> > safety whatsoever.  A programmer could crash the interpreter from pure
> > Python by mixing and matching CObjects from different modules (e.g. give
> > "curses" a CObject from "_ctypes").
> >
> > Second, the destructor callback is defined as taking *either* one *or*
> > two parameters, depending on whether the "descr" pointer is non-NULL. One
> > can debate the finer points of what is and isn't defined behavior in
> > C, but at its heart this is a sloppy API.
> >
> > MvL and I discussed this last night and decided to float a revision of
> > the API.  I wrote the patch, though, so don't blame Martin if you don't
> > like my specific approach.
> >
> > The color of this particular bike shed is:
> > * The PyCObject is now a private data structure; you must use accessors.
> >  I added accessors for all the members.
> > * The constructors and the main accessor (PyCObject_AsVoidPtr) now all
> >  *require* a "const char *type" parameter, which must be a non-NULL C
> >  string of non-zero length.  If you call that accessor and the "type"
> >  is invalid *or doesn't match*, it fails.
> > * The destructor now takes the PyObject *, not the PyCObject *.  You
> >  must use accessors to get your hands on the data inside to free it.
> >
> > Yes, you can easily skip around the "matching type" restriction by
> > calling PyCObject_AsVoidPtr(cobj, PyCObject_GetType(cobj)).  The point
> > of my API changes is to *encourage* correct use.
> >
> > I've posted a patch implementing this change in the 3.1 trunk to the
> > bug tracker:
> >
> >   http://bugs.python.org/issue5630
> >
> > I look forward to your comments!
> >
> >
> > /larry/
> >
> >
>
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From kristjan at ccpgames.com  Wed Apr  1 17:34:42 2009
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Wed, 1 Apr 2009 15:34:42 +0000
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <49D26BB1.8050108@hastings.org>
References: <49D26BB1.8050108@hastings.org>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F056D52762C@exchis.ccp.ad.local>

What are the semantics of the "type" argument for PyCObject_FromVoidPtr()?
-Does it do a strdup, or is the type required to be valid while the object exists (e.g. a static string)?
-How is the type match determined, strcmp, or pointer comparison?

-----Original Message-----
From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Larry Hastings
Sent: 31. mars 2009 19:15
To: Python-Dev at python.org
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!

* The constructors and the main accessor (PyCObject_AsVoidPtr) now all
  *require* a "const char *type" parameter, which must be a non-NULL C
  string of non-zero length.  If you call that accessor and the "type"
  is invalid *or doesn't match*, it fails.

From ronaldoussoren at mac.com  Wed Apr  1 18:17:40 2009
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Wed, 01 Apr 2009 11:17:40 -0500
Subject: [Python-Dev] Python 2.6 64-bit Mac Release
In-Reply-To: <6CD9B6A6B6CCBA4FA497F07182F4EE83014B6EAB@MIAEMAILEVS1.spss.com>
References: <6CD9B6A6B6CCBA4FA497F07182F4EE83014B6EAB@MIAEMAILEVS1.spss.com>
Message-ID: <CB4ECE78-9904-4A81-B3D0-0D28130F153C@mac.com>

On 1 Apr, 2009, at 8:45, Peck, Jon wrote:

> Apparently the Mac Python 2.6.1 Installer image does not include 64- 
> bit binaries.  Is this going to change?  Is there some technical  
> reason why these are not included?  We are hoping to support that in  
> our next 64-bit release.

The 2.6 installer image does not include 64-bit binaries.  As of this  
week the script that creates the installer can build an installer that  
does support 64-bit code as well, but that only works on Leopard  
systems.

I'm thinking about how to distribute binaries that support 64-bit code  
without unduly complicating the world. The easiest option for me would  
be to have two installers: one 32-bit only that supports OSX 10.3.9  
and later and a 4-way universal one that supports OSX Leopard and  
later.  It might be possible to have a single installer that supports  
64-bit code on Leopard but is usable on 10.3.9 as well, but I haven't  
checked yet how much that would complicate the build.

Ronald


From fuzzyman at voidspace.org.uk  Wed Apr  1 18:19:42 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Wed, 01 Apr 2009 11:19:42 -0500
Subject: [Python-Dev] Python 2.6 64-bit Mac Release
In-Reply-To: <CB4ECE78-9904-4A81-B3D0-0D28130F153C@mac.com>
References: <6CD9B6A6B6CCBA4FA497F07182F4EE83014B6EAB@MIAEMAILEVS1.spss.com>
	<CB4ECE78-9904-4A81-B3D0-0D28130F153C@mac.com>
Message-ID: <49D3941E.103@voidspace.org.uk>

Ronald Oussoren wrote:
>
> On 1 Apr, 2009, at 8:45, Peck, Jon wrote:
>
>> Apparently the Mac Python 2.6.1 Installer image does not include 
>> 64-bit binaries.  Is this going to change?  Is there some technical 
>> reason why these are not included?  We are hoping to support that in 
>> our next 64-bit release.
>
> The 2.6 installer image does not include 64-bit binaries.  As of this 
> week the script that creates the installer can build an installer that 
> does support 64-bit code as well, but that only works on Leopard systems.
>
> I'm thinking about how to distribute binaries that support 64-bit code 
> without unduly complicating the world. The easiest option for me would 
> be to have two installers: one 32-bit only that supports OSX 10.3.9 
> and later and a 4-way universal one that supports OSX Leopard and 
> later.  It might be possible to have a single installer that supports 
> 64-bit code on Leopard but is usable on 10.3.9 as well, but I haven't 
> checked yet how much that would complicate the build.
>

Two installers sounds OK to me, particularly if it simplifies the build 
process but allows us to still support 64bit.

Michael

> Ronald
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk
>   

-- 
http://www.ironpythoninaction.com/

From jeremy at alum.mit.edu  Wed Apr  1 18:21:03 2009
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Wed, 1 Apr 2009 12:21:03 -0400
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
	<49D35A39.7020507@simplistix.co.uk>
	<Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net>
Message-ID: <e8bf7a530904010921o5fd2a5ffl4a6e492fa79d6c69@mail.gmail.com>

Eeek, I think it was me.  Part of the AST changes involved raising a
SyntaxError when exec was used in a scope that had a free variable,
since the behavior is pretty much undefined.  If the compiler decides
a variable is free, then it can't be assigned to in the function body.
 The compiled exec code can't know whether the variable is local or
free in the exec context, only that it should generate a STORE_NAME
opcode.  The STORE_NAME can't possibly work.
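
A minimal sketch of the two facts in play here, assuming Python 3, where
exec is a function and the SyntaxError discussed in this thread never
arises. The function names are made up for illustration. It only shows
that the compiler records "x" as a free variable of the nested function,
while source compiled on its own gets a STORE_NAME for the same
assignment:

```python
import dis


def outer():
    x = 1

    def inner():
        # x is not assigned here, so the compiler treats it as a free
        # variable and reads it through a closure cell.
        return x

    return inner


# Names the compiler resolved as free variables of inner().
free_vars = outer().__code__.co_freevars

# Code compiled on its own (as an exec string would be) knows nothing about
# an enclosing function, so a plain assignment becomes STORE_NAME.
exec_code = compile("x = 2", "<exec string>", "exec")
exec_opnames = [ins.opname for ins in dis.get_instructions(exec_code)]

print(free_vars)
print("STORE_NAME" in exec_opnames)
```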

It looks like I did a bad job of documenting the change, though.  I
had forgotten about it, because it was three or four years ago.

It looks like the same exception should be raised for the class statement.

Jeremy

On Wed, Apr 1, 2009 at 11:00 AM, R. David Murray <rdmurray at bitdance.com> wrote:
> On Wed, 1 Apr 2009 at 13:12, Chris Withers wrote:
>>
>> Guido van Rossum wrote:
>>>
>>> Well hold on for a minute, I remember we used to have an exec
>>> statement in a class body in the standard library, to define some file
>>> methods in socket.py IIRC.
>>
>> But why an exec?! Surely there must be some other way to do this than an
>> exec?
>
> Maybe, but this sure is gnarly code:
>
>     _s = ("def %s(self, *args): return self._sock.%s(*args)\n\n"
>           "%s.__doc__ = _realsocket.%s.__doc__\n")
>     for _m in _socketmethods:
>         exec _s % (_m, _m, _m, _m)
>     del _m, _s
>
> Guido's memory is good, that's from the _socketobject class in
> socket.py.
>
> --David
>
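
On the "some other way than an exec" question, one possible exec-free
shape is a closure per method name, attached with setattr after the class
body. This is only a sketch in Python 3: _RealSocket and SocketWrapper
are hypothetical stand-ins, not the classes in socket.py, and the method
list is invented for the example.

```python
class _RealSocket:
    """Hypothetical stand-in for _realsocket, just for this sketch."""

    def send(self, data):
        """send(data) -> count"""
        return len(data)

    def getsockname(self):
        """getsockname() -> address info"""
        return ("127.0.0.1", 0)


_socketmethods = ("send", "getsockname")


def _make_delegate(name):
    # A closure captures `name`, so no source text has to be built and exec'd.
    def method(self, *args):
        return getattr(self._sock, name)(*args)

    method.__name__ = name
    method.__doc__ = getattr(_RealSocket, name).__doc__
    return method


class SocketWrapper:
    def __init__(self, sock):
        self._sock = sock


# Attach the delegating methods after the class body instead of inside it.
for _m in _socketmethods:
    setattr(SocketWrapper, _m, _make_delegate(_m))
del _m

wrapped = SocketWrapper(_RealSocket())
print(wrapped.send(b"abc"))
```

The trade-off is an extra getattr lookup on every call, which the
generated source in the quoted snippet avoids.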

From ron.duplain at gmail.com  Wed Apr  1 18:50:48 2009
From: ron.duplain at gmail.com (Ron DuPlain)
Date: Wed, 1 Apr 2009 11:50:48 -0500
Subject: [Python-Dev] 3to2 Project
In-Reply-To: <1afaf6160903301929l4120abe5g96e2ca2fdb722896@mail.gmail.com>
References: <4222a8490903300744t498e79daodea9cff32e4a94c1@mail.gmail.com>
	<43aa6ff70903301037y215d979he36246d36c987493@mail.gmail.com>
	<1afaf6160903301929l4120abe5g96e2ca2fdb722896@mail.gmail.com>
Message-ID: <2b485bad0904010950h7c3f3275n1f03c4b2cf2dcc3e@mail.gmail.com>

On Mon, Mar 30, 2009 at 9:29 PM, Benjamin Peterson <benjamin at python.org> wrote:
> 2009/3/30 Collin Winter <collinw at gmail.com>:
>> If anyone is interested in working on this during the PyCon sprints or
>> otherwise, here are some easy, concrete starter projects that would
>> really help move this along:
>> - The core refactoring engine needs to be broken out from 2to3. In
>> particular, the tests/ and fixes/ need to get pulled up a directory,
>> out of lib2to3/.
>> - Once that's done, lib2to3 should then be renamed to something like
>> librefactor or something else that indicates its more general nature.
>> This will allow both 2to3 and 3to2 to more easily share the core
>> components.
>
> FWIW, I think it is unfortunately too late to make this change. We've
> already released it as lib2to3 in the standard library and I have
> actually seen it used in other projects. (PythonScope, for example.)
>

Paul Kippes and I have been sprinting on this.  We put lib2to3 into a
refactor package and kept a shell lib2to3 to support the old
interface.

We are able to run 2to3, 3to2, lib2to3 tests, and refactor tests.  We
only have a few simple 3to2 fixes now, but they should be easy to add.
 We kept the old lib2to3 tests to make sure we didn't break anything.
As things settle down, I'd like to verify that our new lib2to3 is
backward-compatible (since right now it points to the new refactor
lib) with one of the external projects.

We've been using hg to push changesets between each other, but we'll
be committing to the svn sandbox before the week is out.  I'm heading
out today, but Paul is sticking around another day.

It's a start,

Ron

>
> --
> Regards,
> Benjamin

From fuzzyman at voidspace.org.uk  Wed Apr  1 19:51:50 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Wed, 01 Apr 2009 12:51:50 -0500
Subject: [Python-Dev] Wing IDE and python.wpr
Message-ID: <49D3A9B6.9020609@voidspace.org.uk>

Hello all,

How many are using the Wing IDE to work on core Python?

It would be nice to have a 'python.wpr' checked in to trunk, as I have 
to recreate the project file every time I do a new checkout. Would this 
be useful for anyone else? Where is a good place for it to live? 
Littering the top level directory seems like a bad idea but I can't see 
anywhere else immediately *obvious* (no reason it has to live at the top 
level).

Wing can be configured to use two files for the project - one file for 
the basic configuration (which would be checked in) and one for your 
personal settings (which files you have open, how many windows you are 
using etc) and would be svn-ignored.

Michael Foord

-- 
http://www.ironpythoninaction.com/

From larry at hastings.org  Wed Apr  1 20:40:36 2009
From: larry at hastings.org (Larry Hastings)
Date: Wed, 01 Apr 2009 11:40:36 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <bbaeab100904010821u60fd02cdxf53f9c5b1e1833dd@mail.gmail.com>
References: <49D26BB1.8050108@hastings.org>
	<ca471dc20903312034s78240531w6a91761156806bce@mail.gmail.com>
	<bbaeab100904010821u60fd02cdxf53f9c5b1e1833dd@mail.gmail.com>
Message-ID: <49D3B524.5090708@hastings.org>

Brett Cannon wrote:
> On Tue, Mar 31, 2009 at 20:34, Guido van Rossum <guido at python.org 
> <mailto:guido at python.org>> wrote:
>
>     Can you get Jim Fulton's feedback? ISTR he originated this.
>
>
> I thought Neal started this idea?

The earliest revision spotted in "svn blame cobject.[ch]" is 5782:

    svn log -r 5782
    ------------------------------------------------------------------------
    r5782 | guido | 1996-01-11 16:44:03 -0800 (Thu, 11 Jan 1996) | 2 lines

    opaque C object a la Jim Fulton

I'll email Jim Fulton and inquire.

/larry/

From fuzzyman at voidspace.org.uk  Wed Apr  1 20:44:40 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Wed, 01 Apr 2009 13:44:40 -0500
Subject: [Python-Dev] Wing IDE and python.wpr
In-Reply-To: <49D3A9B6.9020609@voidspace.org.uk>
References: <49D3A9B6.9020609@voidspace.org.uk>
Message-ID: <49D3B618.2010203@voidspace.org.uk>

Michael Foord wrote:
> Hello all,
>
> How many are using the Wing IDE to work on core Python?
>
> It would be nice to have a 'python.wpr' checked in to trunk, as I have 
> to recreate the project file every time I do a new checkout. Would 
> this be useful for anyone else? Where is a good place for it to live? 
> Littering the top level directory seems like a bad idea but I can't 
> see anywhere else immediately *obvious* (no reason it has to live at 
> the top level).
>
> Wing can be configured to use two files for the project - one file for 
> the basic configuration (which would be checked in) and one for your 
> personal settings (which files you have open, how many windows you are 
> using etc) and would be svn-ignored.
The Wing project file is now checked in. It is Misc/python-wing.wpr

The project is configured with SVN integration enabled, with two file 
configuration and the wpu file SVN ignored plus the main project 
directory added.

The wpr file is text so changes are diff friendly.

There is an issue with the way the project is displayed - the Misc 
directory is the top-level with '..' showing as another directory in the 
project. This issue will be resolved in the next version of Wing.

There are various other feature-requests now with Wing to better support 
using it for developing Python. Currently the debugger doesn't work with 
a newly built version of Python and the executable name / location is 
platform dependent and so setting a custom executable would only work on 
one platform.

It would be easy to add custom tools to (for example) integrate regrtest 
or do the configure / make dance on a fresh checkout.

All the best,

Michael

>
> Michael Foord
>

-- 
http://www.ironpythoninaction.com/

From larry at hastings.org  Wed Apr  1 20:58:00 2009
From: larry at hastings.org (Larry Hastings)
Date: Wed, 01 Apr 2009 11:58:00 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F056D52762C@exchis.ccp.ad.local>
References: <49D26BB1.8050108@hastings.org>
	<930F189C8A437347B80DF2C156F7EC7F056D52762C@exchis.ccp.ad.local>
Message-ID: <49D3B938.5000202@hastings.org>

Kristján Valur Jónsson wrote:
> What are the semantics of the "type" argument for PyCObject_FromVoidPtr()?
>   

 From the patch, from the documentation comment above the prototype for 
PyCObject_FromVoidPtr() in Include/cobject.h:

    The "type" string must point to a legal C string of non-zero length,

> -Does it do a strdup, or is the type required to be valid while the object exists (e.g. a static string)?
>   

 From the patch, continuing on from where we just left off:

    and this string must outlive the CObject.

> -How is the type match determined, strcmp, or pointer comparison?

 From the patch, observing the code in the static function 
_is_legal_cobject_and_type() in Objects/cobject.c:

        if (!type || !*type) {
            PyErr_SetString(PyExc_TypeError, invalidType);
            return 0;
        }
        if (strcmp(type, self->type)) {
            PyErr_SetString(PyExc_TypeError, incorrectType);
            return 0;
        }
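
For readers who would rather not read C, the two checks above can be
modelled in a few lines of Python. This is a toy model under stated
assumptions, not the patch: CObjectModel, its method name and the type
strings are all invented for illustration. It shows that an empty or
missing type is rejected and that matching is by string contents, as
with strcmp, not by pointer identity.

```python
class CObjectModel:
    """Toy Python model of the proposed checks; not the real C structure."""

    def __init__(self, pointer, type_name):
        if not type_name:
            raise TypeError("invalid type")
        self._pointer = pointer
        self._type = type_name

    def as_void_ptr(self, type_name):
        # Mirrors `!type || !*type`: None and "" are both rejected.
        if not type_name:
            raise TypeError("invalid type")
        # Mirrors strcmp(): equal contents match, identity is not required.
        if type_name != self._type:
            raise TypeError("incorrect type")
        return self._pointer


api = CObjectModel(pointer=0x1234, type_name="foomodule.CAPI")
same_text = "".join(["foomodule", ".", "CAPI"])  # equal contents, built separately

print(api.as_void_ptr(same_text))
try:
    api.as_void_ptr("othermodule.CAPI")
except TypeError as exc:
    print("rejected:", exc)
```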

A method for answering further such questions suggests itself,

/larry/

From jim at zope.com  Wed Apr  1 23:29:19 2009
From: jim at zope.com (Jim Fulton)
Date: Wed, 1 Apr 2009 17:29:19 -0400
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <49D26BB1.8050108@hastings.org>
References: <49D26BB1.8050108@hastings.org>
Message-ID: <FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>

On Mar 31, 2009, at 3:14 PM, Larry Hastings wrote:

(Thanks for calling my attention to this. :)

>
> The CObject API has two flaws.
>
> First, there is no usable type safety mechanism.  You can store a void
> *object, and a void *description.  There is no established schema for
> the description; it could be an integer cast to a pointer, or it could
> point to memory of any configuration, or it could be NULL.  Thus users
> of the CObject API generally ignore it--thus working without any type
> safety whatsoever.  A programmer could crash the interpreter from pure
> Python by mixing and matching CObjects from different modules (e.g.
> give "curses" a CObject from "_ctypes").

The description field wasn't in the original CObject implementation  
that I was involved with many years ago.  Looking at it now, I don't  
think it is intended as a type-safety mechanism at all, but as a way  
to pass data to the destructor.  I don't know what motivated this. (I  
don't know why it's called "description". This name seems to be
very confusing.)

The only type-safety mechanism for a CObject is its identity.  If you
want to make sure you're using the foomodule api, make sure the  
address of the CObject is the same as the address of the api object  
exported by the module.

The exporting module should automate use of the C API by providing an  
appropriate header file, as described in http://docs.python.org/extending/extending.html#providing-a-c-api-for-an-extension-module 
.

> Second, the destructor callback is defined as taking *either* one *or*
> two parameters, depending on whether the "descr" pointer is non-NULL.
> One can debate the finer points of what is and isn't defined behavior
> in C, but at its heart this is a sloppy API.

<shrug>

It was necessary for backward compatibility. I don't know what  
motivated this, so I don't know if the benefit was worth the ugliness.

> MvL and I discussed this last night and decided to float a revision of
> the API.  I wrote the patch, though, so don't blame Martin if you
> don't like my specific approach.
>
> The color of this particular bike shed is:
> * The PyCObject is now a private data structure; you must use
> accessors.  I added accessors for all the members.

The original implementation didn't expose the structure. I don't know  
why it was exposed. It would be backward incompatible to hide it again  
now.

> * The constructors and the main accessor (PyCObject_AsVoidPtr) now all
> *require* a "const char *type" parameter, which must be a non-NULL C
> string of non-zero length.  If you call that accessor and the "type"
> is invalid *or doesn't match*, it fails.

That would break backward compatibility. Are you proposing this for  
Python 3?

What would be the gain in this? The CObject is already a type  
identifier for itself.  In any case, client code generally doesn't  
mess with CObjects directly anyway.

> * The destructor now takes the PyObject *, not the PyCObject *.  You
> must use accessors to get your hands on the data inside to free it.

It currently isn't passed the CObject, but the C pointer that it  
holds.  In any case, changing the API isn't practical, at least not  
for Python 2.

> Yes, you can easily skip around the "matching type" restriction by
> calling PyCObject_AsVoidPtr(cobj, PyCObject_GetType(cobj)).  The point
> of my API changes is to *encourage* correct use.
>
> I've posted a patch implementing this change in the 3.1 trunk to the
> bug tracker:
>
>   http://bugs.python.org/issue5630
>
> I look forward to your comments!

-1

I don't see that this gains anything.

1. All you're adding, afaict is a name for the API and the (address of  
the) CObject itself already provides this.

2. Only code provided by the module provider should be accessing the  
CObject exported by the module.

Jim

--
Jim Fulton
Zope Corporation

From david.christian at gmail.com  Wed Apr  1 23:49:31 2009
From: david.christian at gmail.com (David Christian)
Date: Wed, 1 Apr 2009 17:49:31 -0400
Subject: [Python-Dev] bdb.py trace C implementation?
Message-ID: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com>

Hi all,
I've recently written a C version of the trace function used in
figleaf (the coverage tool written by Titus).  After a few rewrites to
add in caching, etc, it gives users a significant speedup.  One person
stated that switching to the C version caused coverage to decrease
from a 442% slowdown to only a 56% slowdown.

You can see my C implementation at:
 http://github.com/ctb/figleaf/blob/e077155956c288b68704b09889ebcd675ba02240/figleaf/_coverage.c

(Specific comments about the implementation welcome off-list).

I'd like to attempt something similar for bdb.py (only for the trace
function).  A naive C trace function which duplicated the current
python function should speed up bdb significantly.  I would initially
write the smallest part of the C implementation that I could.
Basically the tracing function would call back out to python at any
point where a line requires action.

I'd be willing to maintain the C implementation.  I would be willing
to write those tests that are possible as well.

Is this something that would be likely to be accepted?

Thanks,
David Christian
Senior Software Engineer
rPath, Inc.

From rdmurray at bitdance.com  Thu Apr  2 00:22:26 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 1 Apr 2009 18:22:26 -0400 (EDT)
Subject: [Python-Dev] CSV, bytes and encodings
In-Reply-To: <loom.20090401T104816-798@post.gmane.org>
References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
	<Pine.LNX.4.64.0904010109190.26362@kimball.webabinitio.net>
	<loom.20090401T100027-754@post.gmane.org>
	<18899.17394.455907.841425@montanaro.dyndns.org>
	<loom.20090401T104816-798@post.gmane.org>
Message-ID: <Pine.LNX.4.64.0904011756270.26362@kimball.webabinitio.net>

On Wed, 1 Apr 2009 at 10:53, Antoine Pitrou wrote:
> Perhaps. But without using 'rU' the file couldn't be read at all.
> (I'm not sure it was Windows line endings by the way; perhaps Macintosh ones;
> anyway, it didn't work using 'rb')

I just tested it in 2.6.  It must have been old-mac (\r), which indeed
gave me the error message you mentioned.  Windows line endings worked fine
for me reading in binary mode on linux.

> I have to add that if individual fields really can contain newlines, then the
> CSV module ought to be smarter when /saving/ those fields. I've inadvertently
> tried to produce a CSV file with such fields and it ended up wrong when opened
> as a spreadsheet (text after the newlines was ignored in Gnumeric and in
> OpenOffice, while Excel displayed a spurious additional row containing only the
> text after the newline).

I just added some tests to trunk that seem to indicate this case is
handled correctly in terms of preserving the data.  Maybe you didn't
write the file such that the fields with the newlines were quoted?
And of course how non-Excel applications handle that data on import
can be different from how Excel handles it.
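
A small round-trip sketch of the case under discussion, assuming Python
3's csv module with the default dialect and in-memory buffers opened
with newline="" (the sample rows are invented). It writes a field that
contains a newline and reads it back, to show that the writer quotes
such a field and the reader restores it as a single value:

```python
import csv
import io

rows = [["id", "note"], ["1", "line one\nline two"]]

# newline="" keeps the csv module in charge of line endings.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

round_tripped = list(csv.reader(io.StringIO(text, newline="")))
print(repr(text))
print(round_tripped == rows)
```

Whether a spreadsheet application then displays that quoted field as one
cell is a separate question from whether the data survives the round
trip.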

--David

From benjamin at python.org  Thu Apr  2 00:25:57 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 1 Apr 2009 17:25:57 -0500
Subject: [Python-Dev] bdb.py trace C implementation?
In-Reply-To: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com>
References: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com>
Message-ID: <1afaf6160904011525y28ab16efh87bc693aa41f703d@mail.gmail.com>

2009/4/1 David Christian <david.christian at gmail.com>:
> Hi all,
> I've recently written a C version of the trace function used in
> figleaf (the coverage tool written by Titus).  After a few rewrites to
> add in caching, etc, it gives users a significant speedup.  One person
> stated that switching to the C version caused coverage to decrease
> from a 442% slowdown to only a 56% slowdown.
>
> You can see my C implementation at:
>  http://github.com/ctb/figleaf/blob/e077155956c288b68704b09889ebcd675ba02240/figleaf/_coverage.c
>
> (Specific comments about the implementation welcome off-list).
>
> I'd like to attempt something similar for bdb.py (only for the trace
> function).  A naive C trace function which duplicated the current
> python function should speed up bdb significantly.  I would initially
> write the smallest part of the C implementation that I could.
> Basically the tracing function would call back out to python at any
> point where a line requires action.
>
> I'd be willing to maintain the C implementation.  I would be willing
> to write those tests that are possible as well.
>
> Is this something that would be likely to be accepted?

Generally debugging doesn't require good performance, so this is
definitely low priority. However, if you can contribute it, I don't
have a problem with it.

-- 
Regards,
Benjamin

From rdmurray at bitdance.com  Thu Apr  2 00:44:58 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 1 Apr 2009 18:44:58 -0400 (EDT)
Subject: [Python-Dev] CSV, bytes and encodings
In-Reply-To: <Pine.LNX.4.64.0904011756270.26362@kimball.webabinitio.net>
References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
	<Pine.LNX.4.64.0904010109190.26362@kimball.webabinitio.net>
	<loom.20090401T100027-754@post.gmane.org>
	<18899.17394.455907.841425@montanaro.dyndns.org>
	<loom.20090401T104816-798@post.gmane.org>
	<Pine.LNX.4.64.0904011756270.26362@kimball.webabinitio.net>
Message-ID: <Pine.LNX.4.64.0904011838100.26362@kimball.webabinitio.net>

OK, Antoine, having merged my newline tests to py3k and having
them work when lineend is set to '', as you suggested on the
ticket, I'm inclined to agree with you that this is a doc bug.

Skip?

--David

From guido at python.org  Thu Apr  2 00:48:59 2009
From: guido at python.org (Guido van Rossum)
Date: Wed, 1 Apr 2009 15:48:59 -0700
Subject: [Python-Dev] bdb.py trace C implementation?
In-Reply-To: <1afaf6160904011525y28ab16efh87bc693aa41f703d@mail.gmail.com>
References: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com> 
	<1afaf6160904011525y28ab16efh87bc693aa41f703d@mail.gmail.com>
Message-ID: <ca471dc20904011548l90a34f0x4632dc74298f71a2@mail.gmail.com>

On Wed, Apr 1, 2009 at 3:25 PM, Benjamin Peterson <benjamin at python.org> wrote:
> 2009/4/1 David Christian <david.christian at gmail.com>:
>> Hi all,
>> I've recently written a C version of the trace function used in
>> figleaf (the coverage tool written by Titus).  After a few rewrites to
>> add in caching, etc, it gives users a significant speedup.  One person
>> stated that switching to the C version caused coverage to decrease
>> from a 442% slowdown to only a 56% slowdown.
>>
>> You can see my C implementation at:
>>  http://github.com/ctb/figleaf/blob/e077155956c288b68704b09889ebcd675ba02240/figleaf/_coverage.c
>>
>> (Specific comments about the implementation welcome off-list).
>>
>> I'd like to attempt something similar for bdb.py (only for the trace
>> function).  A naive C trace function which duplicated the current
>> python function should speed up bdb significantly.  I would initially
>> write the smallest part of the C implementation that I could.
>> Basically the tracing function would call back out to python at any
>> point where a line requires action.
>>
>> I'd be willing to maintain the C implementation.  I would be willing
>> to write those tests that are possible as well.
>>
>> Is this something that would be likely to be accepted?
>
> Generally debugging doesn't require good performance, so this is
> definitely low priority. However, if you can contribute it, I don't
> have a problem with it.

Tracing has other uses besides debugging though. In particular,
coverage, which usually wants per-line data. Also, sometimes if you
set a breakpoint in a function it turns on tracing for the entire
function. This can sometimes be annoyingly slow. So, personally, I am
more positive than that, and hope it will make it in.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From benjamin at python.org  Thu Apr  2 00:53:30 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 1 Apr 2009 17:53:30 -0500
Subject: [Python-Dev] bdb.py trace C implementation?
In-Reply-To: <ca471dc20904011548l90a34f0x4632dc74298f71a2@mail.gmail.com>
References: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com>
	<1afaf6160904011525y28ab16efh87bc693aa41f703d@mail.gmail.com>
	<ca471dc20904011548l90a34f0x4632dc74298f71a2@mail.gmail.com>
Message-ID: <1afaf6160904011553r515a9e5agafbd0b3bf4e77f7f@mail.gmail.com>

2009/4/1 Guido van Rossum <guido at python.org>:
> Tracing has other uses besides debugging though.

The OP said he wished to implement a C trace function for bdb.
Wouldn't that make it only applicable to debugging?

-- 
Regards,
Benjamin

From robert.kern at gmail.com  Thu Apr  2 01:00:56 2009
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 01 Apr 2009 18:00:56 -0500
Subject: [Python-Dev] bdb.py trace C implementation?
In-Reply-To: <1afaf6160904011553r515a9e5agafbd0b3bf4e77f7f@mail.gmail.com>
References: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com>	<1afaf6160904011525y28ab16efh87bc693aa41f703d@mail.gmail.com>	<ca471dc20904011548l90a34f0x4632dc74298f71a2@mail.gmail.com>
	<1afaf6160904011553r515a9e5agafbd0b3bf4e77f7f@mail.gmail.com>
Message-ID: <gr0rn9$isu$1@ger.gmane.org>

On 2009-04-01 17:53, Benjamin Peterson wrote:
> 2009/4/1 Guido van Rossum<guido at python.org>:
>> Tracing has other uses besides debugging though.
>
> The OP said he wished to implement a C trace function for bdb.
> Wouldn't that make it only applicable to debugging?

Once you are at the breakpoint and stepping through the code manually, the 
performance is not all that important. However, up until that breakpoint, you 
are running a lot of code "in bulk". It would be useful to have a performant 
trace function that interferes with that code the least.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

From david.christian at gmail.com  Thu Apr  2 01:07:32 2009
From: david.christian at gmail.com (David Christian)
Date: Wed, 1 Apr 2009 19:07:32 -0400
Subject: [Python-Dev] bdb.py trace C implementation?
In-Reply-To: <1afaf6160904011553r515a9e5agafbd0b3bf4e77f7f@mail.gmail.com>
References: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com> 
	<1afaf6160904011525y28ab16efh87bc693aa41f703d@mail.gmail.com> 
	<ca471dc20904011548l90a34f0x4632dc74298f71a2@mail.gmail.com> 
	<1afaf6160904011553r515a9e5agafbd0b3bf4e77f7f@mail.gmail.com>
Message-ID: <63940b00904011607y75a602acxea87905d9923f66e@mail.gmail.com>

On Wed, Apr 1, 2009 at 6:53 PM, Benjamin Peterson <benjamin at python.org> wrote:
> 2009/4/1 Guido van Rossum <guido at python.org>:
>> Tracing has other uses besides debugging though.
>
> The OP said he wished to implement a C trace function for bdb.
> Wouldn't that make it only applicable to debugging?
>
> Benjamin
>
I was suggesting a speedup for debugging.  However, I could certainly
also contribute my figleaf work that I referenced earlier, with a few
tweaks, as a tracing replacement for the tracing function in trace.py.

My concern with moving the coverage tracing code in particular to the
standard library is that it tries to extract the maximum speed by
being clever*, and certainly has not been out in the wild for long
enough.  I would write something much more conservative as a starting
point for bdb.py.  I expect that any C implementation that was
thinking about performance at all would be much better than the status
quo.

* figleaf checks a regular expression to determine whether or not we
wish to trace a particular file.  If the file is not being traced, I
switch to the profiler instead of the line tracer, which means that
the trace function only gets called twice per function instead of once
per line.  This can give a large speedup when you are skipping the
entire standard library, at some measurable cost per function call,
and a cost in code complexity.
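
A pure-Python analogue of the per-file filtering idea, for illustration
only. It is not the figleaf C code and it does not switch to the
profiler. Instead it relies on sys.settrace behaviour: when the global
trace function returns None for a new frame, no line events are
delivered for that frame. The helper names, file names and pattern are
all made up for the example.

```python
import re
import sys


def make_tracer(pattern, hits):
    include = re.compile(pattern)

    def local_trace(frame, event, arg):
        if event == "line":
            hits.add((frame.f_code.co_filename, frame.f_lineno))
        return local_trace

    def global_trace(frame, event, arg):
        # Called once per new frame; returning None means no line events
        # are delivered for that frame at all.
        if include.search(frame.f_code.co_filename):
            return local_trace
        return None

    return global_trace


def run_traced(func, pattern):
    hits = set()
    previous = sys.gettrace()
    sys.settrace(make_tracer(pattern, hits))
    try:
        func()
    finally:
        sys.settrace(previous)
    return hits


namespace = {}
exec(compile("def f():\n    a = 1\n    return a\n", "wanted_module.py", "exec"), namespace)
exec(compile("def g():\n    b = 2\n    return b\n", "skipped_module.py", "exec"), namespace)

hits = run_traced(lambda: (namespace["f"](), namespace["g"]()), r"wanted_module")
print(sorted(hits))
```

Skipped files still pay for one call into the trace function per frame,
which is the per-function cost mentioned above.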

---
David Christian
Senior Software Engineer
rPath, Inc

From guido at python.org  Thu Apr  2 01:14:25 2009
From: guido at python.org (Guido van Rossum)
Date: Wed, 1 Apr 2009 16:14:25 -0700
Subject: [Python-Dev] bdb.py trace C implementation?
In-Reply-To: <1afaf6160904011553r515a9e5agafbd0b3bf4e77f7f@mail.gmail.com>
References: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com> 
	<1afaf6160904011525y28ab16efh87bc693aa41f703d@mail.gmail.com> 
	<ca471dc20904011548l90a34f0x4632dc74298f71a2@mail.gmail.com> 
	<1afaf6160904011553r515a9e5agafbd0b3bf4e77f7f@mail.gmail.com>
Message-ID: <ca471dc20904011614i2798a2e8p9ac26fa1a39b64be@mail.gmail.com>

On Wed, Apr 1, 2009 at 3:53 PM, Benjamin Peterson <benjamin at python.org> wrote:
> 2009/4/1 Guido van Rossum <guido at python.org>:
>> Tracing has other uses besides debugging though.
>
> The OP said he wished to implement a C trace function for bdb.
> Wouldn't that make it only applicable to debugging?

I honestly don't recall, but I believe pretty much everyone who uses
tracing does so via bdb.py. And yes, when debugging sometimes you have
to silently skip 1000 iterations until a condition becomes true, and
the tracing speed matters.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From larry at hastings.org  Thu Apr  2 01:26:15 2009
From: larry at hastings.org (Larry Hastings)
Date: Wed, 01 Apr 2009 16:26:15 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
Message-ID: <49D3F817.9080201@hastings.org>

Jim Fulton wrote:
> The only type-safety mechanism for a CObject is its identity.  If you 
> want to make sure you're using the foomodule api, make sure the 
> address of the CObject is the same as the address of the api object 
> exported by the module.
That doesn't help.  Here's a program that crashes the interpreter, 
something I shouldn't be able to do from pure Python:

    import _socket
    import cStringIO
    cStringIO.cStringIO_CAPI = _socket.CAPI

    import cPickle
    s = cPickle.dumps([1, 2, 3])

How can cPickle determine that cStringIO.cStringIO_CAPI is legitimate?

> That would break backward compatibility. Are you proposing this for 
> Python 3?

I'm proposing this for Python 3.1.  My understanding is that breaking 
backwards compatibility is still on the table, which is why I wrote the 
patch the way I did.  If we have to preserve the existing API, I still 
think we should add new APIs and deprecate the old ones.

It's worth noting that there's been demand for this for a long time.  
Check out this comment from Include/datetime.h:

    #define PyDateTime_IMPORT \
            PyDateTimeAPI = (PyDateTime_CAPI*) PyCObject_Import("datetime", \
                                                   "datetime_CAPI")

    /* This macro would be used if PyCObject_ImportEx() was created.
    #define PyDateTime_IMPORT \
            PyDateTimeAPI = (PyDateTime_CAPI*) PyCObject_ImportEx("datetime", \
                                                   "datetime_CAPI", \
                                                   DATETIME_API_MAGIC)
    */

That was checked in by Tim Peters on 2004-06-20, r36214.  (At least, in 
the py3k/trunk branch; I'd hope it would be the same revision number in 
other branches.)

/larry/

From jpe at wingware.com  Thu Apr  2 01:29:20 2009
From: jpe at wingware.com (John Ehresman)
Date: Wed, 01 Apr 2009 18:29:20 -0500
Subject: [Python-Dev] PyDict_SetItem hook
Message-ID: <49D3F8D0.8070805@wingware.com>

I've written a proof of concept patch to add a hook to PyDict_SetItem at 
  http://bugs.python.org/issue5654  My motivation is to enable 
watchpoints in a python debugger that are called when an attribute or 
global changes.  I know that this won't cover function locals and 
objects with slots (as Martin pointed out).

We talked about this at the sprints and a few issues came up:

* Is this worth it for debugger watchpoint support?  This is a feature 
that probably wouldn't be used regularly but is extremely useful in some 
situations.

* Would it be better to create a namespace dict subclass of dict, use it 
for modules, classes, & instances, and only allow watches of the 
subclass instances?

* To what extent should non-debugger code use the hook?  At one end of 
the spectrum, the hook could be made readily available for non-debug use 
and at the other end, it could be documented as being debug only, 
disabled in python -O, & not exposed in the stdlib to python code.
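
To make the second option a bit more concrete, here is a rough pure-Python sketch of a namespace dict subclass that notifies watchers on assignment. The names WatchedNamespace and add_watch are made up for illustration and are not taken from the patch in issue5654. Note also that C code storing through PyDict_SetItem would not go through an overridden __setitem__, so this only models the idea; it is not a substitute for a hook at the C level.

```python
class WatchedNamespace(dict):
    """Hypothetical dict subclass that reports item assignments to watchers."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._watchers = []

    def add_watch(self, callback):
        # callback(key, old_value, new_value) is called before each store
        self._watchers.append(callback)

    def __setitem__(self, key, value):
        old = self.get(key)          # None if the key was not set before
        for callback in self._watchers:
            callback(key, old, value)
        super().__setitem__(key, value)


changes = []
ns = WatchedNamespace(x=1)
ns.add_watch(lambda key, old, new: changes.append((key, old, new)))
ns["x"] = 2
ns["y"] = 3
```

With no watcher installed, the extra cost in this model is one loop over an empty list per store.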

John

From chris at simplistix.co.uk  Thu Apr  2 01:48:13 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Thu, 02 Apr 2009 00:48:13 +0100
Subject: [Python-Dev] Get the standard library to declare the versions it
	provides!
In-Reply-To: <B7D9E610-72C6-4036-B678-CC2CDDFDB8C8@acm.org>
References: <20090327204953.12555.1384799699.divmod.xquotient.6636@weber.divmod.com>	<49CD43B7.3050904@v.loewis.de>
	<49CD4A4F.30900@trueblade.com>	<878wmqxjaz.fsf@xemacs.org>
	<49CE2726.3050307@trueblade.com>
	<B7D9E610-72C6-4036-B678-CC2CDDFDB8C8@acm.org>
Message-ID: <49D3FD3D.4020503@simplistix.co.uk>

Fred Drake wrote:
> Even simple cases present issues with regard to this.  For example, I 
> work on a project that relies on the uuid module, so we declare a 
> dependency on Ka-Ping Yee's uuid module (since we're using Python 2.4).  
> How should we write that in a version-agnostic way if we want to use the 
> standard library version of that module with newer Pythons?

Well, that could be done by getting standard library modules to:

- declare what version they are
- be overridable by installed packages

That way, the fact that the standard library's development moves at the 
speed of frozen tar wouldn't stop packages in it being developed and 
released separately for people who want to use newer versions of them 
and aren't in a situation where they need "batteries included"...

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From guido at python.org  Thu Apr  2 01:53:28 2009
From: guido at python.org (Guido van Rossum)
Date: Wed, 1 Apr 2009 16:53:28 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <49D3F817.9080201@hastings.org>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com> 
	<49D3F817.9080201@hastings.org>
Message-ID: <ca471dc20904011653s212fb43m9c93fc4dfb966c25@mail.gmail.com>

2009/4/1 Larry Hastings <larry at hastings.org>:
>
> Jim Fulton wrote:
>
> The only type-safety mechanism for a CObject is its identity.  If you want
> to make sure you're using the foomodule api, make sure the address of the
> CObject is the same as the address of the api object exported by the module.
>
> That doesn't help.  Here's a program that crashes the interpreter, something
> I shouldn't be able to do from pure Python:
>
> import _socket
> import cStringIO
> cStringIO.cStringIO_CAPI = _socket.CAPI
>
> import cPickle
> s = cPickle.dumps([1, 2, 3])
>
> How can cPickle determine that cStringIO.cStringIO_CAPI is legitimate?

This is a bug in cPickle. It calls the PycString_IMPORT macro at the
very end of its init_stuff() function without checking for success.
This macro calls PyCObject_Import("cStringIO", "cStringIO_CAPI") which
in turn calls PyCObject_AsVoidPtr() on the object that it finds as
cStringIO.cStringIO_CAPI, and this function *does* do a type check and
sets an exception if the object isn't a PyCObject instance. However
cPickle's initialization doesn't check for errors immediately and
apparently some later code overrides the exception.

The fix should be simple: insert

  if (PyErr_Occurred()) return -1;

immediately after the line

  PycString_IMPORT;

in init_stuff() in cPickle.c. This will cause the import of cPickle to
fail with an exception and all should be well.

I have to say, I haven't understood this whole thread, but I'm
skeptical about a redesign. But perhaps you can come up with an
example that doesn't rely on this cPickle bug?

--Guido

> That would break backward compatibility. Are you proposing this for Python
> 3?
>
> I'm proposing this for Python 3.1.  My understanding is that breaking
> backwards compatibility is still on the table, which is why I wrote the
> patch the way I did.  If we have to preserve the existing API, I still think
> we should add new APIs and deprecate the old ones.
>
> It's worth noting that there's been demand for this for a long time.  Check
> out this comment from Include/datetime.h:
>
> #define PyDateTime_IMPORT \
>         PyDateTimeAPI = (PyDateTime_CAPI*) PyCObject_Import("datetime", \
>                                                             "datetime_CAPI")
>
> /* This macro would be used if PyCObject_ImportEx() was created.
> #define PyDateTime_IMPORT \
>         PyDateTimeAPI = (PyDateTime_CAPI*) PyCObject_ImportEx("datetime", \
>                                                             "datetime_CAPI", \
>                                                             DATETIME_API_MAGIC)
> */
>
> That was checked in by Tim Peters on 2004-06-20, r36214.  (At least, in the
> py3k/trunk branch; I'd hope it would be the same revision number in other
> branches.)
>
>
> /larry/

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From collinw at gmail.com  Thu Apr  2 02:31:29 2009
From: collinw at gmail.com (Collin Winter)
Date: Wed, 1 Apr 2009 17:31:29 -0700
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <49D3F8D0.8070805@wingware.com>
References: <49D3F8D0.8070805@wingware.com>
Message-ID: <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>

On Wed, Apr 1, 2009 at 4:29 PM, John Ehresman <jpe at wingware.com> wrote:
> I've written a proof of concept patch to add a hook to PyDict_SetItem at
> http://bugs.python.org/issue5654  My motivation is to enable watchpoints in
> a python debugger that are called when an attribute or global changes.  I
> know that this won't cover function locals and objects with slots (as Martin
> pointed out).
>
> We talked about this at the sprints and a few issues came up:
>
> * Is this worth it for debugger watchpoint support?  This is a feature that
> probably wouldn't be used regularly but is extremely useful in some
> situations.
>
> * Would it be better to create a namespace dict subclass of dict, use it for
> modules, classes, & instances, and only allow watches of the subclass
> instances?
>
> * To what extent should non-debugger code use the hook?  At one end of the
> spectrum, the hook could be made readily available for non-debug use and at
> the other end, it could be documented as being debug only, disabled in
> python -O, & not exposed in the stdlib to python code.

Have you measured the impact on performance?

Collin

From larry at hastings.org  Thu Apr  2 02:39:34 2009
From: larry at hastings.org (Larry Hastings)
Date: Wed, 01 Apr 2009 17:39:34 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <ca471dc20904011653s212fb43m9c93fc4dfb966c25@mail.gmail.com>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
	<49D3F817.9080201@hastings.org>
	<ca471dc20904011653s212fb43m9c93fc4dfb966c25@mail.gmail.com>
Message-ID: <49D40946.1050100@hastings.org>

Guido van Rossum wrote:
> This is a bug in cPickle. It calls the PycString_IMPORT macro at the
> very end of its init_stuff() function without checking for success.
>   

The bug you cite is a genuine bug, but that's not what I'm exploiting.

% python
 >>> import _socket
 >>> _socket.CAPI
<PyCObject object at 0xb7d5b500>

The PyCObject_Import() call in PycString_IMPORT doesn't return 
failure--it returns a valid CObject.  I stuck the *wrong* CObject in 
cStringIO on purpose.  With the current API there's no way for cPickle 
to tell that it's using the wrong one.

For what it's worth, the previous example was for Python 2.x.  (Python 3 
doesn't have "cStringIO" or "cPickle".)  Here's an example that crashes 
python in my py3k/trunk (sync'd Monday morning).  And this one's only 
three lines:

    import unicodedata
    import _multibytecodec
    _multibytecodec.__create_codec(unicodedata.ucnhash_CAPI)

/larry/

From ocean-city at m2.ccsnet.ne.jp  Thu Apr  2 02:46:30 2009
From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto)
Date: Thu, 02 Apr 2009 09:46:30 +0900
Subject: [Python-Dev] 3.1a2
In-Reply-To: <49D26ED4.7090205@m2.ccsnet.ne.jp>
References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
	<49D26ED4.7090205@m2.ccsnet.ne.jp>
Message-ID: <49D40AE6.9040408@m2.ccsnet.ne.jp>

Hirokazu Yamamoto wrote:
> 
> I added #5499 as a release blocker because it needs a specification 
> decision. (Is that too strong?)

Thank you for fixing this. I also added

#5391: mmap: read_byte/write_byte and object type
#5410: msvcrt bytes cleanup

which depend on this issue. These are also API spec issues.
#5410 is easy, but #5391 still needs a decision on which of getarg("c") or 
getarg("b") read_byte/write_byte should use.

From benjamin at python.org  Thu Apr  2 03:17:24 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 1 Apr 2009 20:17:24 -0500
Subject: [Python-Dev] 3.1a2
In-Reply-To: <49D40AE6.9040408@m2.ccsnet.ne.jp>
References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
	<49D26ED4.7090205@m2.ccsnet.ne.jp> <49D40AE6.9040408@m2.ccsnet.ne.jp>
Message-ID: <1afaf6160904011817v1ca3d47ep1f7640a636a0c615@mail.gmail.com>

2009/4/1 Hirokazu Yamamoto <ocean-city at m2.ccsnet.ne.jp>:
>
> Hirokazu Yamamoto wrote:
>>
>> I added #5499 as a release blocker because it needs a specification decision.
>> (Is that too strong?)
>
> Thank you for fixing this. I also added
>
> #5391: mmap: read_byte/write_byte and object type
> #5410: msvcrt bytes cleanup
>
> which depend on this issue. These are also API spec issues.
> #5410 is easy, but #5391 still needs a decision on which of getarg("c") or
> getarg("b") read_byte/write_byte should use.

I'm afraid neither of these bugs are anywhere near my areas of
expertise, so I'll leave resolution of them to the experts. :)

-- 
Regards,
Benjamin

From lists at cheimes.de  Thu Apr  2 03:23:41 2009
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 02 Apr 2009 03:23:41 +0200
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <49D3F8D0.8070805@wingware.com>
References: <49D3F8D0.8070805@wingware.com>
Message-ID: <gr142t$5ei$1@ger.gmane.org>

John Ehresman wrote:
> * To what extent should non-debugger code use the hook?  At one end of
> the spectrum, the hook could be made readily available for non-debug use
> and at the other end, it could be documented as being debug only,
> disabled in python -O, & not exposed in the stdlib to python code.

To explain Collin's mail:
Python's dict implementation is crucial to the performance of any Python
program. Modules, types, instances all rely on the speed of Python's
dict type because most of them use a dict to store their name space.
Even the smallest change to the C code may lead to a severe performance
penalty. This is especially true for set and get operations.

From python at rcn.com  Thu Apr  2 03:37:34 2009
From: python at rcn.com (Raymond Hettinger)
Date: Wed, 1 Apr 2009 18:37:34 -0700
Subject: [Python-Dev] PyDict_SetItem hook
References: <49D3F8D0.8070805@wingware.com> <gr142t$5ei$1@ger.gmane.org>
Message-ID: <686ADDF37DF5413D93F43090806E0E5B@RaymondLaptop1>

> John Ehresman wrote:
>> * To what extent should non-debugger code use the hook?  At one end of
>> the spectrum, the hook could be made readily available for non-debug use
>> and at the other end, it could be documented as being debug only,
>> disabled in python -O, & not exposed in the stdlib to python code.
> 
> To explain Collin's mail:
> Python's dict implementation is crucial to the performance of any Python
> program. Modules, types, instances all rely on the speed of Python's
> dict type because most of them use a dict to store their name space.
> Even the smallest change to the C code may lead to a severe performance
> penalty. This is especially true for set and get operations.

See my comments in http://bugs.python.org/issue5654

Raymond

From guido at python.org  Thu Apr  2 04:08:56 2009
From: guido at python.org (Guido van Rossum)
Date: Wed, 1 Apr 2009 19:08:56 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <49D40946.1050100@hastings.org>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com> 
	<49D3F817.9080201@hastings.org>
	<ca471dc20904011653s212fb43m9c93fc4dfb966c25@mail.gmail.com> 
	<49D40946.1050100@hastings.org>
Message-ID: <ca471dc20904011908t55aa5a6cpa96d5d330f25a2be@mail.gmail.com>

On Wed, Apr 1, 2009 at 5:39 PM, Larry Hastings <larry at hastings.org> wrote:
>
> Guido van Rossum wrote:
>
> This is a bug in cPickle. It calls the PycString_IMPORT macro at the
> very end of its init_stuff() function without checking for success.
>
>
> The bug you cite is a genuine bug, but that's not what I'm exploiting.
>
> % python
> >>> import _socket
> >>> _socket.CAPI
> <PyCObject object at 0xb7d5b500>
>
> The PyCObject_Import() call in PycString_IMPORT doesn't return failure--it
> returns a valid CObject.  I stuck the *wrong* CObject in cStringIO on
> purpose.  With the current API there's no way for cPickle to tell that it's
> using the wrong one.

Ouch. So true.

> For what it's worth, the previous example was for Python 2.x.  (Python 3
> doesn't have "cStringIO" or "cPickle".)  Here's an example that crashes
> python in my py3k/trunk (sync'd Monday morning).  And this one's only three
> lines:
>
> import unicodedata
> import _multibytecodec
> _multibytecodec.__create_codec(unicodedata.ucnhash_CAPI)

Yeah, any two CAPI objects can be used to play this trick, as long as
you have some place that calls them. :-(

So what's your solution? If it was me I'd change the API to put the
full module name and variable name of the object inside the object and
have the IMPORT call check that. Then you can only have crashes if
some extension module cheats, and surely there are many other ways
that C extensions can cheat, so that doesn't bother me. :)
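
A rough model of that check, written in Python rather than C so it can be run as-is. NamedCObject and import_capi are hypothetical names used only for this sketch; they are not an existing or proposed API, and the "pointer" is just a string standing in for the wrapped void*.

```python
import types


class NamedCObject:
    """Hypothetical stand-in for a CObject that remembers where it belongs."""

    def __init__(self, pointer, name):
        self.pointer = pointer   # stands in for the wrapped void*
        self.name = name         # e.g. "cStringIO.cStringIO_CAPI"


def import_capi(module, attr):
    """Return the wrapped pointer only if the object's name matches."""
    obj = getattr(module, attr)
    expected = "%s.%s" % (module.__name__, attr)
    if not isinstance(obj, NamedCObject) or obj.name != expected:
        raise TypeError("%s is not the C API object for %s" % (attr, expected))
    return obj.pointer


# Two fake extension modules, each exporting its own API object.
sock = types.ModuleType("_socket")
sock.CAPI = NamedCObject("socket vtable", "_socket.CAPI")
sio = types.ModuleType("cStringIO")
sio.cStringIO_CAPI = NamedCObject("stringio vtable", "cStringIO.cStringIO_CAPI")

ok = import_capi(sio, "cStringIO_CAPI")

# The swap from the crashing example is now rejected instead of trusted.
sio.cStringIO_CAPI = sock.CAPI
try:
    import_capi(sio, "cStringIO_CAPI")
    swapped_rejected = False
except TypeError:
    swapped_rejected = True
```

The point of the design is that the consumer compares a name stored inside the object against the name it asked for, so rebinding a module attribute from Python is no longer enough to hand it the wrong table.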

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jpe at wingware.com  Thu Apr  2 04:16:51 2009
From: jpe at wingware.com (John Ehresman)
Date: Wed, 01 Apr 2009 21:16:51 -0500
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
Message-ID: <49D42013.3010600@wingware.com>

Collin Winter wrote:
> Have you measured the impact on performance?

I've tried to test using pystone, but am seeing more differences between 
runs than there is between python w/ the patch and w/o when there is no 
hook installed.  The highest pystone is actually from the binary w/ the 
patch, which I don't really believe unless it's some low level code 
generation effect.  The cost is one test of a global variable and then a 
switch to the branch that doesn't call the hooks.

I'd be happy to try to come up with better numbers next week after I get 
home from pycon.
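
A micro-benchmark aimed directly at the store path may turn out less noisy than pystone. The following is only a minimal sketch, assuming that timeit's best-of-N figure is an acceptable measure; best_store_time is a made-up helper name, and the script would have to be run once with each binary. It is not a measurement of the patch.

```python
import timeit


def best_store_time(number=100000, repeat=5):
    """Best-of-N time for `number` dict stores, in seconds."""
    timer = timeit.Timer("d['key'] = 1", setup="d = {}")
    # Taking the minimum filters out runs disturbed by other processes.
    return min(timer.repeat(repeat=repeat, number=number))


best = best_store_time()
print("best of 5: %.6f s per 100000 stores" % best)
```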

John

From larry at hastings.org  Thu Apr  2 04:58:30 2009
From: larry at hastings.org (Larry Hastings)
Date: Wed, 01 Apr 2009 19:58:30 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <ca471dc20904011908t55aa5a6cpa96d5d330f25a2be@mail.gmail.com>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
	<49D3F817.9080201@hastings.org>
	<ca471dc20904011653s212fb43m9c93fc4dfb966c25@mail.gmail.com>
	<49D40946.1050100@hastings.org>
	<ca471dc20904011908t55aa5a6cpa96d5d330f25a2be@mail.gmail.com>
Message-ID: <49D429D6.90006@hastings.org>

Guido van Rossum wrote:
> Yeah, any two CAPI objects can be used to play this trick, as long as
> you have some place that calls them. :-(

FWIW, I can't take credit for this observation.  Neal Norwitz threw me 
at this class of problem at the Py3k sprints in August 2007 at Google 
Mountain View, specifically with curses, though the approach he 
suggested then was removing the CObjects.  Then, Monday night MvL and I 
re-established the problem based on my dim memories.

> So what's your solution? If it was me I'd change the API to put the
> full module name and variable name of the object inside the object and
> have the IMPORT call check that. Then you can only have crashes if
> some extension module cheats, and surely there are many other ways
> that C extensions can cheat, so that doesn't bother me. :)

My proposed API requires that the creator of the CObject pass in a 
"type" string, which must be of nonzero length, and the caller must pass 
in a matching string.  I figured that was easy to get right and 
sufficient for "consenting adults".  Note also this cheap 
exported-vtable hack isn't the only use of CObjects; for example _ctypes 
uses them to wrap plenty of one-off objects which are never set as 
attributes of the _ctypes module.  We'd like a solution that enforces 
some safety for those too, without creating spurious module attributes.

/larry/

From dalcinl at gmail.com  Thu Apr  2 05:36:32 2009
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Thu, 2 Apr 2009 00:36:32 -0300
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <49D429D6.90006@hastings.org>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
	<49D3F817.9080201@hastings.org>
	<ca471dc20904011653s212fb43m9c93fc4dfb966c25@mail.gmail.com>
	<49D40946.1050100@hastings.org>
	<ca471dc20904011908t55aa5a6cpa96d5d330f25a2be@mail.gmail.com>
	<49D429D6.90006@hastings.org>
Message-ID: <e7ba66e40904012036y4ffac5e0j248d109d43a288a9@mail.gmail.com>

On Wed, Apr 1, 2009 at 11:58 PM, Larry Hastings <larry at hastings.org> wrote:
>
> Guido van Rossum wrote:
>>
>> Yeah, any two CAPI objects can be used to play this trick, as long as
>> you have some place that calls them. :-(
>
> FWIW, I can't take credit for this observation.  Neal Norwitz threw me at
> this class of problem at the Py3k sprints in August 2007 at Google Mountain
> View, specifically with curses, though the approach he suggested then was
> removing the CObjects.
>

IMHO, removing them would be a really bad idea... PyCObjects are the
documented, recommended way for ext modules to export their APIs, and
that works pretty well in practice, and even better now with your
approach.

>
>> So what's your solution? If it was me I'd change the API to put the
>> full module name and variable name of the object inside the object and
>> have the IMPORT call check that. Then you can only have crashes if
>> some extension module cheats, and surely there are many other ways
>> that C extensions can cheat, so that doesn't bother me. :)
>
> My proposed API requires that the creator of the CObject pass in a "type"
> string, which must be of nonzero length, and the caller must pass in a
> matching string.  I figured that was easy to get right and sufficient for
> "consenting adults".

Just for reference, I'll comment on how Cython uses this. First, Cython
exports its API on a function-by-function basis (instead of a single
pointer to a C struct with function pointers, as e.g. cStringIO, or an
array of func pointers, as e.g. NumPy). All these are cached in a
"private" module global (a dict) named "__pyx_capi__". See the link
below, for example:

http://mpi4py.scipy.org/docs/api/mpi4py.MPI-module.html#__pyx_capi__

So the dict keys are the exported function names. Moreover, each
PyCObject's "desc" is a C string with the function signature. Cython
retrieves a function by name from the dict and checks that the
expected signature matches. BTW, now I believe Cython should also use
the function name for the "desc" :-)
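
The lookup-and-check step can be modelled in a few lines of Python. This is only an illustration of the scheme described above, not Cython's generated code: export_function and import_function are invented names, a plain tuple stands in for the PyCObject and its "desc", and the signature strings are arbitrary examples.

```python
def export_function(capi, name, signature, func):
    """Publish func under name, tagged with its C signature string."""
    capi[name] = (signature, func)   # the signature plays the role of the "desc"


def import_function(capi, name, expected_signature):
    """Fetch a function by name, refusing it if the signature differs."""
    signature, func = capi[name]
    if signature != expected_signature:
        raise TypeError(
            "%s has signature %r, expected %r" % (name, signature, expected_signature)
        )
    return func


capi = {}   # plays the role of the module-level __pyx_capi__ dict
export_function(capi, "add", "int (int, int)", lambda a, b: a + b)

add = import_function(capi, "add", "int (int, int)")
result = add(2, 3)

try:
    import_function(capi, "add", "double (double, double)")
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
```

Because each function is looked up by name, adding a new entry to the dict does not disturb existing consumers.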

The only issue with this approach for Cython is that PyCObject
currently stores "void*" (i.e., pointers to data), but does not have
room for "void(*)(void)" (i.e. pointers to functions, aka code).
Recently I had to write some hackery using type-punning with unions to
avoid the illegal conversion problem between pointers to data and
functions.

Larry, I did not understand your comments in the tracker about this.
Why do you see the above approach as a misuse of the API? All this
works extremely well in practice... A Cython-implemented extension
module can export its API, and next you can consume it from Cython,
and moreover from a hand-written C extension (and then you can easily
write SWIG typemaps).  And as the functions are exported one by one,
you can even add stuff to some module API, and the consumers will not
notice a thing (with API tables implemented as a pointer to a C struct or
an array of function pointers, you need to be more careful to keep the
exported API backward compatible).

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From guido at python.org  Thu Apr  2 05:51:55 2009
From: guido at python.org (Guido van Rossum)
Date: Wed, 1 Apr 2009 20:51:55 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <49D429D6.90006@hastings.org>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com> 
	<49D3F817.9080201@hastings.org>
	<ca471dc20904011653s212fb43m9c93fc4dfb966c25@mail.gmail.com> 
	<49D40946.1050100@hastings.org>
	<ca471dc20904011908t55aa5a6cpa96d5d330f25a2be@mail.gmail.com> 
	<49D429D6.90006@hastings.org>
Message-ID: <ca471dc20904012051l32ea0d7bp3679f77040d91d05@mail.gmail.com>

On Wed, Apr 1, 2009 at 7:58 PM, Larry Hastings <larry at hastings.org> wrote:
> Guido van Rossum wrote:
>> Yeah, any two CAPI objects can be used to play this trick, as long as
>> you have some place that calls them. :-(
>
> FWIW, I can't take credit for this observation.  Neal Norwitz threw me at
> this class of problem at the Py3k sprints in August 2007 at Google Mountain
> View, specifically with curses, though the approach he suggested then was
> removing the CObjects.  Then, Monday night MvL and I re-established the
> problem based on my dim memories.
>
>> So what's your solution? If it was me I'd change the API to put the
>> full module name and variable name of the object inside the object and
>> have the IMPORT call check that. Then you can only have crashes if
>> some extension module cheats, and surely there are many other ways
>> that C extensions can cheat, so that doesn't bother me. :)
>
> My proposed API requires that the creator of the CObject pass in a "type"
> string, which must be of nonzero length, and the caller must pass in a
> matching string.  I figured that was easy to get right and sufficient for
> "consenting adults".

OK, my proposal would be to agree on the value of this string too:
"module.variable".

> Note also this cheap exported-vtable hack isn't the
> only use of CObjects; for example _ctypes uses them to wrap plenty of
> one-off objects which are never set as attributes of the _ctypes module.
> We'd like a solution that enforces some safety for those too, without
> creating spurious module attributes.

Why would you care about safety for ctypes? It's about as unsafe as it
gets anyway. Coredump emptor I say.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From aahz at pythoncraft.com  Thu Apr  2 06:35:22 2009
From: aahz at pythoncraft.com (Aahz)
Date: Wed, 1 Apr 2009 21:35:22 -0700
Subject: [Python-Dev] CSV, bytes and encodings
In-Reply-To: <18899.25424.820832.462451@montanaro.dyndns.org>
References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
	<Pine.LNX.4.64.0904010109190.26362@kimball.webabinitio.net>
	<loom.20090401T100027-754@post.gmane.org>
	<18899.17394.455907.841425@montanaro.dyndns.org>
	<loom.20090401T104816-798@post.gmane.org>
	<18899.25424.820832.462451@montanaro.dyndns.org>
Message-ID: <20090402043522.GA21023@panix.com>

On Wed, Apr 01, 2009, skip at pobox.com wrote:
> 
>     Antoine> Perhaps. But without using 'rU' the file couldn't be read at
>     Antoine> all.  (I'm not sure it was Windows line endings by the way;
>     Antoine> perhaps Macintosh ones; anyway, it didn't work using 'rb')
> 
> Please file a bug report and assign to me.  Does it work in 2.x?  What was
> the source of the file?

Perhaps there have been changes, but in my last job, I was running into
this problem with Python 2.3, and I also needed to open with 'rU' under
Linux.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it."  --Brian W. Kernighan

From solipsis at pitrou.net  Thu Apr  2 07:23:22 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 02 Apr 2009 07:23:22 +0200
Subject: [Python-Dev] CSV, bytes and encodings
In-Reply-To: <Pine.LNX.4.64.0904011756270.26362@kimball.webabinitio.net>
References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
	<Pine.LNX.4.64.0904010109190.26362@kimball.webabinitio.net>
	<loom.20090401T100027-754@post.gmane.org>
	<18899.17394.455907.841425@montanaro.dyndns.org>
	<loom.20090401T104816-798@post.gmane.org>
	<Pine.LNX.4.64.0904011756270.26362@kimball.webabinitio.net>
Message-ID: <1238649802.6033.5.camel@fsol>

On Wednesday 1 April 2009 at 18:22 -0400, R. David Murray wrote:
> I just added some tests to trunk that seem to indicate this case is
> handled correctly in terms of preserving the data.  Maybe you didn't
> write the file such that the fields with the newlines were quoted?

I used the default csv.writer into a StringIO, and the whole was then
returned as the response of an HTTP request (with the proper
Content-Type and Content-Disposition headers). I assume quoting is
enabled by default?

> And of course how non-Excel applications handle that data on import
> can be different from how Excel handles it.

Of course, but when three major spreadsheet programs (including Excel
itself) choke on the embedded newline, there might be a problem (or
not :)).
(please note that as for Excel I couldn't test myself, a client of mine
did)

Regards

Antoine.

From rdmurray at bitdance.com  Thu Apr  2 07:27:05 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Thu, 2 Apr 2009 01:27:05 -0400 (EDT)
Subject: [Python-Dev] CSV, bytes and encodings
In-Reply-To: <1238649802.6033.5.camel@fsol>
References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com>
	<Pine.LNX.4.64.0904010109190.26362@kimball.webabinitio.net>
	<loom.20090401T100027-754@post.gmane.org>
	<18899.17394.455907.841425@montanaro.dyndns.org>
	<loom.20090401T104816-798@post.gmane.org>
	<Pine.LNX.4.64.0904011756270.26362@kimball.webabinitio.net>
	<1238649802.6033.5.camel@fsol>
Message-ID: <Pine.LNX.4.64.0904020123410.26362@kimball.webabinitio.net>

On Thu, 2 Apr 2009 at 07:23, Antoine Pitrou wrote:
> On Wednesday 1 April 2009 at 18:22 -0400, R. David Murray wrote:
>> I just added some tests to trunk that seem to indicate this case is
>> handled correctly in terms of preserving the data.  Maybe you didn't
>> write the file such that the fields with the newlines were quoted?
>
> I used the default csv.writer into a StringIO, and the whole was then
> returned as the response of an HTTP request (with the proper
> Content-Type and Content-Disposition headers). I assume quoting is
> enabled by default?

Yes, it is.  The files I've encountered that had embedded newlines I
never tried to open in Excel or any other spreadsheet, so all _I'm_
sure of is that Excel produces them.
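
For reference, a small check of the default behaviour, written against the Python 3 csv and io modules (the data values are arbitrary). With the default QUOTE_MINIMAL setting the writer quotes a field that contains a newline, and the reader restores it as a single field:

```python
import csv
import io

# newline="" keeps the csv module in charge of line endings
buf = io.StringIO(newline="")
csv.writer(buf).writerow(["a", "line1\nline2"])
text = buf.getvalue()          # 'a,"line1\nline2"\r\n'

# Reading the same text back yields one row with the newline preserved.
rows = list(csv.reader(io.StringIO(text, newline="")))
```

This only shows that the csv module round-trips its own output; it says nothing about how a given spreadsheet imports the file.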

>> And of course how non-Excel applications handle that data on import
>> can be different from how Excel handles it.
>
> Of course, but when three major spreadsheet programs (including Excel
> itself) choke on the embedded newline, there might be a problem (or
> not :)).
> (please note that as for Excel I couldn't test myself, a client of mine
> did)

I've made a note to test this, out of curiosity, when I get home.

--David

From ajaksu at gmail.com  Thu Apr  2 10:51:59 2009
From: ajaksu at gmail.com (Daniel (ajax) Diniz)
Date: Thu, 2 Apr 2009 05:51:59 -0300
Subject: [Python-Dev] Left the GSoC-mentors list
Message-ID: <2d75d7660904020151h7eabc461ged408e986f3cc34c@mail.gmail.com>

Hi,
I've just left the soc2009-mentors list on request, as I'm not a
mentor. So if you need my input on the mentor side regarding ideas
I've contributed to [1] (struct, socket, core helper tools or
Roundup), please CC me.

Best regards,
Daniel

[1] http://wiki.python.org/moin/SummerOfCode/2009/Incoming

From kristjan at ccpgames.com  Thu Apr  2 12:02:39 2009
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Thu, 2 Apr 2009 10:02:39 +0000
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <49D3B938.5000202@hastings.org>
References: <49D26BB1.8050108@hastings.org>
	<930F189C8A437347B80DF2C156F7EC7F056D52762C@exchis.ccp.ad.local>
	<49D3B938.5000202@hastings.org>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F056D52773A@exchis.ccp.ad.local>

Thanks Larry.
I didn't notice the patch, or indeed the defect, hence my question.
A clarification in the documentation that a string comparison is indeed used might be useful.
As a user of CObject I appreciate this effort.
K

-----Original Message-----
From: Larry Hastings [mailto:larry at hastings.org] 

A method for answering further such questions suggests itself,

From greg.ewing at canterbury.ac.nz  Thu Apr  2 13:28:34 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 02 Apr 2009 23:28:34 +1200
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
Message-ID: <49D4A162.2020209@canterbury.ac.nz>

Jim Fulton wrote:

> The only type-safety mechanism for a CObject is its identity.  If you  
> want to make sure you're using the foomodule api, make sure the  address 
> of the CObject is the same as the address of the api object  exported by 
> the module.

I don't follow that. If you already have the address of the
thing you want to use, you don't need a CObject.

> 2. Only code provided by the module provider should be accessing the  
> CObject exported by the module.

Not following that either. Without attaching some kind of
metadata to a CObject, I don't see how you can know whether
a CObject passed to you from Python code is one that you
created yourself, or by some other unrelated piece of
code.

Attaching some kind of type info to a CObject and having
an easy way of checking it makes sense to me. If the
existing CObject API can't be changed, maybe a new
enhanced one could be added.

-- 
Greg

From gjcarneiro at gmail.com  Thu Apr  2 14:25:54 2009
From: gjcarneiro at gmail.com (Gustavo Carneiro)
Date: Thu, 2 Apr 2009 13:25:54 +0100
Subject: [Python-Dev] OSError.errno => exception hierarchy?
Message-ID: <a467ca4f0904020525taabdeb8gd75ce8f73b418d66@mail.gmail.com>

Apologies if this has already been discussed.

I was expecting that by now, python 3.0, the following code:

            # clean the target dir
            import errno
            try:
                shutil.rmtree(trace_output_path)
                except OSError as ex:
                if ex.errno not in [errno.ENOENT]:
                    raise

Would have become something simpler, like this:

            # clean the target dir
            try:
                shutil.rmtree(trace_output_path)
            except OSErrorNoEntry:       # or maybe os.ErrorNoEntry
                pass

Apparently no one has bothered yet to turn OSError + errno into a hierarchy
of OSError subclasses, as it should be.  What's the problem, no will to do it,
or no manpower?
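
A minimal sketch of how such a hierarchy could be layered on top of
OSError in pure Python. OSErrorNoEntry, ERRNO_CLASSES, raise_specific and
clean_target_dir are hypothetical names made up for this illustration;
none of them is an existing standard library API:

```python
import errno
import shutil


class OSErrorNoEntry(OSError):
    """Hypothetical subclass standing for errno.ENOENT."""


# Hypothetical table; a real hierarchy would cover many more errno values.
ERRNO_CLASSES = {errno.ENOENT: OSErrorNoEntry}


def raise_specific(func, *args):
    """Call func(*args), re-raising OSError as a mapped subclass if one exists."""
    try:
        return func(*args)
    except OSError as ex:
        cls = ERRNO_CLASSES.get(ex.errno)
        if cls is None:
            raise  # no mapping for this errno: propagate unchanged
        raise cls(ex.errno, ex.strerror, ex.filename)


def clean_target_dir(path):
    """Remove path; return False if it was already absent."""
    try:
        raise_specific(shutil.rmtree, path)
    except OSErrorNoEntry:
        return False
    return True
```

The wrapper only shows the shape of the idea; a built-in hierarchy would
raise the subclass directly instead of translating after the fact.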

Regards,

-- 
Gustavo J. A. M. Carneiro
INESC Porto, Telecommunications and Multimedia Unit
"The universe is always one step beyond logic." -- Frank Herbert

From hrvoje.niksic at avl.com  Thu Apr  2 14:42:44 2009
From: hrvoje.niksic at avl.com (Hrvoje Niksic)
Date: Thu, 02 Apr 2009 14:42:44 +0200
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <10203019.4341695.1238671712090.JavaMail.xicrypt@atgrzls001>
References: <49D26BB1.8050108@hastings.org>	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
	<10203019.4341695.1238671712090.JavaMail.xicrypt@atgrzls001>
Message-ID: <49D4B2C4.4060107@avl.com>

Greg Ewing wrote:
> Attaching some kind of type info to a CObject and having
> an easy way of checking it makes sense to me. If the
> existing CObject API can't be changed, maybe a new
> enhanced one could be added.

I thought the entire *point* of CObject was that it's an opaque box 
without any info whatsoever, except that which is known and shared by 
its creator and its consumer.

If we're adding type information, then please make it a Python object 
rather than a C string.  That way the creator and the consumer can use a 
richer API to query the "type", such as by calling its methods or by 
inspecting it in some other way.  Instead of comparing strings with 
strcmp, it could use PyObject_RichCompareBool, which would allow a much 
more flexible way to define "types".  Using a PyObject also ensures that 
the lifecycle of the attached "type" is managed by the well-understood 
reference-counting mechanism.

From jim at zope.com  Thu Apr  2 15:16:29 2009
From: jim at zope.com (Jim Fulton)
Date: Thu, 2 Apr 2009 09:16:29 -0400
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <ca471dc20904012051l32ea0d7bp3679f77040d91d05@mail.gmail.com>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
	<49D3F817.9080201@hastings.org>
	<ca471dc20904011653s212fb43m9c93fc4dfb966c25@mail.gmail.com>
	<49D40946.1050100@hastings.org>
	<ca471dc20904011908t55aa5a6cpa96d5d330f25a2be@mail.gmail.com>
	<49D429D6.90006@hastings.org>
	<ca471dc20904012051l32ea0d7bp3679f77040d91d05@mail.gmail.com>
Message-ID: <64D315D7-D01E-4F8C-90C1-879D8B89EB8E@zope.com>

On Apr 1, 2009, at 11:51 PM, Guido van Rossum wrote:
...
>> Note also this cheap exported-vtable hack isn't the
>> only use of CObjects; for example _ctypes uses them to wrap plenty of
>> one-off objects which are never set as attributes of the _ctypes  
>> module.
>>  We'd like a solution that enforces some safety for those too,  
>> without
>> creating spurious module attributes.
>
> Why would you care about safety for ctypes? It's about as unsafe as it
> gets anyway. Coredump emptor I say.

At which point, I wonder why we worry so much about someone  
intentionally breaking a CObject as in Larry's example.

Jim

--
Jim Fulton
Zope Corporation

From jim at zope.com  Thu Apr  2 15:22:44 2009
From: jim at zope.com (Jim Fulton)
Date: Thu, 2 Apr 2009 09:22:44 -0400
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <49D4A162.2020209@canterbury.ac.nz>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
	<49D4A162.2020209@canterbury.ac.nz>
Message-ID: <6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com>

On Apr 2, 2009, at 7:28 AM, Greg Ewing wrote:

> Jim Fulton wrote:
>
>> The only type-safety mechanism for a CObject is its identity.  If  
>> you  want to make sure you're using the foomodule api, make sure  
>> the  address of the CObject is the same as the address of the api  
>> object  exported by the module.
>
> I don't follow that. If you already have the address of the
> thing you want to use, you don't need a CObject.

I was referring to the identity of the CObject itself.

>> 2. Only code provided by the module provider should be accessing  
>> the  CObject exported by the module.
>
> Not following that either. Without attaching some kind of
> metadata to a CObject, I don't see how you can know whether
> a CObject passed to you from Python code is one that you
> created yourself, or by some other unrelated piece of
> code.

The original use case for CObjects was to export an API from a module,  
in which case, you'd be importing the API from the module.  The  
presence in the module indicates the type. Of course, this doesn't  
account for someone intentionally replacing the module's CObject with  
a fake.

> Attaching some kind of type info to a CObject and having
> an easy way of checking it makes sense to me. If the
> existing CObject API can't be changed, maybe a new
> enhanced one could be added.

I don't think backward compatibility needs to be a consideration for  
Python 3 at this point.  I don't see much advantage in the proposal,  
but I can live with it for Python 3.

Jim

--
Jim Fulton
Zope Corporation

From kristjan at ccpgames.com  Thu Apr  2 15:36:37 2009
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Thu, 2 Apr 2009 13:36:37 +0000
Subject: [Python-Dev] py3k regression tests on Windows
Message-ID: <930F189C8A437347B80DF2C156F7EC7F056DD0B8E0@exchis.ccp.ad.local>

Hello there.
Yesterday I created a number of defects for regression test failures on Windows:
http://bugs.python.org/issue5646 : test_importlib fails for py3k on Windows
http://bugs.python.org/issue5645 : test_memoryio fails for py3k on windows
http://bugs.python.org/issue5643 : test__locale fails with RADIXCHAR on Windows

Does anyone feel like taking a look?

K

From martin at v.loewis.de  Thu Apr  2 17:32:02 2009
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 02 Apr 2009 10:32:02 -0500
Subject: [Python-Dev] PEP 382: Namespace Packages
Message-ID: <49D4DA72.60401@v.loewis.de>

I propose the following PEP for inclusion to Python 3.1.
Please comment.

Regards,
Martin

Abstract
========

Namespace packages are a mechanism for splitting a single Python
package across multiple directories on disk. In current Python
versions, an algorithm to compute the package's __path__ must be
formulated. With the enhancement proposed here, the import machinery
itself will construct the list of directories that make up the
package.

Terminology
===========

Within this PEP, the term package refers to Python packages as defined
by Python's import statement. The term distribution refers to
separately installable sets of Python modules as stored in the Python
package index, and installed by distutils or setuptools. The term
vendor package refers to groups of files installed by an operating
system's packaging mechanism (e.g. Debian or Redhat packages installed
on Linux systems).

The term portion refers to a set of files in a single directory (possibly
stored in a zip file) that contribute to a namespace package.

Namespace packages today
========================

Python currently provides pkgutil.extend_path to denote a package as
a namespace package. The recommended way of using it is to put::

        from pkgutil import extend_path
        __path__ = extend_path(__path__, __name__)

in the package's ``__init__.py``. Every distribution needs to provide
the same contents in its ``__init__.py``, so that extend_path is
invoked independent of which portion of the package gets imported
first. As a consequence, the package's ``__init__.py`` cannot
practically define any names, because the order of the package
fragments on sys.path determines which portion is imported first. As a
special feature, extend_path reads files named ``*.pkg``, which allow
declaring additional portions.
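
The extend_path idiom described above can be tried out with a small
self-contained script. This is only a sketch: the package name demo_ns,
the module names alpha and beta, and the helper make_portion are made up
for the illustration, and the two portions live in temporary directories::

```python
import importlib
import os
import sys
import tempfile

# Every portion ships the same __init__.py, as the text above requires.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(package, module, value):
    """Create a temporary directory holding one portion of `package`."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, package)
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write("VALUE = %r\n" % value)
    return root


# Two separately "installed" portions of the same namespace package.
roots = [make_portion("demo_ns", "alpha", 1), make_portion("demo_ns", "beta", 2)]
sys.path[:0] = roots
importlib.invalidate_caches()

# Both submodules are importable although they live in different directories.
alpha = importlib.import_module("demo_ns.alpha")
beta = importlib.import_module("demo_ns.beta")
package = sys.modules["demo_ns"]
```

After the imports, ``package.__path__`` should list the demo_ns directory
of both portions, which is what lets the second submodule be found.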

setuptools provides a similar function pkg_resources.declare_namespace
that is used in the form::

    import pkg_resources
    pkg_resources.declare_namespace(__name__)

In the portion's __init__.py, no assignment to __path__ is necessary,
as declare_namespace modifies the package __path__ through sys.modules.
As a special feature, declare_namespace also supports zip files, and
registers the package name internally so that future additions to sys.path
by setuptools can properly add additional portions to each package.

setuptools allows declaring namespace packages in a distribution's
setup.py, so that distribution developers don't need to put the
magic __path__ modification into __init__.py themselves.

Rationale
=========

The current imperative approach to namespace packages has led to
multiple slightly-incompatible mechanisms for providing namespace
packages. For example, pkgutil supports ``*.pkg`` files; setuptools
doesn't. Likewise, setuptools supports inspecting zip files, and
supports adding portions to its _namespace_packages variable, whereas
pkgutil doesn't.

In addition, the current approach causes problems for system vendors.
Vendor packages typically must not provide overlapping files, and an
attempt to install a vendor package that has a file already on disk
will fail or cause unpredictable behavior. As vendors might choose to
package distributions such that they will end up all in a single
directory for the namespace package, all portions would contribute
conflicting __init__.py files.

Specification
=============

Rather than using an imperative mechanism for importing packages, a
declarative approach is proposed here, as an extension to the existing
``*.pkg`` mechanism.

The import statement is extended so that it directly considers ``*.pkg``
files during import; a directory is considered a package if it either
contains a file named __init__.py, or a file whose name ends with
".pkg".

In addition, the format of the ``*.pkg`` file is extended: a line with
the single character ``*`` indicates that the entire sys.path will
be searched for portions of the namespace package at the time the
namespace package is imported.

Importing a package will immediately compute the package's __path__;
the ``*.pkg`` files are not considered anymore after the initial import.
If a ``*.pkg`` file contains an asterisk, this asterisk is prepended
to the package's __path__ to indicate that the package is a namespace
package (and that thus further extensions to sys.path might also
want to extend __path__). At most one such asterisk gets prepended
to the path.
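
As a rough illustration of the rules above, the __path__ computation
could be modelled in pure Python as follows. This is a sketch under
stated assumptions, not the proposed implementation: the function names
are invented, only the ``*.pkg`` files of the first portion found are
read, and a ``*`` line is taken to mean "add every other portion found
on the given search path"::

```python
import os


def is_package_dir(directory):
    """A directory is a package if it has __init__.py or any *.pkg file."""
    if not os.path.isdir(directory):
        return False
    names = os.listdir(directory)
    return "__init__.py" in names or any(n.endswith(".pkg") for n in names)


def read_pkg_lines(directory):
    """Return the non-blank, non-comment lines of all *.pkg files."""
    lines = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".pkg"):
            with open(os.path.join(directory, name)) as f:
                for line in f:
                    line = line.strip()
                    if line and not line.startswith("#"):
                        lines.append(line)
    return lines


def compute_path(package, search_path):
    """Sketch of the __path__ computed when `package` is first imported."""
    candidates = [os.path.join(d, package) for d in search_path]
    portions = [c for c in candidates if is_package_dir(c)]
    if not portions:
        return None
    path = [portions[0]]
    is_namespace = False
    for entry in read_pkg_lines(portions[0]):
        if entry == "*":
            is_namespace = True
            # Assumption: "*" pulls in every other portion on the search path.
            for portion in portions[1:]:
                if portion not in path:
                    path.append(portion)
        elif entry not in path:
            path.append(entry)
    if is_namespace:
        path.insert(0, "*")  # at most one asterisk is prepended
    return path
```

A regular package with only an ``__init__.py`` yields a one-element
path here, while a directory holding a ``*.pkg`` file with a ``*`` line
yields ``"*"`` followed by all portions.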

extend_path will be extended to recognize namespace packages according
to this PEP, and avoid adding directories twice to __path__.

No other change to the importing mechanism is made; searching
modules (including __init__.py) will continue to stop at the first
module encountered.

Discussion
==========

With the addition of ``*.pkg`` files to the import mechanism, namespace
packages can stop filling out the namespace package's __init__.py.
As a consequence, extend_path and declare_namespace become obsolete.

It is recommended that distributions put a file <distribution>.pkg
into their namespace packages, with a single asterisk. This allows
vendor packages to install multiple portions of a namespace package
into a single directory, with no risk of overlapping files.

Namespace packages can start providing non-trivial __init__.py
implementations; to do so, it is recommended that a single distribution
provides a portion with just the namespace package's __init__.py
(and potentially other modules that belong to the namespace package
proper).

The mechanism is mostly compatible with the existing namespace
mechanisms. extend_path will be adjusted to this specification;
any other mechanism might cause portions to get added twice to
__path__.

Copyright
=========

This document has been placed in the public domain.

From pje at telecommunity.com  Thu Apr  2 19:14:42 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Thu, 02 Apr 2009 13:14:42 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D4DA72.60401@v.loewis.de>
References: <49D4DA72.60401@v.loewis.de>
Message-ID: <20090402171218.9DDEF3A40A7@sparrow.telecommunity.com>

At 10:32 AM 4/2/2009 -0500, Martin v. Löwis wrote:
>I propose the following PEP for inclusion to Python 3.1.
>Please comment.

An excellent idea.  One thing I am not 100% clear on, is how to get 
additions to sys.path to work correctly with this.  Currently, when 
pkg_resources adds a new egg to sys.path, it uses its existing 
registry of namespace packages in order to locate which packages need 
__path__ fixups.  It seems under this proposal that it would have to 
scan sys.modules for objects with __path__ attributes that are lists 
that begin with a '*', instead...  which is a bit troubling because 
sys.modules doesn't always only contain module objects.  Many major 
frameworks place lazy module objects, and module proxies or wrappers 
of various sorts in there, so scanning through it arbitrarily is not 
really a good idea.

Perhaps we could add something like a sys.namespace_packages that 
would be updated by this mechanism?  Then, pkg_resources could check 
both that and its internal registry to be both backward and forward compatible.
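
To make the concern concrete, here is a minimal sketch of the kind of
sys.modules scan described above. The function name is hypothetical, and
the leading '*' marker is the convention from the proposal, not anything
current Python versions produce:

```python
import types


def find_namespace_packages(modules):
    """Return names of modules whose __path__ is a list starting with '*'."""
    found = []
    for name, module in list(modules.items()):
        # Skip None entries, lazy proxies and other non-module objects,
        # since touching their attributes may have side effects.
        if not isinstance(module, types.ModuleType):
            continue
        path = getattr(module, "__path__", None)
        if isinstance(path, list) and path[:1] == ["*"]:
            found.append(name)
    return sorted(found)
```

The isinstance check is exactly the fragile part: a framework that
installs a wrapper object in place of a namespace package would be
skipped silently, which is why an explicit registry looks safer.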

Apart from that, this mechanism sounds great!  I only wish there was 
a way to backport it all the way to 2.3 so I could drop the messy 
bits from setuptools.

From guido at python.org  Thu Apr  2 19:19:17 2009
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Apr 2009 10:19:17 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com> 
	<49D4A162.2020209@canterbury.ac.nz>
	<6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com>
Message-ID: <ca471dc20904021019y66c44938n97ac9db677995249@mail.gmail.com>

On Thu, Apr 2, 2009 at 6:22 AM, Jim Fulton <jim at zope.com> wrote:
> The original use case for CObjects was to export an API from a module, in
> which case, you'd be importing the API from the module.

I consider this the *only* use case. What other use cases are there?

> The presence in the
> module indicates the type. Of course, this doesn't account for someone
> intentionally replacing the module's CObject with a fake.

And that's the problem. I would like the following to hold: given a
finite number of extension modules that I trust to be safe (i.e.
excluding ctypes!), pure Python code should not be able to cause any
of their CObjects to be passed off for another.

Putting an identity string in the CObject and checking that string in
PyCObject_Import() solves this.

Adding actual information about what the CObject *means* is
emphatically out of scope. Once a CObject is identified as having the
correct module and name, I am okay with trusting it, because Python
code has no way to create CObjects. I have to trust the extension that
exports the CObject anyway, since after all it is C code that could do
anything at all. But I need to be able to trust that the app cannot
swap CObjects.
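
The check can be modelled in pure Python to show the intent. This is
only a sketch: NamedBox and import_box are hypothetical stand-ins for a
CObject with an identity string and for the import helper, and unlike
real CObjects a NamedBox can of course be created from Python code:

```python
import types


class NamedBox:
    """Pure-Python stand-in for a CObject that carries an identity string."""

    def __init__(self, payload, name):
        self.payload = payload
        self.name = name


def import_box(module, attribute):
    """Return the payload only if the box is named "module.attribute"."""
    expected = "%s.%s" % (module.__name__, attribute)
    box = getattr(module, attribute)
    if not isinstance(box, NamedBox) or box.name != expected:
        raise ValueError("expected a box named %r" % expected)
    return box.payload


# Two hypothetical extension modules, each exporting an API table.
foo = types.ModuleType("foo")
foo.api = NamedBox({"version": 1}, "foo.api")
bar = types.ModuleType("bar")
bar.api = NamedBox({"other": True}, "bar.api")
```

With this in place, assigning ``foo.api = bar.api`` from application
code makes ``import_box(foo, "api")`` raise ValueError instead of
handing back the wrong table.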

>> Attaching some kind of type info to a CObject and having
>> an easy way of checking it makes sense to me. If the
>> existing CObject API can't be changed, maybe a new
>> enhanced one could be added.
>
> I don't think backward compatibility needs to be a consideration for Python
> 3 at this point.  I don't see much advantage in the proposal, but I can live
> with it for Python 3.

Good. Let's solve this for 3.1, and figure out whether or how to
backport later, since for 2.6 (and probably 2.7) binary backwards
compatibility is most important.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From solipsis at pitrou.net  Thu Apr  2 19:24:04 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 2 Apr 2009 17:24:04 +0000 (UTC)
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
	<49D4A162.2020209@canterbury.ac.nz>
	<6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com>
	<ca471dc20904021019y66c44938n97ac9db677995249@mail.gmail.com>
Message-ID: <loom.20090402T172200-58@post.gmane.org>

Guido van Rossum <guido <at> python.org> writes:
> 
> On Thu, Apr 2, 2009 at 6:22 AM, Jim Fulton <jim <at> zope.com> wrote:
> > The original use case for CObjects was to export an API from a module, in
> > which case, you'd be importing the API from the module.
> 
> I consider this the *only* use case. What other use cases are there?

I don't know if it is good style, but I could imagine it being used to
accumulate non-PyObject data in a Python container (e.g. a list), without too
much overhead.

It is used in getargs.c to manage a list of "destructors" of temporarily created
data for when a call to PyArg_Parse* fails.

From guido at python.org  Thu Apr  2 19:53:40 2009
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Apr 2009 10:53:40 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <loom.20090402T172200-58@post.gmane.org>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com> 
	<49D4A162.2020209@canterbury.ac.nz>
	<6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com> 
	<ca471dc20904021019y66c44938n97ac9db677995249@mail.gmail.com> 
	<loom.20090402T172200-58@post.gmane.org>
Message-ID: <ca471dc20904021053x636553dep9e406553e4e33882@mail.gmail.com>

On Thu, Apr 2, 2009 at 10:24 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Guido van Rossum <guido <at> python.org> writes:
>>
>> On Thu, Apr 2, 2009 at 6:22 AM, Jim Fulton <jim <at> zope.com> wrote:
>> > The original use case for CObjects was to export an API from a module, in
>> > which case, you'd be importing the API from the module.
>>
>> I consider this the *only* use case. What other use cases are there?
>
> I don't know if it is good style, but I could imagine it being used to
> accumulate non-PyObject data in a Python container (e.g. a list), without too
> much overhead.
>
> It is used in getargs.c to manage a list of "destructors" of temporarily created
> data for when a call to PyArg_Parse* fails.

Well, that sounds like it really just needs to manage a
variable-length array of void pointers, and using PyList and PyCObject
is just laziness (and perhaps the wrong kind -- I imagine I could
write the same code without using Python objects and it would be
cleaner *and* faster).

So no, I don't consider that a valid use case, or at least not one we
need to consider for backwards compatibility of the PyCObject design.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas at python.org  Thu Apr  2 20:41:07 2009
From: thomas at python.org (Thomas Wouters)
Date: Thu, 2 Apr 2009 20:41:07 +0200
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <49D42013.3010600@wingware.com>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
Message-ID: <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>

On Thu, Apr 2, 2009 at 04:16, John Ehresman <jpe at wingware.com> wrote:

> Collin Winter wrote:
>
>> Have you measured the impact on performance?
>>
>
> I've tried to test using pystone, but am seeing more differences between
> runs than there is between python w/ the patch and w/o when there is no hook
> installed.  The highest pystone is actually from the binary w/ the patch,
> which I don't really believe unless it's some low level code generation
> effect.  The cost is one test of a global variable and then a switch to the
> branch that doesn't call the hooks.
>
> I'd be happy to try to come up with better numbers next week after I get
> home from pycon.
>

Pystone is pretty much a useless benchmark. If it measures anything, it's
the speed of the bytecode dispatcher (and it doesn't measure it particularly
well.) PyBench isn't any better, in my experience. Collin has collected a
set of reasonable benchmarks for Unladen Swallow, but they still leave a lot
to be desired. From the discussions at the VM and Language summits before
PyCon, I don't think anyone else has better benchmarks, though, so I would
suggest using Unladen Swallow's:
http://code.google.com/p/unladen-swallow/wiki/Benchmarks
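
Independently of which benchmark suite is used, the run-to-run noise
mentioned above is easy to quantify for a single operation. A minimal
sketch with the standard timeit module follows; the function name and the
choice of a plain dict store as the measured statement are assumptions
made for the illustration:

```python
import timeit


def dict_store_timings(repeat=5, number=100000):
    """Time `d[k] = v` and report best per-operation time and relative spread."""
    timer = timeit.Timer("d[k] = v", setup="d = {}; k = 'key'; v = 1")
    runs = timer.repeat(repeat=repeat, number=number)
    best = min(runs)
    # Relative spread between the slowest and fastest run; if this is
    # larger than the difference between two builds, the comparison
    # says little.
    spread = (max(runs) - best) / best
    return best / number, spread
```

Comparing the best time of a patched and an unpatched binary only
means something if the difference clearly exceeds the spread reported
for each of them.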

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From ron.duplain at gmail.com  Thu Apr  2 20:44:46 2009
From: ron.duplain at gmail.com (Ron DuPlain)
Date: Thu, 2 Apr 2009 14:44:46 -0400
Subject: [Python-Dev] 3to2 Project
In-Reply-To: <2b485bad0904010950h7c3f3275n1f03c4b2cf2dcc3e@mail.gmail.com>
References: <4222a8490903300744t498e79daodea9cff32e4a94c1@mail.gmail.com>
	<43aa6ff70903301037y215d979he36246d36c987493@mail.gmail.com>
	<1afaf6160903301929l4120abe5g96e2ca2fdb722896@mail.gmail.com>
	<2b485bad0904010950h7c3f3275n1f03c4b2cf2dcc3e@mail.gmail.com>
Message-ID: <2b485bad0904021144r614d468av45c26529019a56e3@mail.gmail.com>

On Wed, Apr 1, 2009 at 12:50 PM, Ron DuPlain <ron.duplain at gmail.com> wrote:
> On Mon, Mar 30, 2009 at 9:29 PM, Benjamin Peterson <benjamin at python.org> wrote:
>> 2009/3/30 Collin Winter <collinw at gmail.com>:
>>> If anyone is interested in working on this during the PyCon sprints or
>>> otherwise, here are some easy, concrete starter projects that would
>>> really help move this along:
>>> - The core refactoring engine needs to be broken out from 2to3. In
>>> particular, the tests/ and fixes/ need to get pulled up a directory,
>>> out of lib2to3/.
>>> - Once that's done, lib2to3 should then be renamed to something like
>>> librefactor or something else that indicates its more general nature.
>>> This will allow both 2to3 and 3to2 to more easily share the core
>>> components.
>>
>> FWIW, I think it is unfortunately too late to make this change. We've
>> already released it as lib2to3 in the standard library and I have
>> actually seen it used in other projects. (PythonScope, for example.)
>>
>
> Paul Kippes and I have been sprinting on this.  We put lib2to3 into a
> refactor package and kept a shell lib2to3 to support the old
> interface.
>
> We are able to run 2to3, 3to2, lib2to3 tests, and refactor tests.  We
> only have a few simple 3to2 fixes now, but they should be easy to add.
> We kept the old lib2to3 tests to make sure we didn't break anything.
> As things settle down, I'd like to verify that our new lib2to3 is
> backward-compatible (since right now it points to the new refactor
> lib) with one of the external projects.
>
> We've been using hg to push changesets between each other, but we'll
> be committing to the svn sandbox before the week is out.  I'm heading
> out today, but Paul is sticking around another day.
>
> It's a start,
>
> Ron
>

See sandbox/trunk/refactor_pkg.
More fixers to come...

-Ron

From python at rcn.com  Thu Apr  2 20:58:18 2009
From: python at rcn.com (Raymond Hettinger)
Date: Thu, 2 Apr 2009 11:58:18 -0700
Subject: [Python-Dev] PyDict_SetItem hook
References: <49D3F8D0.8070805@wingware.com><43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com><49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
Message-ID: <78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1>

The measurements are just a distractor.  We all already know that the hook is being added to a critical path.  Everyone will pay a cost for a feature that few people will use.  This is a really bad idea.  It is not part of a thorough, thought-out framework of container hooks (something that would need a PEP at the very least).    The case for how it helps us is somewhat thin.  The case for DTrace hooks was much stronger.  

If something does go in, it should be #ifdef'd out by default.  But then, I don't think it should go in at all.  

Raymond

  On Thu, Apr 2, 2009 at 04:16, John Ehresman <jpe at wingware.com> wrote:

    Collin Winter wrote:

      Have you measured the impact on performance?

    I've tried to test using pystone, but am seeing more differences between runs than there is between python w/ the patch and w/o when there is no hook installed.  The highest pystone is actually from the binary w/ the patch, which I don't really believe unless it's some low level code generation effect.  The cost is one test of a global variable and then a switch to the branch that doesn't call the hooks.

    I'd be happy to try to come up with better numbers next week after I get home from pycon.

  Pystone is pretty much a useless benchmark. If it measures anything, it's the speed of the bytecode dispatcher (and it doesn't measure it particularly well.) PyBench isn't any better, in my experience. Collin has collected a set of reasonable benchmarks for Unladen Swallow, but they still leave a lot to be desired. From the discussions at the VM and Language summits before PyCon, I don't think anyone else has better benchmarks, though, so I would suggest using Unladen Swallow's: http://code.google.com/p/unladen-swallow/wiki/Benchmarks


From larry at hastings.org  Thu Apr  2 21:22:51 2009
From: larry at hastings.org (Larry Hastings)
Date: Thu, 02 Apr 2009 12:22:51 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <ca471dc20904021019y66c44938n97ac9db677995249@mail.gmail.com>
References: <49D26BB1.8050108@hastings.org>	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
	<49D4A162.2020209@canterbury.ac.nz>	<6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com>
	<ca471dc20904021019y66c44938n97ac9db677995249@mail.gmail.com>
Message-ID: <49D5108B.3070706@hastings.org>

Guido van Rossum wrote:
> On Thu, Apr 2, 2009 at 6:22 AM, Jim Fulton <jim at zope.com> wrote:
>   
>> The original use case for CObjects was to export an API from a module, in
>> which case, you'd be importing the API from the module.
>>     
> I consider this the *only* use case. What other use cases are there?

Exporting a C/C++ data structure:

    http://wiki.cacr.caltech.edu/danse/index.php/Lots_more_details_on_writing_wrappers
    http://www.cacr.caltech.edu/projects/ARCS/array_kluge/array_klugemodule/html/misc_8h.html
    http://svn.xiph.org/trunk/vorbisfile-python/vorbisfile.c

Some folks don't register a proper type; they just wrap their objects in 
CObjects and add module methods.

The "obscure" method in the "Robin" package ( 
http://code.google.com/p/robin/ ) curiously wraps a *Python* object in a 
CObject:

    http://code.google.com/p/robin/source/browse/trunk/src/robin/frontends/python/module.cc

I must admit I don't understand why this is a good idea.

There are many more wild & wooly use cases to be found if you Google for 
"PyCObject_FromVoidPtr".  Using CObject to export C APIs seems to be 
the minority, outside the CPython sources anyway.

/larry/

From larry at hastings.org  Thu Apr  2 21:26:13 2009
From: larry at hastings.org (Larry Hastings)
Date: Thu, 02 Apr 2009 12:26:13 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <49D4B2C4.4060107@avl.com>
References: <49D26BB1.8050108@hastings.org>	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>	<10203019.4341695.1238671712090.JavaMail.xicrypt@atgrzls001>
	<49D4B2C4.4060107@avl.com>
Message-ID: <49D51155.3030606@hastings.org>

Hrvoje Niksic wrote:
> If we're adding type information, then please make it a Python object 
> rather than a C string.  That way the creator and the consumer can use 
> a richer API to query the "type", such as by calling its methods or by 
> inspecting it in some other way.

I'm not writing my patch that way; it would be too cumbersome for what 
is ostensibly an easy, light-weight API.  If you're going that route you 
might as well create a real PyTypeObject for the blob you're passing in.

But please feel free to contribute your own competing patch; you may 
start with my patch if you like.

YAGNI,

/larry/

From larry at hastings.org  Thu Apr  2 21:28:48 2009
From: larry at hastings.org (Larry Hastings)
Date: Thu, 02 Apr 2009 12:28:48 -0700
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <ca471dc20904012051l32ea0d7bp3679f77040d91d05@mail.gmail.com>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
	<49D3F817.9080201@hastings.org>
	<ca471dc20904011653s212fb43m9c93fc4dfb966c25@mail.gmail.com>
	<49D40946.1050100@hastings.org>
	<ca471dc20904011908t55aa5a6cpa96d5d330f25a2be@mail.gmail.com>
	<49D429D6.90006@hastings.org>
	<ca471dc20904012051l32ea0d7bp3679f77040d91d05@mail.gmail.com>
Message-ID: <49D511F0.1040104@hastings.org>

Guido van Rossum wrote:
> OK, my proposal would be to agree on the value of this string too:
> "module.variable".
>   

That's a fine idea for cases where the CObject is stored as an attribute 
of a module; my next update of my patch will change the existing uses to 
use that format.

> Why would you care about safety for ctypes? It's about as unsafe as it
> gets anyway. Coredump emptor I say.

_ctypes and exporting C APIs are not the only use cases of CObjects in 
the wild.  Please see, uh, that email I wrote like five minutes ago, 
also a reply to you.

/larry/

From chris at simplistix.co.uk  Thu Apr  2 22:03:34 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Thu, 02 Apr 2009 21:03:34 +0100
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D4DA72.60401@v.loewis.de>
References: <49D4DA72.60401@v.loewis.de>
Message-ID: <49D51A16.70804@simplistix.co.uk>

Martin v. Löwis wrote:
> I propose the following PEP for inclusion to Python 3.1.
> Please comment.

Would this support the following case:

I have a package called mortar, which defines useful stuff:

from mortar import content, ...

I now want to distribute large optional chunks separately, but ideally 
so that the following will work:

from mortar.rbd import ...
from mortar.zodb import ...
from mortar.wsgi import ...

Does the PEP support this? The only way I can currently think to do this 
would result in:

from mortar import content,..
from mortar_rbd import ...
from mortar_zodb import ...
from mortar_wsgi import ...

...which looks a bit unsightly to me.

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From chris at simplistix.co.uk  Thu Apr  2 22:03:49 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Thu, 02 Apr 2009 21:03:49 +0100
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090402171218.9DDEF3A40A7@sparrow.telecommunity.com>
References: <49D4DA72.60401@v.loewis.de>
	<20090402171218.9DDEF3A40A7@sparrow.telecommunity.com>
Message-ID: <49D51A25.10400@simplistix.co.uk>

P.J. Eby wrote:
> Apart from that, this mechanism sounds great!  I only wish there was a 
> way to backport it all the way to 2.3 so I could drop the messy bits 
> from setuptools.

Maybe we could? :-)

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From benjamin at python.org  Thu Apr  2 22:25:09 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 2 Apr 2009 15:25:09 -0500
Subject: [Python-Dev] OSError.errno => exception hierarchy?
In-Reply-To: <a467ca4f0904020525taabdeb8gd75ce8f73b418d66@mail.gmail.com>
References: <a467ca4f0904020525taabdeb8gd75ce8f73b418d66@mail.gmail.com>
Message-ID: <1afaf6160904021325m6d5eb230o3f762ec5b5568d1f@mail.gmail.com>

2009/4/2 Gustavo Carneiro <gjcarneiro at gmail.com>:
> Apologies if this has already been discussed.

I don't believe it has ever been discussed to be implemented.

> Apparently no one has bothered yet to turn OSError + errno into a hierarchy
> of OSError subclasses, as it should. What's the problem, no will to do it,
> or no manpower?

Python doesn't need any more builtin exceptions to clutter the
namespace. Besides, what's wrong with just checking the errno?
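
For reference, the errno check being defended here usually looks something
like the sketch below. The helper name remove_if_exists is made up for this
illustration and is not an existing stdlib function.

```python
import errno
import os
import tempfile


def remove_if_exists(path):
    """Delete path; return True if removed, False if it was already gone.

    Hypothetical helper, used only to illustrate the errno check.
    """
    try:
        os.remove(path)
    except OSError as exc:
        # Swallow only "no such file or directory"; re-raise everything else.
        if exc.errno != errno.ENOENT:
            raise
        return False
    return True


# Small demonstration on a throwaway file.
fd, name = tempfile.mkstemp()
os.close(fd)
print(remove_if_exists(name))  # first call: the file exists
print(remove_if_exists(name))  # second call: the file is already gone
```

The bare "raise" in the handler matters: without it, permission errors and
the like would be silently ignored along with ENOENT.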

-- 
Regards,
Benjamin

From mal at egenix.com  Thu Apr  2 22:33:25 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 02 Apr 2009 22:33:25 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D4DA72.60401@v.loewis.de>
References: <49D4DA72.60401@v.loewis.de>
Message-ID: <49D52115.6020001@egenix.com>

On 2009-04-02 17:32, Martin v. Löwis wrote:
> I propose the following PEP for inclusion to Python 3.1.

Thanks for picking this up.

I'd like to extend the proposal to Python 2.7 and later.

> Please comment.
> 
> Regards,
> Martin
> 
> Specification
> =============
> 
> Rather than using an imperative mechanism for importing packages, a
> declarative approach is proposed here, as an extension to the existing
> ``*.pkg`` mechanism.
> 
> The import statement is extended so that it directly considers ``*.pkg``
> files during import; a directory is considered a package if it either
> contains a file named __init__.py, or a file whose name ends with
> ".pkg".

That's going to slow down Python package detection a lot - you'd
replace an O(1) test with an O(n) scan.

Alternative Approach:
---------------------

Wouldn't it be better to stick with a simpler approach and look for
"__pkg__.py" files to detect namespace packages using that O(1) check ?

This would also avoid any issues you'd otherwise run into if you want
to maintain this scheme in an importer that doesn't have access to a list
of files in a package directory, but is perfectly capable of checking
for the existence of a file.

Mechanism:
----------

If the import mechanism finds a matching namespace package (a directory
with a __pkg__.py file), it then goes into namespace package scan mode and
scans the complete sys.path for more occurrences of the same namespace
package.

The import loads all __pkg__.py files of matching namespace packages
having the same package name during the search.

One of the namespace packages, the defining namespace package, will have
to include an __init__.py file.

After having scanned all matching namespace packages and loaded
the __pkg__.py files in the order of the search, the import mechanism
then sets the package's __path__ attribute to include all namespace
package directories found on sys.path and finally executes the
__init__.py file.

(Please let me know if the above is not clear, I will then try to
follow up on it.)
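
A rough sketch of the scan step only, assuming the "__pkg__.py" marker
name proposed above. The function name find_namespace_dirs and the
throwaway directory layout are invented for this illustration. Loading
the __pkg__.py files and running __init__.py are deliberately left out.

```python
import os
import sys
import tempfile


def find_namespace_dirs(name, search_path=None):
    """Return every <entry>/<name> directory holding a __pkg__.py marker.

    Hypothetical helper: it sketches only the sys.path scan, not the
    import machinery that would consume the result.
    """
    if search_path is None:
        search_path = sys.path
    found = []
    for entry in search_path:
        candidate = os.path.join(entry, name)
        # One existence check per path entry; no directory listing needed.
        if os.path.isfile(os.path.join(candidate, "__pkg__.py")):
            found.append(candidate)
    return found


def _make_portion(root, name):
    """Create <root>/<name>/__pkg__.py and return the package directory."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "__pkg__.py"), "w").close()
    return pkg_dir


# Three fake sys.path entries; only the first two carry a "mortar" portion.
roots = [tempfile.mkdtemp() for _ in range(3)]
expected = [_make_portion(roots[0], "mortar"),
            _make_portion(roots[1], "mortar")]
print(find_namespace_dirs("mortar", roots) == expected)
```

The list returned here is what would end up in the package's __path__,
in sys.path order.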

Discussion:
-----------

The above mechanism allows the same kind of flexibility we already
have with the existing normal __init__.py mechanism.

* It doesn't add yet another .pth-style sys.path extension (which are
difficult to manage in installations).

* It always uses the same naive sys.path search strategy. The strategy
is not determined by some file contents.

* The search is only done once - on the first import of the package.

* It's possible to have a defining package dir and add-on package
dirs.

* Namespace packages are easy to recognize by testing for a single
resource.

* Namespace __pkg__.py modules can provide extra meta-information,
logging, etc. to simplify debugging namespace package setups.

* It's possible to freeze such setups, to put them into ZIP files,
or only have parts of it in a ZIP file and the other parts in the
file-system.

Caveats:

* Changes to sys.path will not result in an automatic rescan for
additional namespace packages, if the package was already loaded.
However, we could have a function to make such a rescan explicit.

-- 
Marc-Andre Lemburg
eGenix.com


From jackdied at gmail.com  Thu Apr  2 22:35:41 2009
From: jackdied at gmail.com (Jack diederich)
Date: Thu, 2 Apr 2009 16:35:41 -0400
Subject: [Python-Dev] OSError.errno => exception hierarchy?
In-Reply-To: <1afaf6160904021325m6d5eb230o3f762ec5b5568d1f@mail.gmail.com>
References: <a467ca4f0904020525taabdeb8gd75ce8f73b418d66@mail.gmail.com>
	<1afaf6160904021325m6d5eb230o3f762ec5b5568d1f@mail.gmail.com>
Message-ID: <b8e622740904021335l14ba1644hc05f50b2657f58bd@mail.gmail.com>

On Thu, Apr 2, 2009 at 4:25 PM, Benjamin Peterson <benjamin at python.org> wrote:
> 2009/4/2 Gustavo Carneiro <gjcarneiro at gmail.com>:
>> Apologies if this has already been discussed.
>
> I don't believe it has ever been discussed to be implemented.
>
>> Apparently no one has bothered yet to turn OSError + errno into a hierarchy
>> of OSError subclasses, as it should. What's the problem, no will to do it,
>> or no manpower?
>
> Python doesn't need any more builtin exceptions to clutter the
> namespace. Besides, what's wrong with just checking the errno?

The problem is manpower (this has been no one's itch).  In order to
have a hierarchy of OSError exceptions the underlying code would have
to raise them.  That means diving into all the C code that raises
OSError and cleaning them up.

I'm +1 on the idea but -1 on doing the work myself.

-Jack

From barry at python.org  Thu Apr  2 22:44:09 2009
From: barry at python.org (Barry Warsaw)
Date: Thu, 2 Apr 2009 15:44:09 -0500
Subject: [Python-Dev] OSError.errno => exception hierarchy?
In-Reply-To: <b8e622740904021335l14ba1644hc05f50b2657f58bd@mail.gmail.com>
References: <a467ca4f0904020525taabdeb8gd75ce8f73b418d66@mail.gmail.com>
	<1afaf6160904021325m6d5eb230o3f762ec5b5568d1f@mail.gmail.com>
	<b8e622740904021335l14ba1644hc05f50b2657f58bd@mail.gmail.com>
Message-ID: <1304C4AA-F450-49D4-9EC3-CDE3B414FA40@python.org>

On Apr 2, 2009, at 3:35 PM, Jack diederich wrote:

> On Thu, Apr 2, 2009 at 4:25 PM, Benjamin Peterson  
> <benjamin at python.org> wrote:
>> 2009/4/2 Gustavo Carneiro <gjcarneiro at gmail.com>:
>>> Apologies if this has already been discussed.
>>
>> I don't believe it has ever been discussed to be implemented.
>>
>>> Apparently no one has bothered yet to turn OSError + errno into a  
>>> hierarchy
>>> of OSError subclasses, as it should.  What's the problem, no will  
>>> to do it,
>>> or no manpower?
>>
>> Python doesn't need any more builtin exceptions to clutter the
>> namespace. Besides, what's wrong with just checking the errno?
>
> The problem is manpower (this has been no one's itch).  In order to
> have a hierarchy of OSError exceptions the underlying code would have
> to raise them.  That means diving into all the C code that raises
> OSError and cleaning them up.
>
> I'm +1 on the idea but -1 on doing the work myself.

I'm +0/-1 (idea/work) on doing them all, but I think a /few/ errnos  
would be very handy.  I certainly check ENOENT and EEXIST very  
frequently, so being able to easily catch or ignore those would be a  
big win.  I'm sure there's one or two others that would give big bang  
for little buck.
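
To make the "few errnos" idea concrete, here is a minimal pure-Python
sketch. The class names FileMissingError and FileAlreadyExistsError and
the wrapper call_with_specific_errors are hypothetical, chosen only for
this illustration; a real implementation would raise the subclasses
directly from the C code rather than translating after the fact.

```python
import errno
import os


class FileMissingError(OSError):
    """Hypothetical OSError subclass for errno.ENOENT."""


class FileAlreadyExistsError(OSError):
    """Hypothetical OSError subclass for errno.EEXIST."""


# Only the errnos worth special-casing get a dedicated class.
_BY_ERRNO = {
    errno.ENOENT: FileMissingError,
    errno.EEXIST: FileAlreadyExistsError,
}


def call_with_specific_errors(func, *args):
    """Call func(*args), re-raising selected OSErrors as narrower subclasses."""
    try:
        return func(*args)
    except OSError as exc:
        cls = _BY_ERRNO.get(exc.errno)
        if cls is None:
            raise  # every other errno propagates unchanged
        raise cls(exc.errno, exc.strerror, exc.filename)


# Callers can now catch the case they care about by class.
missing = os.path.join(os.sep, "no", "such", "path", "for", "this", "sketch")
try:
    call_with_specific_errors(os.stat, missing)
except FileMissingError as exc:
    print("missing:", exc.filename)
```

Because both classes derive from OSError, existing "except OSError" code
keeps working, which is the main argument for a hierarchy over new
unrelated exceptions.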

Barry

From chris at simplistix.co.uk  Thu Apr  2 23:16:28 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Thu, 02 Apr 2009 22:16:28 +0100
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
	<49D35A39.7020507@simplistix.co.uk>
	<Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net>
Message-ID: <49D52B2C.5050509@simplistix.co.uk>

R. David Murray wrote:
> On Wed, 1 Apr 2009 at 13:12, Chris Withers wrote:
>> Guido van Rossum wrote:
>>>  Well hold on for a minute, I remember we used to have an exec
>>>  statement in a class body in the standard library, to define some file
>>>  methods in socket.py IIRC. 
>>
>> But why an exec?! Surely there must be some other way to do this than 
>> an exec?
> 
> Maybe, but this sure is gnarly code:
> 
>     _s = ("def %s(self, *args): return self._sock.%s(*args)\n\n"
>           "%s.__doc__ = _realsocket.%s.__doc__\n")
>     for _m in _socketmethods:
>         exec _s % (_m, _m, _m, _m)
>     del _m, _s

I played around with this and managed to rewrite it as:

from functools import partial
from new import instancemethod

def meth(name,self,*args):
     return getattr(self._sock,name)(*args)

for _m in _socketmethods:
     p = partial(meth,_m)
     p.__name__ = _m
     p.__doc__ = getattr(_realsocket,_m).__doc__
     m = instancemethod(p,None,_socketobject)
     setattr(_socketobject,_m,m)

Have I missed something or is that a suitable replacement that gets rid 
of the exec nastiness?

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From guido at python.org  Thu Apr  2 23:18:30 2009
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Apr 2009 14:18:30 -0700
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <49D52B2C.5050509@simplistix.co.uk>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> 
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com> 
	<49D35A39.7020507@simplistix.co.uk>
	<Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net> 
	<49D52B2C.5050509@simplistix.co.uk>
Message-ID: <ca471dc20904021418y617f1bfdja3173c41a1451423@mail.gmail.com>

On Thu, Apr 2, 2009 at 2:16 PM, Chris Withers <chris at simplistix.co.uk> wrote:
> R. David Murray wrote:
>>
>> On Wed, 1 Apr 2009 at 13:12, Chris Withers wrote:
>>>
>>> Guido van Rossum wrote:
>>>>
>>>> Well hold on for a minute, I remember we used to have an exec
>>>> statement in a class body in the standard library, to define some file
>>>> methods in socket.py IIRC.
>>>
>>> But why an exec?! Surely there must be some other way to do this than an
>>> exec?
>>
>> Maybe, but this sure is gnarly code:
>>
>>    _s = ("def %s(self, *args): return self._sock.%s(*args)\n\n"
>>          "%s.__doc__ = _realsocket.%s.__doc__\n")
>>    for _m in _socketmethods:
>>        exec _s % (_m, _m, _m, _m)
>>    del _m, _s
>
> I played around with this and managed to rewrite it as:
>
> from functools import partial
> from new import instancemethod
>
> def meth(name,self,*args):
>    return getattr(self._sock,name)(*args)
>
> for _m in _socketmethods:
>    p = partial(meth,_m)
>    p.__name__ = _m
>    p.__doc__ = getattr(_realsocket,_m).__doc__
>    m = instancemethod(p,None,_socketobject)
>    setattr(_socketobject,_m,m)
>
> Have I missed something or is that a suitable replacement that gets rid of
> the exec nastiness?

That code in socket.py is much older than functools... I don't know if
the dependency matters, probably not.

But anyways this is moot, the bug was only about exec in a class body
*nested inside a function*.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Apr  2 23:19:58 2009
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Apr 2009 14:19:58 -0700
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> 
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> 
	<78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1>
Message-ID: <ca471dc20904021419y33932768j8bc7e97e46361d6a@mail.gmail.com>

Wow. Can you possibly be more negative?

2009/4/2 Raymond Hettinger <python at rcn.com>:
> The measurements are just a distractor. We all already know that the hook
> is being added to a critical path. Everyone will pay a cost for a feature
> that few people will use. This is a really bad idea. It is not part of a
> thorough, thought-out framework of container hooks (something that would
> need a PEP at the very least). The case for how it helps us is somewhat
> thin. The case for DTrace hooks was much stronger.
>
> If something does go in, it should be #ifdef'd out by default. But then, I
> don't think it should go in at all.
>
>
> Raymond
>
>
>
>
> On Thu, Apr 2, 2009 at 04:16, John Ehresman <jpe at wingware.com> wrote:
>>
>> Collin Winter wrote:
>>>
>>> Have you measured the impact on performance?
>>
>> I've tried to test using pystone, but am seeing more differences between
>> runs than there is between python w/ the patch and w/o when there is no hook
>> installed. The highest pystone is actually from the binary w/ the patch,
>> which I don't really believe unless it's some low level code generation
>> affect. The cost is one test of a global variable and then a switch to the
>> branch that doesn't call the hooks.
>>
>> I'd be happy to try to come up with better numbers next week after I get
>> home from pycon.
>
> Pystone is pretty much a useless benchmark. If it measures anything, it's
> the speed of the bytecode dispatcher (and it doesn't measure it particularly
> well.) PyBench isn't any better, in my experience. Collin has collected a
> set of reasonable benchmarks for Unladen Swallow, but they still leave a lot
> to be desired. From the discussions at the VM and Language summits before
> PyCon, I don't think anyone else has better benchmarks, though, so I would
> suggest using Unladen Swallow's:
> http://code.google.com/p/unladen-swallow/wiki/Benchmarks

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From chris at simplistix.co.uk  Thu Apr  2 23:21:31 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Thu, 02 Apr 2009 22:21:31 +0100
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <ca471dc20904021418y617f1bfdja3173c41a1451423@mail.gmail.com>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
	<49D35A39.7020507@simplistix.co.uk>
	<Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net>
	<49D52B2C.5050509@simplistix.co.uk>
	<ca471dc20904021418y617f1bfdja3173c41a1451423@mail.gmail.com>
Message-ID: <49D52C5B.7010506@simplistix.co.uk>

Guido van Rossum wrote:
>> from functools import partial
>> from new import instancemethod
>>
>> def meth(name,self,*args):
>>    return getattr(self._sock,name)(*args)
>>
>> for _m in _socketmethods:
>>    p = partial(meth,_m)
>>    p.__name__ = _m
>>    p.__doc__ = getattr(_realsocket,_m).__doc__
>>    m = instancemethod(p,None,_socketobject)
>>    setattr(_socketobject,_m,m)
>>
>> Have I missed something or is that a suitable replacement that gets rid of
>> the exec nastiness?
> 
> That code in socket.py is much older that functools... I don't know if
> the dependency matters, probably not.
> 
> But anyways this is moot, the bug was only about exec in a class body
> *nested inside a function*.

Indeed, I just hate seeing execs and it was an interesting mental 
exercise to try and get rid of the above one ;-)

Assuming it breaks no tests, would there be objection to me committing 
the above change to the Python 3 trunk?

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From amauryfa at gmail.com  Thu Apr  2 23:27:20 2009
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Thu, 2 Apr 2009 23:27:20 +0200
Subject: [Python-Dev] OSError.errno => exception hierarchy?
In-Reply-To: <b8e622740904021335l14ba1644hc05f50b2657f58bd@mail.gmail.com>
References: <a467ca4f0904020525taabdeb8gd75ce8f73b418d66@mail.gmail.com>
	<1afaf6160904021325m6d5eb230o3f762ec5b5568d1f@mail.gmail.com>
	<b8e622740904021335l14ba1644hc05f50b2657f58bd@mail.gmail.com>
Message-ID: <e27efe130904021427p6bef15y347f7729807b5fbb@mail.gmail.com>

Hello,

On Thu, Apr 2, 2009 at 22:35, Jack diederich <jackdied at gmail.com> wrote:
> On Thu, Apr 2, 2009 at 4:25 PM, Benjamin Peterson <benjamin at python.org> wrote:
>> 2009/4/2 Gustavo Carneiro <gjcarneiro at gmail.com>:
>>> Apologies if this has already been discussed.
>>
>> I don't believe it has ever been discussed to be implemented.
>>
>>> Apparently no one has bothered yet to turn OSError + errno into a hierarchy
>>> of OSError subclasses, as it should. What's the problem, no will to do it,
>>> or no manpower?
>>
>> Python doesn't need any more builtin exceptions to clutter the
>> namespace. Besides, what's wrong with just checking the errno?
>
> The problem is manpower (this has been no one's itch).  In order to
> have a hierarchy of OSError exceptions the underlying code would have
> to raise them.  That means diving into all the C code that raises
> OSError and cleaning them up.
>
> I'm +1 on the idea but -1 on doing the work myself.
>
> -Jack

The py library (http://codespeak.net/py/dist/) already has a py.error
module that provides an exception class for each errno.
See for example how they use py.error.ENOENT, py.error.EACCES... to
implement some kind of FilePath object:
    http://codespeak.net/svn/py/dist/py/path/local/local.py

But I'm not sure I would like this kind of code in core python. Too
much magic...

-- 
Amaury Forgeot d'Arc

From guido at python.org  Thu Apr  2 23:49:22 2009
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Apr 2009 14:49:22 -0700
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <49D52C5B.7010506@simplistix.co.uk>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> 
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com> 
	<49D35A39.7020507@simplistix.co.uk>
	<Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net> 
	<49D52B2C.5050509@simplistix.co.uk>
	<ca471dc20904021418y617f1bfdja3173c41a1451423@mail.gmail.com> 
	<49D52C5B.7010506@simplistix.co.uk>
Message-ID: <ca471dc20904021449x659f930bn7b61feec6b640752@mail.gmail.com>

On Thu, Apr 2, 2009 at 2:21 PM, Chris Withers <chris at simplistix.co.uk> wrote:
> Guido van Rossum wrote:
>>>
>>> from functools import partial
>>> from new import instancemethod
>>>
>>> def meth(name,self,*args):
>>>   return getattr(self._sock,name)(*args)
>>>
>>> for _m in _socketmethods:
>>>   p = partial(meth,_m)
>>>   p.__name__ = _m
>>>   p.__doc__ = getattr(_realsocket,_m).__doc__
>>>   m = instancemethod(p,None,_socketobject)
>>>   setattr(_socketobject,_m,m)
>>>
>>> Have I missed something or is that a suitable replacement that gets rid
>>> of
>>> the exec nastiness?
>>
>> That code in socket.py is much older that functools... I don't know if
>> the dependency matters, probably not.
>>
>> But anyways this is moot, the bug was only about exec in a class body
>> *nested inside a function*.
>
> Indeed, I just hate seeing execs and it was an interesting mental exercise
> to try and get rid of the above one ;-)
>
> Assuming it breaks no tests, would there be objection to me committing the
> above change to the Python 3 trunk?

That's up to Benjamin. Personally, I live by "if it ain't broke, don't
fix it." :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From orsenthil at gmail.com  Thu Apr  2 23:50:20 2009
From: orsenthil at gmail.com (Senthil Kumaran)
Date: Thu, 2 Apr 2009 16:50:20 -0500
Subject: [Python-Dev] [issue3609] does parse_header really belong in CGI
	module?
In-Reply-To: <1238708765.18.0.659138371932.issue3609@psf.upfronthosting.co.za>
References: <1219193477.24.0.768590998992.issue3609@psf.upfronthosting.co.za>
	<1238708765.18.0.659138371932.issue3609@psf.upfronthosting.co.za>
Message-ID: <7c42eba10904021450k3756ee0ftfe282d065024f2bb@mail.gmail.com>

http://bugs.python.org/issue3609  requests to move the function
parse_header present in cgi module to email package.

The reasons for this request are:

1) The MIME type header parsing methods rightly belong to the email
package, conforming to RFC 2045.
2) parse_qs, parse_qsl were similarly moved from cgi to urlparse.
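
As an illustration of point 1, the email package can already split a
Content-Type style header into a value and its parameters. The helper
name parse_content_type below is made up for this sketch; it is not the
relocated parse_header, only a demonstration that the parsing machinery
lives in email.

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type header value into (type, params dict).

    Hypothetical helper built on email.message.Message for illustration.
    """
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns (name, value) pairs; the first pair is the
    # content type itself, so it is skipped when building the dict.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


print(parse_content_type('text/html; charset="utf-8"'))
```
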

The question here is, should the relocation happen in Python 2.7 as
well as in Python 3K or only in Python 3k?

If changes happen in Python 2.7, then cgi.parse_header will issue a
DeprecationWarning, just in case we go for more versions in the Python 2.x
series.

Does anyone have any concerns with this change?

-- 
Senthil

From chris at simplistix.co.uk  Thu Apr  2 23:57:07 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Thu, 02 Apr 2009 22:57:07 +0100
Subject: [Python-Dev] Package Management - thoughts from the peanut gallery
Message-ID: <49D534B3.8020801@simplistix.co.uk>

Hey All,

I have to admit to not having the willpower to plough through the 200 
unread messages in the packaging thread when I got back from PyCon but 
just wanted to throw out a few thoughts on what my python packaging 
utopia would look like:

- python would have a package format that included version numbers and 
dependencies.

- this package format would "play nice" with os-specific ideas of how 
packages should be structured.

- python itself would have a version number, so it could be treated as 
just another dependency by packages (ie: python >=2.3,<3)

- python would ship with a package manager that would let you install 
and uninstall python packages, resolving dependencies in the process and 
complaining if it couldn't or if there were clashes

- this package manager would facilitate the building of os-specific 
packages (.deb, .rpm) including providing dependency information, so 
making life *much* easier for these packagers.

- the standard library packages would be no different from any other 
package, and could be overridden as and when new versions became 
available on PyPI, should an end user so desire. They would also be free 
to have their own release lifecycles (unittest, distutils, email, I'm 
looking at you!)

- python would still ship "batteries included" with versions of these 
packages appropriate for the release, to keep those in corporate 
shackles or with no network happy. In fact, creating 
application-specific "bundles" like this would become trivial, helping 
those who have apps where they want to ship as single, isolated lumps 
which the os-specific package managers could use without having to worry 
about any python package dependencies.

Personally I feel all of the above are perfectly possible, and can't see 
anyone being left unhappy by them. I'm sure I've missed something then, 
otherwise why not make it happen?

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From fuzzyman at voidspace.org.uk  Thu Apr  2 23:58:23 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Thu, 02 Apr 2009 16:58:23 -0500
Subject: [Python-Dev] unittest package
Message-ID: <49D534FF.60901@voidspace.org.uk>

Hello all,

The unittest module is around 1500 lines of code now, and the tests are 
3000 lines.

It would be much easier to maintain as a package rather than a module. 
Shall I work on a suggested structure or are there objections in principle?

Obviously all the functionality would still be available from the 
top-level unittest namespace (for backwards compatibility).

Michael

-- 
http://www.ironpythoninaction.com/

From python at rcn.com  Fri Apr  3 00:07:03 2009
From: python at rcn.com (Raymond Hettinger)
Date: Thu, 2 Apr 2009 15:07:03 -0700
Subject: [Python-Dev] PyDict_SetItem hook
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1>
	<ca471dc20904021419y33932768j8bc7e97e46361d6a@mail.gmail.com>
Message-ID: <C16A2E80A5854CBEB189C765E6DB7561@RaymondLaptop1>

> Wow. Can you possibly be more negative?

I think it's worse to give the poor guy the runaround
by making him run lots of random benchmarks.  In
the end, someone will run a timeit or have a specific
case that shows the full effect.  All of the respondents 
so far seem to have a clear intuition that the hook is right 
in the middle of a critical path.  Their intuition matches
what I learned by spending a month trying to find ways
to optimize dictionaries.
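
The kind of timeit run meant here could be as small as the sketch below,
executed once on a build with the patch and once without. The function
name time_dict_setitem and the iteration counts are arbitrary choices
for this illustration, and no particular numbers are implied.

```python
import timeit


def time_dict_setitem(number=200000, repeat=5):
    """Return the best per-store time, in seconds, for a plain d[k] = v."""
    timer = timeit.Timer("d[k] = v", setup="d = {}; k = 'key'; v = 1")
    # Take the minimum of several repeats to reduce scheduling noise.
    return min(timer.repeat(repeat=repeat, number=number)) / number


print("%.1f ns per store" % (time_dict_setitem() * 1e9))
```

Since the statement does nothing except the store, any per-call overhead
on the setitem path shows up in full, unlike in pystone-style workloads.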

Am surprised that there has been no discussion of why 
this should be in the default build (as opposed to a 
compile time option).  AFAICT, users have not previously
requested a hook like this.

Also, there has been no discussion for an overall strategy
for monitoring containers in general.  Lists and tuples will
both defy this approach because there is so much code
that accesses the arrays directly.  Am not sure whether the
setitem hook would work for other implementations either.

It seems weird to me that Collin's group can be working
so hard just to get a percent or two improvement in 
specific cases for pickling while python-dev is readily 
entertaining a patch that slows down the entire language.  

If my thoughts on the subject bug you, I'll happily
withdraw from the thread.  I don't aspire to be a
source of negativity.  I just happen to think this 
proposal isn't a good idea.

Raymond

From robert.collins at canonical.com  Fri Apr  3 00:00:42 2009
From: robert.collins at canonical.com (Robert Collins)
Date: Fri, 03 Apr 2009 09:00:42 +1100
Subject: [Python-Dev] unittest package
In-Reply-To: <49D534FF.60901@voidspace.org.uk>
References: <49D534FF.60901@voidspace.org.uk>
Message-ID: <1238709643.2700.147.camel@lifeless-64>

On Thu, 2009-04-02 at 16:58 -0500, Michael Foord wrote:
> Hello all,
> 
> The unittest module is around 1500 lines of code now, and the tests are 
> 3000 lines.
> 
> It would be much easier to maintain as a package rather than a module. 
> Shall I work on a suggested structure or are there objections in principle?
> 
> Obviously all the functionality would still be available from the 
> top-level unittest namespace (for backwards compatibility).
> 
> Michael

I'd like to see this; jml's testtools package has a layout for this
which is quite nice.

-Rob

From solipsis at pitrou.net  Fri Apr  3 00:14:29 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 2 Apr 2009 22:14:29 +0000 (UTC)
Subject: [Python-Dev] PyDict_SetItem hook
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1>
	<ca471dc20904021419y33932768j8bc7e97e46361d6a@mail.gmail.com>
	<C16A2E80A5854CBEB189C765E6DB7561@RaymondLaptop1>
Message-ID: <loom.20090402T221228-422@post.gmane.org>

Raymond Hettinger <python <at> rcn.com> writes:
> 
> It seems weird to me that Collin's group can be working
> so hard just to get a percent or two improvement in 
> specific cases for pickling while python-dev is readily 
> entertaining a patch that slows down the entire language.  

I think it's really more than a percent or two:
http://bugs.python.org/issue5670

Regards

Antoine.

From python at rcn.com  Fri Apr  3 00:20:32 2009
From: python at rcn.com (Raymond Hettinger)
Date: Thu, 2 Apr 2009 15:20:32 -0700
Subject: [Python-Dev] PyDict_SetItem hook
References: <49D3F8D0.8070805@wingware.com><43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com><49D42013.3010600@wingware.com><9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com><78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1><ca471dc20904021419y33932768j8bc7e97e46361d6a@mail.gmail.com><C16A2E80A5854CBEB189C765E6DB7561@RaymondLaptop1>
	<loom.20090402T221228-422@post.gmane.org>
Message-ID: <C66E9D7E637748FFA006E5AB4C9FE14F@RaymondLaptop1>

>> It seems weird to me that Collin's group can be working
>> so hard just to get a percent or two improvement in 
>> specific cases for pickling while python-dev is readily 
>> entertaining a patch that slows down the entire language.  

[Antoine Pitrou]
> I think it's really more than a percent or two:
> http://bugs.python.org/issue5670

For lists, it was a percent or two:
http://bugs.python.org/issue5671

I expect Collin's overall efforts to pay off nicely.  I was
just pointing out the contrast between module-specific
optimization efforts versus anti-optimizations that affect
the whole language.

Raymond

From amauryfa at gmail.com  Fri Apr  3 00:26:23 2009
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Fri, 3 Apr 2009 00:26:23 +0200
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <gr142t$5ei$1@ger.gmane.org>
References: <49D3F8D0.8070805@wingware.com> <gr142t$5ei$1@ger.gmane.org>
Message-ID: <e27efe130904021526r5e17d9dcx79e8a86b3e30ae3f@mail.gmail.com>

On Thu, Apr 2, 2009 at 03:23, Christian Heimes <lists at cheimes.de> wrote:
> John Ehresman wrote:
>> * To what extent should non-debugger code use the hook?  At one end of
>> the spectrum, the hook could be made readily available for non-debug use
>> and at the other end, it could be documented as being debug only,
>> disabled in python -O, & not exposed in the stdlib to python code.
>
> To explain Collin's mail:
> Python's dict implementation is crucial to the performance of any Python
> program. Modules, types, instances all rely on the speed of Python's
> dict type because most of them use a dict to store their name space.
> Even the smallest change to the C code may lead to a severe performance
> penalty. This is especially true for set and get operations.

A change that would have no performance impact could be to set mp->ma_lookup
to another function, that calls all the hooks it wants before calling
the "super()" method (lookdict).
This ma_lookup is already an attribute of every dict, so a debugger
could trace only the namespaces it monitors.

The only problem here is that ma_lookup is called with the key and its hash,
but not with the value, and you cannot know whether you are reading or
setting the dict.
It is easy to add an argument and call ma_lookup with the value (or
NULL, or -1 depending on the action: set, get or del), but this may have
a slight impact (benchmark needed!) even if this argument is not used by
the standard function.
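
The per-dict lookup idea above can be modelled in a few lines of Python.
This is only a toy sketch, not CPython's C implementation: ModelDict,
standard_lookup and make_traced_lookup are hypothetical names, and the
"lookup" attribute merely plays the role of mp->ma_lookup. It also shows
the limitation noted above: the lookup function sees the key and its
hash, but neither the value nor whether the caller is reading or writing.

```python
# Toy model only; every name here is hypothetical, not a CPython API.

def standard_lookup(d, key, key_hash):
    """Stand-in for lookdict: return the slot for key, creating it if needed."""
    return d.slots.setdefault(key, [None])


def make_traced_lookup(log):
    """Build a lookup that records the key, then defers to the standard one."""
    def traced_lookup(d, key, key_hash):
        log.append(key)  # no value and no get/set distinction available here
        return standard_lookup(d, key, key_hash)
    return traced_lookup


class ModelDict:
    def __init__(self):
        self.slots = {}
        self.lookup = standard_lookup  # per-instance, like mp->ma_lookup

    def get(self, key):
        return self.lookup(self, key, hash(key))[0]

    def set(self, key, value):
        self.lookup(self, key, hash(key))[0] = value


log = []
watched, plain = ModelDict(), ModelDict()
watched.lookup = make_traced_lookup(log)  # trace only this namespace
watched.set("x", 1)
value = watched.get("x")
plain.set("y", 2)  # untraced dict: nothing is logged
```

Only the watched instance pays for tracing here; the other instance keeps
the standard lookup, which is the property the suggestion relies on.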

-- 
Amaury Forgeot d'Arc

From thomas at python.org  Fri Apr  3 00:44:18 2009
From: thomas at python.org (Thomas Wouters)
Date: Fri, 3 Apr 2009 00:44:18 +0200
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <C16A2E80A5854CBEB189C765E6DB7561@RaymondLaptop1>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1>
	<ca471dc20904021419y33932768j8bc7e97e46361d6a@mail.gmail.com>
	<C16A2E80A5854CBEB189C765E6DB7561@RaymondLaptop1>
Message-ID: <9e804ac0904021544o3c6b1263o6db0a80d15acc3c1@mail.gmail.com>

On Fri, Apr 3, 2009 at 00:07, Raymond Hettinger <python at rcn.com> wrote:

>
> It seems weird to me that Collin's group can be working
> so hard just to get a percent or two improvement in specific cases for
> pickling while python-dev is readily entertaining a patch that slows down
> the entire language.

Collin's group has unfortunately seen that you cannot know the actual impact
of a change until you measure it. GCC performance, for instance, is
extremely unpredictable, and I can easily see a change like this proving to
have zero impact -- or even positive impact -- on most platforms because,
say, it warms the cache for the common case. I doubt it will, but you can't
*know* until you measure it.

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From guido at python.org  Fri Apr  3 00:57:22 2009
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Apr 2009 15:57:22 -0700
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <C16A2E80A5854CBEB189C765E6DB7561@RaymondLaptop1>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> 
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> 
	<78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1>
	<ca471dc20904021419y33932768j8bc7e97e46361d6a@mail.gmail.com> 
	<C16A2E80A5854CBEB189C765E6DB7561@RaymondLaptop1>
Message-ID: <ca471dc20904021557w11b5556aif88522fb46714211@mail.gmail.com>

On Thu, Apr 2, 2009 at 3:07 PM, Raymond Hettinger <python at rcn.com> wrote:
>> Wow. Can you possibly be more negative?
>
> I think it's worse to give the poor guy the run around

Mind your words please.

> by making him run lots of random benchmarks.  In
> the end, someone will run a timeit or have a specific
> case that shows the full effect.  All of the respondents so far seem to have
> a clear intuition that hook is right in the middle of a critical path.
> Their intuition matches
> what I learned by spending a month trying to find ways
> to optimize dictionaries.
>
> Am surprised that there has been no discussion of why this should be in the
> default build (as opposed to a compile time option).  AFAICT, users have not
> previously requested a hook like this.

I may be partially to blame for this. John and Stephan are requesting
this because it would (mostly) fulfill one of the top wishes of the
users of Wingware. So the use case is certainly real.

> Also, there has been no discussion for an overall strategy
> for monitoring containers in general. ?Lists and tuples will
> both defy this approach because there is so much code
> that accesses the arrays directly. ?Am not sure whether the
> setitem hook would work for other implementations either.

The primary use case is some kind of trap on assignment. While this
cannot cover all cases, most non-local variables are stored in dicts.
List mutations are not in the same league, as a use case.

> It seems weird to me that Collin's group can be working
> so hard just to get a percent or two improvement in specific cases for
> pickling while python-dev is readily entertaining a patch that slows down
> the entire language.

I don't actually believe that you can know whether this affects
performance at all without serious benchmarking. The patch amounts to
a single global flag check as long as the feature is disabled, and
that flag could be read from the L1 cache.
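
As a rough illustration of the "single global flag check" described
above, here is a minimal Python model. It is not the actual C patch: the
names _setitem_hook, set_setitem_hook and traced_setitem are hypothetical
and invented for this sketch, and the real change would live in C inside
PyDict_SetItem.

```python
# Toy model of a global set-item hook; all names here are hypothetical.
_setitem_hook = None  # plays the role of the global flag checked on every store


def set_setitem_hook(hook):
    """Install the global hook, or clear it by passing None."""
    global _setitem_hook
    _setitem_hook = hook


def traced_setitem(d, key, value):
    """Store into d, calling the hook first if one is installed."""
    if _setitem_hook is not None:  # the single check paid while disabled
        _setitem_hook(d, key, value)
    d[key] = value


namespace = {}
seen = []
traced_setitem(namespace, "x", 1)  # no hook installed: plain store
set_setitem_hook(lambda d, k, v: seen.append((k, v)))
traced_setitem(namespace, "x", 2)  # hook fires, then the store happens
set_setitem_hook(None)
```

The sketch says nothing about the real cost in C; as the thread notes,
that can only be settled by measurement.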

> If my thoughts on the subject bug you, I'll happily
> withdraw from the thread. ?I don't aspire to be a
> source of negativity. ?I just happen to think this proposal isn't a good
> idea.

I think we need more proof either way.

> Raymond
>
>
>
> ----- Original Message ----- From: "Guido van Rossum" <guido at python.org>
> To: "Raymond Hettinger" <python at rcn.com>
> Cc: "Thomas Wouters" <thomas at python.org>; "John Ehresman"
> <jpe at wingware.com>; <python-dev at python.org>
> Sent: Thursday, April 02, 2009 2:19 PM
> Subject: Re: [Python-Dev] PyDict_SetItem hook
>
>
> Wow. Can you possibly be more negative?
>
> 2009/4/2 Raymond Hettinger <python at rcn.com>:
>>
>> The measurements are just a distractor. We all already know that the hook
>> is being added to a critical path. Everyone will pay a cost for a feature
>> that few people will use. This is a really bad idea. It is not part of a
>> thorough, thought-out framework of container hooks (something that would
>> need a PEP at the very least). The case for how it helps us is somewhat
>> thin. The case for DTrace hooks was much stronger.
>>
>> If something does go in, it should be #ifdef'd out by default. But then, I
>> don't think it should go in at all.
>>
>>
>> Raymond
>>
>>
>>
>>
>> On Thu, Apr 2, 2009 at 04:16, John Ehresman <jpe at wingware.com> wrote:
>>>
>>> Collin Winter wrote:
>>>>
>>>> Have you measured the impact on performance?
>>>
>>> I've tried to test using pystone, but am seeing more differences between
>>> runs than there is between python w/ the patch and w/o when there is no
>>> hook installed. The highest pystone is actually from the binary w/ the
>>> patch, which I don't really believe unless it's some low level code
>>> generation effect. The cost is one test of a global variable and then a
>>> switch to the branch that doesn't call the hooks.
>>>
>>> I'd be happy to try to come up with better numbers next week after I get
>>> home from pycon.
>>
>> Pystone is pretty much a useless benchmark. If it measures anything, it's
>> the speed of the bytecode dispatcher (and it doesn't measure it
>> particularly well.) PyBench isn't any better, in my experience. Collin has
>> collected a set of reasonable benchmarks for Unladen Swallow, but they
>> still leave a lot to be desired. From the discussions at the VM and
>> Language summits before PyCon, I don't think anyone else has better
>> benchmarks, though, so I would suggest using Unladen Swallow's:
>> http://code.google.com/p/unladen-swallow/wiki/Benchmarks
>>
>
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Fri Apr  3 01:07:29 2009
From: barry at python.org (Barry Warsaw)
Date: Thu, 2 Apr 2009 18:07:29 -0500
Subject: [Python-Dev] unittest package
In-Reply-To: <49D534FF.60901@voidspace.org.uk>
References: <49D534FF.60901@voidspace.org.uk>
Message-ID: <55E8EAA0-868C-4AEA-B0AE-7DB85F66B348@python.org>

On Apr 2, 2009, at 4:58 PM, Michael Foord wrote:

> The unittest module is around 1500 lines of code now, and the tests  
> are 3000 lines.
>
> It would be much easier to maintain as a package rather than a  
> module. Shall I work on a suggested structure or are there  
> objections in principle?

+1/jfdi :)

Barry

From gjcarneiro at gmail.com  Fri Apr  3 01:13:05 2009
From: gjcarneiro at gmail.com (Gustavo Carneiro)
Date: Fri, 3 Apr 2009 00:13:05 +0100
Subject: [Python-Dev] OSError.errno => exception hierarchy?
In-Reply-To: <ca471dc20904021337i43a1e995sc0d62ab9a64ef99d@mail.gmail.com>
References: <a467ca4f0904020525taabdeb8gd75ce8f73b418d66@mail.gmail.com>
	<ca471dc20904020757o57654981o2170a54f201d5031@mail.gmail.com>
	<a467ca4f0904020912g7d4056c5p955198980d9e4218@mail.gmail.com>
	<ca471dc20904021037r5e1f2fefn7ce83caadf505e34@mail.gmail.com>
	<a467ca4f0904021310u32bc492bx3964fe299ac44a10@mail.gmail.com>
	<ca471dc20904021337i43a1e995sc0d62ab9a64ef99d@mail.gmail.com>
Message-ID: <a467ca4f0904021613g5ef4b33ag105da705e1b4b9b3@mail.gmail.com>

(cross-posting back to python-dev to finalize discussions)

2009/4/2 Guido van Rossum <guido at python.org>
[...]

> > The problem you report:
> >>
> >>  try:
> >>    ...
> >>  except OSWinError:
> >>    ...
> >>  except OSLinError:
> >>    ...
> >>
> >
> > Would be solved if both OSWinError and OSLinError were always defined in
> > both Linux and Windows Python.  Programs could be written to catch both
> > OSWinError and OSLinError, except that on Linux OSWinError would never
> > actually be raised, and on Windows OSLinError would never occur.  Problem
> > solved.
>
> Yeah, but now you'd have to generate the list of exceptions (which
> would be enormously long) based on the union of all errno codes in the
> universe.
>
> Unless you only want to do it for some errno codes and not for others,
> which sounds like asking for trouble.
>
> Also you need a naming scheme that works for all errnos and doesn't
> require manual work. Frankly, the only scheme that I can think of that
> could be automated would be something like OSError_ENAME.
>
> And, while OSError is built-in, I think these exceptions (because
> there are so many) should not be built-in, and probably not even live
> in the 'os' namespace -- the best place for them would be the errno
> module, so errno.OSError_ENAME.
>
> > The downsides of this?  I can only see memory, at the moment, but I might
> > be missing something.
>
> It's an enormous amount of work to make it happen across all
> platforms. And it doesn't really solve an important problem.

I partially agree.  It will be a lot of work.  I think the problem is valid,
although I agree it is not very important.

>
>
> > Now just one final word why I think this matters.  The currently correct
> > way to remove a directory tree and only ignore the error "it does not
> > exist" is:
> >
> > try:
> >     shutil.rmtree("dirname")
> > except OSError, e:
> >     if errno.errorcode[e.errno] != 'ENOENT':
> >        raise
> >
> > However, only very experienced programmers will know to write that
> > correct code (apparently I am not experienced enough!).
>
> That doesn't strike me as correct at all, since it doesn't distinguish
> between ENOENT being raised for some file deep down in the tree vs.
> the root not existing. (This could happen if after you did
> os.listdir() some other process deleted some file.)

OK.  Maybe in a generic case this could happen, although I'm sure this won't
happen in my particular scenario.  This is about a build system, and I am
assuming there are no two concurrent builds (or else a lot of other things
would fail anyway).

> A better way might be
>
> try:
>  shutil.rmtree(<dir>)
> except OSError:
>  if os.path.exists(<dir>):
>   raise

Sure, this works, but at the cost of an extra system call.  I think it's
more elegant to check the errno (assuming the corner case you pointed out
above is not an issue).
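
The two approaches discussed above can be sketched side by side. This is
a minimal sketch, assuming Python 3 syntax ("except OSError as e" rather
than the "except OSError, e" form used in the thread); the helper names
rmtree_ignore_missing and rmtree_recheck are hypothetical. The errno
variant compares e.errno with errno.ENOENT directly, and it still has
the corner case raised earlier: an ENOENT for a file deep in the tree is
ignored too.

```python
import errno
import os
import shutil
import tempfile


def rmtree_ignore_missing(path):
    """Remove a tree; ignore only the 'does not exist' error (errno check)."""
    try:
        shutil.rmtree(path)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise


def rmtree_recheck(path):
    """Remove a tree; re-raise any error if the path is still there afterwards."""
    try:
        shutil.rmtree(path)
    except OSError:
        if os.path.exists(path):  # costs one extra system call on failure
            raise


target = os.path.join(tempfile.mkdtemp(), "build")
os.mkdir(target)
rmtree_ignore_missing(target)  # removes the directory
rmtree_ignore_missing(target)  # already gone: ENOENT is swallowed
rmtree_recheck(target)         # already gone: path does not exist, no raise
```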

> Though I don't know what you wish to happen if <dir> were a dangling
> symlink.
>
> > What I am proposing is that the simpler correct code would be something
> > like:
> >
> > try:
> >     shutil.rmtree("dirname")
> > except OSNoEntryError:
> >     pass
> >
> > Much simpler, no?
>
> And wrong.
>
> > Right now, developers are tempted to write code like:
> >
> >     shutil.rmtree("dirname", ignore_errors=True)
> >
> > Or:
> >
> > try:
> >     shutil.rmtree("dirname")
> > except OSError:
> >     pass
> >
> > Both of which follow the error hiding anti-pattern [1].
> >
> > [1] http://en.wikipedia.org/wiki/Error_hiding
> >
> > Thanks for reading this far.
>
> Thanks for not wasting any more of my time.

OK, I won't waste more time.  If this were an obvious improvement beyond
doubt to most people, I would pursue it, but since it's not, I can live with
it.

Thanks anyway,

-- 
Gustavo J. A. M. Carneiro
INESC Porto, Telecommunications and Multimedia Unit
"The universe is always one step beyond logic." -- Frank Herbert

From daniel at stutzbachenterprises.com  Fri Apr  3 01:14:59 2009
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Thu, 2 Apr 2009 18:14:59 -0500
Subject: [Python-Dev] __length_hint__
Message-ID: <eae285400904021614n227a085aj6f708122b0a211c1@mail.gmail.com>

Iterators can implement a method called __length_hint__ that provides a hint
to certain internal routines (such as list.extend) so they can operate more
efficiently.  As far as I can tell, __length_hint__ is currently
undocumented.  Should it be?

If so, are there any constraints on what an iterator should return?  I can
think of 3 possible rules, each with advantages and disadvantages:
1. return your best guess
2. return your best guess that you are certain is not higher than the true
value
3. return your best guess that you are certain is not lower than the true
value

Also, I've noticed that if a VERY large hint is returned by the iterator,
list.extend will sometimes disregard the hint and try to allocate memory
incrementally (correct for rule #1 or #2).  However, in another code path it
will throw a MemoryError immediately based on the hint (correct for rule
#3).
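
For readers unfamiliar with the method, here is a minimal sketch of an
iterator that supplies a hint, assuming a modern Python 3 rather than
the 2009 interpreters under discussion (operator.length_hint was added
later, in Python 3.4). The Countdown and Overestimate classes are
hypothetical examples. The second class returns a deliberately wrong
value to show that the result of list() does not depend on the hint
being accurate.

```python
import operator


class Countdown:
    """Iterator yielding n-1 .. 0 that reports how many items remain."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n

    def __length_hint__(self):
        return self.n  # best guess; exact in this case


class Overestimate(Countdown):
    def __length_hint__(self):
        return self.n + 100  # deliberately wrong, to show it is only a hint


hint = operator.length_hint(Countdown(5))
items = list(Countdown(5))
padded = list(Overestimate(3))
```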

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

From benjamin at python.org  Fri Apr  3 01:17:08 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 2 Apr 2009 18:17:08 -0500
Subject: [Python-Dev] __length_hint__
In-Reply-To: <eae285400904021614n227a085aj6f708122b0a211c1@mail.gmail.com>
References: <eae285400904021614n227a085aj6f708122b0a211c1@mail.gmail.com>
Message-ID: <1afaf6160904021617k5c810c86sc7e2d99508076f5@mail.gmail.com>

2009/4/2 Daniel Stutzbach <daniel at stutzbachenterprises.com>:
> Iterators can implement a method called __length_hint__ that provides a hint
> to certain internal routines (such as list.extend) so they can operate more
> efficiently.  As far as I can tell, __length_hint__ is currently
> undocumented.  Should it be?

This has been discussed, and no, it is an implementation detail mostly
for the optimization of builtin iterators.

>
> If so, are there any constraints on what an iterator should return?  I can
> think of 3 possible rules, each with advantages and disadvantages:
> 1. return your best guess
> 2. return your best guess that you are certain is not higher than the true
> value
> 3. return your best guess that you are certain is not lower than the true
> value
>
> Also, I've noticed that if a VERY large hint is returned by the iterator,
> list.extend will sometimes disregard the hint and try to allocate memory
> incrementally (correct for rule #1 or #2).  However, in another code path it
> will throw a MemoryError immediately based on the hint (correct for rule
> #3).

Perhaps Raymond can shed some light on these.

-- 
Regards,
Benjamin

From python at rcn.com  Fri Apr  3 01:30:39 2009
From: python at rcn.com (Raymond Hettinger)
Date: Thu, 2 Apr 2009 16:30:39 -0700
Subject: [Python-Dev] __length_hint__
References: <eae285400904021614n227a085aj6f708122b0a211c1@mail.gmail.com>
	<1afaf6160904021617k5c810c86sc7e2d99508076f5@mail.gmail.com>
Message-ID: <CF0D342CCC9C4BE4ACF051C882E68E5F@RaymondLaptop1>

>> Iterators can implement a method called __length_hint__ that provides a hint
>> to certain internal routines (such as list.extend) so they can operate more
>> efficiently. As far as I can tell, __length_hint__ is currently
>> undocumented. Should it be?
> 
> This has been discussed, and no, it is an implementation detail mostly
> for the optimization of builtin iterators.

Right.  That matches my vague recollection on the subject.

>> If so, are there any constraints on what an iterator should return? I can
>> think of 3 possible rules, each with advantages and disadvantages:
>> 1. return your best guess

Yes.

BTW, the same rule also applies to __len__.  IIRC, Tim proposed 
to add that to the docs somewhere.

> Perhaps Raymond can shed some light on these.

Can't guess the future of __length_hint__(). 
Since it doesn't have a slot, the attribute lookup
can actually slow down cases with a small number
of iterands.

The original idea was based on some research on
map/fold operations, noting that iterators can
sometimes be processed more efficiently if
accompanied by some metadata (i.e. the iterator has 
a known length, consists of unique items, is sorted, 
is all of a certain type, is re-iterable, etc.).

Raymond

From ben+python at benfinney.id.au  Fri Apr  3 02:25:57 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Fri, 03 Apr 2009 11:25:57 +1100
Subject: [Python-Dev] UnicodeDecodeError bug in distutils
References: <94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com>
	<94bdd2610702241247t568a942dw2fe1b10883b62d20@mail.gmail.com>
	<200702242309.46022.pogonyshev@gmx.net>
	<94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com>
	<45E0C012.7090801@palladion.com>
	<5.1.1.6.0.20070224203115.0270a5a8@sparrow.telecommunity.com>
Message-ID: <877i22fuqy.fsf_-_@benfinney.id.au>

"Phillip J. Eby" <pje at telecommunity.com> writes:

> However, there's currently no standard, as far as I know, for what
> encoding the PKG-INFO file should use.

Who would define such a standard? My vote goes for "default is UTF-8".

> Meanwhile, the 'register' command accepts Unicode, but is broken in
> handling it. [...]
> 
> Unfortunately, this isn't fixable until there's a new 2.5.x release.
> For previous Python versions, both register and write_pkg_info()
> accepted 8-bit strings and passed them on as-is, so the only
> workaround for this issue at the moment is to revert to Python 2.4
> or less.

What is the prognosis on this issue? It's still hitting me in Python
2.5.4.

-- 
 \       "Everything you read in newspapers is absolutely true, except |
  `\        for that rare story of which you happen to have first-hand |
_o__)                                        knowledge." --Erwin Knoll |
Ben Finney

From pje at telecommunity.com  Fri Apr  3 02:44:00 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Thu, 02 Apr 2009 20:44:00 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D52115.6020001@egenix.com>
References: <49D4DA72.60401@v.loewis.de>
 <49D52115.6020001@egenix.com>
Message-ID: <20090403004135.B76443A40A7@sparrow.telecommunity.com>

At 10:33 PM 4/2/2009 +0200, M.-A. Lemburg wrote:
>That's going to slow down Python package detection a lot - you'd
>replace an O(1) test with an O(n) scan.

I thought about this too, but it's pretty trivial considering that 
the only time it takes effect is when you have a directory name that 
matches the name you're importing, and that it will only happen once 
for that directory, unless there is no package on sys.path with that 
name, and the program tries to import the package multiple times.  In 
other words, the overhead isn't likely to be much, compared to the 
time needed to say, open and marshal even a trivial __init__.py file.
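
To make the O(1) versus O(n) point concrete, here is a small sketch of
the two kinds of check. It is an illustration only, not the import
machinery: the function names has_init and pkg_marker_files and the file
name zope.interface.pkg are hypothetical, and the only assumption taken
from the PEP discussion is that a namespace portion is marked by a file
whose name ends in .pkg.

```python
import os
import tempfile


def has_init(directory):
    """Classic check: one lookup for a fixed file name."""
    return os.path.isfile(os.path.join(directory, "__init__.py"))


def pkg_marker_files(directory):
    """Scan-based check: list the directory and keep names ending in .pkg."""
    return sorted(name for name in os.listdir(directory) if name.endswith(".pkg"))


# Build a throwaway directory that looks like one portion of a namespace package.
portion = tempfile.mkdtemp()
for name in ("zope.interface.pkg", "module.py"):
    open(os.path.join(portion, name), "w").close()

markers = pkg_marker_files(portion)
```

The scan touches every entry in the directory, but as argued above it
runs only for directories whose name matches the package being imported.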

>Alternative Approach:
>---------------------
>
>Wouldn't it be better to stick with a simpler approach and look for
>"__pkg__.py" files to detect namespace packages using that O(1) check ?

I thought the same thing (or more precisely, a single .pkg file), but 
when I got lower in the PEP I saw the reason was to support system 
packages not having overlapping filenames.  The PEP could probably be 
a little clearer about the connection between needing *.pkg and the 
system-package use case.

>One of the namespace packages, the defining namespace package, will have
>to include a __init__.py file.

Note that there is no such thing as a "defining namespace package" -- 
namespace package contents are symmetrical peers.

>The above mechanism allows the same kind of flexibility we already
>have with the existing normal __init__.py mechanism.
>
>* It doesn't add yet another .pth-style sys.path extension (which are
>difficult to manage in installations).
>
>* It always uses the same naive sys.path search strategy. The strategy
>is not determined by some file contents.

The above are also true for using only a '*' in .pkg files -- in that 
event there are no sys.path changes.  (Frankly, I'm doubtful that 
anybody is using extend_path and .pkg files to begin with, so I'd be 
fine with a proposal that instead used something like '.nsp' files 
that didn't even need to be opened and read -- which would let the 
directory scan stop at the first .nsp file found.)

>* The search is only done once - on the first import of the package.

I believe the PEP does this as well, IIUC.

>* It's possible to have a defining package dir and add-on package
>dirs.

Also possible in the PEP, although the __init__.py must be in the 
first such directory on sys.path.  (However, such "defining" packages 
are not that common now, due to tool limitations.)

From greg.ewing at canterbury.ac.nz  Fri Apr  3 03:11:42 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 03 Apr 2009 14:11:42 +1300
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <49D4B2C4.4060107@avl.com>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
	<10203019.4341695.1238671712090.JavaMail.xicrypt@atgrzls001>
	<49D4B2C4.4060107@avl.com>
Message-ID: <49D5624E.9050502@canterbury.ac.nz>

Hrvoje Niksic wrote:

> I thought the entire *point* of C object was that it's an opaque box 
> without any info whatsoever, except that which is known and shared by 
> its creator and its consumer.

But there's no way of telling who created a given
CObject, so *nobody* knows anything about it for
certain.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Fri Apr  3 03:18:50 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 03 Apr 2009 14:18:50 +1300
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
	<49D4A162.2020209@canterbury.ac.nz>
	<6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com>
Message-ID: <49D563FA.7050909@canterbury.ac.nz>

Jim Fulton wrote:

> The original use case for CObjects was to export an API from a module, 
> in which case, you'd be importing the API from the module.  The presence 
> in the module indicates the type.

Sure, but it can't hurt to have an additional sanity
check.

Also, there are wider uses for CObjects than this.
I see it as a quick way of creating a wrapper when
you don't want to go to the trouble of a full-blown
extension type. A small amount of metadata would
make CObjects much more useful.

-- 
Greg

From doko at ubuntu.com  Fri Apr  3 03:21:10 2009
From: doko at ubuntu.com (Matthias Klose)
Date: Fri, 03 Apr 2009 03:21:10 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D4DA72.60401@v.loewis.de>
References: <49D4DA72.60401@v.loewis.de>
Message-ID: <49D56486.8020708@ubuntu.com>

Martin v. Löwis wrote:
> I propose the following PEP for inclusion to Python 3.1.
> Please comment.
> 
> Regards,
> Martin
> 
> Abstract
> ========
> 
> Namespace packages are a mechanism for splitting a single Python
> package across multiple directories on disk. In current Python
> versions, an algorithm to compute the packages __path__ must be
> formulated. With the enhancement proposed here, the import machinery
> itself will construct the list of directories that make up the
> package.

+1

Speaking as a downstream packager of Python for Debian/Ubuntu, I welcome this
approach.  The current practice of shipping the very same file (__init__.py) in
different packages leads to conflicts for the installation of these packages
(this is not specific to dpkg, but is true for rpm packaging as well).

Current practice of packaging (for downstreams) so-called "name space packages" is:

 - either to split out the namespace __init__.py into a separate
   (linux distribution) package (needing manual packaging effort for each
   name space package)

 - using downstream specific packaging techniques to handle conflicting files
   (diversions)

 - replicating the current behaviour of setuptools, which simply overwrites the
   conflicting files.

Following this proposal, (downstream) packaging of namespace packages is made
possible independent of any manual downstream packaging decisions or any
downstream-specific packaging decisions.

  Matthias

From pje at telecommunity.com  Fri Apr  3 05:12:18 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Thu, 02 Apr 2009 23:12:18 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D56486.8020708@ubuntu.com>
References: <49D4DA72.60401@v.loewis.de>
 <49D56486.8020708@ubuntu.com>
Message-ID: <20090403030953.32A493A40A7@sparrow.telecommunity.com>

At 03:21 AM 4/3/2009 +0200, Matthias Klose wrote:
>+1
>
>speaking as a downstream packaging python for Debian/Ubuntu I
>welcome this approach.  The current practice of shipping the very
>same file (__init__.py) in different packages leads to conflicts for
>the installation of these packages (this is not specific to dpkg,
>but is true for rpm packaging as well).
>
>Current practice of packaging (for downstreams) so called "name space
>packages" is:
>
> - either to split out the namespace __init__.py into a separate
>   (linux distribution) package (needing manual packaging effort for each
>   name space package)
>
> - using downstream specific packaging techniques to handle conflicting
>   files (diversions)
>
> - replicating the current behaviour of setuptools simply overwriting
>   the file conflicts.
>
>Following this proposal (downstream) packaging of namespace packages is
>made possible independent of any manual downstream packaging decisions or
>any downstream specific packaging decisions

A clarification: setuptools does not currently install the 
__init__.py file when installing in 
--single-version-externally-managed or --root mode.  Instead, it uses 
a project-version-nspkg.pth file that essentially simulates a 
variation of Martin's .pkg proposal, by abusing .pth file 
support.  If this PEP is adopted, setuptools would replace its 
nspkg.pth file with a .pkg file on Python versions that provide 
native support for .pkg imports, keeping the .pth file only for older Pythons.

(.egg files and directories will not be affected by the change, 
unless the zipimport module also supports .pkg files...  and 
again, only for Python versions that support the new approach.)

From stephen at xemacs.org  Fri Apr  3 06:12:36 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 03 Apr 2009 13:12:36 +0900
Subject: [Python-Dev] [issue3609] does parse_header really belong in CGI module?
In-Reply-To: <7c42eba10904021450k3756ee0ftfe282d065024f2bb@mail.gmail.com>
References: <1219193477.24.0.768590998992.issue3609@psf.upfronthosting.co.za>
	<1238708765.18.0.659138371932.issue3609@psf.upfronthosting.co.za>
	<7c42eba10904021450k3756ee0ftfe282d065024f2bb@mail.gmail.com>
Message-ID: <87zleytlxn.fsf@xemacs.org>

Senthil Kumaran writes:

 > http://bugs.python.org/issue3609  requests to move the function
 > parse_header present in cgi module to email package.
 > 
 > The reasons for this request are:
 > 
 > 1) The MIME type header parsing methods rightly belong to email
 > package, conforming to RFC 2045.

In practice, the "mail" part of the name is historical; RFC 822-style
headers are used in many protocols, most prominently email, netnews
(less important nowadays :-( ), and HTTP.  If there are differences in
usage, the parsing methods may be different.  If not, then this
functionality is redundant in email, which has its own parser.  It
can't be right for email to have two parsers and CGI none!

Anyway, "moving" the function is almost certainly the *wrong* thing to
do, as the email package has its own conventions and organization.  In
particular, in email header parsing is done by methods of the message
and header objects (in their respective initializations), rather than
by a (global) function.
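
As a rough illustration of that object-based style (a sketch only; the
header block below is made up for the example), a Content-Type header
can be read through a message object and its methods rather than
through a free-standing function:

```python
import email

# Hypothetical header block, for illustration only.
raw = 'Content-Type: text/html; charset="utf-8"\n\n'

# Parsing happens when the message object is built ...
msg = email.message_from_string(raw)

# ... and the header parameters are then queried through its methods.
content_type = msg.get_content_type()
charset = msg.get_param('charset')

print(content_type, charset)
```

Nothing here says how a shared header-parsing package ought to look; it
only shows the convention that a moved function would have to fit into.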

Since Barry et al have been sprinting on email TNG, you really ought
to coordinate this with them.  I think it would be good to have header
parsing and generation in a free-standing package separate from other
aspects of handling Internet protocols, but this will require
coordination of several modules besides email and cgi.

From alexandre at peadrop.com  Fri Apr  3 06:10:56 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Fri, 3 Apr 2009 00:10:56 -0400
Subject: [Python-Dev] Should the io-c modules be put in their own directory?
Message-ID: <acd65fa20904022110p2cf46105n4061286bc249d55e@mail.gmail.com>

Hello,

I just noticed that the new io-c modules were merged in the py3k
branch (I know, I am kind of late on the news; blame school work). Anyway,
I am just wondering if it would be a good idea to put the io-c modules
in a sub-directory (like sqlite), instead of scattering them around in
the Modules/ directory.

Cheers,
-- Alexandre

From stephen at xemacs.org  Fri Apr  3 06:55:58 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 03 Apr 2009 13:55:58 +0900
Subject: [Python-Dev] Package Management - thoughts from the peanut gallery
In-Reply-To: <49D534B3.8020801@simplistix.co.uk>
References: <49D534B3.8020801@simplistix.co.uk>
Message-ID: <87y6uitjxd.fsf@xemacs.org>

Chris Withers writes:

 > Personally I feel all of the above are perfectly possible, and can't see 
 > anyone being left unhappy by them. I'm sure I've missed something then, 
 > otherwise why not make it happen?

Labor shortage.

We will need a PEP, the PEP will need a sample implementation, and
a proponent.  Who's gonna bell the cat?

From hrvoje.niksic at avl.com  Fri Apr  3 09:59:24 2009
From: hrvoje.niksic at avl.com (Hrvoje Niksic)
Date: Fri, 03 Apr 2009 09:59:24 +0200
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <20295095.32108.1238700461315.JavaMail.xicrypt@atgrzls001>
References: <49D26BB1.8050108@hastings.org>	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>	<10203019.4341695.1238671712090.JavaMail.xicrypt@atgrzls001>
	<49D4B2C4.4060107@avl.com>
	<20295095.32108.1238700461315.JavaMail.xicrypt@atgrzls001>
Message-ID: <49D5C1DC.10801@avl.com>

Larry Hastings wrote:
>> If we're adding type information, then please make it a Python object 
>> rather than a C string.  That way the creator and the consumer can use 
>> a richer API to query the "type", such as by calling its methods or by 
>> inspecting it in some other way.
> 
> I'm not writing my patch that way; it would be too cumbersome for what 
> is ostensibly an easy, light-weight API.   If you're going that route
> you might as well create a real PyTypeObject for the blob you're
> passing in.

Well, that's exactly the point: given a PyObject* tag, you can add any 
kind of type identification you need, including some Python type.  (It 
is assumed that the actual pointer you're passing is not a PyObject 
itself, of course, otherwise you wouldn't need PyCObject at all.)

I have no desire to compete with your patch, it was a suggestion for 
(what I see as) improvement.

From ziade.tarek at gmail.com  Fri Apr  3 10:01:51 2009
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Fri, 3 Apr 2009 10:01:51 +0200
Subject: [Python-Dev] Package Management - thoughts from the peanut
	gallery
In-Reply-To: <87y6uitjxd.fsf@xemacs.org>
References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org>
Message-ID: <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>

Guys,

I have committed to leading these tasks and coordinating the people
who are willing to help with them.

We have been working on several tasks and PEPs to make things happen since
the summit.

There's no public roadmap yet for when things will be done (because it is
not yet 100% certain what shall be done), but it will probably be too late
to see it happen in 3.1. Python 2.7 will be our target.

The tasks discussed so far are:

- version definition (http://wiki.python.org/moin/DistutilsVersionFight)
- egg.info standardization (PEP 376)
- metadata enhancement (rewrite PEP 345)
- static metadata definition work  (*)
- creation of a network of OS packager people
- PyPI mirroring (PEP 381)

Each of these tasks has a leader, except the one marked (*). I just got back
from travelling, and I will reorganize
http://wiki.python.org/moin/Distutils asap so it is up to date.

If you want to work on one of these tasks or feel there's a new task you can
start, please join Distutils-SIG or contact me.
Regards
Tarek

On Fri, Apr 3, 2009 at 6:55 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> Chris Withers writes:
>
>  > Personally I feel all of the above are perfectly possible, and can't see
>  > anyone being left unhappy by them. I'm sure I've missed something then,
>  > otherwise why not make it happen?
>
> Labor shortage.
>
> We will need a PEP, the PEP will need a sample implementation, and
> a proponent.  Who's gonna bell the cat?

-- 
Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/

From ziade.tarek at gmail.com  Fri Apr  3 10:46:56 2009
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Fri, 3 Apr 2009 10:46:56 +0200
Subject: [Python-Dev] UnicodeDecodeError bug in distutils
In-Reply-To: <877i22fuqy.fsf_-_@benfinney.id.au>
References: <94bdd2610702241247t568a942dw2fe1b10883b62d20@mail.gmail.com>
	<200702242309.46022.pogonyshev@gmx.net>
	<94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com>
	<45E0C012.7090801@palladion.com>
	<5.1.1.6.0.20070224203115.0270a5a8@sparrow.telecommunity.com>
	<877i22fuqy.fsf_-_@benfinney.id.au>
Message-ID: <94bdd2610904030146m569aa5a2q5b7bdc542f4570e5@mail.gmail.com>

On Fri, Apr 3, 2009 at 2:25 AM, Ben Finney <ben+python at benfinney.id.au> wrote:
> "Phillip J. Eby" <pje at telecommunity.com> writes:
>
>> However, there's currently no standard, as far as I know, for what
>> encoding the PKG-INFO file should use.
>
> Who would define such a standard?

PEP 376, where we can specify that all files in egg-info must use a
specific encoding.

> My vote goes for "default is UTF-8".

+1

>
>> Meanwhile, the 'register' command accepts Unicode, but is broken in
>> handling it. [...]

How so?

Tarek

From solipsis at pitrou.net  Fri Apr  3 11:14:51 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 3 Apr 2009 09:14:51 +0000 (UTC)
Subject: [Python-Dev] Should the io-c modules be put in their own
	directory?
References: <acd65fa20904022110p2cf46105n4061286bc249d55e@mail.gmail.com>
Message-ID: <loom.20090403T085522-602@post.gmane.org>

Alexandre Vassalotti <alexandre <at> peadrop.com> writes:
> 
> I just noticed that the new io-c modules were merged in the py3k
> branch (I know, I am kind of late on the news; blame school work). Anyway,
> I am just wondering if it would be a good idea to put the io-c modules
> in a sub-directory (like sqlite), instead of scattering them around in
> the Modules/ directory.

Welcome back!

I have no particular opinion on this. I suggest waiting for Benjamin's advice
and following it :-)

(unless the FLUFL wants to chime in)

Benjamin-makes-boring-decisions-easy'ly yrs,

Antoine.

From solipsis at pitrou.net  Fri Apr  3 11:27:40 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 3 Apr 2009 09:27:40 +0000 (UTC)
Subject: [Python-Dev] =?utf-8?q?PyDict=5FSetItem_hook?=
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
Message-ID: <loom.20090403T092111-554@post.gmane.org>

Thomas Wouters <thomas <at> python.org> writes:
> 
> 
> Pystone is pretty much a useless benchmark. If it measures anything, it's the
> speed of the bytecode dispatcher (and it doesn't measure it particularly well.)
> PyBench isn't any better, in my experience.

I don't think pybench is useless. It gives a lot of performance data about
crucial internal operations of the interpreter. It is of course not very
representative of real-world workloads, but in return it tells you immediately
where a performance regression has happened. (By contrast, if you witness a
regression in a high-level benchmark, you still have a lot of investigation to
do to find out where exactly something bad happened.)
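
To illustrate what "low-level" means here, a minimal sketch using the
standard timeit module (this is not how pybench itself is implemented;
the statement being timed and the iteration count are arbitrary choices
for the example):

```python
import timeit

# Time one narrow interpreter operation (a dict store) in isolation.
n = 100000
elapsed = timeit.timeit('d[0] = 1', setup='d = {}', number=n)

# Report the average cost per operation; the figure depends entirely
# on the machine and interpreter build, so no expected value is given.
per_op_ns = elapsed / n * 1e9
print('dict store: %.1f ns per operation' % per_op_ns)
```

If such a number moves between two revisions, the culprit is already
narrowed down to one operation, which is the property described above.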

Perhaps someone should start maintaining a suite of benchmarks, high-level and
low-level; we currently have them all scattered around (pybench, pystone,
stringbench, richards, iobench, and the various Unladen Swallow benchmarks; not
to mention other third-party stuff that can be found in e.g. the Computer
Language Shootout).

I also know Gregory P. Smith had floated the idea of plotting benchmark figures
for each new revision of trunk or py3k (and, perhaps, other implementations),
but I don't know if he's willing to do it himself :-)

Regards

Antoine.

From eckhardt at satorlaser.com  Fri Apr  3 12:09:38 2009
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Fri, 3 Apr 2009 12:09:38 +0200
Subject: [Python-Dev] sequence slice that wraps, bug or intention?
Message-ID: <200904031209.38726.eckhardt@satorlaser.com>

Hi!

I just stumbled across something in Python 2.6 where I'm not sure if it is by 
design or a fault:

x = 'abcd'
x[-3:-3] -> ''
x[-3:-2] -> 'b'
x[-3:-1] -> 'bc'
x[-3: 0] -> ''

The one that actually bothers me here is the last one, I would have expected 
it to yield 'bcd' instead, because otherwise I don't see a way to specify a 
slice that starts with a negative index but still includes the last element.

Similarly, I would expect x[-1:1] to yield 'da' or at least raise an error, 
but not to return an empty string.

Bug?

Uli

-- 
Sator Laser GmbH
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932


From kristjan at ccpgames.com  Fri Apr  3 12:22:58 2009
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Fri, 3 Apr 2009 10:22:58 +0000
Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <ca471dc20904021019y66c44938n97ac9db677995249@mail.gmail.com>
References: <49D26BB1.8050108@hastings.org>
	<FCC9D291-6F66-4CE4-8FF9-10F9EC82D699@zope.com>
	<49D4A162.2020209@canterbury.ac.nz>
	<6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com>
	<ca471dc20904021019y66c44938n97ac9db677995249@mail.gmail.com>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F056DD0BA54@exchis.ccp.ad.local>

Here's one from EVE, where the DB module creates raw data for our Crowsets and then hands it over to another module for consumption (the actual creation of the CRow and CrowDescriptor objects):

	BluePy raw(PyCObject_FromVoidPtr(&mColumnList, 0));
	if (!raw)
		return 0;
	return PyObject_CallMethod(blueModule, "DBRowDescriptor", "O", raw.o);

This is done for performance reasons, to avoid data duplication.  Of course it implies tight coupling of the modules.

In our FreeType wrapper system, we also use it to wrap pointers to FreeType structs:

	template <class T>
	struct Wrapper : public T
	{
		...
		PyObject *Wrap() {if (!sMap.size())Init(); return PyCObject_FromVoidPtrAndDesc(this, &sMap, 0);}
	};

It is quite useful to pass unknown and opaque stuff around with, really, and makes certain things possible that otherwise wouldn't be.
We live with the type unsafety, of course.

In fact, I don't think we ever use a CObject to expose an API.

Kristján

-----Original Message-----
From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Guido van Rossum
Sent: 2 April 2009 17:19
To: Jim Fulton
Cc: Python-Dev at python.org
Subject: Re: [Python-Dev] Let's update CObject API so it is safe and regular!

On Thu, Apr 2, 2009 at 6:22 AM, Jim Fulton <jim at zope.com> wrote:
> The original use case for CObjects was to export an API from a module, in
> which case, you'd be importing the API from the module.

I consider this the *only* use case. What other use cases are there?

From p.f.moore at gmail.com  Fri Apr  3 12:29:17 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 3 Apr 2009 11:29:17 +0100
Subject: [Python-Dev] sequence slice that wraps, bug or intention?
In-Reply-To: <200904031209.38726.eckhardt@satorlaser.com>
References: <200904031209.38726.eckhardt@satorlaser.com>
Message-ID: <79990c6b0904030329i1a8a070em84d0d25df7cdd35a@mail.gmail.com>

2009/4/3 Ulrich Eckhardt <eckhardt at satorlaser.com>:
> Hi!
>
> I just stumbled across something in Python 2.6 where I'm not sure if it is by
> design or a fault:
>
> x = 'abcd'
> x[-3:-3] -> ''
> x[-3:-2] -> 'b'
> x[-3:-1] -> 'bc'
> x[-3: 0] -> ''
>
> The one that actually bothers me here is the last one, I would have expected
> it to yield 'bcd' instead, because otherwise I don't see a way to specify a
> slice that starts with a negative index but still includes the last element.
>
> Similarly, I would expect x[-1:1] to yield 'da' or at least raise an error,
> but not to return an empty string.
>
> Bug?

Feature. Documented behaviour, even
(http://docs.python.org/reference/expressions.html#id5 section
"Slicings").

This question is more appropriate for python-list (comp.lang.python)
as it is about using Python, rather than the development of the Python
interpreter itself (although I can see that your uncertainty as to
whether this was a bug might have led you to think this was a more
appropriate list). You should first confirm on python-list that a
given behaviour is a bug, and if it is, post it to the tracker, rather
than to python-dev.

In this case, the behaviour is fine. As regards your point "I don't
see a way to specify a slice that starts with a negative index but
still includes the last element" what you want is x[-3:].
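
A short sketch of the documented behaviour (the helper name last_window
is made up for this example, and treating a stop of 0 as "to the end"
is a convention of the helper, not of Python):

```python
x = 'abcd'

# Omitting the stop index runs the slice through the last element.
tail = x[-3:]

# A stop of 0 means "before the first element", so the slice is empty.
empty = x[-3:0]

def last_window(seq, start, stop):
    """Slice seq[start:stop], treating a stop of 0 as 'through the end'."""
    # None as a slice bound is equivalent to omitting it.
    return seq[start:stop or None]

print(tail, repr(empty), last_window(x, -3, 0), last_window(x, -3, -1))
```

The None bound is the usual answer when the stop index is computed
rather than written literally.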

If you want to discuss this further, please do so on python-list.

Paul.

From hrvoje.niksic at avl.com  Fri Apr  3 14:07:02 2009
From: hrvoje.niksic at avl.com (Hrvoje Niksic)
Date: Fri, 03 Apr 2009 14:07:02 +0200
Subject: [Python-Dev] Getting values stored inside sets
Message-ID: <49D5FBE6.6090807@avl.com>

I've stumbled upon an oddity using sets.  It's trivial to test if a 
value is in the set, but it appears to be impossible to retrieve a 
stored value, other than by iterating over the whole set.  Let me 
describe a concrete use case.

Imagine a set of objects identified by some piece of information, such 
as a "key" slot (guaranteed to be constant for any particular element). 
  The object could look like this:

class Element(object):
     def __init__(self, key):
         self.key = key
     def __eq__(self, other):
         return self.key == other
     def __hash__(self):
         return hash(self.key)
     # ...

Now imagine a set "s" of such objects.  I can add them to the set:

 >>> s = set()
 >>> s.add(Element('foo'))
 >>> s.add(Element('bar'))

I can test membership using the keys:

 >>> 'foo' in s
True
 >>> 'blah' in s
False

But I can't seem to find a way to retrieve the element corresponding to 
'foo', at least not without iterating over the entire set.  Is this an 
oversight or an intentional feature?  Or am I just missing an obvious 
way to do this?

I know I can work around this by changing the set of elements to a dict 
that maps key -> element, but this feels unsatisfactory.  It's 
redundant, as the element already contains all the necessary 
information, and the set already knows how to use it, and the set must 
remember the original elements anyway, to be able to iterate over them, 
so why not allow one to retrieve them?  Secondly, the data structure I 
need conceptually *is* a set of elements, so it feels wrong to 
pigeonhole it into a dict.

This wasn't an isolated case, we stumbled on this several times while 
trying to use sets.  In comparison, STL sets don't have this limitation.

If this is not possible, I would like to propose either that set's 
__getitem__ translates key to value, so that s['foo'] would return the 
first element, or, if this is considered ugly, an equivalent method, 
such as s.get('foo').
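
For reference, the dict workaround described above might look roughly
like this. It is only a sketch: KeyedSet and its get method are
hypothetical names, not an existing API, and Element is reduced to the
bare key slot.

```python
class Element(object):
    def __init__(self, key):
        self.key = key


class KeyedSet(object):
    """Hypothetical set-like container that indexes elements by .key."""

    def __init__(self):
        self._items = {}

    def add(self, element):
        # Like set.add, keep the element already stored under this key.
        self._items.setdefault(element.key, element)

    def __contains__(self, key):
        return key in self._items

    def __iter__(self):
        return iter(self._items.values())

    def __len__(self):
        return len(self._items)

    def get(self, key, default=None):
        # The retrieval operation that a plain set does not offer.
        return self._items.get(key, default)


s = KeyedSet()
foo = Element('foo')
s.add(foo)
s.add(Element('bar'))
print('foo' in s, s.get('foo') is foo, s.get('blah'))
```

The wrapper hides the key -> element duplication from callers, but the
underlying structure is still a dict.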

From p.f.moore at gmail.com  Fri Apr  3 14:22:02 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 3 Apr 2009 13:22:02 +0100
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <49D5FBE6.6090807@avl.com>
References: <49D5FBE6.6090807@avl.com>
Message-ID: <79990c6b0904030522v5f5b52b5y94473c4c439d91a8@mail.gmail.com>

2009/4/3 Hrvoje Niksic <hrvoje.niksic at avl.com>:
> I've stumbled upon an oddity using sets.  It's trivial to test if a value is
> in the set, but it appears to be impossible to retrieve a stored value,
> other than by iterating over the whole set.  Let me describe a concrete use
> case.
>
> Imagine a set of objects identified by some piece of information, such as a
> "key" slot (guaranteed to be constant for any particular element).  The
> object could look like this:
>
> class Element(object):
>     def __init__(self, key):
>         self.key = key
>     def __eq__(self, other):
>         return self.key == other
>     def __hash__(self):
>         return hash(self.key)
>     # ...
>
> Now imagine a set "s" of such objects.  I can add them to the set:
>
> >>> s = set()
> >>> s.add(Element('foo'))
> >>> s.add(Element('bar'))
>
> I can test membership using the keys:
>
> >>> 'foo' in s
> True
> >>> 'blah' in s
> False
>
> But I can't seem to find a way to retrieve the element corresponding to
> 'foo', at least not without iterating over the entire set.  Is this an
> oversight or an intentional feature?  Or am I just missing an obvious way to
> do this?

My instinct is that it's intentional. I'd say that you're abusing
__eq__ here. If you can say "x in s" and then can't use x as if it
were the actual item inserted into s, then are they really "equal"?

Using a dict seems like the correct answer. I certainly don't think
it's worth complicating the set interface to cover this corner case.

Paul.

From tjreedy at udel.edu  Fri Apr  3 14:26:02 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 03 Apr 2009 08:26:02 -0400
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <49D5FBE6.6090807@avl.com>
References: <49D5FBE6.6090807@avl.com>
Message-ID: <gr4v8q$1tm$1@ger.gmane.org>

Hrvoje Niksic wrote:
> I've stumbled upon an oddity using sets.  It's trivial to test if a 
> value is in the set, but it appears to be impossible to retrieve a 
> stored value, 

Set elements, by definition, do not have keys or position by which to 
grab.  When they do, use a dict or list.

> other than by iterating over the whole set.  Let me 
> describe a concrete use case.
> 
> Imagine a set of objects identified by some piece of information, such 
> as a "key" slot (guaranteed to be constant for any particular element). 
>  The object could look like this:
> 
> class Element(object):
>     def __init__(self, key):
>         self.key = key
>     def __eq__(self, other):
>         return self.key == other
>     def __hash__(self):
>         return hash(self.key)
>     # ...
> 
> Now imagine a set "s" of such objects.  I can add them to the set:
> 
>  >>> s = set()
>  >>> s.add(Element('foo'))
>  >>> s.add(Element('bar'))
> 
> I can test membership using the keys:
> 
>  >>> 'foo' in s
> True
>  >>> 'blah' in s
> False
> 
> But I can't seem to find a way to retrieve the element corresponding to 
> 'foo', at least not without iterating over the entire set.  Is this an 
> oversight or an intentional feature?  Or am I just missing an obvious 
> way to do this?

Use a dict, like you did.
> 
> I know I can work around this by changing the set of elements to a dict 
> that maps key -> element, but this feels unsatisfactory.

Sorry, that is the right way.

>  It's
> redundant, as the element already contains all the necessary 
> information,

Records in a database have all the information of the record, but we 
still pull out fields for indexes.

tjr

From olemis at gmail.com  Fri Apr  3 14:41:52 2009
From: olemis at gmail.com (Olemis Lang)
Date: Fri, 3 Apr 2009 07:41:52 -0500
Subject: [Python-Dev] unittest package
In-Reply-To: <55E8EAA0-868C-4AEA-B0AE-7DB85F66B348@python.org>
References: <49D534FF.60901@voidspace.org.uk>
	<55E8EAA0-868C-4AEA-B0AE-7DB85F66B348@python.org>
Message-ID: <24ea26600904030541p41dd7bb9w11fb00c26cce0948@mail.gmail.com>

On Thu, Apr 2, 2009 at 6:07 PM, Barry Warsaw <barry at python.org> wrote:
> On Apr 2, 2009, at 4:58 PM, Michael Foord wrote:
>
>> The unittest module is around 1500 lines of code now, and the tests are
>> 3000 lines.
>>
>> It would be much easier to maintain as a package rather than a module.
>> Shall I work on a suggested structure or are there objections in principle?
>
> +1/jfdi :)
>

I remember that something like this was discussed some time ago ...
perhaps the ideas mentioned that time might be valuable ... AFAICR
somebody provided an example ... ;)

+1 for unittest as a package ...

BTW ... Q: Does it mean that there will be subpkgs for specific (...
yet standard ...) pkgs ?

If this is the case, and there is a space for a unittest.doctest pkg
(... or whatever ... the name may be different ;) ... and inclusion is
Ok ... and so on ... I wonder ...

Q: Is it possible that dutest module [1]_ be considered ... to live in
stdlib ... ?

The module integrates doctest + unittest ... without needing a plugin
architecture or anything like that, just unittest + doctest... (... in
fact, sometimes I don't really get the idea of having plugins in
testing frameworks for what can be done following unittest philosophy ...
but anyway ... this is a long OT thread ... and I don't even think to
continue ... was just a brief comment ...)

Classes
=====
- DocTestLoader allows loading (unittest-style) TestCases which
check the match made for doctests. It provides integration with
TestProgram, supports building complex TestSuites in a more natural
way, and eases the use of specialized instances of TestCases built out
of doctest examples.

- A few classes so as to allow reporting the individual results of
each and every interactive example executed during the test run. A
separate entry is created in the corresponding TestResult instance
containing the expected value and the actual result.

- PackageTestLoader class (acting as a decorator ... design pattern ;)
loads all the tests found throughout a package hierarchy using another
loader. The latter is used to retrieve the tests found in modules
matching a specified pattern.

- dutest.main is an alias for dutest.VerboseTestProgram. This class
fixes a minor bug (... IMO) I found while specifying different
verbosity levels from the command line to unittest.TestProgram.

These are the classes right now, but some others (e.g.
DocTestScriptLoader ... to load doctests out of test scripts ...)
might be helpful as well ... ;o)

Download from PyPI
===============
File                    Type                  Py Version  Size  # downloads
dutest-0.2.2.win32.exe  MS Windows installer  any         76KB  28
dutest-0.2.2-py2.5.egg  Python Egg            2.5         17KB  93
dutest-0.2.2.zip        Source                any         13KB  47

PS: Random thoughts ...

.. [1] dutest 0.2.2
          (http://pypi.python.org/pypi/dutest)

.. [2] "Doctest and unittest... now they'll live happily together", O.
Lang (2008) The Python Papers, Volume 3, Issue 1, pp. 31:51
         (http://ojs.pythonpapers.org/index.php/tpp/article/view/56/51)

-- 
Regards,

Olemis.

Blog ES: http://simelo-es.blogspot.com/
Blog EN: http://simelo-en.blogspot.com/

Featured article:
No me gustan los templates de Django ...

From olemis at gmail.com  Fri Apr  3 14:56:19 2009
From: olemis at gmail.com (Olemis Lang)
Date: Fri, 3 Apr 2009 07:56:19 -0500
Subject: [Python-Dev] Package Management - thoughts from the peanut
	gallery
In-Reply-To: <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>
References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org>
	<94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>
Message-ID: <24ea26600904030556o79b82a68le326ddd934806135@mail.gmail.com>

2009/4/3 Tarek Ziadé <ziade.tarek at gmail.com>:
> Guys,
>
> The tasks discussed so far are:
>
> - version definition (http://wiki.python.org/moin/DistutilsVersionFight)
> - egg.info standardization (PEP 376)
> - metadata enhancement (rewrite PEP 345)
> - static metadata definition work (*)

Looks fine ... and very useful ... ;)

> - creation of a network of OS packager people
> - PyPI mirroring (PEP 381)
>

Wow !

BTW ... I see nothing about removing dist_* commands from distutils ...

Q: Am I wrong, or does it seem they will remain in stdlib ?

-- 
Regards,

Olemis.

Blog ES: http://simelo-es.blogspot.com/
Blog EN: http://simelo-en.blogspot.com/

Featured article:
Comandos : Pipe Viewer ... ¿Qué está pasando por esta tubería?

From ziade.tarek at gmail.com  Fri Apr  3 15:36:49 2009
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Fri, 3 Apr 2009 15:36:49 +0200
Subject: [Python-Dev] Package Management - thoughts from the peanut
	gallery
In-Reply-To: <24ea26600904030556o79b82a68le326ddd934806135@mail.gmail.com>
References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org>
	<94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>
	<24ea26600904030556o79b82a68le326ddd934806135@mail.gmail.com>
Message-ID: <94bdd2610904030636q4089bcban635c32e5eaac1d6d@mail.gmail.com>

On Fri, Apr 3, 2009 at 2:56 PM, Olemis Lang <olemis at gmail.com> wrote:
>
> BTW ... I see nothing about removing dist_* commands from distutils ...
>
> Q: Am I wrong, or does it seem they will remain in stdlib ?

Ok, beware that what I am writing here is for the long term. There are no
plans yet to remove things right now. Maybe some things for 3.1, as long as
they are clearly defined and non-controversial. And this is not the most
urgent thing to take care of.

So,
Some commands are not really used by the OS packagers, either because
these commands don't provide what packagers need, or because they cannot
be configured the way the packagers would like.

Packagers still need to tell us why, and how to make things better. Some
people like Toshio or Matthias are already helping a lot on this. We have
made a lot of progress since the summit in sharing our points of view.

So I'd put this task under "creation of a network of OS packager
people" (them+others)

And in detail:

1/ define with them the precise usage of Distutils commands in each OS
   community

2/ determine whether there's a leading project that could take care of
   building OS-dependent packages, using packages built by/with Distutils

4/ see what needs to be done in Distutils to let these projects play
   with Python packages without pain.

5/ finally, see what could be externalized/removed from Distutils in
   favor of these third-party projects.

This is roughly what Guido was talking about when he said we would
remove things like bdist_rpm from the stdlib: it's too OS-specific for
the stdlib to do a good job in this area.

To discuss this plan in detail, let's move to Distutils-SIG.

Cheers
Tarek

--
Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/

From steve at pearwood.info  Fri Apr  3 16:57:21 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 4 Apr 2009 01:57:21 +1100
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <79990c6b0904030522v5f5b52b5y94473c4c439d91a8@mail.gmail.com>
References: <49D5FBE6.6090807@avl.com>
	<79990c6b0904030522v5f5b52b5y94473c4c439d91a8@mail.gmail.com>
Message-ID: <200904040157.21938.steve@pearwood.info>

On Fri, 3 Apr 2009 11:22:02 pm Paul Moore wrote:

> I'd say that you're abusing __eq__ here. If you can say "x in s" 
> and then can't use x as if it were the actual item inserted into 
> s, then are they really "equal"? 

That's hardly unusual in Python.

>>> alist = [0, 1, 2, 3, 4]
>>> 3.0 in alist
True
>>> alist[3.0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers

Besides, there's a concrete use-case for retrieving the actual object 
inside the set. You can ensure that you only have one instance of any 
object with a particular value by using a cache like this:

_cache = {}
def cache(obj):
    if obj in _cache: return _cache[obj]
    _cache[obj] = obj
    return obj

Arguably, it would be neater if the cache were a set rather than a dict, 
thus saving one pointer per item, but of course that would rely on a 
change in set behaviour.

-- 
Steven D'Aprano

From solipsis at pitrou.net  Fri Apr  3 17:07:28 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 3 Apr 2009 15:07:28 +0000 (UTC)
Subject: [Python-Dev] Getting values stored inside sets
References: <49D5FBE6.6090807@avl.com>
	<79990c6b0904030522v5f5b52b5y94473c4c439d91a8@mail.gmail.com>
	<200904040157.21938.steve@pearwood.info>
Message-ID: <loom.20090403T150459-133@post.gmane.org>

Steven D'Aprano <steve <at> pearwood.info> writes:
> 
> That's hardly unusual in Python.
> 
> >>> alist = [0, 1, 2, 3, 4]
> >>> 3.0 in alist
> True
> >>> alist[3.0]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: list indices must be integers

Your example is wrong:

>>> alist = [0, 1, 2, 3, 4]
>>> alist.index(3.0)
3
>>> alist[alist.index(3.0)]
3

Regards

Antoine.

From steve at pearwood.info  Fri Apr  3 17:41:25 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 4 Apr 2009 02:41:25 +1100
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <loom.20090403T150459-133@post.gmane.org>
References: <49D5FBE6.6090807@avl.com> <200904040157.21938.steve@pearwood.info>
	<loom.20090403T150459-133@post.gmane.org>
Message-ID: <200904040241.25758.steve@pearwood.info>

On Sat, 4 Apr 2009 02:07:28 am Antoine Pitrou wrote:

> Your example is wrong:

Of course it is. The perils of posting at 2am, sorry.

Nevertheless, the principle still holds. Nothing in Python prohibits two
objects from being equal without being interchangeable. As poorly written
as my example was, the point stands: I just need to add a level of
indirection.

>>> from decimal import Decimal
>>> alist = [100, 111, 102, 103, 105, 104, 106, 108]
>>> indices_of_odd_numbers = [alist.index(n) for n in alist if n%2]
>>> if Decimal('3') in indices_of_odd_numbers:
...     print alist[Decimal('3')]
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: list indices must be integers

Python does not promise that if x == y, you can use y anywhere you can 
use x. Nor should it. Paul's declaration of abuse of __eq__ is 
unfounded.
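
The same point can be sketched as a small Python 3 script (the session
above is Python 2). The variable names are illustrative only:

```python
from decimal import Decimal

alist = [100, 111, 102, 103, 105, 104, 106, 108]
index = Decimal("3")

print(index == 3)            # equality holds across the two numeric types
print(index in [1, 3, 4])    # so membership tests succeed as well

try:
    alist[index]
    usable_as_index = True
except TypeError as exc:
    # Decimal compares equal to 3 but cannot stand in for it as a list index.
    usable_as_index = False
    print("not usable as an index:", exc)

# An explicit conversion makes the value usable where an int is required.
print(alist[int(index)])
```

Equality says the values compare the same; it says nothing about whether
one type can be substituted for the other in every operation.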

-- 
Steven D'Aprano

From srittau at jroger.in-berlin.de  Fri Apr  3 17:45:42 2009
From: srittau at jroger.in-berlin.de (Sebastian Rittau)
Date: Fri, 3 Apr 2009 17:45:42 +0200
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <49D5FBE6.6090807@avl.com>
References: <49D5FBE6.6090807@avl.com>
Message-ID: <20090403154541.GA6881@jroger.in-berlin.de>

Hello,

On Fri, Apr 03, 2009 at 02:07:02PM +0200, Hrvoje Niksic wrote:

> But I can't seem to find a way to retrieve the element corresponding to  
> 'foo', at least not without iterating over the entire set.  Is this an  
> oversight or an intentional feature?  Or am I just missing an obvious  
> way to do this?

I am missing a simple way to retrieve the "first" element of any
iterable in python that matches a certain condition anyway. Something
like this:

  def first(iter, cb):
      for el in iter:
          if cb(el):
              return el
      raise IndexError()

Or (shorter, but potentially slower):

  def first(iter, cb):
      return [el for el in iter if cb(el)][0]

To be used like this:

  my_el = first(my_set, lambda el: el == "foobar")

This is something I need from time to time and this also seems to solve
your problem.

 - Sebastian

From thomas at python.org  Fri Apr  3 18:06:17 2009
From: thomas at python.org (Thomas Wouters)
Date: Fri, 3 Apr 2009 18:06:17 +0200
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <loom.20090403T092111-554@post.gmane.org>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<loom.20090403T092111-554@post.gmane.org>
Message-ID: <9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com>

On Fri, Apr 3, 2009 at 11:27, Antoine Pitrou <solipsis at pitrou.net> wrote:

> Thomas Wouters <thomas <at> python.org> writes:
> >
> >
> > Pystone is pretty much a useless benchmark. If it measures anything, it's
> > the speed of the bytecode dispatcher (and it doesn't measure it
> > particularly well.) PyBench isn't any better, in my experience.
>
> I don't think pybench is useless. It gives a lot of performance data about
> crucial internal operations of the interpreter. It is of course very little
> real-world, but conversely makes you know immediately where a performance
> regression has happened. (by contrast, if you witness a regression in a
> high-level benchmark, you still have a lot of investigation to do to find
> out where exactly something bad happened)

Really? Have you tried it? I get at least 5% noise between runs without any
changes. I have gotten results that include *negative* run times. And yes, I
tried all the different settings for calibration runs and timing mechanisms.
The tests in PyBench are not micro-benchmarks (they do way too much for
that), they don't try to minimize overhead or noise, but they are also not
representative of real-world code. That doesn't just mean "you can't infer
the affected operation from the test name", but "you can't infer anything."
You can just be looking at differently borrowed runtime. I have in the past
written patches to Python that improved *every* micro-benchmark and *every*
real-world measurement I made, except PyBench. Trying to pinpoint the
slowdown invariably led to tests that did too much in the measurement loop,
introduced too much noise in the "calibration" run or just spent their time
*in the measurement loop* on doing setup and teardown of the test. Collin
and Jeffrey have seen the exact same thing since starting work on Unladen
Swallow.

So, sure, it might be "useful" if you have 10% or more difference across the
board, and if you don't have access to anything but pybench and pystone.
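
For gauging run-to-run noise on a single small operation, a rough
sketch using the standard library's timeit module (Python 3). This is
not how pybench works; the helper name measure and the chosen repeat
counts are assumptions made for illustration:

```python
import timeit

def measure(stmt, setup="pass", repeat=5, number=100000):
    """Return (best, spread): best per-loop time in seconds and relative noise."""
    runs = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    per_loop = [t / number for t in runs]
    best = min(per_loop)
    # Relative gap between the slowest and fastest run: a crude noise estimate.
    spread = (max(per_loop) - best) / best if best else 0.0
    return best, spread

best, spread = measure("d['x'] = 1", setup="d = {}")
print("best: %.3g s per loop, spread: %.1f%%" % (best, spread * 100))
```

If the spread is of the same order as the difference being claimed, the
measurement cannot support the claim either way.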

> Perhaps someone should start maintaining a suite of benchmarks, high-level
> and low-level; we currently have them all scattered around (pybench,
> pystone, stringbench, richard, iobench, and the various Unladen Swallow
> benchmarks; not to mention other third-party stuff that can be found in
> e.g. the Computer Language Shootout).

That's exactly what Collin proposed at the summits last week. Have you seen
http://code.google.com/p/unladen-swallow/wiki/Benchmarks
 ? Please feel free to suggest more benchmarks to add :)

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From amauryfa at gmail.com  Fri Apr  3 18:07:29 2009
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Fri, 3 Apr 2009 18:07:29 +0200
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <20090403154541.GA6881@jroger.in-berlin.de>
References: <49D5FBE6.6090807@avl.com>
	<20090403154541.GA6881@jroger.in-berlin.de>
Message-ID: <e27efe130904030907t2ba29044n32898ad1cde346e@mail.gmail.com>

Hi,

On Fri, Apr 3, 2009 at 17:45, Sebastian Rittau
<srittau at jroger.in-berlin.de> wrote:
> I am missing a simple way to retrieve the "first" element of any
> iterable in python that matches a certain condition anyway. Something
> like this:
>
>   def first(iter, cb):
>       for el in iter:
>           if cb(el):
>               return el
>       raise IndexError()
>
> Or (shorter, but potentially slower):
>
>   def first(iter, cb):
>       return [el for el in iter if cb(el)][0]
>
> To be used like this:
>
>   my_el = first(my_set, lambda el: el == "foobar")
>
> This is something I need from time to time and this also seems to solve
> your problem.

import itertools
def first(iter, cb):
    # lazy: stops at the first match (Python 2 ifilter)
    return itertools.ifilter(cb, iter).next()
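
For reference, a Python 3 sketch of the same helper. The default
parameter is an addition made here, not part of any version posted in
the thread: those raise (IndexError or StopIteration) when nothing
matches, whereas this one returns the default.

```python
def first(iterable, predicate, default=None):
    """Return the first element for which predicate is true, else default."""
    # The generator is lazy, so iteration stops at the first match.
    return next((el for el in iterable if predicate(el)), default)

my_set = {"foo", "foobar", "baz"}
found = first(my_set, lambda el: el == "foobar")
missing = first(my_set, lambda el: el == "missing")
print(found, missing)
```

Note that this is still a linear scan: it returns the stored element,
but it does not avoid iterating over the set.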

-- 
Amaury Forgeot d'Arc

From chris at simplistix.co.uk  Fri Apr  3 18:08:05 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Fri, 03 Apr 2009 17:08:05 +0100
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <ca471dc20904021449x659f930bn7b61feec6b640752@mail.gmail.com>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
	<49D35A39.7020507@simplistix.co.uk>
	<Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net>
	<49D52B2C.5050509@simplistix.co.uk>
	<ca471dc20904021418y617f1bfdja3173c41a1451423@mail.gmail.com>
	<49D52C5B.7010506@simplistix.co.uk>
	<ca471dc20904021449x659f930bn7b61feec6b640752@mail.gmail.com>
Message-ID: <49D63465.80401@simplistix.co.uk>

Guido van Rossum wrote:
>>> But anyways this is moot, the bug was only about exec in a class body
>>> *nested inside a function*.
>> Indeed, I just hate seeing execs and it was an interesting mental exercise
>> to try and get rid of the above one ;-)
>>
>> Assuming it breaks no tests, would there be objection to me committing the
>> above change to the Python 3 trunk?
> 
> That's up to Benjamin. Personally, I live by "if it ain't broke, don't
> fix it." :-)

Anything using an exec is broken by definition ;-)

Benjamin?

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From olemis at gmail.com  Fri Apr  3 18:16:47 2009
From: olemis at gmail.com (Olemis Lang)
Date: Fri, 3 Apr 2009 11:16:47 -0500
Subject: [Python-Dev] Package Management - thoughts from the peanut
	gallery
In-Reply-To: <94bdd2610904030636q4089bcban635c32e5eaac1d6d@mail.gmail.com>
References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org>
	<94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>
	<24ea26600904030556o79b82a68le326ddd934806135@mail.gmail.com>
	<94bdd2610904030636q4089bcban635c32e5eaac1d6d@mail.gmail.com>
Message-ID: <24ea26600904030916y8b8d6aeqbda28fe6a481a7b5@mail.gmail.com>

On Fri, Apr 3, 2009 at 8:36 AM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:
> On Fri, Apr 3, 2009 at 2:56 PM, Olemis Lang <olemis at gmail.com> wrote:
>>
>> BTW ... I see nothing about removing dist_* commands from distutils ...
>>
>> Q: Am I wrong, or will they remain in the stdlib?
>
> This is roughly what Guido was talking about when he said we would
> remove things like bdist_rpm
> from the stdlib : it's too OS-specific for the stdlib to do a good job
> in this area.
>
> To discuss this plan in detail, let's move to Distutils-SIG
>

understood ... ;)

-- 
Regards,

Olemis.

Blog ES: http://simelo-es.blogspot.com/
Blog EN: http://simelo-en.blogspot.com/

Featured article:
Comandos : Pipe Viewer ... ¿Qué está pasando por esta tubería?

From collinw at gmail.com  Fri Apr  3 18:19:29 2009
From: collinw at gmail.com (Collin Winter)
Date: Fri, 3 Apr 2009 09:19:29 -0700
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <loom.20090403T092111-554@post.gmane.org>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<loom.20090403T092111-554@post.gmane.org>
Message-ID: <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com>

On Fri, Apr 3, 2009 at 2:27 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Thomas Wouters <thomas <at> python.org> writes:
>>
>>
>> Pystone is pretty much a useless benchmark. If it measures anything, it's the
>> speed of the bytecode dispatcher (and it doesn't measure it particularly well.)
>> PyBench isn't any better, in my experience.
>
> I don't think pybench is useless. It gives a lot of performance data about
> crucial internal operations of the interpreter. It is of course very little
> real-world, but conversely makes you know immediately where a performance
> regression has happened. (by contrast, if you witness a regression in a
> high-level benchmark, you still have a lot of investigation to do to find out
> where exactly something bad happened)
>
> Perhaps someone should start maintaining a suite of benchmarks, high-level and
> low-level; we currently have them all scattered around (pybench, pystone,
> stringbench, richard, iobench, and the various Unladen Swallow benchmarks; not
> to mention other third-party stuff that can be found in e.g. the Computer
> Language Shootout).

Already in the works :)

As part of the common standard library and test suite that we agreed
on at the PyCon language summit last week, we're going to include a
common benchmark suite that all Python implementations can share. This
is still some months off, though, so there'll be plenty of time to
bikeshed^Wrationally discuss which benchmarks should go in there.

Collin

From chris at simplistix.co.uk  Fri Apr  3 18:20:36 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Fri, 03 Apr 2009 17:20:36 +0100
Subject: [Python-Dev] Package Management - thoughts from the peanut
	gallery
In-Reply-To: <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>
References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org>
	<94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>
Message-ID: <49D63754.6030601@simplistix.co.uk>

Tarek Ziadé wrote:
> I have taken the commitment to lead these tasks and synchronize the people
> that are willing to help on this.

Good, I'm one of those people, sadly my only help may be to ask "how is 
this bit going to be done?".

> The tasks discussed so far are:
> 
> - version definition (http://wiki.python.org/moin/DistutilsVersionFight)
> - egg.info standardization (PEP 376)
> - metadata enhancement (rewrite PEP 345)
> - static metadata definition work  (*)

These all seem to be a subset of the last one, right?

> - creation of a network of OS packager people

This would be useful...

> - PyPI mirroring (PEP 381)

I don't see why PyPI isn't just ported to GAE with an S3 data storage 
bit and be done with it... Offline mirrors for people behind firewalls 
already have solutions out there...

> Each of these tasks has a leader, except the one with (*). I just got back
> from travelling, and I will reorganize
> http://wiki.python.org/moin/Distutils asap so it is up-to-date.

Cool, is this the focal point to track your activities?

> If you want to work on one of these tasks or feel there's a new task you can
> start, please join Distutils SIG or contact me,

Well, I think my "big list" breaks down roughly into tasks, of which I
think the stuff you're already doing will hopefully take care of the
first 2, but what about the rest? If labour shortage is all that's
stopping this, then let me know ;-)

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From martin at v.loewis.de  Fri Apr  3 18:21:27 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 03 Apr 2009 18:21:27 +0200
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <C16A2E80A5854CBEB189C765E6DB7561@RaymondLaptop1>
References: <49D3F8D0.8070805@wingware.com>	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>	<49D42013.3010600@wingware.com>	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>	<78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1>	<ca471dc20904021419y33932768j8bc7e97e46361d6a@mail.gmail.com>
	<C16A2E80A5854CBEB189C765E6DB7561@RaymondLaptop1>
Message-ID: <49D63787.3080304@v.loewis.de>

> I think it's worse to give the poor guy the run around
> by making him run lots of random benchmarks.

"the poor guy" works for Wingware (a company you may have
heard of) and has contributed to Python on several occasions.
His name is John Ehresman.

> In the end, someone will run a timeit or have a specific
> case that shows the full effect.  All of the respondents so far seem to
> have a clear intuition that hook is right in the middle of a critical
> path.  Their intuition matches what I learned by spending a month 
> trying to find ways to optimize dictionaries.

Ok, so add me as a respondent who thinks that this deserves to be
added despite being in the critical path. I doubt it will be noticeable
in practice.

> Am surprised that there has been no discussion of why this should be in
> the default build (as opposed to a compile time option).

Because, as a compile time option, it will be useless. It's not targeted
for people who want to work on the Python VM (who are the primary users
of compile time options), but for people developing Python applications.

> AFAICT, users have not previously requested a hook like this.

That's because debugging Python in general is in a sad state (which, in
turn, is because you can get very far with just print calls).

> Also, there has been no discussion for an overall strategy
> for monitoring containers in general.  Lists and tuples will
> both defy this approach because there is so much code
> that accesses the arrays directly. 

Dicts are special because they are used to implement namespaces.
Watchpoints are an incredibly useful debugging aid.
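
To illustrate the idea of a watchpoint only, here is a pure-Python
sketch. It is not the C-level hook proposed for PyDict_SetItem: the
class WatchedDict and its attributes are hypothetical, and a dict
subclass like this only sees explicit item assignment (dict.update, for
example, does not go through __setitem__).

```python
class WatchedDict(dict):
    """Hypothetical dict subclass that records assignments to watched keys."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.watched = set()
        self.events = []

    def __setitem__(self, key, value):
        if key in self.watched:
            # Record (key, old value, new value); a debugger would stop here.
            self.events.append((key, self.get(key), value))
        super().__setitem__(key, value)

ns = WatchedDict(counter=0)
ns.watched.add("counter")
ns["counter"] = 1     # watched: recorded
ns["other"] = 2       # not watched: ignored
print(ns.events)
```

A real debugger needs this for ordinary namespace dicts, which is why
the proposal places the hook inside the dict implementation itself.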

> Am not sure whether the
> setitem hook would work for other implementations either.

I can't see why it shouldn't.

> If my thoughts on the subject bug you, I'll happily
> withdraw from the thread.  I don't aspire to be a
> source of negativity.  I just happen to think this proposal isn't a good
> idea.

As somebody who has worked a lot on performance, I'm puzzled how
easily you judge the performance impact of a patch without having
seen any benchmarks. If I have learned anything about performance, it
is this: never guess the performance aspects of code without
benchmarking.

Regards,
Martin

From status at bugs.python.org  Fri Apr  3 18:06:59 2009
From: status at bugs.python.org (Python tracker)
Date: Fri,  3 Apr 2009 18:06:59 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20090403160659.1AB6278590@psf.upfronthosting.co.za>

ACTIVITY SUMMARY (03/27/09 - 04/03/09)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.

 2272 open (+59) / 15229 closed (+39) / 17501 total (+98)

Open issues with patches:   857

Average duration of open issues: 647 days.
Median duration of open issues: 388 days.

Open Issues Breakdown
   open  2222 (+57)
pending    50 ( +2)

Issues Created Or Reopened (103)
________________________________

mmap.move crashes by integer overflow                            03/31/09
CLOSED http://bugs.python.org/issue5387    reopened ocean-city                    
       patch                                                                   

Issue in transparency in top level tk window(python) on MAC      03/28/09
CLOSED http://bugs.python.org/issue5569    reopened YMohan                        

unqualified exec in class body                                   04/01/09
       http://bugs.python.org/issue5578    reopened loewis                        
       patch                                                                   

abc.abstractproperty() docs list fget as required; fget is not r 03/27/09
CLOSED http://bugs.python.org/issue5581    created  Devin Jeanpierre              

Incorrect DST transition on Windows                              03/27/09
       http://bugs.python.org/issue5582    created  acummings                     

Optional extensions in setup.py                                  03/28/09
CLOSED http://bugs.python.org/issue5583    created  georg.brandl                  
       patch                                                                   

json.loads(u'3.14') fails unexpectedly (minor scanner bug)       03/28/09
       http://bugs.python.org/issue5584    created  bob.ippolito                  
       easy                                                                    

implement initializer for multiprocessing.BaseManager.start()    03/28/09
CLOSED http://bugs.python.org/issue5585    created  lekma                         
       patch, needs review                                                     

The documentation of os.makedirs is misleading                   03/28/09
       http://bugs.python.org/issue5586    created  mher                          
       patch                                                                   

vars() no longer has a use __repr__                              03/28/09
       http://bugs.python.org/issue5587    created  rhettinger                    

Add --randseed to regrtest                                       03/28/09
CLOSED http://bugs.python.org/issue5588    created  collinwinter                  
       patch, needs review                                                     

Wrong dump of floats                                             03/28/09
CLOSED http://bugs.python.org/issue5589    created  stein                         

pyexpat defines global symbol template_string                    03/28/09
       http://bugs.python.org/issue5590    created  doko                          

global symbols in shared libpython not prefixed with Py or _Py   03/28/09
CLOSED http://bugs.python.org/issue5591    created  doko                          

Modules/_textio.c defines global symbol encodefuncs              03/28/09
CLOSED http://bugs.python.org/issue5592    created  doko                          

test_math.testFsum failure on release30-maint                    03/29/09
       http://bugs.python.org/issue5593    reopened pitrou                        

IDLE startup configuration                                       03/29/09
       http://bugs.python.org/issue5594    created  mark                          

os.path.ismount (ntpath) gives UnboundLocalError for any input   03/29/09
CLOSED http://bugs.python.org/issue5595    created  mnewman                       

memory leaks in 3.1                                              03/29/09
       http://bugs.python.org/issue5596    created  pitrou                        
       patch                                                                   

inspect.formatargspec crashes on missing kwonlydefaults          03/29/09
CLOSED http://bugs.python.org/issue5597    created  petr.dolezal                  

"paths" argument missing in DocFileSuite documentation           03/29/09
CLOSED http://bugs.python.org/issue5598    created  harobed                       

test_email_codecs is skipped because it fails to import TestSkip 03/30/09
CLOSED http://bugs.python.org/issue5599    created  r.david.murray                
       easy                                                                    

Slight inaccuracy in webbrowser documentation                    03/30/09
CLOSED http://bugs.python.org/issue5600    created  MLModel                       

webbrowser doesn't just open browsers                            03/30/09
       http://bugs.python.org/issue5601    created  MLModel                       

Slight punctuation problem in documentation of urllib.request.ur 03/30/09
CLOSED http://bugs.python.org/issue5602    created  MLModel                       

Garbled sentence in documentation of urllib.request.urlopen      03/30/09
CLOSED http://bugs.python.org/issue5603    created  MLModel                       

imp.find_module() mixes UTF8 and MBCS                            03/30/09
CLOSED http://bugs.python.org/issue5604    created  gvanrossum                    

Don't assume that repr of literal dicts are sorted like pprint s 03/30/09
CLOSED http://bugs.python.org/issue5605    created  fwierzbicki                   

The makefile dependencies listing formatter.h are wrong          03/30/09
       http://bugs.python.org/issue5606    created  stutzbach                     
       patch                                                                   

Lib/distutils/test/test_util: test_get_platform bogus for OSX    03/30/09
       http://bugs.python.org/issue5607    created  ronaldoussoren                

Add python.exe to the path in windows?                           03/30/09
CLOSED http://bugs.python.org/issue5608    created  twillis                       

Create Unit Tests for nturl2path module                          03/30/09
       http://bugs.python.org/issue5609    created  Kozyarchuk                    

email feedparser.py CRLFLF bug: $ vs \Z                          03/30/09
       http://bugs.python.org/issue5610    created  tony_nelson                   
       patch                                                                   

Auto-detect indentation in C source in vimrc                     03/30/09
       http://bugs.python.org/issue5611    created  KirkMcDonald                  
       patch                                                                   

whitespace folding in the email package could be better ;-)      03/30/09
       http://bugs.python.org/issue5612    created  cjw296                        

test_posix.py and test_wait4.py having missing import on win32   03/30/09
CLOSED http://bugs.python.org/issue5613    created  tdriscol                      

Malloc errors in test_io                                         03/30/09
       http://bugs.python.org/issue5614    created  ronaldoussoren                

linking fails when configured --without-threads                  03/30/09
       http://bugs.python.org/issue5615    created  stutzbach                     
       patch                                                                   

Distutils 2to3 support doesn't have the doctest_only flag.       03/30/09
       http://bugs.python.org/issue5616    created  lregebro                      

Unicode printing in gdb post-mortem sessions                     03/31/09
CLOSED http://bugs.python.org/issue5617    created  dugan                         

PyMemberDef type T_UBYTE incorrectly documtented                 03/31/09
CLOSED http://bugs.python.org/issue5618    created  briancurtin                   
       patch                                                                   

Pass MS CRT debug flags into subprocesses                        04/01/09
       http://bugs.python.org/issue5619    reopened jnoller                       

The attribute's action of an object is not correct.              03/31/09
CLOSED http://bugs.python.org/issue5620    created  Yong yang                     

Add description of special case to "Assignment statements" secti 03/31/09
       http://bugs.python.org/issue5621    created  jjposner                      

wrong error from curses.wrapper if curses initialization fails   03/31/09
       http://bugs.python.org/issue5622    created  nad                           

test_fdopen fails with vs2005, release build on Windows 2000     03/31/09
       http://bugs.python.org/issue5623    created  amaury.forgeotdarc            
       patch                                                                   

Py3K branch import _winreg instead of winreg                     03/31/09
CLOSED http://bugs.python.org/issue5624    created  Kozyarchuk                    

test_urllib2 fails - urlopen error file not on local host        03/31/09
       http://bugs.python.org/issue5625    created  nad                           

misleading comment in socket.gethostname() documentation         03/31/09
       http://bugs.python.org/issue5626    created  nad                           

PyDict_SetItemString() fails when the second argument is null    03/31/09
CLOSED http://bugs.python.org/issue5627    created  eulerto                       

TextIOWrapper fails with SystemError when reading HTTPResponse   04/01/09
       http://bugs.python.org/issue5628    reopened orsenthil                     

PEP 0 date and revision not being set                            03/31/09
       http://bugs.python.org/issue5629    created  brett.cannon                  

Update CObject API so it is safe and regular                     03/31/09
       http://bugs.python.org/issue5630    created  lhastings                     
       patch                                                                   

Distutils "upload" command does not show up in--help-commands ou 03/31/09
CLOSED http://bugs.python.org/issue5631    created  blais                         

Bug - threading.currentThread().ident returns None in main threa 03/31/09
CLOSED http://bugs.python.org/issue5632    created  skip.montanaro                
       patch                                                                   

fix for timeit  when the statment is a string and the setup is n 03/31/09
       http://bugs.python.org/issue5633    created  tdriscol                      

cPickle error in case of recursion limit                         03/31/09
       http://bugs.python.org/issue5634    created  bad                           

test_sys reference counting fails while tracing                  03/31/09
CLOSED http://bugs.python.org/issue5635    created  dugan                         
       patch                                                                   

csv.reader next() method missing                                 04/01/09
CLOSED http://bugs.python.org/issue5636    created  tonyjoblin                    

2to3 does not convert urllib.urlopen to urllib.request.urlopen   04/01/09
CLOSED http://bugs.python.org/issue5637    created  orsenthil                     

test_httpservers fails CGI tests if --enable-shared              04/01/09
       http://bugs.python.org/issue5638    created  tony_nelson                   

Support TLS SNI extension in ssl module                          04/01/09
       http://bugs.python.org/issue5639    created  pdp                           
       patch                                                                   

Wrong print() result when unicode error handler is not 'strict'  04/01/09
       http://bugs.python.org/issue5640    created  ishimoto                      
       patch                                                                   

Local variables not freed when Exception raises in function call 04/02/09
CLOSED http://bugs.python.org/issue5641    reopened Glin                          

multiprocessing.Pool.map() docs slightly misleading              04/01/09
       http://bugs.python.org/issue5642    created  jmmcd                         

test__locale fails with RADIXCHAR on Windows                     04/01/09
       http://bugs.python.org/issue5643    created  krisvale                      

test___future__ fails for py3k on Windows                        04/01/09
CLOSED http://bugs.python.org/issue5644    created  krisvale                      

test_memoryio fails for py3k on windows                          04/01/09
       http://bugs.python.org/issue5645    created  krisvale                      

test_importlib fails for py3k on Windows                         04/01/09
       http://bugs.python.org/issue5646    created  krisvale                      

MutableSet.__iand__ implementation calls self.discard while iter 04/01/09
CLOSED http://bugs.python.org/issue5647    created  della                         

OS X Installer: do not install obsolete documentation within Pyt 04/01/09
       http://bugs.python.org/issue5648    created  nad                           

OS X Installer: only include PythonSystemFixes package if target 04/01/09
       http://bugs.python.org/issue5649    created  nad                           

Obsolete RFC's should be removed from doc of urllib.urlparse     04/01/09
       http://bugs.python.org/issue5650    created  MLModel                       

OS X Installer: add checks to ensure proper Tcl configuration du 04/01/09
       http://bugs.python.org/issue5651    created  nad                           

OS X Installer: remove references to Mac/Tools which no longer e 04/01/09
       http://bugs.python.org/issue5652    created  nad                           

OS X Installer: by default install versioned-only links in /usr/ 04/01/09
       http://bugs.python.org/issue5653    created  nad                           

Add C hook in PyDict_SetItem for debuggers                       04/01/09
       http://bugs.python.org/issue5654    created  jpe                           
       patch                                                                   

fix glob.iglob docstring                                         04/01/09
CLOSED http://bugs.python.org/issue5655    created  dsm001                        
       patch                                                                   

Coverage execution fails for files not encoded with utf-8        04/01/09
CLOSED http://bugs.python.org/issue5656    created  maru                          
       patch                                                                   

bad repr of itertools.count object with negative value on OS X 1 04/01/09
       http://bugs.python.org/issue5657    created  nad                           

make html in doc fails because Makefile assigns python to PYTHO  04/01/09
CLOSED http://bugs.python.org/issue5658    created  MLModel                       

logging.FileHandler encoding parameter does not work as expected 04/01/09
CLOSED http://bugs.python.org/issue5659    created  warp                          

Cannot deepcopy unittest.TestCase instances                      04/02/09
CLOSED http://bugs.python.org/issue5660    created  spiv                          

asyncore should catch EPIPE while sending() and receiving()      04/02/09
       http://bugs.python.org/issue5661    created  giampaolo.rodola              
       patch                                                                   

py3k interpreter leak                                            04/02/09
CLOSED http://bugs.python.org/issue5662    created  quiver                        

Better failure messages for unittest assertions                  04/02/09
CLOSED http://bugs.python.org/issue5663    created  michael.foord                 
       patch                                                                   

2to3 won't convert Cookie.Cookie properly                        04/02/09
       http://bugs.python.org/issue5664    created  orsenthil                     

Add more pickling tests                                          04/02/09
       http://bugs.python.org/issue5665    created  collinwinter                  
       patch, needs review                                                     

Py_BuildValue("c") should return bytes?                          04/02/09
       http://bugs.python.org/issue5666    created  ocean-city                    
       patch                                                                   

Interpreter fails to initialize on build dir when IO encoding is 04/02/09
       http://bugs.python.org/issue5667    created  hyeshik.chang                 

file "<stdin>" on disk creates garbage output in stack trace     04/02/09
       http://bugs.python.org/issue5668    created  zbysz                         

Extra heapq nlargest/nsmallest option for including ties         04/02/09
CLOSED http://bugs.python.org/issue5669    reopened gsakkis                       

Speed up pickling of dicts in cPickle                            04/02/09
       http://bugs.python.org/issue5670    created  collinwinter                  
       patch, needs review                                                     

Speed up pickling of lists in cPickle                            04/02/09
       http://bugs.python.org/issue5671    created  collinwinter                  
       patch, needs review                                                     

Implement a way to change the python process name                04/02/09
       http://bugs.python.org/issue5672    created  marcelo_fernandez             

Add timeout option to subprocess.Popen                           04/02/09
       http://bugs.python.org/issue5673    created  rnk                           
       patch                                                                   

distutils fails to find Linux libs (lib.....so.n)                04/02/09
       http://bugs.python.org/issue5674    created  jgarrison                     

string module requires bytes type for maketrans, but calling met 04/02/09
       http://bugs.python.org/issue5675    created  MechPaul                      

Fix "make clean" in py3k/trunk                                   04/03/09
       http://bugs.python.org/issue5676    created  lhastings                     
       patch                                                                   

Serious interpreter crash and/or arbitrary memory leak using .re 04/03/09
       http://bugs.python.org/issue5677    created  nneonneo                      

typo in future_builtins documentation                            04/03/09
       http://bugs.python.org/issue5678    created  fredreichbier                 

Repair or Change installation error                              03/30/09
       http://bugs.python.org/issue1565509 reopened ghazel                        

Speed up using + for string concatenation                        03/30/09
       http://bugs.python.org/issue1569040 reopened benjamin.peterson             
       patch                                                                   

Issues Now Closed (233)
_______________________

Get rid of more references to __cmp__                             166 days
       http://bugs.python.org/issue1717    georg.brandl                  
       patch                                                                   

email.MIMEText.MIMEText.as_string incorrectly folding long subje  426 days
       http://bugs.python.org/issue1974    cjw296                        

asynchat push always sends 512 bytes (ignoring ac_out_buffer_siz  414 days
       http://bugs.python.org/issue2073    josiahcarlson                 

time.strptime too strict?  should it assume current year?         394 days
       http://bugs.python.org/issue2227    brett.cannon                  
       patch, easy                                                             

Missing documentation about old/new-style classes                 386 days
       http://bugs.python.org/issue2266    georg.brandl                  

Using an iteration variable outside a list comprehension needs a  379 days
       http://bugs.python.org/issue2344    jhylton                       
       26backport                                                              

Backport memoryview object to Python 2.7                          380 days
       http://bugs.python.org/issue2396    pitrou                        
       patch                                                                   

regrtest should not just skip imports that fail                   378 days
       http://bugs.python.org/issue2409    r.david.murray                
       patch                                                                   

stdbool support                                                   366 days
       http://bugs.python.org/issue2497    r.david.murray                
       patch                                                                   

locale.format() problems with decimal separator                   365 days
       http://bugs.python.org/issue2522    r.david.murray                
       patch                                                                   

Seconds range in time unit                                        360 days
       http://bugs.python.org/issue2568    r.david.murray                
       patch, easy                                                             

cmd.py should track input file objects so macros with submacros   357 days
       http://bugs.python.org/issue2577    rickbking                     

ErrorHandler buffer overflow in ?unused? SGI extension module al  354 days
       http://bugs.python.org/issue2591    gvanrossum                    

alp_ReadFrames() integer overflow leads to buffer overflow        355 days
       http://bugs.python.org/issue2593    r.david.murray                

alp_readsamps() overflow leads to memory corruption in ?unused?   355 days
       http://bugs.python.org/issue2594    r.david.murray                

allow field_name in format strings to default to next positional  354 days
       http://bugs.python.org/issue2599    r.david.murray                

mailbox.MH.get_message() treats result of get_sequences() as lis  355 days
       http://bugs.python.org/issue2625    r.david.murray                
       patch, easy                                                             

Mac version of IDLE doesn't scroll as expected                    330 days
       http://bugs.python.org/issue2754    ronaldoussoren                

IDLE ignores module change before restart                         330 days
       http://bugs.python.org/issue2755    tjreedy                       

pydoc doesn't show 'from module import identifier' in the docs    308 days
       http://bugs.python.org/issue2966    georg.brandl                  

arguments and default path not set in site.py and sitecustomize.  308 days
       http://bugs.python.org/issue2972    brett.cannon                  

Clean up Demos and Tools                                          293 days
       http://bugs.python.org/issue3087    brett.cannon                  
       easy                                                                    

Multiprocessing package build problem on Solaris 10               291 days
       http://bugs.python.org/issue3110    jnoller                       
       patch                                                                   

Hang when calling get() on an empty queue in the queue module     281 days
       http://bugs.python.org/issue3138    tazle                         

test_multiprocessing: test_listener_client flakiness              272 days
       http://bugs.python.org/issue3270    jnoller                       
       patch                                                                   

Test failure in test_math::testSum                                252 days
       http://bugs.python.org/issue3421    marketdickinson               

urllib documentation: urlopen().info() return type                252 days
       http://bugs.python.org/issue3427    georg.brandl                  
       patch                                                                   

Multi-process 2to3                                                250 days
       http://bugs.python.org/issue3448    benjamin.peterson             
       patch                                                                   

os.path.normcase documentation/behaviour unclear on Mac OS X      241 days
       http://bugs.python.org/issue3485    ronaldoussoren                
       patch                                                                   

Missing IDLE Preferences on Mac                                   231 days
       http://bugs.python.org/issue3549    kbk                           

multiprocessing.Pipe terminates with ERROR_NO_SYSTEM_RESOURCES i  231 days
       http://bugs.python.org/issue3551    jnoller                       
       patch                                                                   

A more informative message for ImportError                        225 days
       http://bugs.python.org/issue3619    brett.cannon                  
       patch                                                                   

Cannot read saved csv file in a single run                        216 days
       http://bugs.python.org/issue3681    gpolo                         

Python 3.0 beta 2 : json and urllib not working together?         210 days
       http://bugs.python.org/issue3763    orsenthil                     

test_multiprocessing fails on systems with HAVE_SEM_OPEN=0        210 days
       http://bugs.python.org/issue3770    jnoller                       
       patch, needs review                                                     

Patch for adding "default" to itemgetter and attrgetter           169 days
       http://bugs.python.org/issue4124    rhettinger                    
       patch                                                                   

Add file comparisons to the unittest library                      156 days
       http://bugs.python.org/issue4217    georg.brandl                  

On some Python builds, exec in a function can't create shadows o  138 days
       http://bugs.python.org/issue4315    jhylton                       

__mro__ documentation                                             127 days
       http://bugs.python.org/issue4411    georg.brandl                  

Given a module hierarchy string 'a.b.c', add an easy way to impo  126 days
       http://bugs.python.org/issue4438    georg.brandl                  
       patch                                                                   

Build / Test Py3K failed on Ubuntu 8.10                           117 days
       http://bugs.python.org/issue4535    benjamin.peterson             

add SEEK_* values to io and/or io.IOBase                          116 days
       http://bugs.python.org/issue4572    georg.brandl                  

compile() doesn't ignore the source encoding when a string is pa  110 days
       http://bugs.python.org/issue4626    jmfauth                       
       patch, needs review                                                     

urllib's splitpasswd does not accept newline chars in passwords   104 days
       http://bugs.python.org/issue4675    orsenthil                     
       patch                                                                   

try to build a C module, but don't worry if it doesn't work       101 days
       http://bugs.python.org/issue4706    tarek                         

exec() behavior - revisited                                        86 days
       http://bugs.python.org/issue4831    jhylton                       

MacPython build script uses Carbon and MacOS modules slated for    84 days
       http://bugs.python.org/issue4848    ronaldoussoren                

js_output wrong for cookies with " characters                      85 days
       http://bugs.python.org/issue4860    orsenthil                     
       patch                                                                   

system wide site-packages dir not used on Mac OS X                 82 days
       http://bugs.python.org/issue4865    ronaldoussoren                
       patch                                                                   

Behavior of backreferences to named groups in regular expression   82 days
       http://bugs.python.org/issue4882    georg.brandl                  
       patch                                                                   

test/regrtest.py contains error on __import__                      82 days
       http://bugs.python.org/issue4886    r.david.murray                

email/header.py ecre regular expression issue                      71 days
       http://bugs.python.org/issue4958    amaury.forgeotdarc            

urlparse & nfs url (rfc 2224)                                      74 days
       http://bugs.python.org/issue4962    orsenthil                     
       patch                                                                   

multiprocessing/pipe_connection.c compiler warning (conn_poll)     71 days
       http://bugs.python.org/issue5002    jnoller                       
       patch                                                                   

Overly general claim about sequence unpacking in tutorial          70 days
       http://bugs.python.org/issue5018    georg.brandl                  

Adjust reference-counting note                                     67 days
       http://bugs.python.org/issue5039    georg.brandl                  
       patch                                                                   

Bug of CGIXMLRPCRequestHandler                                     67 days
       http://bugs.python.org/issue5040    orsenthil                     
       patch                                                                   

Printing Unicode chars from the interpreter in a non-UTF8 termin   61 days
       http://bugs.python.org/issue5110    ishimoto                      
       patch                                                                   

Add combinatoric counting functions to the math module.            58 days
       http://bugs.python.org/issue5139    rhettinger                    

multiprocessing: SocketListener should use SO_REUSEADDR            51 days
       http://bugs.python.org/issue5177    jnoller                       
       patch                                                                   

optparse does not export make_option                               50 days
       http://bugs.python.org/issue5190    georg.brandl                  

warns vars() assignment as well as locals()                        49 days
       http://bugs.python.org/issue5199    georg.brandl                  
       patch                                                                   

String Formatting with namedtuple                                  50 days
       http://bugs.python.org/issue5205    rhettinger                    

urllib2.build_opener(                                              49 days
       http://bugs.python.org/issue5208    georg.brandl                  

change value of local variable in debug                            50 days
       http://bugs.python.org/issue5215    georg.brandl                  
       patch                                                                   

Py_Main() does not return on sys.exit()                            47 days
       http://bugs.python.org/issue5227    georg.brandl                  

multiprocessing not compatible with functools.partial              47 days
       http://bugs.python.org/issue5228    jackdied                      

time.strptime should reject bytes arguments on Py3                 46 days
       http://bugs.python.org/issue5236    brett.cannon                  

Change time.strptime() to make it work with Unicode chars          46 days
       http://bugs.python.org/issue5239    brett.cannon                  
       patch                                                                   

Missing flags in the Regex howto                                   47 days
       http://bugs.python.org/issue5241    ezio.melotti                  

PyRun_SimpleStringFlags() documentation                            46 days
       http://bugs.python.org/issue5245    georg.brandl                  

with lock fails on multiprocessing                                 44 days
       http://bugs.python.org/issue5261    jnoller                       
       patch                                                                   

OS X installer: faulty Python.app bundle inside of framework       43 days
       http://bugs.python.org/issue5270    ronaldoussoren                

OS X installer: build can fail on import checks                    43 days
       http://bugs.python.org/issue5271    ronaldoussoren                

http client error                                                  37 days
       http://bugs.python.org/issue5314    jhylton                       

__subclasses__ undocumented                                        37 days
       http://bugs.python.org/issue5324    georg.brandl                  

json needs object_pairs_hook                                       31 days
       http://bugs.python.org/issue5381    bob.ippolito                  
       patch                                                                   

mmap.move crashes by integer overflow                               0 days
       http://bugs.python.org/issue5387    ocean-city                    
       patch                                                                   

patches for multiprocessing module on NetBSD                       30 days
       http://bugs.python.org/issue5400    jnoller                       
       patch                                                                   

Reference to missing(?) function in Extending & Embedding Docume   27 days
       http://bugs.python.org/issue5417    georg.brandl                  

Pyshell history management error                                   27 days
       http://bugs.python.org/issue5428    kbk                           

test_httpservers on Debian Testing                                 27 days
       http://bugs.python.org/issue5435    zamotcr                       

[3.1alpha1] test_importlib fails on Mac OSX 10.5.6                 25 days
       http://bugs.python.org/issue5442    brett.cannon                  

only accept byte for getarg('c') and unicode for getarg('C')       16 days
       http://bugs.python.org/issue5499    haypo                         
       patch                                                                   

Deletion of some statements in re documentation                    12 days
       http://bugs.python.org/issue5519    georg.brandl                  

HTTPRedirectHandler documentation is wrong                         12 days
       http://bugs.python.org/issue5522    georg.brandl                  

execfile() removed from Python3                                    12 days
       http://bugs.python.org/issue5524    jhylton                       
       patch                                                                   

Backport sys module docs involving import to 2.7                   11 days
       http://bugs.python.org/issue5529    georg.brandl                  

tearDown in unittest should be executed regardless of result in     7 days
       http://bugs.python.org/issue5538    yaneurabeya                   
       needs review                                                            

"file objects" in python 3 tutorial                                 9 days
       http://bugs.python.org/issue5540    georg.brandl                  

multiprocessing: switch to autoconf detection of platform values    9 days
       http://bugs.python.org/issue5545    jnoller                       
       patch                                                                   

In the tutorial, PyMODINIT_FUNC is shown as having a return type    8 days
       http://bugs.python.org/issue5548    georg.brandl                  

Python 3.0.1 Mac OS X install image ReadMe file is incorrect        6 days
       http://bugs.python.org/issue5558    ronaldoussoren                

Document bdist_msi                                                  6 days
       http://bugs.python.org/issue5563    georg.brandl                  
       patch                                                                   

os.symlink/os.link docs should say old/new, not src/dst             3 days
       http://bugs.python.org/issue5564    benjamin.peterson             

Minor error in document of PyLong_AsSsize_t                         5 days
       http://bugs.python.org/issue5566    georg.brandl                  

Operators in operator module don't work with keyword arguments      6 days
       http://bugs.python.org/issue5567    rhettinger                    

Issue in transparency in top level tk window(python) on MAC         0 days
       http://bugs.python.org/issue5569    amaury.forgeotdarc            

Bus error when calling .poll() on a closed Connection from multi    4 days
       http://bugs.python.org/issue5570    jnoller                       

new "TestCase.skip" method causes all tests to skip under trial     1 days
       http://bugs.python.org/issue5571    glyph                         

multiprocessing queues.py doesn't include JoinableQueue in its _    4 days
       http://bugs.python.org/issue5574    jnoller                       

yield in iterators                                                  0 days
       http://bugs.python.org/issue5577    gvanrossum                    

abc.abstractproperty() docs list fget as required; fget is not r    4 days
       http://bugs.python.org/issue5581    georg.brandl                  

Optional extensions in setup.py                                     4 days
       http://bugs.python.org/issue5583    tarek                         
       patch                                                                   

implement initializer for multiprocessing.BaseManager.start()       5 days
       http://bugs.python.org/issue5585    lekma                         
       patch, needs review                                                     

Add --randseed to regrtest                                          1 days
       http://bugs.python.org/issue5588    marketdickinson               
       patch, needs review                                                     

Wrong dump of floats                                                0 days
       http://bugs.python.org/issue5589    benjamin.peterson             

global symbols in shared libpython not prefixed with Py or _Py      1 days
       http://bugs.python.org/issue5591    pitrou                        

Modules/_textio.c defines global symbol encodefuncs                 0 days
       http://bugs.python.org/issue5592    pitrou                        

os.path.ismount (ntpath) gives UnboundLocalError for any input      0 days
       http://bugs.python.org/issue5595    benjamin.peterson             

inspect.formatargspec crashes on missing kwonlydefaults             0 days
       http://bugs.python.org/issue5597    benjamin.peterson             

"paths" argument missing in DocFileSuite documentation              2 days
       http://bugs.python.org/issue5598    georg.brandl                  

test_email_codecs is skipped because it fails to import TestSkip    0 days
       http://bugs.python.org/issue5599    benjamin.peterson             
       easy                                                                    

Slight inaccuracy in webbrowser documentation                       0 days
       http://bugs.python.org/issue5600    benjamin.peterson             

Slight punctuation problem in documentation of urllib.request.ur    2 days
       http://bugs.python.org/issue5602    georg.brandl                  

Garbled sentence in documentation of urllib.request.urlopen         2 days
       http://bugs.python.org/issue5603    georg.brandl                  

imp.find_module() mixes UTF8 and MBCS                               3 days
       http://bugs.python.org/issue5604    asvetlov                      

Don't assume that repr of literal dicts are sorted like pprint s    0 days
       http://bugs.python.org/issue5605    benjamin.peterson             

Add python.exe to the path in windows?                              0 days
       http://bugs.python.org/issue5608    loewis                        

test_posix.py and test_wait4.py having missing import on win32      2 days
       http://bugs.python.org/issue5613    r.david.murray                

Unicode printing in gdb post-mortem sessions                        1 days
       http://bugs.python.org/issue5617    georg.brandl                  

PyMemberDef type T_UBYTE incorrectly documented                     1 days
       http://bugs.python.org/issue5618    georg.brandl                  
       patch                                                                   

The attribute's action of an object is not correct.                 1 days
       http://bugs.python.org/issue5620    Yong yang                     

Py3K branch import _winreg instead of winreg                        1 days
       http://bugs.python.org/issue5624    georg.brandl                  

PyDict_SetItemString() fails when the second argument is null       1 days
       http://bugs.python.org/issue5627    georg.brandl                  

Distutils "upload" command does not show up in--help-commands ou    1 days
       http://bugs.python.org/issue5631    georg.brandl                  

Bug - threading.currentThread().ident returns None in main threa    0 days
       http://bugs.python.org/issue5632    benjamin.peterson             
       patch                                                                   

test_sys reference counting fails while tracing                     0 days
       http://bugs.python.org/issue5635    georg.brandl                  
       patch                                                                   

csv.reader next() method missing                                    1 days
       http://bugs.python.org/issue5636    georg.brandl                  

2to3 does not convert urllib.urlopen to urllib.request.urlopen      1 days
       http://bugs.python.org/issue5637    benjamin.peterson             

Local variables not freed when Exception raises in function call    0 days
       http://bugs.python.org/issue5641    georg.brandl                  

test___future__ fails for py3k on Windows                           0 days
       http://bugs.python.org/issue5644    benjamin.peterson             

MutableSet.__iand__ implementation calls self.discard while iter    0 days
       http://bugs.python.org/issue5647    rhettinger                    

fix glob.iglob docstring                                            0 days
       http://bugs.python.org/issue5655    georg.brandl                  
       patch                                                                   

Coverage execution fails for files not encoded with utf-8           0 days
       http://bugs.python.org/issue5656    georg.brandl                  
       patch                                                                   

make html in doc fails because Makefile assigns python to PYTHO     1 days
       http://bugs.python.org/issue5658    MLModel                       

logging.FileHandler encoding parameter does not work as expected    1 days
       http://bugs.python.org/issue5659    warp                          

Cannot deepcopy unittest.TestCase instances                         0 days
       http://bugs.python.org/issue5660    michael.foord                 

py3k interpreter leak                                               0 days
       http://bugs.python.org/issue5662    benjamin.peterson             

Better failure messages for unittest assertions                     0 days
       http://bugs.python.org/issue5663    michael.foord                 
       patch                                                                   

Extra heapq nlargest/nsmallest option for including ties            0 days
       http://bugs.python.org/issue5669    rhettinger                    

Confusions in formatfloat                                        2566 days
       http://bugs.python.org/issue532631  marketdickinson               

Bgen should learn about booleans                                 2404 days
       http://bugs.python.org/issue602291  ronaldoussoren                

More documentation for the imp module                            2376 days
       http://bugs.python.org/issue616247  brett.cannon                  

Imports can deadlock                                             2233 days
       http://bugs.python.org/issue689895  brett.cannon                  

Reloading pseudo modules                                         2213 days
       http://bugs.python.org/issue701743  brett.cannon                  
       patch                                                                   

bgen requires Universal Headers, not OS X dev headers            2072 days
       http://bugs.python.org/issue779153  ronaldoussoren                

BasicModuleLoader behaviour in Python 2.3c2                      2074 days
       http://bugs.python.org/issue779191  brett.cannon                  

zipimport on meta_path fails with mutual importers               2060 days
       http://bugs.python.org/issue787113  brett.cannon                  

imp.find_module doesn't work in /tmp                             2022 days
       http://bugs.python.org/issue809254  brett.cannon                  

cryptic os.spawnvpe() return code                                1972 days
       http://bugs.python.org/issue837577  georg.brandl                  

reload() fails with modules from zips                            1942 days
       http://bugs.python.org/issue856103  brett.cannon                  

Some Carbon modules missing from documentation                   1873 days
       http://bugs.python.org/issue896199  ronaldoussoren                
       easy                                                                    

bundlebuilder: some way to add non-py files in packages          1866 days
       http://bugs.python.org/issue900502  ronaldoussoren                

asyncore fixes and improvements                                  1854 days
       http://bugs.python.org/issue909005  giampaolo.rodola              
       patch                                                                   

nametowidget throws TypeError for Tcl_Objs                       1811 days
       http://bugs.python.org/issue934418  gpolo                         

List with Canvas.create_line Option arrow=LAST Broke             1799 days
       http://bugs.python.org/issue941262  gpolo                         

importing dynamic modules via embedded python                    1764 days
       http://bugs.python.org/issue965206  brett.cannon                  

import x.y inside of module x.y                                  1763 days
       http://bugs.python.org/issue966431  brett.cannon                  

PyObject_GenericGetAttr is undocumented                          1755 days
       http://bugs.python.org/issue970783  georg.brandl                  
       easy                                                                    

Starting a script in OSX within a specific folder                1748 days
       http://bugs.python.org/issue974159  ronaldoussoren                

An inconsistency with nested scopes                              1721 days
       http://bugs.python.org/issue991196  jhylton                       

exec statement balks at CR/LF                                    1719 days
       http://bugs.python.org/issue992207  georg.brandl                  
       easy                                                                    

test__locale fails on MacOS X                                    1699 days
       http://bugs.python.org/issue1005113 brett.cannon                  

Can't raise "C API version mismatch" warning                     1634 days
       http://bugs.python.org/issue1044382 brett.cannon                  

current directory in sys.path handles symlinks badly             1585 days
       http://bugs.python.org/issue1074015 brett.cannon                  

Carbon.Res misses GetIndString                                   1560 days
       http://bugs.python.org/issue1089399 ronaldoussoren                

Carbon.File.FSCatalogInfo.createDate implementation              1560 days
       http://bugs.python.org/issue1089624 ronaldoussoren                

_AEModule.c patch                                                1557 days
       http://bugs.python.org/issue1090958 ronaldoussoren                
       patch                                                                   

sys.__stdout__ doco isn't discouraging enough                    1546 days
       http://bugs.python.org/issue1096310 georg.brandl                  

OSATerminology still semi-broken                                 1519 days
       http://bugs.python.org/issue1113328 ronaldoussoren                

patches to compile for AIX 4.1.x                                 1509 days
       http://bugs.python.org/issue1119626 ajaksu2                       
       patch                                                                   

eval does not bind variables in lambda bodies correctly          1492 days
       http://bugs.python.org/issue1153622 jhylton                       

Neverending warnings from asyncore                               1483 days
       http://bugs.python.org/issue1161031 brett.cannon                  
       patch                                                                   

threading.Condition.wait() return value indicates timeout        1457 days
       http://bugs.python.org/issue1175933 gvanrossum                    
       patch, easy                                                             

allow running multiple instances of IDLE                         1417 days
       http://bugs.python.org/issue1201569 kbk                           
       patch                                                                   

'insufficient disk space' message wrong (msi on win xp pro)      1361 days
       http://bugs.python.org/issue1234328 ajaksu2                       

httplib gzip support                                             1347 days
       http://bugs.python.org/issue1243678 georg.brandl                  
       patch                                                                   

expat binding for XML_ParserReset (Bug #1208730)                 1345 days
       http://bugs.python.org/issue1244208 ajaksu2                       
       patch                                                                   

QuickTime API needs corrected object types                       1329 days
       http://bugs.python.org/issue1254695 ronaldoussoren                
       patch                                                                   

2.4.1 make fails on Solaris 10 (complexobject.c/HUGE_VAL)        1308 days
       http://bugs.python.org/issue1276509 ajaksu2                       

Incorrect use of -L/usr/lib/termcap                              1257 days
       http://bugs.python.org/issue1332732 ajaksu2                       

async_chat.push() can trigger handle_error(). undocumented.      1217 days
       http://bugs.python.org/issue1370380 josiahcarlson                 

minidom namespace problems                                       1214 days
       http://bugs.python.org/issue1371937 ajaksu2                       

_winreg specifies EnvironmentError instead of WindowsError       1197 days
       http://bugs.python.org/issue1386675 georg.brandl                  

Compile under mingw properly                                     1162 days
       http://bugs.python.org/issue1412448 ajaksu2                       
       patch                                                                   

PyImport_AppendInittab stores pointer to parameter               1157 days
       http://bugs.python.org/issue1419652 brett.cannon                  

Unable to stringify datetime with tzinfo                         1114 days
       http://bugs.python.org/issue1447945 ajaksu2                       

Hitting CTRL-C while in a loop closes IDLE on cygwin             1083 days
       http://bugs.python.org/issue1468223 tebeka                        

endless loop in PyCFunction_Fini()                               1049 days
       http://bugs.python.org/issue1488906 ajaksu2                       

sys.path issue if sys.prefix contains a colon                    1021 days
       http://bugs.python.org/issue1507224 brett.cannon                  

__del__: Type is cleared before instances                        1006 days
       http://bugs.python.org/issue1513802 benjamin.peterson             

site.py can break the location of the python library             1007 days
       http://bugs.python.org/issue1514734 brett.cannon                  

fcntl.ioctl fails to copy back exactly-1024 buffer                992 days
       http://bugs.python.org/issue1520818 ajaksu2                       

Document additions from PEP 302                                   987 days
       http://bugs.python.org/issue1525549 brett.cannon                  

Literal strings use BS as octal escape character                  978 days
       http://bugs.python.org/issue1530012 georg.brandl                  

distutils 'register' command and windows home directories         973 days
       http://bugs.python.org/issue1531505 ajaksu2                       

Win32 debug version of _msi creates _msi.pyd, not _msi_d.pyd      968 days
       http://bugs.python.org/issue1534738 ajaksu2                       

sys.path gets munged with certain directory structures            969 days
       http://bugs.python.org/issue1534764 brett.cannon                  

"make install" doesn't install to /usr/lib64 on x86_64 boxes      965 days
       http://bugs.python.org/issue1536339 ajaksu2                       

python-2.5c1.msi contains ICE validation errors and warnings      955 days
       http://bugs.python.org/issue1542432 ajaksu2                       

test_tempfile fails on cygwin                                     953 days
       http://bugs.python.org/issue1543467 ajaksu2                       

Wireless on Python                                                946 days
       http://bugs.python.org/issue1547300 ajaksu2                       

C modules reloaded on certain failed imports                      946 days
       http://bugs.python.org/issue1548687 brett.cannon                  

python 2.5 install can't find tcl/tk in /usr/lib64                936 days
       http://bugs.python.org/issue1553166 ajaksu2                       

Class instance apparently not destructed when expected            935 days
       http://bugs.python.org/issue1553819 ajaksu2                       

2.5c1 Core dump during 64-bit make on Solaris 9 Sparc             929 days
       http://bugs.python.org/issue1557490 ajaksu2                       

strftime('%z') behaving differently with/without time arg.        924 days
       http://bugs.python.org/issue1560794 ajaksu2                       

IDLE: Dedent with Italian keyboard                                921 days
       http://bugs.python.org/issue1562092 gpolo                         

importing threading in a thread does not work                     921 days
       http://bugs.python.org/issue1562822 brett.cannon                  

--disable-sunaudiodev --disable-tk does not work                  895 days
       http://bugs.python.org/issue1579029 ajaksu2                       

Error piping output between scripts on Windows                    879 days
       http://bugs.python.org/issue1590068 georg.brandl                  

import deadlocks when using PyObjC threads                        879 days
       http://bugs.python.org/issue1590864 brett.cannon                  

PyThread_release_lock with pthreads munges errno                  849 days
       http://bugs.python.org/issue1608921 benjamin.peterson             
       patch                                                                   

IDLE crashes on OS X 10.4 when "Preferences" selected             831 days
       http://bugs.python.org/issue1621111 kbk                           

Please provide rsync-method in the urllib[2] module               810 days
       http://bugs.python.org/issue1634770 orsenthil                     

MIME renderer: wrong header line break with long subject?         795 days
       http://bugs.python.org/issue1645148 barry                         

sgmllib _convert_ref UnicodeDecodeError exception, new in 2.5     786 days
       http://bugs.python.org/issue1651995 georg.brandl                  
       patch                                                                   

thread join() with timeout hangs on Windows 2003 x64              786 days
       http://bugs.python.org/issue1654429 amaury.forgeotdarc            

Handle requests to intern string subtype instances                776 days
       http://bugs.python.org/issue1658799 ajaksu2                       
       patch                                                                   

Calling tparm from extension lib fails in Python 2.5              777 days
       http://bugs.python.org/issue1659171 ajaksu2                       

Hangup when using cgitb in a thread while still in import         770 days
       http://bugs.python.org/issue1665206 brett.cannon                  

add identity function                                             760 days
       http://bugs.python.org/issue1673203 rhettinger                    

Make threading.Event().wait(timeout=3) return isSet               757 days
       http://bugs.python.org/issue1674032 georg.brandl                  
       patch                                                                   

Redirect cause invalid descriptor error                           756 days
       http://bugs.python.org/issue1675026 georg.brandl                  

Remove trailing slash from --prefix                               755 days
       http://bugs.python.org/issue1676135 georg.brandl                  
       patch                                                                   

PEP 361 Warnings                                                  742 days
       http://bugs.python.org/issue1683908 ajaksu2                       
       patch                                                                   

python throws an error when unpacking bz2 file                    694 days
       http://bugs.python.org/issue1714773 georg.brandl                  

Destructor behavior faulty                                        689 days
       http://bugs.python.org/issue1717900 pitrou                        

Line ending bug SimpleXMLRPCServer                                677 days
       http://bugs.python.org/issue1725295 georg.brandl                  
       patch                                                                   

Windows Build Warnings                                            677 days
       http://bugs.python.org/issue1726196 amaury.forgeotdarc            
       patch                                                                   

telnetlib: A callback for monitoring the telnet session           666 days
       http://bugs.python.org/issue1730959 jackdied                      
       patch                                                                   

asyncore/asynchat patches                                         654 days
       http://bugs.python.org/issue1736190 intgr                         
       patch                                                                   

Top Issues Most Discussed (10)
______________________________

 43 additional unittest type equality methods                        361 days
open    http://bugs.python.org/issue2578   

 21 asyncore delayed calls feature                                   473 days
open    http://bugs.python.org/issue1641   

 15 Pass MS CRT debug flags into subprocesses                          2 days
open    http://bugs.python.org/issue5619   

 14 multiprocessing.Pipe terminates with ERROR_NO_SYSTEM_RESOURCES   231 days
closed  http://bugs.python.org/issue3551   

 13 implement initializer for multiprocessing.BaseManager.start()      5 days
closed  http://bugs.python.org/issue5585   

 13 Neverending warnings from asyncore                              1483 days
closed  http://bugs.python.org/issue1161031

 12 Speed up pickling of dicts in cPickle                              1 days
open    http://bugs.python.org/issue5670   

 12 test_math.testFsum failure on release30-maint                      5 days
open    http://bugs.python.org/issue5593   

 11 Extra heapq nlargest/nsmallest option for including ties           0 days
closed  http://bugs.python.org/issue5669   

 10 PyDict_SetItemString() fails when the second argument is null      1 days
closed  http://bugs.python.org/issue5627   

From martin at v.loewis.de  Fri Apr  3 18:35:09 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 03 Apr 2009 18:35:09 +0200
Subject: [Python-Dev] Should the io-c modules be put in their own directory?
In-Reply-To: <loom.20090403T085522-602@post.gmane.org>
References: <acd65fa20904022110p2cf46105n4061286bc249d55e@mail.gmail.com>
	<loom.20090403T085522-602@post.gmane.org>
Message-ID: <49D63ABD.30000@v.loewis.de>

>> I just noticed that the new io-c modules were merged in the py3k
>> branch (I know, I am kind of late on the news; blame school work). Anyway,
>> I am just wondering if it would be a good idea to put the io-c modules
>> in a sub-directory (like sqlite), instead of scattering them around in
>> the Modules/ directory.
> 
> Welcome back!
> 
> I have no particular opinion on this. I suggest waiting for Benjamin's advice
> and following it :-)

I would suggest leaving it as is:
a) never change a running system
b) flat is better than nested

Martin

From olemis at gmail.com  Fri Apr  3 18:38:12 2009
From: olemis at gmail.com (Olemis Lang)
Date: Fri, 3 Apr 2009 11:38:12 -0500
Subject: [Python-Dev] Package Management - thoughts from the peanut gallery
In-Reply-To: <24ea26600904030937y72e36dcdydeab19607302f23d@mail.gmail.com>
References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org>
	<94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>
	<49D63754.6030601@simplistix.co.uk>
	<24ea26600904030937y72e36dcdydeab19607302f23d@mail.gmail.com>
Message-ID: <24ea26600904030938i45f191a7o6be70ac0f1a95761@mail.gmail.com>

On Fri, Apr 3, 2009 at 11:20 AM, Chris Withers <chris at simplistix.co.uk> wrote:
> Tarek Ziadé wrote:
>
>> - PyPI mirroring (PEP 381)
>
> I don't see why PyPI isn't just ported to GAE with an S3 data storage bit
> and be done with it... Offline mirrors for people behind firewalls already
> have solutions out there...
>

-1 ... IMHO ...

--
Regards,

Olemis.

Blog ES: http://simelo-es.blogspot.com/
Blog EN: http://simelo-en.blogspot.com/

Featured article:
No me gustan los templates de Django ...

From martin at v.loewis.de  Fri Apr  3 18:43:05 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 03 Apr 2009 18:43:05 +0200
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <49D5FBE6.6090807@avl.com>
References: <49D5FBE6.6090807@avl.com>
Message-ID: <49D63C99.6000302@v.loewis.de>

> I've stumbled upon an oddity using sets.  It's trivial to test if a
> value is in the set, but it appears to be impossible to retrieve a
> stored value, other than by iterating over the whole set. 

Of course it is. That's why it is called a set: it's an unordered
collection of objects, keyed by nothing.

If you have a set of elements, and you check "'foo' in s", then
you should be able just to use the string 'foo' itself for whatever
you want to do with it - you have essentially created a set of
strings. If you think that 'foo' and Element('foo') are different
things, you should not implement __eq__ in a way that they are
considered equal.

Regards,
Martin

From solipsis at pitrou.net  Fri Apr  3 18:43:58 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 3 Apr 2009 16:43:58 +0000 (UTC)
Subject: [Python-Dev] PyDict_SetItem hook
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<loom.20090403T092111-554@post.gmane.org>
	<9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com>
Message-ID: <loom.20090403T161126-31@post.gmane.org>

Thomas Wouters <thomas <at> python.org> writes:
> 
> Really? Have you tried it? I get at least 5% noise between runs without any
> changes. I have gotten results that include *negative* run times.

That's an implementation problem, not an issue with the tests themselves.
Perhaps a better timing mechanism could be inspired by the timeit module.
Perhaps the default number of iterations should be higher (many subtests run
in less than 100ms on a modern CPU, which might be too low for accurate
measurement). Perhaps the so-called "calibration" should just be disabled.
And so on.
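
As a rough illustration of the timeit-style approach (the statement being
timed and the repeat counts are arbitrary choices for this sketch, not
anything taken from pybench):

```python
import timeit

# Hypothetical micro-benchmark: time a small list comprehension.
# repeat() runs the statement `number` times per trial and returns one
# total per trial; the minimum is usually the least noisy figure.
NUMBER = 10000
trials = timeit.repeat(
    stmt="[x * 2 for x in data]",
    setup="data = list(range(100))",
    repeat=5,
    number=NUMBER,
)
best_per_call = min(trials) / NUMBER
print("best: %.3f usec per loop" % (best_per_call * 1e6))
```

Raising `number` lengthens each trial, which is one way to get well above
the timer resolution on a fast CPU.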

> The tests in PyBench are not micro-benchmarks (they do way too much for
> that),

Then I wonder what you call a micro-benchmark. Should it involve direct calls
to low-level C API functions?

> but they are also not representative of real-world code.

Representativeness is not black or white. Is measuring Spitfire performance
representative of the Genshi templating engine, or str.format-based templating?
Regardless of the answer, it is still an interesting measurement.

> That doesn't just mean "you can't infer the affected operation from the test
> name"

I'm not sure what you mean by that. If you introduce an optimization to make
list comprehensions faster, it will certainly show up in the list
comprehensions subtest, and probably in none of the other tests. Isn't it enough
in terms of specificity?

Of course, some optimizations are interpreter-wide, and then the breakdown into
individual subtests is less relevant.

> I have in the past written patches to Python that improved *every*
> micro-benchmark and *every* real-world measurement I made, except PyBench.

Well, I didn't claim that pybench measures /everything/. That's why we have
other benchmarks as well (stringbench, iobench, whatever).
It does test a bunch of very common operations which are important in daily use
of Python. If some important operation is missing, it's possible to add a new
test.

Conversely, someone optimizing e.g. list comprehensions and trying to measure
the impact using a set of so-called "real-world benchmarks" which don't involve
any list comprehension in their critical path will not see any improvement in
those "real-world benchmarks". Does it mean that the optimization is useless?
No, certainly not. The world is not black and white.

> That's exactly what Collin proposed at the summits last week. Have you seen
> http://code.google.com/p/unladen-swallow/wiki/Benchmarks

Yes, I've seen it. I haven't tried it; I hope it can be run without installing
the whole unladen-swallow suite?

These are the benchmarks I've had a tendency to use depending on the issue at
hand: pybench, richards, stringbench, iobench, binary-trees (from the Computer
Language Shootout). And various custom timeit runs :-)

Cheers

Antoine.

From jpe at wingware.com  Fri Apr  3 18:44:19 2009
From: jpe at wingware.com (John Ehresman)
Date: Fri, 03 Apr 2009 11:44:19 -0500
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <49D63787.3080304@v.loewis.de>
References: <49D3F8D0.8070805@wingware.com>	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>	<49D42013.3010600@wingware.com>	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>	<78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1>	<ca471dc20904021419y33932768j8bc7e97e46361d6a@mail.gmail.com>	<C16A2E80A5854CBEB189C765E6DB7561@RaymondLaptop1>
	<49D63787.3080304@v.loewis.de>
Message-ID: <49D63CE3.40203@wingware.com>

Just want to reply quickly because I'm traveling -- I appreciate the 
feedback from Raymond and others.  Part of the reason I created an issue 
with a proof of concept patch is to get this kind of feedback.  I also 
agree that this shouldn't go in if it slows things down noticeably.

I will do some benchmarking and look at the dtrace patches next week to 
see if there is some sort of more systematic way of adding these types 
of hooks.

John

From chris at simplistix.co.uk  Fri Apr  3 18:55:04 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Fri, 03 Apr 2009 17:55:04 +0100
Subject: [Python-Dev] Package Management - thoughts from the peanut
	gallery
In-Reply-To: <24ea26600904030937y72e36dcdydeab19607302f23d@mail.gmail.com>
References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org>	
	<94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>	
	<49D63754.6030601@simplistix.co.uk>
	<24ea26600904030937y72e36dcdydeab19607302f23d@mail.gmail.com>
Message-ID: <49D63F68.2050409@simplistix.co.uk>

Olemis Lang wrote:
> On Fri, Apr 3, 2009 at 11:20 AM, Chris Withers <chris at simplistix.co.uk> wrote:
>> Tarek Ziadé wrote:
>>
>>> - PyPI mirroring (PEP 381)
>> I don't see why PyPI isn't just ported to GAE with an S3 data storage bit
>> and be done with it... Offline mirrors for people behind firewalls already
>> have solutions out there...
> 
> -1 ... IMHO ...

For what reason?

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From p.f.moore at gmail.com  Fri Apr  3 18:57:29 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 3 Apr 2009 17:57:29 +0100
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <49D63C99.6000302@v.loewis.de>
References: <49D5FBE6.6090807@avl.com> <49D63C99.6000302@v.loewis.de>
Message-ID: <79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com>

2009/4/3 Steven D'Aprano <steve at pearwood.info>:
> Python does not promise that if x == y, you can use y anywhere you can
> use x. Nor should it. Paul's declaration of abuse of __eq__ is
> unfounded.

Sorry, I was trying to simplify what I was saying, and simplified it
to the point where it didn't make sense :-) Martin (quoted below)
explained what I was trying to say far more clearly.

2009/4/3 "Martin v. Löwis" <martin at v.loewis.de>:
> If you have a set of elements, and you check "'foo' in s", then
> you should be able just to use the string 'foo' itself for whatever
> you want to do with it - you have essentially created a set of
> strings. If you think that 'foo' and Element('foo') are different
> things, you should not implement __eq__ in a way that they are
> considered equal.

-- in particular, if you're using things in sets (which are *all
about* equality, insofar as that's how "duplicates" are defined) you
should ensure that your definition of __eq__ respects the idea that
equal objects are duplicates (ie, interchangeable). Otherwise, a dict
is the appropriate data structure.
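
As a rough sketch of the dict alternative (the Element class here is a
stripped-down stand-in with no __eq__ or __hash__, not the original poster's
code):

```python
class Element(object):
    def __init__(self, key):
        self.key = key

# Map each key to the object stored under it.
elements = {}
for e in (Element('foo'), Element('bar')):
    elements[e.key] = e

stored = elements['foo']        # direct lookup, no iteration needed
missing = elements.get('baz')   # None when nothing is stored under the key
```

The key and the stored object stay distinct things, so nothing has to pretend
that a string and an Element are equal.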

Actually, given the definition in the original post,

class Element(object):
   def __init__(self, key):
       self.key = key
   def __eq__(self, other):
       return self.key == other
   def __hash__(self):
       return hash(self.key)

as far as I can tell, equality is *only* defined between Elements and
keys - not even between 2 elements! So with that definition, there
could be many Elements in a set, all equal to the same key. Which is
completely insane.

In fact, Python seems to be doing something I don't understand:

>>> class Element(object):
...    def __init__(self, key, id):
...        self.key = key
...        self.id = id
...    def __eq__(self, other):
...        print "Calling __eq__ for %s" % self.id
...        return self.key == other
...    def __hash__(self):
...        return hash(self.key)
...
>>> a = Element('k', 'a')
>>> b = Element('k', 'b')
>>> a == b
Calling __eq__ for a
Calling __eq__ for b
True
>>> a == a
Calling __eq__ for a
Calling __eq__ for a
True
>>>

Why does __eq__ get called twice in these cases? And why does a == b hold,
given that it means a.key == b, and clearly a.key ('k') does *not* equal b? Or
are there some further options being tried, in str.__eq__ or
object.__eq__? The documentation doesn't say so... Specifically,
there's nothing saying that a "reversed" version is tried.
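
A likely explanation, sketched here under Python 3 semantics and consistent
with the output above: when str's comparison does not recognise the other
operand it returns NotImplemented, and the interpreter then tries the other
operand's __eq__ with the arguments swapped. The `calls` list is only there
to record the order of calls for this sketch.

```python
calls = []  # records which instance's __eq__ ran, in order

class Element(object):
    def __init__(self, key, id):
        self.key = key
        self.id = id
    def __eq__(self, other):
        calls.append(self.id)
        return self.key == other
    def __hash__(self):
        return hash(self.key)

a = Element('k', 'a')
b = Element('k', 'b')

# str does not know how to compare itself with an Element ...
assert 'k'.__eq__(b) is NotImplemented

# ... so 'k' == b falls back to the swapped call b.__eq__('k').
result = (a == b)
print(result, calls)   # True ['a', 'b']
```

So a == b first runs a.__eq__(b), which evaluates 'k' == b, which in turn
ends up in b.__eq__('k'); that accounts for the two messages.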

Paul.

From mal at egenix.com  Fri Apr  3 19:04:36 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 03 Apr 2009 19:04:36 +0200
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com>
References: <49D3F8D0.8070805@wingware.com>	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>	<49D42013.3010600@wingware.com>	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>	<loom.20090403T092111-554@post.gmane.org>
	<9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com>
Message-ID: <49D641A4.10904@egenix.com>

On 2009-04-03 18:06, Thomas Wouters wrote:
> On Fri, Apr 3, 2009 at 11:27, Antoine Pitrou <solipsis at pitrou.net> wrote:
> 
>> Thomas Wouters <thomas <at> python.org> writes:
>>>
>>> Pystone is pretty much a useless benchmark. If it measures anything, it's
>>> the speed of the bytecode dispatcher (and it doesn't measure it
>>> particularly well.) PyBench isn't any better, in my experience.
>>
>> I don't think pybench is useless. It gives a lot of performance data about
>> crucial internal operations of the interpreter. It is of course very little
>> real-world, but conversely makes you know immediately where a performance
>> regression has happened. (by contrast, if you witness a regression in a
>> high-level benchmark, you still have a lot of investigation to do to find
>> out where exactly something bad happened)
> 
> 
> Really? Have you tried it? I get at least 5% noise between runs without any
> changes. I have gotten results that include *negative* run times. 

On which platform? pybench 2.0 works reasonably well on Linux and
Windows, but of course can't do better than the timers available for
those platforms. If you have e.g. NTP running and it uses wall clock
timers, it is possible that you get negative round times. If you don't
and still get negative round times, you have to change the test
parameters (see below).

> And yes, I
> tried all the different settings for calibration runs and timing mechanisms.
> The tests in PyBench are not micro-benchmarks (they do way too much for
> that), they don't try to minimize overhead or noise,

That is not true. They were written as micro-benchmarks and adjusted
to have a high signal-to-noise ratio. For some operations this isn't easy
to do, but I certainly tried hard to get the overhead low (note that the
overhead is listed in the output).

That said, please keep in mind that the settings in pybench were last
adjusted some years ago to have the tests all run in more or less the
same wall clock time. CPUs have evolved a lot since then and this shows.

> but they are also not
> representative of real-world code.

True and they never were meant for that, since I was frustrated by
other benchmarks at the time and the whole approach in general.

Each of the tests checks one specific aspect of Python. If your
application happens to use a lot of dictionary operations, you'll
be mostly interested in those. If you do a lot of simple arithmetic,
there's another test for that.

On top of that the application is written to be easily extensible,
so it's easy to add new tests specific to whatever application space
you're after.

> That doesn't just mean "you can't infer
> the affected operation from the test name", but "you can't infer anything."
> You can just be looking at differently borrowed runtime. I have in the past
> written patches to Python that improved *every* micro-benchmark and *every*
> real-world measurement I made, except PyBench. Trying to pinpoint the
> slowdown invariably led to tests that did too much in the measurement loop,
> introduced too much noise in the "calibration" run or just spent their time
> *in the measurement loop* on doing setup and teardown of the test. 

pybench calibrates itself to remove that kind of noise from the output.
Each test has a .calibrate() method which does all the setup and
tear down minus the actual benchmark operations.
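
The general idea can be sketched roughly as follows. This is not pybench's
actual code: the class, the best_time() helper and the round count are
made up for illustration, and only the test()/calibrate() split follows the
description above.

```python
import time

class DictSetTest(object):
    """Hypothetical test class; the names only loosely mirror pybench."""
    rounds = 10000

    def test(self):
        d = {}
        for i in range(self.rounds):
            d[i] = i          # the operation being measured

    def calibrate(self):
        d = {}
        for i in range(self.rounds):
            pass              # same loop and setup, without the operation

def best_time(func, repeat=5):
    # Best of several runs, using a high-resolution clock.
    best = float('inf')
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best

t = DictSetTest()
raw = best_time(t.test)
overhead = best_time(t.calibrate)
net = raw - overhead   # can come out negative if the timer is noisy
print("raw=%.6f overhead=%.6f net=%.6f" % (raw, overhead, net))
```

When the measured operation is very cheap compared with the loop overhead,
the subtraction is dominated by noise, which is one way negative times can
show up.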

If you get wrong numbers, try adjusting the parameters and add more
"packets" of operations. Don't forget to adjust the version number so as
not to compare apples and oranges, though.

Perhaps it's time to readjust the pybench parameters to today's
CPUs.

-- 
Marc-Andre Lemburg
eGenix.com

From ziade.tarek at gmail.com  Fri Apr  3 19:12:20 2009
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Fri, 3 Apr 2009 19:12:20 +0200
Subject: [Python-Dev] Package Management - thoughts from the peanut
	gallery
In-Reply-To: <49D63754.6030601@simplistix.co.uk>
References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org>
	<94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>
	<49D63754.6030601@simplistix.co.uk>
Message-ID: <94bdd2610904031012g48bb2blccf24c59573fadfc@mail.gmail.com>

On Fri, Apr 3, 2009 at 6:20 PM, Chris Withers <chris at simplistix.co.uk> wrote:
> Tarek Ziadé wrote:
>>
>> I have taken the commitment to lead these tasks and synchronize the people
>> that are willing to help on this.
>
> Good, I'm one of those people,

Great !

>  sadly my only help may be to ask "how is this
> bit going to be done?".

I'll work on the wiki this weekend for that.

>
>> The tasks discussed so far are:
>>
>> - version definition (http://wiki.python.org/moin/DistutilsVersionFight)
>> - egg.info standardization (PEP 376)
>> - metadata enhancement (rewrite PEP 345)
>> - static metadata definition work (*)
>
> These all seem to be a subset of the last one, right?

Sorry, I said "tasks" where I should have said "topics".

We are trying to have a list of well-defined, isolated tasks. These tasks
are built upon the discussions we have in these topics.

The last topic (static metadata) might generate new tasks and/or
complete existing tasks.

>> - PyPI mirroring (PEP 381)
>
> I don't see why PyPI isn't just ported to GAE with an S3 data storage bit
> and be done with it... Offline mirrors for people behind firewalls already
> have solutions out there...

GAE+S3 is just an implementation, IMHO. We still need a mirroring protocol
a la CPAN, and features in client software to use the mirrors (as defined
in PEP 381).

>
>> Each one of these tasks has a leader, except the one with (*). I just got
>> back
>> from travelling, and I will reorganize
>> http://wiki.python.org/moin/Distutils asap so it is up-to-date.
>
> Cool, is this the focal point to track your activities?

Exactly. And Distutils-SIG is the mailing list to discuss in ;)

>
>> If you want to work on one of these tasks or feel there's a new task you can
>> start, please, join Distutils SIG or contact me,
>
> Well, I think my "big list" breaks down roughly as tasks, of which I think
> the stuff you're already doing will hopefully take care of the first 2, but
> what about the rest. If labour shortage is all that's stopping this, then
> let me know ;-)
>

Please discuss these new points in Distutils-SIG

Cheers
Tarek

From fuzzyman at voidspace.org.uk  Fri Apr  3 19:13:39 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 03 Apr 2009 18:13:39 +0100
Subject: [Python-Dev] Package Management - thoughts from the
	peanut	gallery
In-Reply-To: <49D63F68.2050409@simplistix.co.uk>
References: <49D534B3.8020801@simplistix.co.uk>
	<87y6uitjxd.fsf@xemacs.org>		<94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>		<49D63754.6030601@simplistix.co.uk>	<24ea26600904030937y72e36dcdydeab19607302f23d@mail.gmail.com>
	<49D63F68.2050409@simplistix.co.uk>
Message-ID: <49D643C3.3090707@voidspace.org.uk>

Chris Withers wrote:
> Olemis Lang wrote:
>> On Fri, Apr 3, 2009 at 11:20 AM, Chris Withers 
>> <chris at simplistix.co.uk> wrote:
>>> Tarek Ziadé wrote:
>>>
>>>> - PyPI mirroring (PEP 381)
>>> I don't see why PyPI isn't just ported to GAE with an S3 data 
>>> storage bit
>>> and be done with it... Offline mirrors for people behind firewalls 
>>> already
>>> have solutions out there...
>>
>> -1 ... IMHO ...
>
> For what reason?

GAE does suffer from blackouts - which is the problem we are attempting 
to solve with mirroring.

I don't see why we should tie vital Python infrastructure to the 
proprietary APIs of a single vendor and outsource delivery entirely to 
them. If we have the manpower to do this ourselves it seems better to do 
it and retain control.

Added to which GAE is a commercial service and beyond a certain level 
bandwidth / cycles needs paying for. This may not be an issue in itself 
(either Google may waive charges or the PSF may be willing to pay).

Michael

>
> Chris
>

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From python at rcn.com  Fri Apr  3 19:14:10 2009
From: python at rcn.com (Raymond Hettinger)
Date: Fri, 3 Apr 2009 10:14:10 -0700
Subject: [Python-Dev] Getting values stored inside sets
References: <49D5FBE6.6090807@avl.com> <gr4v8q$1tm$1@ger.gmane.org>
Message-ID: <00AE37B203704E668246D2A70BAD4568@RaymondLaptop1>

> Hrvoje Niksic wrote:
>> I've stumbled upon an oddity using sets.  It's trivial to test if a 
>> value is in the set, but it appears to be impossible to retrieve a 
>> stored value, 

See:  http://code.activestate.com/recipes/499299/

Raymond

From alexander.belopolsky at gmail.com  Fri Apr  3 19:16:24 2009
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Fri, 3 Apr 2009 13:16:24 -0400
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <49D63C99.6000302@v.loewis.de>
References: <49D5FBE6.6090807@avl.com> <49D63C99.6000302@v.loewis.de>
Message-ID: <d38f5330904031016s37e21dd4s723e11df0c2f90de@mail.gmail.com>

I just want to add a link to a 2.5 year old discussion on this issue:
<http://bugs.python.org/issue1507011>.  In that discussion I disagreed
with Martin and argued that "interning is a set
operation and it is unfortunate that set API does not support it directly."
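
For readers unfamiliar with the term, here is a minimal sketch of
interning done with a dict, since the set API offers no direct
equivalent. InternTable is a hypothetical name used only for this
illustration.

```python
class InternTable:
    """Hypothetical helper: maps each value to the first equal object stored."""

    def __init__(self):
        self._items = {}

    def intern(self, value):
        # dict.setdefault stores value under itself if no equal key exists,
        # and returns the already-stored object otherwise.
        return self._items.setdefault(value, value)


table = InternTable()
first = tuple([1, 2])
second = tuple([1, 2])    # equal to first, but a distinct object
stored = table.intern(first)
again = table.intern(second)
```

Here `again` is the object stored on the first call, not the equal
object passed in on the second call. That is the "retrieve the stored
value" operation a set can test for but cannot hand back.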

On Fri, Apr 3, 2009 at 12:43 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> I've stumbled upon an oddity using sets. It's trivial to test if a
>> value is in the set, but it appears to be impossible to retrieve a
>> stored value, other than by iterating over the whole set.
>
> Of course it is. That's why it is called a set: it's an unordered
> collection of objects, keyed by nothing.
>
> If you have a set of elements, and you check "'foo' in s", then
> you should be able just to use the string 'foo' itself for whatever
> you want to do with it - you have essentially created a set of
> strings. If you think that 'foo' and Element('foo') are different
> things, you should not implement __eq__ in a way that they are
> considered equal.
>
> Regards,
> Martin
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/alexander.belopolsky%40gmail.com
>

From collinw at gmail.com  Fri Apr  3 19:18:04 2009
From: collinw at gmail.com (Collin Winter)
Date: Fri, 3 Apr 2009 10:18:04 -0700
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <loom.20090403T161126-31@post.gmane.org>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<loom.20090403T092111-554@post.gmane.org>
	<9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com>
	<loom.20090403T161126-31@post.gmane.org>
Message-ID: <43aa6ff70904031018s6b97a76at545ce2a8f17916c9@mail.gmail.com>

On Fri, Apr 3, 2009 at 9:43 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Thomas Wouters <thomas <at> python.org> writes:
>>
>> Really? Have you tried it? I get at least 5% noise between runs without any
> changes. I have gotten results that include *negative* run times.
>
> That's an implementation problem, not an issue with the tests themselves.
> Perhaps a better timing mechanism could be inspired from the timeit module.
> Perhaps the default numbers of iterations should be higher (many subtests run
> in less than 100ms on a modern CPU, which might be too low for accurate
> measurement). Perhaps the so-called "calibration" should just be disabled.
> etc.
>
>> The tests in PyBench are not micro-benchmarks (they do way too much for
> that),
>
> Then I wonder what you call a micro-benchmark. Should it involve direct calls
> to
> low-level C API functions?

I agree that a suite of microbenchmarks is supremely useful: I would
very much like to be able to isolate, say, raise statement
performance. PyBench suffers from implementation defects that in its
current incarnation make it unsuitable for this, though:
- It does not effectively isolate component performance as it claims.
When I was working on a change to BINARY_MODULO to make string
formatting faster, PyBench would report that floating point math got
slower, or that generator yields got slower. There is a lot of random
noise in the results.
- We have observed overall performance swings of 10-15% between runs
on the same machine, using the same Python binary. Using the same
binary on the same unloaded machine should give as close an answer to
0% as possible.
- I wish PyBench actually did more isolation.
Call.py:ComplexPythonFunctionCalls is on my mind right now; I wish it
didn't put keyword arguments and **kwargs in the same microbenchmark.
- In experimenting with gcc 4.4's FDO support, I produced a training
load that resulted in a 15-30% performance improvement (depending on
benchmark) across all benchmarks. Using this trained binary, PyBench
slowed down by 10%.
- I would like to see PyBench incorporate better statistics for
indicating the significance of the observed performance difference.
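
As a sketch of the kind of statistic the last point has in mind, one
option is Welch's t statistic over repeated timings. This is an
assumption chosen for illustration, not a description of what perf.py
or PyBench computes; welch_t() and the timings below are made up.

```python
import math
import statistics


def welch_t(base_times, changed_times):
    """Welch's t statistic for two independent samples of run times.

    Each sample needs at least two values and the two samples must not
    both have zero variance, otherwise the division below fails.
    """
    mean_a = statistics.mean(base_times)
    mean_b = statistics.mean(changed_times)
    var_a = statistics.variance(base_times)       # sample variance
    var_b = statistics.variance(changed_times)
    std_err = math.sqrt(var_a / len(base_times) + var_b / len(changed_times))
    return (mean_b - mean_a) / std_err


# Made-up timings in seconds, for illustration only.
base = [1.00, 1.02, 0.98, 1.01, 0.99]
changed = [1.10, 1.12, 1.08, 1.11, 1.09]
t = welch_t(base, changed)
```

A large absolute value of t suggests the difference in means is
unlikely to be noise; a value near zero suggests the two runs cannot be
told apart with the samples collected.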

I don't believe that these are insurmountable problems, though. A
great contribution to Python performance work would be an improved
version of PyBench that corrects these problems and offers more
precise measurements. Is that something you might be interested in
contributing to? As performance moves more into the wider
consciousness, having good tools will become increasingly important.

Thanks,
Collin

From fuzzyman at voidspace.org.uk  Fri Apr  3 19:28:40 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 03 Apr 2009 18:28:40 +0100
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com>
References: <49D3F8D0.8070805@wingware.com>	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>	<49D42013.3010600@wingware.com>	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>	<loom.20090403T092111-554@post.gmane.org>
	<43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com>
Message-ID: <49D64748.70305@voidspace.org.uk>

Collin Winter wrote:
> On Fri, Apr 3, 2009 at 2:27 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>   
>> Thomas Wouters <thomas <at> python.org> writes:
>>     
>>> Pystone is pretty much a useless benchmark. If it measures anything, it's the
>>>       
>> speed of the bytecode dispatcher (and it doesn't measure it particularly well.)
>> PyBench isn't any better, in my experience.
>>
>> I don't think pybench is useless. It gives a lot of performance data about
>> crucial internal operations of the interpreter. It is of course very little
>> real-world, but conversely makes you know immediately where a performance
>> regression has happened. (by contrast, if you witness a regression in a
>> high-level benchmark, you still have a lot of investigation to do to find out
>> where exactly something bad happened)
>>
>> Perhaps someone should start maintaining a suite of benchmarks, high-level and
>> low-level; we currently have them all scattered around (pybench, pystone,
>> stringbench, richard, iobench, and the various Unladen Swallow benchmarks; not
>> to mention other third-party stuff that can be found in e.g. the Computer
>> Language Shootout).
>>     
>
> Already in the works :)
>
> As part of the common standard library and test suite that we agreed
> on at the PyCon language summit last week, we're going to include a
> common benchmark suite that all Python implementations can share. This
> is still some months off, though, so there'll be plenty of time to
> bikeshed^Wrationally discuss which benchmarks should go in there.
>   
Where is the right place for us to discuss this common benchmark and 
test suite?

As the benchmark is developed I would like to ensure it can run on 
IronPython.

The test suite changes will need some discussion as well - Jython and 
IronPython (and probably PyPy) have almost identical changes to tests 
that currently rely on deterministic finalisation (reference counting) 
so it makes sense to test changes on both platforms and commit a single 
solution.
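
A minimal sketch of the kind of test change involved, assuming a test
that checks an object has gone away after its last reference is
dropped (Resource and collected_after_del() are hypothetical names):

```python
import gc
import weakref


class Resource:
    """Hypothetical object whose lifetime a test wants to observe."""


def collected_after_del():
    obj = Resource()
    ref = weakref.ref(obj)
    del obj
    # On CPython, reference counting frees the object at the del above.
    # Implementations with a tracing collector may only free it after a
    # collection, so the test requests one explicitly before checking.
    gc.collect()
    return ref() is None
```

Without the gc.collect() call the check only passes where finalisation
is deterministic. Whether a single collection is enough on every
implementation is something each platform would need to confirm.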

Michael

> Collin
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk
>   

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From rdmurray at bitdance.com  Fri Apr  3 19:33:39 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Fri, 3 Apr 2009 13:33:39 -0400 (EDT)
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com>
References: <49D5FBE6.6090807@avl.com> <49D63C99.6000302@v.loewis.de>
	<79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0904031329390.26362@kimball.webabinitio.net>

On Fri, 3 Apr 2009 at 17:57, Paul Moore wrote:
> In fact, Python seems to be doing something I don't understand:
>
>>>> class Element(object):
> ...    def __init__(self, key, id):
> ...        self.key = key
> ...        self.id = id
> ...    def __eq__(self, other):
> ...        print "Calling __eq__ for %s" % self.id
> ...        return self.key == other
> ...    def __hash__(self):
> ...        return hash(self.key)
> ...
>>>> a = Element('k', 'a')
>>>> b = Element('k', 'b')
>>>> a == b
> Calling __eq__ for a
> Calling __eq__ for b
> True
>>>> a == a
> Calling __eq__ for a
> Calling __eq__ for a
> True
>>>>
>
> Why does __eq__ get called twice in these cases? Why does a == b, as
> that means a.key == b, and clearly a.key ('k') does *not* equal b. Or
> are there some further options being tried, in str.__eq__ or
> object.__eq__? The documentation doesn't say so... Specifically,
> there's nothing saying that a "reversed" version is tried.

a == b

So, python calls a.__eq__(b)

Now, that function does:

a.key == b

Since b is an object with an __eq__ method, python calls
b.__eq__(a.key).

That function does:

a.key == b.key

ie: the OP's code is inefficient :)
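
The sequence above can be observed directly. This is a Python 3
rendering of the quoted class, with the print replaced by a list (the
`calls` name is added for this sketch) so the order of calls can be
inspected:

```python
calls = []


class Element:
    def __init__(self, key, id):
        self.key = key
        self.id = id

    def __eq__(self, other):
        calls.append(self.id)
        # Compares a str with whatever `other` is. If `other` is an
        # Element, str cannot handle it and Python falls back to
        # other.__eq__(self.key).
        return self.key == other

    def __hash__(self):
        return hash(self.key)


a = Element('k', 'a')
b = Element('k', 'b')
result = (a == b)
```

After this runs, `calls` holds the id of a followed by the id of b: one
call for the original comparison and one for the fallback.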

--David

From collinw at gmail.com  Fri Apr  3 19:35:28 2009
From: collinw at gmail.com (Collin Winter)
Date: Fri, 3 Apr 2009 10:35:28 -0700
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <49D64748.70305@voidspace.org.uk>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<loom.20090403T092111-554@post.gmane.org>
	<43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com>
	<49D64748.70305@voidspace.org.uk>
Message-ID: <43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com>

On Fri, Apr 3, 2009 at 10:28 AM, Michael Foord
<fuzzyman at voidspace.org.uk> wrote:
> Collin Winter wrote:
>> As part of the common standard library and test suite that we agreed
>> on at the PyCon language summit last week, we're going to include a
>> common benchmark suite that all Python implementations can share. This
>> is still some months off, though, so there'll be plenty of time to
>> bikeshed^Wrationally discuss which benchmarks should go in there.
>>
>
> Where is the right place for us to discuss this common benchmark and test
> suite?
>
> As the benchmark is developed I would like to ensure it can run on
> IronPython.
>
> The test suite changes will need some discussion as well - Jython and
> IronPython (and probably PyPy) have almost identical changes to tests that
> currently rely on deterministic finalisation (reference counting) so it
> makes sense to test changes on both platforms and commit a single solution.

I believe Brett Cannon is the best person to talk to about this kind
of thing. I don't know that any common mailing list has been set up,
though there may be and Brett just hasn't told anyone yet :)

Collin

From solipsis at pitrou.net  Fri Apr  3 19:50:21 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 3 Apr 2009 17:50:21 +0000 (UTC)
Subject: [Python-Dev] =?utf-8?q?PyDict=5FSetItem_hook?=
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<loom.20090403T092111-554@post.gmane.org>
	<9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com>
	<loom.20090403T161126-31@post.gmane.org>
	<43aa6ff70904031018s6b97a76at545ce2a8f17916c9@mail.gmail.com>
Message-ID: <loom.20090403T174518-723@post.gmane.org>

Collin Winter <collinw <at> gmail.com> writes:
> 
> - I wish PyBench actually did more isolation.
> Call.py:ComplexPythonFunctionCalls is on my mind right now; I wish it
> didn't put keyword arguments and **kwargs in the same microbenchmark.

Well, there is a balance to be found between having more subtests and keeping a
reasonable total running time :-)
(I have to plead guilty for ComplexPythonFunctionCalls, btw)

> - I would like to see PyBench incorporate better statistics for
> indicating the significance of the observed performance difference.

I see you already have this kind of measurement in your perf.py script, would it
be easy to port it?

We could also discuss making individual tests longer (by changing the default
"warp factor").
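
As a sketch of the trade-off between run time and accuracy, using
timeit from the standard library rather than pybench itself
(best_time() and the statement being timed are made up for this
example):

```python
import timeit


def best_time(stmt, setup="pass", number=100000, repeat=5):
    """Best per-loop time over `repeat` runs of `number` iterations each."""
    timer = timeit.Timer(stmt, setup=setup)
    return min(timer.repeat(repeat=repeat, number=number)) / number


per_call = best_time("f(1, b=2)", setup="def f(a, b): return a")
```

Raising `number` makes each timed run longer and less sensitive to
timer resolution, at the cost of a longer total running time.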

From p.f.moore at gmail.com  Fri Apr  3 19:56:44 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 3 Apr 2009 18:56:44 +0100
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <Pine.LNX.4.64.0904031329390.26362@kimball.webabinitio.net>
References: <49D5FBE6.6090807@avl.com> <49D63C99.6000302@v.loewis.de>
	<79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com>
	<Pine.LNX.4.64.0904031329390.26362@kimball.webabinitio.net>
Message-ID: <79990c6b0904031056m32a85ea5yea16dd6c4540dfb1@mail.gmail.com>

2009/4/3 R. David Murray <rdmurray at bitdance.com>:
> a == b
>
> So, python calls a.__eq__(b)
>
> Now, that function does:
>
> a.key == b
>
> Since b is an object with an __eq__ method, python calls
> b.__eq__(a.key).

That's the bit I can't actually find documented anywhere.

Ah, looking again I see that I misread the section describing the rich
comparison methods:

"""
There are no swapped-argument versions of these methods (to be used
when the left argument does not support the operation but the right
argument does); rather, __lt__() and __gt__() are each other's
reflection, __le__() and __ge__() are each other's reflection, and
__eq__() and __ne__() are their own reflection.
"""

I read that as meaning that no "reversed" version was called, whereas
it actually means that __eq__ is its own reversed version - and so
gets called both times.
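
A small sketch of the same rule for the ordering operators, using a
hypothetical Version class that defines only __gt__:

```python
class Version:
    """Hypothetical class that defines __gt__ but not __lt__."""

    def __init__(self, number):
        self.number = number

    def __gt__(self, other):
        return self.number > other


v = Version(3)
# int does not know how to compare itself with a Version, so Python
# tries the reflection of "<" on the right operand: v.__gt__(2).
reflected = 2 < v
```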

Thanks for helping me clear that up!

Paul.

From fuzzyman at voidspace.org.uk  Fri Apr  3 20:00:43 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 03 Apr 2009 19:00:43 +0100
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com>
References: <49D3F8D0.8070805@wingware.com>	
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>	
	<49D42013.3010600@wingware.com>	
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>	
	<loom.20090403T092111-554@post.gmane.org>	
	<43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com>	
	<49D64748.70305@voidspace.org.uk>
	<43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com>
Message-ID: <49D64ECB.9040100@voidspace.org.uk>

Collin Winter wrote:
> On Fri, Apr 3, 2009 at 10:28 AM, Michael Foord
> <fuzzyman at voidspace.org.uk> wrote:
>   
>> Collin Winter wrote:
>>     
>>> As part of the common standard library and test suite that we agreed
>>> on at the PyCon language summit last week, we're going to include a
>>> common benchmark suite that all Python implementations can share. This
>>> is still some months off, though, so there'll be plenty of time to
>>> bikeshed^Wrationally discuss which benchmarks should go in there.
>>>
>>>       
>> Where is the right place for us to discuss this common benchmark and test
>> suite?
>>
>> As the benchmark is developed I would like to ensure it can run on
>> IronPython.
>>
>> The test suite changes will need some discussion as well - Jython and
>> IronPython (and probably PyPy) have almost identical changes to tests that
>> currently rely on deterministic finalisation (reference counting) so it
>> makes sense to test changes on both platforms and commit a single solution.
>>     
>
> I believe Brett Cannon is the best person to talk to about this kind
> of thing. I don't know that any common mailing list has been set up,
> though there may be and Brett just hasn't told anyone yet :)
>
> Collin
>   
Which begs the question of whether we *should* have a separate mailing list.

I don't think we discussed this specific point in the language summit - 
although it makes sense. Should we have a list specifically for the test 
/ benchmarking or would a more general implementations-sig be appropriate?

And is it really Brett who sets up mailing lists? My understanding is 
that he is pulling out of stuff for a while anyway, so that he can do 
Java / PhD type things... ;-)

Michael

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From collinw at gmail.com  Fri Apr  3 20:05:46 2009
From: collinw at gmail.com (Collin Winter)
Date: Fri, 3 Apr 2009 11:05:46 -0700
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <loom.20090403T174518-723@post.gmane.org>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<loom.20090403T092111-554@post.gmane.org>
	<9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com>
	<loom.20090403T161126-31@post.gmane.org>
	<43aa6ff70904031018s6b97a76at545ce2a8f17916c9@mail.gmail.com>
	<loom.20090403T174518-723@post.gmane.org>
Message-ID: <43aa6ff70904031105y182abfabtc59b1880736625db@mail.gmail.com>

On Fri, Apr 3, 2009 at 10:50 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Collin Winter <collinw <at> gmail.com> writes:
>>
>> - I wish PyBench actually did more isolation.
>> Call.py:ComplexPythonFunctionCalls is on my mind right now; I wish it
>> didn't put keyword arguments and **kwargs in the same microbenchmark.
>
> Well, there is a balance to be found between having more subtests and keeping a
> reasonable total running time :-)
> (I have to plead guilty for ComplexPythonFunctionCalls, btw)

Sure, there's definitely a balance to maintain. With perf.py, we're
going down the road of having different tiers of benchmarks: the
default set is the one we pay the most attention to, with other
benchmarks available for benchmarking certain specific subsystems or
workloads (like pickling list-heavy input data). Something similar
could be done for PyBench, giving the user the option of increasing
the level of detail (and run-time) as appropriate.

>> - I would like to see PyBench incorporate better statistics for
>> indicating the significance of the observed performance difference.
>
> I see you already have this kind of measurement in your perf.py script, would it
> be easy to port it?

Yes, it should be straightforward to incorporate these statistics into
PyBench. In the same directory as perf.py, you'll find test_perf.py
which includes tests for the stats functions we're using.

Collin

From steve at holdenweb.com  Fri Apr  3 21:50:01 2009
From: steve at holdenweb.com (Steve Holden)
Date: Fri, 03 Apr 2009 15:50:01 -0400
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <49D63465.80401@simplistix.co.uk>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>	<49D35A39.7020507@simplistix.co.uk>	<Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net>	<49D52B2C.5050509@simplistix.co.uk>	<ca471dc20904021418y617f1bfdja3173c41a1451423@mail.gmail.com>	<49D52C5B.7010506@simplistix.co.uk>	<ca471dc20904021449x659f930bn7b61feec6b640752@mail.gmail.com>
	<49D63465.80401@simplistix.co.uk>
Message-ID: <gr5p9b$sba$1@ger.gmane.org>

Chris Withers wrote:
> Guido van Rossum wrote:
>>>> But anyways this is moot, the bug was only about exec in a class body
>>>> *nested inside a function*.
>>> Indeed, I just hate seeing execs and it was an interesting mental
>>> exercise
>>> to try and get rid of the above one ;-)
>>>
>>> Assuming it breaks no tests, would there be objection to me
>>> committing the
>>> above change to the Python 3 trunk?
>>
>> That's up to Benjamin. Personally, I live by "if it ain't broke, don't
>> fix it." :-)
> 
> Anything using an exec 

that can be done in some other (more pythonic way)

> is broken by definition ;-)
> 
> Benjamin?
> 
We've just had a fairly clear demonstration that small semantic changes
to the language can leave unexpected areas borked.

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
Holden Web LLC                 http://www.holdenweb.com/
Want to know? Come to PyCon - soon! http://us.pycon.org/

From martin at v.loewis.de  Fri Apr  3 21:49:58 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 03 Apr 2009 21:49:58 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090402171218.9DDEF3A40A7@sparrow.telecommunity.com>
References: <49D4DA72.60401@v.loewis.de>
	<20090402171218.9DDEF3A40A7@sparrow.telecommunity.com>
Message-ID: <49D66866.5020505@v.loewis.de>

> Perhaps we could add something like a sys.namespace_packages that would
> be updated by this mechanism?  Then, pkg_resources could check both that
> and its internal registry to be both backward and forward compatible.

I could see no problem with that, so I have added this to the PEP.

Thanks for the feedback,

Martin

From martin at v.loewis.de  Fri Apr  3 21:55:22 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 03 Apr 2009 21:55:22 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D51A16.70804@simplistix.co.uk>
References: <49D4DA72.60401@v.loewis.de> <49D51A16.70804@simplistix.co.uk>
Message-ID: <49D669AA.6080001@v.loewis.de>

Chris Withers wrote:
> Martin v. Löwis wrote:
>> I propose the following PEP for inclusion to Python 3.1.
>> Please comment.
> 
> Would this support the following case:
> 
> I have a package called mortar, which defines useful stuff:
> 
> from mortar import content, ...
> 
> I now want to distribute large optional chunks separately, but ideally
> so that the following will work:
> 
> from mortar.rbd import ...
> from mortar.zodb import ...
> from mortar.wsgi import ...
> 
> Does the PEP support this? 

That's the primary purpose of the PEP. You can do this today already
(see the zope package, and the reference to current techniques in the
PEP), but the PEP provides a cleaner way.

In each chunk (which the PEP calls a portion), you would have a structure
like this:

mortar/
mortar/rbd.pkg (contains just "*")
mortar/rbd.py

or

mortar/
mortar/zodb.pkg
mortar/zodb/
mortar/zodb/__init__.py
mortar/zodb/backends.py

As a side effect, you can also do "import mortar", but that would just
give you the (nearly) empty namespace package, whose only significant
contents is the variable __path__.

Regards,
Martin

From martin at v.loewis.de  Fri Apr  3 22:07:10 2009
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 03 Apr 2009 22:07:10 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D52115.6020001@egenix.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
Message-ID: <49D66C6E.3090602@v.loewis.de>

> I'd like to extend the proposal to Python 2.7 and later.

I don't object, but I also don't want to propose this, so
I added it to the discussion.

My (and perhaps other people's) concern is that 2.7 might
well be the last release of the 2.x series. If so, adding
this feature to it would make 2.7 an odd special case for
users and providers of third party tools.

> That's going to slow down Python package detection a lot - you'd
> replace an O(1) test with an O(n) scan.

I question that claim. In traditional Unix systems, the file system
driver performs a linear search of the directory, so it's rather
O(n)-in-kernel vs. O(n)-in-Python. Even for advanced file systems,
you need at least O(log n) to determine whether a specific file is
in a directory. For all practical purposes, the package directory
will fit in a single disk block (containing a single .pkg file, and
one or few subpackages), making listdir complete as fast as stat.
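
To make the two checks being compared concrete, here is a minimal
sketch. The function names are made up, "__pkg__.py" is the file name
from the suggestion quoted below, and the mortar/zodb.pkg layout is
borrowed from an example elsewhere in this thread.

```python
import os
import tempfile


def has_pkg_marker(directory):
    """Scan the directory listing for any *.pkg file (the PEP's approach)."""
    return any(name.endswith(".pkg") for name in os.listdir(directory))


def has_fixed_marker(directory, marker="__pkg__.py"):
    """Check for one fixed file name (the alternative being suggested)."""
    return os.path.exists(os.path.join(directory, marker))


with tempfile.TemporaryDirectory() as root:
    package_dir = os.path.join(root, "mortar")
    os.mkdir(package_dir)
    open(os.path.join(package_dir, "zodb.pkg"), "w").close()
    found_by_scan = has_pkg_marker(package_dir)
    found_by_name = has_fixed_marker(package_dir)
```

The first check costs one directory listing, the second one existence
test; for a directory holding a handful of entries the difference
should be small.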

> Wouldn't it be better to stick with a simpler approach and look for
> "__pkg__.py" files to detect namespace packages using that O(1) check ?

Again - this wouldn't be O(1). More importantly, it breaks system
packages, which now again have to deal with the conflicting file names
if they want to install all portions into a single location.

> This would also avoid any issues you'd otherwise run into if you want
> to maintain this scheme in an importer that doesn't have access to a list
> of files in a package directory, but is well capable for the checking
> the existence of a file.

Do you have a specific mechanism in mind?

Regards,
Martin

From martin at v.loewis.de  Fri Apr  3 22:15:55 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 03 Apr 2009 22:15:55 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090403004135.B76443A40A7@sparrow.telecommunity.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<20090403004135.B76443A40A7@sparrow.telecommunity.com>
Message-ID: <49D66E7B.9080304@v.loewis.de>

> Note that there is no such thing as a "defining namespace package" --
> namespace package contents are symmetrical peers.

With the PEP, a "defining package" becomes possible - at most one
portion can define an __init__.py.

I know that the current mechanisms don't support it, and it might
not be useful in general, but now there is a clean way of doing it,
so I wouldn't exclude it. Distribution-wise, all distributions
relying on the defining package would need to require (or
install_require, or depend on) it.

> The above are also true for using only a '*' in .pkg files -- in that
> event there are no sys.path changes.  (Frankly, I'm doubtful that
> anybody is using extend_path and .pkg files to begin with, so I'd be
> fine with a proposal that instead used something like '.nsp' files that
> didn't even need to be opened and read -- which would let the directory
> scan stop at the first .nsp file found.

That would work for me as well. Nobody at PyCon could remember where
.pkg files came from.

> I believe the PEP does this as well, IIUC.

Correct.

>> * It's possible to have a defining package dir and add-on package
>> dirs.
> 
> Also possible in the PEP, although the __init__.py must be in the first
> such directory on sys.path.

I should make it clear that this is not the case. I envision it to work
this way: import zope
- searches sys.path, until finding either a directory zope, or a file
  zope.{py,pyc,pyd,...}
- if it is a directory, it checks for .pkg files. If it finds any,
  it processes them, extending __path__.
- it *then* checks for __init__.py, taking the first hit anywhere
  on __path__ (just like any module import would)
- if no .pkg was found, nor an __init__.py, it proceeds with the next
  sys.path item (skipping the directory entirely)
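
The steps above can be sketched roughly as follows. This is a
simplification under stated assumptions, not the PEP's implementation:
find_package() is a made-up name, only directories are handled (no
zope.py and friends), and every .pkg file is treated as if it contained
"*", meaning "add every same-named directory on the search path".

```python
import os
import tempfile


def find_package(name, search_path):
    """Return (path_list, init_file) for a package, or None if not found."""
    for entry in search_path:
        candidate = os.path.join(entry, name)
        if not os.path.isdir(candidate):
            continue
        package_path = [candidate]
        has_pkg = any(f.endswith(".pkg") for f in os.listdir(candidate))
        if has_pkg:
            # Assumed "*" semantics: extend with all other portions.
            for other in search_path:
                portion = os.path.join(other, name)
                if os.path.isdir(portion) and portion not in package_path:
                    package_path.append(portion)
        # Only now look for __init__.py, taking the first hit on the path.
        init_file = None
        for directory in package_path:
            possible = os.path.join(directory, "__init__.py")
            if os.path.isfile(possible):
                init_file = possible
                break
        if has_pkg or init_file:
            return package_path, init_file
        # Neither a .pkg file nor __init__.py: skip this directory entirely.
    return None


with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "site1")
    second = os.path.join(root, "site2")
    os.makedirs(os.path.join(first, "zope"))
    os.makedirs(os.path.join(second, "zope"))
    open(os.path.join(first, "zope", "interface.pkg"), "w").close()
    open(os.path.join(second, "zope", "__init__.py"), "w").close()
    path_list, init_file = find_package("zope", [first, second])
    init_in_second = init_file == os.path.join(second, "zope", "__init__.py")
    portions_found = len(path_list)
```

In this layout the __init__.py lives in the second directory on the
search path and is still found, because the .pkg file in the first
directory extends the package path before __init__.py is looked up.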

Regards,
Martin

From jyasskin at gmail.com  Fri Apr  3 22:17:37 2009
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Fri, 3 Apr 2009 15:17:37 -0500
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <49D64748.70305@voidspace.org.uk>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<loom.20090403T092111-554@post.gmane.org>
	<43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com>
	<49D64748.70305@voidspace.org.uk>
Message-ID: <5d44f72f0904031317o1a2cb434p7e59ce5046c4bbd1@mail.gmail.com>

On Fri, Apr 3, 2009 at 12:28 PM, Michael Foord
<fuzzyman at voidspace.org.uk> wrote:
> Collin Winter wrote:
>>
>> On Fri, Apr 3, 2009 at 2:27 AM, Antoine Pitrou <solipsis at pitrou.net>
>> wrote:
>>
>>>
>>> Thomas Wouters <thomas <at> python.org> writes:
>>>
>>>>
>>>> Pystone is pretty much a useless benchmark. If it measures anything,
>>>> it's the
>>>>
>>>
>>> speed of the bytecode dispatcher (and it doesn't measure it particularly
>>> well.)
>>> PyBench isn't any better, in my experience.
>>>
>>> I don't think pybench is useless. It gives a lot of performance data
>>> about
>>> crucial internal operations of the interpreter. It is of course very
>>> little
>>> real-world, but conversely makes you know immediately where a performance
>>> regression has happened. (by contrast, if you witness a regression in a
>>> high-level benchmark, you still have a lot of investigation to do to find
>>> out
>>> where exactly something bad happened)
>>>
>>> Perhaps someone should start maintaining a suite of benchmarks,
>>> high-level and
>>> low-level; we currently have them all scattered around (pybench, pystone,
>>> stringbench, richard, iobench, and the various Unladen Swallow
>>> benchmarks; not
>>> to mention other third-party stuff that can be found in e.g. the Computer
>>> Language Shootout).
>>>
>>
>> Already in the works :)
>>
>> As part of the common standard library and test suite that we agreed
>> on at the PyCon language summit last week, we're going to include a
>> common benchmark suite that all Python implementations can share. This
>> is still some months off, though, so there'll be plenty of time to
>> bikeshed^Wrationally discuss which benchmarks should go in there.
>>
>
> Where is the right place for us to discuss this common benchmark and test
> suite?

Dunno. Here, by default, but I'd subscribe to a tests-sig or
commonlibrary-sig or benchmark-sig if one were created.

> As the benchmark is developed I would like to ensure it can run on
> IronPython.

We want to ensure the same thing for the current unladen swallow
suite. If you find ways it currently doesn't, send us patches (until
we get it moved to the common library repository at which point you'll
be able to submit changes yourself). You should be able to check out
http://unladen-swallow.googlecode.com/svn/tests independently of the
rest of the repository. Follow the instructions at
http://code.google.com/p/unladen-swallow/wiki/Benchmarks to run
benchmarks through perf.py. You'll probably want to select benchmarks
individually rather than accepting the default of "all" because it's
currently not very resilient to tests that don't run on one of the
comparison pythons.

Personally, I'd be quite happy moving our performance tests into the
main python repository before the big library+tests move, but I don't
know what directory to put it in, and I don't know what Collin+Thomas
think of that.

> The test suite changes will need some discussion as well - Jython and
> IronPython (and probably PyPy) have almost identical changes to tests that
> currently rely on deterministic finalisation (reference counting) so it
> makes sense to test changes on both platforms and commit a single solution.

IMHO, any place in the test suite that relies on deterministic
finalization but isn't explicitly testing that CPython-specific
feature is a bug and should be fixed, even before we export it to the
new repository.
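
To make the distinction concrete, here is a small sketch of a check that silently depends on reference counting next to one that does not. The Resource class and both function names are hypothetical and written for this example only; they are not taken from the test suite.

```python
import gc
import weakref


class Resource:
    """Hypothetical object whose cleanup a test wants to observe."""


def fragile_check():
    # Relies on reference counting: only CPython promises that the object
    # is finalized as soon as the last reference goes away.
    obj = Resource()
    ref = weakref.ref(obj)
    del obj
    return ref() is None


def portable_check():
    # Asks for a collection first, so the result does not depend on when
    # the implementation chooses to finalize unreachable objects.
    obj = Resource()
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    return ref() is None
```

On implementations with a tracing collector a single gc.collect() call may not always be enough, so a shared helper that collects a few times is probably the more robust form of the second check.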

Jeffrey

From jyasskin at gmail.com  Fri Apr  3 22:36:57 2009
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Fri, 3 Apr 2009 15:36:57 -0500
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <ca471dc20904021557w11b5556aif88522fb46714211@mail.gmail.com>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1>
	<ca471dc20904021419y33932768j8bc7e97e46361d6a@mail.gmail.com>
	<C16A2E80A5854CBEB189C765E6DB7561@RaymondLaptop1>
	<ca471dc20904021557w11b5556aif88522fb46714211@mail.gmail.com>
Message-ID: <5d44f72f0904031336r70736337wc32212771e750608@mail.gmail.com>

On Thu, Apr 2, 2009 at 5:57 PM, Guido van Rossum <guido at python.org> wrote:
> On Thu, Apr 2, 2009 at 3:07 PM, Raymond Hettinger <python at rcn.com> wrote:
>>> Wow. Can you possibly be more negative?
>>
>> I think it's worse to give the poor guy the run around
>
> Mind your words please.
>
>> by making him run lots of random benchmarks.  In
>> the end, someone will run a timeit or have a specific
>> case that shows the full effect.  All of the respondents so far seem to have
>> a clear intuition that hook is right in the middle of a critical path.
>> Their intuition matches
>> what I learned by spending a month trying to find ways
>> to optimize dictionaries.
>>
>> Am surprised that there has been no discussion of why this should be in the
>> default build (as opposed to a compile time option).  AFAICT, users have not
>> previously
>> requested a hook like this.
>
> I may be partially to blame for this. John and Stephan are requesting
> this because it would (mostly) fulfill one of the top wishes of the
> users of Wingware. So the use case is certainly real.
>
>> Also, there has been no discussion for an overall strategy
>> for monitoring containers in general.  Lists and tuples will
>> both defy this approach because there is so much code
>> that accesses the arrays directly.  Am not sure whether the
>> setitem hook would work for other implementations either.
>
> The primary use case is some kind of trap on assignment. While this
> cannot cover all cases, most non-local variables are stored in dicts.
> List mutations are not in the same league, as use case.
>
>> It seems weird to me that Collin's group can be working
>> so hard just to get a percent or two improvement in specific cases for
>> pickling while python-dev is readily entertaining a patch that slows down
>> the entire language.
>
> I don't actually believe that you can know whether this affects
> performance at all without serious benchmarking. The patch amounts to
> a single global flag check as long as the feature is disabled, and
> that flag could be read from the L1 cache.

When I was optimizing the tracing support in the eval loop, we started
with two memory loads and an if test. Removing the whole thing saved
about 3% of runtime, although I think that had been as high as 5% when
Neal measured it a year before. (That indicates that the exact
arrangement of the code can affect performance in subtle and annoying
ways.) Removing one of the two loads saved about 2% of runtime. I
don't remember exactly which benchmark that was; it may just have been
pybench.

Here, we're talking about introducing a load+if in dicts, which is
less critical than the eval loop, so I'd guess that the effect will be
less than 2% overall. I do think the real-life benchmarks are worth
getting for this, but they may not predict the effect after other code
changes. And I don't really have an opinion on what performance hit
for normal use is worth better debugging.
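
For anyone who wants a feel for the shape of the measurement, here is a rough Python-level sketch of "one global load plus one test on every store". It is an analogy only: the real patch lives in C inside PyDict_SetItem, the names below are made up, whether the hook would run before or after the store is an assumption, and numbers from this harness say little about the C-level cost.

```python
import timeit

_hook = None  # stands in for the patch's global "hook installed" flag


def set_plain(d, key, value):
    d[key] = value


def set_hooked(d, key, value):
    # One global load and one test on every store; the hook itself is
    # only called when something has been installed.
    if _hook is not None:
        _hook(d, key, value)
    d[key] = value


def compare(number=200000):
    """Return best-of-5 timings (plain, hooked) for `number` stores each."""
    d = {}
    plain = min(timeit.repeat(lambda: set_plain(d, "k", 1),
                              number=number, repeat=5))
    hooked = min(timeit.repeat(lambda: set_hooked(d, "k", 1),
                               number=number, repeat=5))
    return plain, hooked
```

Taking the minimum over several repeats reduces run-to-run noise, which is the problem John described with pystone, but it does not remove the sensitivity to code layout mentioned above.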

>> If my thoughts on the subject bug you, I'll happily
>> withdraw from the thread.  I don't aspire to be a
>> source of negativity.  I just happen to think this proposal isn't a good
>> idea.
>
> I think we need more proof either way.
>
>> Raymond
>>
>>
>>
>> ----- Original Message ----- From: "Guido van Rossum" <guido at python.org>
>> To: "Raymond Hettinger" <python at rcn.com>
>> Cc: "Thomas Wouters" <thomas at python.org>; "John Ehresman"
>> <jpe at wingware.com>; <python-dev at python.org>
>> Sent: Thursday, April 02, 2009 2:19 PM
>> Subject: Re: [Python-Dev] PyDict_SetItem hook
>>
>>
>> Wow. Can you possibly be more negative?
>>
>> 2009/4/2 Raymond Hettinger <python at rcn.com>:
>>>
>>> The measurements are just a distractor. We all already know that the hook
>>> is being added to a critical path. Everyone will pay a cost for a feature
>>> that few people will use. This is a really bad idea. It is not part of a
>>> thorough, thought-out framework of container hooks (something that would
>>> need a PEP at the very least). The case for how it helps us is somewhat
>>> thin. The case for DTrace hooks was much stronger.
>>>
>>> If something does go in, it should be #ifdef'd out by default. But then, I
>>> don't think it should go in at all.
>>>
>>>
>>> Raymond
>>>
>>>
>>>
>>>
>>> On Thu, Apr 2, 2009 at 04:16, John Ehresman <jpe at wingware.com> wrote:
>>>>
>>>> Collin Winter wrote:
>>>>>
>>>>> Have you measured the impact on performance?
>>>>
>>>> I've tried to test using pystone, but am seeing more differences between
>>>> runs than there is between python w/ the patch and w/o when there is no
>>>> hook
>>>> installed. The highest pystone is actually from the binary w/ the patch,
>>>> which I don't really believe unless it's some low level code generation
>>>> effect. The cost is one test of a global variable and then a switch to
>>>> the
>>>> branch that doesn't call the hooks.
>>>>
>>>> I'd be happy to try to come up with better numbers next week after I get
>>>> home from pycon.
>>>
>>> Pystone is pretty much a useless benchmark. If it measures anything, it's
>>> the speed of the bytecode dispatcher (and it doesn't measure it
>>> particularly
>>> well.) PyBench isn't any better, in my experience. Collin has collected a
>>> set of reasonable benchmarks for Unladen Swallow, but they still leave a
>>> lot
>>> to be desired. From the discussions at the VM and Language summits before
>>> PyCon, I don't think anyone else has better benchmarks, though, so I would
>>> suggest using Unladen Swallow's:
>>> http://code.google.com/p/unladen-swallow/wiki/Benchmarks

From glyph at divmod.com  Fri Apr  3 23:16:49 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 03 Apr 2009 21:16:49 -0000
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D66E7B.9080304@v.loewis.de>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<20090403004135.B76443A40A7@sparrow.telecommunity.com>
	<49D66E7B.9080304@v.loewis.de>
Message-ID: <20090403211649.12555.1005832716.divmod.xquotient.6954@weber.divmod.com>

On 08:15 pm, martin at v.loewis.de wrote:
>>Note that there is no such thing as a "defining namespace package" --
>>namespace package contents are symmetrical peers.
>
>With the PEP, a "defining package" becomes possible - at most one
>portion can define an __init__.py.

For what it's worth, this is a _super_ useful feature for Twisted.  We 
have one "defining package" for the "twisted" package (twisted core) and 
then a bunch of other things which want to put things into twisted.* 
(twisted.web, twisted.conch, et al.).

For debian we already have separate packages, but such a definition of 
namespace packages would allow us to actually have things separated out 
on the cheeseshop as well.

From benjamin at python.org  Fri Apr  3 23:12:50 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Fri, 3 Apr 2009 16:12:50 -0500
Subject: [Python-Dev] Should the io-c modules be put in their own
	directory?
In-Reply-To: <loom.20090403T085522-602@post.gmane.org>
References: <acd65fa20904022110p2cf46105n4061286bc249d55e@mail.gmail.com>
	<loom.20090403T085522-602@post.gmane.org>
Message-ID: <1afaf6160904031412n6b7415acxcc9e85677f54981e@mail.gmail.com>

2009/4/3 Antoine Pitrou <solipsis at pitrou.net>:
> Alexandre Vassalotti <alexandre <at> peadrop.com> writes:
>>
>> I just noticed that the new io-c modules were merged in the py3k
>> branch (I know, I am kind of late on the news -- blame school work). Anyway,
>> I am just wondering if it would be a good idea to put the io-c modules
>> in a sub-directory (like sqlite), instead of scattering them around in
>> the Modules/ directory.
>
> Welcome back!
>
> I have no particular opinion on this. I suggest waiting for Benjamin's advice
> and following it :-)

I'm +.2. This is the layout I would suggest:

Modules/
  _io/
     _io.c
     stringio.c
     textio.c
     etc....

>
> (unless the FLUFL wants to chime in)
>
> Benjamin-makes-boring-decisions-easy'ly yrs,
>
> Antoine.

mad-with-power'ly yours,
Benjamin

From benjamin at python.org  Fri Apr  3 23:15:47 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Fri, 3 Apr 2009 16:15:47 -0500
Subject: [Python-Dev] Should the io-c modules be put in their own
	directory?
In-Reply-To: <49D63ABD.30000@v.loewis.de>
References: <acd65fa20904022110p2cf46105n4061286bc249d55e@mail.gmail.com>
	<loom.20090403T085522-602@post.gmane.org> <49D63ABD.30000@v.loewis.de>
Message-ID: <1afaf6160904031415y3a62c7b1u90b674d2dc3ed28c@mail.gmail.com>

2009/4/3 "Martin v. Löwis" <martin at v.loewis.de>:
>>> I just noticed that the new io-c modules were merged in the py3k
>>> branch (I know, I am kind of late on the news -- blame school work). Anyway,
>>> I am just wondering if it would be a good idea to put the io-c modules
>>> in a sub-directory (like sqlite), instead of scattering them around in
>>> the Modules/ directory.
>>
>> Welcome back!
>>
>> I have no particular opinion on this. I suggest waiting for Benjamin's advice
>> and following it :-)
>
> I would suggest to leave it as is:
> a) never change a running system
> b) flat is better than nested

It doesn't make sense, though, to have the 8 files that make up the
_io module scattered around in a directory with scores of other ones.

-- 
Regards,
Benjamin

From pje at telecommunity.com  Fri Apr  3 23:23:19 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 03 Apr 2009 17:23:19 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D66E7B.9080304@v.loewis.de>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<20090403004135.B76443A40A7@sparrow.telecommunity.com>
	<49D66E7B.9080304@v.loewis.de>
Message-ID: <20090403212054.D15F73A40A7@sparrow.telecommunity.com>

At 10:15 PM 4/3/2009 +0200, Martin v. Löwis wrote:
>I should make it clear that this is not the case. I envision it to work
>this way: import zope
>- searches sys.path, until finding either a directory zope, or a file
>   zope.{py,pyc,pyd,...}
>- if it is a directory, it checks for .pkg files. If it finds any,
>   it processes them, extending __path__.
>- it *then* checks for __init__.py, taking the first hit anywhere
>   on __path__ (just like any module import would)
>- if no .pkg was found, nor an __init__.py, it proceeds with the next
>   sys.path item (skipping the directory entirely)

Ah, I missed that.  Maybe the above should be added to the PEP to clarify.

From benjamin at python.org  Fri Apr  3 23:27:05 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Fri, 3 Apr 2009 16:27:05 -0500
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <49D63465.80401@simplistix.co.uk>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
	<49D35A39.7020507@simplistix.co.uk>
	<Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net>
	<49D52B2C.5050509@simplistix.co.uk>
	<ca471dc20904021418y617f1bfdja3173c41a1451423@mail.gmail.com>
	<49D52C5B.7010506@simplistix.co.uk>
	<ca471dc20904021449x659f930bn7b61feec6b640752@mail.gmail.com>
	<49D63465.80401@simplistix.co.uk>
Message-ID: <1afaf6160904031427p7fa95d07q340fd54cb7c34963@mail.gmail.com>

2009/4/3 Chris Withers <chris at simplistix.co.uk>:
> Guido van Rossum wrote:
>>>>
>>>> But anyways this is moot, the bug was only about exec in a class body
>>>> *nested inside a function*.
>>>
>>> Indeed, I just hate seeing execs and it was an interesting mental
>>> exercise
>>> to try and get rid of the above one ;-)
>>>
>>> Assuming it breaks no tests, would there be objection to me committing
>>> the
>>> above change to the Python 3 trunk?
>>
>> That's up to Benjamin. Personally, I live by "if it ain't broke, don't
>> fix it." :-)
>
> Anything using an exec is broken by definition ;-)

"practicality beats purity"

>
> Benjamin?

+0

-- 
Regards,
Benjamin

From guido at python.org  Fri Apr  3 23:32:42 2009
From: guido at python.org (Guido van Rossum)
Date: Fri, 3 Apr 2009 14:32:42 -0700
Subject: [Python-Dev] Should the io-c modules be put in their own
	directory?
In-Reply-To: <1afaf6160904031415y3a62c7b1u90b674d2dc3ed28c@mail.gmail.com>
References: <acd65fa20904022110p2cf46105n4061286bc249d55e@mail.gmail.com> 
	<loom.20090403T085522-602@post.gmane.org> <49D63ABD.30000@v.loewis.de> 
	<1afaf6160904031415y3a62c7b1u90b674d2dc3ed28c@mail.gmail.com>
Message-ID: <ca471dc20904031432p1ea51db8sbb2b1467970b188d@mail.gmail.com>

On Fri, Apr 3, 2009 at 2:15 PM, Benjamin Peterson <benjamin at python.org> wrote:
> 2009/4/3 "Martin v. Löwis" <martin at v.loewis.de>:
>>>> I just noticed that the new io-c modules were merged in the py3k
>>>> branch (I know, I am kind of late on the news -- blame school work). Anyway,
>>>> I am just wondering if it would be a good idea to put the io-c modules
>>>> in a sub-directory (like sqlite), instead of scattering them around in
>>>> the Modules/ directory.
>>>
>>> Welcome back!
>>>
>>> I have no particular opinion on this. I suggest waiting for Benjamin's advice
>>> and following it :-)
>>
>> I would suggest to leave it as is:
>> a) never change a running system
>> b) flat is better than nested
>
> It doesn't make sense, though, to have the 8 files that make up the
> _io module scattered around in a directory with scores of other ones.

I think Benjamin is right. While most of the C source is indeed
exactly one level below the root, there's plenty of code that isn't,
e.g. _ctypes, cjkcodecs, expat, _multiprocessing, zlib. And even
Objects/stringlib.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at janc.be  Sat Apr  4 00:36:14 2009
From: lists at janc.be (Jan Claeys)
Date: Sat, 04 Apr 2009 00:36:14 +0200
Subject: [Python-Dev] Integrate BeautifulSoup into stdlib?
In-Reply-To: <49C939BA.8040206@v.loewis.de>
References: <fb6fbf560903081251m4219b117l63c6bcb71e199be6@mail.gmail.com>
	<49BA3154.8080408@simplistix.co.uk>	<49BAA596.5020106@v.loewis.de>
	<49C79C1A.8040301@simplistix.co.uk>	<49C7FC85.5000809@v.loewis.de>
	<49C80FA0.4020800@simplistix.co.uk>	<87ab7bh5fb.fsf@xemacs.org>
	<49C87004.2030807@holdenweb.com> <49C88503.2030902@v.loewis.de>
	<49C886EF.80203@v.loewis.de> <49C8C9B3.3070403@holdenweb.com>
	<49C939BA.8040206@v.loewis.de>
Message-ID: <1238798174.5360.388.camel@saeko.local>

Op dinsdag 24-03-2009 om 20:51 uur [tijdzone +0100], schreef "Martin v.
Löwis":
> The Windows story is indeed sad, as none of the Windows packaging
> formats provides support for dependencies

That's not entirely true; Cygwin comes with a package management tool
that probably could be used to set up a repository of python packages
for native Windows: <http://sources.redhat.com/cygwin-apps/setup.html>

This package manager is in no way dependent on Cygwin, supports (basic)
dependencies, etc.  Of course some people would have to take care of the
packaging work (just like happens for most open source OS distros and
for Mac OS X already).

It seems like XEmacs is already using a fork of that installer for the
same purpose.

-- 
Jan Claeys

From alexandre at peadrop.com  Sat Apr  4 00:53:18 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Fri, 3 Apr 2009 18:53:18 -0400
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> 
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
Message-ID: <acd65fa20904031553w5f0b1d18k7e513a45cb1871ca@mail.gmail.com>

On Tue, Mar 31, 2009 at 11:25 PM, Guido van Rossum <guido at python.org> wrote:
> Well hold on for a minute, I remember we used to have an exec
> statement in a class body in the standard library, to define some file
> methods in socket.py IIRC.

FYI, collections.namedtuple is also implemented using exec.

- Alexandre

From leif.walsh at gmail.com  Sat Apr  4 01:18:22 2009
From: leif.walsh at gmail.com (Leif Walsh)
Date: Fri, 3 Apr 2009 19:18:22 -0400
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <49D5FBE6.6090807@avl.com>
References: <49D5FBE6.6090807@avl.com>
Message-ID: <cc7430500904031618o4d4088c7h50995377c03495ff@mail.gmail.com>

On Fri, Apr 3, 2009 at 8:07 AM, Hrvoje Niksic <hrvoje.niksic at avl.com> wrote:
> But I can't seem to find a way to retrieve the element corresponding to
> 'foo', at least not without iterating over the entire set.  Is this an
> oversight or an intentional feature?  Or am I just missing an obvious way to
> do this?

>>> query_obj in s
True
>>> s_prime = s.copy()
>>> s_prime.discard(query_obj)
>>> x = s.difference(s_prime).pop()

Pretty ugly, but I think it only uses a shallow copy, and it might be
a bit better than iterating, if difference is intelligent.  I haven't
run any tests though.
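
Wrapped up as a function, the idea looks something like the sketch below. The name get_stored is made up for this example, and it is written against Python 3.

```python
def get_stored(s, query):
    """Return the element of s that compares equal to query.

    Raises KeyError if no such element exists.
    """
    if query not in s:
        raise KeyError(query)
    without = s.copy()       # shallow copy: elements are shared, not duplicated
    without.discard(query)   # drops the stored element equal to query
    # Only the discarded element is in s but not in the copy.
    return s.difference(without).pop()
```

For example, get_stored({1, 2, 3}, 2.0) hands back the int 2 that is actually stored in the set. Both the copy and the difference visit every element, so this is still linear in the size of the set.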

-- 
Cheers,
Leif

From brett at python.org  Sat Apr  4 01:37:06 2009
From: brett at python.org (Brett Cannon)
Date: Fri, 3 Apr 2009 16:37:06 -0700
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D66E7B.9080304@v.loewis.de>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> 
	<20090403004135.B76443A40A7@sparrow.telecommunity.com>
	<49D66E7B.9080304@v.loewis.de>
Message-ID: <bbaeab100904031637l700f4212p3e41b215a485cce9@mail.gmail.com>

On Fri, Apr 3, 2009 at 13:15, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> > Note that there is no such thing as a "defining namespace package" --
> > namespace package contents are symmetrical peers.
>
> With the PEP, a "defining package" becomes possible - at most one
> portion can define an __init__.py.
>
> I know that the current mechanisms don't support it, and it might
> not be useful in general, but now there is a clean way of doing it,
> so I wouldn't exclude it. Distribution-wise, all distributions
> relying on the defining package would need to require (or
> install_require, or depend on) it.
>
> > The above are also true for using only a '*' in .pkg files -- in that
> > event there are no sys.path changes.  (Frankly, I'm doubtful that
> > anybody is using extend_path and .pkg files to begin with, so I'd be
> > fine with a proposal that instead used something like '.nsp' files that
> > didn't even need to be opened and read -- which would let the directory
> > scan stop at the first .nsp file found.)
>
> That would work for me as well. Nobody at PyCon could remember where
> .pkg files came from.
>
> > I believe the PEP does this as well, IIUC.
>
> Correct.
>
> >> * It's possible to have a defining package dir and add-on package
> >> dirs.
> >
> > Also possible in the PEP, although the __init__.py must be in the first
> > such directory on sys.path.
>
> I should make it clear that this is not the case. I envision it to work
> this way: import zope
> - searches sys.path, until finding either a directory zope, or a file
>  zope.{py,pyc,pyd,...}
> - if it is a directory, it checks for .pkg files. If it finds any,
>  it processes them, extending __path__.
> - it *then* checks for __init__.py, taking the first hit anywhere
>  on __path__ (just like any module import would)

Just so people know, one way this __init__ search could be done, such that
__path__ is already set from the .pkg files, is to treat it as a reload
(assuming .pkg files can only be found off of sys.path).

-Brett

> - if no .pkg was found, nor an __init__.py, it proceeds with the next
>  sys.path item (skipping the directory entirely)
>

From lists at janc.be  Sat Apr  4 01:59:38 2009
From: lists at janc.be (Jan Claeys)
Date: Sat, 04 Apr 2009 01:59:38 +0200
Subject: [Python-Dev] And the winner is...
In-Reply-To: <ca471dc20903301954g435309dex26c8117e9316355f@mail.gmail.com>
References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com>
	<ca471dc20903301954g435309dex26c8117e9316355f@mail.gmail.com>
Message-ID: <1238803178.5360.389.camel@saeko.local>

Op maandag 30-03-2009 om 21:54 uur [tijdzone -0500], schreef Guido van
Rossum:
> But is his humility enough to cancel out Linus's attitude?

I hope not, or the /.-crowd would become desperate...   ;-)

-- 
Jan Claeys

From alexandre at peadrop.com  Sat Apr  4 01:59:44 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Fri, 3 Apr 2009 19:59:44 -0400
Subject: [Python-Dev] Should the io-c modules be put in their own
	directory?
In-Reply-To: <1afaf6160904031412n6b7415acxcc9e85677f54981e@mail.gmail.com>
References: <acd65fa20904022110p2cf46105n4061286bc249d55e@mail.gmail.com> 
	<loom.20090403T085522-602@post.gmane.org>
	<1afaf6160904031412n6b7415acxcc9e85677f54981e@mail.gmail.com>
Message-ID: <acd65fa20904031659w5f76d469w33f389b716a94bb5@mail.gmail.com>

On Fri, Apr 3, 2009 at 5:12 PM, Benjamin Peterson <benjamin at python.org> wrote:
> I'm +.2. This is the layout I would suggest:
>
> Modules/
>   _io/
>      _io.c
>      stringio.c
>      textio.c
>      etc....
>

That seems good to me. I opened an issue on the tracker and included a patch.

http://bugs.python.org/issue5682

-- Alexandre

From brett at python.org  Sat Apr  4 02:28:59 2009
From: brett at python.org (Brett Cannon)
Date: Fri, 3 Apr 2009 17:28:59 -0700
Subject: [Python-Dev] Going "offline" for three months
Message-ID: <bbaeab100904031728p30c9583ey92d762feef30c087@mail.gmail.com>

In order to hunker down and get my thesis proposal done by its due date, I
am disabling mail delivery for myself for all mail.python.org mailing lists
for three months (sans python-committers so I don't accidentally commit when
I shouldn't). If something comes up I should know about you can always email
or IM me directly.

See you all on July 1. Here is to hoping I don't suffer any withdrawal.

-Brett

From ncoghlan at gmail.com  Sat Apr  4 03:54:51 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 04 Apr 2009 11:54:51 +1000
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <79990c6b0904031056m32a85ea5yea16dd6c4540dfb1@mail.gmail.com>
References: <49D5FBE6.6090807@avl.com>
	<49D63C99.6000302@v.loewis.de>	<79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com>	<Pine.LNX.4.64.0904031329390.26362@kimball.webabinitio.net>
	<79990c6b0904031056m32a85ea5yea16dd6c4540dfb1@mail.gmail.com>
Message-ID: <49D6BDEB.3050505@gmail.com>

Paul Moore wrote:
> 2009/4/3 R. David Murray <rdmurray at bitdance.com>:
>> a == b
>>
>> So, python calls a.__eq__(b)
>>
>> Now, that function does:
>>
>> a.key == b
>>
>> Since b is an object with an __eq__ method, python calls
>> b.__eq__(a.key).
> 
> That's the bit I can't actually find documented anywhere.

It doesn't quite work the way RDM described it - he missed a step.

a == b

So, python calls a.__eq__(b)

Now, that function does:

a.key == b

which first calls a.key.__eq__(b) # This step was missing

Since str has no idea what an Element is, that returns NotImplemented.

Since __eq__ is defined as being commutative, the interpreter then tries
b.__eq__(a.key).

That function does:

b.key == a.key

which calls b.key.__eq__(a.key)

which is a well defined string comparison and returns the expected answer.
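
The sequence can be watched directly with a small trace. This is a sketch: the Element class below is a stand-in modelled on the one discussed in this thread, and the calls list is added purely for illustration.

```python
calls = []  # trace of (receiver's key, type name of the other operand)


class Element:
    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        calls.append((self.key, type(other).__name__))
        # str.__eq__ returns NotImplemented for an Element, so the
        # interpreter falls back to other.__eq__(self.key).
        return self.key == other

    def __hash__(self):
        return hash(self.key)


a, b = Element("foo"), Element("foo")
result = (a == b)
```

After running this, calls holds two entries: the first for a.__eq__(b), the second for the reflected b.__eq__("foo"). The str comparison in between leaves no trace because it simply returns NotImplemented.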

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From martin at v.loewis.de  Sat Apr  4 04:07:34 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 04 Apr 2009 04:07:34 +0200
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <acd65fa20904031553w5f0b1d18k7e513a45cb1871ca@mail.gmail.com>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>
	<acd65fa20904031553w5f0b1d18k7e513a45cb1871ca@mail.gmail.com>
Message-ID: <49D6C0E6.5050404@v.loewis.de>

Alexandre Vassalotti wrote:
> On Tue, Mar 31, 2009 at 11:25 PM, Guido van Rossum <guido at python.org> wrote:
>> Well hold on for a minute, I remember we used to have an exec
>> statement in a class body in the standard library, to define some file
>> methods in socket.py IIRC.
> 
> FYI, collections.namedtuple is also implemented using exec.

Ah, but it uses "exec ... in ...". That is much safer than an
unqualified exec (where the issue is what namespace it executes in,
and, consequently, what early binding is possible).

The patch bans only unqualified exec, IIUC.
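
As a sketch of the qualified form, here is a tiny factory that executes generated source in a fresh dictionary. It is written with the Python 3 function spelling, exec(source, namespace), which corresponds to "exec source in namespace" in 2.x; make_adder is a made-up example and not how namedtuple itself is written.

```python
def make_adder(n):
    """Build a function from generated source, executed in its own dict."""
    source = "def add(x):\n    return x + %d\n" % n
    namespace = {}
    # The explicit namespace means the generated code cannot create or
    # rebind names in the caller's scope.
    exec(source, namespace)
    return namespace["add"]


add_three = make_adder(3)
```

Because the compiler can see that nothing is executed in the enclosing scope, it can keep resolving that scope's names early.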

Regards,
Martin

From martin at v.loewis.de  Sat Apr  4 04:12:28 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 04 Apr 2009 04:12:28 +0200
Subject: [Python-Dev] Integrate BeautifulSoup into stdlib?
In-Reply-To: <1238798174.5360.388.camel@saeko.local>
References: <fb6fbf560903081251m4219b117l63c6bcb71e199be6@mail.gmail.com>	<49BA3154.8080408@simplistix.co.uk>	<49BAA596.5020106@v.loewis.de>	<49C79C1A.8040301@simplistix.co.uk>	<49C7FC85.5000809@v.loewis.de>	<49C80FA0.4020800@simplistix.co.uk>	<87ab7bh5fb.fsf@xemacs.org>	<49C87004.2030807@holdenweb.com>
	<49C88503.2030902@v.loewis.de>	<49C886EF.80203@v.loewis.de>
	<49C8C9B3.3070403@holdenweb.com>	<49C939BA.8040206@v.loewis.de>
	<1238798174.5360.388.camel@saeko.local>
Message-ID: <49D6C20C.8030102@v.loewis.de>

> That's not entirely true; Cygwin comes with a package management tool
> that probably could be used to set up a repository of python packages
> for native Windows: <http://sources.redhat.com/cygwin-apps/setup.html>

Ah, ok. It has the big disadvantage of not being Microsoft-endorsed,
though. In that sense, it feels very much like easy_install (which also
does dependencies).

Regards,
Martin

From ben+python at benfinney.id.au  Sat Apr  4 04:33:39 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Sat, 04 Apr 2009 13:33:39 +1100
Subject: [Python-Dev] Going "offline" for three months
References: <bbaeab100904031728p30c9583ey92d762feef30c087@mail.gmail.com>
Message-ID: <87hc15cflo.fsf@benfinney.id.au>

Brett Cannon <brett at python.org> writes:

> See you all on July 1. Here is to hoping I don't suffer any
> withdrawal.

Ouch. Best of luck to you!

-- 
 \         "Giving every man a vote has no more made men wise and free |
  `\          than Christianity has made them good." -- Henry L. Mencken |
_o__)                                                                  |
Ben Finney

From python at rcn.com  Sat Apr  4 04:37:38 2009
From: python at rcn.com (Raymond Hettinger)
Date: Fri, 3 Apr 2009 19:37:38 -0700
Subject: [Python-Dev] Getting values stored inside sets
References: <49D5FBE6.6090807@avl.com><49D63C99.6000302@v.loewis.de>	<79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com>	<Pine.LNX.4.64.0904031329390.26362@kimball.webabinitio.net><79990c6b0904031056m32a85ea5yea16dd6c4540dfb1@mail.gmail.com>
	<49D6BDEB.3050505@gmail.com>
Message-ID: <420CC0DE254142398B721C0067FC9403@RaymondLaptop1>

[Nick Coghlan]
> It doesn't quite work the way RDM described it - he missed a step.

Thanks for the clarification.  We ought to write out the process somewhere in a FAQ.

It may also be instructive to step through the recipe that answers the OP's
original request, http://code.activestate.com/recipes/499299/ 

The call "get_equivalent(set([1, 2, 3]), 2.0)" wraps the 2.0 in a new
object t and calls "t in set([1,2,3])".  The set.__contains__ method
hashes t using t.__hash__(self) and checks for an exact match 
using t.__eq__(other).  Both calls delegate to float objects but the 
latter also records the "other" that resulted in a successful equality
test (i.e. 2 is the member of the set that matched the 2.0).  The
get_equivalent call then returns the matching value, 2.

As far as I can tell, the technique is completely generic and lets
you reach inside any function or container to retrieve the "other"
value that is equivalent to "self".
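
A minimal sketch of the technique follows. It is a Python 3 adaptation written for this note rather than a copy of the recipe: the wrapper class name is made up, and __hash__ is spelled out explicitly because Python 3 does not look up special methods through __getattr__.

```python
class CaptureEq:
    """Wrapper that remembers the "other" of a successful equality test."""

    def __init__(self, obj):
        self.obj = obj
        self.match = obj

    def __eq__(self, other):
        result = (self.obj == other)
        if result:
            self.match = other  # record what we compared equal to
        return result

    def __hash__(self):
        return hash(self.obj)   # hash like the wrapped object


def get_equivalent(container, item, default=None):
    """Return the member of container equal to item, else default."""
    t = CaptureEq(item)
    if t in container:
        return t.match
    return default
```

With this sketch, get_equivalent({1, 2, 3}, 2.0) hands back the int 2 stored in the set, and a lookup that finds nothing returns the default.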

Raymond

From fuzzyman at voidspace.org.uk  Sat Apr  4 04:51:45 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 04 Apr 2009 03:51:45 +0100
Subject: [Python-Dev] Going "offline" for three months
In-Reply-To: <bbaeab100904031728p30c9583ey92d762feef30c087@mail.gmail.com>
References: <bbaeab100904031728p30c9583ey92d762feef30c087@mail.gmail.com>
Message-ID: <49D6CB41.5030608@voidspace.org.uk>

Brett Cannon wrote:
> In order to hunker down and get my thesis proposal done by its due 
> date, I am disabling mail delivery for myself for all mail.python.org 
> <http://mail.python.org> mailing lists for three months (sans 
> python-committers so I don't accidentally commit when I shouldn't). If 
> something comes up I should know about you can always email or IM me 
> directly.
>
> See you all on July 1. Here is to hoping I don't suffer any withdrawal.

We'll miss you. Hope you don't end up preferring Java. ;-)

Michael
>
> -Brett
> ------------------------------------------------------------------------
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk
>   

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From ctb at msu.edu  Sat Apr  4 04:55:34 2009
From: ctb at msu.edu (C. Titus Brown)
Date: Fri, 3 Apr 2009 19:55:34 -0700
Subject: [Python-Dev] core python tests (was: Re:  PyDict_SetItem hook)
In-Reply-To: <49D64ECB.9040100@voidspace.org.uk>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<loom.20090403T092111-554@post.gmane.org>
	<43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com>
	<49D64748.70305@voidspace.org.uk>
	<43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com>
	<49D64ECB.9040100@voidspace.org.uk>
Message-ID: <20090404025534.GA12996@idyll.org>

On Fri, Apr 03, 2009 at 07:00:43PM +0100, Michael Foord wrote:
-> Collin Winter wrote:
-> >On Fri, Apr 3, 2009 at 10:28 AM, Michael Foord
-> ><fuzzyman at voidspace.org.uk> wrote:
-> >  
-> >>Collin Winter wrote:
-> >>    
-> >>>As part of the common standard library and test suite that we agreed
-> >>>on at the PyCon language summit last week, we're going to include a
-> >>>common benchmark suite that all Python implementations can share. This
-> >>>is still some months off, though, so there'll be plenty of time to
-> >>>bikeshed^Wrationally discuss which benchmarks should go in there.
-> >>>
-> >>>      
-> >>Where is the right place for us to discuss this common benchmark and test
-> >>suite?
-> >>
-> >>As the benchmark is developed I would like to ensure it can run on
-> >>IronPython.
-> >>
-> >>The test suite changes will need some discussion as well - Jython and
-> >>IronPython (and probably PyPy) have almost identical changes to tests that
-> >>currently rely on deterministic finalisation (reference counting) so it
-> >>makes sense to test changes on both platforms and commit a single 
-> >>solution.
-> >>    
-> >
-> >I believe Brett Cannon is the best person to talk to about this kind
-> >of thing. I don't know that any common mailing list has been set up,
-> >though there may be and Brett just hasn't told anyone yet :)
-> >
-> >Collin
-> >  
-> Which begs the question of whether we *should* have a separate mailing list.
-> 
-> I don't think we discussed this specific point in the language summit - 
-> although it makes sense. Should we have a list specifically for the test 
-> / benchmarking or would a more general implementations-sig be appropriate?
-> 
-> And is it really Brett who sets up mailing lists? My understanding is 
-> that he is pulling out of stuff for a while anyway, so that he can do 
-> Java / PhD type things... ;-)

'tis a sad loss for both Python-dev and the academic community...

I vote for a separate mailing list -- 'python-tests'? -- but I don't
know exactly how splintered to make the conversation.  It probably
belongs at python.org but if you want me to host it, I can.

N.B. There are a bunch of GSoC projects to work on or with the CPython
test framework (increase test coverage, write plugins to make it
runnable in nose or py.test, etc.).  I don't know that the students
should be active participants in such a list, but the mentors should at
least try to stay in the loop so that we don't completely waste our time.

cheers,
--titus
-- 
C. Titus Brown, ctb at msu.edu

From brett at python.org  Sat Apr  4 04:56:11 2009
From: brett at python.org (Brett Cannon)
Date: Fri, 3 Apr 2009 19:56:11 -0700
Subject: [Python-Dev] Going "offline" for three months
In-Reply-To: <49D6CB41.5030608@voidspace.org.uk>
References: <bbaeab100904031728p30c9583ey92d762feef30c087@mail.gmail.com> 
	<49D6CB41.5030608@voidspace.org.uk>
Message-ID: <bbaeab100904031956x70cd0aeap26e774611cdaead6@mail.gmail.com>

On Fri, Apr 3, 2009 at 19:51, Michael Foord <fuzzyman at voidspace.org.uk> wrote:

> Brett Cannon wrote:
>
>> In order to hunker down and get my thesis proposal done by its due date, I
>> am disabling mail delivery for myself for all mail.python.org <
>> http://mail.python.org> mailing lists for three months (sans
>> python-committers so I don't accidentally commit when I shouldn't). If
>> something comes up I should know about you can always email or IM me
>> directly.
>>
>> See you all on July 1. Here is to hoping I don't suffer any withdrawal.
>>
>
> We'll miss you. Hope you don't end up preferring Java. ;-)

No, it would be more like JavaScript, but I don't see that happening either.

-Brett

From martin at v.loewis.de  Sat Apr  4 07:03:40 2009
From: martin at v.loewis.de (=?windows-1252?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 04 Apr 2009 07:03:40 +0200
Subject: [Python-Dev] Should the io-c modules be put in their own
	directory?
In-Reply-To: <ca471dc20904031432p1ea51db8sbb2b1467970b188d@mail.gmail.com>
References: <acd65fa20904022110p2cf46105n4061286bc249d55e@mail.gmail.com>
	<loom.20090403T085522-602@post.gmane.org>
	<49D63ABD.30000@v.loewis.de>
	<1afaf6160904031415y3a62c7b1u90b674d2dc3ed28c@mail.gmail.com>
	<ca471dc20904031432p1ea51db8sbb2b1467970b188d@mail.gmail.com>
Message-ID: <49D6EA2C.6080103@v.loewis.de>

> I think Benjamin is right. While most of the C source is indeed
> exactly one level below the root, there's plenty of code that isn't,
> e.g. _ctypes, cjkcodecs, expat, _multiprocessing, zlib. And even
> Objects/stringlib.

It's fine with me either way.

Martin

From ncoghlan at gmail.com  Sat Apr  4 07:16:23 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 04 Apr 2009 15:16:23 +1000
Subject: [Python-Dev] core python tests
In-Reply-To: <20090404025534.GA12996@idyll.org>
References: <49D3F8D0.8070805@wingware.com>	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>	<49D42013.3010600@wingware.com>	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>	<loom.20090403T092111-554@post.gmane.org>	<43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com>	<49D64748.70305@voidspace.org.uk>	<43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com>	<49D64ECB.9040100@voidspace.org.uk>
	<20090404025534.GA12996@idyll.org>
Message-ID: <49D6ED27.8030908@gmail.com>

C. Titus Brown wrote:
> I vote for a separate mailing list -- 'python-tests'? -- but I don't
> know exactly how splintered to make the conversation.  It probably
> belongs at python.org but if you want me to host it, I can.

If too many things get moved off to SIGs there won't be anything left
for python-dev to talk about ;)

(Although in this case it makes sense, as I expect there will be
developers involved in alternate implementations that would like to be
part of the test suite discussion without having to sign up for the rest
of python-dev)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Sat Apr  4 07:54:23 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 04 Apr 2009 15:54:23 +1000
Subject: [Python-Dev] Documenting the Py3k coercion rules (was Re: Getting
 values stored inside sets)
In-Reply-To: <420CC0DE254142398B721C0067FC9403@RaymondLaptop1>
References: <49D5FBE6.6090807@avl.com><49D63C99.6000302@v.loewis.de>	<79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com>	<Pine.LNX.4.64.0904031329390.26362@kimball.webabinitio.net><79990c6b0904031056m32a85ea5yea16dd6c4540dfb1@mail.gmail.com>
	<49D6BDEB.3050505@gmail.com>
	<420CC0DE254142398B721C0067FC9403@RaymondLaptop1>
Message-ID: <49D6F60F.8090909@gmail.com>

Raymond Hettinger wrote:
> 
> [Nick Coghlan]
>> It doesn't quite work the way RDM described it - he missed a step.
> 
> Thanks for the clarification.  We ought to write out the process
> somewhere in a FAQ.

The closest we currently have to that is the write-up of the coercion
rules in 2.x:

http://docs.python.org/reference/datamodel.html#id5

Unfortunately, that mixes in a lot of CPython specific special cases
along with the old coerce() builtin that obscure the basic behaviour for
__op__ and __rop__ pairs.

Here's an initial stab at a write-up of the coercion rules for Py3k that
is accurate without getting too CPython specific:

"""
Given  "a OP b", the coercion sequence is:

1. Try "a.__op__(b)"
2. If "a.__op__" doesn't exist or the call returns NotImplemented, try
"b.__rop__(a)"
3. If "b.__rop__" doesn't exist or the call returns NotImplemented,
raise TypeError identifying "type(a)" and "type(b)" as unsupported
operands for OP
4. If step 1 or 2 is successful, then the result of the call is the
value of the expression

Given "a OP= b" the coercion sequence is:

1. Try "a = a.__iop__(b)"
2. If "a.__iop__" doesn't exist or the call returns NotImplemented, try
"a = a OP b" using the normal binary coercion rules above

Special cases:

- if "type(b)" is a strict subclass of "type(a)", then "b.__rop__" is
tried before "a.__op__". This allows subclasses to ensure an instance of
the subclass is returned when interacting with instances of the parent
class.

- rich comparisons are associated into __op__/__rop__ pairs as follows:
  __eq__/__eq__ (i.e. a == b is considered equivalent to b == a)
  __ne__/__ne__ (i.e. a != b is considered equivalent to b != a)
  __lt__/__gt__ (i.e. a < b is considered equivalent to b > a)
  __le__/__ge__ (i.e. a <= b is considered equivalent to b >= a)

- __rpow__ is never invoked for the 3 argument form of pow(), as the
coercion rules only apply to binary operations. In this case, a
NotImplemented return from the call to __pow__ is converted immediately
into a TypeError.
"""
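
The draft above can be checked with a small tracing sketch.  The classes
A, B, SubA and Plain and the trace() helper are made up for this
illustration; only the special-method names come from the data model.
It covers the __op__/__rop__ fallback, the subclass special case, the
in-place fallback and the __lt__/__gt__ pairing:

```python
import operator

calls = []  # records which special methods run, in order


class A:
    def __add__(self, other):
        calls.append("A.__add__")
        return NotImplemented          # step 1 declines, so step 2 is tried

    def __radd__(self, other):
        calls.append("A.__radd__")
        return NotImplemented

    def __lt__(self, other):
        calls.append("A.__lt__")
        return NotImplemented


class B:
    def __radd__(self, other):
        calls.append("B.__radd__")
        return "B.__radd__ result"

    def __gt__(self, other):
        calls.append("B.__gt__")
        return True


class SubA(A):
    def __radd__(self, other):         # overrides the parent's __radd__
        calls.append("SubA.__radd__")
        return "SubA.__radd__ result"


class Plain:
    pass


def trace(thunk):
    """Run thunk() and return (result, list of special methods called)."""
    del calls[:]
    result = thunk()
    return result, list(calls)


print(trace(lambda: A() + B()))                 # __add__, then __radd__
print(trace(lambda: A() + SubA()))              # subclass __radd__ goes first
print(trace(lambda: operator.iadd(A(), B())))   # no __iadd__: binary rules
print(trace(lambda: A() < B()))                 # __lt__, then reflected __gt__

try:
    A() + Plain()                               # neither side can handle it
except TypeError as exc:
    print("TypeError:", exc)
```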

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net  Sat Apr  4 13:04:28 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 4 Apr 2009 11:04:28 +0000 (UTC)
Subject: [Python-Dev] core python tests
References: <49D3F8D0.8070805@wingware.com>	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>	<49D42013.3010600@wingware.com>	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>	<loom.20090403T092111-554@post.gmane.org>	<43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com>	<49D64748.70305@voidspace.org.uk>	<43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com>	<49D64ECB.9040100@voidspace.org.uk>
	<20090404025534.GA12996@idyll.org> <49D6ED27.8030908@gmail.com>
Message-ID: <loom.20090404T110332-698@post.gmane.org>

Nick Coghlan <ncoghlan <at> gmail.com> writes:
> 
> C. Titus Brown wrote:
> > I vote for a separate mailing list -- 'python-tests'? -- but I don't
> > know exactly how splintered to make the conversation.  It probably
> > belongs at python.org but if you want me to host it, I can.
> 
> If too many things get moved off to SIGs there won't be anything left
> for python-dev to talk about ;)

There is already an stdlib-sig, which has been almost unused.

Regards

Antoine.

From aahz at pythoncraft.com  Sat Apr  4 15:28:01 2009
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 4 Apr 2009 06:28:01 -0700
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <43aa6ff70904031018s6b97a76at545ce2a8f17916c9@mail.gmail.com>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<loom.20090403T092111-554@post.gmane.org>
	<9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com>
	<loom.20090403T161126-31@post.gmane.org>
	<43aa6ff70904031018s6b97a76at545ce2a8f17916c9@mail.gmail.com>
Message-ID: <20090404132800.GA10257@panix.com>

On Fri, Apr 03, 2009, Collin Winter wrote:
>
> I don't believe that these are insurmountable problems, though. A
> great contribution to Python performance work would be an improved
> version of PyBench that corrects these problems and offers more
> precise measurements. Is that something you might be interested in
> contributing to? As performance moves more into the wider
> consciousness, having good tools will become increasingly important.

GSoC work?
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it."  --Brian W. Kernighan

From ctb at msu.edu  Sat Apr  4 16:26:12 2009
From: ctb at msu.edu (C. Titus Brown)
Date: Sat, 4 Apr 2009 07:26:12 -0700
Subject: [Python-Dev] GSoC (was Re:  PyDict_SetItem hook)
In-Reply-To: <20090404132800.GA10257@panix.com>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<loom.20090403T092111-554@post.gmane.org>
	<9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com>
	<loom.20090403T161126-31@post.gmane.org>
	<43aa6ff70904031018s6b97a76at545ce2a8f17916c9@mail.gmail.com>
	<20090404132800.GA10257@panix.com>
Message-ID: <20090404142612.GG12593@idyll.org>

On Sat, Apr 04, 2009 at 06:28:01AM -0700, Aahz wrote:
-> On Fri, Apr 03, 2009, Collin Winter wrote:
-> >
-> > I don't believe that these are insurmountable problems, though. A
-> > great contribution to Python performance work would be an improved
-> > version of PyBench that corrects these problems and offers more
-> > precise measurements. Is that something you might be interested in
-> > contributing to? As performance moves more into the wider
-> > consciousness, having good tools will become increasingly important.
-> 
-> GSoC work?

Alas, it's too late to submit new proposals; the deadline was
yesterday.

The next "Google gives us money to wrangle students into doing
development" project will probably be GHOP for highschool students, in
the winter, although it has not been announced and may not happen.

cheers,
--titus
-- 
C. Titus Brown, ctb at msu.edu

From fuzzyman at voidspace.org.uk  Sat Apr  4 16:33:49 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 04 Apr 2009 15:33:49 +0100
Subject: [Python-Dev] core python tests
In-Reply-To: <loom.20090404T110332-698@post.gmane.org>
References: <49D3F8D0.8070805@wingware.com>	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>	<49D42013.3010600@wingware.com>	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>	<loom.20090403T092111-554@post.gmane.org>	<43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com>	<49D64748.70305@voidspace.org.uk>	<43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com>	<49D64ECB.9040100@voidspace.org.uk>	<20090404025534.GA12996@idyll.org>
	<49D6ED27.8030908@gmail.com>
	<loom.20090404T110332-698@post.gmane.org>
Message-ID: <49D76FCD.8050303@voidspace.org.uk>

Antoine Pitrou wrote:
> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>   
>> C. Titus Brown wrote:
>>     
>>> I vote for a separate mailing list -- 'python-tests'? -- but I don't
>>> know exactly how splintered to make the conversation.  It probably
>>> belongs at python.org but if you want me to host it, I can.
>>>       
>> If too many things get moved off to SIGs there won't be anything left
>> for python-dev to talk about ;)
>>     
>
> There is already an stdlib-sig, which has been almost unused.
>
>   
stdlib-sig isn't *quite* right (the testing and benchmarking are as much 
about core python as the stdlib) - although we could view the benchmarks 
and tests themselves as part of the standard library...

Either way we should get it underway. Collin and Jeffrey - happy to use 
stdlib-sig?

Michael

> Regards
>
> Antoine.

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From ctb at msu.edu  Sat Apr  4 17:01:11 2009
From: ctb at msu.edu (C. Titus Brown)
Date: Sat, 4 Apr 2009 08:01:11 -0700
Subject: [Python-Dev] graphics maths types in python core?
Message-ID: <20090404150111.GQ12593@idyll.org>

Hi all,

we're having a discussion over on the GSoC mailing list about basic
math types, and I was wondering if there is any history that we should
be aware of in python-dev.  Has this been brought up before and
rejected?  Should the interested projects work towards a consensus and
maybe write up a PEP?

(The proximal issue is whether or not this is of direct relevance to the
python core and hence should be given priority.)

tnx,
-titus

Rene Dudfield wrote:

-> there's seven graphics math type proposals which would be a good project
-> for the graphics python using projects -- especially if they can get
-> into python.
->
-> It would be great if one of these proposals was accepted to work towards
-> getting these simple types into python.
->
-> Otherwise we'll be doomed to have each project implement vec2, vec3,
-> vec4, matrix3/4, quaternion (which has already happened many times) -
-> and continue to have interoperability issues.
->
-> The reason for just these basic types, and not full-blown numpy, is that
-> numpy is not planned to ever get into python.  Numpy doesn't want to tie
-> its development to Python's development cycle, whereas a small set of
-> types can be implemented and stabilised for python more easily.
->
-> Also, it's not image, or 3d format types -- since those are also a way
-> larger project.

-- 
C. Titus Brown, ctb at msu.edu

From solipsis at pitrou.net  Sat Apr  4 17:09:39 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 4 Apr 2009 15:09:39 +0000 (UTC)
Subject: [Python-Dev] graphics maths types in python core?
References: <20090404150111.GQ12593@idyll.org>
Message-ID: <loom.20090404T150517-197@post.gmane.org>

C. Titus Brown <ctb <at> msu.edu> writes:
> 
> we're having a discussion over on the GSoC mailing list about basic
> math types
> 
[...]
> ->
> -> Otherwise we'll be doomed to have each project implement vec2, vec3,
> -> vec4, matrix3/4, quaternion (which has already happened many times) -
> -> and continue to have interoperability issues.

This interoperability problem is the very reason the new buffer API and
memoryview object were devised by Travis Oliphant (who is, AFAIK, a numpy
contributor). Unfortunately, Travis disappeared and left us with an unfinished
implementation which doesn't support anything else than linear byte buffers.

So, rather than trying to stuff new specialized datatypes into Python, I suggest
maths types proponents contribute the missing bits of the new buffer API and
memoryview object :-)
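
As a small illustration of the case that does work, the sketch below
shares a linear byte buffer between a bytearray and a memoryview without
copying (Python 3 syntax; the variable names are arbitrary):

```python
data = bytearray(b"abcdef")
view = memoryview(data)      # no copy: view and data share the same memory
window = view[2:4]           # slicing a memoryview does not copy either

window[0] = ord("X")         # writes through to the underlying bytearray

print(bytes(data))           # the change is visible in the original object
print(window.tobytes())      # an explicit copy of just the window
```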

Regards

Antoine.

From aahz at pythoncraft.com  Sat Apr  4 17:40:50 2009
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 4 Apr 2009 08:40:50 -0700
Subject: [Python-Dev] Mercurial?
Message-ID: <20090404154049.GA23987@panix.com>

With Brett's (hopefully temporary!) absence, who is spearheading the
Mercurial conversion?  Whoever it is should probably take over PEP 374
and start updating it with the conversion plan, particularly WRT
expectations for dates relative to 3.1 final and 2.7 final.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it."  --Brian W. Kernighan

From mario.danic at gmail.com  Sat Apr  4 17:46:02 2009
From: mario.danic at gmail.com (Mario)
Date: Sat, 4 Apr 2009 17:46:02 +0200
Subject: [Python-Dev] Helper Python core development tools
Message-ID: <79957db20904040846k6ea1c392lcc9393899aa77352@mail.gmail.com>

With all the sand and sun on the beaches, should I really be doing this now?

That is the question we probably ask ourselves every time we have to do some
boring task. What kind of things do you think could be made better? What would
make your workflow smoother and more fun? Now is your chance to voice your
opinion.

http://wiki.python.org/moin/CoreDevHelperTools

Some of the tools/extensions categories that could be relevant:

    - Wrappers for working with tracker issues
    - Wrapper for managing patches
    - Wrapper for running tests
    - Wrapper for submitting diffs for review
    - Commit helpers (various hooks)
    - Various Roundup extensions

Please be invited to comment and raise your concerns, so we could discuss
them together and make our hacker's life more enjoyable.

My name is Mario Đanić, a hopeful GSoC student, and I am looking forward to
working with you. Thank you for your time and your help in this matter.

From collinw at gmail.com  Sat Apr  4 18:52:11 2009
From: collinw at gmail.com (Collin Winter)
Date: Sat, 4 Apr 2009 09:52:11 -0700
Subject: [Python-Dev] core python tests
In-Reply-To: <49D76FCD.8050303@voidspace.org.uk>
References: <49D3F8D0.8070805@wingware.com>
	<loom.20090403T092111-554@post.gmane.org>
	<43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com>
	<49D64748.70305@voidspace.org.uk>
	<43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com>
	<49D64ECB.9040100@voidspace.org.uk> <20090404025534.GA12996@idyll.org>
	<49D6ED27.8030908@gmail.com> <loom.20090404T110332-698@post.gmane.org>
	<49D76FCD.8050303@voidspace.org.uk>
Message-ID: <43aa6ff70904040952g4aece85ajfceac04b7d857194@mail.gmail.com>

On Sat, Apr 4, 2009 at 7:33 AM, Michael Foord <fuzzyman at voidspace.org.uk> wrote:
> Antoine Pitrou wrote:
>>
>> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>>
>>>
>>> C. Titus Brown wrote:
>>>
>>>>
>>>> I vote for a separate mailing list -- 'python-tests'? -- but I don't
>>>> know exactly how splintered to make the conversation.  It probably
>>>> belongs at python.org but if you want me to host it, I can.
>>>>
>>>
>>> If too many things get moved off to SIGs there won't be anything left
>>> for python-dev to talk about ;)
>>>
>>
>> There is already an stdlib-sig, which has been almost unused.
>>
>>
>
> stdlib-sig isn't *quite* right (the testing and benchmarking are as much
> about core python as the stdlib) - although we could view the benchmarks and
> tests themselves as part of the standard library...
>
> Either way we should get it underway. Collin and Jeffrey - happy to use
> stdlib-sig?

Works for me.

Collin

From guido at python.org  Sat Apr  4 20:20:19 2009
From: guido at python.org (Guido van Rossum)
Date: Sat, 4 Apr 2009 11:20:19 -0700
Subject: [Python-Dev] graphics maths types in python core?
In-Reply-To: <loom.20090404T150517-197@post.gmane.org>
References: <20090404150111.GQ12593@idyll.org>
	<loom.20090404T150517-197@post.gmane.org>
Message-ID: <ca471dc20904041120y4d76a31cr9cd26f4bb73e4e95@mail.gmail.com>

I'm not even sure what you mean by "basic math types" (it would
probably depend on which math curriculum you are using :-) but if
you're not already aware of PEP 3141, that's where to start.

--Guido

On Sat, Apr 4, 2009 at 8:09 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> C. Titus Brown <ctb <at> msu.edu> writes:
>>
>> we're having a discussion over on the GSoC mailing list about basic
>> math types
>>
> [...]
>> ->
>> -> Otherwise we'll be doomed to have each project implement vec2, vec3,
>> -> vec4, matrix3/4, quaternion (which has already happened many times) -
>> -> and continue to have interoperability issues.
>
> This interoperability problem is the very reason the new buffer API and
> memoryview object were devised by Travis Oliphant (who is, AFAIK, a numpy
> contributor). Unfortunately, Travis disappeared and left us with an unfinished
> implementation which doesn't support anything else than linear byte buffers.
>
> So, rather than trying to stuff new specialized datatypes into Python, I suggest
> maths types proponents contribute the missing bits of the new buffer API and
> memoryview object :-)
>
> Regards
>
> Antoine.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Sat Apr  4 20:23:30 2009
From: barry at python.org (Barry Warsaw)
Date: Sat, 4 Apr 2009 14:23:30 -0400
Subject: [Python-Dev] Package Management - thoughts from the peanut
	gallery
In-Reply-To: <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>
References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org>
	<94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>
Message-ID: <75F5CB1E-8589-4848-937E-F43F2B82D5F3@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Apr 3, 2009, at 4:01 AM, Tarek Ziadé wrote:

> Each one of this task has a leader, except the one with (*). I just  
> got back
> from travelling, and I will reorganize
> http://wiki.python.org/moin/Distutils asap so it is up-to-date.

I added a link to this from the new SIG page.

http://wiki.python.org/moin/Special%20Interest%20Groups

Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSdelonEjvBPtnXfVAQI0QQP/c0mXr4OA+yLOFHqSksFxT5pkt2xPtxPO
25VfcGFmP0FydsGMW0fpIPC9nw3kaZhtwtx5iYiRXOg796IParSzSdleKwRdabwA
SH+EzhD0gprwyfPEi6Vptb+ORz8if1gz4UPIUBfJaLVGw7eXH0Xue5rqUEksu6MX
wi/MMub9V0g=
=2FHl
-----END PGP SIGNATURE-----

From fuzzyman at voidspace.org.uk  Sat Apr  4 20:37:26 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 04 Apr 2009 19:37:26 +0100
Subject: [Python-Dev] core python tests
In-Reply-To: <5d44f72f0904041130x5f805862t396787b8fbb5ce6f@mail.gmail.com>
References: <49D3F8D0.8070805@wingware.com>	
	<43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com>	
	<49D64748.70305@voidspace.org.uk>	
	<43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com>	
	<49D64ECB.9040100@voidspace.org.uk>
	<20090404025534.GA12996@idyll.org>	 <49D6ED27.8030908@gmail.com>
	<loom.20090404T110332-698@post.gmane.org>	
	<49D76FCD.8050303@voidspace.org.uk>	
	<43aa6ff70904040952g4aece85ajfceac04b7d857194@mail.gmail.com>
	<5d44f72f0904041130x5f805862t396787b8fbb5ce6f@mail.gmail.com>
Message-ID: <49D7A8E6.5030901@voidspace.org.uk>

Jeffrey Yasskin wrote:
> On Sat, Apr 4, 2009 at 9:52 AM, Collin Winter <collinw at gmail.com> wrote:
>   
>> On Sat, Apr 4, 2009 at 7:33 AM, Michael Foord <fuzzyman at voidspace.org.uk> wrote:
>>     
>>> Antoine Pitrou wrote:
>>>       
>>>> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>>>>
>>>>         
>>>>> C. Titus Brown wrote:
>>>>>
>>>>>           
>>>>>> I vote for a separate mailing list -- 'python-tests'? -- but I don't
>>>>>> know exactly how splintered to make the conversation.  It probably
>>>>>> belongs at python.org but if you want me to host it, I can.
>>>>>>
>>>>>>             
>>>>> If too many things get moved off to SIGs there won't be anything left
>>>>> for python-dev to talk about ;)
>>>>>
>>>>>           
>>>> There is already an stdlib-sig, which has been almost unused.
>>>>
>>>>
>>>>         
>>> stdlib-sig isn't *quite* right (the testing and benchmarking are as much
>>> about core python as the stdlib) - although we could view the benchmarks and
>>> tests themselves as part of the standard library...
>>>
>>> Either way we should get it underway. Collin and Jeffrey - happy to use
>>> stdlib-sig?
>>>       
>> Works for me.
>>     
>
> Me too.
>
> bcc python-dev, -> stdlib-sig
>
> First question: Do people want the unladen-swallow performance tests
> in the CPython repository until the whole library gets moved out? If
> so, where? Tools/performance? Lib/test/benchmarks?
>   

I'm +1 on including them (so long as they run under trunk of course) but 
agnostic on location.

Maybe better not in test as it might be expected that a full regrtest 
would then run them?

I'm keeping Python-dev cc'd as it is a Python-dev decision and bcc 
messages require individual admin approval.

Michael

> Jeffrey
>   

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From dirkjan at ochtman.nl  Sat Apr  4 22:37:55 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sat, 04 Apr 2009 22:37:55 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <20090404154049.GA23987@panix.com>
References: <20090404154049.GA23987@panix.com>
Message-ID: <49D7C523.2090605@ochtman.nl>

On 04/04/2009 17:40, Aahz wrote:
> With Brett's (hopefully temporary!) absence, who is spearheading the
> Mercurial conversion?  Whoever it is should probably take over PEP 374
> and start updating it with the conversion plan, particularly WRT
> expectations for dates relative to 3.1 final and 2.7 final.

I'd like to take that on. I know hardly anyone here knows me, but I'm 
one of the Mercurial developers. I've been in contact with Brett, saying 
that I'd gladly provide as much help as I could, and I figured I'd put a lot of 
time in providing the best possible migration path.

While I haven't posted here much, I've been lurking for about two years 
now, so I know a little about what's going on. Maybe I could pair up 
with someone here who wants to work on it, if that makes people more 
confident?

Anyway, I'm also on the tracker-discuss list, since Brett told me that's 
where infra stuff mostly takes place.

Cheers,

Dirkjan

From greg.ewing at canterbury.ac.nz  Sun Apr  5 00:00:36 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 05 Apr 2009 10:00:36 +1200
Subject: [Python-Dev] graphics maths types in python core?
In-Reply-To: <20090404150111.GQ12593@idyll.org>
References: <20090404150111.GQ12593@idyll.org>
Message-ID: <49D7D884.5060801@canterbury.ac.nz>

C. Titus Brown wrote:

> we're having a discussion over on the GSoC mailing list about basic
> math types, and I was wondering if there is any history that we should
> be aware of in python-dev.

Something I've suggested before is to provide a set of
functions for doing elementwise arithmetic operations on
objects that support the new buffer protocol.

Together with a multidimensional version of the standard
array.array type, this would provide a kind of "numpy
lite" that you could use to build reasonably efficient
vector and matrix types with no external dependencies.

By making these functions that operate through the
buffer protocol rather than special types, they would
be much more flexible and interoperate with other
libraries very well.
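
A rough sketch of what one such function might look like, assuming an
interpreter whose memoryview can read and write typed items of a
one-dimensional array.array.  The name elementwise_add is hypothetical,
not an existing stdlib function:

```python
from array import array


def elementwise_add(a, b, out):
    """Hypothetical helper: out[i] = a[i] + b[i] for 1-D buffer objects."""
    va, vb, vout = memoryview(a), memoryview(b), memoryview(out)
    if not (va.format == vb.format == vout.format):
        raise TypeError("buffers must share one item format")
    if not (len(va) == len(vb) == len(vout)):
        raise ValueError("buffers must have the same length")
    for i in range(len(va)):
        # Items are read and written through the buffer, not the array API.
        vout[i] = va[i] + vb[i]
    return out


x = array("d", [1.0, 2.0, 3.0])
y = array("d", [10.0, 20.0, 30.0])
result = elementwise_add(x, y, array("d", [0.0] * 3))
print(result.tolist())
```

Because the function only touches memoryview, any other object that
exports a compatible one-dimensional buffer could be passed in place of
the arrays.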

-- 
Greg

From solipsis at pitrou.net  Sun Apr  5 00:11:55 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 4 Apr 2009 22:11:55 +0000 (UTC)
Subject: [Python-Dev] graphics maths types in python core?
References: <20090404150111.GQ12593@idyll.org>
	<49D7D884.5060801@canterbury.ac.nz>
Message-ID: <loom.20090404T220604-267@post.gmane.org>

Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
> 
> Something I've suggested before is to provide a set of
> functions for doing elementwise arithmetic operations on
> objects that support the new buffer protocol.
> 
> Together with a multidimensional version of the standard
> array.array type, this would provide a kind of "numpy
> lite" that you could use to build reasonably efficient
> vector and matrix types with no external dependencies.

Again, I don't want to spoil the party, but multidimensional buffers are
not implemented, and neither are buffers of anything other than single-byte
data. Interested people should start with this, before jumping to the 
higher-level stuff.

Regards

Antoine.

From brian at sweetapp.com  Sun Apr  5 00:35:32 2009
From: brian at sweetapp.com (brian at sweetapp.com)
Date: Sat, 4 Apr 2009 18:35:32 -0400 (EDT)
Subject: [Python-Dev] Possible py3k io wierdness
Message-ID: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>

Hey,

I noticed that the call pattern of the C-implemented io libraries is as
follows (translating from C to Python):

class _FileIO(object):
  def flush(self):
    if self.__IOBase_closed:
      raise ...

  def close(self):
    self.flush()
    self.__IOBase_closed = True

class _RawIOBase(_FileIO):
  def close(self):
    # do close
    _FileIO.close(self)

This means that, if a subclass overrides flush(), it will be called after
the file has been closed e.g.

>>> import io
>>> class MyIO(io.FileIO):
...     def flush(self):
...             print('closed:', self.closed)
...
>>> f = MyIO('test.out', 'wb')
>>> f.close()
closed: True

It seems to me that, during close, calls should only propagate up the
class hierarchy i.e.

class _FileIO(object):
  def flush(self):
    if self.__IOBase_closed:
      raise ...

  def close(self):
    _FileIO.flush(self)
    self.__IOBase_closed = True
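
The difference between the two dispatch styles can be seen in a small
self-contained model. This is only a sketch: Base, Child, close_dynamic
and close_static are hypothetical names, not the real io classes.

```python
class Base:
    def __init__(self):
        self.closed = False
        self.calls = []

    def flush(self):
        self.calls.append("Base.flush")

    def close_dynamic(self):
        # self.flush() dispatches to the most derived override.
        self.flush()
        self.closed = True

    def close_static(self):
        # Base.flush(self) always runs Base's own implementation.
        Base.flush(self)
        self.closed = True


class Child(Base):
    def flush(self):
        self.calls.append("Child.flush")


dynamic = Child()
dynamic.close_dynamic()

static = Child()
static.close_static()
```

After running this, dynamic.calls holds the subclass override and
static.calls holds only the base-class call.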

I volunteer to change this if there is agreement that this is the way to go.

Cheers,
Brian

From benjamin at python.org  Sun Apr  5 01:13:40 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Sat, 4 Apr 2009 18:13:40 -0500
Subject: [Python-Dev] [RELEASED] Python 3.1 alpha 2
Message-ID: <1afaf6160904041613t4bb44976x65d7a4d4a90f2c47@mail.gmail.com>

On behalf of the Python development team, I'm thrilled to announce the second
alpha release of Python 3.1.

Python 3.1 focuses on the stabilization and optimization of features and changes
Python 3.0 introduced.  For example, the new I/O system has been rewritten in C
for speed.  Other features include an ordered dictionary implementation and
support for ttk Tile in Tkinter.  For a more extensive list of changes in 3.1,
see http://doc.python.org/dev/py3k/whatsnew/3.1.html or Misc/NEWS in the Python
distribution.

Please note that this is an alpha release, and as such is not suitable for
production environments.  We continue to strive for a high degree of quality,
but there are still some known problems and the feature sets have not been
finalized.  This alpha is being released to solicit feedback and hopefully
discover bugs, as well as allowing you to determine how changes in 3.1 might
impact you.  If you find things broken or incorrect, please submit a bug report
at

     http://bugs.python.org

For more information and downloadable distributions, see the Python 3.1 website:

     http://www.python.org/download/releases/3.1/

See PEP 375 for release schedule details:

     http://www.python.org/dev/peps/pep-0375/

Regards,
-- Benjamin

Benjamin Peterson
benjamin at python.org
Release Manager
(on behalf of the entire python-dev team and 3.1's contributors)

From solipsis at pitrou.net  Sun Apr  5 01:23:06 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 4 Apr 2009 23:23:06 +0000 (UTC)
Subject: [Python-Dev] Possible py3k io wierdness
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>
Message-ID: <loom.20090404T231154-979@post.gmane.org>

Hi!

<brian <at> sweetapp.com> writes:
> 
> class _RawIOBase(_FileIO):

FileIO is a subclass of _RawIOBase, not the reverse:

>>> issubclass(_io._RawIOBase, _io.FileIO)
False
>>> issubclass(_io.FileIO, _io._RawIOBase)
True

I do understand your surprise, but the Python implementation of IOBase.close()
in _pyio.py does the same thing:

    def close(self) -> None:
        """Flush and close the IO object.

        This method has no effect if the file is already closed.
        """
        if not self.__closed:
            try:
                self.flush()
            except IOError:
                pass  # If flush() fails, just give up
            self.__closed = True

Note how it calls `self.flush()` and not `IOBase.flush(self)`.
When writing the C version of the I/O stack, we tried to keep the semantics the
same as in the Python version, although there are a couple of subtleties.

Your problem here is that it's IOBase.close() which calls your flush() method,
but FileIO.close() has already done its job before and the internal file
descriptor has been closed (hence `self.closed` is True). In this particular
case, I advocate overriding close() as well and calling your flush() method
manually from there.
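
A minimal sketch of that workaround. The flush_states attribute and the
demo() helper are hypothetical and exist only to record what flush()
observes; the guard in flush() is there because the base-class flush()
raises ValueError on a closed file.

```python
import io
import os
import tempfile


class MyIO(io.FileIO):
    def __init__(self, *args, **kwargs):
        self.flush_states = []          # value of self.closed seen by each flush()
        super().__init__(*args, **kwargs)

    def flush(self):
        self.flush_states.append(self.closed)
        if not self.closed:
            super().flush()

    def close(self):
        # Flush explicitly while the file descriptor is still open,
        # then let FileIO.close() do the actual closing.
        if not self.closed:
            self.flush()
        super().close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "test.out")
    f = MyIO(path, "wb")
    f.write(b"data")
    f.close()
    return f
```

Whatever the base classes do afterwards, the first flush() triggered by
close() is guaranteed to run while the file is still open.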

Thanks for your feedback!

Regards

Antoine.

From steve at holdenweb.com  Sun Apr  5 01:23:32 2009
From: steve at holdenweb.com (Steve Holden)
Date: Sat, 04 Apr 2009 19:23:32 -0400
Subject: [Python-Dev] Integrate BeautifulSoup into stdlib?
In-Reply-To: <49D6C20C.8030102@v.loewis.de>
References: <fb6fbf560903081251m4219b117l63c6bcb71e199be6@mail.gmail.com>	<49BA3154.8080408@simplistix.co.uk>	<49BAA596.5020106@v.loewis.de>	<49C79C1A.8040301@simplistix.co.uk>	<49C7FC85.5000809@v.loewis.de>	<49C80FA0.4020800@simplistix.co.uk>	<87ab7bh5fb.fsf@xemacs.org>	<49C87004.2030807@holdenweb.com>	<49C88503.2030902@v.loewis.de>	<49C886EF.80203@v.loewis.de>	<49C8C9B3.3070403@holdenweb.com>	<49C939BA.8040206@v.loewis.de>	<1238798174.5360.388.camel@saeko.local>
	<49D6C20C.8030102@v.loewis.de>
Message-ID: <gr8q62$pc4$1@ger.gmane.org>

Martin v. Löwis wrote:
>> That's not entirely true; Cygwin comes with a package management tool
>> that probably could be used to set up a repository of python packages
>> for native Windows: <http://sources.redhat.com/cygwin-apps/setup.html>
> 
> Ah, ok. It has the big disadvantage of not being Microsoft-endorsed,
> though. In that sense, it feels very much like easy_install (which also
> does dependencies).
> 
Not only that, but the Cygwin packaging system appears to be extremely
difficult to organize a package for.

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
Holden Web LLC                 http://www.holdenweb.com/
Want to know? Come to PyCon - soon! http://us.pycon.org/

From benjamin at python.org  Sun Apr  5 01:31:45 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Sat, 4 Apr 2009 18:31:45 -0500
Subject: [Python-Dev] 3.1 beta is closer than you think
Message-ID: <1afaf6160904041631u37bf7ed6ga3ad8b338c0afe95@mail.gmail.com>

3.1's only beta is planned for May 2nd, so that means you have exactly
28 days to get the amazing 3.1 features you have planned checked into
the py3k branch. There will be absolutely no new features after the
beta is released.

-- 
Regards,
Benjamin

From greg.ewing at canterbury.ac.nz  Sun Apr  5 01:34:20 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 05 Apr 2009 11:34:20 +1200
Subject: [Python-Dev] graphics maths types in python core?
In-Reply-To: <loom.20090404T220604-267@post.gmane.org>
References: <20090404150111.GQ12593@idyll.org>
	<49D7D884.5060801@canterbury.ac.nz>
	<loom.20090404T220604-267@post.gmane.org>
Message-ID: <49D7EE7C.4040604@canterbury.ac.nz>

Antoine Pitrou wrote:

> Again, I don't want to spoil the party, but multidimensional buffers are
> not implemented, and neither are buffers of anything other than single-byte
> data.

When you say "buffer" here, are you talking about the
buffer interface itself, or the memoryview object?

-- 
Greg

From solipsis at pitrou.net  Sun Apr  5 01:38:30 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 4 Apr 2009 23:38:30 +0000 (UTC)
Subject: [Python-Dev] graphics maths types in python core?
References: <20090404150111.GQ12593@idyll.org>
	<49D7D884.5060801@canterbury.ac.nz>
	<loom.20090404T220604-267@post.gmane.org>
	<49D7EE7C.4040604@canterbury.ac.nz>
Message-ID: <loom.20090404T233632-904@post.gmane.org>

Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
> 
> > Again, I don't want to spoil the party, but multidimensional buffers are
> > not implemented, and neither are buffers of anything other than single-byte
> > data.
> 
> When you say "buffer" here, are you talking about the
> buffer interface itself, or the memoryview object?

Both.
Well, taking a buffer or memoryview to non-bytes data is supported, but since
it's basically unused, some things are likely missing or broken (e.g.
memoryview.tolist()).

From greg.ewing at canterbury.ac.nz  Sun Apr  5 01:52:11 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 05 Apr 2009 11:52:11 +1200
Subject: [Python-Dev] graphics maths types in python core?
In-Reply-To: <loom.20090404T233632-904@post.gmane.org>
References: <20090404150111.GQ12593@idyll.org>
	<49D7D884.5060801@canterbury.ac.nz>
	<loom.20090404T220604-267@post.gmane.org>
	<49D7EE7C.4040604@canterbury.ac.nz>
	<loom.20090404T233632-904@post.gmane.org>
Message-ID: <49D7F2AB.8060907@canterbury.ac.nz>

Antoine Pitrou wrote:

> Both.
> Well, taking a buffer or memoryview to non-bytes data is supported, but since
> it's basically unused, some things are likely missing or broken

So you're saying the buffer interface *has* been fully
implemented, it just hasn't been tested very well?

If so, writing some things that attempt to use it in
non-trivial ways would be a useful thing to do.

-- 
Greg

From benjamin at python.org  Sun Apr  5 01:52:57 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Sat, 4 Apr 2009 18:52:57 -0500
Subject: [Python-Dev] graphics maths types in python core?
In-Reply-To: <49D7F2AB.8060907@canterbury.ac.nz>
References: <20090404150111.GQ12593@idyll.org>
	<49D7D884.5060801@canterbury.ac.nz>
	<loom.20090404T220604-267@post.gmane.org>
	<49D7EE7C.4040604@canterbury.ac.nz>
	<loom.20090404T233632-904@post.gmane.org>
	<49D7F2AB.8060907@canterbury.ac.nz>
Message-ID: <1afaf6160904041652r4d538210y88ee33f7027a4dae@mail.gmail.com>

2009/4/4 Greg Ewing <greg.ewing at canterbury.ac.nz>:
> Antoine Pitrou wrote:
>
>> Both.
>> Well, taking a buffer or memoryview to non-bytes data is supported, but
>> since
>> it's basically unused, some things are likely missing or broken
>
> So you're saying the buffer interface *has* been fully
> implemented, it just hasn't been tested very well?

No, only simple linear bytes are supported.

>
> If so, writing some things that attempt to use it in
> non-trivial ways would be a useful thing to do.

-- 
Regards,
Benjamin

From solipsis at pitrou.net  Sun Apr  5 01:56:07 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 4 Apr 2009 23:56:07 +0000 (UTC)
Subject: [Python-Dev] graphics maths types in python core?
References: <20090404150111.GQ12593@idyll.org>
	<49D7D884.5060801@canterbury.ac.nz>
	<loom.20090404T220604-267@post.gmane.org>
	<49D7EE7C.4040604@canterbury.ac.nz>
	<loom.20090404T233632-904@post.gmane.org>
	<49D7F2AB.8060907@canterbury.ac.nz>
Message-ID: <loom.20090404T235147-360@post.gmane.org>

Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
> 
> So you're saying the buffer interface *has* been fully
> implemented, it just hasn't been tested very well?

No, it hasn't been implemented for multi-dimensional types, and it hasn't been
really tested for anything other than plain linear collections of bytes.
(I have added tests for arrays in test_memoryview, but that's all. And that's
only in py3k since array.array in 2.x only supports the old buffer interface)

From martin at v.loewis.de  Sun Apr  5 02:44:38 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 05 Apr 2009 02:44:38 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D7C523.2090605@ochtman.nl>
References: <20090404154049.GA23987@panix.com> <49D7C523.2090605@ochtman.nl>
Message-ID: <49D7FEF6.1010006@v.loewis.de>

> I'd like to take that on. I know hardly anyone here knows me, but I'm
> one of the Mercurial developers. I've been in contact with Brett, saying
> that I'd gladly give as much help as I could, and I figured I'd put a lot of
> time in providing the best possible migration path.

I'm personally happy letting you do that (although I do wonder who would
then be in charge of the Mercurial installation in the long run, the way
I have been in charge of the subversion installation).

To proceed, I think the next step should be to discuss in the PEP the
details of the migration procedure (see PEP 347 for what level of detail
I produced for the svn migration), and to set up a demo installation
that is considered ready-to-run, except that it might get torn down
again, if the actual conversion requires that (it did for the CVS->svn
case), or if problems are found with the demo installation.

I would personally remove all non-mercurial stuff out of PEP 374,
and retitle it, but that would be your choice.

Regards,
Martin

From solipsis at pitrou.net  Sun Apr  5 03:03:23 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 5 Apr 2009 01:03:23 +0000 (UTC)
Subject: [Python-Dev] BufferedReader.peek() ignores its argument
Message-ID: <loom.20090405T005955-472@post.gmane.org>

Hello,

Currently, BufferedReader.peek() ignores its argument and can return more or
less than the number of bytes requested by the user. This is how it was
implemented in the Python version, and we've reflected this in the C version.

It seems a bit strange and unhelpful though. Should we change the implementation
so that the argument to peek() becomes the upper bound to the number of bytes
returned?

Thanks for your advice,

Antoine.

From ben+python at benfinney.id.au  Sun Apr  5 03:07:26 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Sun, 05 Apr 2009 11:07:26 +1000
Subject: [Python-Dev] UnicodeDecodeError bug in distutils
References: <94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com>
	<94bdd2610702241247t568a942dw2fe1b10883b62d20@mail.gmail.com>
	<200702242309.46022.pogonyshev@gmx.net>
	<94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com>
	<45E0C012.7090801@palladion.com>
	<5.1.1.6.0.20070224203115.0270a5a8@sparrow.telecommunity.com>
	<877i22fuqy.fsf_-_@benfinney.id.au>
Message-ID: <87iqljc3ht.fsf@benfinney.id.au>

Ben Finney <ben+python at benfinney.id.au> writes:

> "Phillip J. Eby" <pje at telecommunity.com> writes:
> 
> > Meanwhile, the 'register' command accepts Unicode, but is broken in
> > handling it. […]
> > 
> > Unfortunately, this isn't fixable until there's a new 2.5.x release.
> > For previous Python versions, both register and write_pkg_info()
> > accepted 8-bit strings and passed them on as-is, so the only
> > workaround for this issue at the moment is to revert to Python 2.4
> > or less.
> 
> What is the prognosis on this issue? It's still hitting me in Python
> 2.5.4.

Any word on this? Is there an open bug tracker issue with more
information? Who's working on this?

-- 
 \      "If sharing a thing in no way diminishes it, it is not rightly |
  `\                     owned if it is not shared." --Saint Augustine |
_o__)                                                                  |
Ben Finney

From benjamin at python.org  Sun Apr  5 03:11:50 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Sat, 4 Apr 2009 20:11:50 -0500
Subject: [Python-Dev] BufferedReader.peek() ignores its argument
In-Reply-To: <loom.20090405T005955-472@post.gmane.org>
References: <loom.20090405T005955-472@post.gmane.org>
Message-ID: <1afaf6160904041811y5d4933dfj2f7b0da02967a833@mail.gmail.com>

2009/4/4 Antoine Pitrou <solipsis at pitrou.net>:
> Hello,
>
> Currently, BufferedReader.peek() ignores its argument and can return more or
> less than the number of bytes requested by the user. This is how it was
> implemented in the Python version, and we've reflected this in the C version.
>
> It seems a bit strange and unhelpful though. Should we change the implementation
> so that the argument to peek() becomes the upper bound to the number of bytes
> returned?

+1 That sounds more useful.

>
> Thanks for your advice,

-- 
Regards,
Benjamin

From lists at cheimes.de  Sun Apr  5 03:14:03 2009
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 05 Apr 2009 03:14:03 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D7FEF6.1010006@v.loewis.de>
References: <20090404154049.GA23987@panix.com> <49D7C523.2090605@ochtman.nl>
	<49D7FEF6.1010006@v.loewis.de>
Message-ID: <gr90kr$63s$1@ger.gmane.org>

Martin v. Löwis wrote:
> I would personally remove all non-mercurial stuff out of PEP 374,
> and retitle it, but that would be your choice.

I suggest we keep the old PEP and start a new one about Hg exclusively.
The original PEP 374 has cost Brett a lot of time. It would be a shame
to throw it away when it may come in handy for other FOSS projects
that want to move away from subversion.

Dirkjan or whoever is going to work on the PEP can copy n' paste the
interesting pieces from PEP 374 to the new one.

Christian

From martin at v.loewis.de  Sun Apr  5 03:40:22 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 05 Apr 2009 03:40:22 +0200
Subject: [Python-Dev] UnicodeDecodeError bug in distutils
In-Reply-To: <87iqljc3ht.fsf@benfinney.id.au>
References: <94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com>	<94bdd2610702241247t568a942dw2fe1b10883b62d20@mail.gmail.com>	<200702242309.46022.pogonyshev@gmx.net>	<94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com>	<45E0C012.7090801@palladion.com>	<5.1.1.6.0.20070224203115.0270a5a8@sparrow.telecommunity.com>	<877i22fuqy.fsf_-_@benfinney.id.au>
	<87iqljc3ht.fsf@benfinney.id.au>
Message-ID: <49D80C06.20809@v.loewis.de>

>>> Meanwhile, the 'register' command accepts Unicode, but is broken in
>>> handling it. […]
>>>
>>> Unfortunately, this isn't fixable until there's a new 2.5.x release.
>>> For previous Python versions, both register and write_pkg_info()
>>> accepted 8-bit strings and passed them on as-is, so the only
>>> workaround for this issue at the moment is to revert to Python 2.4
>>> or less.
>> What is the prognosis on this issue? It's still hitting me in Python
>> 2.5.4.
> 
> Any word on this? Is there an open bug tracker issue with more
> information? Who's working on this?

For Python 2.5.4, no further changes will be made. If you can reproduce
with 2.6, and can't find a tracker issue, make a new report.

Regards,
Martin

From tjreedy at udel.edu  Sun Apr  5 03:45:40 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 04 Apr 2009 21:45:40 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <gr90kr$63s$1@ger.gmane.org>
References: <20090404154049.GA23987@panix.com>
	<49D7C523.2090605@ochtman.nl>	<49D7FEF6.1010006@v.loewis.de>
	<gr90kr$63s$1@ger.gmane.org>
Message-ID: <gr92g2$89e$1@ger.gmane.org>

Christian Heimes wrote:
> Martin v. Löwis wrote:
>> I would personally remove all non-mercurial stuff out of PEP 374,
>> and retitle it, but that would be your choice.
> 
> I suggest we keep the old PEP and start a new one about Hg exclusively.
> The original PEP 374 has cost Brett a lot of time. It would be a shame
> to throw it away when it may come in handy for other FOSS projects
> that want to move away from subversion.

I second not tossing the data and history.  It serves as partial 
justification for the decision, which has been and will occasionally 
again be discussed on python-list.

> Dirkjan or whoever is going to work on the PEP can copy n' paste the
> interesting pieces from PEP 374 to the new one.

tjr

From aahz at pythoncraft.com  Sun Apr  5 03:58:00 2009
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 4 Apr 2009 18:58:00 -0700
Subject: [Python-Dev] BufferedReader.peek() ignores its argument
In-Reply-To: <loom.20090405T005955-472@post.gmane.org>
References: <loom.20090405T005955-472@post.gmane.org>
Message-ID: <20090405015800.GB19165@panix.com>

On Sun, Apr 05, 2009, Antoine Pitrou wrote:
> 
> Currently, BufferedReader.peek() ignores its argument and can return
> more or less than the number of bytes requested by the user. This is
> how it was implemented in the Python version, and we've reflected this
> in the C version.
>
> It seems a bit strange and unhelpful though. Should we change the
> implementation so that the argument to peek() becomes the upper bound
> to the number of bytes returned?

IIRC, this was made to handle SSL where the number of bytes returned may
need to be larger than the size.  If that's the case, there should be a
record somewhere in the list archives...  (Or possibly the svn logs.)
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it."  --Brian W. Kernighan

From ben+python at benfinney.id.au  Sun Apr  5 04:48:58 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Sun, 05 Apr 2009 12:48:58 +1000
Subject: [Python-Dev] UnicodeDecodeError bug in distutils
References: <94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com>
	<94bdd2610702241247t568a942dw2fe1b10883b62d20@mail.gmail.com>
	<200702242309.46022.pogonyshev@gmx.net>
	<94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com>
	<45E0C012.7090801@palladion.com>
	<5.1.1.6.0.20070224203115.0270a5a8@sparrow.telecommunity.com>
	<877i22fuqy.fsf_-_@benfinney.id.au>
	<87iqljc3ht.fsf@benfinney.id.au>
Message-ID: <87eiw7bysl.fsf@benfinney.id.au>

Ben Finney <ben+python at benfinney.id.au> writes:

> Is there an open bug tracker issue with more information?

Answer: <URL:http://bugs.python.org/issue2562>. Apparently the issue
is resolved <URL:http://bugs.python.org/msg72385> for Python 2.6. I
will need to wait for my distribution to catch up before I can know
whether it's resolved.

-- 
 \        "The World is not dangerous because of those who do harm but |
  `\          because of those who look at it without doing anything." |
_o__)                                                --Albert Einstein |
Ben Finney

From martin at v.loewis.de  Sun Apr  5 04:56:12 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 05 Apr 2009 04:56:12 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <gr92g2$89e$1@ger.gmane.org>
References: <20090404154049.GA23987@panix.com>	<49D7C523.2090605@ochtman.nl>	<49D7FEF6.1010006@v.loewis.de>	<gr90kr$63s$1@ger.gmane.org>
	<gr92g2$89e$1@ger.gmane.org>
Message-ID: <49D81DCC.70306@v.loewis.de>

> I second not tossing the data and history.  It serves as partial
> justification for the decision, which has been and will occasionally
> again be discussed on python-list.

It's in subversion, so the history won't be tossed. To keep it online,
it doesn't have to be in the PEP - putting it in a wiki page would
also allow referring to it.

Regards,
Martin

From tjreedy at udel.edu  Sun Apr  5 05:33:50 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 04 Apr 2009 23:33:50 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D81DCC.70306@v.loewis.de>
References: <20090404154049.GA23987@panix.com>	<49D7C523.2090605@ochtman.nl>	<49D7FEF6.1010006@v.loewis.de>	<gr90kr$63s$1@ger.gmane.org>	<gr92g2$89e$1@ger.gmane.org>
	<49D81DCC.70306@v.loewis.de>
Message-ID: <gr98qt$j9q$1@ger.gmane.org>

Martin v. Löwis wrote:
>> I second not tossing the data and history.  It serves as partial
>> justification for the decision, which has been and will occasionally
>> again be discussed on python-list.
> 
> It's in subversion, so the history won't be tossed.

I know; I should have been more exact: not hidden and made difficult to 
access.

> To keep it online,
> it doesn't have to be in the PEP - putting it in a wiki page would
> also allow referring to it.

Sure.  A title like DvcsComparison would be easy to remember.

From alexandre at peadrop.com  Sun Apr  5 07:55:03 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 5 Apr 2009 01:55:03 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <20090404154049.GA23987@panix.com>
References: <20090404154049.GA23987@panix.com>
Message-ID: <acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>

On Sat, Apr 4, 2009 at 11:40 AM, Aahz <aahz at pythoncraft.com> wrote:
> With Brett's (hopefully temporary!) absence, who is spearheading the
> Mercurial conversion?  Whoever it is should probably take over PEP 374
> and start updating it with the conversion plan, particularly WRT
> expectations for dates relative to 3.1 final and 2.7 final.

I am willing to take over this. I was in charge of the Mercurial
scenarios in the PEP, so it would be natural for me to continue with
the transition. In addition, I volunteer to maintain the new Mercurial
installation.

Off the top of my head, the following is needed for a successful migration:

   - Verify that the repository at http://code.python.org/hg/ is
properly converted.
   - Convert the current svn commit hooks to Mercurial.
   - Add Mercurial support to the issue tracker.
   - Update the developer FAQ.
   - Set up temporary svn mirrors for the main Mercurial repositories.
   - Augment code.python.org infrastructure to support the creation of
developer accounts.
   - Update the release.py script.

There are probably some other things that I missed, but I think this is
a good overview of what needs to be done. And of course, I would
welcome anyone who would be willing to help me with the transition.

-- Alexandre

From alexandre at peadrop.com  Sun Apr  5 08:07:29 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 5 Apr 2009 02:07:29 -0400
Subject: [Python-Dev] BufferedReader.peek() ignores its argument
In-Reply-To: <loom.20090405T005955-472@post.gmane.org>
References: <loom.20090405T005955-472@post.gmane.org>
Message-ID: <acd65fa20904042307x790ac5c5qf2e0981cdcad1ca8@mail.gmail.com>

On Sat, Apr 4, 2009 at 9:03 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Hello,
>
> Currently, BufferedReader.peek() ignores its argument and can return more or
> less than the number of bytes requested by the user. This is how it was
> implemented in the Python version, and we've reflected this in the C version.
>
> It seems a bit strange and unhelpful though. Should we change the implementation
> so that the argument to peek() becomes the upper bound to the number of bytes
> returned?
>

I am not sure if this is a good idea. Currently, the argument of
peek() is documented as a lower bound that cannot exceed the size of
the buffer:

        Returns buffered bytes without advancing the position.

        The argument indicates a desired minimal number of bytes; we
        do at most one raw read to satisfy it.  We never return more
        than self.buffer_size.

Changing the meaning of peek() now could introduce at least some
confusion and maybe also bugs. And personally, I like the current
behavior, since it guarantees that peek() won't return an empty string
unless you reached the end-of-file.  Plus, it is fairly easy to cap
the number of bytes returned by doing f.peek()[:upper_bound].
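
A short sketch of that capping idiom. The helper name peek_at_most and
the sample data are hypothetical; how many bytes the uncapped peek()
returns depends on the buffer state, so only the capped result is relied
on here.

```python
import io


def peek_at_most(reader, upper_bound):
    """Return at most upper_bound buffered bytes without advancing the position."""
    return reader.peek(upper_bound)[:upper_bound]


reader = io.BufferedReader(io.BytesIO(b"0123456789" * 10), buffer_size=16)
head = peek_at_most(reader, 4)   # never longer than 4 bytes
following = reader.read(4)       # the same bytes are still there to be read
```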

-- Alexandre

From alexandre at peadrop.com  Sun Apr  5 08:28:52 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 5 Apr 2009 02:28:52 -0400
Subject: [Python-Dev] Should I/O object wrappers close their underlying
	buffer when deleted?
Message-ID: <acd65fa20904042328p4e17f99alc5c50272603932d0@mail.gmail.com>

Hello,

I would like to call to your attention the following behavior of TextIOWrapper:

   import io
   def test(buf):
      textio = io.TextIOWrapper(buf)
   buf = io.BytesIO()
   test(buf)
   print(buf.closed)  # This prints True currently

The problem here is TextIOWrapper closes its buffer when deleted.
BufferedRWPair behaves similarly. The solution is simply to override
the __del__ method of TextIOWrapper inherited from IOBase.
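
Until the library itself changes, a caller can also sidestep the problem.
The sketch below assumes a Python version that provides
TextIOWrapper.detach(), which is not mentioned above and is used here as
an assumption; read_text is a hypothetical helper name.

```python
import io


def read_text(buf, encoding="utf-8"):
    """Decode everything in buf without closing it when the wrapper goes away."""
    textio = io.TextIOWrapper(buf, encoding=encoding)
    try:
        return textio.read()
    finally:
        # Separate the wrapper from buf so that deleting the wrapper
        # leaves the underlying buffer open.
        textio.detach()


buf = io.BytesIO(b"hello")
text = read_text(buf)
```

After the call, buf is still open and can be reused by the caller.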

-- Alexandre

From ncoghlan at gmail.com  Sun Apr  5 09:13:39 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 05 Apr 2009 17:13:39 +1000
Subject: [Python-Dev] graphics maths types in python core?
In-Reply-To: <loom.20090404T235147-360@post.gmane.org>
References: <20090404150111.GQ12593@idyll.org>	<49D7D884.5060801@canterbury.ac.nz>	<loom.20090404T220604-267@post.gmane.org>	<49D7EE7C.4040604@canterbury.ac.nz>	<loom.20090404T233632-904@post.gmane.org>	<49D7F2AB.8060907@canterbury.ac.nz>
	<loom.20090404T235147-360@post.gmane.org>
Message-ID: <49D85A23.6020405@gmail.com>

Antoine Pitrou wrote:
> Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
>> So you're saying the buffer interface *has* been fully
>> implemented, it just hasn't been tested very well?
> 
> No, it hasn't been implemented for multi-dimensional types, and it hasn't been
> really tested for anything other than plain linear collections of bytes.
> (I have added tests for arrays in test_memoryview, but that's all. And that's
> only in py3k since array.array in 2.x only supports the old buffer interface)

Step back for a sec here... PEP 3118 has three pieces, not two.

Part 1, the actual new buffer protocol, is complete and works fine as
far as I know. If it didn't, we would have heard about it from the
third-party clients of the new protocol by now.

Parts 2 and 3, being the memoryview API and support for the new protocol
in the builtin types are the parts that are currently restricted to
simple linear memory views.

That's largely because parts 2 and 3 are somewhat use case challenged:
the key motivation behind PEP 3118 was so that libraries like NumPy, PIL
and the like would have a common standard for data interchange. Since
those all have their own extension objects and will be using the PEP
3118 C API directly rather than going through memoryview, the state of
the Python API and the support from builtin container types is largely
irrelevant to the target audience for the PEP.

Actually *finishing* parts 2 and 3 of PEP 3118 would be a good precursor
to having some kind of multi-dimensional mathematics in the standard
library though.
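
For illustration, this is roughly what the Python-level side of parts 2
and 3 looks like once typed and multi-dimensional views are available.
The sketch assumes a recent Python 3, where memoryview.cast() and typed
array.array exports exist; none of this reflects the state of the
implementation described in this thread.

```python
from array import array

values = array('d', [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# A typed one-dimensional view over the array's memory.
flat = memoryview(values)
info = (flat.format, flat.itemsize, flat.ndim, flat.shape)

# Reinterpret the same memory as a 2 x 3 grid of doubles.
# cast() goes through a byte format before applying a new shape.
grid = flat.cast('B').cast('d', shape=[2, 3])
rows = grid.tolist()
```

No data is copied: flat and grid both refer to the memory owned by
values.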

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From stephen at xemacs.org  Sun Apr  5 11:08:44 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 05 Apr 2009 18:08:44 +0900
Subject: [Python-Dev] Integrate BeautifulSoup into stdlib?
In-Reply-To: <gr8q62$pc4$1@ger.gmane.org>
References: <fb6fbf560903081251m4219b117l63c6bcb71e199be6@mail.gmail.com>
	<49BA3154.8080408@simplistix.co.uk> <49BAA596.5020106@v.loewis.de>
	<49C79C1A.8040301@simplistix.co.uk> <49C7FC85.5000809@v.loewis.de>
	<49C80FA0.4020800@simplistix.co.uk> <87ab7bh5fb.fsf@xemacs.org>
	<49C87004.2030807@holdenweb.com> <49C88503.2030902@v.loewis.de>
	<49C886EF.80203@v.loewis.de> <49C8C9B3.3070403@holdenweb.com>
	<49C939BA.8040206@v.loewis.de>
	<1238798174.5360.388.camel@saeko.local>
	<49D6C20C.8030102@v.loewis.de> <gr8q62$pc4$1@ger.gmane.org>
Message-ID: <87fxgntqlf.fsf@xemacs.org>

Steve Holden writes:

 > Not only that, but the Cygwin packaging system appears to be extremely
 > difficult to organize a package for.

Really?  I Don't Do Windows[tm], but the people who did installers and
stuff for XEmacs releases never had problems with it.  It was much
more painful to create the .exe-style Windows installers.

From greg.ewing at canterbury.ac.nz  Sun Apr  5 11:08:02 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 05 Apr 2009 21:08:02 +1200
Subject: [Python-Dev] graphics maths types in python core?
In-Reply-To: <49D85A23.6020405@gmail.com>
References: <20090404150111.GQ12593@idyll.org>
	<49D7D884.5060801@canterbury.ac.nz>
	<loom.20090404T220604-267@post.gmane.org>
	<49D7EE7C.4040604@canterbury.ac.nz>
	<loom.20090404T233632-904@post.gmane.org>
	<49D7F2AB.8060907@canterbury.ac.nz>
	<loom.20090404T235147-360@post.gmane.org>
	<49D85A23.6020405@gmail.com>
Message-ID: <49D874F2.8080709@canterbury.ac.nz>

Nick Coghlan wrote:

> Actually *finishing* parts 2 and 3 of PEP 3118 would be a good precursor
>  to having some kind of multi-dimensional mathematics in the standard
> library though.

Even if they only work on the existing one-dimensional
sequence types, elementwise operations would still be
useful to have. And if they work through the new buffer
protocol, they'll be ready for multi-dimensional types
if and when such types appear.
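
As a rough illustration of that idea (a sketch only, assuming a current
Python 3 where memoryview exposes format and ndim; the elementwise_add
name is made up for this example and is not a proposed API), an
elementwise operation over any one-dimensional buffer could look like:

```python
from array import array


def elementwise_add(a, b):
    """Add two one-dimensional buffers of the same item format, item by item.

    Hypothetical helper: it accepts any objects exporting the buffer
    protocol and returns a new array.array with the same type code.
    """
    va, vb = memoryview(a), memoryview(b)
    if va.ndim != 1 or vb.ndim != 1:
        raise ValueError("only one-dimensional buffers are handled here")
    if va.format != vb.format or len(va) != len(vb):
        raise ValueError("buffers must share item format and length")
    # Iterating a memoryview yields items decoded according to its format.
    return array(va.format, (x + y for x, y in zip(va, vb)))


result = elementwise_add(array("d", [1.0, 2.0, 3.0]),
                         array("d", [10.0, 20.0, 30.0]))
```

Because the function only looks at the memoryview, the inputs do not
have to be array.array instances; anything exporting a compatible
one-dimensional buffer would do.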

-- 
Greg

From martin at v.loewis.de  Sun Apr  5 11:06:33 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 05 Apr 2009 11:06:33 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
Message-ID: <49D87499.5060502@v.loewis.de>

> Off the top of my head, the following is needed for a successful migration:
> 
>    - Verify that the repository at http://code.python.org/hg/ is
> properly converted.

I see that this has four branches. What about all the other branches?
Will they be converted, or not? What about the stuff outside /python?

In particular, the Stackless people have requested that they move along
with what core Python does, so their code should also be converted.

>    - Add Mercurial support to the issue tracker.

Not sure what this means. There is currently svn support insofar as the
tracker can format rNNN references into ViewCVS links; this should be
updated if possible (removed if not). There would also be a possibility
to auto-close issues from the commit messages. This is not done
currently, so I would not make it a prerequisite for the switch.

>    - Setup temporary svn mirrors for the main Mercurial repositories.

What is that?

>    - Augment code.python.org infrastructure to support the creation of
> developer accounts.

One option would be to carry on with the current setup; migrating it
to hg might work as well, of course.

>    - Update the release.py script.
> 
> There is probably some other things that I missed

Here are some:

- integrate with the buildbot
- come up with a strategy for /external (also relevant for
  the buildbot slaves)
- decide what to do with the bzr mirrors

Regards,
Martin

From brian at sweetapp.com  Sun Apr  5 11:07:48 2009
From: brian at sweetapp.com (Brian Quinlan)
Date: Sun, 05 Apr 2009 10:07:48 +0100
Subject: [Python-Dev] Possible py3k io wierdness
In-Reply-To: <loom.20090404T231154-979@post.gmane.org>
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>
	<loom.20090404T231154-979@post.gmane.org>
Message-ID: <49D874E4.6030602@sweetapp.com>

Hey Antoine,

Thanks for the clarification!

I see that the C implementation matches the Python implementation but I 
don't see how the semantics of either are useful in this case.

If a subclass implements flush then, as you say, it must also implement 
close and call flush itself before calling its superclass' close method. 
  But then _RawIOBase will pointlessly call the subclass' flush method a 
second time. This second call should raise (because the file is closed) 
and the exception will be caught and suppressed.

I don't see why this is helpful. Could you explain why 
_RawIOBase.close() calling self.flush() is useful?
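
To make the call sequence concrete, here is a toy model (not the real
io classes; ToyIOBase, ToyFileIO and LoggingFile are stand-ins invented
for this sketch, mirroring the behaviour described in this thread):

```python
calls = []


class ToyIOBase:
    """Stand-in for IOBase: close() flushes through self.flush()."""

    closed = False

    def flush(self):
        pass

    def close(self):
        if not self.closed:
            try:
                self.flush()  # dispatches to the subclass override
            except IOError:
                pass  # the failure is silently swallowed
            self.closed = True


class ToyFileIO(ToyIOBase):
    """Stand-in for FileIO: releases the descriptor, then defers to the base."""

    descriptor_open = True

    def close(self):
        self.descriptor_open = False
        super().close()


class LoggingFile(ToyFileIO):
    """A subclass that implements both flush() and close()."""

    def flush(self):
        if not self.descriptor_open:
            calls.append("flush with descriptor closed")
            raise IOError("flush of closed file")
        calls.append("flush with descriptor open")

    def close(self):
        self.flush()       # the flush the subclass actually wants
        super().close()    # triggers a second, failing flush


f = LoggingFile()
f.close()
```

In this model the override is called twice: once usefully, and once
after the descriptor is gone, with the resulting error suppressed.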

Cheers,
Brian

Antoine Pitrou wrote:
> Hi!
> 
> <brian <at> sweetapp.com> writes:
>> class _RawIOBase(_FileIO):
> 
> FileIO is a subclass of _RawIOBase, not the reverse:
> 
>>>> issubclass(_io._RawIOBase, _io.FileIO)
> False
>>>> issubclass(_io.FileIO, _io._RawIOBase)
> True
> 
> I do understand your surprise, but the Python implementation of IOBase.close()
> in _pyio.py does the same thing:
> 
>     def close(self) -> None:
>         """Flush and close the IO object.
> 
>         This method has no effect if the file is already closed.
>         """
>         if not self.__closed:
>             try:
>                 self.flush()
>             except IOError:
>                 pass  # If flush() fails, just give up
>             self.__closed = True
> 
> Note how it calls `self.flush()` and not `IOBase.flush(self)`.
> When writing the C version of the I/O stack, we tried to keep the semantics the
> same as in the Python version, although there are a couple of subtleties.
> 
> Your problem here is that it's IOBase.close() which calls your flush() method,
> but FileIO.close() has already done its job before and the internal file
> descriptor has been closed (hence `self.closed` is True). In this particular
> case, I advocate overriding close() as well and call your flush() method
> manually from there.
> 
> Thanks for your feedback!
> 
> Regards
> 
> Antoine.

From cournape at gmail.com  Sun Apr  5 11:14:37 2009
From: cournape at gmail.com (David Cournapeau)
Date: Sun, 5 Apr 2009 18:14:37 +0900
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D87499.5060502@v.loewis.de>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de>
Message-ID: <5b8d13220904050214i40142b43tbbadb7d6815c7923@mail.gmail.com>

On Sun, Apr 5, 2009 at 6:06 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Off the top of my head, the following is needed for a successful migration:
>>
>>    - Verify that the repository at http://code.python.org/hg/ is
>> properly converted.
>
> I see that this has four branches. What about all the other branches?
> Will they be converted, or not? What about the stuff outside /python?
>
> In particular, the Stackless people have requested that they move along
> with what core Python does, so their code should also be converted.

I don't know the capabilities of hg w.r.t svn conversion, so this may
well be overkill, but git has a really good tool for svn conversion
(svn-all-fast-export, developed by KDE). You can handle almost any svn
organization (e.g. outside the usual trunk/tags/branches), and convert
email addresses of committers, split one big svn repo into
subprojects, etc... Then, the git repo could be converted to hg
relatively easily I believe.

cheers,

David

From robertc at robertcollins.net  Sun Apr  5 11:16:56 2009
From: robertc at robertcollins.net (Robert Collins)
Date: Sun, 05 Apr 2009 19:16:56 +1000
Subject: [Python-Dev] Integrate BeautifulSoup into stdlib?
In-Reply-To: <87fxgntqlf.fsf@xemacs.org>
References: <fb6fbf560903081251m4219b117l63c6bcb71e199be6@mail.gmail.com>
	<49BA3154.8080408@simplistix.co.uk> <49BAA596.5020106@v.loewis.de>
	<49C79C1A.8040301@simplistix.co.uk> <49C7FC85.5000809@v.loewis.de>
	<49C80FA0.4020800@simplistix.co.uk> <87ab7bh5fb.fsf@xemacs.org>
	<49C87004.2030807@holdenweb.com> <49C88503.2030902@v.loewis.de>
	<49C886EF.80203@v.loewis.de> <49C8C9B3.3070403@holdenweb.com>
	<49C939BA.8040206@v.loewis.de> <1238798174.5360.388.camel@saeko.local>
	<49D6C20C.8030102@v.loewis.de> <gr8q62$pc4$1@ger.gmane.org>
	<87fxgntqlf.fsf@xemacs.org>
Message-ID: <1238923019.2700.394.camel@lifeless-64>

On Sun, 2009-04-05 at 18:08 +0900, Stephen J. Turnbull wrote:
> Steve Holden writes:
> 
>  > Not only that, but the Cygwin packaging system appears to be extremely
>  > difficult to organize a package for.
> 
> Really?  I Don't Do Windows[tm], but the people who did installers and
> stuff for XEmacs releases never had problems with it.  It was much
> more painful to create the .exe-style Windows installers.

XEmacs started using setup.exe for its installers back when I was
maintaining setup.exe; it must have been fairly straightforward,
because we first heard of it when it was complete :).

The following may have changed, but I doubt it has changed dramatically.
The setup.exe system is kind of trivial:
- there is a .lst file, which is a .INI format file listing packages
  and direct dependencies.
- each package is a .tar.(gz|bz2) which is unpacked on disk, with
  [optional] post-install and pre-removal scripts inside the tarball.
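
As a sketch of how little machinery that needs: the snippet below
parses an INI-style listing and works out an install order from the
direct dependencies. The section names, the "archive" and "requires"
keys and the package file names are all invented for illustration;
they are not claimed to be the actual setup.exe file format.

```python
import configparser

# Hypothetical listing: one section per package, with made-up keys.
LISTING = """
[xemacs]
archive = xemacs.tar.bz2
requires = cygwin libpng

[libpng]
archive = libpng.tar.gz
requires = cygwin

[cygwin]
archive = cygwin.tar.bz2
requires =
"""


def install_order(listing_text, package):
    """Return the packages to unpack, dependencies first."""
    parser = configparser.ConfigParser()
    parser.read_string(listing_text)
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dependency in parser.get(name, "requires").split():
            visit(dependency)
        order.append(name)

    visit(package)
    return order


order = install_order(LISTING, "xemacs")
```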

Doing an installer for something not part of Cygwin requires a one-time
fork of the setup.exe program, to change the master source for .lst
files, and that's about it. Beyond that it's all maintaining whatever set
of packages and dependencies you have. If you are installing things for
Cygwin itself you can just depend directly on things Cygwin ships in
your .lst file; and not ship a setup.exe at all - setup.exe can source
from many places to satisfy dependencies.

-Rob

From mario.danic at gmail.com  Sun Apr  5 11:21:01 2009
From: mario.danic at gmail.com (Mario)
Date: Sun, 5 Apr 2009 11:21:01 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D87499.5060502@v.loewis.de>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com> 
	<49D87499.5060502@v.loewis.de>
Message-ID: <79957db20904050221m72d998b1ja126d496594c4438@mail.gmail.com>

> Not sure what this means. There is currently svn support insofar as the
> tracker can format rNNN references into ViewCVS links; this should be
> updated if possible (removed if not). There would also be a possibility
> to auto-close issues from the commit messages. This is not done
> currently, so I would not make it a prerequisite for the switch.

While I don't know how urgent this is, I will just mention that I am
willing to work on Roundup-mercurial during GSoC (or outside
it, if I don't get accepted).

Cheers,
M.

From dirkjan at ochtman.nl  Sun Apr  5 11:41:40 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 11:41:40 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
Message-ID: <49D87CD4.1000909@ochtman.nl>

On 05/04/2009 07:55, Alexandre Vassalotti wrote:
>     - Verify that the repository at http://code.python.org/hg/ is
> properly converted.

I'm pretty sure that we'll need to reconvert; I don't think the current 
conversion is particularly good. We'll also have to decide on named 
branches vs. clones, for example, and if we could try to reorder revlogs 
to make the repo smaller after conversion.

I've svnsynced the SVN repo so that we can work on it efficiently, and 
I've already talked with Augie Fackler, the hgsubversion maintainer, 
about what the best way forward is. For example, we may want to leave 
some of the very old history behind, or prune some old branches.

>     - Convert the current svn commit hooks to Mercurial.

Some new hooks should also be discussed. For example, Mozilla uses a 
single-head hook, to prevent people from pushing multiple heads. They 
also have a pushlog extension that keeps a SQLite database of what 
people pushed. This is particularly useful for linearizing history, 
which is required for integration with buildbot infrastructure.
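
The check behind a single-head rule is simple to state. The sketch
below works on a plain dictionary mapping changeset ids to their parent
ids; it deliberately does not use the Mercurial API, and the function
names and changeset ids are made up for the example.

```python
def heads(parents):
    """Return the changesets that no other changeset names as a parent.

    `parents` maps each changeset id to a tuple of its parent ids.
    """
    has_child = set()
    for parent_ids in parents.values():
        has_child.update(parent_ids)
    return sorted(node for node in parents if node not in has_child)


def check_single_head(parents):
    """Raise ValueError if the history has more than one head."""
    found = heads(parents)
    if len(found) > 1:
        raise ValueError("multiple heads: %s" % ", ".join(found))
    return found


linear = {"a": (), "b": ("a",), "c": ("b",)}
forked = {"a": (), "b": ("a",), "c": ("a",)}
merged = {"a": (), "b": ("a",), "c": ("a",), "d": ("b", "c")}
```

A real hook would run a test like this against the repository state
that results from the incoming push and reject the push on failure.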

>     - Add Mercurial support to the issue tracker.

I don't think there's much to do there, but a regex to link up some 
commonly-used revision references would be good. If we use cloned 
branches, we'll have to come up with some syntax to make that work.
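
A minimal sketch of such a regex, assuming rNNN for svn revisions and
12- or 40-character hex strings for hg changesets. The URL templates
point at a placeholder host and the linkify name is made up; neither
reflects the actual tracker configuration.

```python
import re

# Placeholder URL templates, not the real tracker or repository URLs.
SVN_URL = "http://example.invalid/svn/rev/%s"
HG_URL = "http://example.invalid/hg/rev/%s"

# rNNN for svn; 40 or 12 lowercase hex digits for an hg changeset id.
# Any 12-character hex word matches, so false positives are possible.
REFERENCE = re.compile(
    r"\br(?P<svn>\d+)\b|\b(?P<hg>[0-9a-f]{40}|[0-9a-f]{12})\b")


def linkify(text):
    """Wrap revision references found in `text` in HTML links."""
    def replace(match):
        if match.group("svn"):
            url = SVN_URL % match.group("svn")
        else:
            url = HG_URL % match.group("hg")
        return '<a href="%s">%s</a>' % (url, match.group(0))
    return REFERENCE.sub(replace, text)


svn_example = linkify("fixed in r70000")
hg_example = linkify("see 0123456789ab for details")
```

References into cloned branches would need some extra prefix syntax on
top of this, since the hash alone does not say which clone to link to.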

>     - Update the developer FAQ.
>     - Setup temporary svn mirrors for the main Mercurial repositories.

How do you plan to do that? I don't think there are any tools that 
support that, yet. I've actually started on my own, but I haven't gotten 
very far with it, yet.

>     - Augment code.python.org infrastructure to support the creation of
> developer accounts.

Developers already have accounts, don't they? In any case, some web 
interface to facilitate setting up new clones (branches) is also 
something that's probably desirable. I think Mozilla has some tooling 
for that which we might be able to start off of.

Cheers,

Dirkjan

From dirkjan at ochtman.nl  Sun Apr  5 11:45:38 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 11:45:38 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D7FEF6.1010006@v.loewis.de>
References: <20090404154049.GA23987@panix.com> <49D7C523.2090605@ochtman.nl>
	<49D7FEF6.1010006@v.loewis.de>
Message-ID: <49D87DC2.2040708@ochtman.nl>

On 05/04/2009 02:44, "Martin v. Löwis" wrote:
 > I'm personally happy letting you do that (although I do wonder who would
 > then be in charge of the Mercurial installation in the long run, the way
 > I have been in charge of the subversion installation).

I'd be happy to commit to that for the foreseeable future.

 > To proceed, I think the next step should be to discuss in the PEP the
 > details of the migration procedure (see PEP 347 for what level of detail
 > I produced for the svn migration), and to set up a demo installation
 > that is considered ready-to-run, except that it might get torn down
 > again, if the actual conversion requires that (it did for the CVS->svn
 > case), or if problems are found with the demo installation.

Sounds sane. Would I be able to get access to PSF infrastructure to get 
started on that, or do you want me to get started on my own box? I'll 
probably do the conversion on my own box, but for authn/authz it might 
be useful to be able to use PSF infra.

 > I would personally remove all non-mercurial stuff out of PEP 374,
 > and retitle it, but that would be your choice.

Moving the current content to a wiki page like you suggest later in this 
thread sounds like a good idea.

Cheers,

Dirkjan

From dirkjan at ochtman.nl  Sun Apr  5 11:55:22 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 11:55:22 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D87499.5060502@v.loewis.de>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de>
Message-ID: <49D8800A.60601@ochtman.nl>

On 05/04/2009 11:06, "Martin v. Löwis" wrote:
> In particular, the Stackless people have requested that they move along
> with what core Python does, so their code should also be converted.

I'd be interested to hear if they want all of their stuff converted, or 
just the mainline/trunk of what is currently in trunk/branches/tags.

> - integrate with the buildbot

I've set up the buildbot infra for Mercurial (though not many people are 
interested in it, so it's kind of languished). Using buildbot's hg 
support is easy. 0.7.10 is the first version which works with hg 1.1+, 
though, so we probably don't want to go with anything earlier.

> - come up with a strategy for /external (also relevant for
>    the buildbot slaves)

I'm not sure exactly what the purpose or mechanism for /external is. 
Sure, it's like a snapshot dir, probably used to pull some stuff 
into other processes? Seems to me like it might be interesting to, for 
example, convert to a simple config file + script that lets you specify 
a package (repository) + tag, which can then be easily pulled in.

But it'd be nice to know where and how exactly this is used.

> - decide what to do with the bzr mirrors

I'm assuming the bzr people have ways of importing hg repos. It's 
probably more effective for them to deal with this problem. If helpful, 
there are some scripts that do fast-exporting from hg repos.

Cheers,

Dirkjan

From solipsis at pitrou.net  Sun Apr  5 12:19:46 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 5 Apr 2009 10:19:46 +0000 (UTC)
Subject: [Python-Dev] BufferedReader.peek() ignores its argument
References: <loom.20090405T005955-472@post.gmane.org>
	<acd65fa20904042307x790ac5c5qf2e0981cdcad1ca8@mail.gmail.com>
Message-ID: <loom.20090405T101801-14@post.gmane.org>

Alexandre Vassalotti <alexandre <at> peadrop.com> writes:
> 
> I am not sure if this is a good idea. Currently, the argument of
> peek() is documented as a lower bound that cannot exceed the size of
> the buffer:

Unfortunately, in practice, the argument is neither a lower bound nor an upper
bound. It's just used as some kind of internal heuristic (in the Python version)
or not used at all (in the C version).
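
A quick way to observe this on a given interpreter (a sketch; the sizes
are arbitrary, and the only properties relied on are that peek() does
not advance the position and returns bytes from the current position):

```python
import io

raw = io.BytesIO(b"abcdefghij")
reader = io.BufferedReader(raw, buffer_size=4)

peeked = reader.peek(2)            # asks for 2 bytes
position_after_peek = reader.tell()
first = reader.read(3)             # still starts at the beginning

# How many bytes actually came back depends on the implementation.
peek_length = len(peeked)
```

Printing peek_length shows how the requested size of 2 was treated by
the implementation in use.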

Regards

Antoine.

From solipsis at pitrou.net  Sun Apr  5 12:27:45 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 5 Apr 2009 10:27:45 +0000 (UTC)
Subject: [Python-Dev] Mercurial?
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
Message-ID: <loom.20090405T102604-392@post.gmane.org>

Alexandre Vassalotti <alexandre <at> peadrop.com> writes:
> 
> Off the top of my head, the following is needed for a successful migration:

There's also the issue of how we adapt the current workflow of "svnmerging"
between branches when we want to back- or forward-port stuff. In particular,
tracking of already done or blocked backports.

(the issue being that "svnmerge" is different from what DVCS'es call "merging" 
:-))

From solipsis at pitrou.net  Sun Apr  5 12:29:17 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 5 Apr 2009 10:29:17 +0000 (UTC)
Subject: [Python-Dev] Possible py3k io wierdness
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>
	<loom.20090404T231154-979@post.gmane.org>
	<49D874E4.6030602@sweetapp.com>
Message-ID: <loom.20090405T102812-215@post.gmane.org>

Brian Quinlan <brian <at> sweetapp.com> writes:
> 
> I don't see why this is helpful. Could you explain why 
> _RawIOBase.close() calling self.flush() is useful?

I could not explain it for sure since I didn't write the Python version.
I suppose it's so that people who only override flush() automatically get the
flush-on-close behaviour.

cheers

Antoine.

From solipsis at pitrou.net  Sun Apr  5 12:33:33 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 5 Apr 2009 10:33:33 +0000 (UTC)
Subject: [Python-Dev] graphics maths types in python core?
References: <20090404150111.GQ12593@idyll.org>	<49D7D884.5060801@canterbury.ac.nz>	<loom.20090404T220604-267@post.gmane.org>	<49D7EE7C.4040604@canterbury.ac.nz>	<loom.20090404T233632-904@post.gmane.org>	<49D7F2AB.8060907@canterbury.ac.nz>
	<loom.20090404T235147-360@post.gmane.org>
	<49D85A23.6020405@gmail.com>
Message-ID: <loom.20090405T103055-388@post.gmane.org>

Nick Coghlan <ncoghlan <at> gmail.com> writes:
> 
> Parts 2 and 3, being the memoryview API and support for the new protocol
> in the builtin types are the parts that are currently restricted to
> simple linear memory views.
> 
> That's largely because parts 2 and 3 are somewhat use case challenged:
> the key motivation behind PEP 3118 was so that libraries like NumPy, PIL
> and the like would have a common standard for data interchange.

If I understand correctly, one of the motivations behind memoryview() is to
replace buffer() as a way to get cheap slicing without memory copies (it's used
e.g. in the C IO library). I don't know whether the third-party types mentioned
above could also benefit from that.

Regards

Antoine.

From dirkjan at ochtman.nl  Sun Apr  5 12:51:30 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 12:51:30 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <loom.20090405T102604-392@post.gmane.org>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<loom.20090405T102604-392@post.gmane.org>
Message-ID: <49D88D32.60202@ochtman.nl>

On 05/04/2009 12:27, Antoine Pitrou wrote:
> There's also the issue of how we adapt the current workflow of "svnmerging"
> between branches when we want to back- or forward-port stuff. In particular,
> tracking of already done or blocked backports.

Right. The canonical way to do that with Mercurial is to commit patches 
against the "oldest" branch where they should be applied, so that every 
stable branch is a strict subset of every less stable branch.

 From what I've understood, this doesn't fit the way the Python-dev 
community/process works very well. In that case, there are a number of 
alternatives. For example, hg's export/import commands can be used to 
explicitly deal with diffs that contain hg metadata, the transplant 
extension can be used to automate that, or in some cases, the rebase 
extension might be more appropriate. We can put extended examples from 
the PEP in the wiki to help people discovering the best workflow.

Cheers,

Dirkjan

From brian at sweetapp.com  Sun Apr  5 12:56:47 2009
From: brian at sweetapp.com (Brian Quinlan)
Date: Sun, 05 Apr 2009 11:56:47 +0100
Subject: [Python-Dev] Possible py3k io wierdness
In-Reply-To: <loom.20090405T102812-215@post.gmane.org>
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>	<loom.20090404T231154-979@post.gmane.org>	<49D874E4.6030602@sweetapp.com>
	<loom.20090405T102812-215@post.gmane.org>
Message-ID: <49D88E6F.4080801@sweetapp.com>

Antoine Pitrou wrote:
> Brian Quinlan <brian <at> sweetapp.com> writes:
>> I don't see why this is helpful. Could you explain why 
>> _RawIOBase.close() calling self.flush() is useful?
> 
> I could not explain it for sure since I didn't write the Python version.
> I suppose it's so that people who only override flush() automatically get the
> flush-on-close behaviour.

But the way that the code is currently written, flush only gets called 
*after* the file has been closed (see my original example). It seems 
very unlikely that this is the behavior that the subclass would want/expect.

So any objections to me changing IOBase (and the C implementation) to:

     def close(self):
         """Flush and close the IO object.

         This method has no effect if the file is already closed.
         """
         if not self.__closed:
             try:
-                self.flush()
+                IOBase.flush(self)
             except IOError:
                 pass  # If flush() fails, just give up
             self.__closed = True
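
For reference, a toy model of what the change would mean (PatchedBase,
Counting and FlushOnly are stand-ins invented for this sketch, not the
real io classes):

```python
class PatchedBase:
    """Stand-in for IOBase with the proposed close()."""

    def __init__(self):
        self.closed = False

    def flush(self):
        pass

    def close(self):
        if not self.closed:
            try:
                PatchedBase.flush(self)  # no dispatch to subclass overrides
            except IOError:
                pass
            self.closed = True


class Counting(PatchedBase):
    """Overrides both flush() and close(), flushing explicitly."""

    def __init__(self):
        super().__init__()
        self.flushes = 0

    def flush(self):
        self.flushes += 1

    def close(self):
        self.flush()
        super().close()


class FlushOnly(PatchedBase):
    """Overrides only flush() and relies on the inherited close()."""

    def __init__(self):
        super().__init__()
        self.flushes = 0

    def flush(self):
        self.flushes += 1


both = Counting()
both.close()
flush_only = FlushOnly()
flush_only.close()
```

In this model the subclass that overrides both methods sees exactly one
flush, while a subclass that overrides only flush() no longer gets an
automatic flush on close.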

Cheers,
Brian

From solipsis at pitrou.net  Sun Apr  5 13:00:11 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 5 Apr 2009 11:00:11 +0000 (UTC)
Subject: [Python-Dev] Mercurial?
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<loom.20090405T102604-392@post.gmane.org>
	<49D88D32.60202@ochtman.nl>
Message-ID: <loom.20090405T105541-496@post.gmane.org>

Dirkjan Ochtman <dirkjan <at> ochtman.nl> writes:
> 
> Right. The canonical way to do that with Mercurial is to commit patches 
> against the "oldest" branch where they should be applied, so that every 
> stable branch is a strict subset of every less stable branch.

It doesn't work between py3k and trunk, which are wildly diverging.

> In that case, there are a number of 
> alternatives. For example, hg's export/import commands can be used to 
> explicitly deal with diffs that contain hg metadata, the transplant 
> extension can be used to automate that, or in some cases, the rebase 
> extension might be more appropriate.

Transplant or export/import have the right semantics IMO, but we lose the
tracking that's built in svnmerge. Perhaps a new hg extension? ;)
(the missing functionality is to store the list of transplanted or blocked
changesets in a .hgXXX file (storing the original hashes, not the ones after
transplant), and parse that file in order to compare it with the incoming
changesets from another repo)
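
To make the idea concrete, here is a sketch of the bookkeeping only.
The one-entry-per-line "status hash" file layout, the function names
and the hashes are all made up; nothing here uses the Mercurial API.

```python
def parse_tracking(text):
    """Return (transplanted, blocked) sets of original changeset hashes."""
    transplanted, blocked = set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        status, node = line.split()
        if status == "transplanted":
            transplanted.add(node)
        elif status == "blocked":
            blocked.add(node)
        else:
            raise ValueError("unknown status: %r" % status)
    return transplanted, blocked


def pending(incoming, tracking_text):
    """Return the incoming hashes that are neither transplanted nor blocked."""
    transplanted, blocked = parse_tracking(tracking_text)
    handled = transplanted | blocked
    return [node for node in incoming if node not in handled]


# Made-up tracking file content and incoming changesets.
TRACKING = """
# status  original-hash
transplanted 1111aaaa2222
blocked      3333bbbb4444
"""
todo = pending(["1111aaaa2222", "3333bbbb4444", "5555cccc6666"], TRACKING)
```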

Regards

Antoine.

From dirkjan at ochtman.nl  Sun Apr  5 13:04:13 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 13:04:13 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <loom.20090405T105541-496@post.gmane.org>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<loom.20090405T102604-392@post.gmane.org>	<49D88D32.60202@ochtman.nl>
	<loom.20090405T105541-496@post.gmane.org>
Message-ID: <49D8902D.8050803@ochtman.nl>

On 05/04/2009 13:00, Antoine Pitrou wrote:
> Transplant or export/import have the right semantics IMO, but we lose the
> tracking that's built in svnmerge. Perhaps a new hg extension? ;)
> (the missing functionality is to store the list of transplanted or blocked
> changesets in a .hgXXX file (storing the original hashes, not the ones after
> transplant), and parse that file in order to compare it with the incoming
> changesets from another repo)

Transplant can already keep the source revision hash on the new revision 
(in hg's equivalent of generic revprops, the extra dict). I think that 
blocked revisions will not be an issue due to the nature of the DAG, but 
I have too little experience with svnmerge to say for sure.

Cheers,

Dirkjan

From ncoghlan at gmail.com  Sun Apr  5 13:16:25 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 05 Apr 2009 21:16:25 +1000
Subject: [Python-Dev] graphics maths types in python core?
In-Reply-To: <loom.20090405T103055-388@post.gmane.org>
References: <20090404150111.GQ12593@idyll.org>	<49D7D884.5060801@canterbury.ac.nz>	<loom.20090404T220604-267@post.gmane.org>	<49D7EE7C.4040604@canterbury.ac.nz>	<loom.20090404T233632-904@post.gmane.org>	<49D7F2AB.8060907@canterbury.ac.nz>	<loom.20090404T235147-360@post.gmane.org>	<49D85A23.6020405@gmail.com>
	<loom.20090405T103055-388@post.gmane.org>
Message-ID: <49D89309.7050307@gmail.com>

Antoine Pitrou wrote:
> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>> Parts 2 and 3, being the memoryview API and support for the new protocol
>> in the builtin types are the parts that are currently restricted to
>> simple linear memory views.
>>
>> That's largely because parts 2 and 3 are somewhat use case challenged:
>> the key motivation behind PEP 3118 was so that libraries like NumPy, PIL
>> and the like would have a common standard for data interchange.
> 
> If I understand correctly, one of the motivations behind memoryview() is to
> replace buffer() as a way to get cheap slicing without memory copies (it's used
> e.g. in the C IO library). I don't know whether the third-party types mentioned
> above could also benefit from that.

Yep, once memoryview supports all of the PEP 3118 semantics it should be
usable with sufficiently recent versions of NumPy arrays and the like.
Its implementation has unfortunately lagged because those with the most
relevant expertise don't need it (they access the objects they care
about through the C API), and there are some interesting semantics to
get right which are hard to judge without that expertise.

Still, as both you and Greg have pointed out, even in its current form
memoryview is already useful as a replacement for buffer that doesn't
share buffer's problems - it's only if they try to use it with the more
sophisticated aspects of the PEP 3118 API that people may be
disappointed by its capabilities.
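
A small sketch of that simple linear use, assuming a current Python 3
(the variable names are only illustrative):

```python
data = bytearray(b"hello world")

view = memoryview(data)
tail = view[6:]              # slicing the view copies nothing
copied = bytes(data[6:])     # slicing the bytearray makes a copy

# Writing through the slice changes the original object in place.
tail[:5] = b"WORLD"

shares_memory = tail.obj is data
after = bytes(data)
```

The copy taken before the write keeps the old contents, while the
bytearray itself reflects the write made through the view.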

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From l.mastrodomenico at gmail.com  Sun Apr  5 13:17:20 2009
From: l.mastrodomenico at gmail.com (Lino Mastrodomenico)
Date: Sun, 5 Apr 2009 13:17:20 +0200
Subject: [Python-Dev] graphics maths types in python core?
In-Reply-To: <loom.20090405T103055-388@post.gmane.org>
References: <20090404150111.GQ12593@idyll.org>
	<49D7D884.5060801@canterbury.ac.nz>
	<loom.20090404T220604-267@post.gmane.org>
	<49D7EE7C.4040604@canterbury.ac.nz>
	<loom.20090404T233632-904@post.gmane.org>
	<49D7F2AB.8060907@canterbury.ac.nz>
	<loom.20090404T235147-360@post.gmane.org> <49D85A23.6020405@gmail.com>
	<loom.20090405T103055-388@post.gmane.org>
Message-ID: <cc93256f0904050417w6e69e521la2a94b2e8478fe3d@mail.gmail.com>

2009/4/5 Antoine Pitrou <solipsis at pitrou.net>:
> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>>
>> That's largely because parts 2 and 3 are somewhat use case challenged:
>> the key motivation behind PEP 3118 was so that libraries like NumPy, PIL
>> and the like would have a common standard for data interchange.
>
> If I understand correctly, one of the motivations behind memoryview() is to
> replace buffer() as a way to get cheap slicing without memory copies (it's used
> e.g. in the C IO library). I don't know whether the third-party types mentioned
> above could also benefit from that.

Well, PEP 3118 is useful because it would be nice to have e.g. the
possibility of opening an image with PIL, manipulating it directly with
NumPy and saving it to file with PIL. Right now this is possible only
if the PIL image is first converted (and copied) to a new NumPy array
and then the array is converted back to an image.

BTW, while PEP 3118 provides a common C API for this, the related PEP
368 proposes a standard "image protocol" on the Python side that
should be compatible with the image classes of PIL, wxPython and
pygame, and (mostly) with NumPy arrays.

I started an implementation of PEP 368 at:

    http://code.google.com/p/pyimage/

Both the PEP and the implementation need updates (pyimage already
includes an IEEE 754r compatible half-precision floating point type,
aka float16, that's not yet in the PEP), but if someone is interested
and willing to help I may start again working on them.

Also note that the subjects "vec2, vec3, quaternion, etc" (PEP 3141)
and "multi-dimensional arrays" (PEP 3118) are mostly unrelated.

-- 
Lino Mastrodomenico

From firephoenix at wanadoo.fr  Sun Apr  5 13:31:48 2009
From: firephoenix at wanadoo.fr (Firephoenix)
Date: Sun, 05 Apr 2009 13:31:48 +0200
Subject: [Python-Dev] Generator methods - "what's next" ?
Message-ID: <49D896A4.3000104@wanadoo.fr>

Hello everyone

I'm a little confused by the recent changes to the generator system...

I basically agreed with renaming the next() method to __next__(), so as 
to follow the naming of other similar methods (__iter__() etc.).
But I noticed then that all the other methods of the generator had 
stayed the same (send, throw, close...), which gives really weird (imo) 
code:

   next(it)
   it.send(35)
   it.throw(Exception())
   next(it)
   ....

Browsing the web, I've found people troubled by that asymmetry, but no 
remarks on its causes nor its future...
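
For anyone who wants to try the mix of spellings, here is a small
runnable sketch (the averager generator is made up for the example):

```python
def averager():
    """Yield the running average of the values sent in.

    Throwing ValueError into the generator resets the running state.
    """
    total, count, average = 0.0, 0, None
    while True:
        try:
            value = yield average
        except ValueError:
            total, count, average = 0.0, 0, None
        else:
            total += value
            count += 1
            average = total / count


it = averager()
first = next(it)                 # builtin function, calls it.__next__()
a = it.send(10)                  # plain method
b = it.send(20)
c = it.throw(ValueError())       # plain method
d = it.send(4)
it.close()                       # plain method
```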

Since __next__(), send() and others have really really close semantics, 
I consider that state as a Python wart, one of the few real ones I can 
think of.

Is there any plan to fix this? Either by coming back to the next() 
method, or by putting all the "magical methods" of generators in the 
__specialattributes__  bag ?

    next(it)
    send(it, 5)
    throw(it, Exception())
    ...

Thanks a lot for the information,
Pascal

From g.brandl at gmx.net  Sun Apr  5 14:46:12 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 05 Apr 2009 14:46:12 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D88D32.60202@ochtman.nl>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<loom.20090405T102604-392@post.gmane.org>
	<49D88D32.60202@ochtman.nl>
Message-ID: <gra96n$i57$1@ger.gmane.org>

Dirkjan Ochtman wrote:
> On 05/04/2009 12:27, Antoine Pitrou wrote:
>> There's also the issue of how we adapt the current workflow of "svnmerging"
>> between branches when we want to back- or forward-port stuff. In particular,
>> tracking of already done or blocked backports.
> 
> Right. The canonical way to do that with Mercurial is to commit patches 
> against the "oldest" branch where they should be applied, so that every 
> stable branch is a strict subset of every less stable branch.

That's what I do as well in Sphinx.  It works fine there, but there are two
issues if you want to apply it to Python:

* As Antoine said, trunk and py3k are very different. Merging would still be
  possible, but confusing.

* Our current trunk/maint branches will have completely different commits,
  so pulling (e.g.) from 2.6 into trunk won't work.

So I'd be in favor of a solution like the following:

* Once 2.7 and 3.1 are final, create their maint branches as "real" Hg
  branches, so that for each pair committing to maint and pulling into
  trunk works.

* For the 2->3 merging, use transplant (optionally with the mentioned
  feature of keeping track of what was already transplanted and blocked).

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From g.brandl at gmx.net  Sun Apr  5 14:48:38 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 05 Apr 2009 14:48:38 +0200
Subject: [Python-Dev] Generator methods - "what's next" ?
In-Reply-To: <49D896A4.3000104@wanadoo.fr>
References: <49D896A4.3000104@wanadoo.fr>
Message-ID: <gra9b8$i57$2@ger.gmane.org>

Firephoenix wrote:
> Hello everyone
> 
> I'm a little confused by the recent changes to the generator system...
> 
> I basically agreed with renaming the next() method to __next__(), so as 
> to follow the naming of other similar methods (__iter__() etc.).
> But I noticed then that all the other methods of the generator had 
> stayed the same (send, throw, close...), which gives really weird (imo) 
> codes :
> 
>    next(it)
>    it.send(35)
>    it.throw(Exception())
>    next(it)
>    ....
> 
> Browsing the web, I've found people troubled by that asymmetry, but no 
> remarks on its causes nor its future...
> 
> Since __next__(), send() and others have really really close semantics, 
> I consider that state as a python wart, one of the few real ones I can 
> think of.

You're missing an important detail: next()/__next__() is a feature of all
iterators, while send() and throw() are generator-only methods.

The only thing I could imagine is to add a generator.next() method that
is simply an alias for generator.__next__(). However, TSBOOWTDI.
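
A minimal Python 3 sketch of that distinction. The names Countdown and
echo are made up for illustration: any object with __next__() works with
the next() builtin, but only generator objects carry send() and throw().

```python
class Countdown:
    """A hand-written iterator: it only implements the iterator protocol."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


def echo():
    """A generator: send(), throw() and close() come with the object."""
    received = None
    while True:
        received = yield received


it = Countdown(2)
first = next(it)                # next() works on any iterator
has_send = hasattr(it, "send")  # a plain iterator has no send()

gen = echo()
next(gen)                       # advance to the first yield
reply = gen.send(35)            # only generators accept sent values
```

So a builtin send(it, value) would have nothing to call on most
iterators, which is one way to read the asymmetry.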

cheers,
Georg

From alexandre at peadrop.com  Sun Apr  5 15:11:43 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 5 Apr 2009 09:11:43 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D87499.5060502@v.loewis.de>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com> 
	<49D87499.5060502@v.loewis.de>
Message-ID: <acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>

On Sun, Apr 5, 2009 at 5:06 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Off the top of my head, the following is needed for a successful migration:
>>
>>    - Verify that the repository at http://code.python.org/hg/ is
>> properly converted.
>
> I see that this has four branches. What about all the other branches?
> Will they be converted, or not? What about the stuff outside /python?
>

I am not sure it would be useful to convert the old branches to
Mercurial. The simplest thing to do would be to keep the current svn
repository as a read-only archive. If people need to commit to
these branches, they could request that the branch be imported into a
Mercurial branch (or a simple-to-use script could be provided that
developers run directly on the server to create a user
branch).

> In particular, the Stackless people have requested that they move along
> with what core Python does, so their code should also be converted.
>

Noted.

>>    - Add Mercurial support to the issue tracker.
>
> Not sure what this means. There is currently svn support insofar as the
> tracker can format rNNN references into ViewCVS links; this should be
> updated if possible (removed if not). There would also be a possibility
> to auto-close issues from the commit messages. This is not done
> currently, so I would not make it a prerequisite for the switch.
>

Yes, I was referring to the rNNN references. Actually, I am not sure
how this could be implemented, since with Mercurial we lose atomic
revision IDs. We could use something like hash@branch-name (e.g.,
bf94293b1932@py3k) to refer to a specific revision.
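
A rough sketch of how a tracker could turn such references into links,
assuming the separator is "@" and the hash is a 12-digit hex prefix. The
URL layout below is hypothetical; the real hgweb paths would depend on
how the server is set up.

```python
import re

# Assumed reference syntax: 12 hex digits, "@", then a branch name.
# Dots are allowed inside the branch name but not at its end, so a
# sentence-final period is not swallowed.
CHANGESET_REF_RE = re.compile(
    r"\b([0-9a-f]{12})@([A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*)"
)

# Hypothetical URL template, for illustration only.
URL_TEMPLATE = "http://code.python.org/hg/{branch}/rev/{node}"


def linkify(text):
    """Append a URL after every hash@branch reference found in text."""

    def replace(match):
        node, branch = match.groups()
        url = URL_TEMPLATE.format(branch=branch, node=node)
        return "{0} <{1}>".format(match.group(0), url)

    return CHANGESET_REF_RE.sub(replace, text)


linked = linkify("Fixed in bf94293b1932@py3k.")
```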

An auto-close would be a nice feature, but, as you said, not necessary
for the migration. The main stumbling block in implementing an auto-close
feature is defining when an issue should be closed. Maybe we could
add our own metadata to the commit message. For example:

   Fix some nasty bug.

   Close-Issue: 4532

When such a commit arrives in one of the main branches, a commit
hook would close the issue if all the affected releases have been
fixed.
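
To make the idea concrete, here is a small sketch of the parsing and the
"all affected releases fixed" check. The trailer name comes from the
example above; the function names and the way branches are represented
are assumptions, and the tracker side is left out entirely.

```python
import re

# Trailer format from the example: "Close-Issue: <number>" on its own line.
CLOSE_ISSUE_RE = re.compile(r"^[ \t]*Close-Issue:[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def issues_to_close(commit_message):
    """Return the issue numbers named in Close-Issue trailers, in order."""
    return [int(number) for number in CLOSE_ISSUE_RE.findall(commit_message)]


def should_close(affected_branches, fixed_branches):
    """Close only once every affected branch carries the fix."""
    return set(affected_branches) <= set(fixed_branches)


message = "Fix some nasty bug.\n\n   Close-Issue: 4532\n"
found = issues_to_close(message)
```

A commit hook would call issues_to_close() on each incoming changeset
and consult should_close() before touching the tracker.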

>>    - Setup temporary svn mirrors for the main Mercurial repositories.
>
> What is that?
>

I think it would be a good idea to host temporary svn mirrors for
developers who access their VCS via an IDE. However, I am not sure
anymore whether supporting these developers (if there are any) would be
worth the trouble. So, think of this as optional.

>>    - Augment code.python.org infrastructure to support the creation of
>> developer accounts.
>
> One option would be to carry on with the current setup; migrating it
> to hg might work as well, of course.
>

You mean the current setup for svn.python.org? Would you be
comfortable letting this machine be accessed by core developers through
SSH? With Mercurial, SSH access will be needed for server-side
clones (or a script similar to what the Mozilla folks have [1] could be
added).

[1]: https://developer.mozilla.org/en/Publishing_Mercurial_Clones

>>    - Update the release.py script.
>>
>> There are probably some other things that I missed
>
> Here are some:
>
> - integrate with the buildbot

Good one. It seems buildbot has support for Mercurial. [2] So, this
will be a matter of tweaking the right options. The batch scripts in
Tools/buildbot will also need to be updated.

[2]: http://djmitche.github.com/buildbot/docs/0.7.10/#How-Different-VC-Systems-Specify-Sources

> - come up with a strategy for /external (also relevant for
>  the buildbot slaves)

Since the directories in /external are considered read-only, we could
simply create a new Mercurial repository and copy the contents of /external
into it. When a new release needs to be added, just create a new directory
and commit.

> - decide what to do with the bzr mirrors
>

I don't see much benefit in keeping them. So, I say, archive the
branches there unless someone steps up to maintain them.

-- Alexandre

From benjamin at python.org  Sun Apr  5 15:13:28 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Sun, 5 Apr 2009 08:13:28 -0500
Subject: [Python-Dev] Mercurial?
In-Reply-To: <acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
Message-ID: <1afaf6160904050613w2016ed87i2ffab6f67c48aca4@mail.gmail.com>

2009/4/5 Alexandre Vassalotti <alexandre at peadrop.com>:
> Off the top of my head, the following is needed for a successful migration:
...
>   - Update the release.py script.

I'll do this.

-- 
Regards,
Benjamin

From benjamin at python.org  Sun Apr  5 15:15:48 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Sun, 5 Apr 2009 08:15:48 -0500
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8800A.60601@ochtman.nl>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de> <49D8800A.60601@ochtman.nl>
Message-ID: <1afaf6160904050615i39aa6fd9o953336f4c74fa871@mail.gmail.com>

2009/4/5 Dirkjan Ochtman <dirkjan at ochtman.nl>:
> On 05/04/2009 11:06, "Martin v. Löwis" wrote:
>> - come up with a strategy for /external (also relevant for
>>   the buildbot slaves)
>
> I'm not sure exactly what the purpose or mechanism for /external is. Sure,
> it's like a snapshot dir, probably used to pull some stuff into other
> processes? Seems to me like it might be interesting to, for example, convert
> to a simple config file + script that lets you specify a package
> (repository) + tag, which can then be easily pulled in.
>
> But it'd be nice to know where and how exactly this is used.

Basically it contains released versions of packages that some parts of
Python depend on. For example, the Sphinx dependencies needed to build the
docs reside there. A simple script that downloads a tarball and extracts it
seems more elegant.
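
Such a script could be as small as the sketch below. The function name
and the temporary file name are made up, and nothing here reflects how
/external is actually consumed by the build; it only shows the
download-then-extract step using the standard library.

```python
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url, dest_dir):
    """Download a tarball from url and unpack it under dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "package.tar.gz"
        with urllib.request.urlopen(url) as response:
            archive.write_bytes(response.read())
        with tarfile.open(archive) as tar:
            # Newer Python versions offer a "data" filter that rejects
            # unsafe archive members; use it when it is available.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
    return dest
```

A real version would probably also verify a checksum for each pinned
release before extracting it.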

-- 
Regards,
Benjamin

From firephoenix at wanadoo.fr  Sun Apr  5 15:35:21 2009
From: firephoenix at wanadoo.fr (Firephoenix)
Date: Sun, 05 Apr 2009 15:35:21 +0200
Subject: [Python-Dev] Generator methods - "what's next" ?
In-Reply-To: <gra9b8$i57$2@ger.gmane.org>
References: <49D896A4.3000104@wanadoo.fr> <gra9b8$i57$2@ger.gmane.org>
Message-ID: <49D8B399.4020003@wanadoo.fr>

Georg Brandl wrote:
> Firephoenix wrote:
>   
>> Hello everyone
>>
>> I'm a little confused by the recent changes to the generator system...
>>
>> I basically agreed with renaming the next() method to __next__(), so as 
>> to follow the naming of other similar methods (__iter__() etc.).
>> But I noticed then that all the other methods of the generator had 
>> stayed the same (send, throw, close...), which gives really weird (imo) 
>> codes :
>>
>>    next(it)
>>    it.send(35)
>>    it.throw(Exception())
>>    next(it)
>>    ....
>>
>> Browsing the web, I've found people troubled by that asymmetry, but no 
>> remarks on its causes nor its future...
>>
>> Since __next__(), send() and others have really really close semantics, 
>> I consider that state as a python wart, one of the few real ones I can 
>> think of.
>>     
>
> You're missing an important detail: next()/__next__() is a feature of all
> iterators, while send() and throw() are generator-only methods.
>
> The only thing I could imagine is to add a generator.next() method that
> is simply an alias for generator.__next__(). However, TSBOOWTDI.
>
> cheers,
> Georg
>
>   
Good point indeed.

Generator methods (send, throw...) are some kind of black magic compared 
to normal methods, so I'd find it normal if their naming reflected this 
specificity, but on the other hand it wouldn't be cool to clutter the 
builtin scope with all the corresponding functions "send(iter, var)"... 
so I guess all that will stay the way it is.

Regards,
Pascal

From alexandre at peadrop.com  Sun Apr  5 15:46:01 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 5 Apr 2009 09:46:01 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <loom.20090405T102604-392@post.gmane.org>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com> 
	<loom.20090405T102604-392@post.gmane.org>
Message-ID: <acd65fa20904050646l30e39290w9cd9d162a8929b97@mail.gmail.com>

On Sun, Apr 5, 2009 at 6:27 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Alexandre Vassalotti <alexandre <at> peadrop.com> writes:
>>
>> Off the top of my head, the following is needed for a successful migration:
>
> There's also the issue of how we adapt the current workflow of "svnmerging"
> between branches when we want to back- or forward-port stuff. In particular,
> tracking of already done or blocked backports.
>
> (the issue being that "svnmerge" is different from what DVCS'es call "merging"
> :-))
>

See the PEP about that. I have written up a fair amount of detail on how
this would work with Mercurial:

http://www.python.org/dev/peps/pep-0374/#backport

-- Alexandre

From dirkjan at ochtman.nl  Sun Apr  5 16:13:21 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 16:13:21 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
Message-ID: <49D8BC81.7040007@ochtman.nl>

(going back on-list)

On 05/04/2009 15:42, Alexandre Vassalotti wrote:
>> I'm pretty sure that we'll need to reconvert; I don't think the current
>> conversion is particularly good.
>
> What is bad about it?

For one thing, it has the [svn] prefixes, which I found to be quite 
ugly. hgsubversion in many cases will preserve the rev order from svn so 
that the local revision numbers that hg shows will be the same as in SVN 
anyway. On top of that, good conversion tools save the svn revision in 
the revision metadata in hg, so that you can see it with log --debug.

For another, I'd like to use an author map to bring the revision authors 
more in line with what Mercurial repositories usually display; this 
helps with tool support and is also just a nicer solution IMO.

I have a stab at an author map at http://dirkjan.ochtman.nl/author-map. 
Could use some review, but it seems like a good start.
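
For anyone reviewing it, here is a sketch of how such a map gets applied.
It assumes one "svn_name = Full Name <email>" entry per line; the sample
entry is invented and is not taken from the file linked above.

```python
def parse_author_map(text):
    """Parse 'svn_name = Full Name <email>' lines into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        source, _, target = line.partition("=")
        mapping[source.strip()] = target.strip()
    return mapping


def map_author(mapping, svn_author):
    """Fall back to the original name when no mapping exists."""
    return mapping.get(svn_author, svn_author)


sample = """
# made-up entry for illustration
jdoe = Jane Doe <jane@example.org>
"""
authors = parse_author_map(sample)
```

Unmapped committers simply keep their svn user name, so gaps in the map
show up in the converted log and can be fixed before the final run.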

> I largely prefer clone to named branches. From personal experience, I
> found named branches difficult to use properly. And, I think even
> Mercurial developers don't use them.

No, the Mercurial project currently doesn't use them. Mozilla does use 
them at the moment, because they found they did have some advantages 
(especially lower disk usage because no separate clones were needed). I 
think named branches are fine for long-lived branches.

At the very least we should have a proper discussion over this.

> How do you reorder the revlog of a repository?

There are scripts for this which can be investigated.

> I am in favor of pruning the old branches, but not of leaving the old
> history behind. The current Mercurial mirror of py3k is 92M on my disk
> which is totally reasonable. So, I don't see what would be the
> advantage there.

The current Mercurial mirror for py3k also doesn't include any history 
from before it was branched, which is bad, IMO. In order to get the most 
of the DVCS structure, it would be helpful if py3k shared history with 
the normal (trunk) branches.

> I was thinking of something very basic, e.g., something like a commit
> hook that would asynchronously commit the latest revision to svn. We
> wouldn't need to convert much metadata; just the committer's name and
> the changelog would be fine.

What's the use case, who do you want to support with this? hgweb 
trivially provides tarballs for download on every revision, so people 
who don't want to use hg can easily download a snapshot.

> Not really. Currently, core developers can only push stuff using the
> Bazaar setup. Personally, I think SSH access would be a lot nicer, but
> this will depend how confident python.org's admins are with this idea.

We could still enable pushing through http(s) for hgweb(dir).

Cheers,

Dirkjan

From dirkjan at ochtman.nl  Sun Apr  5 16:27:30 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 16:27:30 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de>
	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
Message-ID: <49D8BFD2.8090600@ochtman.nl>

On 05/04/2009 15:11, Alexandre Vassalotti wrote:
> I am not sure it would be useful to convert the old branches to
> Mercurial. The simplest thing to do would be to keep the current svn
> repository as a read-only archive. If people need to commit to
> these branches, they could request that the branch be imported into a
> Mercurial branch (or a simple-to-use script could be provided that
> developers run directly on the server to create a user
> branch).

We should probably not include any branches that haven't been touched in 
the last 18 months. Then we also leave out branches that have been pruned.

BTW, tags are also missing from the current conversions. We probably 
want to keep all release tags, but not the partial tags (e.g. the 
Distutils tags). Are there any other particularly useful tags we should 
keep?

> An auto-close would be a nice feature, but, as you said, not necessary
> for the migration. The main stumbling block to implement an auto-close
> feature is to define when an issue should be closed. Maybe we could
> add our own meta-data to the commit message. For example:
>
>     Fix some nasty bug.
>
>     Close-Issue: 4532
>
> When such a commit arrives in one of the main branches, a commit
> hook would close the issue if all the affected releases have been
> fixed.

It makes more sense to me to use the syntax already used by Trac et al., 
e.g. "(fix|close)s? (issue|#)\d+" for closing and possibly 
"ref(erence)?s? (issue|#)\d+" for creating a link on the issue.
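
A quick sketch of those two patterns in use. Two things are assumptions
added here rather than part of the patterns as quoted: matching is
case-insensitive, and whitespace is allowed between "issue" and the
number. Read literally, the first pattern accepts "fix" and "closes" but
not "fixes", so a real hook would probably want a longer keyword list.

```python
import re

CLOSE_RE = re.compile(r"\b(?:fix|close)s? (?:issue\s*|#)(\d+)", re.IGNORECASE)
REF_RE = re.compile(r"\bref(?:erence)?s? (?:issue\s*|#)(\d+)", re.IGNORECASE)


def scan_commit_message(message):
    """Return (issues_to_close, issues_to_link) found in a commit message."""
    closes = [int(n) for n in CLOSE_RE.findall(message)]
    refs = [int(n) for n in REF_RE.findall(message)]
    return closes, refs


closes, refs = scan_commit_message("Closes #4532, fix issue 1234, refs #99.")
```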

BTW, this would also be a good time to split out the stdlib if that's 
still desirable (which I seem to have gleaned from the PyCon videos).

Cheers,

Dirkjan

From solipsis at pitrou.net  Sun Apr  5 16:39:20 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 5 Apr 2009 14:39:20 +0000 (UTC)
Subject: [Python-Dev] Mercurial?
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl>
Message-ID: <loom.20090405T141950-815@post.gmane.org>

Hello,

> hgsubversion in many cases will preserve the rev order from svn so 
> that the local revision numbers that hg shows will be the same as in SVN 
> anyway.

Er... I guess it's only the case in simplistic cases where you convert all
branches in the SVN repo to a single hg repo (which is not workable for the
CPython repo, which is too big), and there are no cases of SVN revisions being
either ignored or split between several hg changesets (for example because they
span multiple branches).

The other nice thing with having "[svn rXXX]" in the patch subject line is that
it makes the info easily viewable and searchable in the Web front-end.

> For another, I'd like to use an author map to bring the revision authors 
> more in line with what Mercurial repositories usually display; this 
> helps with tool support and is also just a nicer solution IMO.

Good idea.

[in-repo multiple branches]
> No, the Mercurial project currently doesn't use them. Mozilla does use 
> them at the moment, because they found they did have some advantages 
> (especially lower disk usage because no separate clones were needed). I 
> think named branches are fine for long-lived branches.
> 
> At the very least we should have a proper discussion over this.

I think at least 3.x and 2.x should live in separate repos. It is pointless for
a clone of py3k to end up pulling all 40000+ changesets from the trunk. It would
add 100MB+ to every py3k clone (that is, quadrupling the size of the repository).

> The current Mercurial mirror for py3k also doesn't include any history 
> from before it was branched, which is bad, IMO.

Given how much separate work has taken place in both, I'm not sure having that
history would be very useful. We have to take into account practical needs.
Someone needing to search history before py3k was created can just do a clone of
the trunk.

> In order to get the most 
> of the DVCS structure, it would be helpful if py3k shared history with 
> the normal (trunk) branches.

Is any SVN-to-hg conversion tool able to parse the commits produced by
svnmerge? And, even then, turn that information into useful hg information (say,
transplant metadata of which changes were ported)?

> > Not really. Currently, core developers can only push stuff using the
> > Bazaar setup. Personally, I think SSH access would be a lot nicer, but
> > this will depend how confident python.org's admins are with this idea.
> 
> We could still enable pushing through http(s) for hgweb(dir).

I'm not sure what the problem is. Developer SVN access already goes through
ssh.

cheers

Antoine.

From dirkjan at ochtman.nl  Sun Apr  5 16:53:23 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 16:53:23 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <loom.20090405T141950-815@post.gmane.org>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>	<49D8BC81.7040007@ochtman.nl>
	<loom.20090405T141950-815@post.gmane.org>
Message-ID: <49D8C5E3.5000200@ochtman.nl>

On 05/04/2009 16:39, Antoine Pitrou wrote:
> The other nice thing with having "[svn rXXX]" in the patch subject line is that
> it makes the info easily viewable and searchable in the Web front-end.

We can still make it accessible/searchable on the web if we don't put it 
in the commit message.

> I think at least 3.x and 2.x should live in separate repos. It is pointless for
> a clone of py3k to end up pulling all 40000+ changesets from the trunk. It would
> add 100MB+ to every py3k clone (that is, quadrupling the size of the repository).

I don't agree. It's quite annoying for things like annotate/blame, for 
example, where you may have to switch to another branch while chasing 
down a defective change. I also think 100MB+ is a cheap price to pay, 
given you only pay it in disk space (cheap) and initial clone time (not 
very often, and usually still quite fast). Also, at some point you 
presumably want to deprecate the whole 2.x line, right? So at that 
point, it'd be nice to have a full 3.x line with all the history in it, 
so that you can just throw away the 2.x stuff and still have full history.

I do agree that 2.x and 3.x should probably be in separate clones.

> Is any SVN-to-hg conversion tool able to parse the commits produced by
> svnmerge? And, even then, turn that information into useful hg information (say,
> transplant metadata of which changes were ported)?

I think things like these are planned for hgsubversion, yes. I'd probably 
want to look at implementing some support for it myself if that makes 
the conversion of the Python repositories better.

> I'm not sure what the problem is. Developer SVN access already goes through
> ssh.

Okay, sounds like that will be easy. Would be good to enable compression 
on the SSH, though, if that's not already done.

Cheers,

Dirkjan

From solipsis at pitrou.net  Sun Apr  5 17:18:41 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 5 Apr 2009 15:18:41 +0000 (UTC)
Subject: [Python-Dev] Mercurial?
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>	<49D8BC81.7040007@ochtman.nl>
	<loom.20090405T141950-815@post.gmane.org>
	<49D8C5E3.5000200@ochtman.nl>
Message-ID: <loom.20090405T150210-608@post.gmane.org>

Dirkjan Ochtman <dirkjan <at> ochtman.nl> writes:
> 
> I also think 100MB+ is a cheap price to pay, 
> given you only pay it in disk space (cheap) and initial clone time (not 
> very often, and usually still quite fast).

It is a cheap price to pay if there is a significant return for it. In my
experience using the hg mirror of the py3k branch, I don't remember having had
to run "annotate" on the trunk to hunt for a change that I'd witnessed in py3k.
Other developers may have different experiences, though.

As for the clone time, one of our prominent developers is, IIRC, on a 40 kb/s
line. Perhaps he wants to step in and say whether cloning the trunk is a painful
experience for him, or not.

> Also, at some point you 
> presumably want to deprecate the whole 2.x line, right?

The consensus seems to be that it will not happen for a couple of years.

> Okay, sounds like that will be easy. Would be good to enable compression 
> on the SSH, though, if that's not already done.

Does the hg protocol compress that well? I would have thought there is already a
lot of compression in the layout (given that it seems much more efficient than
some of its competitors).

Regards

Antoine.

From dirkjan at ochtman.nl  Sun Apr  5 17:47:12 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 17:47:12 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <loom.20090405T150210-608@post.gmane.org>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>	<49D8BC81.7040007@ochtman.nl>	<loom.20090405T141950-815@post.gmane.org>	<49D8C5E3.5000200@ochtman.nl>
	<loom.20090405T150210-608@post.gmane.org>
Message-ID: <49D8D280.6060900@ochtman.nl>

On 05/04/2009 17:18, Antoine Pitrou wrote:
> It is a cheap price to pay if there is a significant return for it. In my
> experience using the hg mirror of the py3k branch, I don't remember having had
> to run "annotate" on the trunk to hunt for a change that I'd witnessed in py3k.
> Other developers may have different experiences, though.
>
> As for the clone time, one of our prominent developers is, IIRC, on a 40 kb/s
> line. Perhaps he wants to step in and say whether cloning the trunk is a painful
> experience for him, or not.

Sure it's painful, but he only has to go through that once, maybe twice.

> The consensus seems to be that it will not happen for a couple of years.

See, I think the point here is that, even though you want the branches 
to be clones, you also want them to all be part of the same directed 
acyclic graph (that DAG thing I keep nattering on about). That way, you 
can later merge every branch back in to some other branch (even if it's 
a trivial merge that doesn't keep anything from one of the branches). Even 
if that's not for a couple of years, it's nice when you'll be able to do 
it in a couple of years without changing all the hashes (meaning 
everybody has to re-clone).

For anyone on a dial-up connection, we could for example provide a repository 
that just has the changesets up to the split between trunk and py3k. They 
can clone that once, clone it locally, then pull the rest of the 
respective history into those local clones.

If you don't have common history, a few of the niceties of having a 
DAG-based DVCS in the first place go away; that seems like a pity.
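
A toy model of the difference, with made-up changeset names and a plain
dict of parent links standing in for the real revlog: with a shared
split point the two tips have common ancestors that a merge can work
from; with separately converted histories they have none.

```python
def ancestors(parents, node):
    """Return node plus every changeset reachable through parent links."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def common_ancestors(parents, a, b):
    """Changesets that both a and b descend from (or are)."""
    return ancestors(parents, a) & ancestors(parents, b)


# Shared history: trunk and py3k both descend from changeset "split".
shared = {
    "root": [],
    "split": ["root"],
    "trunk-tip": ["split"],
    "py3k-tip": ["split"],
}

# Separate conversions: py3k starts from its own root.
separate = {
    "root": [],
    "trunk-tip": ["root"],
    "py3k-root": [],
    "py3k-tip": ["py3k-root"],
}
```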

> Does the hg protocol compress that well? I would have thought there is already a
> lot of compression in the layout (given that it seems much more efficient than
> some of its competitors).

When used over HTTP, hg uses bundles (which can also be used as separate 
files to exchange changesets informally). Bundles contain gzip- or 
bzip2-compressed csets. When communicating over SSH, on the other hand, 
hg defaults to uncompressed streams, because the assumption is that 
connections can use SSH's compression, which is more efficient.

All of this functions on top of the already quite efficient revlogs that 
make up the basic storage model for hg.

Cheers,

Dirkjan

From benjamin at python.org  Sun Apr  5 18:48:28 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Sun, 5 Apr 2009 11:48:28 -0500
Subject: [Python-Dev] Mercurial?
In-Reply-To: <loom.20090405T150210-608@post.gmane.org>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl>
	<loom.20090405T141950-815@post.gmane.org>
	<49D8C5E3.5000200@ochtman.nl>
	<loom.20090405T150210-608@post.gmane.org>
Message-ID: <1afaf6160904050948n32674f60p6de4b1d335ad150d@mail.gmail.com>

2009/4/5 Antoine Pitrou <solipsis at pitrou.net>:
> Dirkjan Ochtman <dirkjan <at> ochtman.nl> writes:
>>
>> I also think 100MB+ is a cheap price to pay,
>> given you only pay it in disk space (cheap) and initial clone time (not
>> very often, and usually still quite fast).
>
> It is a cheap price to pay if there is a significant return for it. In my
> experience using the hg mirror of the py3k branch, I don't remember having had
> to run "annotate" on the trunk to hunt for a change that I'd witnessed in py3k.
> Other developers may have different experiences, though.

I agree with Dirkjan.

>
> As for the clone time, one of our prominent developers is, IIRC, on a 40 kb/s
> line. Perhaps he wants to step in and say whether cloning the trunk is a painful
> experience for him, or not.

I suppose this is me. Cloning the hg trunk repo only takes slightly
longer than an svn checkout for me, and it only needs to be done
occasionally, so I have no problem with including all the history.

-- 
Regards,
Benjamin

From foom at fuhm.net  Sun Apr  5 18:51:38 2009
From: foom at fuhm.net (James Y Knight)
Date: Sun, 5 Apr 2009 12:51:38 -0400
Subject: [Python-Dev] Possible py3k io wierdness
In-Reply-To: <loom.20090405T102812-215@post.gmane.org>
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>
	<loom.20090404T231154-979@post.gmane.org>
	<49D874E4.6030602@sweetapp.com>
	<loom.20090405T102812-215@post.gmane.org>
Message-ID: <3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net>

On Apr 5, 2009, at 6:29 AM, Antoine Pitrou wrote:

> Brian Quinlan <brian <at> sweetapp.com> writes:
>>
>> I don't see why this is helpful. Could you explain why
>> _RawIOBase.close() calling self.flush() is useful?
>
> I could not explain it for sure since I didn't write the Python  
> version.
> I suppose it's so that people who only override flush()  
> automatically get the
> flush-on-close behaviour.

It seems that a separate method "_internal_close" should've been  
defined to do the actual closing of the file, and the close() method  
should've been defined on the base class as "self.flush();  
self._internal_close()" and never overridden.
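
In code, the suggestion would look roughly like the sketch below. The
class names are illustrative and this is not the io module's actual
implementation; the try/finally and the "closed" guard are extra
assumptions so that the resource is released even if flush() raises and
a second close() is harmless.

```python
class SketchIOBase:
    """Illustrative base class using a fixed close() template."""

    def __init__(self):
        self.closed = False

    def flush(self):
        """Subclasses override this to write out buffered data."""

    def _internal_close(self):
        """Subclasses override this to release the underlying resource."""

    def close(self):
        # Defined once on the base class and never overridden.
        if not self.closed:
            try:
                self.flush()
            finally:
                self._internal_close()
                self.closed = True


class LoggingFile(SketchIOBase):
    """Records the order in which the hooks run."""

    def __init__(self):
        super().__init__()
        self.events = []

    def flush(self):
        self.events.append("flush")

    def _internal_close(self):
        self.events.append("close")


f = LoggingFile()
f.close()
f.close()  # the second call does nothing
```

Subclasses then only ever override flush() and _internal_close(), and
the flush-on-close ordering cannot be lost by accident.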

James

From barry at python.org  Sun Apr  5 19:04:10 2009
From: barry at python.org (Barry Warsaw)
Date: Sun, 5 Apr 2009 13:04:10 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D87499.5060502@v.loewis.de>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de>
Message-ID: <25252456-64F8-42C2-BFEF-4AA791C3F1AB@python.org>

On Apr 5, 2009, at 5:06 AM, Martin v. Löwis wrote:

> - decide what to do with the bzr mirrors

I don't see any reason to keep them running on python.org.  There are,  
or will be, other alternatives.

Barry

From firephoenix at wanadoo.fr  Sun Apr  5 19:25:56 2009
From: firephoenix at wanadoo.fr (Firephoenix)
Date: Sun, 05 Apr 2009 19:25:56 +0200
Subject: [Python-Dev] Generator methods - "what's next" ?
In-Reply-To: <878wmftac1.fsf@xemacs.org>
References: <49D896A4.3000104@wanadoo.fr> <878wmftac1.fsf@xemacs.org>
Message-ID: <49D8E9A4.70604@wanadoo.fr>

Stephen J. Turnbull wrote:
> Firephoenix writes:
>
>  > I'm a little confused by the recent changes to the generator system...
>
> Welcome to the club.  It's not easy even for the gurus.  See the PEP
> 380 ("yield from") discussions (mostly on Python-Ideas).
>
>  > But I noticed then that all the other methods of the generator had 
>  > stayed the same (send, throw, close...), which gives really weird (imo) 
>  > code:
>  > 
>  >    next(it)
>  >    it.send(35)
>  >    it.throw(Exception())
>  >    next(it)
>  >    ....
>  > 
>  > Browsing the web, I've found people troubled by that asymmetry, but no 
>  > remarks on its causes nor its future...
>
> Well, this kind of discussion generally belongs on c.l.py, but as far
> as I know, .next() is still present for generators but it's spelled
> .send(None).  See PEP 342.  It seems to me that the rationale for
> respelling .next() as .__next__() given in PEP 3114 doesn't apply to
> .send() and .throw(), since there is no syntax which invokes those
> methods magically.
>
> Also note that since next() takes no argument, it presumes no
> knowledge of the implementation of the iterator.  So specification as
> a function called "on" the iterator seems natural to me.  But .send()
> and .throw() are only useful if you know the semantics of their
> arguments, ie, the implementation of the generator.  Thus using method
> syntax for them seems more natural to me.
>
> If you have some concrete suggestions you want to follow up to
> Python-Dev with, two remarks:
>
> The code presented above is weird because that code is weird, not
> because the generator methods are messed up.  Why would you ever write
> that code?  You need a plausible use case, one where a generator is
> the natural way to write the code, but it's not explicitly iterative.
>
> Second, the whole trend is the other direction, fitting generators
> naturally into Python syntax without using explicit invocation of
> methods.  Again, PEP 380 is an example (though rather specialized).
> As is the expression form of yield (half-successful in that no
> recv() syntax or builtin is needed, although .send() seems to be).  So
> the use case requested above will need to be compelling.
>
>
>   
Whoops, now that you mention it, I see that other mailing lists would 
have been more suitable for this subject... sorry.

Actually, I ran across an example like the following, in the case of a 
"reversed generator" that has to be activated by a first call to "next" 
before we're able to send data to the yield expression it has encountered.
But as you mention, send(None) would work as well, and this kind of 
"setup operation" had better be hidden in a function decorator or 
something like that.

 >    next(it) # init phase
 >    it.send(35)
 >    it.send(36)
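
A minimal sketch of hiding that init phase in a decorator. The names
primed and collector are hypothetical, chosen only for illustration; the
decorator simply advances the generator to its first yield so that the
caller can use send() right away.

```python
import functools


def primed(genfunc):
    """Return a wrapper that advances the new generator to its first yield."""

    @functools.wraps(genfunc)
    def wrapper(*args, **kwargs):
        gen = genfunc(*args, **kwargs)
        next(gen)  # init phase; equivalent to gen.send(None)
        return gen

    return wrapper


@primed
def collector(sink):
    """Append every value sent in to the given list."""
    while True:
        value = yield
        sink.append(value)
```

The calling code then reduces to it = collector(sink); it.send(35);
it.send(36), with no explicit next(it) at the call site.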

Regards, 
Pascal Chambon

From martin at v.loewis.de  Sun Apr  5 19:37:53 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 05 Apr 2009 19:37:53 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de>
	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
Message-ID: <49D8EC71.5020105@v.loewis.de>

> I am not sure if it would be useful to convert the old branches to
> Mercurial. The simplest thing to do would be to keep the current svn
> repository as a read-only archive. And if people need to commit to
> these branches, they could request the branch to be imported into a
> Mercurial branch (or a simple-to-use script could be provided and
> developers could run it directly on the server to create a user
> branch).

I think it should be stated in the PEP what branches get converted,
in what form, and what the further usage of the svn repository should
be.

> An auto-close would be a nice feature, but, as you said, not necessary
> for the migration. The main stumbling block to implement an auto-close
> feature is to define when an issue should be closed. Maybe we could
> add our own meta-data to the commit message. For example:
> 
>    Fix some nasty bug.
> 
>    Close-Issue: 4532
> 
> When such a commit arrives in one of the main branches, a commit
> hook would close the issue if all the affected releases have been
> fixed.

I think there is a long tradition of such annotations; we should
try to repeat history here. IIUC, the Debian bugtracker understands

   Closes: #4532

and some other syntaxes. It must be easy to remember, else people
won't use it.
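
A rough sketch of how a hook might pick such an annotation out of a
commit message. The function name issues_closed_by and the exact pattern
are assumptions; only the "Closes: #4532" form comes from the text above,
and nothing here talks to a real tracker.

```python
import re

# A line of the form "Closes: #4532" (several numbers per line are accepted).
CLOSES_RE = re.compile(r"^[ \t]*Closes:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE)
ISSUE_RE = re.compile(r"#(\d+)")


def issues_closed_by(message):
    """Return the issue numbers named in 'Closes:' lines of a commit message."""
    issues = []
    for rest_of_line in CLOSES_RE.findall(message):
        issues.extend(int(number) for number in ISSUE_RE.findall(rest_of_line))
    return issues
```

Whether the issue is actually closed would still depend on the policy
discussed above, for example waiting until all affected branches carry
the fix.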

>>>    - Setup temporary svn mirrors for the main Mercurial repositories.
>> What is that?
>>
> 
> I think it would be a good idea to host temporary svn mirrors for
> developers who access their VCS via an IDE. Although, I am not sure
> anymore if supporting these developers (if there are any) would be worth
> the trouble. So, think of this as optional.

Any decision to have or not have such a feature should be stated in
the PEP. I personally don't use IDEs, so I don't care (although
I do notice that the apparent absence of IDE support for Mercurial
indicates maturity of the technology)

>>>    - Augment code.python.org infrastructure to support the creation of
>>> developer accounts.
>> One option would be to carry on with the current setup; migrating it
>> to hg might work as well, of course.
>>
> 
> You mean the current setup for svn.python.org? Would you be
> comfortable to let this machine be accessed by core developers through
> SSH? Since with Mercurial, SSH access will be needed for server-side
> clone (or, a script similar to what the Mozilla folk have [1] could be
> added).
> 
> [1]: https://developer.mozilla.org/en/Publishing_Mercurial_Clones

Ok, I take that back. I assumed that Mercurial could work *exactly*
as Subversion. Apparently, that's not the case (although I have no
idea what a server-side clone is). So I wait for the PEP to explain
how authentication and access control is to be implemented. Creating
individual Unix accounts for committers should be avoided.

>> - integrate with the buildbot
> 
> Good one. It seems buildbot has support for Mercurial. [2] So, this
> will be a matter of tweaking the right options. The batch scripts in
> Tools/buildbot will also need to be updated.
> 
> [2]: http://djmitche.github.com/buildbot/docs/0.7.10/#How-Different-VC-Systems-Specify-Sources

I can give you access to the master setup. Ideally, this should
be tested before the switchover (with a single branch). We also
need instructions for the slaves (if any - perhaps installing
a hg binary is sufficient).

> Since the directories in /external are considered read-only, we could
> simply create a new Mercurial repository and copy the content of
> /external into it.
>> - decide what to do with the bzr mirrors
>>
> 
> I don't see much benefit in keeping them.

Both should go into the PEP.

Regards,
Martin

From martin at v.loewis.de  Sun Apr  5 19:39:18 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 05 Apr 2009 19:39:18 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <1afaf6160904050615i39aa6fd9o953336f4c74fa871@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>	
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	
	<49D87499.5060502@v.loewis.de> <49D8800A.60601@ochtman.nl>
	<1afaf6160904050615i39aa6fd9o953336f4c74fa871@mail.gmail.com>
Message-ID: <49D8ECC6.9080302@v.loewis.de>

>> I'm not sure exactly what the purpose or mechanism for /external is. Sure,
>> it's like a snapshot dir, probably used to pull some stuff into other
>> processes? Seems to me like it might be interesting to, for example, convert
>> to a simple config file + script that lets you specify a package
>> (repository) + tag, which can then be easily pulled in.
>>
>> But it'd be nice to know where and how exactly this is used.
> 
> Basically it contains released versions of packages that some parts of
> Python depend on. For example, the Sphinx dependencies needed to build
> the docs reside there. A simple script that downloads a tarball and
> extracts it seems more elegant.

Such a script would, in particular, also have to work on the Windows
buildbot slaves. /external is primarily used for the Windows build.
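
A minimal sketch of such a download-and-extract script, using only the
standard library so that it would also run on Windows. The function name
fetch_external and the path check are assumptions for illustration; it
is not the mechanism used by the batch files in Tools/buildbot.

```python
import io
import tarfile
import urllib.request
from pathlib import Path


def fetch_external(url, dest_dir):
    """Download a tarball from url and unpack it below dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
        root = dest.resolve()
        for member in archive.getmembers():
            # Refuse members that would be written outside dest_dir.
            target = (dest / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError("unsafe path in archive: %s" % member.name)
        archive.extractall(dest)
    # Report what now sits at the top level of dest_dir.
    return sorted(entry.name for entry in dest.iterdir())
```

A small config file mapping package names to URLs and release tags could
drive this function, along the lines suggested in the quoted message.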

Regards,
Martin

From martin at v.loewis.de  Sun Apr  5 19:42:59 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Apr 2009 19:42:59 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D87DC2.2040708@ochtman.nl>
References: <20090404154049.GA23987@panix.com> <49D7C523.2090605@ochtman.nl>
	<49D7FEF6.1010006@v.loewis.de> <49D87DC2.2040708@ochtman.nl>
Message-ID: <49D8EDA3.3080405@v.loewis.de>

> Sounds sane. Would I be able to get access to PSF infrastructure to get
> started on that, or do you want me to get started on my own box? I'll
> probably do the conversion on my own box, but for authn/authz it might
> be useful to be able to use PSF infra.

Now that Alexandre has also volunteered, you two need to decide who is
in charge. Whoever does that will certainly get access to
code.python.org; the demo installation should run on that machine.

Regards,
Martin

From solipsis at pitrou.net  Sun Apr  5 19:45:50 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 5 Apr 2009 17:45:50 +0000 (UTC)
Subject: [Python-Dev] Possible py3k io wierdness
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>
	<loom.20090404T231154-979@post.gmane.org>
	<49D874E4.6030602@sweetapp.com>
	<loom.20090405T102812-215@post.gmane.org>
	<3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net>
Message-ID: <loom.20090405T174250-453@post.gmane.org>

James Y Knight <foom <at> fuhm.net> writes:
> 
> It seems that a separate method "_internal_close" should've been  
> defined to do the actual closing of the file, and the close() method  
> should've been defined on the base class as "self.flush();  
> self._internal_close()" and never overridden.

I'm completely open to changes as long as there is a reasonable consensus around
them. Your proposal looks sane, although the fact that a semi-private method
(starting with an underscore) is designed to be overridden in some classes is a
bit annoying.

I'd also like to have some advice from Guido, since he was one of the driving
forces behind the specification and the original Python implementation.

Regards

Antoine.

From martin at v.loewis.de  Sun Apr  5 19:50:07 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Apr 2009 19:50:07 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8800A.60601@ochtman.nl>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de> <49D8800A.60601@ochtman.nl>
Message-ID: <49D8EF4F.7010709@v.loewis.de>

Dirkjan Ochtman wrote:
> On 05/04/2009 11:06, "Martin v. Löwis" wrote:
>> In particular, the Stackless people have requested that they move along
>> with what core Python does, so their code should also be converted.
> 
> I'd be interested to hear if they want all of their stuff converted, or
> just the mainline/trunk of what is currently in trunk/branches/tags.

Richard Tew would be the person to discuss the details with.

>> - integrate with the buildbot
> 
> I've set up the buildbot infra for Mercurial (though not many people are
> interested in it, so it's kind of languished). Using buildbot's hg
> support is easy. 0.7.10 is the first version which works with hg 1.1+,
> though, so we probably don't want to go with anything earlier.

Ok, that's a problem. We currently run 0.7.5 on the master, and have
made custom changes that need to be forward-ported. IIUC, this will
also mean that the waterfall default page is gone, which might surprise
people.

I suppose all slaves also need to upgrade.

>> - come up with a strategy for /external (also relevant for
>>    the buildbot slaves)
> 
> I'm not sure exactly what the purpose or mechanism for /external is.
> Sure, it's like a snapshot dir, probably used to pull some stuff
> into other processes? Seems to me like it might be interesting to, for
> example, convert to a simple config file + script that lets you specify
> a package (repository) + tag, which can then be easily pulled in.
>
> But it'd be nice to know where and how exactly this is used.

Take a look at the batch files in Tools/buildbot - they are the
primary consumers. PCbuild/readme.txt also refers to it.

Regards,
Martin

From martin at v.loewis.de  Sun Apr  5 19:53:48 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 05 Apr 2009 19:53:48 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8BFD2.8090600@ochtman.nl>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de>
	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
	<49D8BFD2.8090600@ochtman.nl>
Message-ID: <49D8F02C.1040008@v.loewis.de>

> We should probably not include any branches that haven't been touched in
> the last 18 months. Then we also leave out branches that have been pruned.
> 
> BTW, tags are also missing from the current conversions. We probably
> want to keep all release tags, but not the partial tags (e.g. the
> Distutils tags). Are there any other particularly useful tags we should
> keep?

First of all, if the conversion is incomplete, the PEP should make
explicit what information will be lost.

As for tags - I think providing just the release tags is fine.

> BTW, this would also be a good time to split out the stdlib if that's
> still desirable (which I seem to have gleaned from the PyCon videos).

Is it possible to branch from a subdirectory? For the "different VMs"
stuff, it's more desirable to have a branch of the test suite, and
perhaps the standard library, than to extract it from the source.

Regards,
Martin

From dirkjan at ochtman.nl  Sun Apr  5 19:58:21 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 19:58:21 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8EF4F.7010709@v.loewis.de>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87499.5060502@v.loewis.de>
	<49D8800A.60601@ochtman.nl> <49D8EF4F.7010709@v.loewis.de>
Message-ID: <49D8F13D.90302@ochtman.nl>

On 05/04/2009 19:50, "Martin v. Löwis" wrote:
> Ok, that's a problem. We currently run 0.7.5 on the master, and have
> made custom changes that need to be forward-ported. IIUC, this will
> also mean that the waterfall default page is gone, which might surprise
> people.
>
> I suppose all slaves also need to upgrade.

Why is the waterfall default page gone? I had that in my 0.7.9 setup, at 
least. Provided the 0.7.5 slaves work with 0.7.10, then no, it's not 
necessary to upgrade the slaves. The problem in buildbot was strictly 
with the change detection in hg repos (combined with the Mercurial API, 
which hasn't entirely become stable -- so it changed a bit in 1.1).

> Take a look at the batch files in Tools/buildbot - they are the
> primary consumers. PCbuild/readme.txt also refers to it.

Will do.

Cheers,

Dirkjan

From dirkjan at ochtman.nl  Sun Apr  5 20:02:44 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 20:02:44 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8EC71.5020105@v.loewis.de>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87499.5060502@v.loewis.de>	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
	<49D8EC71.5020105@v.loewis.de>
Message-ID: <49D8F244.8080204@ochtman.nl>

On 05/04/2009 19:37, "Martin v. Löwis" wrote:
> Any decision to have or not have such a feature should be stated in
> the PEP. I personally don't use IDEs, so I don't care (although
> I do notice that the apparent absence of IDE support for Mercurial
> indicates maturity of the technology)

Well, there should be good support for Eclipse (through 
MercurialEclipse), NetBeans (they use hg themselves, after all), and the 
IDE-version of Komodo 5.0+ also includes hg support. I suppose other, 
more Python-specific IDEs might be following suit as Python switches.

> Ok, I take that back. I assumed that Mercurial could work *exactly*
> as Subversion. Apparently, that's not the case (although I have no
> idea what a server-side clone is). So I wait for the PEP to explain
> how authentication and access control is to be implemented. Creating
> individual Unix accounts for committers should be avoided.

Yeah, that won't be necessary. The canonical solution is to have just 
one Unix account called hg, to which we can add public keys.

Cheers,

Dirkjan

From martin at v.loewis.de  Sun Apr  5 20:18:33 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Apr 2009 20:18:33 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8F13D.90302@ochtman.nl>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87499.5060502@v.loewis.de>
	<49D8800A.60601@ochtman.nl> <49D8EF4F.7010709@v.loewis.de>
	<49D8F13D.90302@ochtman.nl>
Message-ID: <49D8F5F9.3000803@v.loewis.de>

>> Ok, that's a problem. We currently run 0.7.5 on the master, and have
>> made custom changes that need to be forward-ported. IIUC, this will
>> also mean that the waterfall default page is gone, which might surprise
>> people.
>>
>> I suppose all slaves also need to upgrade.
> 
> Why is the waterfall default page gone? I had that in my 0.7.9 setup, at
> least. Provided the 0.7.5 slaves work with 0.7.10, then no, it's not
> necessary to upgrade the slaves. The problem in buildbot was strictly
> with the change detection in hg repos (combined with the Mercurial API,
> which hasn't entirely become stable -- so it changed a bit in 1.1).

My understanding is that with 0.7.6 and later, the default page
won't be the waterfall anymore. In the 0.7.6 release notes, it
says

# The initial page (when you hit the root of the web site) is served
# from index.html, and provides links to the Waterfall as well as the
# other pages.

In the 0.7.9 release notes, it says

# The html.Waterfall status target was replaced by html.WebStatus in
# 0.7.6, and will be removed by 0.8.0.

But then, I have not tried installing it, so I don't know what it
actually looks like.

Regards,
Martin

From barry at python.org  Sun Apr  5 20:19:55 2009
From: barry at python.org (Barry Warsaw)
Date: Sun, 5 Apr 2009 14:19:55 -0400
Subject: [Python-Dev] Tools
Message-ID: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Someone (I'm sorry, I forgot who) asked me at Pycon about stripping  
out Demos and Tools.  I'm happy to remove the two I wrote - Tools/ 
world and Tools/pynche - from the distribution and release them as  
separate projects (retaining the PSF license).   Should I remove them  
from both the Python 2.x and 3.x trunks?

Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSdj2S3EjvBPtnXfVAQJvkAQAhj/Go+OtfYP//OZ7HIHwTjaeMlpAkfwn
iPxE6O8gY0K48J1AUmjvGSeckfP4FRqVJWOVMQYvX8yTHNFnCJxDSl4JjgboqLz4
s/IvrUYjSiN1FGrQJBA3RI4jFmuetzmKxNWgi6gEzQ6ocTLC80EyCHhxsAMhCeqr
SGQ+Alrewis=
=ODWt
-----END PGP SIGNATURE-----

From martin at v.loewis.de  Sun Apr  5 20:22:46 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Apr 2009 20:22:46 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8F244.8080204@ochtman.nl>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87499.5060502@v.loewis.de>	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
	<49D8EC71.5020105@v.loewis.de> <49D8F244.8080204@ochtman.nl>
Message-ID: <49D8F6F6.8030308@v.loewis.de>

> Yeah, that won't be necessary. The canonical solution is to have just
> one Unix account called hg, to which we can add public keys.

That would work fine for me. We currently call the account pythondev,
but calling it hg would be shorter, and therefore better (plus,
pythondev is associated with svn).

The PEP should then explain what the authorized_keys lines should
look like; this allows people to review the security of the setup.
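
A hedged sketch of what generating such lines could look like. It
assumes a wrapper command in the style of the hg-ssh script from
Mercurial's contrib directory, a hypothetical repository path, and the
committer's logname stored as the key comment; the restriction options
are standard OpenSSH authorized_keys options. The actual layout is for
the PEP to define.

```python
def authorized_keys_line(public_key, logname, repos, wrapper="hg-ssh"):
    """Build one restricted authorized_keys entry for a committer's key.

    public_key is '<key-type> <base64-key> [comment]'; any existing
    comment is replaced by logname so the key <-> logname mapping is kept.
    """
    parts = public_key.split()
    if len(parts) < 2:
        raise ValueError("expected '<key-type> <base64-key> [comment]'")
    key_type, key_data = parts[0], parts[1]
    command = " ".join([wrapper] + list(repos))
    if '"' in command:
        raise ValueError("double quotes are not allowed in the forced command")
    options = ",".join([
        'command="%s"' % command,   # force the wrapper, whatever the client asks for
        "no-port-forwarding",
        "no-X11-forwarding",
        "no-agent-forwarding",
        "no-pty",
    ])
    return "%s %s %s %s" % (options, key_type, key_data, logname)
```

Reviewing security then comes down to reading the forced command and the
list of options on each line of the shared account's authorized_keys.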

Regards,
Martin

From martin at v.loewis.de  Sun Apr  5 20:29:31 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Apr 2009 20:29:31 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D87CD4.1000909@ochtman.nl>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
Message-ID: <49D8F88B.3050102@v.loewis.de>

> I've svnsynced the SVN repo so that we can work on it efficiently, and
> I've already talked with Augie Fackler, the hgsubversion maintainer,
> about what the best way forward is. For example, we may want to leave
> some of the very old history behind, or prune some old branches.

I'm -1 on removing very old history; it's still useful to find out
that some change goes back to 1994. I'm -0 on removing old branches
(your 18 month policy sounds reasonable).

>>     - Convert the current svn commit hooks to Mercurial.
> 
> Some new hooks should also be discussed. For example, Mozilla uses a
> single-head hook, to prevent people from pushing multiple heads. They
> also have a pushlog extension that keeps a SQLite database of what
> people pushed. This is particularly useful for linearizing history,
> which is required for integration with buildbot infrastructure.

FYI: this is the list of hooks currently employed:
- pre: check whitespace
- post: mail python-checkins
        inform regular buildbot
        inform community buildbot
        trigger website rebuild if a PEP was modified
        (but then, whether or not the PEPs will be maintained
         in hg also needs to be decided)

>>     - Augment code.python.org infrastructure to support the creation of
>> developer accounts.
> 
> Developers already have accounts, don't they?

Depends on the term "account". There is a mapping ssh-key <-> logname.

> In any case, some web
> interface to facilitate setting up new clones (branches) is also
> something that's probably desirable. I think Mozilla has some tooling
> for that which we might be able to start off of.

How to authenticate in that interface? We don't have passwords per
committer.

Regards,
Martin

From martin at v.loewis.de  Sun Apr  5 20:36:42 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 05 Apr 2009 20:36:42 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8BC81.7040007@ochtman.nl>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl>
Message-ID: <49D8FA3A.5050400@v.loewis.de>

> For another, I'd like to use an author map to bring the revision authors
> more in line with what Mercurial repositories usually display; this
> helps with tool support and is also just a nicer solution IMO.

We do require full real names (i.e. no nicknames). Can Mercurial
guarantee such a thing?

> At the very least we should have a proper discussion over this.

If so, I would like to see that discussion in the PEP.

I don't think I can personally contribute to that discussion.
I will have to trust that whatever Mercurial experts propose is
good.

> The current Mercurial mirror for py3k also doesn't include any history
> from before it was branched, which is bad, IMO. In order to get the most
> of the DVCS structure, it would be helpful if py3k shared history with
> the normal (trunk) branches.

In the long run, the current trunk may cease to exist, and the py3k
branch may take over its role. Not sure whether this needs to be
considered.

>> Not really. Currently, core developers can only push stuff using the
>> Bazaar setup. Personally, I think SSH access would be a lot nicer, but
>> this will depend how confident python.org's admins are with this idea.

If it's the same as the current subversion access, it's fine. Otherwise,
it needs discussion.

> We could still enable pushing through http(s) for hgweb(dir).

But that would require handing out (and managing) passwords, right?

Martin

From dirkjan at ochtman.nl  Sun Apr  5 20:37:36 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 20:37:36 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8F5F9.3000803@v.loewis.de>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87499.5060502@v.loewis.de>
	<49D8800A.60601@ochtman.nl> <49D8EF4F.7010709@v.loewis.de>
	<49D8F13D.90302@ochtman.nl> <49D8F5F9.3000803@v.loewis.de>
Message-ID: <49D8FA70.2060304@ochtman.nl>

On 05/04/2009 20:18, "Martin v. Löwis" wrote:
> But then, I have not tried installing it, so I don't know what it
> actually looks like.

Ah, right. In my setup, there was an index page with three lines of 
text, one of which had a link to the waterfall. So I think that should 
still be simple enough for most of the interested parties. ;)

Cheers,

Dirkjan

From martin at v.loewis.de  Sun Apr  5 20:40:20 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Apr 2009 20:40:20 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8C5E3.5000200@ochtman.nl>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>	<49D8BC81.7040007@ochtman.nl>	<loom.20090405T141950-815@post.gmane.org>
	<49D8C5E3.5000200@ochtman.nl>
Message-ID: <49D8FB14.2080509@v.loewis.de>

>> I think at least 3.x and 2.x should live in separate repos. It is
>> pointless for
>> a clone of py3k to end up pulling all 40000+ changesets from the
>> trunk. It would
>> add 100MB+ to every py3k clone (that is, quadrupling the size of the
>> repository).
> 
> I don't agree. It's quite annoying for things like annotate/blame, for
> example, where you may have to switch to another branch while chasing
> down a defective change.

FWIW, I also think that all branches should go back to the very beginning.

> Okay, sounds like that will be easy. Would be good to enable compression
> on the SSH, though, if that's not already done.

Where is that configured?

Regards,
Martin

From dirkjan at ochtman.nl  Sun Apr  5 20:43:24 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 20:43:24 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8F88B.3050102@v.loewis.de>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de>
Message-ID: <49D8FBCC.1050801@ochtman.nl>

On 05/04/2009 20:29, "Martin v. Löwis" wrote:
> FYI: this is the list of hooks currently employed:
> - pre: check whitespace
> - post: mail python-checkins
>          inform regular buildbot
>          inform community buildbot
>          trigger website rebuild if a PEP was modified
>          (but then, whether or not the PEPs will be maintained
>           in hg also needs to be decided)

All this is easy to do with Mercurial's hook system. One caveat is that 
stuff (like whitespace) only gets checked at push time, not at commit 
time (running commit hooks would have to be done on the client, but 
since we don't sandbox hooks, they would be a liability to distribute by 
default). People could still set them up locally for pre-commit if they 
want, of course, but otherwise they may need some trickery to modify the 
changesets they want to push.
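
A minimal sketch of the kind of check such a hook could run on each
changed file, whether at push time on the server or locally before
commit. The function name whitespace_problems and the two rules (no
trailing whitespace, no tabs in indentation) are assumptions; the rules
of the existing svn pre-commit hook are not spelled out here, and the
wiring into Mercurial's hook system is left out.

```python
def whitespace_problems(text):
    """Return (line number, description) pairs for whitespace violations."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip():
            problems.append((lineno, "trailing whitespace"))
        # Leading run of whitespace characters on this line.
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((lineno, "tab in indentation"))
    return problems
```

A hook would reject the push (or commit) whenever this list is non-empty
for any touched file and print the offending line numbers.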

For the email messages, we'll probably want to use the notify extension 
that comes with hg.

> How to authenticate in that interface? We don't have passwords per
> committer.

Okay, so we'll use ssh.

Cheers,

Dirkjan

From dirkjan at ochtman.nl  Sun Apr  5 20:45:27 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sun, 05 Apr 2009 20:45:27 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8FA3A.5050400@v.loewis.de>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl> <49D8FA3A.5050400@v.loewis.de>
Message-ID: <49D8FC47.8080803@ochtman.nl>

On 05/04/2009 20:36, "Martin v. Löwis" wrote:
> We do require full real names (i.e. no nicknames). Can Mercurial
> guarantee such a thing?

We could pre-record the list of allowed names in a hook, then have the 
hook check that usernames include one of those names and an email 
address (so people can still start using another email address).
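
A small sketch of that check, assuming usernames of the usual
"Full Name <email>" form. The names check_author and USER_RE are
hypothetical, and how the hook would obtain each changeset's username
from Mercurial is deliberately left out.

```python
import re

# "Full Name <user@host>": a name, then an address in angle brackets.
USER_RE = re.compile(r"^(?P<name>[^<>]+?)\s*<(?P<email>[^<>@\s]+@[^<>@\s]+)>$")


def check_author(user, allowed_names):
    """Accept a username only if its name part is pre-recorded.

    The email part just has to look like an address, so committers can
    switch to a new address without the list being updated.
    """
    match = USER_RE.match(user.strip())
    return match is not None and match.group("name") in allowed_names
```

For example, with allowed_names = {"Jane Doe"}, both
"Jane Doe <jane@example.org>" and "Jane Doe <jd@example.net>" pass,
while "jdoe <jane@example.org>" is rejected.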

> In the long run, the current trunk may cease to exist, and the py3k
> branch may take over its role. Not sure whether this needs to be
> considered.

I considered that in some other subthread. :)

Cheers,

Dirkjan

From benjamin at python.org  Sun Apr  5 21:30:04 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Sun, 5 Apr 2009 14:30:04 -0500
Subject: [Python-Dev] Tools
In-Reply-To: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org>
References: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org>
Message-ID: <1afaf6160904051230h3f657ad3tbf41e6fa20bd02fb@mail.gmail.com>

2009/4/5 Barry Warsaw <barry at python.org>:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Someone (I'm sorry, I forgot who) asked me at Pycon about stripping out
> Demos and Tools.  I'm happy to remove the two I wrote - Tools/world and
> Tools/pynche - from the distribution and release them as separate projects
> (retaining the PSF license).  Should I remove them from both the Python 2.x
> and 3.x trunks?

+1 to removing some of the old unused stuff from those directories.

-- 
Regards,
Benjamin

From g.brandl at gmx.net  Sun Apr  5 22:09:46 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 05 Apr 2009 22:09:46 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8FC47.8080803@ochtman.nl>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>	<49D8BC81.7040007@ochtman.nl>
	<49D8FA3A.5050400@v.loewis.de> <49D8FC47.8080803@ochtman.nl>
Message-ID: <grb36c$1b7$1@ger.gmane.org>

Dirkjan Ochtman schrieb:
> On 05/04/2009 20:36, "Martin v. Löwis" wrote:
>> We do require full real names (i.e. no nicknames). Can Mercurial
>> guarantee such a thing?
> 
> We could pre-record the list of allowed names in a hook, then have the 
> hook check that usernames include one of those names and an email 
> address (so people can still start using another email address).

What about commits from other people, e.g. pulled from a repo or imported
via hg import?

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From g.brandl at gmx.net  Sun Apr  5 22:11:36 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 05 Apr 2009 22:11:36 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8FBCC.1050801@ochtman.nl>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>
	<49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl>
Message-ID: <grb39r$1b7$2@ger.gmane.org>

Dirkjan Ochtman wrote:
> On 05/04/2009 20:29, "Martin v. Löwis" wrote:
>> FYI: this is the list of hooks currently employed:
>> - pre: check whitespace
>> - post: mail python-checkins
>>          inform regular buildbot
>>          inform community buildbot
>>          trigger website rebuild if a PEP was modified
>>          (but then, whether or not the PEPs will be maintained
>>           in hg also needs to be decided)
> 
> All this is easy to do with Mercurial's hook system. One caveat is that 
> stuff (like whitespace) only gets checked at push time, not at commit 
> time (running commit hooks would have to be done on the client, but 
> since we don't sandbox hooks, they would be a liability to distribute by 
> default). People could still set them up locally for pre-commit if they 
> want, of course, but otherwise they may need some trickery to modify the 
> changesets they want to push.

When commits with bad whitespace changes are rejected on push, this is a
pretty good incentive to run the pre-commit hook next time, so that you
don't have to re-do all the commits in that batch :)

Georg


From solipsis at pitrou.net  Sun Apr  5 22:29:57 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 5 Apr 2009 20:29:57 +0000 (UTC)
Subject: [Python-Dev] Mercurial?
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>
	<49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl>
	<grb39r$1b7$2@ger.gmane.org>
Message-ID: <loom.20090405T202918-34@post.gmane.org>

Georg Brandl <g.brandl <at> gmx.net> writes:
> 
> When commits with bad whitespace changes are rejected on push, this is a
> pretty good incentive to run the pre-commit hook next time, so that you
> don't have to re-do all the commits in that batch :)

Do you really have to re-do all the commits, or can you just commit the
whitespace fixes separately?

From martin at v.loewis.de  Sun Apr  5 22:35:45 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 05 Apr 2009 22:35:45 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <grb36c$1b7$1@ger.gmane.org>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>	<49D8BC81.7040007@ochtman.nl>	<49D8FA3A.5050400@v.loewis.de>
	<49D8FC47.8080803@ochtman.nl> <grb36c$1b7$1@ger.gmane.org>
Message-ID: <49D91621.1050306@v.loewis.de>

>> We could pre-record the list of allowed names in a hook, then have the 
>> hook check that usernames include one of those names and an email 
>> address (so people can still start using another email address).
> 
> What about commits from other people, e.g. pulled from a repo or imported
> via hg import?

Not sure. What is the recommendation?

Ideally, we would have a contributor agreement on file of any, well,
contributor.

Regards,
Martin

From g.brandl at gmx.net  Sun Apr  5 22:37:06 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 05 Apr 2009 22:37:06 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <loom.20090405T202918-34@post.gmane.org>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<49D8F88B.3050102@v.loewis.de>
	<49D8FBCC.1050801@ochtman.nl>	<grb39r$1b7$2@ger.gmane.org>
	<loom.20090405T202918-34@post.gmane.org>
Message-ID: <grb4pl$5fj$1@ger.gmane.org>

Antoine Pitrou wrote:
> Georg Brandl <g.brandl <at> gmx.net> writes:
>> 
>> When commits with bad whitespace changes are rejected on push, this is a
>> pretty good incentive to run the pre-commit hook next time, so that you
>> don't have to re-do all the commits in that batch :)
> 
> Do you really have to re-do all the commits, or can you just commit the
> whitespace fixes separately?

Probably yes. I was just painting the devil on the wall :)

At PyCon, I already wrote the pre-commit hook.  And what's best, since it
runs locally it can fix the files for you instead of just bitching around...
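
The hook itself is not shown in this thread. As a rough sketch of what
"fixing the files" could mean, assuming the rules are "strip trailing
whitespace" and "expand tabs in the indentation" (both assumptions, as is
the name fix_whitespace):

```python
def fix_whitespace(text, tabsize=8):
    """Return `text` with trailing whitespace removed and leading tabs expanded.

    This is an illustrative normalizer, not the actual hook: the rule set
    and the default tab size are assumptions.
    """
    fixed = []
    for line in text.splitlines():
        stripped = line.rstrip()
        body = stripped.lstrip(" \t")
        # Everything before `body` is indentation; expand tabs only there.
        indent = stripped[:len(stripped) - len(body)].expandtabs(tabsize)
        fixed.append(indent + body)
    # Always end a non-empty file with exactly one newline.
    return "\n".join(fixed) + "\n" if fixed else ""
```

A local hook could rewrite each changed file with the result, whereas a
server-side hook can only compare the two and reject the push.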

Georg


From g.brandl at gmx.net  Sun Apr  5 22:47:32 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 05 Apr 2009 22:47:32 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D91621.1050306@v.loewis.de>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>	<49D8BC81.7040007@ochtman.nl>	<49D8FA3A.5050400@v.loewis.de>	<49D8FC47.8080803@ochtman.nl>
	<grb36c$1b7$1@ger.gmane.org> <49D91621.1050306@v.loewis.de>
Message-ID: <grb5d8$89r$1@ger.gmane.org>

Martin v. Löwis wrote:
>>> We could pre-record the list of allowed names in a hook, then have the 
>>> hook check that usernames include one of those names and an email 
>>> address (so people can still start using another email address).
>> 
>> What about commits from other people, e.g. pulled from a repo or imported
>> via hg import?
> 
> Not sure. What is the recommendation?
>
> Ideally, we would have a contributor agreement on file of any, well,
> contributor.

Well, in theory it shouldn't make a difference if a contributed patch is
committed by a committer under his name (and the contributor's name mentioned
in the commit message), or if the patch is committed under the contributor's
name.

Georg


From greg.ewing at canterbury.ac.nz  Mon Apr  6 00:39:43 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 06 Apr 2009 10:39:43 +1200
Subject: [Python-Dev] Possible py3k io wierdness
In-Reply-To: <49D88E6F.4080801@sweetapp.com>
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>
	<loom.20090404T231154-979@post.gmane.org>
	<49D874E4.6030602@sweetapp.com>
	<loom.20090405T102812-215@post.gmane.org>
	<49D88E6F.4080801@sweetapp.com>
Message-ID: <49D9332F.4050503@canterbury.ac.nz>

Brian Quinlan wrote:

>         if not self.__closed:
>             try:
> -                self.flush()
> +                IOBase.flush(self)
>             except IOError:
>                 pass  # If flush() fails, just give up
>             self.__closed = True

That doesn't seem like a good idea to me at all. If
someone overrides flush() but not close(), their
flush method won't get called, which would be surprising.

To get the desired behaviour, you need something like

   def close(self):
     if not self.__closed:
       self.flush()
       self._close()
       self.__closed = True

and then tell people to override _close() rather than
close().
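
A runnable sketch of that pattern, with made-up class names (Stream,
RecordingStream) and a single-underscore _closed flag instead of the
name-mangled __closed, purely for illustration:

```python
class Stream:
    """Base class: close() is fixed, subclasses override flush() and _close()."""

    def __init__(self):
        self._closed = False

    def flush(self):
        pass  # nothing buffered in the base class

    def _close(self):
        pass  # hook: release the underlying resource

    def close(self):
        # Always flush via the (possibly overridden) flush(), then close.
        if not self._closed:
            self.flush()
            self._close()
            self._closed = True


class RecordingStream(Stream):
    """Hypothetical subclass that records the order of calls."""

    def __init__(self):
        super().__init__()
        self.events = []

    def flush(self):
        self.events.append("flush")

    def _close(self):
        self.events.append("close")
```

Calling close() twice on a RecordingStream leaves events equal to
["flush", "close"]: the overridden flush() runs before _close(), and the
second call is a no-op.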

-- 
Greg

From greg.ewing at canterbury.ac.nz  Mon Apr  6 00:51:19 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 06 Apr 2009 10:51:19 +1200
Subject: [Python-Dev] graphics maths types in python core?
In-Reply-To: <49D89309.7050307@gmail.com>
References: <20090404150111.GQ12593@idyll.org>
	<49D7D884.5060801@canterbury.ac.nz>
	<loom.20090404T220604-267@post.gmane.org>
	<49D7EE7C.4040604@canterbury.ac.nz>
	<loom.20090404T233632-904@post.gmane.org>
	<49D7F2AB.8060907@canterbury.ac.nz>
	<loom.20090404T235147-360@post.gmane.org>
	<49D85A23.6020405@gmail.com> <loom.20090405T103055-388@post.gmane.org>
	<49D89309.7050307@gmail.com>
Message-ID: <49D935E7.5010207@canterbury.ac.nz>

Nick Coghlan wrote:

> Still, as both you and Greg have pointed out, even in its current form
> memoryview is already useful as a replacement for buffer that doesn't
> share buffer's problems

That may be so, but I was more pointing out that the
elementwise functions I'm talking about would be useful
even without memoryview at all. Mostly you would just
use them directly on array.array or other sequence types.

<rant subject="terminology">
Why is it that whenever the word "buffer" is mentioned,
some people seem to think it has something to do with
memoryview? There is no such thing as "a buffer". There
is the buffer interface, and there are objects which
support the buffer interface, of which memoryview is
one among many.
</rant>

-- 
Greg

From skippy.hammond at gmail.com  Mon Apr  6 00:48:04 2009
From: skippy.hammond at gmail.com (Mark Hammond)
Date: Mon, 06 Apr 2009 08:48:04 +1000
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8BC81.7040007@ochtman.nl>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl>
Message-ID: <49D93524.3060500@gmail.com>

On 6/04/2009 12:13 AM, Dirkjan Ochtman wrote:
>
> I have a stab at an author map at http://dirkjan.ochtman.nl/author-map.
> Could use some review, but it seems like a good start.

Just to be clear, what input would you like on that map?

I'm listed twice:

mark.hammond = Mark Hammond <skippy.hammond at gmail.com>
mhammond = Mark Hammond <skippy.hammond at gmail.com>

but that email address isn't the address normally associated with any 
checkins I make, nor the address in the comments of the ssh keys I use 
(which is mhammond at skippinet.com.au)

The addresses given are valid though, so I'm not sure what kind of 
review or feedback you are after.
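
For anyone reviewing the map, a small sketch that groups usernames by the
author they map to can make duplicates like the one above easy to spot.
It assumes the "username = Full Name <email>" line format shown above;
the function names, the "#" comment handling and the example.org address
are illustrative only.

```python
def parse_author_map(text):
    """Parse 'username = Full Name <email>' lines into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        username, sep, author = line.partition("=")
        if not sep:
            raise ValueError("not a 'username = author' line: %r" % line)
        mapping[username.strip()] = author.strip()
    return mapping


def usernames_by_author(mapping):
    """Invert the map: author string -> list of usernames mapped to it."""
    grouped = {}
    for username, author in mapping.items():
        grouped.setdefault(author, []).append(username)
    return grouped
```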

Cheers,

Mark

From ncoghlan at gmail.com  Mon Apr  6 00:54:20 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 06 Apr 2009 08:54:20 +1000
Subject: [Python-Dev] Possible py3k io wierdness
In-Reply-To: <loom.20090405T174250-453@post.gmane.org>
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>	<loom.20090404T231154-979@post.gmane.org>	<49D874E4.6030602@sweetapp.com>	<loom.20090405T102812-215@post.gmane.org>	<3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net>
	<loom.20090405T174250-453@post.gmane.org>
Message-ID: <49D9369C.8080400@gmail.com>

Antoine Pitrou wrote:
> James Y Knight <foom <at> fuhm.net> writes:
>> It seems that a separate method "_internal_close" should've been  
>> defined to do the actual closing of the file, and the close() method  
>> should've been defined on the base class as "self.flush();  
>> self._internal_close()" and never overridden.
> 
> I'm completely open to changes as long as there is a reasonable consensus around
> them. Your proposal looks sane, although the fact that a semi-private method
> (starting with an underscore) is designed to be overridden in some classes is a
> bit annoying.

Note that we already do that in a couple of places where it makes sense
- in those cases the underscore is there to tell *clients* of the class
"don't call this directly", but it is still explicitly documented as
part of the subclassing API.

(the only example I can find at the moment is in asynchat, but I thought
there were a couple of more common ones than that - hopefully I'll think
of them later)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From greg.ewing at canterbury.ac.nz  Mon Apr  6 00:56:28 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 06 Apr 2009 10:56:28 +1200
Subject: [Python-Dev] Generator methods - "what's next" ?
In-Reply-To: <49D896A4.3000104@wanadoo.fr>
References: <49D896A4.3000104@wanadoo.fr>
Message-ID: <49D9371C.3000202@canterbury.ac.nz>

Firephoenix wrote:

> I basically agreed with renaming the next() method to __next__(), so as 
> to follow the naming of other similar methods (__iter__() etc.).
> But I noticed then that all the other methods of the generator had 
> stayed the same (send, throw, close...)

Keep in mind that next() is part of the iterator protocol
that applies to all iterators, whereas the others are
specific to generators. By your reasoning, any object that
has any __xxx__ methods should have all its other methods
turned into __xxx__ methods as well.
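
To illustrate the distinction in Python 3 terms (the names Countdown and
countdown_gen are made up for this sketch): any class can be an iterator
by providing __iter__() and __next__(), while send(), throw() and close()
exist only on generator objects.

```python
class Countdown:
    """A plain iterator: implements only the iterator protocol."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown_gen(start):
    """The same sequence as a generator, which also gets send/throw/close."""
    while start > 0:
        yield start
        start -= 1
```

Both produce the same values when iterated, but only the generator
object has a send() method.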

-- 
Greg

From ncoghlan at gmail.com  Mon Apr  6 01:10:37 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 06 Apr 2009 09:10:37 +1000
Subject: [Python-Dev] graphics maths types in python core?
In-Reply-To: <49D935E7.5010207@canterbury.ac.nz>
References: <20090404150111.GQ12593@idyll.org>	<49D7D884.5060801@canterbury.ac.nz>	<loom.20090404T220604-267@post.gmane.org>	<49D7EE7C.4040604@canterbury.ac.nz>	<loom.20090404T233632-904@post.gmane.org>	<49D7F2AB.8060907@canterbury.ac.nz>	<loom.20090404T235147-360@post.gmane.org>	<49D85A23.6020405@gmail.com>
	<loom.20090405T103055-388@post.gmane.org>	<49D89309.7050307@gmail.com>
	<49D935E7.5010207@canterbury.ac.nz>
Message-ID: <49D93A6D.2040602@gmail.com>

Greg Ewing wrote:
> <rant subject="terminology">
> Why is it that whenever the word "buffer" is mentioned,
> some people seem to think it has something to do with
> memoryview? There is no such thing as "a buffer". There
> is the buffer interface, and there are objects which
> support the buffer interface, of which memoryview is
> one among many.
> </rant>

Probably because memoryview *is* the Python API for the C-level buffer
interface. While I can understand that point of view, I don't agree with
it, which is why I consider it important to point out that memoryview's
limitations aren't shared by the underlying API when the topic comes up.
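
A small Python 3 sketch of the relationship being discussed: bytes,
bytearray and array.array all support the buffer interface, and
memoryview is simply one way to reach it from Python code. The helper
names as_bytes and overwrite_first are made up for illustration.

```python
import array


def as_bytes(obj):
    """Return the raw bytes of any object that supports the buffer interface."""
    return memoryview(obj).tobytes()


def overwrite_first(buf, value):
    """Write one byte through a memoryview, without copying the buffer."""
    view = memoryview(buf)
    view[0] = value  # requires a writable exporter such as bytearray
```

The same helper works unchanged on all three exporter types, and a write
through the view is visible in the original bytearray.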

/tangent from the vector math thread (hopefully)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net  Mon Apr  6 01:14:54 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 5 Apr 2009 23:14:54 +0000 (UTC)
Subject: [Python-Dev] graphics maths types in python core?
References: <20090404150111.GQ12593@idyll.org>
	<49D7D884.5060801@canterbury.ac.nz>
	<loom.20090404T220604-267@post.gmane.org>
	<49D7EE7C.4040604@canterbury.ac.nz>
	<loom.20090404T233632-904@post.gmane.org>
	<49D7F2AB.8060907@canterbury.ac.nz>
	<loom.20090404T235147-360@post.gmane.org>
	<49D85A23.6020405@gmail.com>
	<loom.20090405T103055-388@post.gmane.org>
	<49D89309.7050307@gmail.com> <49D935E7.5010207@canterbury.ac.nz>
Message-ID: <loom.20090405T231433-693@post.gmane.org>

Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
> 
> Why is it that whenever the word "buffer" is mentioned,
> some people seem to think it has something to do with
> memoryview? There is no such thing as "a buffer".

There is a Py_buffer struct.

From doko at ubuntu.com  Mon Apr  6 01:37:00 2009
From: doko at ubuntu.com (Matthias Klose)
Date: Mon, 06 Apr 2009 01:37:00 +0200
Subject: [Python-Dev] Tools
In-Reply-To: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org>
References: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org>
Message-ID: <49D9409C.6060108@ubuntu.com>

Barry Warsaw wrote:
> Someone (I'm sorry, I forgot who) asked me at Pycon about stripping out
> Demos and Tools.  I'm happy to remove the two I wrote - Tools/world and
> Tools/pynche - from the distribution and release them as separate
> projects (retaining the PSF license).   Should I remove them from both
> the Python 2.x and 3.x trunks?

+1, but please for 2.7 and 3.1 only.

From ajaksu at gmail.com  Mon Apr  6 01:56:09 2009
From: ajaksu at gmail.com (Daniel (ajax) Diniz)
Date: Sun, 5 Apr 2009 20:56:09 -0300
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8EC71.5020105@v.loewis.de>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com> 
	<49D87499.5060502@v.loewis.de>
	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com> 
	<49D8EC71.5020105@v.loewis.de>
Message-ID: <2d75d7660904051656s2242a9ex91ac0a2d8056cbfe@mail.gmail.com>

"Martin v. Löwis" wrote:
>> I think it would be a good idea to host a temporary svn mirror for
>> developers who access their VCS via an IDE. Although, I am not sure
>> anymore if supporting these developers (if there are any) would be worth
>> the trouble. So, think of this as optional.
>
> Any decision to have or not have such a feature should be stated in
> the PEP. I personally don't use IDEs, so I don't care (although
> I do notice that the apparent absence of IDE support for Mercurial
> indicates maturity of the technology)

I can spend some time on Mercurial integration for the main IDEs in
use by core devs, I'm sure the PIDA folks have most of this sorted
already. It would be important to have these (and any other non-PEP
worthy tasks/helpers) listed with some detail, e.g., in a wiki page.

If anyone has requests for tools that would make the transition
smoother (e.g. the script for /external, a wrapper for svnmerge
semantics on top of hg transplant, etc.) but has no time to work on
them, please add them to
http://wiki.python.org/moin/CoreDevHelperTools .

Daniel

From aleaxit at gmail.com  Mon Apr  6 02:28:51 2009
From: aleaxit at gmail.com (Alex Martelli)
Date: Sun, 5 Apr 2009 17:28:51 -0700
Subject: [Python-Dev] Possible py3k io wierdness
In-Reply-To: <49D9369C.8080400@gmail.com>
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>
	<loom.20090404T231154-979@post.gmane.org>
	<49D874E4.6030602@sweetapp.com>
	<loom.20090405T102812-215@post.gmane.org>
	<3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net>
	<loom.20090405T174250-453@post.gmane.org> <49D9369C.8080400@gmail.com>
Message-ID: <e8a0972d0904051728l434b2cel8dd02e6c06926298@mail.gmail.com>

On Sun, Apr 5, 2009 at 3:54 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> Antoine Pitrou wrote:
> > James Y Knight <foom <at> fuhm.net> writes:
> >> It seems that a separate method "_internal_close" should've been
> >> defined to do the actual closing of the file, and the close() method
> >> should've been defined on the base class as "self.flush();
> >> self._internal_close()" and never overridden.
> >
> > I'm completely open to changes as long as there is a reasonable consensus
> > around them. Your proposal looks sane, although the fact that a
> > semi-private method (starting with an underscore) is designed to be
> > overridden in some classes is a bit annoying.
>
> Note that we already do that in a couple of places where it makes sense
> - in those cases the underscore is there to tell *clients* of the class
> "don't call this directly", but it is still explicitly documented as
> part of the subclassing API.
>
> (the only example I can find at the moment is in asynchat, but I thought
> there were a couple of more common ones than that - hopefully I'll think
> of them later)
>

Queue.Queue in 2.* (and queue.Queue in 3.*) is like that too -- the single
leading underscore meaning "protected" ("I'm here for subclasses to override
me, only" in C++ parlance) and a great way to denote "hook methods" in a
Template Method design pattern instance.  Base class deals with all locking
issues in e.g. 'get' (the method a client calls), subclass can override _get
and not worry about threading (it will be called by parent class's get with
proper locks held and locks will be properly released &c afterwards).
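
As a sketch of that pattern with Python 3's queue module: the subclass
below (StackQueue is a made-up name) overrides only the underscore hook
methods, and the inherited put() and get() keep handling all the locking.
The standard library's own queue.LifoQueue is written the same way.

```python
import queue


class StackQueue(queue.Queue):
    """LIFO variant of Queue built purely from the underscore hook methods."""

    def _init(self, maxsize):
        self.queue = []  # underlying container, created by Queue.__init__

    def _qsize(self):
        return len(self.queue)

    def _put(self, item):
        # Called by put() with the queue's lock already held.
        self.queue.append(item)

    def _get(self):
        # Called by get() with the queue's lock already held.
        return self.queue.pop()
```

Clients still call only put() and get(); after putting 1 and then 2, the
first get() returns 2.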

Alex

From greg.ewing at canterbury.ac.nz  Mon Apr  6 03:06:52 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 06 Apr 2009 13:06:52 +1200
Subject: [Python-Dev] Possible py3k io wierdness
In-Reply-To: <loom.20090405T174250-453@post.gmane.org>
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>
	<loom.20090404T231154-979@post.gmane.org>
	<49D874E4.6030602@sweetapp.com>
	<loom.20090405T102812-215@post.gmane.org>
	<3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net>
	<loom.20090405T174250-453@post.gmane.org>
Message-ID: <49D955AC.2050106@canterbury.ac.nz>

Antoine Pitrou wrote:
> Your proposal looks sane, although the fact that a semi-private method
> (starting with an underscore) is designed to be overridden in some classes is a
> bit annoying.

The only other way I can see is to give up any attempt
in the base class to ensure that flushing occurs before
closing, and make that the responsibility of the derived
class.

-- 
Greg

From skip at pobox.com  Mon Apr  6 04:03:44 2009
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 5 Apr 2009 21:03:44 -0500
Subject: [Python-Dev] Mercurial?
In-Reply-To: <loom.20090405T141950-815@post.gmane.org>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl> <loom.20090405T141950-815@post.gmane.org>
Message-ID: <18905.25344.21772.908887@montanaro.dyndns.org>

After the private hell I've gone through the past few days stumbling around
Mercurial without really understanding what the hell I was doing, I strongly
recommend that when the conversion is complete there be a "do it just
like you did it with svn" mode available.  Fortunately, this was just with
my little lockfile module, so the damage was very isolated.  (And perhaps
"damage" is the wrong word.  Someone more experienced with hg could almost
certainly correct my mistakes.)  I freely admit that my own misunderstanding
of how Mercurial works was the primary cause of my problems.  Still, until
people are real familiar with what is going on, especially people like me
who have little or no familiarity with dVCSs I think it's best to just treat
it like Subversion if at all possible.

Skip

From skip at pobox.com  Mon Apr  6 04:27:36 2009
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 5 Apr 2009 21:27:36 -0500
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8D280.6060900@ochtman.nl>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl> <loom.20090405T141950-815@post.gmane.org>
	<49D8C5E3.5000200@ochtman.nl> <loom.20090405T150210-608@post.gmane.org>
	<49D8D280.6060900@ochtman.nl>
Message-ID: <18905.26776.630604.13623@montanaro.dyndns.org>

    >> As for the clone time, one of our prominent developers is, IIRC, on
    >> a 40 kb/s line. Perhaps he wants to step in and say whether cloning
    >> the trunk is a painful experience for him, or not.

    Dirkjan> Sure it's painful, but he only has to go through that once,
    Dirkjan> maybe twice.

Maybe once for each currently active Subversion branch (2.6, 2.7, 3.0, 3.1)?

Skip

From skip at pobox.com  Mon Apr  6 04:50:42 2009
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 5 Apr 2009 21:50:42 -0500
Subject: [Python-Dev] Tools
In-Reply-To: <49D9409C.6060108@ubuntu.com>
References: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org>
	<49D9409C.6060108@ubuntu.com>
Message-ID: <18905.28162.645078.593247@montanaro.dyndns.org>

    Barry> Someone asked me at Pycon about stripping out Demos and Tools.

    Matthias> +1, but please for 2.7 and 3.1 only.

Is there a list of other demos or tools which should be deleted?  If
possible the list should be publicized so that people can pick up external
maintenance if desired.

Skip

From jackdied at gmail.com  Mon Apr  6 04:58:22 2009
From: jackdied at gmail.com (Jack diederich)
Date: Sun, 5 Apr 2009 22:58:22 -0400
Subject: [Python-Dev] Tools
In-Reply-To: <18905.28162.645078.593247@montanaro.dyndns.org>
References: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org>
	<49D9409C.6060108@ubuntu.com>
	<18905.28162.645078.593247@montanaro.dyndns.org>
Message-ID: <b8e622740904051958h57494a36i54b84fecd8f3de4c@mail.gmail.com>

On Sun, Apr 5, 2009 at 10:50 PM,  <skip at pobox.com> wrote:
>     Barry> Someone asked me at Pycon about stripping out Demos and Tools.
>
>     Matthias> +1, but please for 2.7 and 3.1 only.
>
> Is there a list of other demos or tools which should be deleted?  If
> possible the list should be publicized so that people can pick up external
> maintenance if desired.

I liked Brett's (Georg's?) half joking idea at sprints.  Just delete
each subdirectory in a separate commit and then wait to see what
people revert.

-Jack

From alexandre at peadrop.com  Mon Apr  6 06:06:15 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 6 Apr 2009 00:06:15 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8EC71.5020105@v.loewis.de>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com> 
	<49D87499.5060502@v.loewis.de>
	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com> 
	<49D8EC71.5020105@v.loewis.de>
Message-ID: <acd65fa20904052106r3dec8a22ve02aadf119d655ab@mail.gmail.com>

On Sun, Apr 5, 2009 at 1:37 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I think it should be stated in the PEP what branches get converted,
> in what form, and what the further usage of the svn repository should
> be.
>

Noted.

> I think there is a long tradition of such annotations; we should
> try to repeat history here. IIUC, the Debian bugtracker understands
>
>   Closes: #4532
>
> and some other syntaxes. It must be easy to remember, else people
> won't use it.
>

That sounds reasonable. Personally, I don't really care about the
syntax we would use as long as it's consistent and documented.
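
Whatever syntax is chosen, a hook or tracker integration would need to
pull the issue numbers out of commit messages. A minimal sketch, assuming
the Debian-style "Closes: #NNNN" line quoted above is the one adopted
(the names CLOSES_RE and closed_issues are made up):

```python
import re

# Matches a line consisting of "Closes: #1234" (case-insensitive).
# The exact syntax is an assumption; the thread has not settled on one.
CLOSES_RE = re.compile(
    r"^[ \t]*Closes:[ \t]*#(\d+)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def closed_issues(message):
    """Return the issue numbers a commit message claims to close."""
    return [int(number) for number in CLOSES_RE.findall(message)]
```

Requiring the annotation on a line of its own keeps a passing mention
such as "see #12" from closing anything by accident.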

> Any decision to have or not have such a feature should be stated in
> the PEP. I personally don't use IDEs, so I don't care (although
> I do notice that the apparent absence of IDE support for Mercurial
> indicates maturity of the technology)
>

I know Netbeans has Mercurial support built-in (which makes sense
because Sun uses Mercurial for its open-source projects). However, I
am not sure if Eclipse has good Mercurial support yet. There are
3rd-party plugins for Eclipse, but I don't know if they work well.

> Ok, I take that back. I assumed that Mercurial could work *exactly*
> as Subversion. Apparently, that's not the case (although I have no
> idea what a server-side clone is). So I wait for the PEP to explain
> how authentication and access control is to be implemented. Creating
> individual Unix accounts for committers should be avoided.

With Subversion, we can do a server-side clone (or copy) using the copy command:

  svn copy SRC_URL DEST_URL

This avoids wasting time and bandwidth by doing the copy directly on
the server. Without this feature, you would need to check out the remote
repository to clone, then push it to a different location. Since
upload bandwidth is often limited, creating a new branch in such a
fashion would be time-consuming.

With Mercurial, we will need to add support for server-side clones
ourselves. There are a few ways to provide this feature. We could give
Unix user accounts to all core developers and let developers manage
their private branches directly on the server. You made clear that this
is not wanted. So an alternative approach is to add an interface
accessible via SSH. As I previously mentioned, this is the approach
used by Mozilla.

Yet another approach would be to add a web interface for managing the
repositories. This is what the OpenSolaris admins opted for. Personally, I
do not think this is a good idea because it would require us to roll our
own authentication mechanism, which is clearly a bad thing (both
security-wise and usability-wise).

This reminds me that we will have to decide how we will
reorganize our workflow. For this, we can either be conservative and
keep the current CVS-style development workflow, i.e., a few main
repositories that all developers can commit to. Or we could drink the
kool-aid and go with a kernel-style development workflow, i.e., each
developer maintains his own branch and pulls changes from the others.

From what I have heard, the CVS-style workflow has a lower overhead
than the kernel-style workflow. However, the kernel-style workflow is
somewhat advantageous because changes get reviewed several times before
they get into the main branches. Thus, it is less likely that someone
manages to break the build. In addition, Mercurial is much better
suited to supporting the kernel-style workflow.

However, if we go kernel-style, I will need to designate someone (i.e.,
an integrator) who will maintain the main branches, which will be tested
by buildbot and used for the public releases. These are issues I would
like to address in the PEP.

> I can give you access to the master setup. Ideally, this should
> be tested before the switchover (with a single branch). We also
> need instructions for the slaves (if any - perhaps installing
> a hg binary is sufficient).
>

I am not too familiar with our buildbot setup. So, I will need to do some
reading before actually making any changes. You can give me access to
the buildbot master now. However, I would use this access only to
study how the current setup works and to plan the changes we need
accordingly.

>> Since the directories in /external are considered read-only, we could
>> simply create a new Mercurial repository and copy the content of /external in
>> it.
>>> - decide what to do with the bzr mirrors
>>>
>>
>> I don't see much benefit in keeping them.
>
> Both should go into the PEP.

Noted.

Regards,
-- Alexandre

From aahz at pythoncraft.com  Mon Apr  6 06:20:16 2009
From: aahz at pythoncraft.com (Aahz)
Date: Sun, 5 Apr 2009 21:20:16 -0700
Subject: [Python-Dev] Mercurial?
In-Reply-To: <acd65fa20904052106r3dec8a22ve02aadf119d655ab@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de>
	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
	<49D8EC71.5020105@v.loewis.de>
	<acd65fa20904052106r3dec8a22ve02aadf119d655ab@mail.gmail.com>
Message-ID: <20090406042016.GA97@panix.com>

On Mon, Apr 06, 2009, Alexandre Vassalotti wrote:
>
> This makes me remember that we will have to decide how we will
> reorganize our workflow. For this, we can either be conservative and
> keep the current CVS-style development workflow, i.e., a few main
> repositories where all developers can commit to. Or we could drink the
> kool-aid and go with a kernel-style development workflow, i.e., each
> developer maintains his own branch and pull changes from each others.

How difficult would it be to change the decision later?  That is, how
about starting with a CVS-style system and maybe switching to kernel-style
once people get comfortable with Hg?
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"...string iteration isn't about treating strings as sequences of strings, 
it's about treating strings as sequences of characters.  The fact that
characters are also strings is the reason we have problems, but characters 
are strings for other good reasons."  --Aahz

From alexandre at peadrop.com  Mon Apr  6 06:20:52 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 6 Apr 2009 00:20:52 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8FC47.8080803@ochtman.nl>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com> 
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com> 
	<49D8BC81.7040007@ochtman.nl> <49D8FA3A.5050400@v.loewis.de> 
	<49D8FC47.8080803@ochtman.nl>
Message-ID: <acd65fa20904052120h54132755g7ebf936f7842a69a@mail.gmail.com>

On Sun, Apr 5, 2009 at 2:45 PM, Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:
> On 05/04/2009 20:36, "Martin v. Löwis" wrote:
>>
>> We do require full real names (i.e. no nicknames). Can Mercurial
>> guarantee such a thing?
>
> We could pre-record the list of allowed names in a hook, then have the hook
> check that usernames include one of those names and an email address (so
> people can still start using another email address).
>

But that won't work if people who are not core developers submit
patch bundles for us to import. And maintaining such a whitelist sounds
more burdensome than necessary to me.

-- Alexandre

From alexandre at peadrop.com  Mon Apr  6 06:26:06 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 6 Apr 2009 00:26:06 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8FB14.2080509@v.loewis.de>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com> 
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com> 
	<49D8BC81.7040007@ochtman.nl> <loom.20090405T141950-815@post.gmane.org>
	<49D8C5E3.5000200@ochtman.nl> <49D8FB14.2080509@v.loewis.de>
Message-ID: <acd65fa20904052126l70c8546bgc659a4f9b6e01060@mail.gmail.com>

On Sun, Apr 5, 2009 at 2:40 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Okay, sounds like that will be easy. Would be good to enable compression
>> on the SSH, though, if that's not already done.
>
> Where is that configured?
>

If I recall correctly, only ssh clients can request compression from the
server; in other words, the server cannot force clients to use
compression, but merely allows them to use it.

See the man page for sshd_config and ssh_config for the specific details.

-- Alexandre

From alexandre at peadrop.com  Mon Apr  6 06:31:56 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 6 Apr 2009 00:31:56 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <20090406042016.GA97@panix.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com> 
	<49D87499.5060502@v.loewis.de>
	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com> 
	<49D8EC71.5020105@v.loewis.de>
	<acd65fa20904052106r3dec8a22ve02aadf119d655ab@mail.gmail.com> 
	<20090406042016.GA97@panix.com>
Message-ID: <acd65fa20904052131q2575299bi5c81659f62c3dabf@mail.gmail.com>

On Mon, Apr 6, 2009 at 12:20 AM, Aahz <aahz at pythoncraft.com> wrote:
> How difficult would it be to change the decision later?  That is, how
> about starting with a CVS-style system and maybe switch to kernel-style
> once people get comfortable with Hg?

I believe it would be fairly easy. It would be a matter of declaring a
volunteer to maintain the main repositories and ask core developers to
avoid committing directly to them.

Cheers,
-- Alexandre

From dirkjan at ochtman.nl  Mon Apr  6 08:00:10 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Mon, 6 Apr 2009 08:00:10 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D93524.3060500@gmail.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl> <49D93524.3060500@gmail.com>
Message-ID: <ea2499da0904052300t5d0ff402o7146812a6f0615ce@mail.gmail.com>

On Mon, Apr 6, 2009 at 00:48, Mark Hammond <skippy.hammond at gmail.com> wrote:
> Just to be clear, what input would you like on that map?

Review of email addresses, and pointers to names/email addresses for the
usernames I don't have anything for yet. Also, there are a few commented
question marks; it would be useful if someone checked those.

> I'm listed twice:
>
> mark.hammond = Mark Hammond <skippy.hammond at gmail.com>
> mhammond = Mark Hammond <skippy.hammond at gmail.com>
>
> but that email address isn't the address normally associated with any
> checkins I make, nor the address in the comments of the ssh keys I use
> (which is mhammond at skippinet.com.au)

Your being listed twice is normal; both mark.hammond and mhammond have
been used in the commit history, and I just assumed they're both you.
I'll probably change your email address to be the one associated with
the checkins/public key, though. Is there a list of such email
addresses? I just parsed python-dev archives to get to my list.
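
For anyone who wants to review the map mechanically, here is a rough
sketch of how entries in the "svn-name = Full Name <email>" form shown
above could be parsed and how duplicate identities (like the two entries
for one person) could be spotted. The function names and the sample
address are made up for illustration, and the handling of "#" comment
lines is an assumption about the file, not something confirmed here.

```python
import re

# Matches lines of the form "svn-name = Full Name <email>".
# The line format follows the two entries quoted in this thread; the
# helper names below are hypothetical.
_ENTRY = re.compile(
    r"^(?P<svn>[^=#\s]+)\s*=\s*(?P<name>[^<]+?)\s*<(?P<email>[^>]+)>\s*$"
)


def parse_author_map(text):
    """Return {svn_username: (full_name, email)}, skipping blanks and # comments."""
    authors = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = _ENTRY.match(line)
        if m is None:
            raise ValueError("unparseable author map line: %r" % raw)
        authors[m.group("svn")] = (m.group("name"), m.group("email"))
    return authors


def aliases(authors):
    """Group SVN usernames that map to the same (name, email) identity."""
    grouped = {}
    for svn, identity in authors.items():
        grouped.setdefault(identity, []).append(svn)
    return {ident: sorted(names)
            for ident, names in grouped.items() if len(names) > 1}


# Placeholder address, not a real one.
SAMPLE = """\
# two SVN usernames, one person
mark.hammond = Mark Hammond <mark@example.org>
mhammond = Mark Hammond <mark@example.org>
"""
```

Running aliases(parse_author_map(SAMPLE)) groups both usernames under one
identity, which makes it easy to eyeball whether such merges are intended.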

Cheers,

Dirkjan

From dirkjan at ochtman.nl  Mon Apr  6 08:03:18 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Mon, 6 Apr 2009 08:03:18 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <18905.26776.630604.13623@montanaro.dyndns.org>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl>
	<loom.20090405T141950-815@post.gmane.org>
	<49D8C5E3.5000200@ochtman.nl>
	<loom.20090405T150210-608@post.gmane.org>
	<49D8D280.6060900@ochtman.nl>
	<18905.26776.630604.13623@montanaro.dyndns.org>
Message-ID: <ea2499da0904052303h61a83476xbb01f9a0bd92b59@mail.gmail.com>

On Mon, Apr 6, 2009 at 04:27,  <skip at pobox.com> wrote:
> Maybe once for each currently active Subversion branch (2.6, 2.7, 3.0, 3.1)?

Sure, if we're doing cloned branches. But then someone will also need
to clone 2.5, and maybe 2.4. The point is, as long as it's a constant
factor and not an order of magnitude more, it's still quite easy to
cope with.

This would also be one of the arguments *for* named branches, I suppose.

Cheers,

Dirkjan

From dirkjan at ochtman.nl  Mon Apr  6 08:04:52 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Mon, 6 Apr 2009 08:04:52 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <acd65fa20904052120h54132755g7ebf936f7842a69a@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl> <49D8FA3A.5050400@v.loewis.de>
	<49D8FC47.8080803@ochtman.nl>
	<acd65fa20904052120h54132755g7ebf936f7842a69a@mail.gmail.com>
Message-ID: <ea2499da0904052304n277d3874la32c7056dc3e76a3@mail.gmail.com>

On Mon, Apr 6, 2009 at 06:20, Alexandre Vassalotti
<alexandre at peadrop.com> wrote:
> But that won't work if people who are not core developers submit
> patch bundles for us to import. And maintaining such a whitelist sounds
> more burdensome than necessary to me.

Well, if you need contributors to sign a contributor's agreement
anyway, there's already some list out there that we can leverage.

The other option is to play the consenting adults card and ask all
people with push access to ascertain the correct committer names on
the patches they push.
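
To make the whitelist idea from earlier in the thread a bit more
concrete, here is a minimal sketch of the check itself: the username must
look like "Full Name <email>" and the name part must be on a list of
known names, while the email address is free to change. The names
ALLOWED_NAMES and check_username are hypothetical, and wiring this into
an actual Mercurial hook is deliberately left out, since the hook
signature is not discussed here.

```python
import re

# Hypothetical whitelist; in practice it might be derived from the list of
# people who signed a contributor agreement.
ALLOWED_NAMES = {"Alexandre Vassalotti", "Dirkjan Ochtman"}

# "Full Name <user@host>": a name part, then an address in angle brackets.
_USER = re.compile(r"^(?P<name>[^<>]+?)\s*<(?P<email>[^<>@\s]+@[^<>@\s]+)>$")


def check_username(user, allowed=ALLOWED_NAMES):
    """Return None if the username is acceptable, else a short reason string."""
    m = _USER.match(user.strip())
    if m is None:
        return "expected 'Full Name <email>'"
    if m.group("name") not in allowed:
        return "unknown committer name: %s" % m.group("name")
    return None
```

A hook built on this would reject a push whenever check_username()
returns a reason for any incoming changeset.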

Cheers,

Dirkjan

From dirkjan at ochtman.nl  Mon Apr  6 08:07:39 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Mon, 6 Apr 2009 06:07:39 +0000 (UTC)
Subject: [Python-Dev] Mercurial?
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl>
	<loom.20090405T141950-815@post.gmane.org>
	<49D8C5E3.5000200@ochtman.nl> <49D8FB14.2080509@v.loewis.de>
	<acd65fa20904052126l70c8546bgc659a4f9b6e01060@mail.gmail.com>
Message-ID: <loom.20090406T060532-657@post.gmane.org>

Alexandre Vassalotti <alexandre <at> peadrop.com> writes:
> If I recall correctly, only ssh clients can request compression from the
> server; in other words, the server cannot force clients to use
> compression, but merely allows them to use it.
> 
> See the man page for sshd_config and ssh_config for the specific details.

So we'll explain how to configure that in the .hgrc/Mercurial.ini file that
people will have to create anyway.

Alternatively, we do it the way Mozilla has done and let everyone clone/pull
over http and push over ssh. Then everyone always gets compression for the big
clones/pulls, pushes are a little slower (but they aren't usually that large),
and people who don't have push access already have the right setup.

Cheers,

Dirkjan

From dirkjan at ochtman.nl  Mon Apr  6 08:13:00 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Mon, 6 Apr 2009 06:13:00 +0000 (UTC)
Subject: [Python-Dev] Mercurial?
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de>
	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
	<49D8BFD2.8090600@ochtman.nl> <49D8F02C.1040008@v.loewis.de>
Message-ID: <loom.20090406T060932-957@post.gmane.org>

Martin v. Löwis <martin <at> v.loewis.de> writes:
> Is it possible to branch from a subdirectory? For the "different VMs"
> stuff, it's rather desirable to have a branch of the test suite, and
> perhaps the standard library, rather than extracting it from the source.

You can only branch the whole repository. Of course you could drop the other
stuff right after branching it, but that would kind of defy the point of
branching (since you won't really be able to merge later on).

This is why it might be interesting to just split out the stdlib entirely.
Though maybe we should wait for Mercurial's subrepos support to arrive before we
go there (so we can have a CPython repo that has the stdlib repo included
automatically). Something like that is already provided by the forest extension,
but it's not being maintained. Subrepo support is slated for the 1.3 release,
which is planned for early July.

Cheers,

Dirkjan

From dirkjan at ochtman.nl  Mon Apr  6 08:21:05 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Mon, 6 Apr 2009 06:21:05 +0000 (UTC)
Subject: [Python-Dev] Mercurial?
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de>
	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
	<49D8EC71.5020105@v.loewis.de>
	<acd65fa20904052106r3dec8a22ve02aadf119d655ab@mail.gmail.com>
Message-ID: <loom.20090406T061434-911@post.gmane.org>

Alexandre Vassalotti <alexandre <at> peadrop.com> writes:
> With Mercurial, we will need to add support for server-side clone
> ourselves. There are a few ways to provide this feature. We could give
> Unix user accounts to all core developers and let developers manage
> their private branches directly on the server. You made clear that this
> is not wanted. So an alternative approach is to add an interface
> accessible via SSH. As I previously mentioned, this is the approach
> used by Mozilla.

The easier solution here is to just allow normal local-to-remote clones. hg
supports commands like hg clone . ssh://hg at hg.python.org/my-branch without the
need for any extra scripts or setup. I think that would be a good start.

> This makes me remember that we will have to decide how we will
> reorganize our workflow. For this, we can either be conservative and
> keep the current CVS-style development workflow, i.e., a few main
> repositories that all developers can commit to. Or we could drink the
> kool-aid and go with a kernel-style development workflow, i.e., each
> developer maintains his own branch and pulls changes from the others.

The differences between these workflows aren't all that big, i.e. it's not like
there's a big schism between them. But I suspect that, in a setup where
buildbots are important, a very much multi-repo setup probably isn't a good idea
(this is also why Mozilla doesn't use that many repos; their continuous
integration infra is /very/ important to them).

Cheers,

Dirkjan

From brian at sweetapp.com  Mon Apr  6 08:51:21 2009
From: brian at sweetapp.com (Brian Quinlan)
Date: Mon, 06 Apr 2009 07:51:21 +0100
Subject: [Python-Dev] Possible py3k io wierdness
In-Reply-To: <3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net>
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>	<loom.20090404T231154-979@post.gmane.org>	<49D874E4.6030602@sweetapp.com>	<loom.20090405T102812-215@post.gmane.org>
	<3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net>
Message-ID: <49D9A669.9010008@sweetapp.com>

James Y Knight wrote:
> 
> On Apr 5, 2009, at 6:29 AM, Antoine Pitrou wrote:
> 
>> Brian Quinlan <brian <at> sweetapp.com> writes:
>>>
>>> I don't see why this is helpful. Could you explain why
>>> _RawIOBase.close() calling self.flush() is useful?
>>
>> I could not explain it for sure since I didn't write the Python version.
>> I suppose it's so that people who only override flush() automatically 
>> get the
>> flush-on-close behaviour.
> 
> It seems that a separate method "_internal_close" should've been defined 
> to do the actual closing of the file, and the close() method should've 
> been defined on the base class as "self.flush(); self._internal_close()" 
> and never overridden.

Are you imagining something like this?

class RawIOBase(object):
   def flush(self): pass
   def _internal_close(self): pass
   def close(self):
     self.flush()
     self._internal_close()

class FileIO(RawIOBase):
   def _internal_close(self):
     # Do close
     super()._internal_close()

class SomeSubclass(FileIO):
   def flush(self):
     # Do flush
     super().flush()

   def _internal_close(self):
     # Do close
     super()._internal_close()

That looks pretty good. RawIOBase.close acts as the controller and 
.flush() calls move up the class hierarchy.

The downsides that I see:
- you need the cooperation of your subclasses i.e. they must call
   super().flush() in .flush() to get correct close behavior (and this
   represents a backwards-incompatible semantic change)
- there are also going to be some extra method calls
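
A small runnable sketch of the first downside, using the same shape as
the classes above: close() stays on the base class as the controller,
and a subclass that forgets super().flush() silently drops the parent's
flush. LoggingIO and ForgetfulIO are made-up names that only record the
order of calls; they are not part of any patch.

```python
class RawIOBase(object):
    def flush(self):
        pass

    def _internal_close(self):
        pass

    def close(self):
        # Template method: subclasses override the hooks, never close().
        self.flush()
        self._internal_close()


class LoggingIO(RawIOBase):
    """Hypothetical subclass that records which hooks ran, in order."""

    def __init__(self):
        self.calls = []

    def flush(self):
        self.calls.append("flush")
        super().flush()

    def _internal_close(self):
        self.calls.append("close")
        super()._internal_close()


class ForgetfulIO(LoggingIO):
    """Overrides flush() without calling super(), so LoggingIO.flush is skipped."""

    def flush(self):
        self.calls.append("forgetful flush")
```

Calling close() on a LoggingIO records "flush" then "close"; on a
ForgetfulIO the parent's "flush" entry never appears, which is the
cooperation problem described in the first bullet.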

Another approach is to get every subclass to deal with their own close 
semantics i.e.

class RawIOBase(object):
   def flush(self): pass
   def close(self): pass

class FileIO(RawIOBase):
   def close(self):
     # Do close
     super().close()

class SomeSubclass(FileIO):
   def _flush_internal(self):
     # Do flush
     pass

   def flush(self):
     self._flush_internal()
     super().flush()

   def close(self):
     # Call this class's own flush logic directly, bypassing overrides
     SomeSubclass._flush_internal(self)
     # Do close
     super().close()

I was thinking about this approach when I wrote this patch:
http://bugs.python.org/file13620/remove_flush.diff

But I think I like your way better. Let me play with it a bit.

Cheers,
Brian

From larry at hastings.org  Mon Apr  6 10:00:57 2009
From: larry at hastings.org (Larry Hastings)
Date: Mon, 06 Apr 2009 01:00:57 -0700
Subject: [Python-Dev] CObject take 2: Introducing the "Capsule"
Message-ID: <49D9B6B9.6020304@hastings.org>

(See my posting "Let's update CObject API so it is safe and regular!" 
from 2009/03/31 for "take 1").

I discussed this off-list with GvR.  He was primarily concerned with 
fixing the passing-around-a-vtable C API usage of CObject, but he wanted 
to preserve as much backwards compatibility as possible.  In the end, he 
suggested I create a new API and leave CObject unchanged.  I've done 
that, incorporating many of GvR's suggestions, though the blame for the 
proposed new API is ultimately mine.

The new object is called a "Capsule".  (I *had* wanted to call it 
"Wrapper", but there's already a PyWrapper_New in descrobject.h.)

Highlights of the new API:
* PyCapsule_New() replaces PyCObject_FromVoidPtr.
  * It takes a void * pointer, a const char *name, and a destructor.
  * The pointer must not be NULL.
  * The name may be NULL; if it is not NULL, it must be a valid C string 
which outlives the capsule.
  * The destructor takes a PyObject *, not a void *.
* PyCapsule_GetPointer() replaces PyCObject_AsVoidPtr.
  * It takes a PyObject * and a const char *name.
  * The name must compare equal to the name inside the object; either 
they're both NULL or they strcmp to be the same.
* PyCapsule_Import() replaces PyCObject_Import.
  * It takes three arguments: const char *module_name, const char 
*attribute_name, int no_block.
  * It ensures that the "name" of the Capsule is "modulename.attributename".
  * If no_block is true, it uses PyImport_ImportModuleNoBlock.  If this 
fails it sets no exception.
* The PyCapsule structure is private.  There are accessors for all 
fields: pointer, name, destructor, and "context".
  * The "context" is a second "void *" you can set / get.
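
For a quick feel of the name check without writing a C extension, the
two core calls can be driven from Python through ctypes. This is only a
sketch: it assumes a CPython build that exports PyCapsule_New and
PyCapsule_GetPointer with the signatures listed above, and the capsule
name "demo.payload" and the helper get_pointer are invented for the
example.

```python
import ctypes

api = ctypes.pythonapi  # CPython only; assumes the PyCapsule_* symbols exist

# Declare the signatures described in the list above.
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_GetPointer.restype = ctypes.c_void_p
api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# The name must outlive the capsule, so keep it in a module-level constant.
NAME = b"demo.payload"       # hypothetical "module.attribute" style name
payload = ctypes.c_int(42)   # the C data being wrapped; must stay alive too

# Non-NULL pointer, a name, and no destructor (NULL).
capsule = api.PyCapsule_New(ctypes.addressof(payload), NAME, None)


def get_pointer(cap, name):
    """Return the wrapped address; a mismatched name raises an exception."""
    return api.PyCapsule_GetPointer(cap, name)


address = get_pointer(capsule, NAME)
value = ctypes.c_int.from_address(address).value

try:
    get_pointer(capsule, b"wrong.name")
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
```

With the matching name the original address comes back; with a different
name the lookup fails instead of handing out the pointer, which is the
safety property the name argument is there to provide.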

You can read the full API and its implementation in the patch I just 
posted to the tracker:
    http://bugs.python.org/issue5630
The patch was written against svn r71304.  The patch isn't ready to be 
applied--there is no documentation for the new API beyond the header file.

GvR and I disagree on one point: he thinks that we should leave CObject 
in forever, undeprecated.  I think we should deprecate it now and remove 
it... whenever we'd do that.  The new API does everything the old one 
does, and more, and it's cleaner and safer.  Let me start an informal 
poll: assuming we accept the new API, should we deprecate CObject?

/larry/

From phil at freehackers.org  Mon Apr  6 10:21:36 2009
From: phil at freehackers.org (Philippe Fremy)
Date: Mon, 06 Apr 2009 10:21:36 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8FBCC.1050801@ochtman.nl>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>
	<49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl>
Message-ID: <49D9BB90.8040008@freehackers.org>

Dirkjan Ochtman wrote:
> On 05/04/2009 20:29, "Martin v. Löwis" wrote:
>> FYI: this is the list of hooks currently employed:
>> - pre: check whitespace
>> - post: mail python-checkins
>>          inform regular buildbot
>>          inform community buildbot
>>          trigger website rebuild if a PEP was modified
>>          (but then, whether or not the PEPs will be maintained
>>           in hg also needs to be decided)
> 
> All this is easy to do with Mercurial's hook system. 

One question: if somebody pushes a changeset with 3 commits, will the
pre and post hooks be applied on all of the commits, or only on the
final commit ?

If this is applied on every commit, then you have no way to fix a
whitespace problem without rewriting your history ?

cheers,

Philippe

From dirkjan at ochtman.nl  Mon Apr  6 10:33:36 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Mon, 6 Apr 2009 10:33:36 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D9BB90.8040008@freehackers.org>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de>
	<49D8FBCC.1050801@ochtman.nl> <49D9BB90.8040008@freehackers.org>
Message-ID: <ea2499da0904060133k5a4d0572vd6aea50f3f5f5cff@mail.gmail.com>

On Mon, Apr 6, 2009 at 10:21, Philippe Fremy <phil at freehackers.org> wrote:
> One question: if somebody pushes a changeset with 3 commits, will the
> pre and post hooks be applied on all of the commits, or only on the
> final commit ?
>
> If this is applied on every commit, then you have no way to fix a
> whitespace problem without rewriting your history ?

Correct, so if the latter is something we want, we could run the
whitespace hook just on every changegroup (group of changesets
pushed).
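
As a toy illustration of the difference (not Mercurial's hook API; each
changeset is simply modelled as a dict of file contents, and all names
are made up), compare a check that runs on every pushed changeset with
one that only looks at the final state of the changegroup:

```python
def has_trailing_whitespace(text):
    """True if any line in text ends with spaces or tabs."""
    return any(line != line.rstrip() for line in text.splitlines())


def check_each_changeset(snapshots):
    """Accept only if EVERY pushed changeset is free of trailing whitespace."""
    return all(
        not has_trailing_whitespace(content)
        for files in snapshots
        for content in files.values()
    )


def check_changegroup(snapshots):
    """Accept if the LAST changeset of the push is clean, whatever came before."""
    tip = snapshots[-1]
    return not any(has_trailing_whitespace(c) for c in tip.values())


# Three pushed changesets: the second introduces a problem, the third fixes it.
push = [
    {"a.py": "x = 1\n"},
    {"a.py": "x = 1   \n"},
    {"a.py": "x = 1\n"},
]
```

The per-changeset check rejects this push because of the middle
changeset, while the per-changegroup check accepts it since the fix-up
commit leaves the tip clean.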

Cheers,

Dirkjan

From aafshar at gmail.com  Mon Apr  6 09:52:15 2009
From: aafshar at gmail.com (Ali Afshar)
Date: Mon, 06 Apr 2009 08:52:15 +0100
Subject: [Python-Dev] Mercurial?
In-Reply-To: <2d75d7660904051656s2242a9ex91ac0a2d8056cbfe@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de>
	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
	<49D8EC71.5020105@v.loewis.de>
	<2d75d7660904051656s2242a9ex91ac0a2d8056cbfe@mail.gmail.com>
Message-ID: <49D9B4AF.6080403@gmail.com>

Daniel (ajax) Diniz wrote:
> "Martin v. Löwis" wrote:
>   
>>> I think it would be a good idea to host temporary svn mirrors for
>>> developers who access their VCS via an IDE. Although, I am not sure
>>> anymore if supporting these developers (if there are any) would be worth
>>> the trouble. So, think of this as optional.
>>>       
>> Any decision to have or not have such a feature should be stated in
>> the PEP. I personally don't use IDEs, so I don't care (although
>> I do notice that the apparent absence of IDE support for Mercurial
>> indicates maturity of the technology)
>>     
>
> I can spend some time on Mercurial integration for the main IDEs in
> use by core devs, I'm sure the PIDA folks have most of this sorted
> already. It would be important to have these (and any other non-PEP
> worthy tasks/helpers) listed with some detail, e.g., in a wiki page.
>
>   
Well PIDA is the IDE-hater's IDE, but yes, it has excellent Mercurial
integration (probably the best integration of any system). It is all through
anyvc with a small amount of user interface added. I am sure this would
be easily portable.

Ali (thanks for cc)

From phil at freehackers.org  Mon Apr  6 11:14:39 2009
From: phil at freehackers.org (Philippe Fremy)
Date: Mon, 06 Apr 2009 11:14:39 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <ea2499da0904060133k5a4d0572vd6aea50f3f5f5cff@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>	
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	
	<49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de>	
	<49D8FBCC.1050801@ochtman.nl> <49D9BB90.8040008@freehackers.org>
	<ea2499da0904060133k5a4d0572vd6aea50f3f5f5cff@mail.gmail.com>
Message-ID: <49D9C7FF.80506@freehackers.org>

Dirkjan Ochtman wrote:
> On Mon, Apr 6, 2009 at 10:21, Philippe Fremy <phil at freehackers.org> wrote:
>> One question: if somebody pushes a changeset with 3 commits, will the
>> pre and post hooks be applied on all of the commits, or only on the
>> final commit ?
>>
>> If this is applied on every commit, then you have no way to fix a
>> whitespace problem without rewriting your history ?
> 
> Correct, so if the latter is something we want, we could run the
> whitespace hook just on every changegroup (group of changesets
> pushed).

Probably wise, and for many other checks as well.

This is a problem I have with my daily usage of mercurial. It's supposed
to be great to work offline and to commit your intermediate versions
before it's fully working but if you do that, all those intermediate non
working versions find their way into the main repository.

This means that something like "all tests pass 100% or close on every
version of the repository" is not really feasible unless every committer
agrees not to have any version in his local repository that breaks
any tests. Which defeats part of the purpose of being able to have
a local repository, no ?

cheers,

Philippe

From dirkjan at ochtman.nl  Mon Apr  6 11:41:30 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Mon, 6 Apr 2009 11:41:30 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D9C7FF.80506@freehackers.org>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de>
	<49D8FBCC.1050801@ochtman.nl> <49D9BB90.8040008@freehackers.org>
	<ea2499da0904060133k5a4d0572vd6aea50f3f5f5cff@mail.gmail.com>
	<49D9C7FF.80506@freehackers.org>
Message-ID: <ea2499da0904060241y45c299efx478f1c7af14cd09b@mail.gmail.com>

On Mon, Apr 6, 2009 at 11:14, Philippe Fremy <phil at freehackers.org> wrote:
> This is a problem I have with my daily usage of mercurial. It's supposed
> to be great to work offline and to commit your intermediate versions
> before it's fully working but if you do that, all those intermediate non
> working versions find their way into the main repository.

Well, it can also be nice to have smaller commits. They're easier to
review, and will provide easier history to browse/read later on.

BTW, having smaller commits doesn't necessarily equate to having
non-working changesets. I.e. in my work on Mercurial, I'll often push
small changesets (we all do), but we try to keep the test suite
passing in every single one of them.

> This means that something like "all tests pass 100% or close on every
> version of the repository" is not really feasible unless every committer
> agrees not to have any version in his local repository that breaks
> any tests. Which defeats part of the purpose of being able to have
> a local repository, no ?

This is why you'd want something like a pushlog, to provide a way to
see what revisions were actually tested by buildbots.

Another thing that I discussed with Georg last night would be a setup
where changesets get pushed to a gateway repo that runs the tests and
only pushes to an "official" repo if everything's still green. That
should probably be a topic discussed separately, though.

Cheers,

Dirkjan

From ncoghlan at gmail.com  Mon Apr  6 13:08:47 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 06 Apr 2009 21:08:47 +1000
Subject: [Python-Dev] Possible py3k io wierdness
In-Reply-To: <e8a0972d0904051728l434b2cel8dd02e6c06926298@mail.gmail.com>
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>	
	<loom.20090404T231154-979@post.gmane.org>	
	<49D874E4.6030602@sweetapp.com>	
	<loom.20090405T102812-215@post.gmane.org>	
	<3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net>	
	<loom.20090405T174250-453@post.gmane.org>
	<49D9369C.8080400@gmail.com>
	<e8a0972d0904051728l434b2cel8dd02e6c06926298@mail.gmail.com>
Message-ID: <49D9E2BF.1010604@gmail.com>

Alex Martelli wrote:
> Queue.Queue in 2.* (and queue.Queue in 3.*) is like that too -- the
> single leading underscore meaning "protected" ("I'm here for subclasses
> to override me, only" in C++ parlance) and a great way to denote "hook
> methods" in a Template Method design pattern instance.  Base class deals
> with all locking issues in e.g. 'get' (the method a client calls),
> subclass can override _get and not worry about threading (it will be
> called by parent class's get with proper locks held and locks will be
> properly released &c afterwards).

Ah, thank you - yes, that's the one I was thinking of. My brain was
telling me "threading", which makes some sense, since I put the Queue
conceptually in the same bucket as the rest of the locking constructs in
the threading module.
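
For reference, the hook-method pattern described above looks roughly
like this when written against queue.Queue (the Python 3 spelling of the
module). StackQueue is a made-up name; only the underscore hooks are
overridden, and the base class's put()/get() keep handling all the
locking.

```python
import queue


class StackQueue(queue.Queue):
    """Hypothetical LIFO variant that overrides only the hook methods."""

    def _init(self, maxsize):
        # Called by Queue.__init__ to set up the underlying storage.
        self.queue = []

    def _qsize(self):
        return len(self.queue)

    def _put(self, item):
        # Called by put() with the lock already held.
        self.queue.append(item)

    def _get(self):
        # Called by get() with the lock already held.
        return self.queue.pop()
```

Clients still call put() and get(); the subclass never touches a lock,
yet items now come back in last-in, first-out order.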

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Mon Apr  6 13:13:36 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 06 Apr 2009 21:13:36 +1000
Subject: [Python-Dev] Possible py3k io wierdness
In-Reply-To: <49D9A669.9010008@sweetapp.com>
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>	<loom.20090404T231154-979@post.gmane.org>	<49D874E4.6030602@sweetapp.com>	<loom.20090405T102812-215@post.gmane.org>	<3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net>
	<49D9A669.9010008@sweetapp.com>
Message-ID: <49D9E3E0.2060408@gmail.com>

Brian Quinlan wrote:
> - you need the cooperation of your subclasses i.e. they must call
>   super().flush() in .flush() to get correct close behavior (and this
>   represents a backwards-incompatible semantic change)

Are you sure about that? Going by the current _pyio semantics that
Antoine posted, it looks to me that it is already the case that
subclasses need to invoke the parent flush() call correctly to avoid
breaking the base class semantics (which really isn't an uncommon
problem when it comes to writing correct subclasses).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Mon Apr  6 13:36:14 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 06 Apr 2009 21:36:14 +1000
Subject: [Python-Dev] Mercurial?
In-Reply-To: <acd65fa20904052131q2575299bi5c81659f62c3dabf@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de>	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
	<49D8EC71.5020105@v.loewis.de>	<acd65fa20904052106r3dec8a22ve02aadf119d655ab@mail.gmail.com>
	<20090406042016.GA97@panix.com>
	<acd65fa20904052131q2575299bi5c81659f62c3dabf@mail.gmail.com>
Message-ID: <49D9E92E.4010603@gmail.com>

Alexandre Vassalotti wrote:
> On Mon, Apr 6, 2009 at 12:20 AM, Aahz <aahz at pythoncraft.com> wrote:
>> How difficult would it be to change the decision later?  That is, how
>> about starting with a CVS-style system and maybe switch to kernel-style
>> once people get comfortable with Hg?
> 
> I believe it would be fairly easy. It would be a matter of declaring a
> volunteer to maintain the main repositories and ask core developers to
> avoid committing directly to them.

I think that would be the way to go then (i.e. start with a fairly
centralised workflow, and then look at adjusting to something more
decentralised later)*.

Cheers,
Nick.

*I actually had an interesting off-list discussion with Steve Turnbull
regarding how well the 3 most popular DVCS tools supported centralised
and decentralised workflows (or rather, how their advocates evangelise
them in that respect). This is relevant when pitching a DVCS to people
like me that really only have experience working with a centralised
repository model like CVS or SVN.

My guess was that Bazaar anchored the "centralised" end of the DVCS
scale by letting users avoid caring about the underlying acyclic graph,
while Git was solidly down the "decentralised" end with users expected
to be fully aware of and comfortable with the graph. Mercurial appeared
to be somewhere in the middle, as it allowed you to avoid caring about
the graph most of the time, but still provided tools to manipulate it
when you needed to.

That makes Bazaar easy to pitch conceptually to someone like me ("you
can use it just like you use SVN, only with much better merging and
offline support"), and Git a tough sell ("umm, yeah, you really think
about version control all wrong... we're going to have to fix that
before Git makes much sense to you"). Mercurial appears to best allow
the sales pitch to be tailored to the target audience (in this case, a
group including a lot of people with a background predominantly
involving centralised version control tools).

That's just a subjective impression formed from reading what other
people have written *about* the various tools, rather than anything
based on my own experience using them, so you may want to investigate
the location of the nearest salt mine before taking it too seriously :)

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From hrvoje.niksic at avl.com  Mon Apr  6 13:37:15 2009
From: hrvoje.niksic at avl.com (Hrvoje Niksic)
Date: Mon, 06 Apr 2009 13:37:15 +0200
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <17434881.97057.1238778876345.JavaMail.xicrypt@atgrzls001>
References: <49D5FBE6.6090807@avl.com> <gr4v8q$1tm$1@ger.gmane.org>
	<17434881.97057.1238778876345.JavaMail.xicrypt@atgrzls001>
Message-ID: <49D9E96B.1060805@avl.com>

Raymond Hettinger wrote:
>> Hrvoje Niksic wrote:
>>> I've stumbled upon an oddity using sets.  It's trivial to test if a 
>>> value is in the set, but it appears to be impossible to retrieve a 
>>> stored value, 
> 
> See:  http://code.activestate.com/recipes/499299/

Thanks, this is *really* good, the kind of idea that seems perfectly 
obvious once pointed out by someone else.  :-)  I'd still prefer sets to 
get this functionality so they can be used to implement, say, interning, 
but this is good enough for me.

In fact, I can derive from set and add a method similar to that in the 
recipe.  It can be a bit simpler than yours because it only needs to 
support operations needed by sets (__eq__ and __hash__), not arbitrary 
attributes.

class Set(set):
     def find(self, item, default=None):
         capt = _CaptureEq(item)
         if capt in self:
             return capt.match
         return default

class _CaptureEq(object):
     __slots__ = 'obj', 'match'
     def __init__(self, obj):
         self.obj = obj
     def __eq__(self, other):
         eq = (self.obj == other)
         if eq:
             self.match = other
         return eq
     def __hash__(self):
         return hash(self.obj)

 >>> s = Set([1, 2, 3])
 >>> s.find(2.0)
2
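
For the interning use case mentioned above, a dict that maps each object to
itself is one possible workaround until sets can hand back the stored value.
This is only a minimal sketch; the InternTable class and its intern() method
are hypothetical names, not part of the recipe or of the standard library.

```python
class InternTable(object):
    """Map each value to the first equal object that was seen (hypothetical helper)."""

    def __init__(self):
        self._canonical = {}

    def intern(self, obj):
        # setdefault returns the already stored object when an equal key exists,
        # otherwise it stores obj under itself and returns it.
        return self._canonical.setdefault(obj, obj)


table = InternTable()
a = (1, 2, 3)
b = tuple([1, 2, 3])      # equal to a, but built at run time as a separate object

canonical_a = table.intern(a)
canonical_b = table.intern(b)
```

The cost compared with a set-based find() is that every entry stores the
object twice, once as key and once as value.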

From ncoghlan at gmail.com  Mon Apr  6 13:44:21 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 06 Apr 2009 21:44:21 +1000
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8BC81.7040007@ochtman.nl>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl>
Message-ID: <49D9EB15.8070806@gmail.com>

Dirkjan Ochtman wrote:
> I have a stab at an author map at http://dirkjan.ochtman.nl/author-map.
> Could use some review, but it seems like a good start.

Martin may be able to provide a better list of names based on the
checkin name<->SSH public key mapping in the SVN setup.

(e.g. I believe my SVN checkin name is nick.coghlan rather than the
shorter ncoghlan in my email address, and many others are in a similar
boat since first.last was the chosen scheme for names in the SVN switchover)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Mon Apr  6 13:47:02 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 06 Apr 2009 21:47:02 +1000
Subject: [Python-Dev] Mercurial?
In-Reply-To: <ea2499da0904060241y45c299efx478f1c7af14cd09b@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>
	<49D8F88B.3050102@v.loewis.de>	<49D8FBCC.1050801@ochtman.nl>
	<49D9BB90.8040008@freehackers.org>	<ea2499da0904060133k5a4d0572vd6aea50f3f5f5cff@mail.gmail.com>	<49D9C7FF.80506@freehackers.org>
	<ea2499da0904060241y45c299efx478f1c7af14cd09b@mail.gmail.com>
Message-ID: <49D9EBB6.1080004@gmail.com>

Dirkjan Ochtman wrote:
> Another thing that I discussed with Georg last night would be a setup
> where changesets get pushed to a gateway repo that runs the tests and
> only pushes to an "official" repo if everything's still green. That
> should probably be a topic discussed separately, though.

That was one of the post-switch workflow enhancements that Barry was
advocating - it's still a good idea, even if Barry's preferred flavour
of DVCS wasn't chosen :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From fuzzyman at voidspace.org.uk  Mon Apr  6 13:55:55 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Mon, 06 Apr 2009 12:55:55 +0100
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D9EBB6.1080004@gmail.com>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<49D8F88B.3050102@v.loewis.de>	<49D8FBCC.1050801@ochtman.nl>	<49D9BB90.8040008@freehackers.org>	<ea2499da0904060133k5a4d0572vd6aea50f3f5f5cff@mail.gmail.com>	<49D9C7FF.80506@freehackers.org>	<ea2499da0904060241y45c299efx478f1c7af14cd09b@mail.gmail.com>
	<49D9EBB6.1080004@gmail.com>
Message-ID: <49D9EDCB.7010905@voidspace.org.uk>

Nick Coghlan wrote:
> Dirkjan Ochtman wrote:
>   
>> Another thing that I discussed with Georg last night would be a setup
>> where changesets get pushed to a gateway repo that runs the tests and
>> only pushes to an "official" repo if everything's still green. That
>> should probably be a topic discussed separately, though.
>>     
>
> That was one of the post-switch workflow enhancements that Barry was
> advocating - it's still a good idea, even if Barry's preferred flavour
> of DVCS wasn't chosen :)
>
>   

Gated checkins can work fine but can also have many problems. For 
example if we have a spuriously failing test then if you are working on 
an unrelated issue it will be entirely up to chance as to whether you 
can checkin...

Building the docs would be another thing we could check, although it can 
take a while.

If we have a queue then it could be the case that you do a commit - and 
then discover half an hour later that it conflicts with something that 
was ahead of you in the queue.

Michael

> Cheers,
> Nick.
>
>   

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From dirkjan at ochtman.nl  Mon Apr  6 14:23:29 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Mon, 6 Apr 2009 14:23:29 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D9EDCB.7010905@voidspace.org.uk>
References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl>
	<49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl>
	<49D9BB90.8040008@freehackers.org>
	<ea2499da0904060133k5a4d0572vd6aea50f3f5f5cff@mail.gmail.com>
	<49D9C7FF.80506@freehackers.org>
	<ea2499da0904060241y45c299efx478f1c7af14cd09b@mail.gmail.com>
	<49D9EBB6.1080004@gmail.com> <49D9EDCB.7010905@voidspace.org.uk>
Message-ID: <ea2499da0904060523m5b999e1j3f1a88bf9271b499@mail.gmail.com>

On Mon, Apr 6, 2009 at 13:55, Michael Foord <fuzzyman at voidspace.org.uk> wrote:
> Gated checkins can work fine but can also have many problems. For example if
> we have a spuriously failing test then if you are working on an unrelated
> issue it will be entirely up to chance as to whether you can checkin...

Sure, it's a problem, but it does get you a tree that's always green.
They're all trade-offs. But let's keep this discussion for some time
*after* migration to hg is completed.

Cheers,

Dirkjan

From jnoller at gmail.com  Mon Apr  6 14:55:58 2009
From: jnoller at gmail.com (Jesse Noller)
Date: Mon, 6 Apr 2009 08:55:58 -0400
Subject: [Python-Dev] Tools
In-Reply-To: <b8e622740904051958h57494a36i54b84fecd8f3de4c@mail.gmail.com>
References: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org>
	<49D9409C.6060108@ubuntu.com>
	<18905.28162.645078.593247@montanaro.dyndns.org>
	<b8e622740904051958h57494a36i54b84fecd8f3de4c@mail.gmail.com>
Message-ID: <4222a8490904060555u676e0a02y50db666a9f447436@mail.gmail.com>

On Sun, Apr 5, 2009 at 10:58 PM, Jack diederich <jackdied at gmail.com> wrote:
> On Sun, Apr 5, 2009 at 10:50 PM,  <skip at pobox.com> wrote:
>>     Barry> Someone asked me at Pycon about stripping out Demos and Tools.
>>
>>     Matthias> +1, but please for 2.7 and 3.1 only.
>>
>> Is there a list of other demos or tools which should be deleted?  If
>> possible the list should be publicized so that people can pick up external
>> maintenance if desired.
>
> I liked Brett's (Georg's?) half joking idea at sprints.  Just delete
> each subdirectory in a separate commit and then wait to see what
> people revert.
>
> -Jack

Jack brought up a good point: this discussion came up during the
sprints. I believe Martin and others had some good arguments for keeping
*some* of the demo/... stuff; however, I think we all agreed that it
belongs somewhere else, possibly in the documentation.

As it is, the demo/... directory only exists in subversion - it's not
installed anywhere. I really do think that most of the contents can
either be deleted, or moved to the docs where it might be of more use
for people in general.

Random thought - what if we made a docs/demos directory, which
contained sub directories ala Demo/... - and added a sphinx extension
which would detect nested directories and zip them up during the
build? This way, you could add a tag in the .rst for the module that
looked like:

.. demos::
    multiprocessing.zip

The zip would not be checked in, but created at build time from
Docs/demos/multiprocessing
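
The zipping half of that idea could look roughly like the sketch below. It
leaves out the Sphinx side entirely (how the directive and the build hook
would be wired up is not shown), and zip_demo_dirs is a hypothetical helper
name; only os, shutil, tempfile and zipfile from the standard library are
used.

```python
import os
import shutil
import tempfile
import zipfile


def zip_demo_dirs(demos_root, out_dir):
    """Create <name>.zip in out_dir for every subdirectory of demos_root."""
    os.makedirs(out_dir, exist_ok=True)
    archives = []
    for name in sorted(os.listdir(demos_root)):
        src = os.path.join(demos_root, name)
        if os.path.isdir(src):
            base = os.path.join(out_dir, name)   # make_archive appends ".zip"
            archives.append(shutil.make_archive(base, "zip", root_dir=src))
    return archives


# Small self-contained run against a throwaway directory tree.
root = tempfile.mkdtemp()
demo_dir = os.path.join(root, "demos", "multiprocessing")
os.makedirs(demo_dir)
with open(os.path.join(demo_dir, "example.py"), "w") as f:
    f.write("print('hello from the demo')\n")

archives = zip_demo_dirs(os.path.join(root, "demos"), os.path.join(root, "build"))
with zipfile.ZipFile(archives[0]) as zf:
    names = zf.namelist()
```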

Just some thoughts. Back to my coffee.

-jesse

From chris at simplistix.co.uk  Mon Apr  6 15:00:18 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Mon, 06 Apr 2009 14:00:18 +0100
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D669AA.6080001@v.loewis.de>
References: <49D4DA72.60401@v.loewis.de> <49D51A16.70804@simplistix.co.uk>
	<49D669AA.6080001@v.loewis.de>
Message-ID: <49D9FCE2.4070805@simplistix.co.uk>

Martin v. Löwis wrote:
> Chris Withers wrote:
>> Martin v. Löwis wrote:
>>> I propose the following PEP for inclusion to Python 3.1.
>>> Please comment.
>> Would this support the following case:
>>
>> I have a package called mortar, which defines useful stuff:
>>
>> from mortar import content, ...
>>
>> I now want to distribute large optional chunks separately, but ideally
>> so that the following will work:
>>
>> from mortar.rbd import ...
>> from mortar.zodb import ...
>> from mortar.wsgi import ...
>>
>> Does the PEP support this? 
> 
> That's the primary purpose of the PEP. 

Are you sure?

Does the PEP really allow for:

from mortar import content
from mortar.rdb import something

...where 'content' is a function defined in mortar/__init__.py and 
'something' is a function defined in mortar/rdb/__init__.py *and* the 
following are separate distributions on PyPI:

- mortar
- mortar.rdb

...where 'mortar' does not contain 'mortar.rdb'.

> You can do this today already
> (see the zope package,

No, they have nothing but a (functionally) empty __init__.py in the zope 
package.

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From chris at simplistix.co.uk  Mon Apr  6 15:01:09 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Mon, 06 Apr 2009 14:01:09 +0100
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <1afaf6160904031427p7fa95d07q340fd54cb7c34963@mail.gmail.com>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>	
	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>	
	<49D35A39.7020507@simplistix.co.uk>	
	<Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net>	
	<49D52B2C.5050509@simplistix.co.uk>	
	<ca471dc20904021418y617f1bfdja3173c41a1451423@mail.gmail.com>	
	<49D52C5B.7010506@simplistix.co.uk>	
	<ca471dc20904021449x659f930bn7b61feec6b640752@mail.gmail.com>	
	<49D63465.80401@simplistix.co.uk>
	<1afaf6160904031427p7fa95d07q340fd54cb7c34963@mail.gmail.com>
Message-ID: <49D9FD15.9030406@simplistix.co.uk>

Benjamin Peterson wrote:
>>>> Assuming it breaks no tests, would there be objection to me committing
>>>> the
>>>> above change to the Python 3 trunk?
>>> That's up to Benjamin. Personally, I live by "if it ain't broke, don't
>>> fix it." :-)
>> Anything using an exec is broken by definition ;-)
> 
> "practicality beats purity"
> 
>> Benjamin?
> 
> +0

OK, well, I'll use it as my first "test commit" when I get a chance :-)

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From aahz at pythoncraft.com  Mon Apr  6 15:04:46 2009
From: aahz at pythoncraft.com (Aahz)
Date: Mon, 6 Apr 2009 06:04:46 -0700
Subject: [Python-Dev] FWD: Documentation site problems
Message-ID: <20090406130446.GB19296@panix.com>

The 3.0 docs seem to be correct:
http://docs.python.org/3.0/tutorial/

----- Forwarded message from Ernst Persson <ernst at stickybit.se> -----

> Subject: Documentation site problems
> From: Ernst Persson <ernst at stickybit.se>
> To: webmaster at python.org
> Organization: StickyBit AB
> Date: Mon, 06 Apr 2009 10:32:42 +0200
> 
> Hi,
> 
> the contents are missing from the Python tutorial:
> http://docs.python.org/tutorial/
> 
> BR
> /Ernst Persson

----- End forwarded message -----

-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"...string iteration isn't about treating strings as sequences of strings, 
it's about treating strings as sequences of characters.  The fact that
characters are also strings is the reason we have problems, but characters 
are strings for other good reasons."  --Aahz

From aahz at pythoncraft.com  Mon Apr  6 15:06:18 2009
From: aahz at pythoncraft.com (Aahz)
Date: Mon, 6 Apr 2009 06:06:18 -0700
Subject: [Python-Dev] FWD: Library Reference is incomplete
Message-ID: <20090406130618.GD19296@panix.com>

Hrm, looks like the whole 2.6 build is broken.

----- Forwarded message from "Müller-Reineke, Matthias" <matthias.mueller-reineke at grundvers.de> -----

> Subject: Library Reference is incomplete
> Date: Mon, 6 Apr 2009 11:25:54 +0200
> From: "Müller-Reineke, Matthias" <matthias.mueller-reineke at grundvers.de>
> To: webmaster at python.org
> 
> Dear Webmaster,
> 
> "Library Reference" on http://www.python.org/doc/ takes me to http://docs.python.org/library/ .
> That page doesn't contain the table of contents.
> 
> Matthias Müller-Reineke
> 
> ------------------------------------------
> Grundeigentümer-Versicherung VVaG
> Große Bäckerstraße 7
> 20095 Hamburg
> Tel: 040 - 3 76 63 - 199
> Fax: 040 - 3 76 63 - 98 199
> 
> http://www.grundvers.de
> <mailto:matthias.mueller-reineke at grundvers.de>
> 
> Firmensitz: Hamburg HRB 13 103
> Vorstand: Heinz Walter Berens (Vors.), Rüdiger Buyten
> Aufsichtsratsvorsitzender: Peter Landmann

----- End forwarded message -----

-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"...string iteration isn't about treating strings as sequences of strings, 
it's about treating strings as sequences of characters.  The fact that
characters are also strings is the reason we have problems, but characters 
are strings for other good reasons."  --Aahz

From barry at python.org  Mon Apr  6 15:07:21 2009
From: barry at python.org (Barry Warsaw)
Date: Mon, 6 Apr 2009 09:07:21 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D9EDCB.7010905@voidspace.org.uk>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<49D8F88B.3050102@v.loewis.de>	<49D8FBCC.1050801@ochtman.nl>	<49D9BB90.8040008@freehackers.org>	<ea2499da0904060133k5a4d0572vd6aea50f3f5f5cff@mail.gmail.com>	<49D9C7FF.80506@freehackers.org>	<ea2499da0904060241y45c299efx478f1c7af14cd09b@mail.gmail.com>
	<49D9EBB6.1080004@gmail.com> <49D9EDCB.7010905@voidspace.org.uk>
Message-ID: <462BFB67-C648-42DB-91BC-E9610DABC8D4@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Apr 6, 2009, at 7:55 AM, Michael Foord wrote:

> Gated checkins can work fine but can also have many problems. For  
> example if we have a spuriously failing test then if you are working  
> on an unrelated issue it will be entirely up to chance as to whether  
> you can checkin...
>
> Building the docs would be another thing we could check, although it  
> can take a while.
>
> If we have a queue then it could be the case that you do a commit -  
> and then discover half an hour later that it conflicts with  
> something that was ahead of you in the queue.

All very true.  Where I've worked with gated branches, there are  
procedures for dealing with each of these issues.  For a test suite  
like Python's which runs in a few minutes, I don't think some of the  
more extreme approaches are necessary (as opposed to a system where a  
full test run takes *hours*).  On the whole though, it's a net win  
because you know the main tree is always good.  This is especially  
useful around release time!  But I guess it's up to Benjamin now to  
push for that :).

Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSdn+iXEjvBPtnXfVAQIEKAP/b3RcUIxcxOpTGfk8POAj+oQXvcvIpI+H
6sN2CWss7bt9qLVlJMFCJoEH78JKnydHuGy+JmZf2rMtnfwIr0w7EFSMoT8X7tPg
YflsHn3ePrBddqD9EOwXo+hQfgodSKHEyPHDPgYSMUtiR4TTqkVXD/o4ViQk4K1b
YFtRkehHKfc=
=F39k
-----END PGP SIGNATURE-----

From ben+python at benfinney.id.au  Mon Apr  6 15:15:09 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Mon, 06 Apr 2009 23:15:09 +1000
Subject: [Python-Dev] Mercurial?
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de>
	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
	<49D8EC71.5020105@v.loewis.de>
	<acd65fa20904052106r3dec8a22ve02aadf119d655ab@mail.gmail.com>
	<20090406042016.GA97@panix.com>
	<acd65fa20904052131q2575299bi5c81659f62c3dabf@mail.gmail.com>
	<49D9E92E.4010603@gmail.com>
Message-ID: <87ab6tappe.fsf@benfinney.id.au>

Nick Coghlan <ncoghlan at gmail.com> writes:

> My guess was that Bazaar anchored the "centralised" end of the DVCS
> scale by letting users avoid caring about the underlying acyclic
> graph
[...]

> That makes Bazaar easy to pitch conceptually to someone like me
> ("you can use it just like you use SVN, only with much better
> merging and offline support")
[...]

> Mercurial appears to best allow the sales pitch to be tailored to
> the target audience (in this case, a group including a lot of people
> with a background predominantly involving centralised version
> control tools).

I don't follow. Wouldn't your preceding points above instead make
*Bazaar* the one best suited for a group including a lot of people
with a background predominantly involving centralised version control
tools?

-- 
 \       "I disapprove of what you say, but I will defend to the death |
  `\     your right to say it." -Evelyn Beatrice Hall, _The Friends of |
_o__)                                                  Voltaire_, 1906 |
Ben Finney

From jnoller at gmail.com  Mon Apr  6 15:21:06 2009
From: jnoller at gmail.com (Jesse Noller)
Date: Mon, 6 Apr 2009 09:21:06 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D52115.6020001@egenix.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
Message-ID: <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>

On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2009-04-02 17:32, Martin v. Löwis wrote:
>> I propose the following PEP for inclusion to Python 3.1.
>
> Thanks for picking this up.
>
> I'd like to extend the proposal to Python 2.7 and later.
>

-1 to adding it to the 2.x series. There was much discussion around
adding features to 2.x *and* 3.0, and the consensus seemed to *not*
add new features to 2.x and use those new features as carrots to help
lead people into 3.0.

jesse

From barry at python.org  Mon Apr  6 15:26:24 2009
From: barry at python.org (Barry Warsaw)
Date: Mon, 6 Apr 2009 09:26:24 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>
Message-ID: <FBE37C05-0934-49F9-B1BD-2E5A31D0FEB6@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Apr 6, 2009, at 9:21 AM, Jesse Noller wrote:

> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 2009-04-02 17:32, Martin v. Löwis wrote:
>>> I propose the following PEP for inclusion to Python 3.1.
>>
>> Thanks for picking this up.
>>
>> I'd like to extend the proposal to Python 2.7 and later.
>>
>
> -1 to adding it to the 2.x series. There was much discussion around
> adding features to 2.x *and* 3.0, and the consensus seemed to *not*
> add new features to 2.x and use those new features as carrots to help
> lead people into 3.0.

Actually, isn't the policy just that nothing can go into 2.7 that  
isn't backported from 3.1?  Whether the actual backport happens or not  
is up to the developer though.  OTOH, we talked about a lot of things  
and my recollection is probably fuzzy.

Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSdoDAXEjvBPtnXfVAQIrPgQAse7BXQfPYHJJ/g3HNEtc0UmZZ9MCNtGc
sIoZ2EHRVz+pylZT9fmSmorJdIdFvAj7E43tKsV2bQpo/am9XlL10SMn3k0KLxnF
vNCi39nB1B7Uktbnrlpnfo4u93suuEqYexEwrkDhJuTMeye0Cxg0os5aysryuPza
mKr5jsqkV5c=
=Y9iP
-----END PGP SIGNATURE-----

From barry at python.org  Mon Apr  6 15:34:09 2009
From: barry at python.org (Barry Warsaw)
Date: Mon, 6 Apr 2009 09:34:09 -0400
Subject: [Python-Dev] Tools
In-Reply-To: <49D9409C.6060108@ubuntu.com>
References: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org>
	<49D9409C.6060108@ubuntu.com>
Message-ID: <C028229C-52AD-4331-9F47-EC341F18C850@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Apr 5, 2009, at 7:37 PM, Matthias Klose wrote:

> Barry Warsaw schrieb:
>> Someone (I'm sorry, I forgot who) asked me at Pycon about stripping  
>> out
>> Demos and Tools.  I'm happy to remove the two I wrote - Tools/world  
>> and
>> Tools/pynche - from the distribution and release them as separate
>> projects (retaining the PSF license).   Should I remove them from  
>> both
>> the Python 2.x and 3.x trunks?
>
> +1, but please for 2.7 and 3.1 only.

Yes, of course.
Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSdoE0XEjvBPtnXfVAQIyFgP+MqBghtSqVigJF9w/u47npaheOusITPWT
iUeeJfTFDDHBKyYKXOwpASW+SahtnTO3OTR3f40S0Ptf+HRGo0J2efWUWcbXkN5X
ikrHePT8YIp0MC4qYcUAfNrSNtgYxJuVKd7ARCFotBSN3Nu+bxzPO+LGw5xhlvbT
Q3H3f3TQM3A=
=nCUB
-----END PGP SIGNATURE-----

From cesare.dimauro at a-tono.com  Mon Apr  6 16:28:45 2009
From: cesare.dimauro at a-tono.com (Cesare Di Mauro)
Date: Mon, 06 Apr 2009 16:28:45 +0200
Subject: [Python-Dev] pyc files,
 constant folding and borderline portability issues
In-Reply-To: <ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com>
References: <loom.20090329T143625-223@post.gmane.org>
	<ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com>
Message-ID: <op.uryyh7uh03jqhe@cesareprova.org>

On Mar 29, 2009 at 05:36PM, Guido van Rossum <guido at python.org> wrote:

>> - Issue #5593: code like 1e16+2.9999 is optimized away and its result stored as
>> a constant (again), but the result can vary slightly depending on the internal
>> FPU precision.
>
> I would just not bother constant folding involving FP, or only if the
> values involved have an exact representation in IEEE binary FP format.

The Language Reference says nothing about the effects of code optimizations.
I think that's a very good thing, because it leaves us room to do some work
here with constant folding.

If someone wants to preserve precision with floats, they can always use a
temporary variable, as in many other languages.

>> These problems have probably been there for a long time and almost no one seems
>> to complain, but I thought I'd report them here just in case.
>
> I would expect that constant folding isn't nearly as effective in Python
> as in other (less dynamic) languages because it doesn't do anything
> for NAMED constants. E.g.
>
> MINUTE = 60
>
> def half_hour():
>     return MINUTE*30
>
> This should be folded to "return 1800" but doesn't because the
> compiler doesn't know that MINUTE is a constant.

I completely agree. We can't say anything about MINUTE at the time half_hour
is executed, so the code here must never be changed.
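
A small sketch of why the name can't be folded, assuming the code runs at
module level: rebinding MINUTE after half_hour has been compiled changes the
result, so the multiplication has to stay in the bytecode. half_hour_literal
is a name added here for comparison only, and what co_consts shows for it
depends on the interpreter version.

```python
MINUTE = 60  # a module-level name, not a true constant


def half_hour():
    # MINUTE is looked up again on every call
    return MINUTE * 30


def half_hour_literal():
    # Only literals here, so a compiler is free to fold this at compile time
    return 60 * 30


first = half_hour()
MINUTE = 61              # rebinding after the function was compiled
second = half_hour()

print(first, second)
print(half_hour.__code__.co_names)           # names resolved at run time
print(half_hour_literal.__code__.co_consts)  # shows whether the product was folded
```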

> Has anyone ever profiled the effectiveness of constant folding on
> real-world code? The only kind of constant folding that I expect to be
> making a difference is things like unary operators, since e.g. "x = -2"
> is technically an expression involving a unary minus.

At this time, with Python 2.6.1, we get these results:

from dis import dis

def f(): return 1 + 2 * 3 + 4j
dis(f)

  1           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               5 (6)
              6 BINARY_ADD
              7 LOAD_CONST               4 (4j)
             10 BINARY_ADD
             11 RETURN_VALUE

def f(): return ['a', ('b', 'c')] * (1 + 2 * 3)
dis(f)

  1           0 LOAD_CONST               1 ('a')
              3 LOAD_CONST               7 (('b', 'c'))
              6 BUILD_LIST               2
              9 LOAD_CONST               4 (1)
             12 LOAD_CONST               8 (6)
             15 BINARY_ADD
             16 BINARY_MULTIPLY
             17 RETURN_VALUE

With proper constant folding code, both functions can be reduced
to a single LOAD_CONST and a RETURN_VALUE (or even to a single
instruction with an advanced peephole optimizer).

I'll show it to you at PyCon in Florence, next month.

> ISTM that historically, almost every time we attempted some new form
> of constant folding, we introduced a bug.

Python comes with a very rich test battery, which helped me a lot in my
work of changing the AST, compiler, peephole optimizer, and VM.
If it isn't enough, we can expand it with more test cases.

But, again, the Language Reference says nothing about optimizations.

Cheers,
Cesare

From eric at trueblade.com  Mon Apr  6 16:40:56 2009
From: eric at trueblade.com (Eric Smith)
Date: Mon, 6 Apr 2009 10:40:56 -0400 (EDT)
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <FBE37C05-0934-49F9-B1BD-2E5A31D0FEB6@python.org>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>
	<FBE37C05-0934-49F9-B1BD-2E5A31D0FEB6@python.org>
Message-ID: <39936.63.251.87.214.1239028856.squirrel@mail.trueblade.com>

> On Apr 6, 2009, at 9:21 AM, Jesse Noller wrote:
>
>> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> On 2009-04-02 17:32, Martin v. Löwis wrote:
>>>> I propose the following PEP for inclusion to Python 3.1.
>>>
>>> Thanks for picking this up.
>>>
>>> I'd like to extend the proposal to Python 2.7 and later.
>>>
>>
>> -1 to adding it to the 2.x series. There was much discussion around
>> adding features to 2.x *and* 3.0, and the consensus seemed to *not*
>> add new features to 2.x and use those new features as carrots to help
>> lead people into 3.0.
>
> Actually, isn't the policy just that nothing can go into 2.7 that
> isn't backported from 3.1?  Whether the actual backport happens or not
> is up to the developer though.  OTOH, we talked about a lot of things
> and my recollection is probably fuzzy.

I believe Barry is correct. The official policy is "no features in 2.7
that aren't also in 3.1". I personally think I'm not going to put anything
else in 2.7, specifically the ',' formatter stuff from PEP 378. 3.1 has
diverged too far from 2.7 in this regard to make the backport easy to do.
But this decision is left up to the individual committer.

From solipsis at pitrou.net  Mon Apr  6 16:43:11 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 6 Apr 2009 14:43:11 +0000 (UTC)
Subject: [Python-Dev] pyc files,
	constant folding and borderline portability issues
References: <loom.20090329T143625-223@post.gmane.org>
	<ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com>
	<op.uryyh7uh03jqhe@cesareprova.org>
Message-ID: <loom.20090406T144208-541@post.gmane.org>

Cesare Di Mauro <cesare.dimauro <at> a-tono.com> writes:
> def f(): return ['a', ('b', 'c')] * (1 + 2 * 3)
[...]
> 
> With proper constant folding code, both functions can be reduced
> to a single LOAD_CONST and a RETURN_VALUE (or, definitely, by
> a single instruction at all with an advanced peephole optimizer).

Lists are mutable, you can't optimize the creation of list literals by storing
them as singleton constants.
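
A quick illustration of the problem, as a sketch (make_list is just a name
picked for the example): every call has to build a fresh list, because a
caller may mutate what it gets back.

```python
def make_list():
    return ['a', ('b', 'c')] * 7


first = make_list()
first.append('mutated')   # the caller is free to change its own list
second = make_list()      # must not see that change
```

If the list were stored once as a constant, the append would leak into every
later call. The inner tuple is immutable, which is why it can be kept as a
constant.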

Regards

Antoine.

From pje at telecommunity.com  Mon Apr  6 17:21:42 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 06 Apr 2009 11:21:42 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D9FCE2.4070805@simplistix.co.uk>
References: <49D4DA72.60401@v.loewis.de> <49D51A16.70804@simplistix.co.uk>
	<49D669AA.6080001@v.loewis.de> <49D9FCE2.4070805@simplistix.co.uk>
Message-ID: <20090406151915.5D4F93A406A@sparrow.telecommunity.com>

At 02:00 PM 4/6/2009 +0100, Chris Withers wrote:
>Martin v. Löwis wrote:
>>Chris Withers wrote:
>>>Would this support the following case:
>>>
>>>I have a package called mortar, which defines useful stuff:
>>>
>>>from mortar import content, ...
>>>
>>>I now want to distribute large optional chunks separately, but ideally
>>>so that the following will work:
>>>
>>>from mortar.rbd import ...
>>>from mortar.zodb import ...
>>>from mortar.wsgi import ...
>>>
>>>Does the PEP support this?
>>That's the primary purpose of the PEP.
>
>Are you sure?
>
>Does the pep really allow for:
>
>from mortar import content
>from mortar.rdb import something
>
>...where 'content' is a function defined in mortar/__init__.py and 
>'something' is a function defined in mortar/rdb/__init__.py *and* 
>the following are separate distributions on PyPI:
>
>- mortar
>- mortar.rdb
>
>...where 'mortar' does not contain 'mortar.rdb'.

See the third paragraph of http://www.python.org/dev/peps/pep-0382/#discussion

From chris at simplistix.co.uk  Mon Apr  6 17:57:59 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Mon, 06 Apr 2009 16:57:59 +0100
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090406151915.5D4F93A406A@sparrow.telecommunity.com>
References: <49D4DA72.60401@v.loewis.de> <49D51A16.70804@simplistix.co.uk>
	<49D669AA.6080001@v.loewis.de> <49D9FCE2.4070805@simplistix.co.uk>
	<20090406151915.5D4F93A406A@sparrow.telecommunity.com>
Message-ID: <49DA2687.6050508@simplistix.co.uk>

P.J. Eby wrote:

> See the third paragraph of 
> http://www.python.org/dev/peps/pep-0382/#discussion

Indeed, I guess the PEP could be made more explanatory then 'cos, as a 
packager, I don't see what I'd put in the various setup.py and 
__init__.py to make this work...
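
For comparison, this is roughly what the existing pkgutil.extend_path idiom
needs today. It is NOT the mechanism the PEP proposes, just a sketch of the
current state of affairs. The two temporary directories stand in for the
separately installed 'mortar' and 'mortar.rdb' distributions, and the write
helper is made up for the example.

```python
import importlib
import os
import sys
import tempfile

# The pkgutil boilerplate each portion's mortar/__init__.py carries.
EXTEND = ("from pkgutil import extend_path\n"
          "__path__ = extend_path(__path__, __name__)\n")


def write(path, text):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


core = tempfile.mkdtemp()    # stands in for the 'mortar' distribution
extra = tempfile.mkdtemp()   # stands in for the 'mortar.rdb' distribution

write(os.path.join(core, "mortar", "__init__.py"),
      EXTEND + "def content():\n    return 'content'\n")
write(os.path.join(extra, "mortar", "__init__.py"), EXTEND)
write(os.path.join(extra, "mortar", "rdb", "__init__.py"),
      "def something():\n    return 'something'\n")

# 'core' has to come first: only the first mortar/__init__.py found is executed.
sys.path[:0] = [core, extra]
importlib.invalidate_caches()

from mortar import content
from mortar.rdb import something
```

The ordering requirement in the last comment is exactly the fragile part:
if the other directory comes first on sys.path, 'content' is never defined.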

That said, I'm delighted to hear it's going to be possible and 
wholeheartedly support the PEP and its backporting to 2.7 as a result...

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From jnoller at gmail.com  Mon Apr  6 18:00:46 2009
From: jnoller at gmail.com (Jesse Noller)
Date: Mon, 6 Apr 2009 12:00:46 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <FBE37C05-0934-49F9-B1BD-2E5A31D0FEB6@python.org>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>
	<FBE37C05-0934-49F9-B1BD-2E5A31D0FEB6@python.org>
Message-ID: <4222a8490904060900s349180a5k952b32b35274df73@mail.gmail.com>

On Mon, Apr 6, 2009 at 9:26 AM, Barry Warsaw <barry at python.org> wrote:
> On Apr 6, 2009, at 9:21 AM, Jesse Noller wrote:
>
>> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>
>>> On 2009-04-02 17:32, Martin v. Löwis wrote:
>>>>
>>>> I propose the following PEP for inclusion to Python 3.1.
>>>
>>> Thanks for picking this up.
>>>
>>> I'd like to extend the proposal to Python 2.7 and later.
>>>
>>
>> -1 to adding it to the 2.x series. There was much discussion around
>> adding features to 2.x *and* 3.0, and the consensus seemed to *not*
>> add new features to 2.x and use those new features as carrots to help
>> lead people into 3.0.
>
> Actually, isn't the policy just that nothing can go into 2.7 that isn't
> backported from 3.1?  Whether the actual backport happens or not is up to
> the developer though.  OTOH, we talked about a lot of things and my
> recollection is probably fuzzy.
>
> Barry

That *is* the official policy, but there were discussions about doing no
further backporting of features from 3.1 into 2.x, thereby providing
more of an upgrade incentive.

From tseaver at palladion.com  Mon Apr  6 18:15:43 2009
From: tseaver at palladion.com (Tres Seaver)
Date: Mon, 06 Apr 2009 12:15:43 -0400
Subject: [Python-Dev] deprecating BaseException.message
In-Reply-To: <bbaeab100704092133x3067a779h4c33ee08a12d1bbe@mail.gmail.com>
References: <bbaeab100704092133x3067a779h4c33ee08a12d1bbe@mail.gmail.com>
Message-ID: <grd9ru$pn1$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Brett Cannon wrote:
> During the PyCon sprint I tried to make BaseException accept only a single
> argument and bind it to BaseException.message .  I was successful (see the
> p3yk_no_args_on_exc branch), but it was very painful to pull off as anyone
> who sat around me the last three days of the sprint will tell you as they
> had to listen to me curse incessantly.
> 
> Because of the pain that I went through in the transition and thus the
> lessons learned, Guido and I discussed it and we think it would be best to
> give up on forcing BaseException to accept only a single argument.  I think
> it is still doable, but requires a multi-release transition period and not
> the one that 2.6 -> 3.0 is offering.  And so Guido and I plan on deprecating
> BaseException.message as its entire point in existence was to help
> transition to what we are not going to have happen.  =)
> 
> Now that means BaseException.message might hold the record for shortest
> lived feature as it was only introduced in 2.5 and is now to be deprecated
> in 2.6 and removed in 2.7/3.0.  =)
> 
> Below is PEP 352, revised to reflect the removal of
> BaseException.message and for letting the 'args' attribute stay (along
> with suggesting one should only pass a single argument to
> BaseException).  Basically the interface for exceptions doesn't really
> change in 3.0 except for the removal of __getitem__.

Hmm, I'm working on cleaning up deprecations for Zope and related
packages under Python 2.6.  The irony here is that I'm receiving
deprecation warnings for custom exception classes  which had a 'message'
attribute long before the abortive attempt to add them to the
BaseException type, which hardly seems reasonable.

For instance, docutils.parsers.rst defines a DirectiveError which takes
two arguments, 'level' and 'message', and therefore gets hit with the
deprecation (even though it never used the new signature).  Likewise,
ZODB.POSException defines a ConflictError type which takes 'message' as
one of several arguments, all optional, and has since at least 2002.

I don't think either of these classes should be subject to a deprecation
warning for a feature they never used or depended on.
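
As a rough illustration of the pattern described above (a sketch only, not
code from docutils or ZODB; the class below is a hypothetical stand-in), an
exception class can carry its own 'message' attribute without ever relying
on BaseException.message:

```python
import warnings


class DirectiveError(Exception):
    """Hypothetical stand-in modelled on the two-argument example above."""

    def __init__(self, level, message):
        Exception.__init__(self, level, message)
        self.level = level
        # The class's own attribute; it does not depend on the
        # BaseException.message slot that was introduced in 2.5.
        self.message = message


# Record any warnings raised while creating and reading the exception.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    err = DirectiveError(2, "bad directive")
    text = err.message

deprecations = [w for w in caught
                if issubclass(w.category, DeprecationWarning)]
```

On a Python 3 interpreter the deprecations list stays empty; the spurious
warnings discussed here are specific to the 2.6 series.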

Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJ2iqv+gerLs4ltQ4RArg7AJ9cjTweXUuGdUZNxZ3dHzYb9u6AcQCePJW/
PrXQ48wFrwrsrXSslZ0LSB4=
=VU1d
-----END PGP SIGNATURE-----

From rdmurray at bitdance.com  Mon Apr  6 18:28:43 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Mon, 6 Apr 2009 12:28:43 -0400 (EDT)
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <4222a8490904060900s349180a5k952b32b35274df73@mail.gmail.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>
	<FBE37C05-0934-49F9-B1BD-2E5A31D0FEB6@python.org>
	<4222a8490904060900s349180a5k952b32b35274df73@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0904061207310.26362@kimball.webabinitio.net>

On Mon, 6 Apr 2009 at 12:00, Jesse Noller wrote:
> On Mon, Apr 6, 2009 at 9:26 AM, Barry Warsaw <barry at python.org> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On Apr 6, 2009, at 9:21 AM, Jesse Noller wrote:
>>
>>> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>
>>>> On 2009-04-02 17:32, Martin v. Löwis wrote:
>>>>>
>>>>> I propose the following PEP for inclusion to Python 3.1.
>>>>
>>>> Thanks for picking this up.
>>>>
>>>> I'd like to extend the proposal to Python 2.7 and later.
>>>>
>>>
>>> -1 to adding it to the 2.x series. There was much discussion around
>>> adding features to 2.x *and* 3.0, and the consensus seemed to *not*
>>> add new features to 2.x and use those new features as carrots to help
>>> lead people into 3.0.
>>
>> Actually, isn't the policy just that nothing can go into 2.7 that isn't
>> backported from 3.1?  Whether the actual backport happens or not is up to
>> the developer though.  OTOH, we talked about a lot of things and my
>> recollection is probably fuzzy.
>>
>> Barry
>
> That *is* the official policy, but there were discussions around no
> further backporting of features from 3.1 into 2.x, therefore providing
> more of an upgrade incentive.

My sense was that this wasn't proposed as a hard and fast rule, more
as a strongly suggested guideline.

And in this case, I think you could argue that the PEP is actually
fixing a bug in the current namespace packaging system.

Some projects, especially the large ones where this matters most, are
going to have to maintain backward compatibility for 2.x for a long time
even as 3.x adoption accelerates.  It seems a shame to require packagers
to continue to deal with the problems caused by the current system even
after all the platforms have made it to 2.7+.

--David

From jnoller at gmail.com  Mon Apr  6 18:33:54 2009
From: jnoller at gmail.com (Jesse Noller)
Date: Mon, 6 Apr 2009 12:33:54 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <Pine.LNX.4.64.0904061207310.26362@kimball.webabinitio.net>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>
	<FBE37C05-0934-49F9-B1BD-2E5A31D0FEB6@python.org>
	<4222a8490904060900s349180a5k952b32b35274df73@mail.gmail.com>
	<Pine.LNX.4.64.0904061207310.26362@kimball.webabinitio.net>
Message-ID: <4222a8490904060933y540fd611lc2b9c554eb079c5e@mail.gmail.com>

On Mon, Apr 6, 2009 at 12:28 PM, R. David Murray <rdmurray at bitdance.com> wrote:
> On Mon, 6 Apr 2009 at 12:00, Jesse Noller wrote:
>>
>> On Mon, Apr 6, 2009 at 9:26 AM, Barry Warsaw <barry at python.org> wrote:
>>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> On Apr 6, 2009, at 9:21 AM, Jesse Noller wrote:
>>>
>>>> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>>
>>>>> On 2009-04-02 17:32, Martin v. Löwis wrote:
>>>>>>
>>>>>> I propose the following PEP for inclusion to Python 3.1.
>>>>>
>>>>> Thanks for picking this up.
>>>>>
>>>>> I'd like to extend the proposal to Python 2.7 and later.
>>>>>
>>>>
>>>> -1 to adding it to the 2.x series. There was much discussion around
>>>> adding features to 2.x *and* 3.0, and the consensus seemed to *not*
>>>> add new features to 2.x and use those new features as carrots to help
>>>> lead people into 3.0.
>>>
>>> Actually, isn't the policy just that nothing can go into 2.7 that isn't
>>> backported from 3.1?  Whether the actual backport happens or not is up to
>>> the developer though.  OTOH, we talked about a lot of things and my
>>> recollection is probably fuzzy.
>>>
>>> Barry
>>
>> That *is* the official policy, but there were discussions around no
>> further backporting of features from 3.1 into 2.x, therefore providing
>> more of an upgrade incentive.
>
> My sense was that this wasn't proposed as a hard and fast rule, more
> as a strongly suggested guideline.
>
> And in this case, I think you could argue that the PEP is actually
> fixing a bug in the current namespace packaging system.
>
> Some projects, especially the large ones where this matters most, are
> going to have to maintain backward compatibility for 2.x for a long time
> even as 3.x adoption accelerates.  It seems a shame to require packagers
> to continue to deal with the problems caused by the current system even
> after all the platforms have made it to 2.7+.
>
> --David

I know it wasn't a hard and fast rule; also, with 3to2 already being
worked on, the barrier of maintenance and back porting is going to be
lowered.

From skip at pobox.com  Mon Apr  6 18:57:44 2009
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 6 Apr 2009 11:57:44 -0500
Subject: [Python-Dev] pyc files,
 constant folding and borderline portability issues
In-Reply-To: <op.uryyh7uh03jqhe@cesareprova.org>
References: <loom.20090329T143625-223@post.gmane.org>
	<ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com>
	<op.uryyh7uh03jqhe@cesareprova.org>
Message-ID: <18906.13448.974602.214940@montanaro.dyndns.org>

    Cesare> At this time with Python 2.6.1 we have these results:
    Cesare> def f(): return 1 + 2 * 3 + 4j
    ...
    Cesare> def f(): return ['a', ('b', 'c')] * (1 + 2 * 3)

Guido can certainly correct me if I'm wrong, but I believe the main point of
his message was that you aren't going to encounter a lot of code in Python
which is amenable to traditional constant folding.  For the most part, they
will be assigned to symbolic "constants", which, unlike C preprocessor
macros aren't really constants at all.  Consequently, the opportunity for
constant folding is minimal and probably introduces more opportunities for
bugs than performance improvements.

Skip

From cesare.dimauro at a-tono.com  Mon Apr  6 18:34:53 2009
From: cesare.dimauro at a-tono.com (Cesare Di Mauro)
Date: Mon, 6 Apr 2009 18:34:53 +0200 (CEST)
Subject: [Python-Dev] pyc files,
 constant folding and borderline portability issues
In-Reply-To: <loom.20090406T144208-541@post.gmane.org>
References: <loom.20090329T143625-223@post.gmane.org>
	<ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com>
	<op.uryyh7uh03jqhe@cesareprova.org>
	<loom.20090406T144208-541@post.gmane.org>
Message-ID: <58342.151.53.159.5.1239035693.squirrel@webmail6.pair.com>

On Mon, Apr 6, 2009 16:43, Antoine Pitrou wrote:
> Cesare Di Mauro <cesare.dimauro <at> a-tono.com> writes:
>> def f(): return ['a', ('b', 'c')] * (1 + 2 * 3)
> [...]
>>
>> With proper constant folding code, both functions can be reduced
>> to a single LOAD_CONST and a RETURN_VALUE (or, definitely, by
>> a single instruction at all with an advanced peephole optimizer).
>
> Lists are mutable, you can't optimize the creation of list literals by
> storing them as singleton constants.
>
> Regards
>
> Antoine.

You are right, I've mistyped the example.

def f(): return ('a', ('b', 'c')) * (1 + 2 * 3)

generates a single instruction (depending on the threshold used to limit
folding of sequences), whereas

def f(): return ['a', ('b', 'c')] * (1 + 2 * 3)

needs three.
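
One way to check what a given interpreter actually does with these two
functions is to disassemble them. The following is a minimal sketch using
the standard dis module; the helper name opnames is made up for this
example, and the opcodes produced differ between CPython versions, so no
particular output is assumed:

```python
import dis


def tuple_version():
    # Every operand is immutable, so the whole expression may be folded.
    return ('a', ('b', 'c')) * (1 + 2 * 3)


def list_version():
    # The list literal is mutable and must be rebuilt on each call.
    return ['a', ('b', 'c')] * (1 + 2 * 3)


def opnames(func):
    """Return the opcode names of func, in order (illustrative helper)."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    for func in (tuple_version, list_version):
        print(func.__name__, opnames(func))
```

Whatever the bytecode looks like, the two calls must keep returning fresh,
equal-length results; only the amount of work done per call may differ.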

Sorry for the mistake.

Cheers,
Cesare

From tseaver at palladion.com  Mon Apr  6 19:06:25 2009
From: tseaver at palladion.com (Tres Seaver)
Date: Mon, 06 Apr 2009 13:06:25 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <4222a8490904060933y540fd611lc2b9c554eb079c5e@mail.gmail.com>
References: <49D4DA72.60401@v.loewis.de>
	<49D52115.6020001@egenix.com>	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>	<FBE37C05-0934-49F9-B1BD-2E5A31D0FEB6@python.org>	<4222a8490904060900s349180a5k952b32b35274df73@mail.gmail.com>	<Pine.LNX.4.64.0904061207310.26362@kimball.webabinitio.net>
	<4222a8490904060933y540fd611lc2b9c554eb079c5e@mail.gmail.com>
Message-ID: <grdcqv$vo$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jesse Noller wrote:
> On Mon, Apr 6, 2009 at 12:28 PM, R. David Murray <rdmurray at bitdance.com> wrote:
>> On Mon, 6 Apr 2009 at 12:00, Jesse Noller wrote:
>>> On Mon, Apr 6, 2009 at 9:26 AM, Barry Warsaw <barry at python.org> wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> On Apr 6, 2009, at 9:21 AM, Jesse Noller wrote:
>>>>
>>>>> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>>> On 2009-04-02 17:32, Martin v. Löwis wrote:
>>>>>>> I propose the following PEP for inclusion to Python 3.1.
>>>>>> Thanks for picking this up.
>>>>>>
>>>>>> I'd like to extend the proposal to Python 2.7 and later.
>>>>>>
>>>>> -1 to adding it to the 2.x series. There was much discussion around
>>>>> adding features to 2.x *and* 3.0, and the consensus seemed to *not*
>>>>> add new features to 2.x and use those new features as carrots to help
>>>>> lead people into 3.0.
>>>> Actually, isn't the policy just that nothing can go into 2.7 that isn't
>>>> backported from 3.1?  Whether the actual backport happens or not is up to
>>>> the developer though.  OTOH, we talked about a lot of things and my
>>>> recollection is probably fuzzy.
>>>>
>>>> Barry
>>> That *is* the official policy, but there were discussions around no
>>> further backporting of features from 3.1 into 2.x, therefore providing
>>> more of an upgrade incentive.
>> My sense was that this wasn't proposed as a hard and fast rule, more
>> as a strongly suggested guideline.
>>
>> And in this case, I think you could argue that the PEP is actually
>> fixing a bug in the current namespace packaging system.
>>
>> Some projects, especially the large ones where this matters most, are
>> going to have to maintain backward compatibility for 2.x for a long time
>> even as 3.x adoption accelerates.  It seems a shame to require packagers
>> to continue to deal with the problems caused by the current system even
>> after all the platforms have made it to 2.7+.
>>
>> --David
> 
> I know it wasn't a hard and fast rule; also, with 3to2 already being
> worked on, the barrier of maintenance and back porting is going to be
> lowered.

My understanding from the summit is that the only point in a 2.7 release
at all is to lower the "speed bumps" which make porting from 2.x to 3.x
hard for large codebases.  In this case, having a consistent spelling
for namespace packages between 2.7 and 3.1 would incent those
applications / frameworks / libraries to move to 2.7, and therefore ease
getting them to 3.1.

Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJ2jaR+gerLs4ltQ4RAsi1AJ0cJyKsoP5SlOcBlnzLr6MB11ZoNwCg1Kil
4O2M0sZG+jH12s22p2AmXWk=
=DLRM
-----END PGP SIGNATURE-----

From brian at sweetapp.com  Mon Apr  6 20:13:28 2009
From: brian at sweetapp.com (Brian Quinlan)
Date: Mon, 06 Apr 2009 19:13:28 +0100
Subject: [Python-Dev] Possible py3k io wierdness
In-Reply-To: <49D9E3E0.2060408@gmail.com>
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>	<loom.20090404T231154-979@post.gmane.org>	<49D874E4.6030602@sweetapp.com>	<loom.20090405T102812-215@post.gmane.org>	<3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net>	<49D9A669.9010008@sweetapp.com>
	<49D9E3E0.2060408@gmail.com>
Message-ID: <49DA4648.9070204@sweetapp.com>

Nick Coghlan wrote:
> Brian Quinlan wrote:
>> - you need the cooperation of your subclasses i.e. they must call
>>   super().flush() in .flush() to get correct close behavior (and this
>>   represents a backwards-incompatible semantic change)
> 
> Are you sure about that? Going by the current _pyio semantics that
> Antoine posted, it looks to me that it is already the case that
> subclasses need to invoke the parent flush() call correctly to avoid
> breaking the base class semantics (which really isn't an uncommon
> problem when it comes to writing correct subclasses).

As it is now, if you didn't call super().flush() in your flush override, 
then a buffer won't be flushed at the time that you expected.

With the proposed change, if you don't call super().flush() in your 
flush override, then the buffer will never get flushed and you will lose 
data when you close the file.

I'm not saying that it is a big deal, but it is a difference.
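
The difference can be sketched with a toy model. This is not the io or
_pyio implementation; the class names and the close_uses_public_flush
switch are invented purely to contrast the two behaviours described above:

```python
class ToyBuffered:
    """Toy buffered writer; 'written' stands in for the underlying file."""

    def __init__(self, close_uses_public_flush):
        self.pending = []
        self.written = []
        self.close_uses_public_flush = close_uses_public_flush

    def write(self, data):
        self.pending.append(data)

    def _drain(self):
        # Private step that actually moves buffered data to the "file".
        self.written.extend(self.pending)
        self.pending = []

    def flush(self):
        self._drain()

    def close(self):
        if self.close_uses_public_flush:
            self.flush()   # close depends on the overridable flush()
        else:
            self._drain()  # close drains the buffer regardless of overrides


class ForgetfulFlush(ToyBuffered):
    def flush(self):
        pass  # override that never calls super().flush()


def written_after_close(close_uses_public_flush):
    f = ForgetfulFlush(close_uses_public_flush)
    f.write("data")
    f.flush()  # has no effect in this subclass
    f.close()
    return f.written
```

With close_uses_public_flush set to False the data is merely written later
than expected (at close); with it set to True the data is never written.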

Cheers,
Brian

From cesare.dimauro at a-tono.com  Mon Apr  6 21:23:18 2009
From: cesare.dimauro at a-tono.com (Cesare Di Mauro)
Date: Mon, 6 Apr 2009 21:23:18 +0200 (CEST)
Subject: [Python-Dev] pyc files,
 constant folding and borderline portability issues
In-Reply-To: <18906.13448.974602.214940@montanaro.dyndns.org>
References: <loom.20090329T143625-223@post.gmane.org>
	<ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com>
	<op.uryyh7uh03jqhe@cesareprova.org>
	<18906.13448.974602.214940@montanaro.dyndns.org>
Message-ID: <52217.151.53.159.5.1239045798.squirrel@webmail6.pair.com>

On Mon, Apr 6, 2009 18:57, skip at pobox.com wrote:
>
>     Cesare> At this time with Python 2.6.1 we have these results:
>     Cesare> def f(): return 1 + 2 * 3 + 4j
>     ...
>     Cesare> def f(): return ['a', ('b', 'c')] * (1 + 2 * 3)
>
> Guido can certainly correct me if I'm wrong, but I believe the main point
> of his message was that you aren't going to encounter a lot of code in
> Python which is amenable to traditional constant folding.  For the most
> part, they will be assigned to symbolic "constants", which, unlike C
> preprocessor macros aren't really constants at all.  Consequently, the
> opportunity for constant folding is minimal and probably introduces more
> opportunities for bugs than performance improvements.
>
> Skip

I can understand Guido's concern, but you worked as well on constant
folding, and you know that there's space for optimizations here.

peephole.c has some code for unary, binary, and tuple/list folding, and it
has worked fine. Why maintain useless and dangerous code otherwise?

I know that bugs can come out of doing such optimizations, but Python has a
good battery of tests that can help find them. Obviously tests can't give us
100% assurance that everything works as expected, but they are a very good
starting point.

Bugs can happen with every change to the code base, but the code base
changes anyway...

Cesare

From dickinsm at gmail.com  Mon Apr  6 21:30:57 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Mon, 6 Apr 2009 20:30:57 +0100
Subject: [Python-Dev] pyc files,
	constant folding and borderline 	portability issues
In-Reply-To: <ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com>
References: <loom.20090329T143625-223@post.gmane.org>
	<ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com>
Message-ID: <5c6f2a5d0904061230j6ef7fd18q369c19d91c5e34b8@mail.gmail.com>

[Antoine]
> - Issue #5593: code like 1e16+2.9999 is optimized away and its result stored as
> a constant (again), but the result can vary slightly depending on the internal
> FPU precision.
[Guido]
> I would just not bother constant folding involving FP, or only if the
> values involved have an exact representation in IEEE binary FP format.

+1 for removing constant folding for floats (besides conversion
of -<literal>).  There are just too many things to worry about:
FPU rounding mode and precision, floating-point signals and flags,
effect of compiler flags, and the potential benefit seems small.
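
The precision issue in #5593 can be reproduced with exact rational
arithmetic. The sketch below assumes IEEE 754 doubles (53-bit significand)
and an x87-style extended format (64-bit significand), both rounding to
nearest with ties to even; round_to_grid is an illustrative helper, not an
existing API:

```python
from fractions import Fraction


def round_to_grid(value, step):
    """Round an exact rational to the nearest multiple of step (ties to even)."""
    return round(value / step) * step


# Exact sum of the two operands: 1e16 is exactly representable, and
# Fraction(2.9999) is the exact value of the double nearest to 2.9999.
exact = Fraction(10**16) + Fraction(2.9999)

# Near 1e16, adjacent doubles are 2 apart, while values with a 64-bit
# significand are 2**-10 apart.
DOUBLE_STEP = Fraction(2)
EXTENDED_STEP = Fraction(1, 1024)

# One rounding straight to double precision.
rounded_once = round_to_grid(exact, DOUBLE_STEP)

# Rounding to extended precision first, then to double (double rounding).
rounded_twice = round_to_grid(round_to_grid(exact, EXTENDED_STEP), DOUBLE_STEP)
```

Here rounded_once is 1e16 + 2 while rounded_twice is 1e16 + 4: the
intermediate rounding lands exactly halfway between two doubles, and the
tie then resolves upwards.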

Mark

From python at rcn.com  Mon Apr  6 22:05:37 2009
From: python at rcn.com (Raymond Hettinger)
Date: Mon, 6 Apr 2009 13:05:37 -0700
Subject: [Python-Dev] pyc files,
	constant folding and borderline 	portability issues
References: <loom.20090329T143625-223@post.gmane.org><ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com>
	<5c6f2a5d0904061230j6ef7fd18q369c19d91c5e34b8@mail.gmail.com>
Message-ID: <B42D463B6C7148F6BFF0F722361EB031@RaymondLaptop1>

> +1 for removing constant folding for floats (besides conversion
> of -<literal>).  There are just too many things to worry about:
> FPU rounding mode and precision, floating-point signals and flags,
> effect of compiler flags, and the potential benefit seems small.

If you're talking about the existing peepholer optimization that
has been in-place for years, I think it would be better to leave
it as-is.  It's better to have the compiler do the work than to
have a programmer thinking he/she needs to do it by hand
(reducing readability by introducing magic numbers).
The code for the lsum() recipe is more readable with a line like:

   exp = long(mant * 2.0 ** 53)

than with

   exp = long(mant * 9007199254740992.0)

It would be a shame if code written like the former suddenly
started doing the exponentiation in the inner loop or if the code
got rewritten by hand as shown.

The list of "things to worry about" seems like the normal list
of issues associated with doing anything in floating point.
Python is already FPU challenged in that it offers nearly
zero control over the FPU or direct access to signals and flags.
Every step of a floating point calculation in Python gets written-out 
to a PyFloat object and is squeezed back into a C double (potentially
introducing double-rounding if extended precision had been used by
the FPU).  Disabling the peepholer doesn't change this situation.

Raymond

From ondrej at certik.cz  Mon Apr  6 22:06:06 2009
From: ondrej at certik.cz (Ondrej Certik)
Date: Mon, 6 Apr 2009 13:06:06 -0700
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
Message-ID: <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>

Hi,

On Sun, Mar 29, 2009 at 10:21 AM, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> I've heard some good things about cmake -- LLVM, googletest, and Boost
> are all looking at switching to it -- so I wanted to see if we could
> simplify our autoconf+makefile system by using it. The biggest wins I
> see from going to cmake are:
>  1. It can autogenerate the Visual Studio project files instead of
> needing them to be maintained separately
>  2. It lets you write functions and modules without understanding
> autoconf's mix of shell and M4.
>  3. Its generated Makefiles track header dependencies accurately so we
> might be able to add private headers efficiently.

I am switching to cmake with all my python projects, as it is rock
solid, supports building in parallel (if I have some C++ and Cython
extensions), and the configure part works well.

The only disadvantage that I can see is that one has to learn a new
syntax, which is not Python. But on the other hand, at least it forces
one to really just use cmake to write build scripts in a standard way,
while scons and other Python solutions imho encourage writing full
Python programs, which is a disadvantage for a build system, as
then every build system is nonstandard.

Ondrej

From dickinsm at gmail.com  Mon Apr  6 22:22:28 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Mon, 6 Apr 2009 21:22:28 +0100
Subject: [Python-Dev] pyc files,
	constant folding and borderline 	portability issues
In-Reply-To: <B42D463B6C7148F6BFF0F722361EB031@RaymondLaptop1>
References: <loom.20090329T143625-223@post.gmane.org>
	<ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com>
	<5c6f2a5d0904061230j6ef7fd18q369c19d91c5e34b8@mail.gmail.com>
	<B42D463B6C7148F6BFF0F722361EB031@RaymondLaptop1>
Message-ID: <5c6f2a5d0904061322t2c7f6bd7y55f73ced221c8804@mail.gmail.com>

On Mon, Apr 6, 2009 at 9:05 PM, Raymond Hettinger <python at rcn.com> wrote:
> The code for the lsum() recipe is more readable with a line like:
>
>   exp = long(mant * 2.0 ** 53)
>
> than with
>
>   exp = long(mant * 9007199254740992.0)
>
> It would be a shame if code written like the former suddenly
> started doing the exponentiation in the inner loop or if the code
> got rewritten by hand as shown.

Well, I'd say that the obvious solution here is to compute
the constant 2.0**53 just once, somewhere outside the
inner loop.  In any case, that value would probably be better
written as 2.0**DBL_MANT_DIG (or something similar).
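
A minimal sketch of that suggestion, assuming sys.float_info.mant_dig as the
Python-level spelling of DBL_MANT_DIG; the names TWO_TO_MANT_DIG and
integer_mantissas are invented for this example and are not taken from the
lsum() recipe:

```python
import math
import sys

# Computed once at import time, outside any inner loop.
TWO_TO_MANT_DIG = 2.0 ** sys.float_info.mant_dig


def integer_mantissas(values):
    """Split each float into an (integer mantissa, exponent) pair."""
    result = []
    for x in values:
        mant, exp = math.frexp(x)
        # Scaling by 2**mant_dig turns the fractional mantissa into an
        # exact integer; the exponent is adjusted to compensate.
        result.append((int(mant * TWO_TO_MANT_DIG),
                       exp - sys.float_info.mant_dig))
    return result
```

Each pair (m, e) satisfies math.ldexp(m, e) == x, so no information is lost
and no folding by the compiler is needed for the loop to stay cheap.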

As Antoine reported, the constant-folding caused quite
a confusing bug report (issue #5593):  the problem (when
we eventually tracked it down) was that the folded
constant was in a .pyc file, and so wasn't updated when
the compiler flags changed.

Mark

From jackdied at gmail.com  Mon Apr  6 22:32:05 2009
From: jackdied at gmail.com (Jack diederich)
Date: Mon, 6 Apr 2009 16:32:05 -0400
Subject: [Python-Dev] Getting information out of the buildbots
Message-ID: <b8e622740904061332q783e12b7u61d96418cddbbf50@mail.gmail.com>

I committed some new telnetlib tests yesterday to the trunk and I can
see they are failing on Neal's setup but not what the failures are.
Ideally I'd like to get the information out of the buildbots but they
all seem to be hanging on stdio tests and quitting out.

Ideas?  TIA,

-Jack

From solipsis at pitrou.net  Mon Apr  6 22:35:36 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 6 Apr 2009 20:35:36 +0000 (UTC)
Subject: [Python-Dev] Getting information out of the buildbots
References: <b8e622740904061332q783e12b7u61d96418cddbbf50@mail.gmail.com>
Message-ID: <loom.20090406T203439-796@post.gmane.org>

Jack diederich <jackdied <at> gmail.com> writes:
> 
> I committed some new telnetlib tests yesterday to the trunk and I can
> see they are failing on Neal's setup but not what the failures are.
> Ideally I'd like to get the information out of the buildbots but they
> all seem to be hanging on stdio tests and quitting out.

You can commit some temporary debug output in the tests (just sprinkle those
print()'s you need to get your tasty information).

Regards

Antoine.

From guido at python.org  Mon Apr  6 23:27:29 2009
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Apr 2009 14:27:29 -0700
Subject: [Python-Dev] pyc files,
	constant folding and borderline 	portability issues
In-Reply-To: <op.uryyh7uh03jqhe@cesareprova.org>
References: <loom.20090329T143625-223@post.gmane.org>
	<ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com> 
	<op.uryyh7uh03jqhe@cesareprova.org>
Message-ID: <ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com>

On Mon, Apr 6, 2009 at 7:28 AM, Cesare Di Mauro
<cesare.dimauro at a-tono.com> wrote:
> The Language Reference says nothing about the effects of code optimizations.
> I think it's a very good thing, because we can do some work here with constant
> folding.

Unfortunately the language reference is not the only thing we have to
worry about. Unlike languages like C++, where compiler writers have
the moral right to modify the compiler as long as they stay within the
weasel-words of the standard, in Python, users' expectations carry
value. Since the language is inherently not that fast, users are not
all that focused on performance (if they were, they wouldn't be using
Python). Unsurprising behavior OTOH is valued tremendously.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Apr  6 23:28:41 2009
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Apr 2009 14:28:41 -0700
Subject: [Python-Dev] pyc files,
	constant folding and borderline 	portability issues
In-Reply-To: <5c6f2a5d0904061322t2c7f6bd7y55f73ced221c8804@mail.gmail.com>
References: <loom.20090329T143625-223@post.gmane.org>
	<ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com> 
	<5c6f2a5d0904061230j6ef7fd18q369c19d91c5e34b8@mail.gmail.com> 
	<B42D463B6C7148F6BFF0F722361EB031@RaymondLaptop1>
	<5c6f2a5d0904061322t2c7f6bd7y55f73ced221c8804@mail.gmail.com>
Message-ID: <ca471dc20904061428s6dfe1f20t37e0567baf6677dd@mail.gmail.com>

On Mon, Apr 6, 2009 at 1:22 PM, Mark Dickinson <dickinsm at gmail.com> wrote:
> On Mon, Apr 6, 2009 at 9:05 PM, Raymond Hettinger <python at rcn.com> wrote:
>> The code for the lsum() recipe is more readable with a line like:
>>
>>   exp = long(mant * 2.0 ** 53)
>>
>> than with
>>
>>   exp = long(mant * 9007199254740992.0)
>>
>> It would be a shame if code written like the former suddenly
>> started doing the exponentiation in the inner loop or if the code
>> got rewritten by hand as shown.

Do you have any evidence that people write lots of inner loops with
constant expressions? In real-world code these just don't exist that
much. The case of constant folding in Python is *much* weaker than in
C because Python doesn't have real compile-time constants, so named
"constants" are variables to the compiler.
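
A small sketch of why a named "constant" cannot be folded: the name is
looked up each time the function runs, so rebinding it changes the result.
The module source and the names SCALE and scaled are made up for the
illustration:

```python
# Source of a tiny hypothetical module that uses a named "constant".
source = (
    "SCALE = 10\n"
    "def scaled(x):\n"
    "    return x * SCALE\n"
)

namespace = {}
exec(source, namespace)

# First call sees the original binding.
first = namespace["scaled"](2)

# Rebinding the module-level name changes what the function computes,
# so the compiler could not have baked the value 10 into the function.
namespace["SCALE"] = 100
second = namespace["scaled"](2)
```

Only literal expressions such as 2.0 ** 53 are candidates for folding; an
expression like x * SCALE has to be evaluated at run time.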

> Well, I'd say that the obvious solution here is to compute
> the constant 2.0**53 just once, somewhere outside the
> inner loop.  In any case, that value would probably be better
> written as 2.0**DBL_MANT_DIG (or something similar).

So true.

> As Antoine reported, the constant-folding caused quite
> a confusing bug report (issue #5593):  the problem (when
> we eventually tracked it down) was that the folded
> constant was in a .pyc file, and so wasn't updated when
> the compiler flags changed.

Right. Over the years the peephole optimizer and constant folding have
been a constant (though small) source of bugs. I'm not sure that there
is much real-world value in it, and it is certainly not right to
choose speed over correctness.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas at python.org  Mon Apr  6 23:32:45 2009
From: thomas at python.org (Thomas Wouters)
Date: Mon, 6 Apr 2009 23:32:45 +0200
Subject: [Python-Dev] FWD: Library Reference is incomplete
In-Reply-To: <20090406130618.GD19296@panix.com>
References: <20090406130618.GD19296@panix.com>
Message-ID: <9e804ac0904061432r16475d1et9fb5b15494ff9d4@mail.gmail.com>

Anyone able to look into this and fix it? Having all of the normal
entrypoints for documentation broken is rather inconvenient for users :-)

On Mon, Apr 6, 2009 at 15:06, Aahz <aahz at pythoncraft.com> wrote:

> Hrm, looks like the whole 2.6 build is broken.
>
> ----- Forwarded message from "Müller-Reineke, Matthias" <
> matthias.mueller-reineke at grundvers.de> -----
>
> > Subject: Library Reference is incomplete
> > Date: Mon, 6 Apr 2009 11:25:54 +0200
> > From: "Müller-Reineke, Matthias" <matthias.mueller-reineke at grundvers.de>
> > To: webmaster at python.org
> >
> > Dear Webmaster,
> >
> > "Library Reference" on http://www.python.org/doc/ takes me to
> > http://docs.python.org/library/ .
> > That page doesn't contain the table of contents.
> >
> > Matthias Müller-Reineke
> >
> > ------------------------------------------
> > Grundeigentümer-Versicherung VVaG
> > Große Bäckerstraße 7
> > 20095 Hamburg
> > Tel: 040 - 3 76 63 - 199
> > Fax: 040 - 3 76 63 - 98 199
> >
> > http://www.grundvers.de
> > <mailto:matthias.mueller-reineke at grundvers.de>
> >
> > Firmensitz: Hamburg HRB 13 103
> > Vorstand: Heinz Walter Berens (Vors.), Rüdiger Buyten
> > Aufsichtsratsvorsitzender: Peter Landmann
>
> ----- End forwarded message -----
>
> --
> Aahz (aahz at pythoncraft.com)           <*>
> http://www.pythoncraft.com/
>
> "...string iteration isn't about treating strings as sequences of strings,
> it's about treating strings as sequences of characters.  The fact that
> characters are also strings is the reason we have problems, but characters
> are strings for other good reasons."  --Aahz
>

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From ncoghlan at gmail.com  Mon Apr  6 23:44:18 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 07 Apr 2009 07:44:18 +1000
Subject: [Python-Dev] deprecating BaseException.message
In-Reply-To: <grd9ru$pn1$1@ger.gmane.org>
References: <bbaeab100704092133x3067a779h4c33ee08a12d1bbe@mail.gmail.com>
	<grd9ru$pn1$1@ger.gmane.org>
Message-ID: <49DA77B2.4020508@gmail.com>

Tres Seaver wrote:
> I don't think either of these classes should be subject to a deprecation
> warning for a feature they never used or depended on.

Agreed. Could you raise a tracker issue for the spurious warnings? (I
believe we should be able to make the warning condition a bit smarter to
eliminate these).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Mon Apr  6 23:51:26 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 07 Apr 2009 07:51:26 +1000
Subject: [Python-Dev] Mercurial?
In-Reply-To: <87ab6tappe.fsf@benfinney.id.au>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87499.5060502@v.loewis.de>	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>	<49D8EC71.5020105@v.loewis.de>	<acd65fa20904052106r3dec8a22ve02aadf119d655ab@mail.gmail.com>	<20090406042016.GA97@panix.com>	<acd65fa20904052131q2575299bi5c81659f62c3dabf@mail.gmail.com>	<49D9E92E.4010603@gmail.com>
	<87ab6tappe.fsf@benfinney.id.au>
Message-ID: <49DA795E.3030104@gmail.com>

Ben Finney wrote:
> Nick Coghlan <ncoghlan at gmail.com> writes:
>> Mercurial appears to best allow the sales pitch to be tailored to
>> the target audience (in this case, a group including a lot of people
>> with a background predominantly involving centralised version
>> control tools).
> 
> I don't follow. Wouldn't your preceding points above instead make
> *Bazaar* the one best suited for a group including a lot of people
> with a background predominantly involving centralised version control
> tools?

Yes, but the Bazaar advocates appear to have a hard time convincing the
other existing DVCS users that it provides *enough* access to the
underlying graph. So it then tends to get resisted by the folks that are
already fans of git or Mercurial.

Like I said though, this is a subjective impression formed by reading
what other people have written rather than by actually experiencing any
of the tools myself. I'm sure all of them are quite capable of getting
the job done :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From martin at v.loewis.de  Tue Apr  7 00:05:05 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 07 Apr 2009 00:05:05 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D9EB15.8070806@gmail.com>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>	<49D8BC81.7040007@ochtman.nl>
	<49D9EB15.8070806@gmail.com>
Message-ID: <49DA7C91.6010202@v.loewis.de>

Nick Coghlan wrote:
> Dirkjan Ochtman wrote:
>> I have a stab at an author map at http://dirkjan.ochtman.nl/author-map.
>> Could use some review, but it seems like a good start.
> 
> Martin may be able to provide a better list of names based on the
> checkin name<->SSH public key mapping in the SVN setup.

I think the identification in the SSH keys is useless. It contains
strings like "loewis at mira" or "ncoghlan at uberwald", or even multiple
of them (barry at wooz, barry at resist, ...).

It seems that the PEP needs to spell out a policy as to what committer
information needs to look like; then we need to verify that the proposed
name mapping matches that policy.

> (e.g. I believe my SVN checkin name is nick.coghlan rather than the
> shorter ncoghlan in my email address, and many others are in a similar
> boat since first.last was the chosen scheme for names in the SVN switchover)

Correct. The objective was to not allow nick names, but have real names
as committer names. It appears that this policy does not directly
translate into Mercurial.

Regards,
Martin

From rhamph at gmail.com  Tue Apr  7 00:05:58 2009
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 6 Apr 2009 16:05:58 -0600
Subject: [Python-Dev] pyc files,
	constant folding and borderline 	portability issues
In-Reply-To: <5c6f2a5d0904061322t2c7f6bd7y55f73ced221c8804@mail.gmail.com>
References: <loom.20090329T143625-223@post.gmane.org>
	<ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com>
	<5c6f2a5d0904061230j6ef7fd18q369c19d91c5e34b8@mail.gmail.com>
	<B42D463B6C7148F6BFF0F722361EB031@RaymondLaptop1>
	<5c6f2a5d0904061322t2c7f6bd7y55f73ced221c8804@mail.gmail.com>
Message-ID: <aac2c7cb0904061505j5b8a5b6fj13042bd081465f01@mail.gmail.com>

On Mon, Apr 6, 2009 at 2:22 PM, Mark Dickinson <dickinsm at gmail.com> wrote:
> Well, I'd say that the obvious solution here is to compute
> the constant 2.0**53 just once, somewhere outside the
> inner loop.  In any case, that value would probably be better
> written as 2.0**DBL_MANT_DIG (or something similar).
>
> As Antoine reported, the constant-folding caused quite
> a confusing bug report (issue #5593):  the problem (when
> we eventually tracked it down) was that the folded
> constant was in a .pyc file, and so wasn't updated when
> the compiler flags changed.

Another way of looking at this is that we have a ./configure option
which affects .pyc output.  Therefore, we should add a flag to the
magic number, causing the .pyc to be regenerated as needed.

Whether that's better or worse than removing constant folding I
haven't decided.  I have such low expectations of floating point that
I'm not surprised by bugs like this.  I'm more surprised that people
expect consistent, deterministic results...
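
A minimal sketch of the mechanism under discussion, assuming a current
CPython 3 rather than the 2009 interpreters in this thread.  Compiling
an expression such as 2.0 ** 53 and inspecting the code object's
constants shows what would end up stored in a .pyc.  The helper name
float_constants is hypothetical, and whether the folded product shows
up depends on the interpreter's optimizer.

```python
import importlib.util
import sys


def float_constants(source):
    """Return the float constants stored in the compiled code object."""
    code = compile(source, "<example>", "exec")
    return [c for c in code.co_consts if isinstance(c, float)]


# If the compiler folds the expression, the result is stored in the code
# object (and so in any .pyc written from it) instead of being recomputed
# at run time.
print(float_constants("x = 2.0 ** 53"))

# The magic number at the start of every .pyc identifies the bytecode
# format; a stale cache is only rebuilt when this (or the source) changes.
print(importlib.util.MAGIC_NUMBER)

# Python-level counterpart of C's DBL_MANT_DIG.
print(sys.float_info.mant_dig)
```

The point of the sketch is only that a value computed at compile time
outlives the build configuration that produced it, unless something in
the cache header says otherwise.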

-- 
Adam Olsen, aka Rhamphoryncus

From martin at v.loewis.de  Tue Apr  7 00:12:26 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 07 Apr 2009 00:12:26 +0200
Subject: [Python-Dev] Getting information out of the buildbots
In-Reply-To: <loom.20090406T203439-796@post.gmane.org>
References: <b8e622740904061332q783e12b7u61d96418cddbbf50@mail.gmail.com>
	<loom.20090406T203439-796@post.gmane.org>
Message-ID: <49DA7E4A.4020203@v.loewis.de>

> You can commit some temporary debug output in the tests (just sprinkle those
> print()'s you need to get your tasty information).

Also, if you want to do a sequence of changes to test a specific
machine, you might want to create a branch, make those changes, and then
trigger a build of that branch just on that specific slave (use
branches/<name> in the input field). When doing so, feel free to cancel
any automated build that is currently running; make sure to use your
real name in the UI so we know it's not spam.

Regards,
Martin

From syfou at users.sourceforge.net  Tue Apr  7 01:58:16 2009
From: syfou at users.sourceforge.net (Sylvain Fourmanoit)
Date: Mon, 6 Apr 2009 19:58:16 -0400 (EDT)
Subject: [Python-Dev] FWD: Documentation site problems
In-Reply-To: <20090406130446.GB19296@panix.com>
References: <20090406130446.GB19296@panix.com>
Message-ID: <alpine.LNX.2.00.0904061936490.21738@Turing.gateway.2wire.net>

>> there contents is missing from the python tutorial:
> The 3.0 docs seem to be correct:
> http://docs.python.org/3.0/tutorial/

That no longer seems to be the case. The development docs for Python 3
are missing a few tables of contents as well:

http://docs.python.org/dev/py3k/tutorial/

When I build the html doc locally, it looks like Sphinx from svn (r68598) 
has an issue with the 'numbered' option in the toctree directive. Here is 
my output of `make html' from revision 71295 of the py3k branch:

http://fourmanoit.googlepages.com/pydoc_output.txt

It did work fine a few days back though -- yesterday, the online doc was 
still complete: I believe it was last built on March the 28th. Yours,

--
Sylvain Fourmanoit

Memory fault -- core...uh...um...core... Oh dammit, I forget!

From steve at pearwood.info  Tue Apr  7 02:10:16 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 7 Apr 2009 10:10:16 +1000
Subject: [Python-Dev] pyc files,
	constant folding and borderline 	portability issues
In-Reply-To: <ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com>
References: <loom.20090329T143625-223@post.gmane.org>
	<op.uryyh7uh03jqhe@cesareprova.org>
	<ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com>
Message-ID: <200904071010.16855.steve@pearwood.info>

On Tue, 7 Apr 2009 07:27:29 am Guido van Rossum wrote:

> Unfortunately the language reference is not the only thing we have to
> worry about. Unlike languages like C++, where compiler writers have
> the moral right to modify the compiler as long as they stay within
> the weasel-words of the standard, in Python, users' expectations
> carry value. Since the language is inherently not that fast, users
> are not all that focused on performance (if they were, they wouldn't
> be using Python). Unsurprising behavior OTOH is valued tremendously.

Speaking as a user, Python's slowness is *not* a feature. Anything 
reasonable which can increase performance is a Good Thing.

One of the better aspects of Python programming is that (in general) you 
can write code in the most natural way possible, with the least amount 
of scaffolding getting in the way. I'm with Raymond: I think it would 
be sad if "exp = long(mant * 2.0 ** 53)" did the exponentiation in the 
inner-loop. Pre-computing that value outside the loop counts as 
scaffolding, and gets in the way of readability and beauty.
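
To make the trade-off concrete, here is a small sketch of the two
spellings, assuming Python 3 (so int() stands in for long()).  The
function and constant names are hypothetical and the data is made up
for illustration.

```python
def scale_inline(mantissas):
    # The power is written where it is used; without constant folding it
    # would be recomputed on every iteration.
    return [int(m * 2.0 ** 53) for m in mantissas]


TWO_53 = 2.0 ** 53  # hoisted by hand: the "scaffolding" version


def scale_hoisted(mantissas):
    # Same arithmetic, but the constant is looked up instead of computed.
    return [int(m * TWO_53) for m in mantissas]


values = [0.5, 0.75, 0.999]
print(scale_inline(values) == scale_hoisted(values))
```

Both versions produce the same numbers; the only question in this
thread is whether the compiler or the programmer does the hoisting.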

On the other hand, I'm with Guido when he wrote "it is certainly not 
right to choose speed over correctness". This is especially a problem 
for floating point optimizations, and I urge Cesare to be conservative 
in any f.p. optimizations he introduces, including constant folding.

So... +1 on the general principle of constant folding, -0.5 on any such 
optimizations which change the semantics of a f.p. operation. The only 
reason it's -0.5 rather than -1 is that (presumably) anyone who cares 
about floating point correctness already knows to never trust the 
compiler.

-- 
Steven D'Aprano

From guido at python.org  Tue Apr  7 02:18:42 2009
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Apr 2009 17:18:42 -0700
Subject: [Python-Dev] pyc files,
	constant folding and borderline 	portability issues
In-Reply-To: <200904071010.16855.steve@pearwood.info>
References: <loom.20090329T143625-223@post.gmane.org>
	<op.uryyh7uh03jqhe@cesareprova.org> 
	<ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com> 
	<200904071010.16855.steve@pearwood.info>
Message-ID: <ca471dc20904061718y20311f67p8e2e937070372779@mail.gmail.com>

On Mon, Apr 6, 2009 at 5:10 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Tue, 7 Apr 2009 07:27:29 am Guido van Rossum wrote:
>
>> Unfortunately the language reference is not the only thing we have to
>> worry about. Unlike languages like C++, where compiler writers have
>> the moral right to modify the compiler as long as they stay within
>> the weasel-words of the standard, in Python, users' expectations
>> carry value. Since the language is inherently not that fast, users
>> are not all that focused on performance (if they were, they wouldn't
>> be using Python). Unsurprising behavior OTOH is valued tremendously.
>
> Speaking as a user, Python's slowness is *not* a feature. Anything
> reasonable which can increase performance is a Good Thing.
>
> One of the better aspects of Python programming is that (in general) you
> can write code in the most natural way possible, with the least amount
> of scaffolding getting in the way. I'm with Raymond: I think it would
> be sad if "exp = long(mant * 2.0 ** 53)" did the exponentiation in the
> inner-loop. Pre-computing that value outside the loop counts as
> scaffolding, and gets in the way of readability and beauty.
>
> On the other hand, I'm with Guido when he wrote "it is certainly not
> right to choose speed over correctness". This is especially a problem
> for floating point optimizations, and I urge Cesare to be conservative
> in any f.p. optimizations he introduces, including constant folding.
>
> So... +1 on the general principle of constant folding, -0.5 on any such
> optimizations which change the semantics of a f.p. operation. The only
> reason it's -0.5 rather than -1 is that (presumably) anyone who cares
> about floating point correctness already knows to never trust the
> compiler.

Unfortunately, historically well-meaning attempts at adding
constant-folding have more than once introduced obscure bugs that were
hard to reproduce and only discovered one or two releases later. This
has little to do with caring about float correctness. It's more about
the difficulty of debugging Heisenbugs. For all these reasons we should
be super risk averse in this area.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ndbecker2 at gmail.com  Tue Apr  7 02:25:44 2009
From: ndbecker2 at gmail.com (Neal Becker)
Date: Mon, 06 Apr 2009 20:25:44 -0400
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<loom.20090329T175435-783@post.gmane.org>
	<5b8d13220903291114k17e9eff9v6d1a5eef1fb72332@mail.gmail.com>
Message-ID: <gre6i9$l76$1@ger.gmane.org>

David Cournapeau wrote:

> On Mon, Mar 30, 2009 at 2:59 AM, Antoine Pitrou <solipsis at pitrou.net>
> wrote:
...

> 
> Waf is definitely faster than scons - something like one order of
> magnitude. I am not yet very familiar with waf, but I like what I saw -
> the architecture is much nicer than scons (waf core amount of code is
> almost ten times smaller than scons core), but I would not call it a
> mature project yet.
> 

I haven't tried waf, but IIUC it _solves_ the bootstrap issue.

From steve at holdenweb.com  Tue Apr  7 03:35:22 2009
From: steve at holdenweb.com (Steve Holden)
Date: Mon, 06 Apr 2009 21:35:22 -0400
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
Message-ID: <greakt$tds$1@ger.gmane.org>

Ondrej Certik wrote:
> Hi,
> 
> On Sun, Mar 29, 2009 at 10:21 AM, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
>> I've heard some good things about cmake -- LLVM, googletest, and Boost
>> are all looking at switching to it -- so I wanted to see if we could
>> simplify our autoconf+makefile system by using it. The biggest wins I
>> see from going to cmake are:
>>  1. It can autogenerate the Visual Studio project files instead of
>> needing them to be maintained separately
>>  2. It lets you write functions and modules without understanding
>> autoconf's mix of shell and M4.
>>  3. Its generated Makefiles track header dependencies accurately so we
>> might be able to add private headers efficiently.
> 
> I am switching to cmake with all my python projects, as it is rock
> solid, supports building in parallel (if I have some C++ and Cython
> extensions), and the configure part works well.
> 
> The only disadvantage that I can see is that one has to learn a new
> syntax, which is not Python. But on the other hand, at least it forces
> one to really just use cmake to write build scripts in a standard way,
> while scons and other Python solutions imho encourage to write full
> Python programs, which imho is a disadvantage for the build system, as
> then every build system is nonstandard.
> 
[obirrelevance]

Isn't it strange how nobody ever complained about the significance of
whitespace in makefiles: only the fact that leading tabs were required
rather than just-any-old whitespace.

I guess some people just home in on things to complain about.

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
Holden Web LLC                 http://www.holdenweb.com/
Want to know? Come to PyCon - soon! http://us.pycon.org/

From steve at holdenweb.com  Tue Apr  7 04:25:36 2009
From: steve at holdenweb.com (Steve Holden)
Date: Mon, 06 Apr 2009 22:25:36 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <ea2499da0904052304n277d3874la32c7056dc3e76a3@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>	<49D8BC81.7040007@ochtman.nl>
	<49D8FA3A.5050400@v.loewis.de>	<49D8FC47.8080803@ochtman.nl>	<acd65fa20904052120h54132755g7ebf936f7842a69a@mail.gmail.com>
	<ea2499da0904052304n277d3874la32c7056dc3e76a3@mail.gmail.com>
Message-ID: <49DAB9A0.2090803@holdenweb.com>

Dirkjan Ochtman wrote:
> On Mon, Apr 6, 2009 at 06:20, Alexandre Vassalotti
> <alexandre at peadrop.com> wrote:
>> But that won't work if people who are not core developers submit us
>> patch bundle to import. And maintaining such a white-list sounds to me
>> more burdensome than necessary.
> 
> Well, if you need contributors to sign a contributor's agreement
> anyway, there's already some list out there that we can leverage.
> 
> The other option is to play the consenting adults card and ask all
> people with push access to ascertain the correct names of committer
> names on patches they push.
> 
I would remind you all that it's *very* necessary to make sure that
whatever finds its way into released code is indeed covered by
contributor agreements. The PSF (as the guardian of the IP) has to
ensure this, and so we have to find a way of ensuring that all
contributions to source are correctly logged against authors in a
traceable way.

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
Holden Web LLC                 http://www.holdenweb.com/
Want to know? Come to PyCon - soon! http://us.pycon.org/

From dschult at colgate.edu  Tue Apr  7 05:47:17 2009
From: dschult at colgate.edu (Dan Schult)
Date: Mon, 6 Apr 2009 23:47:17 -0400
Subject: [Python-Dev] calling dictresize outside dictobject.c
Message-ID: <6CE3CEB2-0753-4708-99A5-78F2B05A054C@colgate.edu>

Hi,
I'm trying to write a C extension which is a subclass of dict.
I want to do something like a setdefault() but with a single lookup.

Looking through the dictobject code, the three workhorse
routines lookdict, insertdict and dictresize are not available
directly for functions outside dictobject.c,
but I can get at lookdict through dict->ma_lookup().

So I use lookdict to get the PyDictEntry (call it ep) I'm looking for.
The comments for lookdict say ep is ready to be set... so I do that.
Then I check whether the dict needs to be resized--following the
nice example of PyDict_SetItem.  But I can't call dictresize to finish
off the process.

Should I be using PyDict_SetItem directly?  No... it does its own  
lookup.
I don't want a second lookup!   I already know which entry will be  
filled.

So then I look at the code for setdefault and it also does
a double lookup for checking and setting an entry.

What subtle issue am I missing?
Why does setdefault do a double lookup?
More globally, why isn't dictresize available through the C-API?

If there isn't a reason to do a double lookup I have a patch for  
setdefault,
but I thought I should ask here first.
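
The effect can be observed from Python without touching dictobject.c.
The sketch below counts calls to __hash__ as a rough proxy for dict
lookups; CountingKey and the two helper functions are hypothetical
names.  The check-then-set pattern hashes the key twice, while the
count for setdefault depends on the interpreter version.

```python
class CountingKey:
    """Key that counts how many times it is hashed."""

    def __init__(self, value):
        self.value = value
        self.hash_calls = 0

    def __hash__(self):
        self.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value


def hashes_for_check_then_set():
    key, d = CountingKey("k"), {}
    if key not in d:   # first lookup
        d[key] = []    # second lookup
    return key.hash_calls


def hashes_for_setdefault():
    key, d = CountingKey("k"), {}
    d.setdefault(key, [])
    return key.hash_calls


print(hashes_for_check_then_set(), hashes_for_setdefault())
```

Hash calls are only an approximation of the probing done inside the
table, but they are enough to show whether a given build does the work
once or twice.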

Thanks!
Dan 

From greg.ewing at canterbury.ac.nz  Tue Apr  7 07:20:24 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 07 Apr 2009 17:20:24 +1200
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <greakt$tds$1@ger.gmane.org>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<greakt$tds$1@ger.gmane.org>
Message-ID: <49DAE298.7040007@canterbury.ac.nz>

Steve Holden wrote:

> Isn't it strange how nobody ever complained about the significance of
> whitespace in makefiles: only the fact that leading tabs were required
> rather than just-any-old whitespace.

Make doesn't care how *much* whitespace there
is, though, only whether it's there or not. If
it accepted anything that looks like whitespace,
there would be no cause for complaint.

-- 
Greg

From fetchinson at googlemail.com  Tue Apr  7 07:55:47 2009
From: fetchinson at googlemail.com (Daniel Fetchinson)
Date: Mon, 6 Apr 2009 22:55:47 -0700
Subject: [Python-Dev] decorator module in stdlib?
Message-ID: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>

The decorator module [1] written by Michele Simionato is a very useful
tool for maintaining function signatures while applying a decorator.
Many different projects implement their own versions of the same
functionality, for example turbogears has its own utility for this, I
guess others do something similar too.
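
For readers unfamiliar with the problem, a short sketch of what "losing
the signature" means.  It uses only functools.wraps and
inspect.signature from a modern Python 3 standard library, not the
decorator module itself; the decorator and function names are made up
for illustration.

```python
import functools
import inspect


def naive(func):
    # Plain wrapper: the decorated function reports the wrapper's metadata.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def preserving(func):
    # functools.wraps copies the name and docstring and records the
    # original function, which inspect.signature then follows.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@naive
def add_naive(a, b=1):
    """Add two numbers."""
    return a + b


@preserving
def add_kept(a, b=1):
    """Add two numbers."""
    return a + b


print(add_naive.__name__, inspect.signature(add_naive))
print(add_kept.__name__, inspect.signature(add_kept))
```

The wrapper in the second case still accepts *args and **kwargs when
called; only the reported metadata is preserved.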

Was the issue whether to include this module in the stdlib raised? If
yes, what were the arguments against it? If not, what do you folks
think, shouldn't it be included? I certainly think it should be.

Originally I sent this message to c.l.p [2] and Michele suggested it
be brought up on python-dev. He also pointed out that a PEP [3] is
already written about this topic and it is in draft form.

What do you guys think, wouldn't this be a useful addition to functools?

Cheers,
Daniel

[1] http://pypi.python.org/pypi/decorator
[2] http://groups.google.com/group/comp.lang.python/browse_thread/thread/d4056023f1150fe0
[3] http://www.python.org/dev/peps/pep-0362/

-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown

From stephen at xemacs.org  Tue Apr  7 08:03:05 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 07 Apr 2009 15:03:05 +0900
Subject: [Python-Dev] Mercurial?
In-Reply-To: <acd65fa20904052106r3dec8a22ve02aadf119d655ab@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87499.5060502@v.loewis.de>
	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com>
	<49D8EC71.5020105@v.loewis.de>
	<acd65fa20904052106r3dec8a22ve02aadf119d655ab@mail.gmail.com>
Message-ID: <873aclt2zq.fsf@xemacs.org>

Alexandre Vassalotti writes:

 > This makes me remember that we will have to decide how we will
 > reorganize our workflow. For this, we can either be conservative and
 > keep the current CVS-style development workflow--i.e., a few main
 > repositories where all developers can commit to.

That was the original idea of PEP 374, that was a presumption under
which I wrote my part of it, I think we should stick with it.  As
people develop personal workflows, they can suggest them, and/or
changes in the public workflow needed to support them.  But there
should be a working sample implementation before thinking about
changes to the workflow.

Simply allowing more people to work effectively offline is going to
speed things up perceptibly.  Improved branching will add to that
impact.  The current workflow is pretty clean.  Let's not mess it up
or all that will be achieved is to speed up the mess.

 > Or we could drink the kool-aid and go with a kernel-style
 > development workflow--i.e., each developer maintains his own branch
 > and pull changes from each others.

Can you give examples of projects using Mercurial that do that?  All
of the Mercurial projects I've seen "up close" have relatively
centralized workflows, which Mercurial encourages because of the way
it likes to automatically merge.  I wouldn't want to try the kernel
style with Mercurial because its named branch support doesn't work the
way it should.  In my experience, to deal with external branches, you
have to maintain a separate workspace per external branch you want to
follow.

You'd also need to provide a users' guide to things like rebasing,
which become very important in a kernel-style workflow, but which the
Mercurial developers opposed on principle, at least at first.

 > However if we go kernel-style, I will need to designate someone
 > (i.e., an integrator) that will maintain the main branches, which
 > will be tested by buildbot and used for the public releases. These are
 > issues I would like to address in the PEP.

IMHO, that's a new PEP.  This is not part of the PEP 374 decision to go
to a dVCS, nor part of the requirements for implementation, whether
that is considered an extension of 374 or a new PEP in itself.

From dirkjan at ochtman.nl  Tue Apr  7 08:15:33 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Tue, 7 Apr 2009 08:15:33 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49DA7C91.6010202@v.loewis.de>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl> <49D9EB15.8070806@gmail.com>
	<49DA7C91.6010202@v.loewis.de>
Message-ID: <ea2499da0904062315j3a535077w387ce1323ad81a1b@mail.gmail.com>

On Tue, Apr 7, 2009 at 00:05, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I think the identification in the SSH keys is useless. It contains
> strings like "loewis at mira" or "ncoghlan at uberwald", or even multiple
> of them (barry at wooz, barry at resist, ...).

Right, so we'll put up the author map somewhere with the email
addresses I gathered and ask for a more thorough review at some point.
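
As a rough illustration of what such a map could look like, here is a
sketch that reads "svn name = full author" lines into a dict.  The line
format, the helper name parse_author_map, and the sample entry
(including its example.org address) are assumptions for illustration,
not the contents of the actual file.

```python
def parse_author_map(text):
    """Parse 'svn_name = Full Name <email>' lines into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        old, sep, new = line.partition("=")
        if not sep:
            raise ValueError("malformed line: %r" % line)
        mapping[old.strip()] = new.strip()
    return mapping


SAMPLE = """
# hypothetical entry, placeholder address
nick.coghlan = Nick Coghlan <ncoghlan@example.org>
"""

print(parse_author_map(SAMPLE))
```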

> It seems that the PEP needs to spell out a policy as to what committer
> information needs to look like; then we need to verify that the proposed
> name mapping matches that policy.

Right. It's basically "Name Lastname <email>" -- we can verify that in a hook.

> Correct. The objective was to not allow nick names, but have real names
> as committer names. It appears that this policy does not directly
> translate into Mercurial.

One of the nicer features of Mercurial/DVCSs, in my experience, is
that non-committers get to keep the credit on their patches. That
means that it's impossible to enforce a policy more extensive than
some basic checks (such as the format above). Unless we keep a list of
people who have signed an agreement, which will mean people will have
to re-do the username on commits that don't constitute a non-trivial
contribution.

Cheers,

Dirkjan

From alexandre at peadrop.com  Tue Apr  7 08:17:51 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Tue, 7 Apr 2009 02:17:51 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <873aclt2zq.fsf@xemacs.org>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com> 
	<49D87499.5060502@v.loewis.de>
	<acd65fa20904050611x3a53d3e0p6149a375839e9407@mail.gmail.com> 
	<49D8EC71.5020105@v.loewis.de>
	<acd65fa20904052106r3dec8a22ve02aadf119d655ab@mail.gmail.com> 
	<873aclt2zq.fsf@xemacs.org>
Message-ID: <acd65fa20904062317p54e2b15dxe15b9ce5dca497d1@mail.gmail.com>

On Tue, Apr 7, 2009 at 2:03 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Alexandre Vassalotti writes:
>
>  > This makes me remember that we will have to decide how we will
>  > reorganize our workflow. For this, we can either be conservative and
>  > keep the current CVS-style development workflow--i.e., a few main
>  > repositories where all developers can commit to.
>
> That was the original idea of PEP 374, that was a presumption under
> which I wrote my part of it, I think we should stick with it.  As
> people develop personal workflows, they can suggest them, and/or
> changes in the public workflow needed to support them.  But there
> should be a working sample implementation before thinking about
> changes to the workflow.
>

Aahz convinced me earlier that changing the current workflow would be
stupid. So, I now think the best thing to do is to provide a CVS-style
environment similar to what we have currently, and let the workflow
evolve naturally as developers gain more confidence with Mercurial.

>
>  > Or we could drink the kool-aid and go with a kernel-style
>  > development workflow--i.e., each developer maintains his own branch
>  > and pull changes from each others.
>
> Can you give examples of projects using Mercurial that do that?
>

Mercurial itself is developed using that style, I believe.

-- Alexandre

From dirkjan at ochtman.nl  Tue Apr  7 08:18:34 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Tue, 7 Apr 2009 08:18:34 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49DAB9A0.2090803@holdenweb.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl> <49D8FA3A.5050400@v.loewis.de>
	<49D8FC47.8080803@ochtman.nl>
	<acd65fa20904052120h54132755g7ebf936f7842a69a@mail.gmail.com>
	<ea2499da0904052304n277d3874la32c7056dc3e76a3@mail.gmail.com>
	<49DAB9A0.2090803@holdenweb.com>
Message-ID: <ea2499da0904062318q764dabe7ob9b469d0829421dc@mail.gmail.com>

On Tue, Apr 7, 2009 at 04:25, Steve Holden <steve at holdenweb.com> wrote:
> I would remind you all that it's *very* necessary to make sure that
> whatever finds its way into released code is indeed covered by
> contributor agreements. The PSF (as the guardian of the IP) has to
> ensure this, and so we have to find a way of ensuring that all
> contributions to source are correctly logged against authors in a
> traceable way.

I think having full names *and* email addresses makes it easier to
trace code, since previously code not written by committers was
harder to trace. The fact that some stuff isn't covered just
becomes more explicit, which is a good thing IMO.

Cheers,

Dirkjan

From ben+python at benfinney.id.au  Tue Apr  7 08:25:09 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Tue, 07 Apr 2009 16:25:09 +1000
Subject: [Python-Dev] Mercurial?
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl> <49D9EB15.8070806@gmail.com>
	<49DA7C91.6010202@v.loewis.de>
	<ea2499da0904062315j3a535077w387ce1323ad81a1b@mail.gmail.com>
Message-ID: <873acl7zga.fsf@benfinney.id.au>

Dirkjan Ochtman <dirkjan at ochtman.nl> writes:

> Right. It's basically "Name Lastname <email>" -- we can verify that
> in a hook.

Remembering, of course, that full names don't follow any template
(especially not first-name last-name). The person's full name must be
treated as free-form text, since there's no format common to all.
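
A sketch of a check along those lines, assuming the only hard
requirements are a non-empty free-form name followed by an address in
angle brackets.  The regular expression and the helper name
parse_author are hypothetical, the sample addresses are placeholders,
and wiring this into an actual hook is left out.

```python
import re

# Name: any text without angle brackets.  Email: something@something.
_AUTHOR_RE = re.compile(
    r"^(?P<name>[^<>]+?)\s*<(?P<email>[^<>\s@]+@[^<>\s@]+)>$"
)


def parse_author(author):
    """Return (name, email) for 'Name <email>', or None if malformed."""
    match = _AUTHOR_RE.match(author.strip())
    if match is None:
        return None
    name = match.group("name").strip()
    if not name:
        return None
    return name, match.group("email")


print(parse_author("Guido van Rossum <guido@example.org>"))
print(parse_author("Cher <cher@example.org>"))
print(parse_author("ncoghlan"))
```

Nothing here assumes a first-name/last-name structure: single names and
names with several parts are accepted alike.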

-- 
 \     "We should strive to do things in [Gandhi's] spirit… not to use |
  `\   violence in fighting for our cause, but by non-participation in |
_o__)                      what we believe is evil." --Albert Einstein |
Ben Finney

From dirkjan at ochtman.nl  Tue Apr  7 08:30:10 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Tue, 7 Apr 2009 08:30:10 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <873acl7zga.fsf@benfinney.id.au>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl> <49D9EB15.8070806@gmail.com>
	<49DA7C91.6010202@v.loewis.de>
	<ea2499da0904062315j3a535077w387ce1323ad81a1b@mail.gmail.com>
	<873acl7zga.fsf@benfinney.id.au>
Message-ID: <ea2499da0904062330o68066bacgbcdbca865c57ee0e@mail.gmail.com>

On Tue, Apr 7, 2009 at 08:25, Ben Finney <ben+python at benfinney.id.au> wrote:
> Remembering, of course, that full names don't follow any template
> (especially not first-name last-name). The person's full name must be
> treated as free-form text, since there's no format common to all.

Of course, unless we lock it down through a list of people who have
contributor's agreements.
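
A minimal sketch of that lock-down, assuming the list is available as a
set of exact author strings (the data and names below are hypothetical
placeholders, not real PSF records):

```python
# Hypothetical data: in practice this would come from the PSF's records.
SIGNED_AGREEMENTS = {
    "Example Contributor <contributor@example.org>",
    "Mononym <mononym@example.org>",
}


def has_contributor_agreement(author, signed=SIGNED_AGREEMENTS):
    """Exact-match lookup; the name is never parsed, so its shape is irrelevant."""
    return author.strip() in signed


print(has_contributor_agreement("Mononym <mononym@example.org>"))
print(has_contributor_agreement("Someone Else <else@example.org>"))
```

Because the check is a plain membership test, no template for the name
is needed at all.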

Cheers,

Dirkjan

From cesare.dimauro at a-tono.com  Tue Apr  7 09:27:04 2009
From: cesare.dimauro at a-tono.com (Cesare Di Mauro)
Date: Tue, 07 Apr 2009 09:27:04 +0200
Subject: [Python-Dev] pyc files,
 constant folding and borderline portability issues
In-Reply-To: <200904071010.16855.steve@pearwood.info>
References: <loom.20090329T143625-223@post.gmane.org>
	<op.uryyh7uh03jqhe@cesareprova.org>
	<ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com>
	<200904071010.16855.steve@pearwood.info>
Message-ID: <op.urz9neqq03jqhe@cesareprova.org>

On Apr 07, 2009 at 02:10AM, Steven D'Aprano <steve at pearwood.info> wrote:

> On the other hand, I'm with Guido when he wrote "it is certainly not
> right to choose speed over correctness". This is especially a problem
> for floating point optimizations, and I urge Cesare to be conservative
> in any f.p. optimizations he introduces, including constant folding.

The principle that I followed on doing constant folding was: "do what Python
will do without constant folding enabled".

So if Python will generate

LOAD_CONST	1
LOAD_CONST	2
BINARY_ADD

the constant folding code will simply replace them with a single

LOAD_CONST	3
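
To make the rule concrete, here is a minimal sketch of the same idea at
the AST level, written in Python rather than in the C of the actual patch.
The class, table and function names are hypothetical, and it assumes a
Python version that provides ast.Constant (3.8 or later):

```python
import ast
import operator

# Hypothetical table: AST operator node -> the function Python calls at run time.
FOLDABLE_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def _is_number(node):
    """True for literal int/float constants only."""
    return isinstance(node, ast.Constant) and type(node.value) in (int, float)


class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        func = FOLDABLE_OPS.get(type(node.op))
        if func and _is_number(node.left) and _is_number(node.right):
            folded = ast.Constant(value=func(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node  # anything involving a name is left untouched


def fold(source):
    """Parse an expression and fold constant-only arithmetic."""
    tree = ConstantFolder().visit(ast.parse(source, mode="eval"))
    return ast.fix_missing_locations(tree)


print(ast.dump(fold("1 + 2").body))
print(ast.dump(fold("b * 2 * 3").body))
```

The folded value is computed by calling the same operator Python would
call at run time, so the result cannot differ from the unfolded code.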

When working with such kind of optimizations, the temptation is to
apply them at any situation possible. For example, in other languages
this

a = b * 2 * 3

will be replaced by

a = b * 6

In Python I can't do that, because b can be an object which overloaded
the * operator, so it *must* be called two times, one for 2 and one for 3.

That's the way I choose to implement constant folding.
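
A small sketch of why folding b * 2 * 3 into b * 6 would be observable,
using a hypothetical Tracker class that is not part of the patch:

```python
class Tracker:
    """Records every right-hand operand it is multiplied by."""

    def __init__(self, calls=()):
        self.calls = tuple(calls)

    def __mul__(self, other):
        # Return a new Tracker that remembers this multiplication.
        return Tracker(self.calls + (other,))


b = Tracker()

unfolded = b * 2 * 3   # parsed as (b * 2) * 3: __mul__ runs twice
folded = b * 6         # what an over-eager optimizer would emit

print(unfolded.calls)  # (2, 3)
print(folded.calls)    # (6,)
```

The two results differ, so the rewrite is only safe when the type of b
is known, which the compiler cannot know in general.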

The only difference at this time concerns invalid operations, which will
raise exceptions at compile time, not at run time.

So if you write:

a = 1 / 0

an exception will be raised at compile time.

I decided to let the exception be raised immediately, because I think that
it's better to detect an error at compile time than at execution time.

However, this can lead to incompatibilities with existing code, so in the
final implementation I will add a flag to struct compiling (in ast.c) so that
this behaviour can be controlled programmatically (enabling or disabling the
exception raising).
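
As an illustration only (the real flag would live in struct compiling in
ast.c; the Python function and parameter names here are hypothetical), the
two behaviours could be sketched like this:

```python
import operator


def try_fold(func, left, right, raise_on_error=True):
    """Attempt to fold func(left, right) at "compile time".

    Returns (True, value) when folding succeeds. When the operation is
    invalid, either re-raises (early error) or returns (False, None) so the
    caller can emit the original instructions and fail at run time instead.
    """
    try:
        return True, func(left, right)
    except ArithmeticError:
        if raise_on_error:
            raise
        return False, None


print(try_fold(operator.truediv, 1, 2))                        # (True, 0.5)
print(try_fold(operator.truediv, 1, 0, raise_on_error=False))  # (False, None)
```

With raise_on_error left at True, the 1 / 0 case raises
ZeroDivisionError immediately, which is the early-detection behaviour
described above.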

I already introduced a flag in struct compiling to control the constant
folding, that can be completely disabled, if desired.

> So... +1 on the general principle of constant folding, -0.5 on any such
> optimizations which change the semantics of a f.p. operation. The only
> reason it's -0.5 rather than -1 is that (presumably) anyone who cares
> about floating point correctness already knows to never trust the
> compiler.

As Raymond stated, there's no loss of precision when constant folding
works on float data. That's because each computed value is rounded and
stored every time a result is calculated.

Other languages will use FPU registers to hold results as long as
possible, keeping full 80 bit precision (1 bit sign + 15 bit exponent +
64 bit mantissa).
That's not the Python case.
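
A quick way to check this claim on a given build, sketched under the
assumption of a CPython interpreter using IEEE 754 doubles (the helper
name is hypothetical): compare the bytes of a result computed at run time
with the bytes of the same expression written as literals.

```python
import struct

# Run-time evaluation: the operands are names, so the addition cannot be folded.
x, y = 0.1, 0.2
runtime_result = x + y

# The same expression as literals; a folding compiler may compute it early.
literal_result = eval(compile("0.1 + 0.2", "<demo>", "eval"))


def double_bits(value):
    """Return the 8 bytes of value stored as an IEEE 754 double."""
    return struct.pack("<d", value)


same = double_bits(runtime_result) == double_bits(literal_result)
print(same)
```

Both paths round to a 64 bit double after the single operation, so the
stored bytes are expected to match.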

Cesare

From andrewm at object-craft.com.au  Tue Apr  7 09:43:37 2009
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Tue, 7 Apr 2009 17:43:37 +1000
Subject: [Python-Dev] pyc files,
	constant folding and borderline 	portability issues
In-Reply-To: <ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com>
References: <loom.20090329T143625-223@post.gmane.org>
	<ca471dc20903290836n7420b81ep39bdb9bfd2373757@mail.gmail.com>
	<op.uryyh7uh03jqhe@cesareprova.org>
	<ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com>
Message-ID: <7CCEA2C3-A131-4E35-BAE6-8D9896A786AB@object-craft.com.au>

On 07/04/2009, at 7:27 AM, Guido van Rossum wrote:
> On Mon, Apr 6, 2009 at 7:28 AM, Cesare Di Mauro
> <cesare.dimauro at a-tono.com> wrote:
>> The Language Reference says nothing about the effects of code  
>> optimizations.
>> I think it's a very good thing, because we can do some work here  
>> with constant
>> folding.
>
> Unfortunately the language reference is not the only thing we have to
> worry about. Unlike languages like C++, where compiler writers have
> the moral right to modify the compiler as long as they stay within the
> weasel-words of the standard, in Python, users' expectations carry
> value. Since the language is inherently not that fast, users are not
> all that focused on performance (if they were, they wouldn't be using
> Python). Unsurprising behavior OTOH is valued tremendously.

Rather than trying to get the optimizer to guess, why not have a  
"const" keyword and make it explicit? The result would be a symbol  
that essentially only exists at compile time - references to the  
symbol would be replaced by the computed value while compiling. Okay,  
maybe that would suck a bit (no symbolic debug output).

Yeah, I know... take it to
python-wild-and-ill-considered-ideas at python.org.

From g.brandl at gmx.net  Tue Apr  7 10:27:28 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 07 Apr 2009 10:27:28 +0200
Subject: [Python-Dev] FWD: Library Reference is incomplete
In-Reply-To: <9e804ac0904061432r16475d1et9fb5b15494ff9d4@mail.gmail.com>
References: <20090406130618.GD19296@panix.com>
	<9e804ac0904061432r16475d1et9fb5b15494ff9d4@mail.gmail.com>
Message-ID: <grf2d3$ije$1@ger.gmane.org>

Thomas Wouters schrieb:
> 
> Anyone able to look into this and fix it? Having all of the normal
> entrypoints for documentation broken is rather inconvenient for users :-)

A rebuild should do the trick, I'll fix this ASAP.

Georg

From p.f.moore at gmail.com  Tue Apr  7 12:33:39 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 7 Apr 2009 11:33:39 +0100
Subject: [Python-Dev] pyc files,
	constant folding and borderline 	portability issues
In-Reply-To: <op.urz9neqq03jqhe@cesareprova.org>
References: <loom.20090329T143625-223@post.gmane.org>
	<op.uryyh7uh03jqhe@cesareprova.org>
	<ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com>
	<200904071010.16855.steve@pearwood.info>
	<op.urz9neqq03jqhe@cesareprova.org>
Message-ID: <79990c6b0904070333t42c55ddfhc72f4a2c987cc38e@mail.gmail.com>

2009/4/7 Cesare Di Mauro <cesare.dimauro at a-tono.com>:
> The principle that I followed on doing constant folding was: "do what Python
> will do without constant folding enabled".
>
> So if Python will generate
>
> LOAD_CONST      1
> LOAD_CONST      2
> BINARY_ADD
>
> the constant folding code will simply replace them with a single
>
> LOAD_CONST      3
>
> When working with such kind of optimizations, the temptation is to
> apply them at any situation possible. For example, in other languages
> this
>
> a = b * 2 * 3
>
> will be replaced by
>
> a = b * 6
>
> In Python I can't do that, because b can be an object which overloaded
> the * operator, so it *must* be called two times, one for 2 and one for 3.
>
> That's the way I choose to implement constant folding.

That sounds sufficiently "super risk-averse" to me, so I'm in favour
of constant folding being implemented with this attitude :-)

Paul.

From steve at pearwood.info  Tue Apr  7 13:42:05 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 7 Apr 2009 21:42:05 +1000
Subject: [Python-Dev] Mercurial?
In-Reply-To: <ea2499da0904062330o68066bacgbcdbca865c57ee0e@mail.gmail.com>
References: <20090404154049.GA23987@panix.com> <873acl7zga.fsf@benfinney.id.au>
	<ea2499da0904062330o68066bacgbcdbca865c57ee0e@mail.gmail.com>
Message-ID: <200904072142.06158.steve@pearwood.info>

On Tue, 7 Apr 2009 04:30:10 pm Dirkjan Ochtman wrote:
> On Tue, Apr 7, 2009 at 08:25, Ben Finney <ben+python at benfinney.id.au> wrote:
> > Remembering, of course, that full names don't follow any template
> > (especially not first-name last-name). The person's full name must
> > be treated as free-form text, since there's no format common to
> > all.
>
> Of course, unless we lock it down through a list of people who have
> contributor's agreements.

Perhaps you should ask Aahz what he thinks about being forced to provide 
two names before being allowed to contribute.

To say nothing of noted MIT professor and computer scientist Arvind, 
British lords, the magician Teller, and millions of people from 
Spanish, Portuguese, Indonesian, Burmese and Malaysian cultures.

Ben is correct: you can't assume that contributors will have both a 
first name and a last name, or that a first name and last name is 
sufficient to legally identify them. Those from Spanish and Portuguese 
cultures usually have two family names as well as a personal name; 
people from Indonesian, Burmese and Malaysian cultures often only use a 
single name.

-- 
Steven D'Aprano

From dirkjan at ochtman.nl  Tue Apr  7 13:57:05 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Tue, 7 Apr 2009 13:57:05 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <200904072142.06158.steve@pearwood.info>
References: <20090404154049.GA23987@panix.com> <873acl7zga.fsf@benfinney.id.au>
	<ea2499da0904062330o68066bacgbcdbca865c57ee0e@mail.gmail.com>
	<200904072142.06158.steve@pearwood.info>
Message-ID: <ea2499da0904070457k5d177139u19e53ff6ae8c27c9@mail.gmail.com>

On Tue, Apr 7, 2009 at 13:42, Steven D'Aprano <steve at pearwood.info> wrote:
> Perhaps you should ask Aahz what he thinks about being forced to provide
> two names before being allowed to contribute.

Huh? The contributor's agreement list would presumably include real
names only (so Aahz is out of luck), but the names wouldn't need to be
limited to just one "word".

I don't think I was implying otherwise; maybe my example much earlier
in the thread was simplistic and I should have put it in EBNF (with
Unicode character classes just to be very sure).

Oh, yes, I am excluding people whose names include non-Unicode
characters. Tough luck.

Cheers,

Dirkjan

From mal at egenix.com  Tue Apr  7 14:02:54 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 07 Apr 2009 14:02:54 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090403004135.B76443A40A7@sparrow.telecommunity.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<20090403004135.B76443A40A7@sparrow.telecommunity.com>
Message-ID: <49DB40EE.60004@egenix.com>

On 2009-04-03 02:44, P.J. Eby wrote:
> At 10:33 PM 4/2/2009 +0200, M.-A. Lemburg wrote:
>> Alternative Approach:
>> ---------------------
>>
>> Wouldn't it be better to stick with a simpler approach and look for
>> "__pkg__.py" files to detect namespace packages using that O(1) check ?
> 
>> One of the namespace packages, the defining namespace package, will have
>> to include a __init__.py file.
> 
> Note that there is no such thing as a "defining namespace package" --
> namespace package contents are symmetrical peers.

That was a definition :-)

Defining namespace package := the namespace package having the
                              __pkg__.py file

This is useful to have since packages allowing integration of
other sub-packages typically come as a base package with some
basic infrastructure in place which is required by all other
namespace packages.

If the __init__.py file is not found among the namespace directories,
the importer will have to raise an exception, since the result
would not be a proper Python package.

>> * It's possible to have a defining package dir and add-one package
>> dirs.
> 
> Also possible in the PEP, although the __init__.py must be in the first
> such directory on sys.path.  (However, such "defining" packages are not
> that common now, due to tool limitations.)

That's a strange limitation of the PEP. Why should the location of
the __init__.py file depend on the order of sys.path ?

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Apr 03 2009)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2009-03-19: Released mxODBC.Connect 1.0.1      http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From tseaver at palladion.com  Tue Apr  7 14:07:51 2009
From: tseaver at palladion.com (Tres Seaver)
Date: Tue, 07 Apr 2009 08:07:51 -0400
Subject: [Python-Dev] deprecating BaseException.message
In-Reply-To: <49DA77B2.4020508@gmail.com>
References: <bbaeab100704092133x3067a779h4c33ee08a12d1bbe@mail.gmail.com>	<grd9ru$pn1$1@ger.gmane.org>
	<49DA77B2.4020508@gmail.com>
Message-ID: <49DB4217.5060004@palladion.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nick Coghlan wrote:
> Tres Seaver wrote:
>> I don't think either of these classes should be subject to a deprecation
>> warning for a feature they never used or depended on.
> 
> Agreed. Could you raise a tracker issue for the spurious warnings? (I
> believe we should be able to make the warning condition a bit smarter to
> eliminate these).

Done:  http://bugs.python.org/issue5716

Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJ20IX+gerLs4ltQ4RAkuDAKCTZNp0r38d+hW8TmvjIh9Sj59CJQCfbJlQ
taNbsBUT79MF8t7owySE2dg=
=LjZf
-----END PGP SIGNATURE-----

From mal at egenix.com  Tue Apr  7 14:25:08 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 07 Apr 2009 14:25:08 +0200
Subject: [Python-Dev] Adding new features to Python 2.x (PEP 382:
 Namespace Packages)
In-Reply-To: <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>
Message-ID: <49DB4624.604@egenix.com>

On 2009-04-06 15:21, Jesse Noller wrote:
> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 2009-04-02 17:32, Martin v. Löwis wrote:
>>> I propose the following PEP for inclusion to Python 3.1.
>> Thanks for picking this up.
>>
>> I'd like to extend the proposal to Python 2.7 and later.
>>
> 
> -1 to adding it to the 2.x series. There was much discussion around
> adding features to 2.x *and* 3.0, and the consensus seemed to *not*
> add new features to 2.x and use those new features as carrots to help
> lead people into 3.0.

I must have missed that discussion :-)

Where's the PEP pinning this down ?

The Python 2.x user base is huge and the number of installed
applications even larger.

Cutting these users and application developers off of important new
features added to Python 3 is only going to work as "carrot" for
those developers who:

 * have enough resources (time, money, manpower) to port their existing
   application to Python 3

 * can persuade their users to switch to Python 3

 * don't rely much on 3rd party libraries (the bread and butter
   of Python applications)

Realistically, such a porting effort is not likely going to happen
for any decent sized application, except perhaps a few open source
ones.

Such a policy would then translate to a dead end for Python 2.x
based applications.

-- 
Marc-Andre Lemburg
eGenix.com

From skip at pobox.com  Tue Apr  7 14:14:22 2009
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 7 Apr 2009 07:14:22 -0500
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
Message-ID: <18907.17310.201358.697994@montanaro.dyndns.org>

    Ondrej> ... while scons and other Python solutions imho encourage to
    Ondrej> write full Python programs, which imho is a disadvantage for the
    Ondrej> build system, as then every build system is nonstandard.

Hmmm...  Like distutils setup scripts?

I don't know thing one about cmake, but if it's good for the goose (building
Python proper) would it be good for the gander (building extensions)?

-- 
Skip Montanaro - skip at pobox.com - http://www.smontanaro.net/
        "XML sucks, dictionaries rock" - Dave Beazley

From mal at egenix.com  Tue Apr  7 14:30:19 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 07 Apr 2009 14:30:19 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49D66C6E.3090602@v.loewis.de>
References: <49D4DA72.60401@v.loewis.de>
	<49D52115.6020001@egenix.com>	<49D66C6E.3090602@v.loewis.de>
Message-ID: <49DB475B.8060504@egenix.com>

[Resent due to a python.org mail server problem]

On 2009-04-03 22:07, Martin v. Löwis wrote:
>> I'd like to extend the proposal to Python 2.7 and later.
> 
> I don't object, but I also don't want to propose this, so
> I added it to the discussion.
> 
> My (and perhaps other people's) concern is that 2.7 might
> well be the last release of the 2.x series. If so, adding
> this feature to it would make 2.7 an odd special case for
> users and providers of third party tools.

I certainly hope that we'll see more useful features backported
from 3.x to the 2.x series or forward ported from 2.x to 3.x
(depending on what the core developer preferences are).

Regarding this particular PEP, it is quite possible to implement
an importer that provides the functionality for Python 2.3-2.7
versions, so it doesn't have to be an odd special case.

>> That's going to slow down Python package detection a lot - you'd
>> replace an O(1) test with an O(n) scan.
> 
> I question that claim. In traditional Unix systems, the file system
> driver performs a linear search of the directory, so it's rather
> O(n)-in-kernel vs. O(n)-in-Python. Even for advanced file systems,
> you need at least O(log n) to determine whether a specific file is
> in a directory. For all practical purposes, the package directory
> will fit in a single disk block (containing a single .pkg file, and
> one or few subpackages), making listdir complete as fast as stat.

On second thought, you're right, it won't be that costly. It
requires an os.listdir() scan due to the wildcard approach and in
some cases, such a scan may not be possible, e.g. when using
frozen packages. Indeed, the freeze mechanism would not even add
the .pkg files - it only handles .py file content.

The same is true for distutils, MANIFEST generators and other
installer mechanisms - they would have to learn to package
the .pkg files along with the Python files.

Another problem with the .pkg file approach is that the file extension
is already in use for e.g. Mac OS X installers.

You don't have those issues with the __pkg__.py file approach
I suggested.

>> Wouldn't it be better to stick with a simpler approach and look for
>> "__pkg__.py" files to detect namespace packages using that O(1) check ?
> 
> Again - this wouldn't be O(1). More importantly, it breaks system
> packages, which now again have to deal with the conflicting file names
> if they want to install all portions into a single location.

True, but since that means changing the package infrastructure, I think
it's fair to ask distributors who want to use that approach to also take
care of looking into the __pkg__.py files and merging them if
necessary.

Most of the time the __pkg__.py files will be empty, so that's not
really much to ask for.

>> This would also avoid any issues you'd otherwise run into if you want
>> to maintain this scheme in an importer that doesn't have access to a list
>> of files in a package directory, but is well capable for the checking
>> the existence of a file.
> 
> Do you have a specific mechanism in mind?

Yes: frozen modules and imports straight from a web resource.

The .pkg file approach requires a directory scan and additional
support from all importers.

The __pkg__.py approach I suggested can use existing importers
without modifications by checking for the existence of such
a Python module in an importer managed resource.
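
The difference between the two detection strategies can be sketched as
follows for the plain file system case (function names are hypothetical;
a real importer would use its own resource API instead of os):

```python
import fnmatch
import os


def has_marker_module(directory, marker="__pkg__.py"):
    """One existence check: is the marker module present in directory?"""
    return os.path.isfile(os.path.join(directory, marker))


def find_pkg_files(directory):
    """Directory scan: list every entry matching the *.pkg wildcard."""
    return sorted(fnmatch.filter(os.listdir(directory), "*.pkg"))
```

The first function only needs an "exists" primitive, which frozen or
web-based importers can usually provide; the second needs a full
directory listing.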

-- 
Marc-Andre Lemburg
eGenix.com

From solipsis at pitrou.net  Tue Apr  7 14:53:02 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 7 Apr 2009 12:53:02 +0000 (UTC)
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
Message-ID: <loom.20090407T125242-36@post.gmane.org>

<skip <at> pobox.com> writes:
> 
> I don't know thing one about cmake, but if it's good for the goose (building
> Python proper) would it be good for the gander (building extensions)?

African or European?

From ndbecker2 at gmail.com  Tue Apr  7 15:02:13 2009
From: ndbecker2 at gmail.com (Neal Becker)
Date: Tue, 07 Apr 2009 09:02:13 -0400
Subject: [Python-Dev] What's missing from easy_install
Message-ID: <grfism$7j5$1@ger.gmane.org>

1. easy_remove!

2. Various utilities to provide query package management.
   - easy_install --list (list files installed)

From kdr2 at x-macro.com  Tue Apr  7 15:05:01 2009
From: kdr2 at x-macro.com (KDr2)
Date: Tue, 7 Apr 2009 21:05:01 +0800
Subject: [Python-Dev] What's missing from easy_install
In-Reply-To: <grfism$7j5$1@ger.gmane.org>
References: <grfism$7j5$1@ger.gmane.org>
Message-ID: <efc790a30904070605j58347e60jbe0d710da7ffd8ff@mail.gmail.com>

I need a CPyAN.

--
Best Regards,
   -- KDr2, at x-macro.com.

On Tue, Apr 7, 2009 at 9:02 PM, Neal Becker <ndbecker2 at gmail.com> wrote:

> 1. easy_remove!
>
> 2. Various utilities to provide query package management.
>   - easy_install --list (list files installed)

From jnoller at gmail.com  Tue Apr  7 15:06:41 2009
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 7 Apr 2009 09:06:41 -0400
Subject: [Python-Dev] What's missing from easy_install
In-Reply-To: <grfism$7j5$1@ger.gmane.org>
References: <grfism$7j5$1@ger.gmane.org>
Message-ID: <4222a8490904070606s77e8177exeb053c03bc63ae30@mail.gmail.com>

On Tue, Apr 7, 2009 at 9:02 AM, Neal Becker <ndbecker2 at gmail.com> wrote:
> 1. easy_remove!
>
> 2. Various utilities to provide query package management.
>   - easy_install --list (list files installed)

This discussion should happen on the distutils-sig list; not python-dev

From solipsis at pitrou.net  Tue Apr  7 15:06:53 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 7 Apr 2009 13:06:53 +0000 (UTC)
Subject: [Python-Dev] =?utf-8?q?What=27s_missing_from_easy=5Finstall?=
References: <grfism$7j5$1@ger.gmane.org>
Message-ID: <loom.20090407T130608-179@post.gmane.org>

Neal Becker <ndbecker2 <at> gmail.com> writes:
> 
> 2. Various utilities to provide query package management.
>    - easy_install --list (list files installed)

"yolk" will tell you that.
http://pypi.python.org/pypi/yolk

Regards

Antoine.

From cournape at gmail.com  Tue Apr  7 15:08:38 2009
From: cournape at gmail.com (David Cournapeau)
Date: Tue, 7 Apr 2009 22:08:38 +0900
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <18907.17310.201358.697994@montanaro.dyndns.org>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
Message-ID: <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com>

On Tue, Apr 7, 2009 at 9:14 PM,  <skip at pobox.com> wrote:
>
>    Ondrej> ... while scons and other Python solutions imho encourage to
>    Ondrej> write full Python programs, which imho is a disadvantage for the
>    Ondrej> build system, as then every build system is nonstandard.
>
> Hmmm...  Like distutils setup scripts?

Fortunately, waf and scons are much better than distutils, at least
for the build part :)

I think it is hard to overestimate the importance of a python solution
for python software (python itself is different). Having a full-fledged
language for complex builds is nice; I think most people familiar
with complex makefiles would agree with this.

>
> I don't know thing one about cmake, but if it's good for the goose (building
> Python proper) would it be good for the gander (building extensions)?

For complex software, especially software relying on a lot of C and
platform idiosyncrasies, distutils is just too cumbersome and limited.
Both Ondrej and I use python for scientific work, and I think it is no
coincidence that we both look for something else. In those cases, scons -
and cmake it seems - are very nice; build tools are incredibly hard to
get right once you want to manage dependencies automatically.

For simple python projects (pure python, a few .c source files without
many dependencies), I think it is just overkill.

cheers,

David

From alex.neundorf at kitware.com  Tue Apr  7 15:08:54 2009
From: alex.neundorf at kitware.com (Alexander Neundorf)
Date: Tue, 7 Apr 2009 15:08:54 +0200
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <18907.17310.201358.697994@montanaro.dyndns.org>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
Message-ID: <806d41050904070608x3f1f025bu18df4f1c843e7357@mail.gmail.com>

On Tue, Apr 7, 2009 at 2:14 PM,  <skip at pobox.com> wrote:
>
>    Ondrej> ... while scons and other Python solutions imho encourage to
>    Ondrej> write full Python programs, which imho is a disadvantage for the
>    Ondrej> build system, as then every build system is nonstandard.

I fully agree here.

> Hmmm...  Like distutils setup scripts?
>
> I don't know thing one about cmake, but if it's good for the goose (building
> Python proper) would it be good for the gander (building extensions)?

What is involved in building python extensions ? Can you please explain ?

Alex

From cournape at gmail.com  Tue Apr  7 15:23:18 2009
From: cournape at gmail.com (David Cournapeau)
Date: Tue, 7 Apr 2009 22:23:18 +0900
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <806d41050904070608x3f1f025bu18df4f1c843e7357@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<806d41050904070608x3f1f025bu18df4f1c843e7357@mail.gmail.com>
Message-ID: <5b8d13220904070623j258605bob5200dc84362dc11@mail.gmail.com>

On Tue, Apr 7, 2009 at 10:08 PM, Alexander Neundorf
<alex.neundorf at kitware.com> wrote:

>
> What is involved in building python extensions ? Can you please explain ?

Not much: at the core, a python extension is nothing more than a
dynamically loaded library + a couple of options. One choice is
whether to take options from distutils or to set them up
independently. In my own scons tool to build python extensions, both
are possible.
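
As a rough illustration of the "couple of options" involved, the
interpreter can be asked for them directly. This sketch uses the
sysconfig module of current Python versions, which is an assumption on
my part and not the distutils machinery discussed in this thread:

```python
import sysconfig

# File suffix the interpreter expects for extension modules on this platform.
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")

# Directory containing Python.h, needed on the compiler's include path.
include_dir = sysconfig.get_paths()["include"]

print("extension suffix:", ext_suffix)
print("include directory:", include_dir)
```

A build tool then compiles the sources against that include directory
and links a shared library named with that suffix.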

The hard (or rather time consuming) work is to do everything else that
distutils does related to the packaging. That's where scons/waf are
more interesting than cmake IMO, because you can "easily" hand this
task back to distutils, whereas it is inherently more difficult with
cmake.

cheers,

David

From pje at telecommunity.com  Tue Apr  7 16:05:45 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Tue, 07 Apr 2009 10:05:45 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49DB475B.8060504@egenix.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com>
Message-ID: <20090407140317.EBD383A4063@sparrow.telecommunity.com>

At 02:30 PM 4/7/2009 +0200, M.-A. Lemburg wrote:
> >> Wouldn't it be better to stick with a simpler approach and look for
> >> "__pkg__.py" files to detect namespace packages using that O(1) check ?
> >
> > Again - this wouldn't be O(1). More importantly, it breaks system
> > packages, which now again have to deal with the conflicting file names
> > if they want to install all portions into a single location.
>
>True, but since that means changing the package infrastructure, I think
>it's fair to ask distributors who want to use that approach to also take
>care of looking into the __pkg__.py files and merging them if
>necessary.
>
>Most of the time the __pkg__.py files will be empty, so that's not
>really much to ask for.

This means your proposal actually doesn't add any benefit over the 
status quo, where you can have an __init__.py that does nothing but 
declare the package a namespace.  We already have that now, and it 
doesn't need a new filename.  Why would we expect OS vendors to start 
supporting it, just because we name it __pkg__.py instead of __init__.py?

From aahz at pythoncraft.com  Tue Apr  7 16:29:02 2009
From: aahz at pythoncraft.com (Aahz)
Date: Tue, 7 Apr 2009 07:29:02 -0700
Subject: [Python-Dev] Mercurial?
In-Reply-To: <ea2499da0904070457k5d177139u19e53ff6ae8c27c9@mail.gmail.com>
References: <20090404154049.GA23987@panix.com> <873acl7zga.fsf@benfinney.id.au>
	<ea2499da0904062330o68066bacgbcdbca865c57ee0e@mail.gmail.com>
	<200904072142.06158.steve@pearwood.info>
	<ea2499da0904070457k5d177139u19e53ff6ae8c27c9@mail.gmail.com>
Message-ID: <20090407142902.GC13081@panix.com>

On Tue, Apr 07, 2009, Dirkjan Ochtman wrote:
> On Tue, Apr 7, 2009 at 13:42, Steven D'Aprano <steve at pearwood.info> wrote:
>>
>> Perhaps you should ask Aahz what he thinks about being forced to provide
>> two names before being allowed to contribute.

Thanks for speaking up!  I'm not sure I would have noticed the
implication of Dirkjan's post (I'm not paying a huge amount of attention
to the conversion process).

> Huh? The contributor's agreement list would presumably include real
> names only (so Aahz is out of luck), but the names wouldn't need to be
> limited to just one "word".

What you apparently are unaware of is that "Aahz" is in fact my full
legal name.  (Which was clearly the point of Steven's post since he knows
that Teller also has only one legal name -- it's not common, but we do
exist.)
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"...string iteration isn't about treating strings as sequences of strings, 
it's about treating strings as sequences of characters.  The fact that
characters are also strings is the reason we have problems, but characters 
are strings for other good reasons."  --Aahz

From dirkjan at ochtman.nl  Tue Apr  7 16:35:17 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Tue, 7 Apr 2009 16:35:17 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <20090407142902.GC13081@panix.com>
References: <20090404154049.GA23987@panix.com> <873acl7zga.fsf@benfinney.id.au>
	<ea2499da0904062330o68066bacgbcdbca865c57ee0e@mail.gmail.com>
	<200904072142.06158.steve@pearwood.info>
	<ea2499da0904070457k5d177139u19e53ff6ae8c27c9@mail.gmail.com>
	<20090407142902.GC13081@panix.com>
Message-ID: <ea2499da0904070735g5778b190ob0e510fedda1d8ce@mail.gmail.com>

On Tue, Apr 7, 2009 at 16:29, Aahz <aahz at pythoncraft.com> wrote:
> What you apparently are unaware of is that "Aahz" is in fact my full
> legal name.  (Which was clearly the point of Steven's post since he knows
> that Teller also has only one legal name -- it's not common, but we do
> exist.)

Ah, sorry about that. But I hope you also concluded from my email that
that wouldn't be a problem.

Cheers,

Dirkjan

From aahz at pythoncraft.com  Tue Apr  7 16:39:00 2009
From: aahz at pythoncraft.com (Aahz)
Date: Tue, 7 Apr 2009 07:39:00 -0700
Subject: [Python-Dev] Mercurial?
In-Reply-To: <ea2499da0904070735g5778b190ob0e510fedda1d8ce@mail.gmail.com>
References: <20090404154049.GA23987@panix.com> <873acl7zga.fsf@benfinney.id.au>
	<ea2499da0904062330o68066bacgbcdbca865c57ee0e@mail.gmail.com>
	<200904072142.06158.steve@pearwood.info>
	<ea2499da0904070457k5d177139u19e53ff6ae8c27c9@mail.gmail.com>
	<20090407142902.GC13081@panix.com>
	<ea2499da0904070735g5778b190ob0e510fedda1d8ce@mail.gmail.com>
Message-ID: <20090407143900.GA713@panix.com>

On Tue, Apr 07, 2009, Dirkjan Ochtman wrote:
> On Tue, Apr 7, 2009 at 16:29, Aahz <aahz at pythoncraft.com> wrote:
>>
>> What you apparently are unaware of is that "Aahz" is in fact my full
>> legal name.  (Which was clearly the point of Steven's post since he knows
>> that Teller also has only one legal name -- it's not common, but we do
>> exist.)
> 
> Ah, sorry about that. But I hope you also concluded from my email that
> that wouldn't be a problem.

Nope, thanks for clearing it up.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"...string iteration isn't about treating strings as sequences of strings, 
it's about treating strings as sequences of characters.  The fact that
characters are also strings is the reason we have problems, but characters 
are strings for other good reasons."  --Aahz

From dickinsm at gmail.com  Tue Apr  7 16:39:47 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Tue, 7 Apr 2009 15:39:47 +0100
Subject: [Python-Dev] Shorter float repr in Python 3.1?
Message-ID: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>

Executive summary (details and discussion points below)
=================

Some time ago, Noam Raphael pointed out that for a float x,
repr(x) can often be much shorter than it currently is, without
sacrificing the property that eval(repr(x)) == x, and proposed
changing Python accordingly.  See

http://bugs.python.org/issue1580

For example, instead of the current behaviour:

Python 3.1a2+ (py3k:71353:71354, Apr  7 2009, 12:55:16)
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.01
0.01
>>> 0.02
0.02
>>> 0.03
0.029999999999999999
>>> 0.04
0.040000000000000001
>>> 0.04 == eval(repr(0.04))
True

we'd have this:

Python 3.1a2+ (py3k-short-float-repr:71350:71352M, Apr  7 2009, )
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.01
0.01
>>> 0.02
0.02
>>> 0.03
0.03
>>> 0.04
0.04
>>> 0.04 == eval(repr(0.04))
True
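
The two sessions above can be compared from a single interpreter. This
is a small sketch, assuming a Python where the short repr is available
(sys.float_repr_style == 'short'): formatting with 17 significant
digits reproduces the old output, while repr gives the short form, and
both strings convert back to the same float.

```python
# 17 significant digits reproduces the old-style output shown above.
old_style = "%.17g" % 0.03
# repr() on an interpreter with short float repr.
new_style = repr(0.03)

print(old_style)  # 0.029999999999999999
print(new_style)  # 0.03

# Both strings denote the same double, so round-tripping is preserved.
same_value = float(old_style) == float(new_style) == 0.03
print(same_value)  # True
```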

Initial attempts to implement this encountered various
difficulties, and at some point Tim Peters pointed out
(I'm paraphrasing horribly here) that one can't have all
three of {fast, easy, correct}.

One PyCon 2009 sprint later, Eric Smith and I have
produced the py3k-short-float-repr branch, which implements
short repr of floats and also does some major cleaning
up of the current float formatting functions.
We've gone for the {fast, correct} pairing.
We'd like to get this into Python 3.1.

Any thoughts/objections/counter-proposals/...?

More details
============
Our solution is based on an adaptation of David Gay's
'perfect rounding' code for inclusion in Python.  To make
eval(repr(x)) roundtripping work, one needs to have
correctly rounded float -> decimal *and* decimal -> float
conversions:  Gay's code provides correctly rounded
dtoa and strtod functions for these two conversions.
His code is well-known and well-tested:  it's used as the
basis of the glibc strtod, and is also in OS X.  It's
available from

http://www.netlib.org/fp/dtoa.c

So our branch contains a new file Python/dtoa.c,
which is a cut down version of Gay's original file. (We've
removed stuff for VAX and IBM floating-point formats,
hex NaNs, hex floating-point formats, locale-aware
interpretation of the decimal separator, K&R headers,
code for correct setting of the inexact flag, and various
other bits and pieces that Python doesn't care about.)
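
To make the goal concrete, here is a deliberately naive sketch of what
"shortest string that round-trips" means. It is not Gay's algorithm
and not what the branch does: it simply tries 1 to 17 significant
digits and stops at the first string that converts back to the same
float. It relies on the platform's conversions being correctly
rounded in both directions, which is exactly what dtoa.c provides.
The function name shortest_roundtrip is made up for the example.

```python
def shortest_roundtrip(x):
    """Return the shortest '%g'-style string s with float(s) == x.

    Naive illustration for finite floats only: 17 significant digits
    are always enough to identify an IEEE 754 double.
    """
    for ndigits in range(1, 18):
        s = "%.*g" % (ndigits, x)
        if float(s) == x:
            return s
    raise ValueError("not a finite float: %r" % (x,))


print(shortest_roundtrip(0.03))
print(shortest_roundtrip(0.1))
print(shortest_roundtrip(1.0 / 3.0))
```

The loop performs up to 17 conversions per value, which is the "easy,
correct" corner of the trade-off; Gay's code gets the same digits
without the trial and error.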

Most of the rest of the work is in the existing file
Python/pystrtod.c.  Every float -> string or string -> float
conversion goes through a function in this file at
some point.

Gay's code also provides the opportunity to clean
up the current float formatting code, and Eric has
reworked a lot of the float formatting in the py3k-short-float-repr
branch.  This reworking should make finishing off the
implementation of things like thousands separators much
more straightforward.

One example of this:  the previous string -> float conversion
used the system strtod, which is locale-aware, so the code
had to first replace the '.' by the current locale's decimal
separator, *then* call strtod.  There was a similar dance in
the reverse direction when doing float -> string conversion.
Both these are now unnecessary.

The current code is pretty close to ready for merging
to py3k.  I've uploaded a patchset to Rietveld:

http://codereview.appspot.com/33084/show

Apart from the short float repr, and a couple of bugfixes,
all behaviour should be unchanged from before.  There
are a few exceptions:

 - format(1e200, '<') doesn't behave quite as it did
   before.  See item (3) below for details

 - repr switches to using exponential notation at
   1e16 instead of the previous 1e17.  This avoids
   a subtle issue where the 'short float repr' result
   is padded with bogus zeros.

 - a similar change applies to str, which switches
   to exponential notation at 1e11, not 1e12.  This
   fixes the following minor annoyance, which goes
   back at least as far as Python 2.5 (and probably
   much further):

   >>> x = 1e11 + 0.5
   >>> x
   100000000000.5
   >>> print(x)
   100000000000.0

    That .0 seems wrong to me:  if we're going to
    go to the trouble of printing extra digits (str
    usually only gives 12 significant digits; here
    there are 13), they should be the *right* extra digits.
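
The str annoyance and the new repr threshold can be reproduced with
plain formatting. This is a sketch, assuming a current CPython with
short float repr; "%.12g" stands in for the 12 significant digits that
str used at the time.

```python
x = 1e11 + 0.5

# Twelve significant digits cannot hold the trailing .5, so the old
# str() printed this and then appended ".0".
twelve_digits = "%.12g" % x
print(twelve_digits)  # 100000000000

# The short repr keeps the digit that is actually there.
print(repr(x))        # 100000000000.5

# repr switches to exponential notation at 1e16.
print(repr(1e15))     # 1000000000000000.0
print(repr(1e16))     # 1e+16
```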

Discussion points
=================

(1) Any objections to including this into py3k?  If there's
controversy, then I guess we'll need a PEP.

(2) Should other Python implementations (Jython,
IronPython, etc.) be expected to use short float repr, or should
it just be considered an implementation detail of CPython?
I propose the latter, except that all implementations should
be required to satisfy eval(repr(x)) == x for finite floats x.

(3) There's a PEP 3101 line we don't know what to do with.
In py3k, we currently have:

>>> format(1e200, '<')
'1.0e+200'

but in our py3k-short-float-repr branch:

>>> format(1e200, '<')
'1e+200'

Which is correct? The py3k behaviour
comes from the 'Standard Format Specifiers' section of
PEP 3101, where it says:

"""
The available floating point presentation types are:

[... list of other format codes omitted here ...]

'' (None) - similar to 'g', except that it prints at least one
              digit after the decimal point.
"""

It's that 'at least one digit after the decimal point' bit
that's at issue.  I understood this to apply only to
floats converted to a string *without* an exponent;
this is the way that repr and str work, adding a .0
to floats formatted without an exponent, but leaving
the .0 out when the exponent is present.

Should the .0 always be added?  Or is it required
only when it would be necessary to distinguish
a float string from an integer string?

My preference is for the latter (i.e., format(x, '<')
should behave in the same way as repr and str
in this respect).  But I'm biased, not least because
the other behaviour would be a pain to implement.
Does anyone care?
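
For reference, the two readings of "at least one digit after the
decimal point" can be checked directly. The sketch below assumes a
current CPython, where the empty presentation type follows the
behaviour of the branch: a .0 is added only when there is no exponent.

```python
# No presentation type: like 'g', but with a ".0" where needed to
# keep the result looking like a float.
no_exponent = format(1.0, "<")
with_exponent = format(1e200, "<")

# Explicit 'g' for comparison: no ".0" in either case.
g_no_exponent = format(1.0, "g")
g_with_exponent = format(1e200, "g")

print(no_exponent, with_exponent)      # 1.0 1e+200
print(g_no_exponent, g_with_exponent)  # 1 1e+200
```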

This email is already too long.  I'll stop now.

Mark

From mal at egenix.com  Tue Apr  7 16:58:39 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 07 Apr 2009 16:58:39 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090407140317.EBD383A4063@sparrow.telecommunity.com>
References: <49D4DA72.60401@v.loewis.de>
	<49D52115.6020001@egenix.com>	<49D66C6E.3090602@v.loewis.de>
	<49DB475B.8060504@egenix.com>
	<20090407140317.EBD383A4063@sparrow.telecommunity.com>
Message-ID: <49DB6A1F.50801@egenix.com>

On 2009-04-07 16:05, P.J. Eby wrote:
> At 02:30 PM 4/7/2009 +0200, M.-A. Lemburg wrote:
>> >> Wouldn't it be better to stick with a simpler approach and look for
>> >> "__pkg__.py" files to detect namespace packages using that O(1)
>> check ?
>> >
>> > Again - this wouldn't be O(1). More importantly, it breaks system
>> > packages, which now again have to deal with the conflicting file names
>> > if they want to install all portions into a single location.
>>
>> True, but since that means changing the package infrastructure, I think
>> it's fair to ask distributors who want to use that approach to also take
>> care of looking into the __pkg__.py files and merging them if
>> necessary.
>>
>> Most of the time the __pkg__.py files will be empty, so that's not
>> really much to ask for.
> 
> This means your proposal actually doesn't add any benefit over the
> status quo, where you can have an __init__.py that does nothing but
> declare the package a namespace.  We already have that now, and it
> doesn't need a new filename.  Why would we expect OS vendors to start
> supporting it, just because we name it __pkg__.py instead of __init__.py?

I lost you there.

Since when do we support namespace packages in core Python without
the need to add some form of magic support code to __init__.py ?

My suggestion basically builds on the same idea as Martin's PEP,
but uses a single __pkg__.py file as opposed to some non-Python
file yaddayadda.pkg.

Here's a copy of the proposal, with some additional discussion
bullets added:

"""
Alternative Approach:
---------------------

Wouldn't it be better to stick with a simpler approach and look for
"__pkg__.py" files to detect namespace packages using that O(1) check ?

This would also avoid any issues you'd otherwise run into if you want
to maintain this scheme in an importer that doesn't have access to a list
of files in a package directory, but is perfectly capable of checking
for the existence of a file.

Mechanism:
----------

If the import mechanism finds a matching namespace package (a directory
with a __pkg__.py file), it then goes into namespace package scan mode and
scans the complete sys.path for more occurrences of the same namespace
package.

The import loads all __pkg__.py files of matching namespace packages
having the same package name during the search.

One of the namespace packages, the defining namespace package, will have
to include a __init__.py file.

After having scanned all matching namespace packages and loading
the __pkg__.py files in the order of the search, the import mechanism
then sets the package's .__path__ attribute to include all namespace
package directories found on sys.path and finally executes the
__init__.py file.

(Please let me know if the above is not clear, I will then try to
follow up on it.)
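
The scan described above can be sketched in a few lines of Python.
This is only an illustration of the proposed mechanism, not an
existing import hook: the __pkg__.py marker is the hypothetical file
from this proposal, and the function name scan_namespace_package, the
package name "ns" and the temporary directory layout are all made up
for the example.

```python
import os
import tempfile

MARKER = "__pkg__.py"  # hypothetical marker file from the proposal


def scan_namespace_package(name, search_path):
    """Return (portions, defining) for a namespace package.

    portions: every <entry>/<name> directory containing MARKER, in
    search order (this would become the package's __path__).
    defining: the portions that also contain an __init__.py.
    """
    portions = [
        os.path.join(entry, name)
        for entry in search_path
        if os.path.isfile(os.path.join(entry, name, MARKER))
    ]
    defining = [
        p for p in portions
        if os.path.isfile(os.path.join(p, "__init__.py"))
    ]
    return portions, defining


def _touch(path):
    """Create an empty file, including any missing parent directories."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


# Demo layout: two portions of "ns", the second one is the defining one.
root = tempfile.mkdtemp()
path_a = os.path.join(root, "a")
path_b = os.path.join(root, "b")
path_c = os.path.join(root, "c")
_touch(os.path.join(path_a, "ns", MARKER))
_touch(os.path.join(path_b, "ns", MARKER))
_touch(os.path.join(path_b, "ns", "__init__.py"))
os.makedirs(path_c)  # a path entry without the package

portions, defining = scan_namespace_package("ns", [path_a, path_b, path_c])
print(portions)
print(defining)
```

Note that each path entry costs one existence check per package name,
with no directory listing involved.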

Discussion:
-----------

The above mechanism allows the same kind of flexibility we already
have with the existing normal __init__.py mechanism.

* It doesn't add yet another .pth-style sys.path extension (which are
difficult to manage in installations).

* It always uses the same naive sys.path search strategy. The strategy
is not determined by some file contents.

* The search is only done once - on the first import of the package.

* It's possible to have a defining package dir and add-on package
dirs.

* The search does not depend on the order of directories in sys.path.
There's no requirement for the defining package to appear first
on sys.path.

* Namespace packages are easy to recognize by testing for a single
resource.

* There's no conflict with existing files using the .pkg extension
such as Mac OS X installer files or Solaris packages.

* Namespace __pkg__.py modules can provide extra meta-information,
logging, etc. to simplify debugging namespace package setups.

* It's possible to freeze such setups, to put them into ZIP files,
or only have parts of it in a ZIP file and the other parts in the
file-system.

* There's no need for a package directory scan, allowing the
mechanism to also work with resources that do not permit
(easy and efficient) scanning of the contents of a package "directory",
e.g. frozen packages or imports from web resources.

Caveats:

* Changes to sys.path will not result in an automatic rescan for
additional namespace packages, if the package was already loaded.
However, we could have a function to make such a rescan explicit.
"""

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Apr 07 2009)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2009-03-19: Released mxODBC.Connect 1.0.1      http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From Scott.Daniels at Acm.Org  Tue Apr  7 17:04:56 2009
From: Scott.Daniels at Acm.Org (Scott David Daniels)
Date: Tue, 07 Apr 2009 08:04:56 -0700
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <49DAE298.7040007@canterbury.ac.nz>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>	<greakt$tds$1@ger.gmane.org>
	<49DAE298.7040007@canterbury.ac.nz>
Message-ID: <grfpr8$2q4$1@ger.gmane.org>

Greg Ewing wrote:
> Steve Holden wrote:
> 
>> Isn't it strange how nobody ever complained about the significance of
>> whitespace in makefiles: only the fact that leading tabs were required
>> rather than just-any-old whitespace.
> 
> Make doesn't care how *much* whitespace there
> is, though, only whether it's there or not. If
> it accepted anything that looks like whitespace,
> there would be no cause for complaint.
> 
Make and the *roff formats had the nasty feature that they treated
homographs differently.  That is, you could print two sources that
placed all the same ink on the paper at the same places, but they
would perform differently.  For make it was tabs.  For the *roff
files, the periods ending sentences and the periods for abbreviations
(such as honorifics) were distinguished by following end-of-sentence
periods with two spaces.  This left any line ending in a period
ambiguous, and tools to strip whitespace off the end of lines as
information-destroying.

--Scott David Daniels
Scott.Daniels at Acm.Org

From ronaldoussoren at mac.com  Tue Apr  7 17:10:01 2009
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Tue, 07 Apr 2009 17:10:01 +0200
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <ca471dc20904021557w11b5556aif88522fb46714211@mail.gmail.com>
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
	<78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1>
	<ca471dc20904021419y33932768j8bc7e97e46361d6a@mail.gmail.com>
	<C16A2E80A5854CBEB189C765E6DB7561@RaymondLaptop1>
	<ca471dc20904021557w11b5556aif88522fb46714211@mail.gmail.com>
Message-ID: <B9745066-B44B-4117-A890-5C127E2D3741@mac.com>

On 3 Apr, 2009, at 0:57, Guido van Rossum wrote:
>>
>
> The primary use case is some kind of trap on assignment. While this
> cannot cover all cases, most non-local variables are stored in dicts.
> List mutations are not in the same league, as use case.

I have a slightly different use-case than a debugger, although it
boils down to "some kind of trap on assignment": implementing
Key-Value Observing support for Python objects in PyObjC.  "Key-Value
Observing" is a technique in Cocoa where you can get callbacks when a
property of an object changes, and it is something I cannot support for
plain Python objects at the moment due to the lack of a callback
mechanism.  A full implementation would require hooks for mutation of
lists and sets as well.

The lack of mutation hooks is not a terrible problem for PyObjC: we
can always use Cocoa data structures when using KVO, but it is
somewhat annoying that Cocoa data structures leak into code that could
be pure Python just because I want to use KVO.
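
As a pure-Python illustration of the "trap on assignment" idea (a
sketch only: the class name ObservedDict and its observe method are
made up here, and this is not how PyObjC or the proposed C-level hook
works), a dict subclass can notify observers when an item is set:

```python
class ObservedDict(dict):
    """dict that calls observers on item assignment (illustrative only)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._observers = []

    def observe(self, callback):
        """Register callback(key, old_value, new_value)."""
        self._observers.append(callback)

    def __setitem__(self, key, value):
        old = self.get(key)
        super().__setitem__(key, value)
        for callback in self._observers:
            callback(key, old, value)


changes = []
d = ObservedDict(x=1)
d.observe(lambda key, old, new: changes.append((key, old, new)))

d["x"] = 2      # observed
d["y"] = 3      # observed, old value reported as None
d.update(x=10)  # NOT observed: dict.update bypasses __setitem__

print(changes)  # [('x', 1, 2), ('y', None, 3)]
```

The last line shows why a subclass is not enough: writes that go
through the C implementation never reach the Python-level override,
which is the gap a hook in PyDict_SetItem would close.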

Ronald

From techtonik at gmail.com  Tue Apr  7 17:10:08 2009
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 7 Apr 2009 18:10:08 +0300
Subject: [Python-Dev] os.defpath for Windows
In-Reply-To: <494E0A2B.4080704@gmail.com>
References: <494E0A2B.4080704@gmail.com>
Message-ID: <d34314100904070810m84bfa9v43749b9642811e78@mail.gmail.com>

Hi,

I've added the issue to tracker. http://bugs.python.org/issue5717

--anatoly t.

On Sun, Dec 21, 2008 at 12:19 PM, Yinon Ehrlich <yinon.me at gmail.com> wrote:
> Hi,
>
> just saw that os.defpath for Windows is defined as
>        Lib/ntpath.py:30:defpath = '.;C:\\bin'
>
> Most Windows machines I have seen have no c:\bin directory.
>
> Any reason why it was defined this way?
> Thanks,
>        Yinon

From skip at pobox.com  Tue Apr  7 17:19:25 2009
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 7 Apr 2009 10:19:25 -0500
Subject: [Python-Dev] pyc files,
 constant folding and borderline portability issues
In-Reply-To: <op.urz9neqq03jqhe@cesareprova.org>
References: <loom.20090329T143625-223@post.gmane.org>
	<op.uryyh7uh03jqhe@cesareprova.org>
	<ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com>
	<200904071010.16855.steve@pearwood.info>
	<op.urz9neqq03jqhe@cesareprova.org>
Message-ID: <18907.28413.42458.631358@montanaro.dyndns.org>

    Cesare> The only difference at this time regards invalid operations,
    Cesare> which will raise exceptions at compile time, not at running
    Cesare> time.

    Cesare> So if you write:

    Cesare> a = 1 / 0

    Cesare> an exception will be raised at compile time.

I think I have to call *bzzzzt* here.  This is a common technique used
during debugging.  Insert a 1/0 to force an exception (possibly causing the
running program to drop into pdb).  I think you have to leave that in.

Skip

From skip at pobox.com  Tue Apr  7 17:22:05 2009
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 7 Apr 2009 10:22:05 -0500
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <loom.20090407T125242-36@post.gmane.org>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<loom.20090407T125242-36@post.gmane.org>
Message-ID: <18907.28573.46660.915761@montanaro.dyndns.org>

    >> I don't know thing one about cmake, but if it's good for the goose
    >> (building Python proper) would it be good for the gander (building
    >> extensions)?

    Antoine> African or European?

I was thinking Canadian... 

Skip

From cesare.dimauro at a-tono.com  Tue Apr  7 17:19:10 2009
From: cesare.dimauro at a-tono.com (Cesare Di Mauro)
Date: Tue, 07 Apr 2009 17:19:10 +0200
Subject: [Python-Dev] pyc files,
 constant folding and borderline portability issues
In-Reply-To: <18907.28413.42458.631358@montanaro.dyndns.org>
References: <loom.20090329T143625-223@post.gmane.org>
	<op.uryyh7uh03jqhe@cesareprova.org>
	<ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com>
	<200904071010.16855.steve@pearwood.info>
	<op.urz9neqq03jqhe@cesareprova.org>
	<18907.28413.42458.631358@montanaro.dyndns.org>
Message-ID: <op.ur0vh8jf03jqhe@cesareprova.org>

On 07 April 2009 at 17:19:25, <skip at pobox.com> wrote:

>
>     Cesare> The only difference at this time regards invalid operations,
>     Cesare> which will raise exceptions at compile time, not at running
>     Cesare> time.
>
>     Cesare> So if you write:
>
>     Cesare> a = 1 / 0
>
>     Cesare> an exception will be raised at compile time.
>
> I think I have to call *bzzzzt* here.  This is a common technique used
> during debugging.  Insert a 1/0 to force an exception (possibly causing the
> running program to drop into pdb).  I think you have to leave that in.
>
> Skip

Many tests rely on this, and I have changed them from something like:

try:
    1 / 0
except:
    ...

to

try:
    a = 1; a / 0
except:
    ...

But I know that it's a major source of incompatibilities, and in the final
code I'll enable it only if the user asks for it (through a flag).
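
For comparison, this is the behaviour that the debugging idiom relies
on. The sketch assumes a standard CPython without the folding change
discussed here: compiling the statement succeeds, and the exception
appears only when the code object is executed.

```python
source = "a = 1 / 0"

# Compilation succeeds: the division is not evaluated at compile time.
code = compile(source, "<example>", "exec")

raised_at_runtime = False
try:
    exec(code, {})
except ZeroDivisionError:
    # The error surfaces only when the statement actually runs.
    raised_at_runtime = True

print(raised_at_runtime)  # True
```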

Cesare

From cournape at gmail.com  Tue Apr  7 17:29:02 2009
From: cournape at gmail.com (David Cournapeau)
Date: Wed, 8 Apr 2009 00:29:02 +0900
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49DB6A1F.50801@egenix.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com>
	<20090407140317.EBD383A4063@sparrow.telecommunity.com>
	<49DB6A1F.50801@egenix.com>
Message-ID: <5b8d13220904070829j416b2536u885cd79a33ebefb5@mail.gmail.com>

On Tue, Apr 7, 2009 at 11:58 PM, M.-A. Lemburg <mal at egenix.com> wrote:

>>
>> This means your proposal actually doesn't add any benefit over the
>> status quo, where you can have an __init__.py that does nothing but
>> declare the package a namespace.  We already have that now, and it
>> doesn't need a new filename.  Why would we expect OS vendors to start
>> supporting it, just because we name it __pkg__.py instead of __init__.py?
>
> I lost you there.
>
> Since when do we support namespace packages in core Python without
> the need to add some form of magic support code to __init__.py ?

I think P.J. Eby is referring to the problem that most packaging systems
don't like several packages to contain the same file, be it empty or not.
That's my main personal gripe with namespace packages, and from this
POV, I think it is fair to say the proposal does not solve anything.
Not that I have a solution, of course :)

cheers,

David

From regebro at gmail.com  Tue Apr  7 17:34:37 2009
From: regebro at gmail.com (Lennart Regebro)
Date: Tue, 7 Apr 2009 17:34:37 +0200
Subject: [Python-Dev] What's missing from easy_install
In-Reply-To: <efc790a30904070605j58347e60jbe0d710da7ffd8ff@mail.gmail.com>
References: <grfism$7j5$1@ger.gmane.org>
	<efc790a30904070605j58347e60jbe0d710da7ffd8ff@mail.gmail.com>
Message-ID: <319e029f0904070834j47066061h10c3c9aafe9dd9c9@mail.gmail.com>

On Tue, Apr 7, 2009 at 15:05, KDr2 <kdr2 at x-macro.com> wrote:
> I need an CPyAN.

On the lighter side of things: That would be pronounced "spy-ann",
which means "the vomit" in Swedish.  Do you still want it? :-D

-- 
Lennart Regebro: Pythonista, Barista, Notsotrista.
http://regebro.wordpress.com/
+33 661 58 14 64

From fuzzyman at voidspace.org.uk  Tue Apr  7 17:41:06 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Tue, 07 Apr 2009 16:41:06 +0100
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
Message-ID: <49DB7412.9030404@voidspace.org.uk>

Mark Dickinson wrote:
> [snip...]
>   
> Discussion points
> =================
>
> (1) Any objections to including this into py3k?  If there's
> controversy, then I guess we'll need a PEP.
>   

Big +1
> (2) Should other Python implementations (Jython,
> IronPython, etc.) be expected to use short float repr, or should
> it just be considered an implementation detail of CPython?
> I propose the latter, except that all implementations should
> be required to satisfy eval(repr(x)) == x for finite floats x.
>   
Short float repr should be an implementation detail, so long as 
eval(repr(x)) == x still holds.

Michael Foord

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From p.f.moore at gmail.com  Tue Apr  7 17:51:35 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 7 Apr 2009 16:51:35 +0100
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <79990c6b0904070850l7513d9b7y2863d347d87d7e6f@mail.gmail.com>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49DB7412.9030404@voidspace.org.uk>
	<79990c6b0904070850l7513d9b7y2863d347d87d7e6f@mail.gmail.com>
Message-ID: <79990c6b0904070851k9ea2054o4864ccd4fb0c9b35@mail.gmail.com>

It would have helped if I'd copied the list...

Sorry,
Paul.

2009/4/7 Paul Moore <p.f.moore at gmail.com>:
> 2009/4/7 Michael Foord <fuzzyman at voidspace.org.uk>:
>> Mark Dickinson wrote:
>>>
>>> [snip...]
>>> Discussion points
>>> =================
>>>
>>> (1) Any objections to including this into py3k?  If there's
>>> controversy, then I guess we'll need a PEP.
>>>
>>
>> Big +1
>>>
>>> (2) Should other Python implementations (Jython,
>>> IronPython, etc.) be expected to use short float repr, or should
>>> it just be considered an implementation detail of CPython?
>>> I propose the latter, except that all implementations should
>>> be required to satisfy eval(repr(x)) == x for finite floats x.
>>>
>>
>> Short float repr should be an implementation detail, so long as
>> eval(repr(x)) == x still holds.
>
> What he said :-)
> Paul.
>

From eric at trueblade.com  Tue Apr  7 17:55:34 2009
From: eric at trueblade.com (Eric Smith)
Date: Tue, 07 Apr 2009 11:55:34 -0400
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
Message-ID: <49DB7776.3010500@trueblade.com>

Mark Dickinson wrote:
> One PyCon 2009 sprint later, Eric Smith and I have
> produced the py3k-short-float-repr branch, which implements
> short repr of floats and also does some major cleaning
> up of the current float formatting functions.
> We've gone for the {fast, correct} pairing.
> We'd like to get this into Python 3.1.
> 
> Any thoughts/objections/counter-proposals/...?

As part of the decision process, we've tried this on several buildbots, 
and it has been successful on at least:

AMD64 Gentoo:
http://www.python.org/dev/buildbot/3.x/amd64%20gentoo%203.x/builds/592

PPC Debian unstable:
http://www.python.org/dev/buildbot/3.x/ppc%20Debian%20unstable%203.x/builds/584

Sparc Solaris 10:
http://www.python.org/dev/buildbot/3.x/sparc%20solaris10%20gcc%203.x/builds/493 

The Sparc test failed, but that wasn't our fault! Our tests succeeded.

These builds are in addition to x86 Linux and x86 Mac, which we've 
developed on.

Eric.

From aahz at pythoncraft.com  Tue Apr  7 18:01:31 2009
From: aahz at pythoncraft.com (Aahz)
Date: Tue, 7 Apr 2009 09:01:31 -0700
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
Message-ID: <20090407160130.GA1220@panix.com>

On Tue, Apr 07, 2009, Mark Dickinson wrote:
>
> Executive summary (details and discussion points below)
> =================
> 
> Some time ago, Noam Raphael pointed out that for a float x,
> repr(x) can often be much shorter than it currently is, without
> sacrificing the property that eval(repr(x)) == x, and proposed
> changing Python accordingly.  See
>
> http://bugs.python.org/issue1580

Sounds good to me!  
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"...string iteration isn't about treating strings as sequences of strings, 
it's about treating strings as sequences of characters.  The fact that
characters are also strings is the reason we have problems, but characters 
are strings for other good reasons."  --Aahz

From guido at python.org  Tue Apr  7 18:19:37 2009
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Apr 2009 09:19:37 -0700
Subject: [Python-Dev] Adding new features to Python 2.x (PEP 382:
	Namespace Packages)
In-Reply-To: <49DB4624.604@egenix.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> 
	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> 
	<49DB4624.604@egenix.com>
Message-ID: <ca471dc20904070919m1bd08dbdj259eef076a3d7319@mail.gmail.com>

On Tue, Apr 7, 2009 at 5:25 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2009-04-06 15:21, Jesse Noller wrote:
>> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> On 2009-04-02 17:32, Martin v. Löwis wrote:
>>>> I propose the following PEP for inclusion to Python 3.1.
>>> Thanks for picking this up.
>>>
>>> I'd like to extend the proposal to Python 2.7 and later.
>>>
>>
>> -1 to adding it to the 2.x series. There was much discussion around
>> adding features to 2.x *and* 3.0, and the consensus seemed to *not*
>> add new features to 2.x and use those new features as carrots to help
>> lead people into 3.0.
>
> I must have missed that discussion :-)
>
> Where's the PEP pinning this down ?
>
> The Python 2.x user base is huge and the number of installed
> applications even larger.
>
> Cutting these users and application developers off of important new
> features added to Python 3 is only going to work as "carrot" for
> those developers who:
>
>  * have enough resources (time, money, manpower) to port their existing
>    application to Python 3
>
>  * can persuade their users to switch to Python 3
>
>  * don't rely much on 3rd party libraries (the bread and butter
>    of Python applications)
>
> Realistically, such a porting effort is not likely going to happen
> for any decent sized application, except perhaps a few open source
> ones.
>
> Such a policy would then translate to a dead end for Python 2.x
> based applications.

Think of the advantages though! Python 2 will finally become *stable*.
The group of users you are talking to are usually balking at the
thought of upgrading from 2.x to 2.(x+1) just as much as they might
balk at the thought of Py3k. We're finally giving them what they
really want.

Regarding calling this a dead end, we're committed to supporting 2.x
for at least five years. If that's not enough, well, it's open source,
so there's no reason why some group of rogue 2.x fans can't maintain
it indefinitely after that.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Apr  7 18:25:53 2009
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Apr 2009 09:25:53 -0700
Subject: [Python-Dev] pyc files,
	constant folding and borderline 	portability issues
In-Reply-To: <op.ur0vh8jf03jqhe@cesareprova.org>
References: <loom.20090329T143625-223@post.gmane.org>
	<op.uryyh7uh03jqhe@cesareprova.org> 
	<ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com> 
	<200904071010.16855.steve@pearwood.info>
	<op.urz9neqq03jqhe@cesareprova.org> 
	<18907.28413.42458.631358@montanaro.dyndns.org>
	<op.ur0vh8jf03jqhe@cesareprova.org>
Message-ID: <ca471dc20904070925g44d6a143l2bf8392faf4c0af6@mail.gmail.com>

Well I'm sorry Cesare but this is unacceptable. As Skip points out
there is plenty of code that relies on this. Also, consider what
"problem" you are trying to solve here. What is the benefit to the
user of moving this error to compile time? I cannot see any.

--Guido

On Tue, Apr 7, 2009 at 8:19 AM, Cesare Di Mauro
<cesare.dimauro at a-tono.com> wrote:
> On 07 April 2009 at 17:19:25, <skip at pobox.com> wrote:
>
>>
>>     Cesare> The only difference at this time concerns invalid operations,
>>     Cesare> which will raise exceptions at compile time, not at run
>>     Cesare> time.
>>
>>     Cesare> So if you write:
>>
>>     Cesare> a = 1 / 0
>>
>>     Cesare> an exception will be raised at compile time.
>>
>> I think I have to call *bzzzzt* here.  This is a common technique used
>> during debugging.  Insert a 1/0 to force an exception (possibly causing the
>> running program to drop into pdb).  I think you have to leave that in.
>>
>> Skip
>
> Many tests rely on this, and I have changed them from something like:
>
> try:
>   1 / 0
> except:
>   ....
>
> to
>
> try:
>   a = 1; a / 0
> except:
>   ....
>
> But I know that it's a major source of incompatibilities, and in the final
> code I'll enable it only if the user asks for it (through a flag).
>
> Cesare

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From aahz at pythoncraft.com  Tue Apr  7 18:34:49 2009
From: aahz at pythoncraft.com (Aahz)
Date: Tue, 7 Apr 2009 09:34:49 -0700
Subject: [Python-Dev] calling dictresize outside dictobject.c
In-Reply-To: <6CE3CEB2-0753-4708-99A5-78F2B05A054C@colgate.edu>
References: <6CE3CEB2-0753-4708-99A5-78F2B05A054C@colgate.edu>
Message-ID: <20090407163449.GA10119@panix.com>

On Mon, Apr 06, 2009, Dan Schult wrote:
>
> I'm trying to write a C extension which is a subclass of dict.
> I want to do something like a setdefault() but with a single lookup.

python-dev is for core development, not for questions about using Python.
Please use comp.lang.python or the capi-sig list.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"...string iteration isn't about treating strings as sequences of strings, 
it's about treating strings as sequences of characters.  The fact that
characters are also strings is the reason we have problems, but characters 
are strings for other good reasons."  --Aahz

From cesare.dimauro at a-tono.com  Tue Apr  7 18:46:29 2009
From: cesare.dimauro at a-tono.com (Cesare Di Mauro)
Date: Tue, 7 Apr 2009 18:46:29 +0200 (CEST)
Subject: [Python-Dev] pyc files,
 constant folding and borderline  portability issues
In-Reply-To: <ca471dc20904070925g44d6a143l2bf8392faf4c0af6@mail.gmail.com>
References: <loom.20090329T143625-223@post.gmane.org>
	<op.uryyh7uh03jqhe@cesareprova.org> 
	<ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com> 
	<200904071010.16855.steve@pearwood.info>
	<op.urz9neqq03jqhe@cesareprova.org> 
	<18907.28413.42458.631358@montanaro.dyndns.org>
	<op.ur0vh8jf03jqhe@cesareprova.org>
	<ca471dc20904070925g44d6a143l2bf8392faf4c0af6@mail.gmail.com>
Message-ID: <56851.151.53.159.5.1239122789.squirrel@webmail6.pair.com>

On Tue, Apr 7, 2009 06:25PM, Guido van Rossum wrote:
> Well I'm sorry Cesare but this is unacceptable. As Skip points out
> there is plenty of code that relies on this.

Guido, as I already said, in the final code the normal Python behaviour
will be kept, and the stricter one will be enabled solely due to a flag
set by the user.

> Also, consider what
> "problem" you are trying to solve here. What is the benefit to the
> user of moving this error to compile time? I cannot see any.
>
> --Guido

In my experience it's better to discover a bug at compile time rather than
at running time.

Cesare

> On Tue, Apr 7, 2009 at 8:19 AM, Cesare Di Mauro
> <cesare.dimauro at a-tono.com> wrote:
>> On 07 April 2009 at 17:19:25, <skip at pobox.com> wrote:
>>
>>>
>>>     Cesare> The only difference at this time concerns invalid operations,
>>>     Cesare> which will raise exceptions at compile time, not at run
>>>     Cesare> time.
>>>
>>>     Cesare> So if you write:
>>>
>>>     Cesare> a = 1 / 0
>>>
>>>     Cesare> an exception will be raised at compile time.
>>>
>>> I think I have to call *bzzzzt* here.  This is a common technique used
>>> during debugging.  Insert a 1/0 to force an exception (possibly causing the
>>> running program to drop into pdb).  I think you have to leave that in.
>>>
>>> Skip
>>
>> Many tests rely on this, and I have changed them from something like:
>>
>> try:
>>   1 / 0
>> except:
>>   ....
>>
>> to
>>
>> try:
>>   a = 1; a / 0
>> except:
>>   ....
>>
>> But I know that it's a major source of incompatibilities, and in the final
>> code I'll enable it only if the user asks for it (through a flag).
>>
>> Cesare
>
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
>

From guido at python.org  Tue Apr  7 19:22:15 2009
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Apr 2009 10:22:15 -0700
Subject: [Python-Dev] pyc files,
	constant folding and borderline 	portability issues
In-Reply-To: <56851.151.53.159.5.1239122789.squirrel@webmail6.pair.com>
References: <loom.20090329T143625-223@post.gmane.org>
	<op.uryyh7uh03jqhe@cesareprova.org> 
	<ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com> 
	<200904071010.16855.steve@pearwood.info>
	<op.urz9neqq03jqhe@cesareprova.org> 
	<18907.28413.42458.631358@montanaro.dyndns.org>
	<op.ur0vh8jf03jqhe@cesareprova.org> 
	<ca471dc20904070925g44d6a143l2bf8392faf4c0af6@mail.gmail.com> 
	<56851.151.53.159.5.1239122789.squirrel@webmail6.pair.com>
Message-ID: <ca471dc20904071022t3d299337rc4baf9651243f769@mail.gmail.com>

On Tue, Apr 7, 2009 at 9:46 AM, Cesare Di Mauro
<cesare.dimauro at a-tono.com> wrote:
> On Tue, Apr 7, 2009 06:25PM, Guido van Rossum wrote:
>> Well I'm sorry Cesare but this is unacceptable. As Skip points out
>> there is plenty of code that relies on this.
>
> Guido, as I already said, in the final code the normal Python behaviour
> will be kept, and the stricter one will be enabled solely due to a flag
> set by the user.

Ok.

>> Also, consider what
>> "problem" you are trying to solve here. What is the benefit to the
>> user of moving this error to compile time? I cannot see any.
>>
>> --Guido
>
> In my experience it's better to discover a bug at compile time rather than
> at running time.

That's my point though, which you seem to be ignoring: if the user
explicitly writes "1/0" it is not likely to be a bug. That's very
different than "1/x" where x happens to take on zero at runtime --
*that* is likely a bug, but a constant folder can't detect that (at
least not for Python).
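
A minimal sketch of the distinction, assuming stock CPython behaviour (the function name `reciprocal` is only for illustration): a literal 1/0 compiles without complaint and fails only when executed, while 1/x cannot be judged until x has a value.

```python
# Compiling source that contains a literal 1 / 0 succeeds; the error only
# appears when the resulting code object is actually evaluated.
code = compile("1 / 0", "<example>", "eval")

try:
    eval(code)
    raised_at_runtime = False
except ZeroDivisionError:
    raised_at_runtime = True

def reciprocal(x):
    # Nothing here can be checked at compile time: whether this divides by
    # zero depends entirely on the value passed in at run time.
    return 1 / x
```

This is also why the debugging idiom of dropping a bare 1/0 into a function keeps working: the module still compiles and imports, and the exception fires only when that line is reached.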

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From cournape at gmail.com  Tue Apr  7 19:41:10 2009
From: cournape at gmail.com (David Cournapeau)
Date: Wed, 8 Apr 2009 02:41:10 +0900
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <EMEWEMEW2_DELIMl36COv852b071ff4034ca1694f46,
	htoivonen@spikesource.com, 49DB8C>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<806d41050904070608x3f1f025bu18df4f1c843e7357@mail.gmail.com>
	<5b8d13220904070623j258605bob5200dc84362dc11@mail.gmail.com>
	<EMEWEMEW2_DELIMl36COv852b071ff4034ca1694f46, htoivonen@spikesource.com,
	49DB8C>
Message-ID: <5b8d13220904071041w36087a87rf84c8b52defc02c0@mail.gmail.com>

On Wed, Apr 8, 2009 at 2:24 AM, Heikki Toivonen
<htoivonen at spikesource.com> wrote:
> David Cournapeau wrote:
>> The hard (or rather time consuming) work is to do everything else that
>> distutils does related to the packaging. That's where scons/waf are
>> more interesting than cmake IMO, because you can "easily" give up this
>> task back to distutils, whereas it is inherently more difficult with
>> cmake.
>
> I think this was the first I heard about using SCons this way. Do you
> have any articles or examples of this? If not, could you perhaps write one?

I developed numscons as an experiment to build numpy, scipy, and other
complex Python projects depending on many libraries/compilers:

http://github.com/cournape/numscons/tree/master

The general ideas are somewhat explained on my blog

http://cournape.wordpress.com/?s=numscons

And also the slides from SciPy08 conf:

http://conference.scipy.org/static/wiki/numscons.pdf

It is plugged into distutils through a scons command (which bypasses
all the compiled build_* ones, so that the whole build is done through
scons for correct dependency handling). It is not really meant as a
general replacement (it is too fragile, partly because of distutils,
partly because of scons, partly because of me), but it shows it is
possible not only theoretically.

cheers,

David

From pje at telecommunity.com  Tue Apr  7 19:46:21 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Tue, 07 Apr 2009 13:46:21 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49DB6A1F.50801@egenix.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com>
	<20090407140317.EBD383A4063@sparrow.telecommunity.com>
	<49DB6A1F.50801@egenix.com>
Message-ID: <20090407174355.B62983A4063@sparrow.telecommunity.com>

At 04:58 PM 4/7/2009 +0200, M.-A. Lemburg wrote:
>On 2009-04-07 16:05, P.J. Eby wrote:
> > At 02:30 PM 4/7/2009 +0200, M.-A. Lemburg wrote:
> >> >> Wouldn't it be better to stick with a simpler approach and look for
> >> >> "__pkg__.py" files to detect namespace packages using that O(1)
> >> check ?
> >> >
> >> > Again - this wouldn't be O(1). More importantly, it breaks system
> >> > packages, which now again have to deal with the conflicting file names
> >> > if they want to install all portions into a single location.
> >>
> >> True, but since that means changing the package infrastructure, I think
> >> it's fair to ask distributors who want to use that approach to also take
> >> care of looking into the __pkg__.py files and merging them if
> >> necessary.
> >>
> >> Most of the time the __pkg__.py files will be empty, so that's not
> >> really much to ask for.
> >
> > This means your proposal actually doesn't add any benefit over the
> > status quo, where you can have an __init__.py that does nothing but
> > declare the package a namespace.  We already have that now, and it
> > doesn't need a new filename.  Why would we expect OS vendors to start
> > supporting it, just because we name it __pkg__.py instead of __init__.py?
>
>I lost you there.
>
>Since when do we support namespace packages in core Python without
>the need to add some form of magic support code to __init__.py ?
>
>My suggestion basically builds on the same idea as Martin's PEP,
>but uses a single __pkg__.py file as opposed to some non-Python
>file yaddayadda.pkg.

Right... which completely obliterates the primary benefit of the 
original proposal compared to the status quo.  That is, that the PEP 
382 way is more compatible with system packaging tools.

Without that benefit, there's zero gain in your proposal over having 
__init__.py files just call pkgutil.extend_path() (in the stdlib 
since 2.3, btw) or pkg_resources.declare_namespace() (similar 
functionality, but with zipfile support and some other niceties).

IOW, your proposal doesn't actually improve the status quo in any way 
that I am able to determine, except that it calls for loading all the 
__pkg__.py modules, rather than just the first one.  (And the 
setuptools implementation of namespace packages actually *does* load 
multiple __init__.py's, so that's still no change over the status quo 
for setuptools-using packages.)
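
For readers unfamiliar with the status quo referred to above, here is a small self-contained sketch of the pkgutil.extend_path() approach. The package name `demo_ns`, the directory names and the helper `make_portion` are hypothetical; the two temporary directories stand in for two separately installed portions of one namespace package.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# The __init__.py each portion ships. extend_path() scans sys.path for other
# directories named after the package and appends them to __path__.
INIT_SOURCE = textwrap.dedent("""\
    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)
""")

def make_portion(root, package, module, body):
    """Create root/package/__init__.py and root/package/<module>.py."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(body)

base = tempfile.mkdtemp()
site_a = os.path.join(base, "site_a")
site_b = os.path.join(base, "site_b")
make_portion(site_a, "demo_ns", "alpha", "VALUE = 'alpha'\n")
make_portion(site_b, "demo_ns", "beta", "VALUE = 'beta'\n")

# Put both "install locations" on the import path.
sys.path[:0] = [site_a, site_b]
importlib.invalidate_caches()

alpha = importlib.import_module("demo_ns.alpha")
beta = importlib.import_module("demo_ns.beta")
```

Note that both portions ship an __init__.py with the same file name, which is exactly the conflict that arises when a system packager wants to install all portions into a single directory.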

From cesare.dimauro at a-tono.com  Tue Apr  7 19:51:45 2009
From: cesare.dimauro at a-tono.com (Cesare Di Mauro)
Date: Tue, 7 Apr 2009 19:51:45 +0200 (CEST)
Subject: [Python-Dev] pyc files,
 constant folding and borderline  portability issues
In-Reply-To: <ca471dc20904071022t3d299337rc4baf9651243f769@mail.gmail.com>
References: <loom.20090329T143625-223@post.gmane.org>
	<op.uryyh7uh03jqhe@cesareprova.org> 
	<ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com> 
	<200904071010.16855.steve@pearwood.info>
	<op.urz9neqq03jqhe@cesareprova.org> 
	<18907.28413.42458.631358@montanaro.dyndns.org>
	<op.ur0vh8jf03jqhe@cesareprova.org> 
	<ca471dc20904070925g44d6a143l2bf8392faf4c0af6@mail.gmail.com> 
	<56851.151.53.159.5.1239122789.squirrel@webmail6.pair.com>
	<ca471dc20904071022t3d299337rc4baf9651243f769@mail.gmail.com>
Message-ID: <62037.151.53.159.5.1239126705.squirrel@webmail6.pair.com>

On Tue, Apr 7, 2009 07:22PM, Guido van Rossum wrote:
>> In my experience it's better to discover a bug at compile time rather
>> than
>> at running time.
>
> That's my point though, which you seem to be ignoring: if the user
> explicitly writes "1/0" it is not likely to be a bug. That's very
> different than "1/x" where x happens to take on zero at runtime --
> *that* is likely a bug, but a constant folder can't detect that (at
> least not for Python).
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)

I agree. My only concern was about user typos that can lead to an
error a stricter constant folder could catch.

But I admit that it's a rarer case compared to explicitly raising an
exception, such as the one you showed.

Cesare

From ajaksu at gmail.com  Tue Apr  7 20:25:53 2009
From: ajaksu at gmail.com (Daniel (ajax) Diniz)
Date: Tue, 7 Apr 2009 15:25:53 -0300
Subject: [Python-Dev] Mercurial?
In-Reply-To: <ea2499da0904062315j3a535077w387ce1323ad81a1b@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com> 
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com> 
	<49D8BC81.7040007@ochtman.nl> <49D9EB15.8070806@gmail.com> 
	<49DA7C91.6010202@v.loewis.de>
	<ea2499da0904062315j3a535077w387ce1323ad81a1b@mail.gmail.com>
Message-ID: <2d75d7660904071125o3e132dabg4f250a52755e81dd@mail.gmail.com>

Dirkjan Ochtman wrote:
> One of the nicer features of Mercurial/DVCSs, in my experience, is
> that non-committers get to keep the credit on their patches. That
> means that it's impossible to enforce a policy more extensive than
> some basic checks (such as the format above). Unless we keep a list of
> people who have signed an agreement, which will mean people will have
> to re-do the username on commits that don't constitute a non-trivial
> contribution.

Maybe it'd be better to first replicate the current workflow,
shortcomings and all, then later discuss a new policy? That would mean
no credits for non-committers should come from the VCS alone: those
come from commit messages, the ACKS file, copyright notices in source,
etc.

BTW, keep in mind some people will prefer to submit diff-generated,
non-hg patches. IMO, this use case should be supported before the
rich-patch one.

Regards,
Daniel

From dirkjan at ochtman.nl  Tue Apr  7 20:32:46 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Tue, 7 Apr 2009 20:32:46 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <2d75d7660904071125o3e132dabg4f250a52755e81dd@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>
	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>
	<49D87CD4.1000909@ochtman.nl>
	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>
	<49D8BC81.7040007@ochtman.nl> <49D9EB15.8070806@gmail.com>
	<49DA7C91.6010202@v.loewis.de>
	<ea2499da0904062315j3a535077w387ce1323ad81a1b@mail.gmail.com>
	<2d75d7660904071125o3e132dabg4f250a52755e81dd@mail.gmail.com>
Message-ID: <ea2499da0904071132o8350d38lea9913129fb786b0@mail.gmail.com>

On Tue, Apr 7, 2009 at 20:25, Daniel (ajax) Diniz <ajaksu at gmail.com> wrote:
> BTW, keep in mind some people will prefer to submit diff-generated,
> non-hg patches. IMO, this use case should be supported before the
> rich-patch one.

Sure, that will be in the PEP as well (and it's quite simple).

Cheers,

Dirkjan

From brtzsnr at gmail.com  Tue Apr  7 20:59:01 2009
From: brtzsnr at gmail.com (=?UTF-8?Q?Alexandru_Mo=C8=99oi?=)
Date: Tue, 7 Apr 2009 21:59:01 +0300
Subject: [Python-Dev] pyc files,
	constant folding and borderline portability issues
Message-ID: <c59005ea0904071159q614097fw8efa6ed936251c9f@mail.gmail.com>

> From: "Cesare Di Mauro" <cesare.dimauro at a-tono.com>
> So if Python will generate
>
> LOAD_CONST      1
> LOAD_CONST      2
> BINARY_ADD
>
> the constant folding code will simply replace them with a single
>
> LOAD_CONST      3
>
> When working with such kind of optimizations, the temptation is to
> apply them at any situation possible. For example, in other languages
> this
>
> a = b * 2 * 3
>
> will be replaced by
>
> a = b * 6
>
> In Python I can't do that, because b can be an object which overloaded
> the * operator, so it *must* be called two times, one for 2 and one for 3.

Not necessarily. For example C/C++ doesn't define the order of the
operations inside an expression (and AFAIK neither Python) and
therefore folding 2 * 3 is OK whether b is an integer or an arbitrary
object with mul operator overloaded. Moreover one would expect * to be
associative and commutative (take a look at Python strings); if a * 2
* 3 returns a different result from a * 6 I will find it very
surprising and probably reject such code.

However you can fix the order of operations like this:

a = (b * 2) * 3

or

a = b * (2 * 3)

or

a = b * 2
a = a * 3
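
One way to see how these spellings differ in practice is to count the multiplication instructions CPython emits for each. This is only a sketch and relies on a CPython implementation detail (the compiler folding the all-constant subexpression); `count_binary_ops` is a made-up helper, and other implementations may behave differently.

```python
import dis

def count_binary_ops(source):
    """Count BINARY_* instructions in the bytecode compiled from source."""
    code = compile(source, "<example>", "exec")
    return sum(1 for ins in dis.get_instructions(code)
               if ins.opname.startswith("BINARY_"))

# Written without parentheses, the expression is grouped as (b * 2) * 3,
# so two multiplications involving b remain in the bytecode.
left_to_right = count_binary_ops("a = b * 2 * 3")

# With the constants grouped explicitly, 2 * 3 can be folded at compile
# time, leaving a single multiplication involving b.
grouped = count_binary_ops("a = b * (2 * 3)")
```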

-- 
Alexandru Moșoi
http://alexandru.mosoi.googlepages.com

From fredrik.johansson at gmail.com  Tue Apr  7 21:09:26 2009
From: fredrik.johansson at gmail.com (Fredrik Johansson)
Date: Tue, 7 Apr 2009 21:09:26 +0200
Subject: [Python-Dev] pyc files,
	constant folding and borderline 	portability issues
In-Reply-To: <c59005ea0904071159q614097fw8efa6ed936251c9f@mail.gmail.com>
References: <c59005ea0904071159q614097fw8efa6ed936251c9f@mail.gmail.com>
Message-ID: <3d0cebfb0904071209m12b0d587vffc2057454ba5363@mail.gmail.com>

On Tue, Apr 7, 2009 at 8:59 PM, Alexandru Moșoi <brtzsnr at gmail.com> wrote:

> Not necessarily. For example C/C++ doesn't define the order of the
> operations inside an expression (and AFAIK neither Python) and
> therefore folding 2 * 3 is OK whether b is an integer or an arbitrary
> object with mul operator overloaded. Moreover one would expect * to be
> associative and commutative (take a look at Python strings); if a * 2
> * 3 returns a different result from a * 6 I will find it very
> surprising and probably reject such code.

Multiplication is not associative for floats:

>>> a = 0.1
>>> a*3*5
1.5000000000000002
>>> a*(3*5)
1.5

Fredrik

From martin at v.loewis.de  Tue Apr  7 21:20:47 2009
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Apr 2009 21:20:47 +0200
Subject: [Python-Dev] Adding new features to Python 2.x (PEP 382:
 Namespace Packages)
In-Reply-To: <49DB4624.604@egenix.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>
	<49DB4624.604@egenix.com>
Message-ID: <49DBA78F.7010904@v.loewis.de>

> Such a policy would then translate to a dead end for Python 2.x
> based applications.

2.x based applications *are* in a dead end, with the only exit
being a port to 3.x.

Regards,
Martin

From firephoenix at wanadoo.fr  Tue Apr  7 21:30:19 2009
From: firephoenix at wanadoo.fr (Firephoenix)
Date: Tue, 07 Apr 2009 21:30:19 +0200
Subject: [Python-Dev] Generator methods - "what's next" ?
In-Reply-To: <49D9371C.3000202@canterbury.ac.nz>
References: <49D896A4.3000104@wanadoo.fr> <49D9371C.3000202@canterbury.ac.nz>
Message-ID: <49DBA9CB.6010100@wanadoo.fr>

Greg Ewing wrote:
>
> Firephoenix wrote:
>
>> I basically agreed with renaming the next() method to __next__(), so 
>> as to follow the naming of other similar methods (__iter__() etc.).
>> But I noticed then that all the other methods of the generator had 
>> stayed the same (send, throw, close...)
>
> Keep in mind that next() is part of the iterator protocol
> that applies to all iterators, whereas the others are
> specific to generators. By your reasoning, any object that
> has any __xxx__ methods should have all its other methods
> turned into __xxx__ methods as well.
>

Indeed, I kind of mixed up generators with the wider family of iterators.
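
To make the distinction concrete, here is a small sketch using Python 3 spelling: a plain iterator only needs __iter__ and __next__, while send(), throw() and close() exist on generators specifically. The names `Countdown` and `echo` are invented for the example.

```python
class Countdown:
    """Plain iterator: only __iter__ and __next__ are required."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

def echo():
    """Generator: in addition to iteration it supports send/throw/close."""
    received = None
    while True:
        received = yield received

plain = list(Countdown(3))

gen = echo()
first = next(gen)            # runs to the first yield, which produces None
reply = gen.send("hello")    # resumes the generator; it yields the sent value
gen.close()                  # generator-only method, absent on Countdown
```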

From martin at v.loewis.de  Tue Apr  7 21:51:16 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Apr 2009 21:51:16 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <200904072142.06158.steve@pearwood.info>
References: <20090404154049.GA23987@panix.com>
	<873acl7zga.fsf@benfinney.id.au>	<ea2499da0904062330o68066bacgbcdbca865c57ee0e@mail.gmail.com>
	<200904072142.06158.steve@pearwood.info>
Message-ID: <49DBAEB4.1070007@v.loewis.de>

> Ben is correct: you can't assume that contributors will have both a 
> first name and a last name, or that a first name and last name is 
> sufficient to legally identify them. Those from Spanish and Portuguese 
> cultures usually have two family names as well as a personal name; 
> people from Indonesian, Burmese and Malaysian cultures often only use a 
> single name.

That's why I'm asking for a policy. We have to have *some* way of
identifying where a certain change originated from. I'm sure
there is a solution, and it doesn't matter to me whether I need
to identify myself as "Martin v. Löwis"
or "Martin von Löwis of Menar".

Regards,
Martin

From jared.grubb at gmail.com  Tue Apr  7 21:55:10 2009
From: jared.grubb at gmail.com (Jared Grubb)
Date: Tue, 7 Apr 2009 12:55:10 -0700
Subject: [Python-Dev] pyc files,
	constant folding and borderline portability issues
In-Reply-To: <c59005ea0904071159q614097fw8efa6ed936251c9f@mail.gmail.com>
References: <c59005ea0904071159q614097fw8efa6ed936251c9f@mail.gmail.com>
Message-ID: <2BA445ED-A541-4EA0-8BEC-6D0C469F971E@gmail.com>

On 7 Apr 2009, at 11:59, Alexandru Moșoi wrote:
> Not necessarily. For example C/C++ doesn't define the order of the
> operations inside an expression (and AFAIK neither Python) and
> therefore folding 2 * 3 is OK whether b is an integer or an arbitrary
> object with mul operator overloaded. Moreover one would expect * to be
> associative and commutative (take a look at Python strings); if a * 2
> * 3 returns a different result from a * 6 I will find it very
> surprising and probably reject such code.

That's not true. All ops in C/C++ have associativity that is fixed and  
well-defined; the star op is left-associative:
2*3*x is (2*3)*x is 6*x
x*2*3 is (x*2)*3, and this is NOT x*6 (You can show this in C++ by  
creating a class that has a side-effect in its * operator).

The star operator is not commutative in Python or C/C++ (otherwise  
what would __rmul__ do?). It's easier to see that + is not  
commutative: "abc"+"def" and "def"+"abc" are definitely different!

You may be confusing the "order is undefined" for the evaluation of  
parameter lists in C/C++. Example: foo(f(),g()) calls f and g in an  
undefined order.
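
The same experiment can be done in Python rather than C++. The sketch below uses a hypothetical `Recorder` class whose * operators have a visible side effect (appending to a log), which shows both the left-to-right grouping and the role of __rmul__.

```python
class Recorder:
    """Hypothetical operand that logs every multiplication it takes part in."""

    def __init__(self, log):
        self.log = log

    def __mul__(self, other):
        self.log.append(("mul", other))
        return self

    def __rmul__(self, other):
        self.log.append(("rmul", other))
        return self

log_right = []
x = Recorder(log_right)
_ = x * 2 * 3     # grouped as (x * 2) * 3: __mul__ is called twice

log_left = []
y = Recorder(log_left)
_ = 2 * 3 * y     # grouped as (2 * 3) * y: ints multiply first, then one __rmul__

# Concatenation with + is a simple case of a non-commutative operator.
forward = "abc" + "def"
backward = "def" + "abc"
```

After running this, `log_right` holds two entries (one for 2, one for 3) while `log_left` holds a single entry for 6, so replacing x * 2 * 3 with x * 6 would change observable behaviour for such a class.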

Jared

From martin at v.loewis.de  Tue Apr  7 21:59:09 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Apr 2009 21:59:09 +0200
Subject: [Python-Dev] $Id$ and sys.subversion (Was: Mercurial?)
In-Reply-To: <ea2499da0904070457k5d177139u19e53ff6ae8c27c9@mail.gmail.com>
References: <20090404154049.GA23987@panix.com>
	<873acl7zga.fsf@benfinney.id.au>	<ea2499da0904062330o68066bacgbcdbca865c57ee0e@mail.gmail.com>	<200904072142.06158.steve@pearwood.info>
	<ea2499da0904070457k5d177139u19e53ff6ae8c27c9@mail.gmail.com>
Message-ID: <49DBB08D.3090208@v.loewis.de>

One issue that the PEP needs to address is what to do with the files
that use svn (really, CVS) keywords, and what should happen to
sys.subversion. Along with it goes the question what sys.version
should say.

It probably would be good if somebody could produce a patch that
can be applied to a mercurial checkout that gets these things right
(perhaps a Mercurial branch in itself?). Subversion-specific code
is in configure.in, Makefile.pre.in, and PCbuild/make_buildinfo.c
(not sure whether that would still be needed).

Regards,
Martin

From tjreedy at udel.edu  Tue Apr  7 23:04:43 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 07 Apr 2009 17:04:43 -0400
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>
Message-ID: <grgf59$j7c$1@ger.gmane.org>

Daniel Fetchinson wrote:
> The decorator module [1] written by Michele Simionato is a very useful
> tool for maintaining function signatures while applying a decorator.
> Many different projects implement their own versions of the same
> functionality, for example turbogears has its own utility for this, I
> guess others do something similar too.
> 
> Was the issue whether to include this module in the stdlib raised? If
> yes, what were the arguments against it? If not, what do you folks
> think, shouldn't it be included? I certainly think it should be.
> 
> Originally I sent this message to c.l.p [2] and Michele suggested it
> be brought up on python-dev. He also pointed out that a PEP [3] is
> already written about this topic and it is in draft form.
> 
> What do you guys think, wouldn't this be a useful addition to functools?

> [1] http://pypi.python.org/pypi/decorator
> [2] http://groups.google.com/group/comp.lang.python/browse_thread/thread/d4056023f1150fe0
> [3] http://www.python.org/dev/peps/pep-0362/

This probably should have gone to the python-ideas list.  In any case, I 
think it needs to start with a clear offer from Michele (directly or 
relayed by you) to contribute it to the PSF with the usual conditions.

From tjreedy at udel.edu  Tue Apr  7 23:09:13 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 07 Apr 2009 17:09:13 -0400
Subject: [Python-Dev] pyc files,
	constant folding and borderline  portability issues
In-Reply-To: <62037.151.53.159.5.1239126705.squirrel@webmail6.pair.com>
References: <loom.20090329T143625-223@post.gmane.org>	<op.uryyh7uh03jqhe@cesareprova.org>
	<ca471dc20904061427g68b30b7cj3547310ca1ebd2da@mail.gmail.com>
	<200904071010.16855.steve@pearwood.info>	<op.urz9neqq03jqhe@cesareprova.org>
	<18907.28413.42458.631358@montanaro.dyndns.org>	<op.ur0vh8jf03jqhe@cesareprova.org>
	<ca471dc20904070925g44d6a143l2bf8392faf4c0af6@mail.gmail.com>
	<56851.151.53.159.5.1239122789.squirrel@webmail6.pair.com>	<ca471dc20904071022t3d299337rc4baf9651243f769@mail.gmail.com>
	<62037.151.53.159.5.1239126705.squirrel@webmail6.pair.com>
Message-ID: <grgfdl$j7c$2@ger.gmane.org>

Cesare Di Mauro wrote:
> On Tue, Apr 7, 2009 07:22PM, Guido van Rossum wrote:
>>> In my experience it's better to discover a bug at compile time rather
>>> than
>>> at running time.
>> That's my point though, which you seem to be ignoring: if the user
>> explicitly writes "1/0" it is not likely to be a bug. That's very
>> different than "1/x" where x happens to take on zero at runtime --
>> *that* is likely a bug, but a constant folder can't detect that (at
>> least not for Python).
>>
>> --
>> --Guido van Rossum (home page: http://www.python.org/~guido/)
> 
> I agree. My only concern was about user mistyping that can lead to an
> error interceptable by a stricter constant folder.
> 
> But I admit that it's a rarer case compared to an explicit exception
> raising such as the one you showed.

I would guess that it is so rare as to not be worth bothering about.
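
For reference, the behaviour under discussion is easy to observe. A small
sketch, assuming a current CPython 3 interpreter (the thread itself
predates it): a literal 1/0 is accepted by the compiler, and the error
only appears when the code runs.

```python
# The compiler accepts a literal 1/0; nothing is raised at this point.
code = compile("1/0", "<example>", "eval")

try:
    eval(code)
    outcome = "no error"
except ZeroDivisionError:
    # The exception surfaces only when the expression is evaluated.
    outcome = "ZeroDivisionError at run time"

print(outcome)
```

So code that writes 1/0 on purpose to raise an exception keeps working,
which is the case Guido is describing.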

From tjreedy at udel.edu  Tue Apr  7 23:11:41 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 07 Apr 2009 17:11:41 -0400
Subject: [Python-Dev] pyc files,
	constant folding and borderline portability issues
In-Reply-To: <c59005ea0904071159q614097fw8efa6ed936251c9f@mail.gmail.com>
References: <c59005ea0904071159q614097fw8efa6ed936251c9f@mail.gmail.com>
Message-ID: <grgfia$j7c$3@ger.gmane.org>

Alexandru Moșoi wrote:
>> From: "Cesare Di Mauro" <cesare.dimauro at a-tono.com>
>> So if Python will generate
>>
>> LOAD_CONST      1
>> LOAD_CONST      2
>> BINARY_ADD
>>
>> the constant folding code will simply replace them with a single
>>
>> LOAD_CONST      3
>>
>> When working with such kind of optimizations, the temptation is to
>> apply them at any situation possible. For example, in other languages
>> this
>>
>> a = b * 2 * 3
>>
>> will be replaced by
>>
>> a = b * 6
>>
>> In Python I can't do that, because b can be an object which overloaded
>> the * operator, so it *must* be called two times, one for 2 and one for 3.
> 
> Not necessarily. For example C/C++ doesn't define the order of the
> operations inside an expression (and AFAIK neither Python)

Yes it does. Language reference, Expressions / Evaluation order:
"Python evaluates expressions from left to right."
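
A quick way to see this, as a sketch only: operand() is a hypothetical
helper that notes when each operand of a product is evaluated.

```python
order = []

def operand(name, value):
    """Hypothetical helper: note when an operand is evaluated."""
    order.append(name)
    return value

result = operand("a", 2) * operand("b", 3) * operand("c", 4)

print(order)    # ['a', 'b', 'c']
print(result)   # 24
```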

From alex.neundorf at kitware.com  Tue Apr  7 23:42:48 2009
From: alex.neundorf at kitware.com (Alexander Neundorf)
Date: Tue, 7 Apr 2009 23:42:48 +0200
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <5b8d13220904070623j258605bob5200dc84362dc11@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<806d41050904070608x3f1f025bu18df4f1c843e7357@mail.gmail.com>
	<5b8d13220904070623j258605bob5200dc84362dc11@mail.gmail.com>
Message-ID: <806d41050904071442u405f6473t6888b848dd5a6922@mail.gmail.com>

On Tue, Apr 7, 2009 at 3:23 PM, David Cournapeau <cournape at gmail.com> wrote:
> On Tue, Apr 7, 2009 at 10:08 PM, Alexander Neundorf
> <alex.neundorf at kitware.com> wrote:
>
>>
>> What is involved in building python extensions ? Can you please explain ?
>
> Not much: at the core, a python extension is nothing more than a
> dynamically loaded library + a couple of options.

CMake has support (slightly but intentionally undocumented) for this,
from FindPythonLibs.cmake:

# PYTHON_ADD_MODULE(<name> src1 src2 ... srcN) is used to build
# modules for python.
# PYTHON_WRITE_MODULES_HEADER(<filename>) writes a header file you can include
# in your sources to initialize the static python modules

Using python_add_module(name file1.c file2.c) you can build python
modules, and decide at cmake time whether it should be a dynamically
loaded module (default) or whether it should be built as a static
library (useful for platforms without shared libs).
Installation then happens simply via
install(TARGETS ...)

> One choice is whether to take options from distutils or to set them up

What options ?

> independently. In my own scons tool to build python extensions, both
> are possible.
>
> The hard (or rather time consuming) work is to do everything else that
> distutils does related to the packaging. That's where scons/waf are
> more interesting than cmake IMO, because you can "easily" give up this
> task back to distutils, whereas it is inherently more difficult with
> cmake.

Can you please explain ?
It is easy to run external tools with cmake at cmake time and at build
time, and it is also possible to run them at install time.

Alex

From skip at pobox.com  Tue Apr  7 23:29:30 2009
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 7 Apr 2009 16:29:30 -0500
Subject: [Python-Dev] ANN: deps extension (fwd)
Message-ID: <18907.50618.40170.430005@montanaro.dyndns.org>

I know the subject of external dependencies came up here in the discussion
about Mercurial.  I just saw this on the Mercurial mailing list.  Perhaps it
will be of interest to our hg mavens.

Skip

-------------- next part --------------
An embedded message was scrubbed...
From: =?ISO-8859-1?Q?Martin_Vejn=E1r?= <avakar at ratatanek.cz>
Subject: ANN: deps extension
Date: Tue, 07 Apr 2009 22:09:38 +0200
Size: 7245
URL: <http://mail.python.org/pipermail/python-dev/attachments/20090407/c4fd2071/attachment-0001.eml>

From greg.ewing at canterbury.ac.nz  Wed Apr  8 00:43:05 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Apr 2009 10:43:05 +1200
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com>
Message-ID: <49DBD6F9.7030502@canterbury.ac.nz>

David Cournapeau wrote:
> Having a full
> fledged language for complex builds is nice, I think most familiar
> with complex makefiles would agree with this.

Yes, people will still need general computation in their
build process from time to time whether the build tool
they're using supports it or not. And if it doesn't,
they'll resort to some ungodly mash such as Makefile+
shell+m4. Python has got to be a better choice than that.

-- 
Greg

From alex.neundorf at kitware.com  Wed Apr  8 00:54:12 2009
From: alex.neundorf at kitware.com (Alexander Neundorf)
Date: Wed, 8 Apr 2009 00:54:12 +0200
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <49DBD6F9.7030502@canterbury.ac.nz>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com>
	<49DBD6F9.7030502@canterbury.ac.nz>
Message-ID: <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com>

On Wed, Apr 8, 2009 at 12:43 AM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> David Cournapeau wrote:
>>
>> Having a full
>> fledged language for complex builds is nice, I think most familiar
>> with complex makefiles would agree with this.
>
> Yes, people will still need general computation in their
> build process from time to time whether the build tool
> they're using supports it or not.

I have been maintaining the CMake-based buildsystem for KDE4 for 3 years
now in my spare time: millions of lines of code, multiple code generators,
all major operating systems. My experience is that people don't need
general computation in their build process.
CMake now supports more general purpose programming features than it
did 2 years ago, e.g. it now has functions with local variables, it
can do simple math, regexps and other things.
If we get to the point where this is not enough, it usually means a
real program which does real work is required.
In this case it's actually a good thing to have this as a separate
tool, and not mixed into the buildsystem.
Having a not very powerful, and therefore domain-specific, language for
the buildsystem is really a feature :-)
(even if it sounds wrong at first).

>From what I saw when I was building Python I didn't actually see too
complicated things. In KDE4 we are not only building and installing
programs, but we are also installing and shipping a development
platform. This includes CMake files which contain functionality which
helps in developing KDE software, i.e. variables and a bunch of
KDE-specific macros. They are documented here:
http://api.kde.org/cmake/modules.html#module_FindKDE4Internal
(this is generated automatically from the cmake file we ship).
I guess something similar could be useful for Python; maybe this is
what distutils actually does? I.e. it helps with developing
python-standard-conformant software?
This could be solved easily if Python installed a cmake file which
provides the necessary utility functions/macros.

Alex

From tleeuwenburg at gmail.com  Wed Apr  8 00:59:39 2009
From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg)
Date: Wed, 8 Apr 2009 08:59:39 +1000
Subject: [Python-Dev] Is there an issue with bugs.python.org currently
Message-ID: <43c8685c0904071559p6be5f274r945ac8c6258217d9@mail.gmail.com>

Sadly, my work firewall/proxy often handles things badly, so I can't
actually tell. Is bugs.python.org accepting changes at the moment (I'm
trying to update the Stage of an issue)?

Cheers,
-T

-- 
--------------------------------------------------
Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe everything you think"

From greg.ewing at canterbury.ac.nz  Wed Apr  8 01:58:54 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Apr 2009 11:58:54 +1200
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com>
	<49DBD6F9.7030502@canterbury.ac.nz>
	<806d41050904071554x30dade8eva60be765af462112@mail.gmail.com>
Message-ID: <49DBE8BE.3090208@canterbury.ac.nz>

Alexander Neundorf wrote:
> My experience is that people don't need
> general computation in their build process.
 > ...
> CMake supports now more general purpose programming features than it
> did 2 years ago, e.g. it has now functions with local variables, it
> can do simple math, regexps and other things.

In other words, it's growing towards being able to
do general computation. Why is it doing that, if
people don't need general computation in their
build process?

> If we get to the point where this is not enough, it usually means a
> real program which does real work is required.
> In this case it's actually a good thing to have this as a separate
> tool, and not mixed into the buildsystem.

There's some merit in that idea, but the build
tool and the program need to work together
smoothly somehow. If the build tool is implemented
in Python, there's more chance of that happening
(e.g. the Python code can import parts of the
build system and call them directly, rather than
having to generate a file in some other language).

-- 
Greg

From cournape at gmail.com  Wed Apr  8 04:11:33 2009
From: cournape at gmail.com (David Cournapeau)
Date: Wed, 8 Apr 2009 11:11:33 +0900
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <806d41050904071442u405f6473t6888b848dd5a6922@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<806d41050904070608x3f1f025bu18df4f1c843e7357@mail.gmail.com>
	<5b8d13220904070623j258605bob5200dc84362dc11@mail.gmail.com>
	<806d41050904071442u405f6473t6888b848dd5a6922@mail.gmail.com>
Message-ID: <5b8d13220904071911i1bc9ae8ah616e55fdbc080e83@mail.gmail.com>

On Wed, Apr 8, 2009 at 6:42 AM, Alexander Neundorf
<alex.neundorf at kitware.com> wrote:

> What options ?

Compilation options. If you build an extension with distutils, the
extension is built with the same flags as the ones used by python; the
options are taken from distutils.sysconfig (except for MS compilers,
which have their own options, one of the big pains in distutils).
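
To give an idea of what "the same flags as python" means, here is a
sketch that prints a few of the recorded build settings. It assumes a
current Python 3 and uses the standalone sysconfig module from the
standard library, the later successor to distutils.sysconfig; the choice
of variables shown is only illustrative.

```python
import sysconfig

# Build-time settings recorded when this interpreter was compiled.
# Some of them can be None, for example on Windows builds.
flags = {name: sysconfig.get_config_var(name)
         for name in ("CC", "CFLAGS", "LDSHARED", "EXT_SUFFIX")}

for name, value in flags.items():
    print(name, "=", value)
```

An external build tool that wants to stay compatible has to pick up
values like these instead of choosing its own.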

>
> Can you please explain ?

If you want to stay compatible with distutils, you have to do quite a
lot of things. Cmake (and scons, and waf) only handle the build, but
they can't handle all the packaging done by distutils (tarballs
generation, binaries generation, in place build, develop mode of
setuptools, eggs, .pyc and .pyo generation, etc...), so you have two
choices: add support for this in the build tool (lot of work) or just
use distutils once everything is built with your tool of choice.

> It is easy to run external tools with cmake at cmake time and at build
> time, and it is also possible to run them at install time.

Sure, what kind of build tool could not do that :) But given the design
of distutils, if you want to keep all its packaging features, you
can't just launch a few commands, you have to integrate them somewhat.
Every time you need something from distutils, with cmake you would need
to launch python, whereas with scons/waf you can just use it as you
would use any python library. That's just inherent to the fact that
waf/scons are in the same language as distutils; if we were doing
ocaml builds, having a build tool in ocaml would have been easier,
etc...

David

From cournape at gmail.com  Wed Apr  8 04:18:16 2009
From: cournape at gmail.com (David Cournapeau)
Date: Wed, 8 Apr 2009 11:18:16 +0900
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com>
	<49DBD6F9.7030502@canterbury.ac.nz>
	<806d41050904071554x30dade8eva60be765af462112@mail.gmail.com>
Message-ID: <5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com>

On Wed, Apr 8, 2009 at 7:54 AM, Alexander Neundorf
<alex.neundorf at kitware.com> wrote:
> On Wed, Apr 8, 2009 at 12:43 AM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>> David Cournapeau wrote:
>>>
>>> Having a full
>>> fledged language for complex builds is nice, I think most familiar
>>> with complex makefiles would agree with this.
>>
>> Yes, people will still need general computation in their
>> build process from time to time whether the build tool
>> they're using supports it or not.
>
> I have been maintaining the CMake-based buildsystem for KDE4 for 3 years
> now in my spare time: millions of lines of code, multiple code generators,
> all major operating systems. My experience is that people don't need
> general computation in their build process.
> CMake now supports more general purpose programming features than it
> did 2 years ago, e.g. it now has functions with local variables, it
> can do simple math, regexps and other things.
> If we get to the point where this is not enough, it usually means a
> real program which does real work is required.
> In this case it's actually a good thing to have this as a separate
> tool, and not mixed into the buildsystem.
> Having a not very powerful, and therefore domain-specific, language for
> the buildsystem is really a feature :-)
> (even if it sounds wrong at first).

Yes, there are some advantages to that. The point of python is to have
the same language for the build specification and the extensions, in
my mind. For extensions, you really need a full language - for
example, if you want to add support for tools which generate files
that are not known in advance, and handle this correctly from a build
POV, a macro-like language is not sufficient.

>
> >From what I saw when I was building Python I didn't actually see too
> complicated things. In KDE4 we are not only building and installing
> programs, but we are also installing and shipping a development
> platform. This includes CMake files which contain functionality which
> helps in developing KDE software, i.e. variables and a bunch of
> KDE-specific macros. They are documented here:
> http://api.kde.org/cmake/modules.html#module_FindKDE4Internal
> (this is generated automatically from the cmake file we ship).
> I guess something similar could be useful for Python, maybe this is
> what distutils actually do ?

distutils does roughly everything that autotools does, and more:
 - configuration: not often used in extensions, we (numpy) are the
exception I would guess
 - build
 - installation
 - tarball generation
 - bdist_ installers (msi, .exe on windows, .pkg/.mpkg on mac os x,
rpm/deb on Linux)
 - registration to pypi
 - more things which just elude me at the moment

cheers,

David

From tleeuwenburg at gmail.com  Wed Apr  8 04:54:27 2009
From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg)
Date: Wed, 8 Apr 2009 12:54:27 +1000
Subject: [Python-Dev] http://bugs.python.org/issue2240
Message-ID: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com>

This issue has been largely resolved, but there is an outstanding bug where
the (reviewed and committed) solution does not work on certain versions of
FreeBSD (broken in 6.3, working in 7+). Do we have a list of 'supported
platforms', and is FreeBSD 6.3 in it?

What's the policy with regards to supporting dependencies like this? Should
I set this issue to 'pending' seeing as no-one is currently working on a
patch for this? Or is leaving this open and hanging around exactly the right
thing to do?

Cheers,
-T

-- 
--------------------------------------------------
Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe everything you think"

From dalcinl at gmail.com  Wed Apr  8 04:58:59 2009
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Tue, 7 Apr 2009 23:58:59 -0300
Subject: [Python-Dev] calling dictresize outside dictobject.c
In-Reply-To: <20090407163449.GA10119@panix.com>
References: <6CE3CEB2-0753-4708-99A5-78F2B05A054C@colgate.edu>
	<20090407163449.GA10119@panix.com>
Message-ID: <e7ba66e40904071958n27c103ack65a6cdaa300f3131@mail.gmail.com>

Did you read the post until the end? The OP is asking a question
related to a very low level detail of dict implementation and making
an offer to write a patch that could speed-up dict.setdefault() in
core CPython... IMHO, a poll on python-dev does make sense...
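
For context, the "setdefault() with a single lookup" idea can be
illustrated from pure Python. This is only a sketch: CountingKey is a
hypothetical key type that counts how often it gets hashed, and the
exact number of hashes or internal lookups per operation is a CPython
implementation detail that differs between versions.

```python
class CountingKey:
    """Hypothetical key type that counts how often it is hashed."""

    def __init__(self, name):
        self.name = name
        self.hashes = 0

    def __hash__(self):
        self.hashes += 1
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.name == other.name


# Test, then insert, then fetch: each step is a separate dict operation.
d1, k1 = {}, CountingKey("spam")
if k1 not in d1:
    d1[k1] = []
d1[k1].append(1)

# setdefault() folds the same work into one call.
d2, k2 = {}, CountingKey("spam")
d2.setdefault(k2, []).append(1)

print(k1.hashes, k2.hashes)
```

The second count should not be larger than the first; how much smaller
it is depends on the interpreter.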

On Tue, Apr 7, 2009 at 1:34 PM, Aahz <aahz at pythoncraft.com> wrote:
> On Mon, Apr 06, 2009, Dan Schult wrote:
>>
>> I'm trying to write a C extension which is a subclass of dict.
>> I want to do something like a setdefault() but with a single lookup.
>
> python-dev is for core development, not for questions about using Python.
> Please use comp.lang.python or the capi-sig list.
> --
> Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/
>
> "...string iteration isn't about treating strings as sequences of strings,
> it's about treating strings as sequences of characters.  The fact that
> characters are also strings is the reason we have problems, but characters
> are strings for other good reasons."  --Aahz
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/dalcinl%40gmail.com
>

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From ggpolo at gmail.com  Wed Apr  8 05:14:13 2009
From: ggpolo at gmail.com (Guilherme Polo)
Date: Wed, 8 Apr 2009 00:14:13 -0300
Subject: [Python-Dev] http://bugs.python.org/issue2240
In-Reply-To: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com>
References: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com>
Message-ID: <ac2200130904072014j7f0566ecjeeba881d69cf65d@mail.gmail.com>

On Tue, Apr 7, 2009 at 11:54 PM, Tennessee Leeuwenburg
<tleeuwenburg at gmail.com> wrote:
> This issue has been largely resolved, but there is an outstanding bug where
> the (reviewed and committed) solution does not work on certain versions of
> FreeBSD (broken in 6.3, working in 7+). Do we have a list of 'supported
> platforms', and is FreeBSD 6.3 in it?
>
> What's the policy with regards to supporting dependencies like this? Should
> I set this issue to 'pending' seeing as no-one is currently working on a
> patch for this? Or is leaving this open and hanging around exactly the right
> thing to do?
>

I would find it more appropriate to close this as fixed because the issue
was about adding setitimer and getitimer wrappers and that is done.

We could then create another issue for the bug that affects these
setitimer/getitimer wrappers on specific versions of FreeBSD. That is
what makes more sense to me.
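
For readers who have not used them, the wrappers in question are exposed
as signal.setitimer() and signal.getitimer(). A minimal sketch of their
use, assuming a Unix platform and code running in the main thread; the
handler name on_alarm and the 50 ms delay are only illustrative.

```python
import signal
import time

fired = []

def on_alarm(signum, frame):
    # Called in the main thread when ITIMER_REAL expires.
    fired.append(signum)

if hasattr(signal, "setitimer"):                  # not available on Windows
    signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, 0.05)    # fire once after 50 ms
    remaining, interval = signal.getitimer(signal.ITIMER_REAL)

    # Wait (at most 2 seconds) for the handler to run.
    deadline = time.monotonic() + 2.0
    while not fired and time.monotonic() < deadline:
        time.sleep(0.01)

    signal.setitimer(signal.ITIMER_REAL, 0)       # disarm the timer
    signal.signal(signal.SIGALRM, signal.SIG_DFL) # restore default handling
    print(fired == [signal.SIGALRM])
```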

> Cheers,
> -T
>
>
>
> --
> --------------------------------------------------
> Tennessee Leeuwenburg
> http://myownhat.blogspot.com/
> "Don't believe everything you think"
>

Regards,

-- 
-- Guilherme H. Polo Goncalves

From tleeuwenburg at gmail.com  Wed Apr  8 05:24:04 2009
From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg)
Date: Wed, 8 Apr 2009 13:24:04 +1000
Subject: [Python-Dev] http://bugs.python.org/issue2240
In-Reply-To: <ac2200130904072014j7f0566ecjeeba881d69cf65d@mail.gmail.com>
References: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com>
	<ac2200130904072014j7f0566ecjeeba881d69cf65d@mail.gmail.com>
Message-ID: <43c8685c0904072024o756b275cu69962e8731faf162@mail.gmail.com>

On Wed, Apr 8, 2009 at 1:14 PM, Guilherme Polo <ggpolo at gmail.com> wrote:

> On Tue, Apr 7, 2009 at 11:54 PM, Tennessee Leeuwenburg
> <tleeuwenburg at gmail.com> wrote:
> > This issue has been largely resolved, but there is an outstanding bug
> > where the (reviewed and committed) solution does not work on certain
> > versions of FreeBSD (broken in 6.3, working in 7+). Do we have a list
> > of 'supported platforms', and is FreeBSD 6.3 in it?
> >
> > What's the policy with regards to supporting dependencies like this?
> > Should I set this issue to 'pending' seeing as no-one is currently
> > working on a patch for this? Or is leaving this open and hanging around
> > exactly the right thing to do?
> >
>
> I would find it more appropriate to close this as fixed because the issue
> was about adding setitimer and getitimer wrappers and that is done.
>
> We could then create another issue for the bug that affects these
> setitimer/getitimer wrappers on specific versions of FreeBSD. That is
> what makes more sense to me.

Hi Guilherme,

I'd agree with that. I just wonder whether it's necessary to create another
issue, or whether the issue can be marked as 'fixed' without opening the new
issue. It seems like the bug relates only to an older version of a 'weird'
operating system <ducks> and could perhaps be left unfixed without causing
anyone any problems.

Cheers,
-T

From barry at python.org  Wed Apr  8 05:28:15 2009
From: barry at python.org (Barry Warsaw)
Date: Tue, 7 Apr 2009 23:28:15 -0400
Subject: [Python-Dev] RELEASED Python 2.6.2 candidate 1
Message-ID: <67987D03-6D96-4601-A0C5-08B987A81F3B@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm happy to announce the release of Python 2.6.2 candidate 1.  This  
release contains dozens of bug fixes since Python 2.6.1.  Please see  
the NEWS file for a detailed list of changes.

Barring unforeseen problems, Python 2.6.2 final will be released  
within a few days.

    http://www.python.org/download/releases/2.6.2/NEWS.txt

For more information on Python 2.6 please see

     http://docs.python.org/dev/whatsnew/2.6.html

Source tarballs and Windows installers for this release candidate can  
be downloaded from the Python 2.6.2 page:

    http://www.python.org/download/releases/2.6.2/

Bugs can be reported in the Python bug tracker:

    http://bugs.python.org

Enjoy,
Barry

Barry Warsaw
barry at python.org
Python 2.6/3.0 Release Manager
(on behalf of the entire python-dev team)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSdwZ0HEjvBPtnXfVAQJTsAP+Krt1F6qGjuk9a7q8HwF2oAWr/peIAfDf
7HGjOpieoyyAKO1ZNqWvxZ1Ftx+I0YHjfk5OKz/1FN9H3eteFU/L5EEbJD1iTSmK
LAOycWWtWJp+OPatqveHZbGr4ap4XON05yMrzlewnnIH0iGnYjMAgxKkwVKA7MwN
BiXDeBPba1A=
=HdKG
-----END PGP SIGNATURE-----

From michele.simionato at gmail.com  Wed Apr  8 06:09:14 2009
From: michele.simionato at gmail.com (Michele Simionato)
Date: Wed, 8 Apr 2009 06:09:14 +0200
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <grgf59$j7c$1@ger.gmane.org>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>
	<grgf59$j7c$1@ger.gmane.org>
Message-ID: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com>

On Tue, Apr 7, 2009 at 11:04 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>
> This probably should have gone to the python-ideas list.  In any case, I
> think it needs to start with a clear offer from Michele (directly or relayed
> by you) to contribute it to the PSF with the usual conditions.

I have no problem contributing the module to the PSF and maintaining it.
I would just prefer to have the ability to change the function signature in
the core language rather than include a clever hack in the standard library.
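
To make the signature problem concrete, here is a sketch. It does not
use the decorator module; it assumes a current Python 3, where
functools.wraps sets __wrapped__ and inspect.signature follows it. The
names naive, wrapping, area_naive and area_wrapped are made up for the
illustration.

```python
import functools
import inspect

def naive(func):
    # Plain wrapper: the original signature is hidden from introspection.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def wrapping(func):
    @functools.wraps(func)      # copies metadata and sets __wrapped__
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@naive
def area_naive(width, height=1):
    return width * height

@wrapping
def area_wrapped(width, height=1):
    return width * height

print(inspect.signature(area_naive))     # (*args, **kwargs)
print(inspect.signature(area_wrapped))   # (width, height=1)
```

Note that only the reported signature is restored here; the wrapper
itself still accepts any arguments and fails later, inside the call.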

               M. Simionato

From stephen at xemacs.org  Wed Apr  8 07:19:03 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 08 Apr 2009 14:19:03 +0900
Subject: [Python-Dev] http://bugs.python.org/issue2240
In-Reply-To: <43c8685c0904072024o756b275cu69962e8731faf162@mail.gmail.com>
References: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com>
	<ac2200130904072014j7f0566ecjeeba881d69cf65d@mail.gmail.com>
	<43c8685c0904072024o756b275cu69962e8731faf162@mail.gmail.com>
Message-ID: <87zlerhge0.fsf@xemacs.org>

Tennessee Leeuwenburg writes:

 > I'd agree with that. I just wonder whether it's necessary to create another
 > issue, or whether the issue can be marked as 'fixed' without opening the new
 > issue.

Opening a new issue has the effect of running a poll of those who
watch such issues on the tracker (in particular, I'd grandfather the
nosy list).  You could even set the new issue to pending at that time.

From tleeuwenburg at gmail.com  Wed Apr  8 07:40:42 2009
From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg)
Date: Wed, 8 Apr 2009 15:40:42 +1000
Subject: [Python-Dev] slightly inconsistent set/list pop behaviour
Message-ID: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>

Now, I know that sets aren't ordered, but...

foo = set([1,2,3,4,5])
bar = [1,2,3,4,5]

foo.pop() will reliably return 1
while bar.pop() will return 5

discuss :)
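
Spelled out as a runnable sketch: list.pop() is defined to return the
last element, while set.pop() returns an arbitrary element, so the "1"
seen above is an implementation detail and should not be relied on.

```python
foo = set([1, 2, 3, 4, 5])
bar = [1, 2, 3, 4, 5]

from_list = bar.pop()    # always the last element: 5
from_set = foo.pop()     # an arbitrary element; the order is not guaranteed

print(from_list)
print(from_set in {1, 2, 3, 4, 5})
```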

Cheers,
-T

From python at rcn.com  Wed Apr  8 07:47:45 2009
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 7 Apr 2009 22:47:45 -0700
Subject: [Python-Dev] slightly inconsistent set/list pop behaviour
References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>
Message-ID: <937C9D2AC5034C3AB2551839855A5D99@RaymondLaptop1>

[Tennessee Leeuwenburg ]
> Now, I know that sets aren't ordered, but...
>
> foo = set([1,2,3,4,5])
> bar = [1,2,3,4,5]
>
> foo.pop() will reliably return 1 
> while bar.pop() will return 5
>
> discuss :)

If that's what you need:
    http://code.activestate.com/recipes/576694/

Raymond

From asmodai at in-nomine.org  Wed Apr  8 07:55:41 2009
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Wed, 8 Apr 2009 07:55:41 +0200
Subject: [Python-Dev] http://bugs.python.org/issue2240
In-Reply-To: <43c8685c0904072024o756b275cu69962e8731faf162@mail.gmail.com>
References: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com>
	<ac2200130904072014j7f0566ecjeeba881d69cf65d@mail.gmail.com>
	<43c8685c0904072024o756b275cu69962e8731faf162@mail.gmail.com>
Message-ID: <20090408055541.GA13110@nexus.in-nomine.org>

-On [20090408 05:24], Tennessee Leeuwenburg (tleeuwenburg at gmail.com) wrote:
>It seems like the bug relates only to an older version of a 'weird'
>operating system <ducks> and could perhaps be left unfixed without causing
>anyone any problems.

Being one of the FreeBSD guys I'll throw peanuts at you. :P

In any case, 6.3 is from early 2008 and 6.4 is from November 2008. The
6-STABLE branch is still open and a lot of users are still tracking this.

However, the main focus is 7, with 8 looming on the horizon. And FreeBSD
7 does away with libc_r and uses a whole different model for its threading.
Are the tests going ok there? If so, then I shouldn't worry about the 6
branch.

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Few are those who see with their own eyes and feel with their own hearts...

From jackdied at gmail.com  Wed Apr  8 08:10:15 2009
From: jackdied at gmail.com (Jack diederich)
Date: Wed, 8 Apr 2009 02:10:15 -0400
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>
	<grgf59$j7c$1@ger.gmane.org>
	<4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com>
Message-ID: <b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com>

On Wed, Apr 8, 2009 at 12:09 AM, Michele Simionato
<michele.simionato at gmail.com> wrote:
> On Tue, Apr 7, 2009 at 11:04 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>
>> This probably should have gone to the python-ideas list.  In any case, I
>> think it needs to start with a clear offer from Michele (directly or relayed
>> by you) to contribute it to the PSF with the usual conditions.
>
> I have no problem to contribute the module to the PSF and to maintain it.
> I would just prefer to have the ability to change the function signature in
> the core language rather than include in the standard library a clever hack.

Flipping Michele's commit bit (if he wants it) is overdue.  A quick
google doesn't show he refused it in the past, but the same search
shows the things he did do - including the explication of MRO
in 2.3 (http://www.python.org/download/releases/2.3/mro/).  Plus he's
a softie for decorators, as am I.

-Jack

From jbarham at gmail.com  Wed Apr  8 08:13:19 2009
From: jbarham at gmail.com (John Barham)
Date: Tue, 7 Apr 2009 23:13:19 -0700
Subject: [Python-Dev] slightly inconsistent set/list pop behaviour
In-Reply-To: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>
References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>
Message-ID: <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com>

Tennessee Leeuwenburg wrote:
> Now, I know that sets aren't ordered, but...
>
> foo = set([1,2,3,4,5])
> bar = [1,2,3,4,5]
>
> foo.pop() will reliably return 1
> while bar.pop() will return 5
>
> discuss :)

As designed.

If you play around a bit it becomes clear that what set.pop() returns
is independent of the insertion order:

PythonWin 2.5.2 (r252:60911, Mar 27 2008, 17:57:18) [MSC v.1310 32 bit
(Intel)] on win32.
>>> foo = set([5,4,3,2,1]) # Order reversed from above
>>> foo.pop()
1
>>> foo = set([-1,0,1,2,3,4,5])
>>> foo.pop()
0
>>> foo = set([-1,1,2,3,4,5])
>>> foo.pop()
1

As the documentation says
(http://docs.python.org/library/stdtypes.html#set.pop) set.pop() is
free to return an arbitrary element.

list.pop() however always returns the last element of the list, unless
of course you specify some other index:
http://docs.python.org/library/stdtypes.html#mutable-sequence-types,
point 6.

 John
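The list side of the comparison can be sketched in a few lines. The
variable names are illustrative only; the behaviour shown is the
documented list.pop() contract (last element by default, or the element
at an explicit index).

```python
bar = [1, 2, 3, 4, 5]

last = bar.pop()     # no argument: removes and returns the final element
first = bar.pop(0)   # explicit index: removes and returns that position

# bar now holds only the elements that were not popped.
remaining = list(bar)
```

Unlike set.pop(), the result here depends only on position, so it is
safe to rely on.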

From dickinsm at gmail.com  Wed Apr  8 08:44:35 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Wed, 8 Apr 2009 07:44:35 +0100
Subject: [Python-Dev] slightly inconsistent set/list pop behaviour
In-Reply-To: <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com>
References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>
	<4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com>
Message-ID: <5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com>

On Wed, Apr 8, 2009 at 7:13 AM, John Barham <jbarham at gmail.com> wrote:
> If you play around a bit it becomes clear that what set.pop() returns
> is independent of the insertion order:

It might look like that, but I don't think this is
true in general (at least, with the current implementation):

>>> foo = set([1, 65537])
>>> foo.pop()
1
>>> foo = set([65537, 1])
>>> foo.pop()
65537

Mark
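One way to see why a collision makes insertion order visible is a toy
open-addressing table. The sketch below is only a model, not CPython's
set: the function toy_pop, the fixed table size of 8 and the linear
probing are assumptions made for illustration, and it relies on small
non-negative ints hashing to themselves, as they do in CPython.

```python
def toy_pop(values, size=8):
    """Insert values into a toy hash table, then return the item in the
    lowest occupied slot (a stand-in for an arbitrary-element pop)."""
    if len(set(values)) >= size:
        raise ValueError("toy table would overflow")
    slots = [None] * size
    for value in values:
        index = hash(value) % size
        # Linear probing: on a collision, step to the next slot.
        while slots[index] is not None and slots[index] != value:
            index = (index + 1) % size
        slots[index] = value
    for item in slots:
        if item is not None:
            return item
    raise KeyError("pop from an empty table")


# No collisions: every value has its own slot, so order does not matter.
ordered = toy_pop([1, 2, 3, 4, 5])
reversed_order = toy_pop([5, 4, 3, 2, 1])

# 1 and 65537 both map to slot 1 of 8, so whichever arrives first keeps it.
one_first = toy_pop([1, 65537])
big_first = toy_pop([65537, 1])
```

In this model the two collision-free calls agree, while the two
colliding calls return whichever value was inserted first, which is the
same pattern as the interpreter session above.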

From michele.simionato at gmail.com  Wed Apr  8 09:17:17 2009
From: michele.simionato at gmail.com (Michele Simionato)
Date: Wed, 8 Apr 2009 09:17:17 +0200
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>
	<grgf59$j7c$1@ger.gmane.org>
	<4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com>
	<b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com>
Message-ID: <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com>

On Wed, Apr 8, 2009 at 8:10 AM, Jack diederich <jackdied at gmail.com> wrote:
> Plus he's a softie for decorators, as am I.

I must admit that while I still like decorators, I do not like them as
much as in the past.
I also see an overuse of decorators in various libraries for things that could
be done more clearly without them ;-(
But this is tangential.
What I would really like to know is the future of PEP 362, i.e. having
a signature object that could be taken from an undecorated function
and added to the decorated function.
I do not recall people having anything against it, in principle,
and there is also an implementation in the sandbox, but
after three years nothing happened. I guess this is just not
a high priority for the core developers.
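For readers on a later Python, a sketch of the behaviour asked for
here, assuming Python 3.3 or newer, where inspect.signature() is
available and follows the __wrapped__ attribute that functools.wraps
sets. The decorator "traced" and the function "area" are made-up names
for illustration.

```python
import functools
import inspect


def traced(func):
    """Hypothetical decorator that forwards every call unchanged."""
    @functools.wraps(func)  # copies metadata and sets wrapper.__wrapped__
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@traced
def area(width, height=1):
    return width * height


# The wrapper itself takes (*args, **kwargs), but the signature object
# reported for the decorated function is that of the original.
sig_text = str(inspect.signature(area))
result = area(3, height=2)
```

The wrapper still accepts any arguments at call time; only the
introspected signature is carried over from the undecorated function.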

From solipsis at pitrou.net  Wed Apr  8 11:42:49 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 8 Apr 2009 09:42:49 +0000 (UTC)
Subject: [Python-Dev] slightly inconsistent set/list pop behaviour
References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>
	<4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com>
	<5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com>
Message-ID: <loom.20090408T094059-974@post.gmane.org>

Mark Dickinson <dickinsm <at> gmail.com> writes:
> 
> On Wed, Apr 8, 2009 at 7:13 AM, John Barham <jbarham <at> gmail.com> wrote:
> > If you play around a bit it becomes clear that what set.pop() returns
> > is independent of the insertion order:
> 
> It might look like that, but I don't think this is
> true in general (at least, with the current implementation):

Not to mention that other implementations (Jython, etc.) will probably exhibit
yet different behaviour, and the CPython hash functions are not engraved in
stone either. If you want to write portable code, you can't rely on *any*
reproducible ordering for random set member access.

Regards

Antoine.
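If a predictable element is what the code needs, one portable option is
to choose it explicitly instead of relying on set.pop(). A minimal
sketch, assuming the elements are mutually comparable; the helper name
pop_smallest is hypothetical.

```python
def pop_smallest(items):
    """Remove and return the smallest element of a set.

    The result depends only on the set's contents, not on hashing,
    insertion order or the Python implementation in use.
    """
    smallest = min(items)  # raises ValueError on an empty set
    items.remove(smallest)
    return smallest


foo = set([5, 4, 3, 2, 1])
first = pop_smallest(foo)
```

This costs a scan of the whole set per call, so a sorted list or a heap
is a better fit when elements are popped repeatedly.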

From tleeuwenburg at gmail.com  Wed Apr  8 12:57:07 2009
From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg)
Date: Wed, 8 Apr 2009 20:57:07 +1000
Subject: [Python-Dev] http://bugs.python.org/issue2240
In-Reply-To: <20090408055541.GA13110@nexus.in-nomine.org>
References: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com>
	<ac2200130904072014j7f0566ecjeeba881d69cf65d@mail.gmail.com>
	<43c8685c0904072024o756b275cu69962e8731faf162@mail.gmail.com>
	<20090408055541.GA13110@nexus.in-nomine.org>
Message-ID: <43c8685c0904080357i19ab2f1u628222b97875a131@mail.gmail.com>

On Wed, Apr 8, 2009 at 3:55 PM, Jeroen Ruigrok van der Werven <
asmodai at in-nomine.org> wrote:

> -On [20090408 05:24], Tennessee Leeuwenburg (tleeuwenburg at gmail.com)
> wrote:
> >It seems like the bug relates only to an older version of a 'weird'
> >operating system <ducks> and could perhaps be left unfixed without causing
> >anyone any problems.
>
> Being one of the FreeBSD guys I'll throw peanuts at you. :P
>
> In any case, 6.3 is from early 2008 and 6.4 is from November 2008. The
> 6-STABLE branch is still open and a lot of users are still tracking this.
>
> However, the main focus is 7, with 8 looming on the horizon. And FreeBSD
> 7 does away with libc_r and uses a whole different model for its threading.
> Are the tests going ok there? If so, then I shouldn't worry about the 6
> branch.

:)

Thanks for your input. I've done the paper shuffling so someone else can
pick up the FreeBSD cleanup job as a new issue...

Cheers,
-T

From jackdied at gmail.com  Wed Apr  8 12:57:25 2009
From: jackdied at gmail.com (Jack diederich)
Date: Wed, 8 Apr 2009 06:57:25 -0400
Subject: [Python-Dev] slightly inconsistent set/list pop behaviour
In-Reply-To: <5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com>
References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>
	<4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com>
	<5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com>
Message-ID: <b8e622740904080357n47f76cf8pc0ff053a55b0f2a5@mail.gmail.com>

On Wed, Apr 8, 2009 at 2:44 AM, Mark Dickinson <dickinsm at gmail.com> wrote:
> On Wed, Apr 8, 2009 at 7:13 AM, John Barham <jbarham at gmail.com> wrote:
>> If you play around a bit it becomes clear that what set.pop() returns
>> is independent of the insertion order:
>
> It might look like that, but I don't think this is
> true in general (at least, with the current implementation):
>
>>>> foo = set([1, 65537])
>>>> foo.pop()
> 1
>>>> foo = set([65537, 1])
>>>> foo.pop()
> 65537

You wrote a program to find the two smallest ints that would have a
hash collision in the CPython set implementation?  I'm impressed.  And
by impressed I mean frightened.

-Jack

From solipsis at pitrou.net  Wed Apr  8 13:10:21 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 8 Apr 2009 11:10:21 +0000 (UTC)
Subject: [Python-Dev] Dropping bytes "support" in json
Message-ID: <loom.20090408T110540-221@post.gmane.org>

Hello,

We're in the process of forward-porting the recent (massive) json updates to
3.1, and we are also thinking of dropping remnants of support of the bytes type
in the json library (in 3.1, again). This bytes support almost didn't work at
all, but there was a lot of C and Python code for it nevertheless. We're also
thinking of dropping the "encoding" argument in the various APIs, since it is
useless.

Under the new situation, json would only ever allow str as input, and output str
as well. By posting here, I want to know whether anybody would oppose this
(knowing, once again, that bytes support is already broken in the current py3k
trunk).

The bug entry is: http://bugs.python.org/issue4136

Regards

Antoine.
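If json ends up accepting and producing only str, as proposed above,
code that holds bytes would do the decoding and encoding itself. A
minimal sketch, assuming the payload is UTF-8; the variable names are
illustrative only.

```python
import json

# Bytes as they might arrive from a socket or a file opened in binary mode.
raw = b'{"name": "caf\\u00e9", "tags": ["a", "b"]}'

# Decode explicitly, then parse the resulting str.
obj = json.loads(raw.decode("utf-8"))

# Serialising yields a str; encode it only at the I/O boundary.
text = json.dumps(obj)
payload = text.encode("utf-8")
```

Keeping the codec at the boundary makes the choice of encoding visible
in the calling code instead of hiding it behind an "encoding" argument.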

From steve at holdenweb.com  Wed Apr  8 13:57:09 2009
From: steve at holdenweb.com (Steve Holden)
Date: Wed, 08 Apr 2009 07:57:09 -0400
Subject: [Python-Dev] slightly inconsistent set/list pop behaviour
In-Reply-To: <b8e622740904080357n47f76cf8pc0ff053a55b0f2a5@mail.gmail.com>
References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>	<4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com>	<5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com>
	<b8e622740904080357n47f76cf8pc0ff053a55b0f2a5@mail.gmail.com>
Message-ID: <gri3ek$pfm$1@ger.gmane.org>

Jack diederich wrote:
> On Wed, Apr 8, 2009 at 2:44 AM, Mark Dickinson <dickinsm at gmail.com> wrote:
>> On Wed, Apr 8, 2009 at 7:13 AM, John Barham <jbarham at gmail.com> wrote:
>>> If you play around a bit it becomes clear that what set.pop() returns
>>> is independent of the insertion order:
>> It might look like that, but I don't think this is
>> true in general (at least, with the current implementation):
>>
>>>>> foo = set([1, 65537])
>>>>> foo.pop()
>> 1
>>>>> foo = set([65537, 1])
>>>>> foo.pop()
>> 65537
> 
> You wrote a program to find the two smallest ints that would have a
> hash collision in the CPython set implementation?  I'm impressed.  And
> by impressed I mean frightened.
> 
Given the two numbers in question (1, 2**16+1) I suspect this is the
result of analysis rather than algorithm.

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
Holden Web LLC                 http://www.holdenweb.com/
Watch PyCon on video now!          http://pycon.blip.tv/

From gripho66 at gmail.com  Wed Apr  8 13:58:04 2009
From: gripho66 at gmail.com (Andrea Griffini)
Date: Wed, 8 Apr 2009 13:58:04 +0200
Subject: [Python-Dev] slightly inconsistent set/list pop behaviour
In-Reply-To: <b8e622740904080357n47f76cf8pc0ff053a55b0f2a5@mail.gmail.com>
References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>
	<4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com>
	<5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com>
	<b8e622740904080357n47f76cf8pc0ff053a55b0f2a5@mail.gmail.com>
Message-ID: <c500530a0904080458r3e816702w2778efddc2e51c4d@mail.gmail.com>

On Wed, Apr 8, 2009 at 12:57 PM, Jack diederich <jackdied at gmail.com> wrote:
> You wrote a program to find the two smallest ints that would have a
> hash collision in the CPython set implementation?  I'm impressed.  And
> by impressed I mean frightened.

?

print set([0,8]).pop(), set([8,0]).pop()

Andrea

From ideasman42 at gmail.com  Wed Apr  8 14:04:11 2009
From: ideasman42 at gmail.com (Campbell Barton)
Date: Wed, 8 Apr 2009 05:04:11 -0700
Subject: [Python-Dev] PyCFunction_* Missing
Message-ID: <7c1ab96d0904080504o3b58b1bdvedd31ac872239921@mail.gmail.com>

Hi, I just noticed that the new Python 2.6.2 docs no longer have any reference to
* PyCFunction_New
* PyCFunction_NewEx
* PyCFunction_Check
* PyCFunction_Call

Of course these are still in the source code, but I'm wondering whether
this is intentional and these functions should be for internal use only?
-- 
- Campbell

From duncan.booth at suttoncourtenay.org.uk  Wed Apr  8 14:30:05 2009
From: duncan.booth at suttoncourtenay.org.uk (Duncan Booth)
Date: Wed, 8 Apr 2009 12:30:05 +0000 (UTC)
Subject: [Python-Dev] slightly inconsistent set/list pop behaviour
References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>
	<4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com>
	<5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com>
	<b8e622740904080357n47f76cf8pc0ff053a55b0f2a5@mail.gmail.com>
	<c500530a0904080458r3e816702w2778efddc2e51c4d@mail.gmail.com>
Message-ID: <Xns9BE78951BF8EFduncanrcpcouk@127.0.0.1>

Andrea Griffini <gripho66 at gmail.com> wrote:

> On Wed, Apr 8, 2009 at 12:57 PM, Jack diederich <jackdied at gmail.com>
> wrote: 
>> You wrote a program to find the two smallest ints that would have a
>> hash collision in the CPython set implementation?  I'm impressed.
>>  And by impressed I mean frightened.
> 
> ?
> 
> print set([0,8]).pop(), set([8,0]).pop()

If 'smallest ints' means the sum of the absolute values then these are 
slightly smaller:

>>> print set([-1,6]).pop(), set([6,-1]).pop()
6 -1

From p.f.moore at gmail.com  Wed Apr  8 14:58:31 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 8 Apr 2009 13:58:31 +0100
Subject: [Python-Dev] slightly inconsistent set/list pop behaviour
In-Reply-To: <Xns9BE78951BF8EFduncanrcpcouk@127.0.0.1>
References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>
	<4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com>
	<5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com>
	<b8e622740904080357n47f76cf8pc0ff053a55b0f2a5@mail.gmail.com>
	<c500530a0904080458r3e816702w2778efddc2e51c4d@mail.gmail.com>
	<Xns9BE78951BF8EFduncanrcpcouk@127.0.0.1>
Message-ID: <79990c6b0904080558y3b9bb5a2kf4976b104b65baa1@mail.gmail.com>

2009/4/8 Duncan Booth <duncan.booth at suttoncourtenay.org.uk>:
> Andrea Griffini <gripho66 at gmail.com> wrote:
>
>> On Wed, Apr 8, 2009 at 12:57 PM, Jack diederich <jackdied at gmail.com>
>> wrote:
>>> You wrote a program to find the two smallest ints that would have a
>>> hash collision in the CPython set implementation?  I'm impressed.
>>>  And by impressed I mean frightened.
>>
>> ?
>>
>> print set([0,8]).pop(), set([8,0]).pop()
>
> If 'smallest ints' means the sum of the absolute values then these are
> slightly smaller:
>
>>>> print set([-1,6]).pop(), set([6,-1]).pop()
> 6 -1

Can't resist:

>>> print set([-2,-1]).pop(), set([-1,-2]).pop()
-1 -2

Paul.

From steve at holdenweb.com  Wed Apr  8 17:14:09 2009
From: steve at holdenweb.com (Steve Holden)
Date: Wed, 08 Apr 2009 11:14:09 -0400
Subject: [Python-Dev] slightly inconsistent set/list pop behaviour
In-Reply-To: <79990c6b0904080558y3b9bb5a2kf4976b104b65baa1@mail.gmail.com>
References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>	<4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com>	<5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com>	<b8e622740904080357n47f76cf8pc0ff053a55b0f2a5@mail.gmail.com>	<c500530a0904080458r3e816702w2778efddc2e51c4d@mail.gmail.com>	<Xns9BE78951BF8EFduncanrcpcouk@127.0.0.1>
	<79990c6b0904080558y3b9bb5a2kf4976b104b65baa1@mail.gmail.com>
Message-ID: <grif04$5br$1@ger.gmane.org>

Paul Moore wrote:
> 2009/4/8 Duncan Booth <duncan.booth at suttoncourtenay.org.uk>:
>> Andrea Griffini <gripho66 at gmail.com> wrote:
>>
>>> On Wed, Apr 8, 2009 at 12:57 PM, Jack diederich <jackdied at gmail.com>
>>> wrote:
>>>> You wrote a program to find the two smallest ints that would have a
>>>> hash collision in the CPython set implementation?  I'm impressed.
>>>>  And by impressed I mean frightened.
>>> ?
>>>
>>> print set([0,8]).pop(), set([8,0]).pop()
>> If 'smallest ints' means the sum of the absolute values then these are
>> slightly smaller:
>>
>>>>> print set([-1,6]).pop(), set([6,-1]).pop()
>> 6 -1
> 
> Can't resist:
> 
>>>> print set([-2,-1]).pop(), set([-1,-2]).pop()
> -1 -2
> 
>>> a = 0.001
>>> b = 0.002
>>> print set([a, b]).pop(), set([b, a]).pop()
0.002 0.001

Let's stop here ...

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
Holden Web LLC                 http://www.holdenweb.com/
Watch PyCon on video now!          http://pycon.blip.tv/

From p.f.moore at gmail.com  Wed Apr  8 17:38:35 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 8 Apr 2009 16:38:35 +0100
Subject: [Python-Dev] slightly inconsistent set/list pop behaviour
In-Reply-To: <grif04$5br$1@ger.gmane.org>
References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>
	<4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com>
	<5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com>
	<b8e622740904080357n47f76cf8pc0ff053a55b0f2a5@mail.gmail.com>
	<c500530a0904080458r3e816702w2778efddc2e51c4d@mail.gmail.com>
	<Xns9BE78951BF8EFduncanrcpcouk@127.0.0.1>
	<79990c6b0904080558y3b9bb5a2kf4976b104b65baa1@mail.gmail.com>
	<grif04$5br$1@ger.gmane.org>
Message-ID: <79990c6b0904080838h12f520f8g96d5197214d820e@mail.gmail.com>

2009/4/8 Steve Holden <steve at holdenweb.com>:
> Paul Moore wrote:
>> 2009/4/8 Duncan Booth <duncan.booth at suttoncourtenay.org.uk>:
>>> Andrea Griffini <gripho66 at gmail.com> wrote:
>>>
>>>> On Wed, Apr 8, 2009 at 12:57 PM, Jack diederich <jackdied at gmail.com>
>>>> wrote:
>>>>> You wrote a program to find the two smallest ints that would have a
>>>>> hash collision in the CPython set implementation?  I'm impressed.
>>>>>  And by impressed I mean frightened.
>>>> ?
>>>>
>>>> print set([0,8]).pop(), set([8,0]).pop()
>>> If 'smallest ints' means the sum of the absolute values then these are
>>> slightly smaller:
>>>
>>>>>> print set([-1,6]).pop(), set([6,-1]).pop()
>>> 6 -1
>>
>> Can't resist:
>>
>>>>> print set([-2,-1]).pop(), set([-1,-2]).pop()
>> -1 -2
>>
>>>> a = 0.001
>>>> b = 0.002
>>>> print set([a, b]).pop(), set([b, a]).pop()
> 0.002 0.001

Cheat! We were using integers...

:-)

Paul.

From jbaker at zyasoft.com  Wed Apr  8 17:50:55 2009
From: jbaker at zyasoft.com (Jim Baker)
Date: Wed, 8 Apr 2009 09:50:55 -0600
Subject: [Python-Dev] Contributor Agreements for Patches - was [Jython-dev]
	Jython on Google AppEngine!
Message-ID: <d03bb4010904080850h7eece089i1e279b4a5f01cbb3@mail.gmail.com>

A question that arose on this thread, which I'm forwarding for context (and
we're quite happy about it too!):

   - What is the scope of a patch that requires a contributor agreement?
   This particular patch on #1188 simply adds obvious (in retrospect of course)
   handling on SecurityException so that it's treated in a similar fashion to
   IOException (possibly a bit more buried), so it seems like a minor patch.
   - Do Google employees, working on company time, automatically get treated
   as contributors with existing contributor agreements on file with the PSF?
   If so, are there other companies that automatically get this treatment?
   - Should we change the workflow for roundup to make this assignment of
   license clearer (see Tobias's idea in the thread about a click-through
   agreement)?

In these matters, Jython, as a project under the Python Software Foundation,
intends to follow the same policy as CPython.

- Jim

---------- Forwarded message ----------
From: Frank Wierzbicki <fwierzbicki at gmail.com>
Date: Wed, Apr 8, 2009 at 9:32 AM
Subject: Re: [Jython-dev] Jython on Google AppEngine!
To: James Robinson <jamesr at google.com>
Cc: Jython Developers <jython-dev at lists.sourceforge.net>, Alan Kennedy <
jython-dev at xhaus.com>

On Wed, Apr 8, 2009 at 11:22 AM, James Robinson <jamesr at google.com> wrote:
> I submitted 1188 and I'm a Google employee working on company time.  Let me
> know if anything further is needed, but we have quite a few contributors to
> the Python project working here.
Excellent, and thanks!  1188 was already slated for inclusion in our
upcoming RC, but knowing that it is in support of GAE moves it up to a
very high priority.

-Frank

------------------------------------------------------------------------------
This SF.net email is sponsored by:
High Quality Requirements in a Collaborative Environment.
Download a free trial of Rational Requirements Composer Now!
http://p.sf.net/sfu/www-ibm-com
_______________________________________________
Jython-dev mailing list
Jython-dev at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jython-dev

-- 
Jim Baker
jbaker at zyasoft.com

From python at rcn.com  Wed Apr  8 17:51:29 2009
From: python at rcn.com (Raymond Hettinger)
Date: Wed, 8 Apr 2009 08:51:29 -0700
Subject: [Python-Dev] Dropping bytes "support" in json
References: <loom.20090408T110540-221@post.gmane.org>
Message-ID: <E42902A003094B45A3ADB58C2D1DB25A@RaymondLaptop1>

> We're in the process of forward-porting the recent (massive) json updates to
> 3.1, and we are also thinking of dropping remnants of support of the bytes type
> in the json library (in 3.1, again). This bytes support almost didn't work at
> all, but there was a lot of C and Python code for it nevertheless. We're also
> thinking of dropping the "encoding" argument in the various APIs, since it is
> useless.
> 
> Under the new situation, json would only ever allow str as input, and output str
> as well. By posting here, I want to know whether anybody would oppose this
> (knowing, once again, that bytes support is already broken in the current py3k
> trunk).

+1

Raymond

From jbaker at zyasoft.com  Wed Apr  8 17:53:38 2009
From: jbaker at zyasoft.com (Jim Baker)
Date: Wed, 8 Apr 2009 09:53:38 -0600
Subject: [Python-Dev] Contributor Agreements for Patches - was
	[Jython-dev] Jython on Google AppEngine!
In-Reply-To: <d03bb4010904080850h7eece089i1e279b4a5f01cbb3@mail.gmail.com>
References: <d03bb4010904080850h7eece089i1e279b4a5f01cbb3@mail.gmail.com>
Message-ID: <d03bb4010904080853p6d731b87re1b042525db3529a@mail.gmail.com>

Oops, didn't attach the entire thread, so see below:

On Wed, Apr 8, 2009 at 9:50 AM, Jim Baker <jbaker at zyasoft.com> wrote:

> A question that arose on this thread, which I'm forwarding for context (and
> we're quite happy about it too!):
>
>    - What is the scope of a patch that requires a contributor agreement?
>    This particular patch on #1188 simply adds obvious (in retrospect of course)
>    handling on SecurityException so that it's treated in a similar fashion to
>    IOException (possibly a bit more buried), so it seems like a minor patch.
>    - Do Google employees, working on company time, automatically get
>    treated as contributors with existing contributor agreements on file with
>    the PSF? If so, are there other companies that automatically get this
>    treatment?
>    - Should we change the workflow for roundup to make this assignment of
>    license clearer (see Tobias's idea in the thread about a click-through
>    agreement)?
>
> In these matters, Jython, as a project under the Python Software
> Foundation, intends to follow the same policy as CPython.
>
> - Jim
>

Forwarded conversation
Subject: [Jython-dev] Jython on Google AppEngine!
------------------------

From: *Alan Kennedy* <jython-dev at xhaus.com>
Date: Wed, Apr 8, 2009 at 6:37 AM
To: Jython Developers <jython-dev at lists.sourceforge.net>, jython users <
jython-users at lists.sourceforge.net>

Hi all,

As you may know, Google announced Java for AppEngine yesterday!

http://googleappengine.blogspot.com/2009/04/seriously-this-time-new-language-on-app.html

And they're also supporting all of the various languages that run on
the JVM, including jython.

http://groups.google.com/group/google-appengine-java/web/will-it-play-in-app-engine

They say about jython

"""
- Jython 2.2 works out of the box.
- Jython 2.5 requires patches which we'll supply until the changes
make it directly into Jython:
 - jython-r5996-patched-for-appengine.jar is the complete jython
binary library, patched for app engine
 - jython-r5996-appengine.patch is the patch file that contains the
source code for the changes
"""

They provide the patches they used to make 2.5 work

http://google-appengine-java.googlegroups.com/web/jython-r5996-appengine.patch

I definitely think this is an important patch to consider for the 2.5RC!

It would be nice if Google could say Jython 2.2 works out of the box,
and jython 2.5 works out of the box.

Alan.


----------
From: *Tobias Ivarsson* <thobes at gmail.com>
Date: Wed, Apr 8, 2009 at 8:18 AM
To: Alan Kennedy <jython-dev at xhaus.com>
Cc: Jython Developers <jython-dev at lists.sourceforge.net>

Most things in that patch look ok. I'd like to do a more thorough analysis
of the implications of each change though.

The catching of SecurityException is fine, but I want to look at the places
where they drop the exceptions that they caught in their context, and make
sure that silently ignoring the exception is a valid approach. The other
changes are few but slightly more controversial.

Are Google willing to sign a contributor agreement and license this patch
to us? Otherwise someone who has not looked at it yet (i.e. not me) should
probably experiment with Jython on GAE and find out what needs to be patched
to get Jython to run there.

/Tobias


----------
From: *Jim Baker* <jbaker at zyasoft.com>
Date: Wed, Apr 8, 2009 at 8:33 AM
To: Alan Kennedy <jython-dev at xhaus.com>
Cc: Jython Developers <jython-dev at lists.sourceforge.net>, jython users <
jython-users at lists.sourceforge.net>

This is the same patch set requested in http://bugs.jython.org/issue1188:
"Patch against trunk to handle SecurityExceptions". Now we know the source
of the request, and the specific application is very clear: a sandboxed
Jython, running under a fairly strict security manager.

The bug is a blocker for the release candidate, so this fix will be part of
2.5.

We would love to see more work testing the full scope of environments Jython
needs to run under, and any resulting bugs.

- Jim

-- 
Jim Baker
jbaker at zyasoft.com

----------
From: *James Robinson* <jamesr at google.com>
Date: Wed, Apr 8, 2009 at 8:30 AM
To: Tobias Ivarsson <thobes at gmail.com>
Cc: Jython Developers <jython-dev at lists.sourceforge.net>, Alan Kennedy <
jython-dev at xhaus.com>

I have a patch up on your issue tracker already, I'll ping it shortly.  It's
a very small patch and the SecurityExceptions that are caught and ignored
are treated the same as I/O exceptions in the vast majority of cases (which
they really are).

- James


----------
From: *Jim Baker* <jbaker at zyasoft.com>
Date: Wed, Apr 8, 2009 at 8:36 AM
To: James Robinson <jamesr at google.com>
Cc: Tobias Ivarsson <thobes at gmail.com>, Jython Developers <
jython-dev at lists.sourceforge.net>, Alan Kennedy <jython-dev at xhaus.com>

Right, this is a very small patch, we haven't required contributor
agreements in similar cases. I think we want to consider how to replicate
this setup however so we don't inadvertently reverse things.

- Jim

----------
From: *Tobias Ivarsson* <thobes at gmail.com>
Date: Wed, Apr 8, 2009 at 8:40 AM
To: Jim Baker <jbaker at zyasoft.com>
Cc: James Robinson <jamesr at google.com>, Jython Developers <
jython-dev at lists.sourceforge.net>, Alan Kennedy <jython-dev at xhaus.com>

Could we add a click-through agreement for patch submissions? Patches are
usually small enough to not be a big deal, but such a thing would leave us
entirely safe.

/Tobias

----------
From: *Frank Wierzbicki* <fwierzbicki at gmail.com>
Date: Wed, Apr 8, 2009 at 8:44 AM
To: Tobias Ivarsson <thobes at gmail.com>
Cc: Jython Developers <jython-dev at lists.sourceforge.net>, Alan Kennedy <
jython-dev at xhaus.com>

Google is a member of the PSF, so as long as Google wants this
contributed I think it's okay.  To be safe we should get an explicit
statement, but since the patch is small, this probably isn't strictly
necessary.  FWIW this is how my on-the-clock contributions to Jython
are protected (Sun is a member of the PSF and allows my
contributions).

-Frank

----------
From: *James Robinson* <jamesr at google.com>
Date: Wed, Apr 8, 2009 at 9:22 AM
To: Frank Wierzbicki <fwierzbicki at gmail.com>
Cc: Jython Developers <jython-dev at lists.sourceforge.net>, Alan Kennedy <
jython-dev at xhaus.com>

I submitted 1188 and I'm a Google employee working on company time.  Let me
know if anything further is needed, but we have quite a few contributors to
the Python project working here.

- James


----------
From: *Frank Wierzbicki* <fwierzbicki at gmail.com>
Date: Wed, Apr 8, 2009 at 9:33 AM
To: Tobias Ivarsson <thobes at gmail.com>
Cc: Jim Baker <jbaker at zyasoft.com>, Jython Developers <
jython-dev at lists.sourceforge.net>, Alan Kennedy <jython-dev at xhaus.com>

A click through is a very good idea, I think Jim is going to find out
what they do for CPython.

-Frank

----------
From: *Frank Wierzbicki* <fwierzbicki at gmail.com>
Date: Wed, Apr 8, 2009 at 9:32 AM
To: James Robinson <jamesr at google.com>
Cc: Jython Developers <jython-dev at lists.sourceforge.net>, Alan Kennedy <
jython-dev at xhaus.com>

Excellent, and thanks!  1188 was already slated for inclusion in our
upcoming RC, but knowing that it is in support of GAE moves it up to a
very high priority.

-- 
Jim Baker
jbaker at zyasoft.com


From aahz at pythoncraft.com  Wed Apr  8 18:14:59 2009
From: aahz at pythoncraft.com (Aahz)
Date: Wed, 8 Apr 2009 09:14:59 -0700
Subject: [Python-Dev] Update PEP 374 (DVCS)
Message-ID: <20090408161459.GA24661@panix.com>

Someone listed this URL on c.l.py and I thought it would make a good
reference addition to PEP 374 (DVCS decision):

http://www.catb.org/~esr/writings/version-control/version-control.html
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"...string iteration isn't about treating strings as sequences of strings, 
it's about treating strings as sequences of characters.  The fact that
characters are also strings is the reason we have problems, but characters 
are strings for other good reasons."  --Aahz

From guido at python.org  Wed Apr  8 19:51:55 2009
From: guido at python.org (Guido van Rossum)
Date: Wed, 8 Apr 2009 10:51:55 -0700
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com> 
	<grgf59$j7c$1@ger.gmane.org>
	<4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> 
	<b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com> 
	<4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com>
Message-ID: <ca471dc20904081051p5a53780bkd0c46644444d6b6c@mail.gmail.com>

On Wed, Apr 8, 2009 at 12:17 AM, Michele Simionato
<michele.simionato at gmail.com> wrote:
> On Wed, Apr 8, 2009 at 8:10 AM, Jack diederich <jackdied at gmail.com> wrote:
>> Plus he's a softie for decorators, as am I.

This worries me a bit.

There was a remark (though perhaps meant humorously) in Michele's page
about decorators that worried me too: "For instance, typical
implementations of decorators involve nested functions, and we all
know that flat is better than nested." I find the nested-function
pattern very clear and easy to grasp, whereas I find using another
decorator (a meta-decorator?) to hide this pattern unnecessarily
obscuring what's going on.

I also happen to disagree in many cases with decorators that attempt
to change the signature of the wrapper function to that of the wrapped
function. While this may make certain kinds of introspection possible,
again it obscures what's going on to a future maintainer of the code,
and the cleverness can get in the way of good old-fashioned debugging.

> I must admit that while I still like decorators, I do like them as
> much as in the past.
> I also see an overuse of decorators in various libraries for things that could
> be done more clearly without them ;-(

Right.

> But this is tangential.

(All this BTW is not to say that I don't trust you with commit
privileges if you were to be interested in contributing. I just don't
think that adding that particular decorator module to the stdlib would
be wise. It can be debated though.)

> What I would really like to know is the future of PEP 362, i.e. having
> a signature object that could be taken from an undecorated function
> and added to the decorated function.
> I do not recall people having anything against it, in principle,
> and there is also an implementation in the sandbox, but
> after three years nothing happened. I guess this is just not
> a high priority for the core developers.

That's likely true. To me, introspection is mostly useful for certain
situations like debugging or interactively finding help, but I would
hesitate to build a large amount of stuff (whether a library,
framework or application) on systematic use of introspection. In fact,
I rarely use the inspect module and had to type help(inspect) to
figure out what you meant by "signature". :-) I guess one reason is
that in my mind, and in the way I tend to write code, I don't write
APIs that require introspection -- for example, I don't like APIs that
do different things when given a "callable" as opposed to something
else (common practices in web frameworks notwithstanding), and
thinking about it I would like it even less if an API cared about the
*actual* signature of a function I pass into it. I like APIs that say,
for example, "argument 'f' must be a function of two arguments, an int
and a string," and then I assume that if I pass it something for 'f'
it will try to call that something with an int and a string. If I pass
it something else, well, I'll get a type error. But it gives me the
freedom to pass something that doesn't even have a signature but
happens to be callable in that way regardless (e.g. a bound method of
a built-in type).
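
A minimal sketch of the calling convention described above, not code from
the thread: the hypothetical helper call_with_int_and_str never inspects its
argument, it simply calls it with an int and a string. The names repeat and
call_with_int_and_str are made up for illustration.

```python
def call_with_int_and_str(f):
    """Call f with an int and a string, without inspecting f first."""
    return f(3, "ab")


def repeat(n, s):
    """An ordinary function of two arguments, an int and a string."""
    return s * n


# An ordinary Python function works.
print(call_with_int_and_str(repeat))

# So does a bound method of a built-in type, with no introspection involved.
print(call_with_int_and_str("{}:{}".format))

# A callable of the wrong shape fails at call time with a TypeError.
try:
    call_with_int_and_str(len)
except TypeError as exc:
    print("TypeError:", exc)
```

The contract lives in the documentation ("f takes an int and a string"), and
a mismatch surfaces as an ordinary TypeError at the call site.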

I will probably regret saying this. So be it. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Wed Apr  8 20:30:02 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 08 Apr 2009 20:30:02 +0200
Subject: [Python-Dev] slightly inconsistent set/list pop behaviour
In-Reply-To: <b8e622740904080357n47f76cf8pc0ff053a55b0f2a5@mail.gmail.com>
References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com>	<4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com>	<5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com>
	<b8e622740904080357n47f76cf8pc0ff053a55b0f2a5@mail.gmail.com>
Message-ID: <49DCED2A.4090401@v.loewis.de>

>>>>> foo = set([1, 65537])
>>>>> foo.pop()
>> 1
>>>>> foo = set([65537, 1])
>>>>> foo.pop()
>> 65537
> 
> You wrote a program to find the two smallest ints that would have a
> hash collision in the CPython set implementation?  I'm impressed.  And
> by impressed I mean frightened.

Well, Mark is the guy who deals with floating point numbers for fun.
*That* should frighten you :-)
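
A rough sketch of why 1 and 65537 collide, under two assumptions about
CPython that are not stated in the thread: hash(n) == n for small
non-negative ints, and a freshly built small set uses a table of 8 slots.
The helper initial_slot is hypothetical; the order in which pop() returns
elements remains an implementation detail.

```python
# Assumed size of a small CPython set's hash table.
TABLE_SLOTS = 8


def initial_slot(value, slots=TABLE_SLOTS):
    """Return the first table slot probed for value (slots is a power of 2)."""
    return hash(value) & (slots - 1)


# Both values start probing at the same slot, so insertion order decides
# which one ends up there.
print(initial_slot(1), initial_slot(65537))

# The two sets still compare equal, whatever their internal layout.
print(set([1, 65537]) == set([65537, 1]))
```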

Martin

From martin at v.loewis.de  Wed Apr  8 20:33:35 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 08 Apr 2009 20:33:35 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <loom.20090408T110540-221@post.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>
Message-ID: <49DCEDFF.7050708@v.loewis.de>

> We're in the process of forward-porting the recent (massive) json updates to
> 3.1, and we are also thinking of dropping remnants of support of the bytes type
> in the json library (in 3.1, again). This bytes support almost didn't work at
> all, but there was a lot of C and Python code for it nevertheless. We're also
> thinking of dropping the "encoding" argument in the various APIs, since it is
> useless.
> 
> Under the new situation, json would only ever allow str as input, and output str
> as well. By posting here, I want to know whether anybody would oppose this
> (knowing, once again, that bytes support is already broken in the current py3k
> trunk).

What does Bob Ippolito think about this change? IIUC, he considers
simplejson's speed one of its primary advantages, and also attributes it
to the fact that he can parse directly out of byte strings, and marshal
into them (which is important, as you typically receive them over the
wire). Having to run them through a codec slows parsing down.

Regards,
Martin

From martin at v.loewis.de  Wed Apr  8 20:37:39 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 08 Apr 2009 20:37:39 +0200
Subject: [Python-Dev] Contributor Agreements for Patches - was
 [Jython-dev] Jython on Google AppEngine!
In-Reply-To: <d03bb4010904080850h7eece089i1e279b4a5f01cbb3@mail.gmail.com>
References: <d03bb4010904080850h7eece089i1e279b4a5f01cbb3@mail.gmail.com>
Message-ID: <49DCEEF3.7020603@v.loewis.de>

>     * What is the scope of a patch that requires a contributor
>       agreement?

Unfortunately, that question was never fully answered (or I forgot
what the answer was).

>     * Do Google employees, working on company time, automatically get
>       treated as contributors with existing contributor agreements on
>       file with the PSF?

Yes, they do.

>       If so, are there other companies that
>       automatically get this treatment?

Not that I know of.

>     * Should we change the workflow for roundup to make this assignment
>       of license clearer (see Tobias's idea in the thread about a
>       click-through agreement).

I think we do need something written; a lawyer may be able to tell
precisely.

I still hope that we can record, in the tracker, which contributors have
signed an agreement.

> In these matters, Jython, as a project under the Python Software
> Foundation, intends to follow the same policy as CPython.

Please keep pushing. From this message alone, I find two questions
to the lawyer, and one (possibly two) feature requests for the bug
tracker.

Regards,
Martin

From pje at telecommunity.com  Wed Apr  8 20:41:13 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 08 Apr 2009 14:41:13 -0400
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <ca471dc20904081051p5a53780bkd0c46644444d6b6c@mail.gmail.co
 m>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>
	<grgf59$j7c$1@ger.gmane.org>
	<4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com>
	<b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com>
	<4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com>
	<ca471dc20904081051p5a53780bkd0c46644444d6b6c@mail.gmail.com>
Message-ID: <20090408183847.180053A4063@sparrow.telecommunity.com>

At 10:51 AM 4/8/2009 -0700, Guido van Rossum wrote:
>I would like it even less if an API cared about the
>*actual* signature of a function I pass into it.

One notable use of callable argument inspection is Bobo, the 
12-years-ago predecessor to Zope, which used argument information to 
determine form or query string parameter names.  (Were Bobo being 
written for the first time today for Python 3, I imagine it would use 
argument annotations to specify types, instead of requiring them to 
be in the client-side field names.)

Bobo, of course, is just a single case of the general pattern of 
tools that expose a callable to some other (possibly 
explicitly-typed) system.  E.g., wrapping Python functions for 
exposure to C, Java, .NET, CORBA, SOAP, etc.

Anyway, it's nice for decorators to be transparent to inspection when 
the decorator doesn't actually modify the calling signature, so that 
you can then use your decorated functions with tools like the above.
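
A minimal sketch of the general pattern, not Bobo's actual code: a
hypothetical call_with_query helper reads a callable's parameter names to
pick values out of a query dictionary, and uses annotations (where present)
as converters. It assumes a Python version that provides inspect.signature
(3.3 or later), which postdates this thread.

```python
import inspect


def call_with_query(func, query):
    """Call func with arguments looked up by parameter name in query."""
    kwargs = {}
    for name, param in inspect.signature(func).parameters.items():
        if name not in query:
            continue  # rely on the parameter's default, if it has one
        value = query[name]
        # Treat an annotation as a converter for the raw string value.
        if param.annotation is not inspect.Parameter.empty:
            value = param.annotation(value)
        kwargs[name] = value
    return func(**kwargs)


def greet(name, times: int = 1):
    return ", ".join(["hello " + name] * times)


print(call_with_query(greet, {"name": "bobo", "times": "2"}))
```

A decorator that replaces greet with a wrapper taking (*args, **kwargs)
would leave this kind of tool with no parameter names to work from.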

From alex.neundorf at kitware.com  Wed Apr  8 21:45:18 2009
From: alex.neundorf at kitware.com (Alexander Neundorf)
Date: Wed, 8 Apr 2009 21:45:18 +0200
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com>
	<49DBD6F9.7030502@canterbury.ac.nz>
	<806d41050904071554x30dade8eva60be765af462112@mail.gmail.com>
	<5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com>
Message-ID: <806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com>

On Wed, Apr 8, 2009 at 4:18 AM, David Cournapeau <cournape at gmail.com> wrote:
...
>> I guess something similar could be useful for Python, maybe this is
>> what distutils actually do ?
>
> distutils does roughly everything that autotools does, and more:
>  - configuration: not often used in extensions, we (numpy) are the
> exception I would guess
>  - build
>  - installation
>  - tarball generation
>  - bdist_ installers (msi, .exe on windows, .pkg/.mpkg on mac os x,
> rpm/deb on Linux)

I think cmake can do all of the above (cpack supports creating packages).

>  - registration to pypi

No idea what this is .

Alex

From skip at pobox.com  Wed Apr  8 21:53:08 2009
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 8 Apr 2009 14:53:08 -0500
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com>
	<49DBD6F9.7030502@canterbury.ac.nz>
	<806d41050904071554x30dade8eva60be765af462112@mail.gmail.com>
	<5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com>
	<806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com>
Message-ID: <18909.164.374915.626585@montanaro.dyndns.org>

    >> - registration to pypi

    Alex> No idea what this is .

http://pypi.python.org/

It is, in some ways, a CPAN-like system for Python.

Skip

From ncoghlan at gmail.com  Wed Apr  8 23:40:44 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 09 Apr 2009 07:40:44 +1000
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <20090408183847.180053A4063@sparrow.telecommunity.com>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>	<grgf59$j7c$1@ger.gmane.org>	<4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com>	<b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com>	<4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com>	<ca471dc20904081051p5a53780bkd0c46644444d6b6c@mail.gmail.com>
	<20090408183847.180053A4063@sparrow.telecommunity.com>
Message-ID: <49DD19DC.20204@gmail.com>

P.J. Eby wrote:
> Anyway, it's nice for decorators to be transparent to inspection when
> the decorator doesn't actually modify the calling signature, so that you
> can then use your decorated functions with tools like the above.

If anyone wanted to take PEP 362 up again, we could easily add a
__signature__ attribute to functools.update_wrapper. It may be too late
to hammer it into shape for 3.1/2.7 though (I don't recall how far the
PEP was from being ready for prime time).
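
A sketch of what such a __signature__ attribute could look like, written as
a hand-rolled decorator rather than as a change to functools.update_wrapper.
It assumes a Python version that provides inspect.signature (3.3 or later)
and relies on the CPython behaviour that inspect.signature honours a
__signature__ attribute; the decorator name logged is hypothetical.

```python
import inspect


def logged(func):
    """Wrap func while advertising func's own signature."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    wrapper.__name__ = func.__name__
    wrapper.__doc__ = func.__doc__
    # Copy the signature object from the undecorated function.
    wrapper.__signature__ = inspect.signature(func)
    return wrapper


@logged
def add(x, y=1):
    return x + y


print(inspect.signature(add))
print(add(2))
```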

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From eric at trueblade.com  Thu Apr  9 01:07:45 2009
From: eric at trueblade.com (Eric Smith)
Date: Wed, 08 Apr 2009 19:07:45 -0400
Subject: [Python-Dev] Deprecating PyOS_ascii_formatd
Message-ID: <49DD2E41.80401@trueblade.com>

Assuming that Mark's and my changes in the py3k-short-float-repr branch 
get checked in shortly, I'd like to deprecate PyOS_ascii_formatd. Its 
functionality is largely being replaced by PyOS_double_to_string, which 
we're introducing on our branch.

PyOS_ascii_formatd was introduced to fix the issue in PEP 331. 
PyOS_double_to_string addresses all of the same issues, namely a 
non-locale aware double-to-string conversion. PyOS_ascii_formatd has an 
unfortunate interface. It accepts a printf-like format string for a 
single double parameter. It must parse the format string into the 
parameters it uses. All uses of it inside Python already know the 
parameters and must build up a format string using sprintf, only to turn 
around and have PyOS_ascii_formatd reparse it.
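
The round trip described above can be sketched in Python. This is only an
analogue of the interface problem, not the C API: both function names are
hypothetical, and the regular expression handles just the simple
"%.<precision><code>" shape.

```python
import re

# Matches formats such as "%.3f" or "%g".
_FORMAT_RE = re.compile(r"%(?:\.(\d+))?([efgEFG])$")


def format_from_parts(value, code, precision):
    """Format a float from parameters the caller already knows."""
    return format(value, ".%d%s" % (precision, code))


def format_from_string(fmt, value):
    """Format a float from a printf-like string that must be reparsed."""
    match = _FORMAT_RE.match(fmt)
    if match is None:
        raise ValueError("unsupported format: %r" % (fmt,))
    precision = int(match.group(1)) if match.group(1) is not None else 6
    return format_from_parts(value, match.group(2), precision)


# The caller knows ("f", 3), builds "%.3f", and the callee parses it again.
print(format_from_string("%.3f", 1.5))
# Passing the parameters directly skips the build-and-reparse step.
print(format_from_parts(1.5, "f", 3))
```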

In the branch I've replaced all of the internal calls to 
PyOS_ascii_formatd with PyOS_double_to_string.

My proposal is to deprecate PyOS_ascii_formatd in 3.1 and remove it in 3.2.

The 2.7 situation is trickier, because we're not planning on backporting
the short-float-repr work to 2.7. In 2.7 I guess we'll leave
PyOS_ascii_formatd around, unfortunately.

FWIW, I didn't find any external callers of it using Google code search.

And as a reminder, the py3k-short-float-repr changes are on Rietveld at 
http://codereview.appspot.com/33084/show. So far, no comments.

From thobes at gmail.com  Thu Apr  9 01:10:50 2009
From: thobes at gmail.com (Tobias Ivarsson)
Date: Thu, 9 Apr 2009 01:10:50 +0200
Subject: [Python-Dev] Contributor Agreements for Patches - was
	[Jython-dev] Jython on Google AppEngine!
In-Reply-To: <49DCEEF3.7020603@v.loewis.de>
References: <d03bb4010904080850h7eece089i1e279b4a5f01cbb3@mail.gmail.com>
	<49DCEEF3.7020603@v.loewis.de>
Message-ID: <9997d5e60904081610q306746a7x710a4edb804e1cda@mail.gmail.com>

On Wed, Apr 8, 2009 at 8:37 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
--8<--

> >     * Should we change the workflow for roundup to make this assignment
> >       of license clearer (see Tobias's idea in the thread about a
> >       click-through agreement).
>
> I think we do need something written; a lawyer may be able to tell
> precisely.

The company I work for does open source development, and our lawyers said
that our model of having contributors send an e-mail with the text "I agree"
and our CLA as an attachment was perfectly valid, with no handwritten signature
needed. From there, the step to a click-through for something as simple as a
patch isn't far. But I would not claim to know any of these things;
I'm just hoping that we can have a simple process with no legal gray areas.

>
>
> I still hope that we can record, in the tracker, which contributors have
> signed an agreement.

That would be good.

Cheers,
Tobias

From solipsis at pitrou.net  Thu Apr  9 01:31:31 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 8 Apr 2009 23:31:31 +0000 (UTC)
Subject: [Python-Dev] Dropping bytes "support" in json
References: <loom.20090408T110540-221@post.gmane.org>
	<49DCEDFF.7050708@v.loewis.de>
Message-ID: <loom.20090408T231751-930@post.gmane.org>

Martin v. Löwis <martin <at> v.loewis.de> writes:
> 
> What does Bob Ippolito think about this change? IIUC, he considers
> simplejson's speed one of its primary advantages, and also attributes it
> to the fact that he can parse directly out of byte strings, and marshal
> into them (which is important, as you typically receive them over the
> wire).

The only thing I know is that the new version (the one I've tried to merge) is
massively faster than the old one - several times faster - and within 20-30% of
the speed of the 2.x version (*). Besides, Bob doesn't really seem to care about
porting to py3k (he hasn't said anything about it until now, other than that he
didn't feel competent to do it). But I'm happy with someone proposing an
alternate patch if they want to. As for me, I just wanted to fill the gap and
I'm not interested in doing a lot of work on this issue.

(*)

timeit -s "import json; l=['abc']*100" "json.dumps(l)"

-> trunk: 33.4 usec per loop
-> py3k + patch: 37.1 usec per loop
-> vanilla py3k: 314 usec per loop

timeit -s "import json; s=json.dumps(['abc']*100)" "json.loads(s)"

-> trunk: 44.8 usec per loop
-> py3k + patch: 35.4 usec per loop
-> vanilla py3k: 1.48 msec per loop (!)
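
The same measurements can be scripted with the timeit module. This is a
sketch for reproducing the comparison on a given interpreter; the helper
usec_per_loop and the loop count are assumptions, and the numbers it prints
will differ from those above.

```python
import timeit


def usec_per_loop(stmt, setup, number=10000):
    """Return the average time per execution of stmt, in microseconds."""
    total_seconds = timeit.timeit(stmt, setup=setup, number=number)
    return total_seconds / number * 1e6


dumps_usec = usec_per_loop("json.dumps(l)", "import json; l=['abc']*100")
loads_usec = usec_per_loop(
    "json.loads(s)", "import json; s=json.dumps(['abc']*100)"
)

print("dumps: %.1f usec per loop" % dumps_usec)
print("loads: %.1f usec per loop" % loads_usec)
```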

Regards

Antoine.

From guido at python.org  Thu Apr  9 02:36:20 2009
From: guido at python.org (Guido van Rossum)
Date: Wed, 8 Apr 2009 17:36:20 -0700
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <loom.20090408T110540-221@post.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>
Message-ID: <ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>

On Wed, Apr 8, 2009 at 4:10 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> We're in the process of forward-porting the recent (massive) json updates to
> 3.1, and we are also thinking of dropping remnants of support of the bytes type
> in the json library (in 3.1, again). This bytes support almost didn't work at
> all, but there was a lot of C and Python code for it nevertheless. We're also
> thinking of dropping the "encoding" argument in the various APIs, since it is
> useless.
>
> Under the new situation, json would only ever allow str as input, and output str
> as well. By posting here, I want to know whether anybody would oppose this
> (knowing, once again, that bytes support is already broken in the current py3k
> trunk).
>
> The bug entry is: http://bugs.python.org/issue4136

I'm kind of surprised that a serialization protocol like JSON wouldn't
support reading/writing bytes (as the serialized format -- I don't
care about having bytes as values, since JavaScript doesn't have
something equivalent AFAIK, and hence JSON doesn't allow it IIRC).
Marshal and Pickle, for example, *always* treat the serialized format
as bytes. And since in most cases it will be sent over a socket, at
some point the serialized representation *will* be bytes, I presume.
What makes supporting this hard?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From cournape at gmail.com  Thu Apr  9 03:57:44 2009
From: cournape at gmail.com (David Cournapeau)
Date: Thu, 9 Apr 2009 10:57:44 +0900
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com>
	<49DBD6F9.7030502@canterbury.ac.nz>
	<806d41050904071554x30dade8eva60be765af462112@mail.gmail.com>
	<5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com>
	<806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com>
Message-ID: <5b8d13220904081857w46237b57t82d8a4006f00adbb@mail.gmail.com>

On Thu, Apr 9, 2009 at 4:45 AM, Alexander Neundorf
<alex.neundorf at kitware.com> wrote:

> I think cmake can do all of the above (cpack supports creating packages).

I am sure it can - it is just a lot of work, especially if you want to
stay compatible with distutils-built extensions :)

cheers,

David

From michele.simionato at gmail.com  Thu Apr  9 06:31:41 2009
From: michele.simionato at gmail.com (Michele Simionato)
Date: Thu, 9 Apr 2009 06:31:41 +0200
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <ca471dc20904081051p5a53780bkd0c46644444d6b6c@mail.gmail.com>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>
	<grgf59$j7c$1@ger.gmane.org>
	<4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com>
	<b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com>
	<4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com>
	<ca471dc20904081051p5a53780bkd0c46644444d6b6c@mail.gmail.com>
Message-ID: <4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com>

On Wed, Apr 8, 2009 at 7:51 PM, Guido van Rossum <guido at python.org> wrote:
>
> There was a remark (though perhaps meant humorously) in Michele's page
> about decorators that worried me too: "For instance, typical
> implementations of decorators involve nested functions, and we all
> know that flat is better than nested." I find the nested-function
> pattern very clear and easy to grasp, whereas I find using another
> decorator (a meta-decorator?) to hide this pattern unnecessarily
> obscuring what's going on.

I understand your point and I will freely admit that I have always had mixed
feelings about the advantages of a meta decorator with
respect to plain simple nested functions. I see pros and cons.
If functools.update_wrapper could preserve the signature I
would probably use it over the decorator module.

> I also happen to disagree in many cases with decorators that attempt
> to change the signature of the wrapper function to that of the wrapped
> function. While this may make certain kinds of introspection possible,
> again it obscures what's going on to a future maintainer of the code,
> and the cleverness can get in the way of good old-fashioned debugging.

Then perhaps you misunderstand the goal of the decorator module.
The raison d'etre of the module is to PRESERVE the signature:
update_wrapper unfortunately *changes* it.

When confronted with a library which I do not know, I often run
pydoc, or sphinx, or a custom-made documentation tool over it to extract
the signatures of functions. For instance, if I see a method
get_user(self, username) I have a good hint about what it is supposed
to do. But if the library (say a web framework) uses non-signature-preserving
decorators, my documentation tool tells me that there is a function
get_user(*args, **kwargs), which frankly is not enough [this is the
optimistic case, when the author of the decorator has taken care
to preserve the name of the original function].
I *hate* losing information about the true signature of functions, since I also
use IPython, Python help, etc. a lot.
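
A small sketch of the loss described above, assuming a Python version that
provides inspect.signature (3.3 or later, so not available when this was
written). The decorator name plain is hypothetical; it copies the function
name (the "optimistic case") but nothing else.

```python
import inspect


def plain(func):
    """A typical nested-function decorator that keeps only the name."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    wrapper.__name__ = func.__name__
    return wrapper


def get_user(self, username):
    return username


decorated = plain(get_user)

# The undecorated function reports its real parameters.
print(get_user.__name__, inspect.signature(get_user))
# The decorated one reports only the wrapper's catch-all parameters.
print(decorated.__name__, inspect.signature(decorated))
```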

>> I must admit that while I still like decorators, I do like them as
>> much as in the past.

Of course there was a missing NOT in this sentence, but you all understood
the intended meaning.

> (All this BTW is not to say that I don't trust you with commit
> privileges if you were to be interested in contributing. I just don't
> think that adding that particular decorator module to the stdlib would
> be wise. It can be debated though.)

Fine. As I have repeated many times, that particular module was never
meant for inclusion in the standard library. But I feel strongly about
the possibility of being able to preserve (not change!) the function
signature.

> To me, introspection is mostly useful for certain
> situations like debugging or interactively finding help, but I would
> hesitate to build a large amount of stuff (whether a library,
> framework or application) on systematic use of introspection. In fact,
> I rarely use the inspect module and had to type help(inspect) to
> figure out what you meant by "signature". :-) I guess one reason is
> that in my mind, and in the way I tend to write code, I don't write
> APIs that require introspection -- for example, I don't like APIs that
> do different things when given a "callable" as opposed to something
> else (common practices in web frameworks notwithstanding), and
> thinking about it I would like it even less if an API cared about the
> *actual* signature of a function I pass into it. I like APIs that say,
> for example, "argument 'f' must be a function of two arguments, an int
> and a string," and then I assume that if I pass it something for 'f'
> it will try to call that something with an int and a string. If I pass
> it something else, well, I'll get a type error. But it gives me the
> freedom to pass something that doesn't even have a signature but
> happens to be callable in that way regardless (e.g. a bound method of
> a built-in type).

I do not think anybody disagrees with your point here. My point still
stands, though: objects should not lie about their signature, especially
during debugging and when generating documentation from code.

From solipsis at pitrou.net  Thu Apr  9 07:15:09 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 9 Apr 2009 05:15:09 +0000 (UTC)
Subject: [Python-Dev] Dropping bytes "support" in json
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
Message-ID: <loom.20090409T043042-835@post.gmane.org>

Guido van Rossum <guido <at> python.org> writes:
> 
> I'm kind of surprised that a serialization protocol like JSON wouldn't
> support reading/writing bytes (as the serialized format -- I don't
> care about having bytes as values, since JavaScript doesn't have
> something equivalent AFAIK, and hence JSON doesn't allow it IIRC).
> Marshal and Pickle, for example, *always* treat the serialized format
> as bytes. And since in most cases it will be sent over a socket, at
> some point the serialized representation *will* be bytes, I presume.
> What makes supporting this hard?

It's not hard, it just means a lot of duplicated code if the library wants to
support both str and bytes in an optimized way as Martin alluded to. This
duplicated code already exists in the C parts to support the 2.x semantics of
accepting unicode objects as well as str, but not in the Python parts, which
explains why the bytes support is broken in py3k - in 2.x, the same Python code
can be used for str and unicode.

On the other hand, supporting it without chasing the last few percent of
performance should be fairly trivial (by encoding/decoding before doing the
processing proper), and it would avoid the current duplicated code.
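
A minimal sketch of that simple approach, layered on top of a str-only json
module. The wrapper names dumps_bytes and loads_bytes are hypothetical, and
the utf-8 default is an assumption taken from the RFC's default encoding.

```python
import json


def dumps_bytes(obj, encoding="utf-8"):
    """Serialize obj to JSON text, then encode it for the wire."""
    return json.dumps(obj).encode(encoding)


def loads_bytes(data, encoding="utf-8"):
    """Decode octets received from the wire, then parse the JSON text."""
    return json.loads(data.decode(encoding))


payload = dumps_bytes(["abc", "\u00e9"])
print(type(payload).__name__)
print(loads_bytes(payload))
```

The extra encode/decode pass costs some speed, which is the trade-off
against duplicating the parser for both types.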

As for reading/writing bytes over the wire, JSON is often used in the same
context as HTML: you are supposed to know the charset and decode/encode the
payload using that charset. However, the RFC specifies a default encoding of
utf-8. (*)

(*) http://www.ietf.org/rfc/rfc4627.txt

The RFC also specifies a discrimination algorithm for non-supersets of ASCII
("Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets."), but it is not
implemented in the json module:

>>> json.loads('"hi"')
'hi'
>>> json.loads(u'"hi"'.encode('utf16'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/cpython/__svn__/Lib/json/__init__.py", line 310, in loads
    return _default_decoder.decode(s)
  File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 344, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 362, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
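
The discrimination algorithm quoted from the RFC can be sketched as follows.
The function name detect_json_encoding is hypothetical, the fallback to
utf-8 for inputs shorter than four octets is an assumption, and the sketch
expects BOM-less input whose text starts with two ASCII characters.

```python
import json


def detect_json_encoding(data):
    """Guess the encoding of JSON octets from the nulls in the first four."""
    if len(data) < 4:
        return "utf-8"  # assumption: too short to discriminate
    null_pattern = tuple(octet == 0 for octet in bytearray(data[:4]))
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(null_pattern, "utf-8")    # xx xx xx xx


for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    octets = "[1, 2]".encode(codec)
    guessed = detect_json_encoding(octets)
    print(codec, guessed, json.loads(octets.decode(guessed)))
```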

Regards

Antoine.

From martin at v.loewis.de  Thu Apr  9 07:55:20 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 09 Apr 2009 07:55:20 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <loom.20090408T231751-930@post.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>	<49DCEDFF.7050708@v.loewis.de>
	<loom.20090408T231751-930@post.gmane.org>
Message-ID: <49DD8DC8.8020302@v.loewis.de>

> Besides, Bob doesn't really seem to care about
> porting to py3k (he hasn't said anything about it until now, other than that he
> didn't feel competent to do it).

That is quite unfortunate, and suggests that perhaps the module
shouldn't have been added to Python in the first place.

I can understand that you don't want to spend much time on it. How
about removing it from 3.1? We could re-add it when long-term support
becomes more likely.

Regards,
Martin

From python at rcn.com  Thu Apr  9 09:16:24 2009
From: python at rcn.com (Raymond Hettinger)
Date: Thu, 9 Apr 2009 00:16:24 -0700
Subject: [Python-Dev] Dropping bytes "support" in json
References: <loom.20090408T110540-221@post.gmane.org>	<49DCEDFF.7050708@v.loewis.de><loom.20090408T231751-930@post.gmane.org>
	<49DD8DC8.8020302@v.loewis.de>
Message-ID: <351C98023EB24D4EACF1794F7CDAC272@RaymondLaptop1>

[Antoine Pitrou]
>> Besides, Bob doesn't really seem to care about
>> porting to py3k (he hasn't said anything about it until now, other than that he
>> didn't feel competent to do it).

His actual words were: "I will need some help with 3.0 since I am not well
versed in the changes to the C API or Python code for that, but merging for
2.6.1 should be no big deal."

[MvL]
> That is quite unfortunate, and suggests that perhaps the module
> shouldn't have been added to Python in the first place.

Bob participated actively in http://bugs.python.org/issue4136 and was
responsive to detailed patch review.  He gave a popular talk at PyCon less
than two weeks ago.  He's not derelict.

> I can understand that you don't want to spend much time on it. How
> about removing it from 3.1? We could re-add it when long-term support
> becomes more likely.

I'm speechless.

Raymond 

From dirkjan at ochtman.nl  Thu Apr  9 09:59:56 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Thu, 9 Apr 2009 09:59:56 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <loom.20090409T043042-835@post.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
Message-ID: <ea2499da0904090059k6ca30d4aq7b1b6f9d4662662d@mail.gmail.com>

On Thu, Apr 9, 2009 at 07:15, Antoine Pitrou <solipsis at pitrou.net> wrote:
> The RFC also specifies a discrimination algorithm for non-supersets of ASCII
> ("Since the first two characters of a JSON text will always be ASCII
>   characters [RFC0020], it is possible to determine whether an octet
>   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
>   at the pattern of nulls in the first four octets."), but it is not
> implemented in the json module:

Well, your example is bad in the context of the RFC. The RFC states
that JSON-text = object / array, meaning "loads" for '"hi"' isn't
strictly valid. The discrimination algorithm obviously only works in
the context of that grammar, where the first character of a document
must be { or [ and the next character can only be {, [, f, n, t, ", -,
a number, or insignificant whitespace (space, \t, \r, \n).
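
As a rough sketch of that discrimination step in Python 3, assuming the
null patterns tabulated in section 3 of RFC 4627: the helper names
detect_json_encoding and loads_bytes are made up for illustration and are
not part of the json module under discussion. Note that the 'utf16' codec
used in the quoted example prepends a byte order mark, which the RFC's
table does not cover, so the sketch only handles the explicit BE/LE forms.

```python
import json


def detect_json_encoding(data):
    """Guess the encoding of a JSON text from the nulls in its first four octets."""
    head = data[:4]
    if len(head) < 4:
        # Fewer than four octets cannot hold two UTF-16 or UTF-32 characters.
        return "utf-8"
    nulls = tuple(octet == 0 for octet in head)
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Any other pattern (xx xx xx xx in the RFC's table) is treated as UTF-8.
    return patterns.get(nulls, "utf-8")


def loads_bytes(data):
    """Decode with the guessed encoding, then hand the text to json.loads."""
    return json.loads(data.decode(detect_json_encoding(data)))
```

This only works under the RFC grammar described above, where the first two
characters are guaranteed to be ASCII.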

> >>> json.loads('"hi"')
> 'hi'
> >>> json.loads(u'"hi"'.encode('utf16'))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/antoine/cpython/__svn__/Lib/json/__init__.py", line 310, in loads
>     return _default_decoder.decode(s)
>   File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 344, in decode
>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 362, in raw_decode
>     raise ValueError("No JSON object could be decoded")
> ValueError: No JSON object could be decoded

Cheers,

Dirkjan

From ncoghlan at gmail.com  Thu Apr  9 12:54:39 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 09 Apr 2009 20:54:39 +1000
Subject: [Python-Dev] Deprecating PyOS_ascii_formatd
In-Reply-To: <49DD2E41.80401@trueblade.com>
References: <49DD2E41.80401@trueblade.com>
Message-ID: <49DDD3EF.2010501@gmail.com>

Eric Smith wrote:
> And as a reminder, the py3k-short-float-repr changes are on Rietveld at
> http://codereview.appspot.com/33084/show. So far, no comments.

I skipped over the actual number crunching parts (the test suite will do
a better job than I will of telling you whether or not you have those
parts correct), but I had a look at the various other changes to make
use of the new API.

Looks like you were able to delete some fairly respectable chunks of
redundant code!

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From barry at python.org  Thu Apr  9 13:01:19 2009
From: barry at python.org (Barry Warsaw)
Date: Thu, 9 Apr 2009 07:01:19 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <loom.20090409T043042-835@post.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
Message-ID: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Apr 9, 2009, at 1:15 AM, Antoine Pitrou wrote:

> Guido van Rossum <guido <at> python.org> writes:
>>
>> I'm kind of surprised that a serialization protocol like JSON wouldn't
>> support reading/writing bytes (as the serialized format -- I don't
>> care about having bytes as values, since JavaScript doesn't have
>> something equivalent AFAIK, and hence JSON doesn't allow it IIRC).
>> Marshal and Pickle, for example, *always* treat the serialized format
>> as bytes. And since in most cases it will be sent over a socket, at
>> some point the serialized representation *will* be bytes, I presume.
>> What makes supporting this hard?
>
> It's not hard, it just means a lot of duplicated code if the library
> wants to support both str and bytes in an optimized way as Martin
> alluded to.  This duplicated code already exists in the C parts to
> support the 2.x semantics of accepting unicode objects as well as str,
> but not in the Python parts, which explains why the bytes support is
> broken in py3k - in 2.x, the same Python code can be used for str and
> unicode.

This is an interesting question, and something I'm struggling with for  
the email package for 3.x.  It turns out to be pretty convenient to  
have both a bytes and a string API, both for input and output, but I  
think email really wants to be represented internally as bytes.   
Maybe.  Or maybe just for content bodies and not headers, or maybe  
both.  Anyway, aside from that decision, I haven't come up with an  
elegant way to allow /output/ in both bytes and strings (input is I  
think theoretically easier by sniffing the arguments).
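
The input half of that, sniffing the argument type, might be sketched as
follows in Python 3. The names to_text and header_name are invented for
illustration and are not part of the email package; the output side, which
is the hard part, is deliberately left alone.

```python
def to_text(data, encoding="ascii"):
    """Hypothetical helper: normalise bytes-or-str input to str."""
    if isinstance(data, (bytes, bytearray)):
        return data.decode(encoding)
    if isinstance(data, str):
        return data
    raise TypeError("expected bytes or str, got %s" % type(data).__name__)


def header_name(line):
    """Return the field name of a 'Name: value' header line of either type."""
    return to_text(line).split(":", 1)[0].strip()
```

The caller never has to say which type it passed in, but the function has
to pick one type to return, which is exactly where the output question
comes back.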

Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSd3Vf3EjvBPtnXfVAQKyNgQApNmI5hh9heTYynyADYaDkP8wzZFXUpgg
cKYL741MbLpOFn3IFGAGaRWBQe4Dt8i4CiIEIbg3X7QZqwQJjoTtFwxsJKmXFd1M
JR0oCB8Du2kE5YzD+avrEp+d8zwl2goxvzD9dJwziBav5V98w7PMiZc3sApklQFD
gNYzbHEOfv4=
=tjGr
-----END PGP SIGNATURE-----

From ncoghlan at gmail.com  Thu Apr  9 13:06:21 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 09 Apr 2009 21:06:21 +1000
Subject: [Python-Dev] Adding new features to Python 2.x (PEP 382:
 Namespace Packages)
In-Reply-To: <49DBA78F.7010904@v.loewis.de>
References: <49D4DA72.60401@v.loewis.de>
	<49D52115.6020001@egenix.com>	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>	<49DB4624.604@egenix.com>
	<49DBA78F.7010904@v.loewis.de>
Message-ID: <49DDD6AD.9020708@gmail.com>

Martin v. Löwis wrote:
>> Such a policy would then translate to a dead end for Python 2.x
>> based applications.
> 
> 2.x based applications *are* in a dead end, with the only exit
> being portage to 3.x.

The actual end of the dead end just happens to be in 2013 or so :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net  Thu Apr  9 13:10:22 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 9 Apr 2009 11:10:22 +0000 (UTC)
Subject: [Python-Dev] Dropping bytes "support" in json
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<ea2499da0904090059k6ca30d4aq7b1b6f9d4662662d@mail.gmail.com>
Message-ID: <loom.20090409T110938-916@post.gmane.org>

Dirkjan Ochtman <dirkjan <at> ochtman.nl> writes:
> 
> The RFC states
> that JSON-text = object / array, meaning "loads" for '"hi"' isn't
> strictly valid.

Sure, but then:

>>> json.loads('[]')
[]
>>> json.loads(u'[]'.encode('utf16'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/cpython/__svn__/Lib/json/__init__.py", line 310, in loads
    return _default_decoder.decode(s)
  File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 344, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 362, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

Cheers

Antoine.

From eric at trueblade.com  Thu Apr  9 13:56:21 2009
From: eric at trueblade.com (Eric Smith)
Date: Thu, 09 Apr 2009 07:56:21 -0400
Subject: [Python-Dev] Deprecating PyOS_ascii_formatd
In-Reply-To: <49DDD3EF.2010501@gmail.com>
References: <49DD2E41.80401@trueblade.com> <49DDD3EF.2010501@gmail.com>
Message-ID: <49DDE265.4070605@trueblade.com>

Nick Coghlan wrote:
> Eric Smith wrote:
>> And as a reminder, the py3k-short-float-repr changes are on Rietveld at
>> http://codereview.appspot.com/33084/show. So far, no comments.

> Looks like you were able to delete some fairly respectable chunks of
> redundant code!

Wait until you see how much nasty code gets deleted when I can actually 
remove PyOS_ascii_formatd!

And thanks for your comments on Rietveld, especially catching the memory 
leak.

Eric.

From dirkjan at ochtman.nl  Thu Apr  9 14:02:43 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Thu, 9 Apr 2009 14:02:43 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <loom.20090409T110938-916@post.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<ea2499da0904090059k6ca30d4aq7b1b6f9d4662662d@mail.gmail.com>
	<loom.20090409T110938-916@post.gmane.org>
Message-ID: <ea2499da0904090502l4b953787p6e07ba422f2ffc11@mail.gmail.com>

On Thu, Apr 9, 2009 at 13:10, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Sure, but then:
>
> >>> json.loads('[]')
> []
> >>> json.loads(u'[]'.encode('utf16'))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/antoine/cpython/__svn__/Lib/json/__init__.py", line 310, in loads
>     return _default_decoder.decode(s)
>   File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 344, in decode
>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 362, in raw_decode
>     raise ValueError("No JSON object could be decoded")
> ValueError: No JSON object could be decoded

Right. :) Just wanted to point out that your test might not be testing
what you want to test.

Cheers,

Dirkjan

From steve at holdenweb.com  Thu Apr  9 14:07:15 2009
From: steve at holdenweb.com (Steve Holden)
Date: Thu, 09 Apr 2009 08:07:15 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
Message-ID: <grkodk$j4p$1@ger.gmane.org>

Barry Warsaw wrote:
> On Apr 9, 2009, at 1:15 AM, Antoine Pitrou wrote:
> 
>> Guido van Rossum <guido <at> python.org> writes:
>>>
>>> I'm kind of surprised that a serialization protocol like JSON wouldn't
>>> support reading/writing bytes (as the serialized format -- I don't
>>> care about having bytes as values, since JavaScript doesn't have
>>> something equivalent AFAIK, and hence JSON doesn't allow it IIRC).
>>> Marshal and Pickle, for example, *always* treat the serialized format
>>> as bytes. And since in most cases it will be sent over a socket, at
>>> some point the serialized representation *will* be bytes, I presume.
>>> What makes supporting this hard?
> 
>> It's not hard, it just means a lot of duplicated code if the library
>> wants to support both str and bytes in an optimized way as Martin
>> alluded to. This duplicated code already exists in the C parts to
>> support the 2.x semantics of accepting unicode objects as well as str,
>> but not in the Python parts, which explains why the bytes support is
>> broken in py3k - in 2.x, the same Python code can be used for str and
>> unicode.
> 
> This is an interesting question, and something I'm struggling with for
> the email package for 3.x.  It turns out to be pretty convenient to have
> both a bytes and a string API, both for input and output, but I think
> email really wants to be represented internally as bytes.  Maybe.  Or
> maybe just for content bodies and not headers, or maybe both.  Anyway,
> aside from that decision, I haven't come up with an elegant way to allow
> /output/ in both bytes and strings (input is I think theoretically
> easier by sniffing the arguments).
> 
The real problem I came across in storing email in a relational database
was the inability to store messages as Unicode. Some messages have a
body in one encoding and an attachment in another, so the only ways to
store the messages are either as a monolithic bytes string that gets
parsed when the individual components are required or as a sequence of
components in the database's preferred encoding (if you want to keep the
original encoding most relational databases won't be able to help unless
you store the components as bytes).

All in all, as you might expect from a system that's been growing up
since 1970 or so, it can be quite intractable.

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
Holden Web LLC                 http://www.holdenweb.com/
Watch PyCon on video now!          http://pycon.blip.tv/

From ncoghlan at gmail.com  Thu Apr  9 14:11:28 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 09 Apr 2009 22:11:28 +1000
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>	<grgf59$j7c$1@ger.gmane.org>	<4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com>	<b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com>	<4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com>	<ca471dc20904081051p5a53780bkd0c46644444d6b6c@mail.gmail.com>
	<4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com>
Message-ID: <49DDE5F0.4070000@gmail.com>

Michele Simionato wrote:
> On Wed, Apr 8, 2009 at 7:51 PM, Guido van Rossum <guido at python.org> wrote:
>> There was a remark (though perhaps meant humorously) in Michele's page
>> about decorators that worried me too: "For instance, typical
>> implementations of decorators involve nested functions, and we all
>> know that flat is better than nested." I find the nested-function
>> pattern very clear and easy to grasp, whereas I find using another
>> decorator (a meta-decorator?) to hide this pattern unnecessarily
>> obscuring what's going on.
> 
> I understand your point and I will freely admit that I have always had mixed
> feelings about the advantages of a meta decorator with
> respect to plain simple nested functions. I see pros and cons.
> If functools.update_wrapper could preserve the signature I
> would probably use it over the decorator module.

Yep, update_wrapper was a compromise along the lines of "well, at least
we can make sure the relevant metadata refers to the original function
rather than the relatively uninteresting wrapper, even if the signature
itself is lost". The idea being that you can often figure out the
signature from the doc string even when introspection has been broken by
an intervening wrapper.
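
To make the compromise concrete, here is a small sketch for a current
Python 3. It relies on inspect.signature and its follow_wrapped flag, both
of which postdate this thread, and the decorator name "logged" is made up.
It suggests that the name and doc string survive wrapping, while the
wrapper's own signature is just the generic (*args, **kwargs).

```python
import functools
import inspect


def logged(func):
    """Hypothetical decorator using the usual nested-function pattern."""
    @functools.wraps(func)  # applies functools.update_wrapper to wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@logged
def add(a, b=1):
    """Return a + b."""
    return a + b


copied_name = add.__name__
copied_doc = add.__doc__
# Ask for the wrapper's own signature rather than following __wrapped__.
own_signature = str(inspect.signature(add, follow_wrapped=False))
```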

One of my hopes for PEP 362 was that I would be able to just add
__signature__ to the list of copied attributes, but that PEP is
currently short a champion to work through the process of resolving the
open issues and creating an up to date patch (Brett ended up with too
many things on his plate so he wasn't able to do it, and nobody else has
offered to take it over).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Thu Apr  9 14:17:37 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 09 Apr 2009 22:17:37 +1000
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49DA7C91.6010202@v.loewis.de>
References: <20090404154049.GA23987@panix.com>	<acd65fa20904042255x737752d1x9bef0c5c317bea0d@mail.gmail.com>	<49D87CD4.1000909@ochtman.nl>	<acd65fa20904050642w568f6543x5332c91c10f22703@mail.gmail.com>	<49D8BC81.7040007@ochtman.nl>
	<49D9EB15.8070806@gmail.com> <49DA7C91.6010202@v.loewis.de>
Message-ID: <49DDE761.9000206@gmail.com>

Martin v. Löwis wrote:
> Nick Coghlan wrote:
>> Dirkjan Ochtman wrote:
>>> I have a stab at an author map at http://dirkjan.ochtman.nl/author-map.
>>> Could use some review, but it seems like a good start.
>> Martin may be able to provide a better list of names based on the
>> checkin name<->SSH public key mapping in the SVN setup.
> 
> I think the identification in the SSH keys is useless. It contains
> strings like "loewis at mira" or "ncoghlan at uberwald", or even multiple
> of them (barry at wooz, barry at resist, ...).

Ah, I forgot our SVN accounts weren't linked up to our email addresses.
I guess that means the existing list won't be as useful as I thought it
might be.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From michele.simionato at gmail.com  Thu Apr  9 14:18:51 2009
From: michele.simionato at gmail.com (Michele Simionato)
Date: Thu, 9 Apr 2009 14:18:51 +0200
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <49DDE5F0.4070000@gmail.com>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>
	<grgf59$j7c$1@ger.gmane.org>
	<4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com>
	<b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com>
	<4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com>
	<ca471dc20904081051p5a53780bkd0c46644444d6b6c@mail.gmail.com>
	<4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com>
	<49DDE5F0.4070000@gmail.com>
Message-ID: <4edc17eb0904090518s62db9461hb190f0db29abb871@mail.gmail.com>

On Thu, Apr 9, 2009 at 2:11 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> One of my hopes for PEP 362 was that I would be able to just add
> __signature__ to the list of copied attributes, but that PEP is
> currently short a champion to work through the process of resolving the
> open issues and creating an up to date patch (Brett ended up with too
> many things on his plate so he wasn't able to do it, and nobody else has
> offered to take it over).

I am totally ignorant about the internals of Python and I certainly cannot
take that role. But I would like to hear from Guido if he wants to support
a __signature__ object or if he does not care. In the first case
I think somebody will take the job, in the second case it is better to
reject the PEP and be done with it.

From aahz at pythoncraft.com  Thu Apr  9 14:53:12 2009
From: aahz at pythoncraft.com (Aahz)
Date: Thu, 9 Apr 2009 05:53:12 -0700
Subject: [Python-Dev] Adding new features to Python 2.x (PEP
	382:	Namespace Packages)
In-Reply-To: <49DDD6AD.9020708@gmail.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>
	<49DB4624.604@egenix.com> <49DBA78F.7010904@v.loewis.de>
	<49DDD6AD.9020708@gmail.com>
Message-ID: <20090409125312.GB1909@panix.com>

On Thu, Apr 09, 2009, Nick Coghlan wrote:
>
> Martin v. Löwis wrote:
>>> Such a policy would then translate to a dead end for Python 2.x
>>> based applications.
>> 
>> 2.x based applications *are* in a dead end, with the only exit
>> being portage to 3.x.
> 
> The actual end of the dead end just happens to be in 2013 or so :)

More like 2016 or 2020 -- as of January, my former employer was still
using Python 2.3, and I wouldn't be surprised if 1.5.2 was still out in
the wilds.  The transition to 3.x is more extreme, and lots of people
will continue making do for years after any formal support is dropped.

Whether this warrants including PEP 382 in 2.x, I don't know; I still
don't really understand this proposal.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

Why is this newsgroup different from all other newsgroups?

From ncoghlan at gmail.com  Thu Apr  9 15:16:26 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 09 Apr 2009 23:16:26 +1000
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <4edc17eb0904090518s62db9461hb190f0db29abb871@mail.gmail.com>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>	<grgf59$j7c$1@ger.gmane.org>	<4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com>	<b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com>	<4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com>	<ca471dc20904081051p5a53780bkd0c46644444d6b6c@mail.gmail.com>	<4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com>	<49DDE5F0.4070000@gmail.com>
	<4edc17eb0904090518s62db9461hb190f0db29abb871@mail.gmail.com>
Message-ID: <49DDF52A.60704@gmail.com>

Michele Simionato wrote:
> On Thu, Apr 9, 2009 at 2:11 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> One of my hopes for PEP 362 was that I would be able to just add
>> __signature__ to the list of copied attributes, but that PEP is
>> currently short a champion to work through the process of resolving the
>> open issues and creating an up to date patch (Brett ended up with too
>> many things on his plate so he wasn't able to do it, and nobody else has
>> offered to take it over).
> 
> I am totally ignorant about the internals of Python and I certainly cannot
> take that role. But I would like to hear from Guido if he wants to support
> a __signature__ object or if he does not care. In the first case
> I think somebody will take the job, in the second case it is better to
> reject the PEP and be done with it.

I don't recall Guido being opposed when PEP 362 was first being
discussed (keeping in mind that was more than 2 years ago, so he's quite
entitled to have changed his mind in the meantime!).

That said, it's a sensible, largely straightforward idea, and by
creating the object lazily it doesn't even have to incur a runtime cost
in programs that don't do much introspection.
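
A minimal sketch of what "lazily" could mean here, for a current Python 3.
The class name Introspectable and its attributes are hypothetical, and
inspect.signature is a later API than anything available when this was
written. The signature object is only built the first time somebody asks
for it, so plain calls never pay for it.

```python
import inspect


class Introspectable:
    """Hypothetical callable wrapper that builds its signature on first access."""

    def __init__(self, func):
        self.func = func
        self._signature = None
        self.times_computed = 0  # bookkeeping for the demonstration only

    def __call__(self, *args, **kwargs):
        # Calling the function never touches the signature machinery.
        return self.func(*args, **kwargs)

    @property
    def signature(self):
        if self._signature is None:
            self.times_computed += 1
            self._signature = inspect.signature(self.func)
        return self._signature


def scale(value, factor=2):
    return value * factor


wrapped = Introspectable(scale)
```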

I think the main problem leading to the current lack of movement on the
PEP is that the existing inspect module is good enough for most
practical purposes (which are fairly rare in the first place), so this
isn't perceived as a huge gain even for the folks that are interested in
introspection.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Thu Apr  9 15:28:08 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 09 Apr 2009 23:28:08 +1000
Subject: [Python-Dev] Adding new features to Python 2.x
 (PEP	382:	Namespace Packages)
In-Reply-To: <20090409125312.GB1909@panix.com>
References: <49D4DA72.60401@v.loewis.de>
	<49D52115.6020001@egenix.com>	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>	<49DB4624.604@egenix.com>
	<49DBA78F.7010904@v.loewis.de>	<49DDD6AD.9020708@gmail.com>
	<20090409125312.GB1909@panix.com>
Message-ID: <49DDF7E8.9000001@gmail.com>

Aahz wrote:
> On Thu, Apr 09, 2009, Nick Coghlan wrote:
>> Martin v. Löwis wrote:
>>>> Such a policy would then translate to a dead end for Python 2.x
>>>> based applications.
>>> 2.x based applications *are* in a dead end, with the only exit
>>> being portage to 3.x.
>> The actual end of the dead end just happens to be in 2013 or so :)
> 
> More like 2016 or 2020 -- as of January, my former employer was still
> using Python 2.3, and I wouldn't be surprised if 1.5.2 was still out in
> the wilds.

Indeed - I know of a system that will finally be migrating from Python
2.2 to Python *2.4* later this year :)

>  The transition to 3.x is more extreme, and lots of people
> will continue making do for years after any formal support is dropped.

Yeah, I was only referring to the likely minimum time frame that
python-dev would continue providing security releases. As you say, the
actual 2.x version of the language will live on long after the day we
close all remaining 2.x only bug reports and patches as "out of date".

> Whether this warrants including PEP 382 in 2.x, I don't know; I still
> don't really understand this proposal.

I'd personally still prefer to keep the guideline that new features that
are easy to backport *should* be backported, but that's really a
decision for the authors of each new feature.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From asmodai at in-nomine.org  Thu Apr  9 15:38:30 2009
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Thu, 9 Apr 2009 15:38:30 +0200
Subject: [Python-Dev] py3k build erroring out on fileio?
Message-ID: <20090409133830.GD13110@nexus.in-nomine.org>

Just to make sure I am not doing something silly, with a configure line as
such: ./configure --prefix=/home/asmodai/local --with-wide-unicode
--with-pymalloc --with-threads --with-computed-gotos, would there be any
reason why I am getting the following error with both BSD make and gmake:

make: don't know how to make ./Modules/_fileio.c. Stop

[Will log an issue if it turns out to, indeed, be a problem with the tree
and not me.]

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Forgive us our trespasses, as we forgive those that trespass against us...

From benjamin at python.org  Thu Apr  9 15:41:12 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 9 Apr 2009 08:41:12 -0500
Subject: [Python-Dev] py3k build erroring out on fileio?
In-Reply-To: <20090409133830.GD13110@nexus.in-nomine.org>
References: <20090409133830.GD13110@nexus.in-nomine.org>
Message-ID: <1afaf6160904090641h347a6cf4o936b2b161dd31130@mail.gmail.com>

2009/4/9 Jeroen Ruigrok van der Werven <asmodai at in-nomine.org>:
> Just to make sure I am not doing something silly, with a configure line as
> such: ./configure --prefix=/home/asmodai/local --with-wide-unicode
> --with-pymalloc --with-threads --with-computed-gotos, would there be any
> reason why I am getting the following error with both BSD make and gmake:
>
> make: don't know how to make ./Modules/_fileio.c. Stop
>
> [Will log an issue if it turns out to, indeed, be a problem with the tree
> and not me.]

It seems your Makefile is outdated. We moved the _fileio.c module
around a few days ago, so maybe you just need a make distclean.

-- 
Regards,
Benjamin

From asmodai at in-nomine.org  Thu Apr  9 16:04:55 2009
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Thu, 9 Apr 2009 16:04:55 +0200
Subject: [Python-Dev] py3k build erroring out on fileio?
In-Reply-To: <1afaf6160904090641h347a6cf4o936b2b161dd31130@mail.gmail.com>
References: <20090409133830.GD13110@nexus.in-nomine.org>
	<1afaf6160904090641h347a6cf4o936b2b161dd31130@mail.gmail.com>
Message-ID: <20090409140455.GF13110@nexus.in-nomine.org>

-On [20090409 15:41], Benjamin Peterson (benjamin at python.org) wrote:
>It seems your Makefile is outdated. We moved the _fileio.c module
>around a few days ago, so maybe you just need a make distclean.

Yes, that was the cause. Thanks Benjamin.

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
You yourself, as much as anybody in the entire universe, deserve your love
and affection...

From janssen at parc.com  Thu Apr  9 17:08:50 2009
From: janssen at parc.com (Bill Janssen)
Date: Thu, 9 Apr 2009 08:08:50 PDT
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
Message-ID: <66887.1239289730@parc.com>

Barry Warsaw <barry at python.org> wrote:

> Anyway, aside from that decision, I haven't come up with an  
> elegant way to allow /output/ in both bytes and strings (input is I  
> think theoretically easier by sniffing the arguments).

Probably a good thing.  It just promotes more confusion to do things
that way, IMO.

Bill

From john at arbash-meinel.com  Thu Apr  9 17:02:14 2009
From: john at arbash-meinel.com (John Arbash Meinel)
Date: Thu, 09 Apr 2009 10:02:14 -0500
Subject: [Python-Dev] Rethinking intern() and its data structure
Message-ID: <49DE0DF6.1040900@arbash-meinel.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've been doing some memory profiling of my application, and I've found
some interesting results with how intern() works. I was pretty surprised
to see that the "interned" dict was actually consuming a significant
amount of total memory.
To give the specific values, after doing:
  bzr branch A B
of a small project, the total memory consumption is ~21MB

Of that, the largest single object is the 'interned' dict, at 1.57MB,
which contains 22k strings. One interesting bit, the size of it + the
referenced strings is only 2.4MB. So the "interned" dict *by itself* is
2/3rds the size of the dict + strings it contains.

It also means that the average size of a referenced string is 37.4
bytes. A 'str' has 24 bytes of overhead, so the average string is 13.5
characters long. So to save references to 13.5*22k ~ 300kB of character
data, we are paying 2.4MB, or about 8:1 overhead.
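
The arithmetic above can be replayed in a few lines of Python. The inputs
are the figures quoted in this message; treating 1MB as 10**6 bytes is an
assumption, and the variable names are only for illustration.

```python
strings = 22000           # entries in the "interned" dict
avg_object_size = 37.4    # bytes per referenced string, as quoted above
str_overhead = 24         # bytes of per-object overhead for a 'str'
total_bytes = 2.4e6       # dict plus strings (assuming 1MB == 10**6 bytes)
dict_bytes = 1.57e6       # the dict by itself

avg_chars = avg_object_size - str_overhead      # character bytes per string
payload_bytes = avg_chars * strings             # actual character data
overhead_ratio = total_bytes / payload_bytes    # bytes paid per payload byte
dict_share = dict_bytes / total_bytes           # fraction taken by the dict alone
```

With these inputs the payload comes out a little under 300kB, the ratio
lands close to 8, and the dict accounts for roughly two thirds of the total.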

When I looked at the actual references from interned, I saw mostly
variable names, which makes sense considering that every variable name
goes through the python intern dict. And when you look at the intern
function, it doesn't use setdefault logic, it actually does a get()
followed by a set(), which means the cost of interning is 1-2 lookups
depending on how likely the string is to be present already, etc.
(I saw a whole lot of strings as the error codes in win32all /
winerror.py, and windows error codes tend to be longer-than-average
variable length.)
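
To illustrate the get()-then-set() point, here is a Python-level analogy
rather than the actual C code. CountingKey and the two intern_* functions
are made-up names; the key class counts how often it is hashed, as a
stand-in for the number of table lookups. On CPython 3 the first variant
should hash a missing key twice and the second only once.

```python
class CountingKey:
    """Hypothetical string-like key that counts calls to __hash__."""

    def __init__(self, text):
        self.text = text
        self.hash_calls = 0

    def __hash__(self):
        self.hash_calls += 1
        return hash(self.text)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.text == other.text


def intern_get_then_set(table, key):
    """Two steps: look the key up, then store it if it was missing."""
    found = table.get(key)
    if found is not None:
        return found
    table[key] = key
    return key


def intern_setdefault(table, key):
    """One step: return the existing entry or insert the key."""
    return table.setdefault(key, key)
```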

Anyway, I think the internals of intern() could be done a bit better. Here are
some concrete things:

  a) Don't keep a double reference to both key and value to the same
     object (1 pointer per entry), this could be as simple as using a
     Set() instead of a dict()

  b) Don't cache the hash key in the set, as strings already cache them.
     (1 long per entry). This is a big win for space, but would need to
     be balanced against lookup and collision resolving speed.

     My guess is that reducing the size of the set will actually improve
     speed more, because more items can fit in cache. It depends on how
     many times you need to resolve a collision. If the string hash is
     sufficiently spread out, and the load factor is reasonable, then
     likely when you actually find an item in the set, it will be the
     item you want, and you'll need to bring the string object into
     cache anyway, so that you can do a string comparison (rather than
     just a hash comparison.)

  c) Use the existing lookup function one time. (PySet->lookup())
     Sets already have a "lookup" which is optimized for strings, and
     returns a pointer to where the object would go if it exists. Which
     means the intern() function can do a single lookup resolving any
     collisions, and return the object or insert without doing a second
     lookup.

  d) Having a special structure might also allow for separate optimizing
     of things like 'default size', 'grow rate', 'load factor', etc. A
     lot of this could be tuned specifically knowing that we really only
     have 1 of these objects, and it is going to be pointing at a lot of
     strings that are < 50 bytes long.

     If hashes of variable name strings are well distributed, we could
     probably get away with a load factor of 2. If we know we are likely
     to have lots and lots that never go away (you rarely *unload*
     modules, and all variable names are in the intern dict), that would
     suggest having a large initial size, and probably a wide growth
     factor to avoid spending a lot of time resizing the set.

  e) How tuned is String.hash() for the fact that most of these strings
     are going to be ascii text? (I know that python wants to support
     non-ascii variable names, but I still think there is going to be an
     overwhelming bias towards characters in the range 65-122 ('A'-'z').)
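
Points (b) and (c) can be sketched in Python for illustration. This is
only a toy, assuming linear probing, a 2/3 load limit and a 4x growth
factor; the class name InternSet and all of its methods are hypothetical,
and the real table would live in C with CPython's own probing scheme. It
stores bare references (no cached hash per entry) and does one probe
sequence per intern() call, whether the string is found or inserted.
Removing entries when a string dies is deliberately left out.

```python
class InternSet:
    """Toy open-addressing table that stores only string references."""

    def __init__(self, size=8):
        # size must be a power of two so that `hash & mask` is a valid index.
        self._slots = [None] * size
        self._used = 0

    def _find_slot(self, s, slots):
        # Linear probing. No hash is stored in the table, so a collision
        # is resolved by comparing the strings themselves.
        mask = len(slots) - 1
        i = hash(s) & mask
        while slots[i] is not None and slots[i] != s:
            i = (i + 1) & mask
        return i

    def _grow(self):
        # Wide growth factor (an assumption) to keep resizes rare.
        old = self._slots
        new = [None] * (len(old) * 4)
        for s in old:
            if s is not None:
                new[self._find_slot(s, new)] = s
        self._slots = new

    def intern(self, s):
        # One probe sequence serves both the hit and the miss case.
        i = self._find_slot(s, self._slots)
        found = self._slots[i]
        if found is not None:
            return found
        self._slots[i] = s
        self._used += 1
        # Keep the table under 2/3 full so probing always terminates.
        if self._used * 3 >= len(self._slots) * 2:
            self._grow()
        return s
```

Calling intern() twice with equal strings returns the first object both
times, which is the property the real "interned" table has to preserve.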

Also note that the performance of the "interned" dict gets even worse on
64-bit platforms, where the size of a 'dictentry' doubles but the
average length of a variable name doesn't change.
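
As a rough worked example of that doubling, assuming a dict entry is
three machine words (cached hash, key pointer, value pointer), which is
my reading of the 2.x dictobject.h layout; the helper name is made up:

```python
def entry_bytes(word_size, words_per_entry=3):
    """Bytes per table entry, assuming every field is one machine word."""
    return word_size * words_per_entry

# Word size in bytes on 32-bit and 64-bit builds.
sizes = {bits: entry_bytes(bits // 8) for bits in (32, 64)}
for bits, nbytes in sorted(sizes.items()):
    print("%d-bit: %d bytes per dict entry" % (bits, nbytes))
```

The per-entry overhead goes from 12 to 24 bytes while the characters of
the variable name itself take the same space on both platforms.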

Anyway, I would be happy to implement something along the lines of a
"StringSet", or maybe the "InternSet", etc. I just wanted to check if
people would be interested or not.

John
=:->

PS> I'm not yet subscribed to python-dev, so if you could make sure to
CC me in replies, I would appreciate it.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkneDfYACgkQJdeBCYSNAAPMywCfQVWOg51dtIkWT/jttVTARV0g
WJ4An1w7ypB+akHT5hiSwRKoUhH7ez4j
=9TTp
-----END PGP SIGNATURE-----

From aahz at pythoncraft.com  Thu Apr  9 17:31:23 2009
From: aahz at pythoncraft.com (Aahz)
Date: Thu, 9 Apr 2009 08:31:23 -0700
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE0DF6.1040900@arbash-meinel.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
Message-ID: <20090409153123.GA2971@panix.com>

On Thu, Apr 09, 2009, John Arbash Meinel wrote:
>
> PS> I'm not yet subscribed to python-dev, so if you could make sure to
> CC me in replies, I would appreciate it.

Please do subscribe to python-dev ASAP; I also suggest that you subscribe
to python-ideas, because I suspect that this is sufficiently blue-sky to
start there.

As always, this is the kind of thing where code trumps gedanken, so you
shouldn't expect much activity unless either you are willing to make at
least initial attempts at trying out your ideas or someone else just
happens to find it interesting.  In general, the core Python
implementation strives for simplicity, so there's already some built-in
pushback.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

Why is this newsgroup different from all other newsgroups?

From dirkjan at ochtman.nl  Thu Apr  9 17:40:18 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Thu, 9 Apr 2009 17:40:18 +0200
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <20090409153123.GA2971@panix.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
	<20090409153123.GA2971@panix.com>
Message-ID: <ea2499da0904090840p4428f78elb6ac9d39090bbb3d@mail.gmail.com>

On Thu, Apr 9, 2009 at 17:31, Aahz <aahz at pythoncraft.com> wrote:
> Please do subscribe to python-dev ASAP; I also suggest that you subscribe
> to python-ideas, because I suspect that this is sufficiently blue-sky to
> start there.

It might also be interesting to the unladen-swallow guys.

Cheers,

Dirkjan

From daniel at stutzbachenterprises.com  Thu Apr  9 17:55:47 2009
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Thu, 9 Apr 2009 10:55:47 -0500
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
Message-ID: <eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>

On Thu, Apr 9, 2009 at 6:01 AM, Barry Warsaw <barry at python.org> wrote:

> Anyway, aside from that decision, I haven't come up with an elegant way to
> allow /output/ in both bytes and strings (input is I think theoretically
> easier by sniffing the arguments).
>

Won't this work? (assuming dumps() always returns a string)

from json import dumps

def dumpb(obj, encoding='utf-8', *args, **kw):
    # Serialize to str first, then encode the result to bytes.
    s = dumps(obj, *args, **kw)
    return s.encode(encoding)

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

From tonynelson at georgeanelson.com  Thu Apr  9 17:05:38 2009
From: tonynelson at georgeanelson.com (Tony Nelson)
Date: Thu, 9 Apr 2009 11:05:38 -0400
Subject: [Python-Dev] email package Bytes vs Unicode (was Re: Dropping bytes
 "support" in json)
In-Reply-To: <grkodk$j4p$1@ger.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
Message-ID: <p04330100c603badeb135@[192.168.123.162]>

(email-sig added)

At 08:07 -0400 04/09/2009, Steve Holden wrote:
>Barry Warsaw wrote:
 ...
>> This is an interesting question, and something I'm struggling with for
>> the email package for 3.x.  It turns out to be pretty convenient to have
>> both a bytes and a string API, both for input and output, but I think
>> email really wants to be represented internally as bytes.  Maybe.  Or
>> maybe just for content bodies and not headers, or maybe both.  Anyway,
>> aside from that decision, I haven't come up with an elegant way to allow
>> /output/ in both bytes and strings (input is I think theoretically
>> easier by sniffing the arguments).
>>
>The real problem I came across in storing email in a relational database
>was the inability to store messages as Unicode. Some messages have a
>body in one encoding and an attachment in another, so the only ways to
>store the messages are either as a monolithic bytes string that gets
>parsed when the individual components are required or as a sequence of
>components in the database's preferred encoding (if you want to keep the
>original encoding most relational databases won't be able to help unless
>you store the components as bytes).
 ...

I found it confusing myself, and did it wrong for a while.  Now, I
understand that messages come over the wire as bytes, either 7-bit US-ASCII
or 8-bit whatever, and are parsed at the receiver.  I think of the database
as a wire to the future, and store the data as bytes (a BLOB), letting the
future receiver parse them as it did the first time, when I cleaned the
message.  Data I care to query is extracted into fields (in UTF-8, what I
usually use for char fields).  I have no need to store messages as Unicode,
and they aren't Unicode anyway.  I have no need ever to flatten a message
to Unicode, only to US-ASCII or, for messages (spam) that are corrupt, raw
8-bit data.

If you need the data from the message, by all means extract it and store it
in whatever form is useful to the purpose of the database.  If you need the
entire message, store it intact in the database, as the bytes it is.  Email
isn't Unicode any more than a JPEG or other image types (often payloads in
a message) are Unicode.
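
A minimal sketch of that approach, assuming Python 3 (for
email.message_from_bytes) and using the stdlib sqlite3 module purely as
a stand-in database. The table layout, the sample message and the
function names are invented for illustration: the raw bytes go into a
BLOB column untouched, and only the field to be queried (here the
Subject) is extracted into a text column.

```python
import email
import sqlite3

# Hypothetical sample message with an 8-bit, non-UTF-8 body.
RAW = (b"From: a@example.com\r\n"
       b"Subject: hello\r\n"
       b"Content-Type: text/plain; charset=iso-8859-1\r\n"
       b"Content-Transfer-Encoding: 8bit\r\n"
       b"\r\n"
       b"caf\xe9\r\n")

def store_message(conn, raw):
    """Store the message bytes intact plus one extracted, queryable field."""
    msg = email.message_from_bytes(raw)
    cur = conn.execute(
        "INSERT INTO mail (subject, raw) VALUES (?, ?)",
        (msg["Subject"], raw),
    )
    return cur.lastrowid

def load_message(conn, rowid):
    """Return the stored bytes, ready to be parsed again by the reader."""
    (raw,) = conn.execute(
        "SELECT raw FROM mail WHERE id = ?", (rowid,)
    ).fetchone()
    return bytes(raw)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mail (id INTEGER PRIMARY KEY, subject TEXT, raw BLOB)"
)
rowid = store_message(conn, RAW)
```

The bytes are passed as a query parameter, so no escaping is done by
hand, and what comes back from load_message() is byte-for-byte what
arrived on the wire.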
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>

From steve at holdenweb.com  Thu Apr  9 18:20:31 2009
From: steve at holdenweb.com (Steve Holden)
Date: Thu, 09 Apr 2009 12:20:31 -0400
Subject: [Python-Dev] email package Bytes vs Unicode (was Re: Dropping
 bytes "support" in json)
In-Reply-To: <p04330100c603badeb135@[192.168.123.162]>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<grkodk$j4p$1@ger.gmane.org>
	<p04330100c603badeb135@[192.168.123.162]>
Message-ID: <grl78j$7sl$1@ger.gmane.org>

Tony Nelson wrote:
> (email-sig added)
> 
> At 08:07 -0400 04/09/2009, Steve Holden wrote:
>> Barry Warsaw wrote:
>  ...
>>> This is an interesting question, and something I'm struggling with for
>>> the email package for 3.x.  It turns out to be pretty convenient to have
>>> both a bytes and a string API, both for input and output, but I think
>>> email really wants to be represented internally as bytes.  Maybe.  Or
>>> maybe just for content bodies and not headers, or maybe both.  Anyway,
>>> aside from that decision, I haven't come up with an elegant way to allow
>>> /output/ in both bytes and strings (input is I think theoretically
>>> easier by sniffing the arguments).
>>>
>> The real problem I came across in storing email in a relational database
>> was the inability to store messages as Unicode. Some messages have a
>> body in one encoding and an attachment in another, so the only ways to
>> store the messages are either as a monolithic bytes string that gets
>> parsed when the individual components are required or as a sequence of
>> components in the database's preferred encoding (if you want to keep the
>> original encoding most relational databases won't be able to help unless
>> you store the components as bytes).
>  ...
> 
> I found it confusing myself, and did it wrong for a while.  Now, I
> understand that messages come over the wire as bytes, either 7-bit US-ASCII
> or 8-bit whatever, and are parsed at the receiver.  I think of the database
> as a wire to the future, and store the data as bytes (a BLOB), letting the
> future receiver parse them as it did the first time, when I cleaned the
> message.  Data I care to query is extracted into fields (in UTF-8, what I
> usually use for char fields).  I have no need to store messages as Unicode,
> and they aren't Unicode anyway.  I have no need ever to flatten a message
> to Unicode, only to US-ASCII or, for messages (spam) that are corrupt, raw
> 8-bit data.
> 
> If you need the data from the message, by all means extract it and store it
> in whatever form is useful to the purpose of the database.  If you need the
> entire message, store it intact in the database, as the bytes it is.  Email
> isn't Unicode any more than a JPEG or other image types (often payloads in
> a message) are Unicode.

This is all great, and I did quite quickly realize that the best
approach was to store the mails in their network byte-stream format as
bytes. The approach was negated in my own case because of PostgreSQL's
execrable BLOB-handling capabilities. I took a look at the escaping they
required, snorted with derision and gave it up as a bad job.

PostgreSQL strongly encourages you to store text as encoded columns.
Because emails lack an encoding it turns out this is a most inconvenient
storage type for it. Sadly BLOBs are such a pain in PostgreSQL that it's
easier to store the messages in external files and just use the
relational database to index those files to retrieve content, so that's
what I ended up doing.

regards
 Steve

-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
Holden Web LLC                 http://www.holdenweb.com/
Watch PyCon on video now!          http://pycon.blip.tv/

From collinw at gmail.com  Thu Apr  9 18:29:00 2009
From: collinw at gmail.com (Collin Winter)
Date: Thu, 9 Apr 2009 09:29:00 -0700
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE0DF6.1040900@arbash-meinel.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
Message-ID: <43aa6ff70904090929t657a3154rdfc0f66c180469eb@mail.gmail.com>

Hi John,

On Thu, Apr 9, 2009 at 8:02 AM, John Arbash Meinel
<john at arbash-meinel.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I've been doing some memory profiling of my application, and I've found
> some interesting results with how intern() works. I was pretty surprised
> to see that the "interned" dict was actually consuming a significant
> amount of total memory.
> To give the specific values, after doing:
>   bzr branch A B
> of a small project, the total memory consumption is ~21MB

[snip]

> Anyway, I think the internals of intern() could be done a bit better. Here are
> some concrete things:

[snip]

Memory usage is definitely something we're interested in improving.
Since you've already looked at this in some detail, could you try
implementing one or two of your ideas and see if it makes a difference
in memory consumption? Changing from a dict to a set looks promising,
and should be a fairly self-contained way of starting on this. If it
works, please post the patch on http://bugs.python.org with your
results and assign it to me for review.

Thanks,
Collin Winter

From john.arbash.meinel at gmail.com  Thu Apr  9 18:34:24 2009
From: john.arbash.meinel at gmail.com (John Arbash Meinel)
Date: Thu, 09 Apr 2009 11:34:24 -0500
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <43aa6ff70904090929t657a3154rdfc0f66c180469eb@mail.gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
	<43aa6ff70904090929t657a3154rdfc0f66c180469eb@mail.gmail.com>
Message-ID: <49DE2390.4070305@gmail.com>

...

>> Anyway, I think the internals of intern() could be done a bit better. Here are
>> some concrete things:
>>     
>
> [snip]
>
> Memory usage is definitely something we're interested in improving.
> Since you've already looked at this in some detail, could you try
> implementing one or two of your ideas and see if it makes a difference
> in memory consumption? Changing from a dict to a set looks promising,
> and should be a fairly self-contained way of starting on this. If it
> works, please post the patch on http://bugs.python.org with your
> results and assign it to me for review.
>
> Thanks,
> Collin Winter
>   
(I did end up subscribing, just with a different email address :)

What is the best branch to start working from? "trunk"?

John
=:->

From collinw at gmail.com  Thu Apr  9 18:36:29 2009
From: collinw at gmail.com (Collin Winter)
Date: Thu, 9 Apr 2009 09:36:29 -0700
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE2390.4070305@gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
	<43aa6ff70904090929t657a3154rdfc0f66c180469eb@mail.gmail.com>
	<49DE2390.4070305@gmail.com>
Message-ID: <43aa6ff70904090936y32ea66b9o44a6eda4d50502b3@mail.gmail.com>

On Thu, Apr 9, 2009 at 9:34 AM, John Arbash Meinel
<john.arbash.meinel at gmail.com> wrote:
> ...
>
>>> Anyway, I think the internals of intern() could be done a bit better. Here are
>>> some concrete things:
>>>
>>
>> [snip]
>>
>> Memory usage is definitely something we're interested in improving.
>> Since you've already looked at this in some detail, could you try
>> implementing one or two of your ideas and see if it makes a difference
>> in memory consumption? Changing from a dict to a set looks promising,
>> and should be a fairly self-contained way of starting on this. If it
>> works, please post the patch on http://bugs.python.org with your
>> results and assign it to me for review.
>>
>> Thanks,
>> Collin Winter
>>
> (I did end up subscribing, just with a different email address :)
>
> What is the best branch to start working from? "trunk"?

That's a good place to start, yes. If the idea works well, we'll want
to port it to the py3k branch, too, but that can wait.

Collin

From lists at cheimes.de  Thu Apr  9 19:05:24 2009
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 09 Apr 2009 19:05:24 +0200
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE0DF6.1040900@arbash-meinel.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
Message-ID: <49DE2AD4.6090605@cheimes.de>

John Arbash Meinel wrote:
> When I looked at the actual references from interned, I saw mostly
> variable names. Considering that every variable goes through the python
> intern dict. And when you look at the intern function, it doesn't use
> setdefault logic, it actually does a get() followed by a set(), which
> means the cost of interning is 1-2 lookups depending on likelihood, etc.
> (I saw a whole lot of strings as the error codes in win32all /
> winerror.py, and windows error codes tend to be longer-than-average
> variable length.)

I've read your posting twice but I'm still not sure if you are aware of
the most important feature of interned strings. In the first place
interning is not about saving some bytes of memory but a speed
optimization. Interned strings can be compared with a simple and fast
pointer comparison. With interned strings you can simply write:

    char *a, *b;
    if (a == b) {
        ...
    }

Instead of:

    char *a, *b;
    if (strcmp(a, b) == 0) {
        ...
    }

A compiler can optimize the pointer comparison much better than a
function call.
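
The same idea seen from Python, as a small sketch. It assumes Python 3,
where the builtin intern() lives at sys.intern(); the helper name is
made up. Two equal strings are built at run time, and after interning
both names refer to one object, so an identity test is enough:

```python
import sys

# Build two equal strings at run time rather than from one literal.
a = "".join(["variable", "_name"])
b = "".join(["variable", "_name"])

ia = sys.intern(a)
ib = sys.intern(b)

def same_interned(x, y):
    """Identity test; only meaningful when both arguments are interned."""
    return x is y
```

Here same_interned(ia, ib) is the Python-level analogue of the `a == b`
pointer comparison above, while `a == b` in Python corresponds to the
strcmp() version.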

> Anyway, I think the internals of intern() could be done a bit better. Here are
> some concrete things:
> 
>   a) Don't keep a double reference to both key and value to the same
>      object (1 pointer per entry), this could be as simple as using a
>      Set() instead of a dict()
> 
>   b) Don't cache the hash key in the set, as strings already cache them.
>      (1 long per entry). This is a big win for space, but would need to
>      be balanced against lookup and collision resolving speed.
> 
>      My guess is that reducing the size of the set will actually improve
>      speed more, because more items can fit in cache. It depends on how
>      many times you need to resolve a collision. If the string hash is
>      sufficiently spread out, and the load factor is reasonable, then
>      likely when you actually find an item in the set, it will be the
>      item you want, and you'll need to bring the string object into
>      cache anyway, so that you can do a string comparison (rather than
>      just a hash comparison.)
> 
>   c) Use the existing lookup function one time. (PySet->lookup())
>      Sets already have a "lookup" which is optimized for strings, and
>      returns a pointer to where the object would go if it exists. Which
>      means the intern() function can do a single lookup resolving any
>      collisions, and return the object or insert without doing a second
>      lookup.
> 
>   d) Having a special structure might also allow for separate optimizing
>      of things like 'default size', 'grow rate', 'load factor', etc. A
>      lot of this could be tuned specifically knowing that we really only
>      have 1 of these objects, and it is going to be pointing at a lot of
>      strings that are < 50 bytes long.
> 
>      If hashes of variable name strings are well distributed, we could
>      probably get away with a load factor of 2. If we know we are likely
>      to have lots and lots that never go away (you rarely *unload*
>      modules, and all variable names are in the intern dict), that would
>      suggest having a large initial size, and probably a wide growth
>      factor to avoid spending a lot of time resizing the set.

I agree that a dict is not the most memory efficient data structure for
interned strings. However dicts are extremely well tested and highly
optimized. Any specialized data structure needs to be designed and
tested very carefully. If you happen to break the interning system it's
going to lead to rather nasty and hard to debug problems.

>   e) How tuned is String.hash() for the fact that most of these strings
>      are going to be ascii text? (I know that python wants to support
>      non-ascii variable names, but I still think there is going to be an
>      overwhelming bias towards characters in the range 65-122 ('A'-'z').

Python 3.0 uses unicode for all names. You have to design something that
can be adapted to unicode, too. By the way do you know that dicts have
an optimized lookup function for strings? It's called lookdict_unicode /
 lookdict_string.

> Also note that the performance of the "interned" dict gets even worse on
> 64-bit platforms. Where the size of a 'dictentry' doubles, but the
> average length of a variable name wouldn't change.
> 
> Anyway, I would be happy to implement something along the lines of a
> "StringSet", or maybe the "InternSet", etc. I just wanted to check if
> people would be interested or not.

Since interning is mostly used in the core and extension modules you
might want to experiment with a different growth rate. The interning
data structure could start with a larger value and have a slower, non
progressive data growth rate.

Christian

From tonynelson at georgeanelson.com  Thu Apr  9 19:14:21 2009
From: tonynelson at georgeanelson.com (Tony Nelson)
Date: Thu, 9 Apr 2009 13:14:21 -0400
Subject: [Python-Dev] email package Bytes vs Unicode (was Re: Dropping
 bytes "support" in json)
In-Reply-To: <grl78j$7sl$1@ger.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>	<p04330100c603badeb135@[192.168.123.162]>
	<grl78j$7sl$1@ger.gmane.org>
Message-ID: <p04330103c603daff3e8b@[192.168.123.162]>

(email-sig dropped, as I didn't see Steve Holden's message there)

At 12:20 -0400 04/09/2009, Steve Holden wrote:
>Tony Nelson wrote:
 ...
>> If you need the data from the message, by all means extract it and store it
>> in whatever form is useful to the purpose of the database.  If you need the
>> entire message, store it intact in the database, as the bytes it is.  Email
>> isn't Unicode any more than a JPEG or other image types (often payloads in
>> a message) are Unicode.
>
>This is all great, and I did quite quickly realize that the best
>approach was to store the mails in their network byte-stream format as
>bytes. The approach was negated in my own case because of PostgreSQL's
>execrable BLOB-handling capabilities. I took a look at the escaping they
>required, snorted with derision and gave it up as a bad job.
 ...

I use MySQL, but sort of intend to learn PostgreSQL.  I didn't know that
PostgreSQL has no real support for BLOBs.  I agree that having to import
them from a file is awful.  Also, there appears to be a severe limit on the
size of character data fields, so storing in Base64 is out.  About the only
thing to do then is to use external storage for the BLOBs.

Still, email seems to demand such binary storage, whether all databases
provide it or not.
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>

From phd at phd.pp.ru  Thu Apr  9 19:24:24 2009
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Thu, 9 Apr 2009 21:24:24 +0400
Subject: [Python-Dev] BLOBs in Pg (was: email package Bytes vs Unicode)
In-Reply-To: <p04330103c603daff3e8b@[192.168.123.162]>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<p04330100c603badeb135@[192.168.123.162]>
	<grl78j$7sl$1@ger.gmane.org>
	<p04330103c603daff3e8b@[192.168.123.162]>
Message-ID: <20090409172424.GD26429@phd.pp.ru>

On Thu, Apr 09, 2009 at 01:14:21PM -0400, Tony Nelson wrote:
> I use MySQL, but sort of intend to learn PostgreSQL.  I didn't know that
> PostgreSQL has no real support for BLOBs.

   I think it has - BYTEA data type.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From john.arbash.meinel at gmail.com  Thu Apr  9 19:35:05 2009
From: john.arbash.meinel at gmail.com (John Arbash Meinel)
Date: Thu, 09 Apr 2009 12:35:05 -0500
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE2AD4.6090605@cheimes.de>
References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de>
Message-ID: <49DE31C9.103@gmail.com>

Christian Heimes wrote:
> John Arbash Meinel wrote:
>> When I looked at the actual references from interned, I saw mostly
>> variable names. Considering that every variable goes through the python
>> intern dict. And when you look at the intern function, it doesn't use
>> setdefault logic, it actually does a get() followed by a set(), which
>> means the cost of interning is 1-2 lookups depending on likelihood, etc.
>> (I saw a whole lot of strings as the error codes in win32all /
>> winerror.py, and windows error codes tend to be longer-than-average
>> variable length.)
> 
> I've read your posting twice but I'm still not sure if you are aware of
> the most important feature of interned strings. In the first place
> interning is not about saving some bytes of memory but a speed
> optimization. Interned strings can be compared with a simple and fast
> pointer comparison. With interned strings you can simply write:
> 
>     char *a, *b;
>     if (a == b) {
>         ...
>     }
> 
> Instead of:
> 
>     char *a, *b;
>     if (strcmp(a, b) == 0) {
>         ...
>     }
> 
> A compiler can optimize the pointer comparison much better than a
> function call.
> 

Certainly. But there is a cost associated with calling intern() in the
first place. You created a string, and you are now trying to de-dup it.
That cost is both in the memory to track all strings interned so far,
and the cost to do a dict lookup. And the way intern is currently
written, there is a third cost when the item doesn't exist yet, which is
another lookup to insert the object.
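
The extra lookup can be made visible from Python with a key that counts
how often the dict asks for its hash. This is only a proxy for the C
code path, assuming a current CPython where dict.setdefault() hashes the
key once; CountingKey and both helper functions are hypothetical names:

```python
class CountingKey:
    """Key that counts how often a dict asks for its hash."""

    def __init__(self, text):
        self.text = text
        self.hash_calls = 0

    def __hash__(self):
        self.hash_calls += 1
        return hash(self.text)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.text == other.text


def intern_get_then_set(table, key):
    """Mirror of the current logic: a get(), then a set() on a miss."""
    found = table.get(key)          # first lookup
    if found is not None:
        return found
    table[key] = key                # second lookup, only on a miss
    return key


def intern_setdefault(table, key):
    """setdefault-style logic: a single call covers hit and miss."""
    return table.setdefault(key, key)
```

On a miss, the first helper hashes the key twice and the second one
once; on a hit both need a single lookup.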

I'll also note that increasing memory does have a semi-direct effect on
performance, because more memory requires more time to bring memory back
and forth from main memory to CPU caches.

...

> I agree that a dict is not the most memory efficient data structure for
> interned strings. However dicts are extremely well tested and highly
> optimized. Any specialized data structure needs to be designed and
> tested very carefully. If you happen to break the interning system it's
> going to lead to rather nasty and hard to debug problems.

Sure. My plan was to basically take the existing Set/Dict design, and
just tweak it slightly for the expected operations of "interned".

> 
>>   e) How tuned is String.hash() for the fact that most of these strings
>>      are going to be ascii text? (I know that python wants to support
>>      non-ascii variable names, but I still think there is going to be an
>>      overwhelming bias towards characters in the range 65-122 ('A'-'z').
> 
> Python 3.0 uses unicode for all names. You have to design something that
> can be adapted to unicode, too. By the way do you know that dicts have
> an optimized lookup function for strings? It's called lookdict_unicode /
>  lookdict_string.

Sure, but so does PySet. I'm not sure about lookset_unicode, but I would
guess that exists or should exist for py3k.

> 
>> Also note that the performance of the "interned" dict gets even worse on
>> 64-bit platforms. Where the size of a 'dictentry' doubles, but the
>> average length of a variable name wouldn't change.
>>
>> Anyway, I would be happy to implement something along the lines of a
>> "StringSet", or maybe the "InternSet", etc. I just wanted to check if
>> people would be interested or not.
> 
> Since interning is mostly used in the core and extension modules you
> might want to experiment with a different growth rate. The interning
> data structure could start with a larger value and have a slower, non
> progressive data growth rate.
> 
> Christian

I'll also mention that there are other uses for intern() where it is
uniquely suitable. Namely, if you are parsing lots of text with
redundant strings, it is a way to decrease total memory consumption.
(And potentially speed up future comparisons, etc.)

The main reason why intern() is useful for this is that it doesn't
make strings immortal, as would happen if you used some other structure,
because strings know about the "interned" object.

The options for a 3rd-party structure fall down into something like:

1) A cache that makes the strings immortal. (IIRC this is what older
versions of Python did.)

2) A cache that is periodically walked to see if any of the objects are
no longer externally referenced. The main problem here is that walking
is O(all-objects), whereas doing the checking at refcount=0 time means
you only check objects when you think the last reference has gone away.

3) Hijacking PyStringType->dealloc, so that when the refcount goes to 0
and Python wants to destroy the string, you then trigger your own cache
to look and see if it should remove the object.

Even further, you either have to check on every string dealloc, or
re-use PyStringObject->ob_sstate to track that you have placed this
string into your custom structure. Which would preclude ever calling
intern() on this string, because intern() doesn't just check a couple
bits, it looks at the entire ob_sstate value.

I think you could make it work, such that if your custom cache had set
some values, then intern() would just return without evaluating, and
during dealloc you could make sure that you set ob_sstate back to 0
before letting the rest of the python machinery dealloc the string.

John
=:->

From steve at holdenweb.com  Thu Apr  9 20:05:54 2009
From: steve at holdenweb.com (Steve Holden)
Date: Thu, 09 Apr 2009 14:05:54 -0400
Subject: [Python-Dev] BLOBs in Pg
In-Reply-To: <20090409172424.GD26429@phd.pp.ru>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<grkodk$j4p$1@ger.gmane.org>	<p04330100c603badeb135@[192.168.123.162]>	<grl78j$7sl$1@ger.gmane.org>	<p04330103c603daff3e8b@[192.168.123.162]>
	<20090409172424.GD26429@phd.pp.ru>
Message-ID: <49DE3902.70103@holdenweb.com>

Oleg Broytmann wrote:
> On Thu, Apr 09, 2009 at 01:14:21PM -0400, Tony Nelson wrote:
>> I use MySQL, but sort of intend to learn PostgreSQL.  I didn't know that
>> PostgreSQL has no real support for BLOBs.
> 
>    I think it has - BYTEA data type.
> 
But the Python DB adapters appear to require some fairly hairy escaping
of the data to make it usable with the cursor execute() method. IMHO you
shouldn't have to escape data that is passed for insertion via a
parameterized query.

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
Holden Web LLC                 http://www.holdenweb.com/
Watch PyCon on video now!          http://pycon.blip.tv/

From john.arbash.meinel at gmail.com  Thu Apr  9 20:20:11 2009
From: john.arbash.meinel at gmail.com (John Arbash Meinel)
Date: Thu, 09 Apr 2009 13:20:11 -0500
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <d38f5330904091042l415c8912r9ca12fefda7b1ce1@mail.gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
	<d38f5330904091042l415c8912r9ca12fefda7b1ce1@mail.gmail.com>
Message-ID: <49DE3C5B.6020308@gmail.com>

Alexander Belopolsky wrote:
> On Thu, Apr 9, 2009 at 11:02 AM, John Arbash Meinel
> <john at arbash-meinel.com> wrote:
> ...
>>  a) Don't keep a double reference to both key and value to the same
>>     object (1 pointer per entry), this could be as simple as using a
>>     Set() instead of a dict()
>>
> 
> There is a rejected patch implementing just that:
> http://bugs.python.org/issue1507011 .
> 

Thanks for the heads up.

So reading that thread, the final reason it was rejected was 2 part:

  Without reviewing the patch again, I also doubt it is capable of
  getting rid of the reference count cheating: essentially, this
  cheating enables the interning dictionary to have weak references to
  strings, this is important to allow automatic collection of certain
  interned strings. This feature needs to be preserved, so the cheating
  in the reference count must continue.

That specific argument was invalid, because the patch just changed the
refcount trickery to use +- 1. And I'm pretty sure Alexander's argument
was just that +- 2 was weird, not that the "weakref" behavior was bad.

The other argument against the patch was based on the idea that:
  The operation "give me the member equal but not identical to E" is
  conceptually a lookup operation; the mathematical set construct has no
  such operation, and the Python set models it closely. IOW, set is
  *not* a dict with key==value.

I don't know if there was any consensus reached on this, since only
Martin responded this way.
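
The "equal but not identical" lookup being argued about can be sketched
in Python. This is only an illustration, written for a current CPython 3
interpreter rather than the 2.x releases discussed here; the names
canonical and duplicate, and the linear scan over the set, are made up
for the example and are not taken from the patch.

```python
# Two strings built at runtime: equal in value, but (on CPython)
# two distinct objects, which is the situation intern() deals with.
canonical = "".join(["inter", "ned"])
duplicate = "".join(["in", "terned"])
assert canonical == duplicate and canonical is not duplicate

# dict with key == value: a lookup hands back the stored object.
as_dict = {canonical: canonical}
found_via_dict = as_dict[duplicate]

# set: membership is easy to test, but there is no direct
# "give me the stored member" call, so this sketch scans for it.
as_set = {canonical}
found_via_set = next(m for m in as_set if m == duplicate)

print(found_via_dict is canonical, found_via_set is canonical)
```

At the C level the set's own table could answer the same question
without a scan, which is roughly what the patch relied on; the point of
the quoted objection is only that the Python-level set API does not
model that operation.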

I can say that for my "do some work with a medium size code base", the
overhead of "interned" as a dictionary was 1.5MB out of 20MB total memory.

Simply changing it to a Set would drop this to 1.0MB. I have no proof
about the impact on performance, since I haven't benchmarked it yet.

Changing it to a StringSet could further drop it to 0.5MB. I would guess
that any performance impact would depend on whether the total size of
'interned' would fit inside L2 cache or not.
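
The three figures above are consistent with simple per-entry arithmetic.
The sketch below assumes a 32-bit build (4-byte pointers and hashes) and
a table of 131072 slots, and it assumes a set or StringSet would end up
with the same number of slots as the dict; none of that is stated
outright in this message, so treat it as a plausible reconstruction.

```python
# Assumptions (not stated in the thread): 32-bit build, so every
# pointer-sized field is 4 bytes, and a hash table of 131072 slots.
WORD = 4
SLOTS = 131072

# Fields stored per slot under each layout.
layouts = {
    "dict (hash, key, value)": 3 * WORD,
    "set (hash, key)": 2 * WORD,
    "StringSet (key only)": 1 * WORD,
}

table_mib = {
    name: SLOTS * width / (1024.0 * 1024.0)
    for name, width in layouts.items()
}

for name, mib in table_mib.items():
    print("%-24s %.1f MiB" % (name, mib))
```

Under those assumptions the StringSet saving comes from dropping the
cached hash from the table, since a string object already stores its own
hash.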

There is a small bug in the original patch when adding the string to the
set fails. Namely, it would return "t == NULL", which would be "t != s",
and the intern in place would end up setting your pointer to NULL rather
than doing nothing and clearing the error code.

So I guess some of it comes down to whether "loweis" would also reject
this change on the basis that mathematically a "set is not a dict".
Though his claim was that "nobody else is speaking in favor of the
patch", while at least Colin Winter has expressed some interest at this
point.

John
=:->

From martin at v.loewis.de  Thu Apr  9 20:25:35 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 09 Apr 2009 20:25:35 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
Message-ID: <49DE3D9F.3000902@v.loewis.de>

> This is an interesting question, and something I'm struggling with for
> the email package for 3.x.  It turns out to be pretty convenient to have
> both a bytes and a string API, both for input and output, but I think
> email really wants to be represented internally as bytes.  Maybe.  Or
> maybe just for content bodies and not headers, or maybe both.  Anyway,
> aside from that decision, I haven't come up with an elegant way to allow
> /output/ in both bytes and strings (input is I think theoretically
> easier by sniffing the arguments).

If you allow for content-transfer-encoding: 8bit, I think there is just
no way to represent email as text. You have to accept conversion to,
say, base64 (or quoted-unreadable) when converting an email message to
text.
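
A small Python 3 sketch of that point, using a made-up body containing
every byte value: it cannot be decoded as ASCII text, but a base64
transfer encoding turns it into text that round-trips. The variable
names are illustrative only.

```python
import base64

# An "8bit" body: arbitrary octets, not valid ASCII text.
body = bytes(range(256))

try:
    body.decode("ascii")
    decodable = True
except UnicodeDecodeError:
    decodable = False

# Converting to base64 gives pure-ASCII text that can live in a str.
as_text = base64.b64encode(body).decode("ascii")
round_tripped = base64.b64decode(as_text)

print(decodable, as_text.isascii(), round_tripped == body)
```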

Regards,
Martin

From tonynelson at georgeanelson.com  Thu Apr  9 20:43:16 2009
From: tonynelson at georgeanelson.com (Tony Nelson)
Date: Thu, 9 Apr 2009 14:43:16 -0400
Subject: [Python-Dev] BLOBs in Pg (was: email package Bytes vs Unicode)
In-Reply-To: <20090409172424.GD26429@phd.pp.ru>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>	<p04330100c603badeb135@[192.168.123.162]>
	<grl78j$7sl$1@ger.gmane.org>	<p04330103c603daff3e8b@[192.168.123.162]>
	<20090409172424.GD26429@phd.pp.ru>
Message-ID: <p04330104c603ed41892f@[192.168.123.162]>

At 21:24 +0400 04/09/2009, Oleg Broytmann wrote:
>On Thu, Apr 09, 2009 at 01:14:21PM -0400, Tony Nelson wrote:
>> I use MySQL, but sort of intend to learn PostgreSQL.  I didn't know that
>> PostgreSQL has no real support for BLOBs.
>
>   I think it has - BYTEA data type.

So it does; I see that now that I've opened up the PostgreSQL docs.  I
don't find escaping data to be a problem -- I do it for all untrusted data.

So, after all, there isn't an example of a database that makes onerous the
storing of email and other such byte-oriented data, and Python's email
package has no need for workarounds in that area.
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>

From martin at v.loewis.de  Thu Apr  9 21:06:40 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 09 Apr 2009 21:06:40 +0200
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE3C5B.6020308@gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com>	<d38f5330904091042l415c8912r9ca12fefda7b1ce1@mail.gmail.com>
	<49DE3C5B.6020308@gmail.com>
Message-ID: <49DE4740.2040205@v.loewis.de>

> So I guess some of it comes down to whether "loweis" would also reject
> this change on the basis that mathematically a "set is not a dict".

I'd like to point out that this was not the reason to reject it.
Instead, this (or, the opposite of it) was given as a reason why this
patch should be accepted (in msg50482). I found that a weak rationale
for making that change, in particular because I think the rationale
is incorrect.

I like your rationale (save memory) much more, and was asking in the
tracker for specific numbers, which weren't forthcoming.

> Though his claim was that "nobody else is speaking in favor of the
> patch", while at least Colin Winter has expressed some interest at this
> point.

Again, at that point in the tracker, none of the other committers had
spoken in favor of the patch. Since I wasn't convinced of its
correctness, and nobody else (whom I trust) had reviewed it as correct,
I rejected it.

Now that you brought up specific numbers, I tried to verify them,
and found them correct (although a bit unfortunate); please see my
test script below. Up to 21800 interned strings, the dict takes (only)
384kiB. It then grows, requiring 1536kiB. Whether or not having 22k
interned strings is "typical", I still don't know.
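
The jump from 384kiB to 1536kiB fits the resize rule as I understand it
for the 2.x dict: resize once the table is two-thirds full, to the first
power of two above four times the number of entries. The sketch below
assumes that rule and a 12-byte entry (hash, key, value on a 32-bit
build); both are assumptions used for the arithmetic, not something
measured here.

```python
# Assumed: 12 bytes per entry (hash + key + value, 32-bit build).
ENTRY_BYTES = 12


def table_kib(slots):
    """Size of the entry table in KiB for a given slot count."""
    return slots * ENTRY_BYTES // 1024


old_slots = 32768

# Assumed rule: resize when fill * 3 >= slots * 2.
first_resize_fill = (2 * old_slots + 2) // 3

# Assumed rule: new size is the first power of two above 4 * used.
new_slots = 8
while new_slots <= 4 * first_resize_fill:
    new_slots <<= 1

print(first_resize_fill, table_kib(old_slots), table_kib(new_slots))
```

That threshold lands just above the 21800 entries seen in the output,
which is sampled only every 100 insertions.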

Wrt. your proposed change, I would be worried about maintainability,
in particular if it would copy parts of the set implementation.

Regards,
Martin

import gc, sys
def find_interned_dict():
    cand = None
    for o in gc.get_objects():
        if not isinstance(o, dict):
            continue
        if "find_interned_dict" not in o:
            continue
        for k,v in o.iteritems():
            if k is not v:
                break
        else:
            assert not cand
            cand = o
    return cand

d = find_interned_dict()
print len(d), sys.getsizeof(d)

l = []
for i in range(20000):
    if i%100==0:
        print len(d), sys.getsizeof(d)

    l.append(intern(repr(i)))
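
The script above is Python 2 and depends on finding the interned dict
through gc, which may not carry over to later interpreters. A rough
Python 3 sketch of the same observation on an ordinary dict (not the
interned one) is below; the sizes it records depend on the interpreter
version and build, so no particular numbers are implied.

```python
import sys

# Record (entry count, size in bytes) each time the dict's
# reported size changes while strings are added to it.
d = {}
steps = [(0, sys.getsizeof(d))]
for i in range(30000):
    key = repr(i)
    d[key] = key
    size = sys.getsizeof(d)
    if size != steps[-1][1]:
        steps.append((len(d), size))

for count, size in steps:
    print(count, size)
```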

From benjamin at python.org  Thu Apr  9 21:17:39 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 9 Apr 2009 14:17:39 -0500
Subject: [Python-Dev] calling dictresize outside dictobject.c
In-Reply-To: <6CE3CEB2-0753-4708-99A5-78F2B05A054C@colgate.edu>
References: <6CE3CEB2-0753-4708-99A5-78F2B05A054C@colgate.edu>
Message-ID: <1afaf6160904091217g30cbda5bt27529a4fe44e5f0e@mail.gmail.com>

Hi Dan,
Thanks for your interest.

2009/4/6 Dan Schult <dschult at colgate.edu>:
> Hi,
> I'm trying to write a C extension which is a subclass of dict.
> I want to do something like a setdefault() but with a single lookup.
>
> Looking through the dictobject code, the three workhorse
> routines lookdict, insertdict and dictresize are not available
> directly for functions outside dictobject.c,
> but I can get at lookdict through dict->ma_lookup().
>
> So I use lookdict to get the PyDictEntry (call it ep) I'm looking for.
> The comments for lookdict say ep is ready to be set... so I do that.
> Then I check whether the dict needs to be resized--following the
> nice example of PyDict_SetItem.  But I can't call dictresize to finish
> off the process.
>
> Should I be using PyDict_SetItem directly?  No... it does its own lookup.
> I don't want a second lookup!  I already know which entry will be filled.
>
> So then I look at the code for setdefault and it also does
> a double lookup for checking and setting an entry.
>
> What subtle issue am I missing?
> Why does setdefault do a double lookup?
> More globally, why isn't dictresize available through the C-API?

Because it's not useful outside the intimate implementation details of
dictobject.c

>
> If there isn't a reason to do a double lookup I have a patch for setdefault,
> but I thought I should ask here first.

Raymond tells me the cost of the second lookup is negligible because
of caching, but PyObject_Hash needn't be called two times. He's
working on a patch later today.
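
The repeated hashing can be made visible from Python with a key that
counts calls to its __hash__. This is a sketch for a current CPython 3,
not the 2.x code under discussion; CountingKey and check_then_store are
hypothetical names for the example.

```python
class CountingKey:
    """Key object that counts how often it is hashed."""

    def __init__(self, name):
        self.name = name
        self.hash_calls = 0

    def __hash__(self):
        self.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.name == other.name


def check_then_store(d, key, default):
    """The 'double lookup' pattern: test membership, then store."""
    if key not in d:        # first lookup, hashes the key
        d[key] = default    # second lookup, hashes it again


k1 = CountingKey("a")
check_then_store({}, k1, 0)
explicit_calls = k1.hash_calls

k2 = CountingKey("a")
{}.setdefault(k2, 0)
builtin_calls = k2.hash_calls

print(explicit_calls, builtin_calls)
```

On a current interpreter I would expect the built-in setdefault to hash
the key only once, but the count it prints is the thing to trust.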

-- 
Regards,
Benjamin

From alexandre at peadrop.com  Thu Apr  9 21:51:15 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Thu, 9 Apr 2009 15:51:15 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <loom.20090409T043042-835@post.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com> 
	<loom.20090409T043042-835@post.gmane.org>
Message-ID: <acd65fa20904091251icd7dac1gfce4a97d02522130@mail.gmail.com>

On Thu, Apr 9, 2009 at 1:15 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> As for reading/writing bytes over the wire, JSON is often used in the same
> context as HTML: you are supposed to know the charset and decode/encode the
> payload using that charset. However, the RFC specifies a default encoding of
> utf-8. (*)
>
>
> (*) http://www.ietf.org/rfc/rfc4627.txt
>

That is one short and sweet RFC. :-)

> The RFC also specifies a discrimination algorithm for non-supersets of ASCII
> ("Since the first two characters of a JSON text will always be ASCII
>   characters [RFC0020], it is possible to determine whether an octet
>   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
>   at the pattern of nulls in the first four octets."), but it is not
> implemented in the json module:
>

Given the RFC specifies that the encoding used should be one of the
encodings defined by Unicode, wouldn't it be a better idea to remove the
"unicode" support, instead? To me, it would make sense to use the
detection algorithms for Unicode to sniff the encoding of the JSON
stream and then use the detected encoding to decode the strings embedded
in the JSON stream.
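
The null-pattern check from RFC 4627 section 3 is short enough to sketch
in Python 3. The function name detect_json_encoding is made up for the
example and is not part of the json module; the sketch ignores byte
order marks and falls back to UTF-8 for inputs shorter than four octets.

```python
def detect_json_encoding(data):
    """Guess the Unicode encoding of a JSON octet stream from the
    pattern of null bytes in its first four octets (RFC 4627, sec. 3).
    """
    head = data[:4]
    if len(head) < 4:
        return "utf-8"
    nulls = tuple(octet == 0 for octet in head)
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(nulls, "utf-8")           # xx xx xx xx


sample = '{"a": 1}'
for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    payload = sample.encode(codec)
    guessed = detect_json_encoding(payload)
    print(codec, guessed, payload.decode(guessed) == sample)
```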

Cheers,
-- Alexandre

From john.arbash.meinel at gmail.com  Thu Apr  9 21:59:02 2009
From: john.arbash.meinel at gmail.com (John Arbash Meinel)
Date: Thu, 09 Apr 2009 14:59:02 -0500
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE4740.2040205@v.loewis.de>
References: <49DE0DF6.1040900@arbash-meinel.com>	<d38f5330904091042l415c8912r9ca12fefda7b1ce1@mail.gmail.com>
	<49DE3C5B.6020308@gmail.com> <49DE4740.2040205@v.loewis.de>
Message-ID: <49DE5386.7070908@gmail.com>

...

> I like your rationale (save memory) much more, and was asking in the
> tracker for specific numbers, which weren't forthcoming.
> 

...

> Now that you brought up specific numbers, I tried to verify them,
> and found them correct (although a bit unfortunate); please see my
> test script below. Up to 21800 interned strings, the dict takes (only)
> 384kiB. It then grows, requiring 1536kiB. Whether or not having 22k
> interned strings is "typical", I still don't know.

Given that every variable name in any file is interned, it can grow
pretty rapidly. As an extreme case, consider the file
"win32/lib/winerror.py" which tracks all possible win32 errors.

>>> import winerror
>>> print len(winerror.__dict__)
1872

So a single error file has 1.9k strings.

My python version (2.5.2) doesn't have 'sys.getsizeof()', but otherwise
your code looks correct.

If all I do is find the interned dict, I see:
>>> print len(d)
5037

So stock python, without importing much extra (just os, sys, gc, etc.)
has almost 5k strings already.

I don't have a great regex yet for just extracting how many unique
strings there are in a given bit of source code.

However, if I do:

import gc, sys
def find_interned_dict():
    cand = None
    for o in gc.get_objects():
        if not isinstance(o, dict):
            continue
        if "find_interned_dict" not in o:
            continue
        for k,v in o.iteritems():
            if k is not v:
                break
        else:
            assert not cand
            cand = o
    return cand

d = find_interned_dict()
print len(d)

# Just import a few of the core structures
from bzrlib import branch, repository, workingtree, builtins
print len(d)

I start at 5k strings, and after just importing the important bits of
bzrlib, I'm at:
19,316

Now, the bzrlib source code isn't particularly huge. It is about 3.7MB /
91k lines of .py files (that is, without importing the test suite).

Memory consumption with just importing bzrlib shows up at 15MB, with
300kB taken up by the intern dict.

If I then import some extra bits of bzrlib, like http support, ftp
support, and sftp support (which brings in python's httplib, and
paramiko, an ssh/sftp implementation), I'm up to:
>>> print len(d)
25186

Memory has jumped to 23MB (interned is now 1.57MB), and I haven't
actually done anything but import python code yet. If I sum the size of
the PyString objects held in intern(), it amounts to 940KB, though they
refer to only 335KB of char data (or an average of 13 bytes per string).

> 
> Wrt. your proposed change, I would be worried about maintainability,
> in particular if it would copy parts of the set implementation.

Right, so in the first part, I would just use Set(), as it could then
save 1/3rd of the memory it uses today. (Dropping down to 1MB from 1.5MB.)

I don't have numbers on how much that would improve CPU times, I would
imagine improving 'intern()' would impact import times more than run
times, simply because import time is interning a *lot* of strings.

Though honestly, Bazaar would really like this, because startup overhead
for us is almost 400ms to 'do nothing', which is a lot for a command
line app.

John
=:->

From martin at v.loewis.de  Thu Apr  9 22:05:28 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 09 Apr 2009 22:05:28 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <351C98023EB24D4EACF1794F7CDAC272@RaymondLaptop1>
References: <loom.20090408T110540-221@post.gmane.org>	<49DCEDFF.7050708@v.loewis.de><loom.20090408T231751-930@post.gmane.org>
	<49DD8DC8.8020302@v.loewis.de>
	<351C98023EB24D4EACF1794F7CDAC272@RaymondLaptop1>
Message-ID: <49DE5508.7000309@v.loewis.de>

>> I can understand that you don't want to spend much time on it. How
>> about removing it from 3.1? We could re-add it when long-term support
>> becomes more likely.
> 
> I'm speechless.

It seems that my statement has surprised you, so let me explain:

I think we should refrain from making design decisions (such as
API decisions) without Bob's explicit consent, unless we assign
a new maintainer for the simplejson module (perhaps just for the
3k branch, which perhaps would be a fork from Bob's code).

Antoine suggests that Bob did not comment on the issues at hand,
therefore, we should not proceed with the proposed design. Since
the 3.1 release is only a few weeks ahead, we have the choice of
either shipping with the broken version that is currently in the
3k branch, or drop the module from the 3k branch. I believe our
users are better served by not having to waste time with a module
that doesn't quite work, or may change.

Regards,
Martin

From martin at v.loewis.de  Thu Apr  9 22:13:40 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 09 Apr 2009 22:13:40 +0200
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE5386.7070908@gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com>	<d38f5330904091042l415c8912r9ca12fefda7b1ce1@mail.gmail.com>
	<49DE3C5B.6020308@gmail.com> <49DE4740.2040205@v.loewis.de>
	<49DE5386.7070908@gmail.com>
Message-ID: <49DE56F4.6050904@v.loewis.de>

> I don't have numbers on how much that would improve CPU times, I would
> imagine improving 'intern()' would impact import times more than run
> times, simply because import time is interning a *lot* of strings.
> 
> Though honestly, Bazaar would really like this, because startup overhead
> for us is almost 400ms to 'do nothing', which is a lot for a command
> line app.

Maybe I misunderstand your proposed change: how could the representation
of the interning dict possibly change the runtime of interning? (let
alone significantly)

Regards,
Martin

From martin at v.loewis.de  Thu Apr  9 22:19:43 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 09 Apr 2009 22:19:43 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <acd65fa20904091251icd7dac1gfce4a97d02522130@mail.gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<acd65fa20904091251icd7dac1gfce4a97d02522130@mail.gmail.com>
Message-ID: <49DE585F.6040209@v.loewis.de>

Alexandre Vassalotti wrote:
> On Thu, Apr 9, 2009 at 1:15 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> As for reading/writing bytes over the wire, JSON is often used in the same
>> context as HTML: you are supposed to know the charset and decode/encode the
>> payload using that charset. However, the RFC specifies a default encoding of
>> utf-8. (*)
>>
>>
>> (*) http://www.ietf.org/rfc/rfc4627.txt
>>
> 
> That is one short and sweet RFC. :-)

It is indeed well-specified. Unfortunately, it only talks about the
application/json type; the pre-existing other versions of json in MIME
types vary widely, such as text/plain (possibly with a charset=
parameter), text/json, or text/javascript. For these, the RFC doesn't
apply.

> Given the RFC specifies that the encoding used should be one of the
> encodings defined by Unicode, wouldn't it be a better idea to remove the
> "unicode" support, instead? To me, it would make sense to use the
> detection algorithms for Unicode to sniff the encoding of the JSON
> stream and then use the detected encoding to decode the strings embedded
> in the JSON stream.

That might be reasonable. (but then, I also stand by my view that we
shouldn't proceed without Bob's approval).

Regards,
Martin

From john.arbash.meinel at gmail.com  Thu Apr  9 22:22:04 2009
From: john.arbash.meinel at gmail.com (John Arbash Meinel)
Date: Thu, 09 Apr 2009 15:22:04 -0500
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE56F4.6050904@v.loewis.de>
References: <49DE0DF6.1040900@arbash-meinel.com>	<d38f5330904091042l415c8912r9ca12fefda7b1ce1@mail.gmail.com>
	<49DE3C5B.6020308@gmail.com> <49DE4740.2040205@v.loewis.de>
	<49DE5386.7070908@gmail.com> <49DE56F4.6050904@v.loewis.de>
Message-ID: <49DE58EC.4000803@gmail.com>

Martin v. Löwis wrote:
>> I don't have numbers on how much that would improve CPU times, I would
>> imagine improving 'intern()' would impact import times more than run
>> times, simply because import time is interning a *lot* of strings.
>>
>> Though honestly, Bazaar would really like this, because startup overhead
>> for us is almost 400ms to 'do nothing', which is a lot for a command
>> line app.
> 
> Maybe I misunderstand your proposed change: how could the representation
> of the interning dict possibly change the runtime of interning? (let
> alone significantly)
> 
> Regards,
> Martin
> 

Decreasing memory consumption lets more things fit in cache. Once the
size of 'interned' is greater than fits into L2 cache, you start paying
the cost of a full memory fetch, which is usually measured in 100s of
cpu cycles.

Avoiding double lookups in the dictionary would be less overhead, though
the second lookup is probably pretty fast if there are no collisions,
since everything would already be in the local CPU cache.

If we were dealing in objects that were KB in size, it wouldn't matter.
But as the intern dict quickly gets into MB, it starts to make a bigger
difference.

How big of a difference would be very CPU and dataset size specific. But
certainly caches make certain things much faster, and once you overflow
a cache, performance can take a surprising turn.

So my primary goal is certainly a decrease of memory consumption. I
think it will have a small knock-on effect of improving performance, but I
don't have anything to give concrete numbers.

Also, consider that resizing has to evaluate every object, thus paging
in all X bytes, and assigning to another 2X bytes. Cutting X by
(potentially) a factor of 3 would probably have a small but measurable effect.

John
=:->

From steve at holdenweb.com  Thu Apr  9 22:42:21 2009
From: steve at holdenweb.com (Steve Holden)
Date: Thu, 09 Apr 2009 16:42:21 -0400
Subject: [Python-Dev] BLOBs in Pg
In-Reply-To: <p04330104c603ed41892f@[192.168.123.162]>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<grkodk$j4p$1@ger.gmane.org>	<p04330100c603badeb135@[192.168.123.162]>	<grl78j$7sl$1@ger.gmane.org>	<p04330103c603daff3e8b@[192.168.123.162]>	<20090409172424.GD26429@phd.pp.ru>
	<p04330104c603ed41892f@[192.168.123.162]>
Message-ID: <grlmje$q0k$1@ger.gmane.org>

Tony Nelson wrote:
> At 21:24 +0400 04/09/2009, Oleg Broytmann wrote:
>> On Thu, Apr 09, 2009 at 01:14:21PM -0400, Tony Nelson wrote:
>>> I use MySQL, but sort of intend to learn PostgreSQL.  I didn't know that
>>> PostgreSQL has no real support for BLOBs.
>>   I think it has - BYTEA data type.
> 
> So it does; I see that now that I've opened up the PostgreSQL docs.  I
> don't find escaping data to be a problem -- I do it for all untrusted data.
> 
You shouldn't have to when you are using parameterized queries.

> So, after all, there isn't an example of a database that makes onerous the
> storing of email and other such byte-oriented data, and Python's email
> package has no need for workarounds in that area.

Create a table:

CREATE TABLE tst
(
   id serial,
   byt bytea,
    PRIMARY KEY (id)
) WITH (OIDS=FALSE)
;
ALTER TABLE tst OWNER TO steve;

The following program prints "0":

import psycopg2 as db
conn = db.connect(database="maildb", user="@@@", password="@@@",
host="localhost", port=5432)
curs = conn.cursor()
curs.execute("DELETE FROM tst")
curs.execute("INSERT INTO tst (byt) VALUES (%s)",
             ("".join(chr(i) for i in range(256)), ))
conn.commit()
curs.execute("SELECT byt FROM tst")
for st, in curs.fetchall():
    print len(st)

If I change the data to use range(1, 256) I get a ProgrammingError from
PostgreSQL "invalid input syntax for type bytea".

If I can't pass a 256-byte string into a BLOB and get it back without
anything like this happening then there's *something* in the chain that
makes the database useless. My current belief is that this something is
fairly deeply embedded in the PostgreSQL engine. No "syntax" should be
necessary.
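
For comparison, here is what "no syntax necessary" looks like with a
driver that binds binary parameters directly. This sketch swaps in the
standard library's sqlite3 module and an in-memory database in place of
psycopg2 and PostgreSQL, purely so that it runs without a server; it
says nothing about how PostgreSQL itself behaves. It is written for
Python 3, where the payload is a bytes object.

```python
import sqlite3

# All 256 byte values, including NUL.
payload = bytes(range(256))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tst (id INTEGER PRIMARY KEY, byt BLOB)")

# Parameterized insert: the driver binds the bytes as a BLOB,
# so no escaping is done by the caller.
conn.execute("INSERT INTO tst (byt) VALUES (?)", (payload,))
conn.commit()

(stored,) = conn.execute("SELECT byt FROM tst").fetchone()
conn.close()

print(len(stored), stored == payload)
```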

I suppose if we have to go round again on this we should take it to
email as we have gotten pretty far off-topic for python-dev.

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
Holden Web LLC                 http://www.holdenweb.com/
Watch PyCon on video now!          http://pycon.blip.tv/

From aahz at pythoncraft.com  Thu Apr  9 22:53:26 2009
From: aahz at pythoncraft.com (Aahz)
Date: Thu, 9 Apr 2009 13:53:26 -0700
Subject: [Python-Dev] BLOBs in Pg
In-Reply-To: <grlmje$q0k$1@ger.gmane.org>
References: <ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<p04330100c603badeb135@[192.168.123.162]>
	<grl78j$7sl$1@ger.gmane.org>
	<p04330103c603daff3e8b@[192.168.123.162]>
	<20090409172424.GD26429@phd.pp.ru>
	<p04330104c603ed41892f@[192.168.123.162]>
	<grlmje$q0k$1@ger.gmane.org>
Message-ID: <20090409205326.GA2807@panix.com>

On Thu, Apr 09, 2009, Steve Holden wrote:
>
> import psycopg2 as db
> conn = db.connect(database="maildb", user="@@@", password="@@@",
> host="localhost", port=5432)
> curs = conn.cursor()
> curs.execute("DELETE FROM tst")
> curs.execute("INSERT INTO tst (byt) VALUES (%s)",
>              ("".join(chr(i) for i in range(256)), ))
> conn.commit()
> curs.execute("SELECT byt FROM tst")
> for st, in curs.fetchall():
>     print len(st)
> 
> If I change the data to use range(1, 256) I get a ProgrammingError from
> PostgreSQL "invalid input syntax for type bytea".
> 
> If I can't pass a 256-byte string into a BLOB and get it back without
> anything like this happening then there's *something* in the chain that
> makes the database useless. My current belief is that this something is
> fairly deeply embedded in the PostgreSQL engine. No "syntax" should be
> necessary.

You're not using a parameterized query.  I suggest you post to c.l.py for
more information.  ;-)
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

Why is this newsgroup different from all other newsgroups?

From phd at phd.pp.ru  Thu Apr  9 23:12:17 2009
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Fri, 10 Apr 2009 01:12:17 +0400
Subject: [Python-Dev] BLOBs in Pg
In-Reply-To: <grlmje$q0k$1@ger.gmane.org>
References: <ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<p04330100c603badeb135@[192.168.123.162]>
	<grl78j$7sl$1@ger.gmane.org>
	<p04330103c603daff3e8b@[192.168.123.162]>
	<20090409172424.GD26429@phd.pp.ru>
	<p04330104c603ed41892f@[192.168.123.162]>
	<grlmje$q0k$1@ger.gmane.org>
Message-ID: <20090409211217.GA7897@phd.pp.ru>

On Thu, Apr 09, 2009 at 04:42:21PM -0400, Steve Holden wrote:
> If I can't pass a 256-byte string into a BLOB and get it back without
> anything like this happening then there's *something* in the chain that
> makes the database useless.

import psycopg2

con = psycopg2.connect(database="test")
cur = con.cursor()
cur.execute("CREATE TABLE test (id serial, data BYTEA)")
cur.execute('INSERT INTO test (data) VALUES (%s)', (psycopg2.Binary(''.join([chr(i) for i in range(256)])),))
cur.execute('SELECT * FROM test ORDER BY id')
for rec in cur.fetchall():
   print rec[0], type(rec[1]), repr(str(rec[1]))

Result:

1 <type 'buffer'> '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

   What am I doing wrong?

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From bob at redivi.com  Thu Apr  9 23:13:50 2009
From: bob at redivi.com (Bob Ippolito)
Date: Thu, 9 Apr 2009 14:13:50 -0700
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49DE5508.7000309@v.loewis.de>
References: <loom.20090408T110540-221@post.gmane.org>
	<49DCEDFF.7050708@v.loewis.de>
	<loom.20090408T231751-930@post.gmane.org>
	<49DD8DC8.8020302@v.loewis.de>
	<351C98023EB24D4EACF1794F7CDAC272@RaymondLaptop1>
	<49DE5508.7000309@v.loewis.de>
Message-ID: <6a36e7290904091413i10994056k754b6ce04a93c0c5@mail.gmail.com>

On Thu, Apr 9, 2009 at 1:05 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>> I can understand that you don't want to spend much time on it. How
>>> about removing it from 3.1? We could re-add it when long-term support
>>> becomes more likely.
>>
>> I'm speechless.
>
> It seems that my statement has surprised you, so let me explain:
>
> I think we should refrain from making design decisions (such as
> API decisions) without Bob's explicit consent, unless we assign
> a new maintainer for the simplejson module (perhaps just for the
> 3k branch, which perhaps would be a fork from Bob's code).
>
> Antoine suggests that Bob did not comment on the issues at hand,
> therefore, we should not proceed with the proposed design. Since
> the 3.1 release is only a few weeks ahead, we have the choice of
> either shipping with the broken version that is currently in the
> 3k branch, or drop the module from the 3k branch. I believe our
> users are better served by not having to waste time with a module
> that doesn't quite work, or may change.

Most of my time to spend on json/simplejson and these mailing list
discussions is on weekends, I try not to bother with it when I'm busy
doing Actual Work unless there is a bug or some other issue that needs
more immediate attention. I also wasn't aware that I was expected to
comment on those issues. I'm CC'ed on the discussion for issue4136 but
I don't see any unanswered questions directed at me.

I have the issues (issue5723, issue4136) starred in my gmail and I
planned to look at it more closely later, hopefully on Friday or
Saturday.

As far as Python 3 goes, I honestly have not yet familiarized myself
with the changes to the IO infrastructure and what the new idioms are.
At this time, I can't make any educated decisions with regard to how
it should be done because I don't know exactly how bytes are supposed
to work and what the common idioms are for other libraries in the
stdlib that do similar things. Until I figure that out, someone else
is better off making decisions about the Python 3 version. My guess is
that it should work the same way as it does in Python 2.x: take bytes
or unicode input in loads (which means encoding is still relevant). I
also think the output of dumps should also be bytes, since it is a
serialization, but I am not sure how other libraries do this in Python
3 because one could argue that it is also text. If other libraries
that do text/text encodings (e.g. binascii, mimelib, ...) use str for
input and output instead of bytes then maybe Antoine's changes are the
right solution and I just don't know better because I'm not up to
speed with how people write Python 3 code.

I'll do my best to find some time to look into Python 3 more closely
soon, but thus far I have not been very motivated to do so because
Python 3 isn't useful for us at work and twiddling syntax isn't a very
interesting problem for me to solve.

-bob

From martin at v.loewis.de  Fri Apr 10 00:07:23 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Fri, 10 Apr 2009 00:07:23 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <6a36e7290904091413i10994056k754b6ce04a93c0c5@mail.gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>	<49DCEDFF.7050708@v.loewis.de>	<loom.20090408T231751-930@post.gmane.org>	<49DD8DC8.8020302@v.loewis.de>	<351C98023EB24D4EACF1794F7CDAC272@RaymondLaptop1>	<49DE5508.7000309@v.loewis.de>
	<6a36e7290904091413i10994056k754b6ce04a93c0c5@mail.gmail.com>
Message-ID: <49DE719B.8050101@v.loewis.de>

> As far as Python 3 goes, I honestly have not yet familiarized myself
> with the changes to the IO infrastructure and what the new idioms are.
> At this time, I can't make any educated decisions with regard to how
> it should be done because I don't know exactly how bytes are supposed
> to work and what the common idioms are for other libraries in the
> stdlib that do similar things.

It's really very similar to 2.x: the "bytes" type is to be used in all
interfaces that operate on byte sequences that may or may not represent
characters; in particular, for interfaces where the operating system
deliberately uses bytes - i.e. low-level file IO and socket IO; also
for cases where the encoding is embedded in the stream that still
needs to be processed (e.g. XML parsing).

(Unicode) strings should be used where the data is truly text by
nature, i.e. where no encoding information is necessary to find out
what characters are intended. It's used on interfaces where the
encoding is known (e.g. text IO, where the encoding is specified
on opening, XML parser results, with the declared encoding, and
GUI libraries, which naturally expect text).
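
A minimal sketch of that split in Python 3, assuming UTF-8 as the
encoding (the sample string and variable names are illustrative only):
the binary layer hands back bytes, while a text layer opened with a
declared encoding hands back str.

```python
import io

# Bytes as they might arrive from a socket or a file opened in binary mode.
raw = "Gr\u00fc\u00dfe\n".encode("utf-8")

# Binary interface: no encoding is involved, so reads return bytes.
binary_stream = io.BytesIO(raw)
chunk = binary_stream.read()

# Text interface: the encoding is declared when the stream is opened,
# so reads return str.
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
line = text_stream.readline()

print(type(chunk).__name__, type(line).__name__)
```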

> Until I figure that out, someone else
> is better off making decisions about the Python 3 version.

Some of us can certainly explain to you how this is supposed to
work. However, we need you to check any assumption against the
known use cases - would the users of the module be happy if it
worked one way or the other?

> My guess is
> that it should work the same way as it does in Python 2.x: take bytes
> or unicode input in loads (which means encoding is still relevant). I
> also think the output of dumps should also be bytes, since it is a
> serialization, but I am not sure how other libraries do this in Python
> 3 because one could argue that it is also text.

This, indeed, had been an endless debate, and, in the end, the decision
was somewhat arbitrary. Here are some examples:

- base64.encodestring expects bytes (naturally, since it is supposed to
  encode arbitrary binary data), and produces bytes (debatably)
- binascii.b2a_hex likewise (expects and produces bytes)
- pickle.dumps produces bytes (uniformly, both for binary and text
  pickles)
- marshal.dumps likewise
- email.message.Message().as_string produces a (unicode) string
  (see Barry's recent thread on whether that's a good thing; the
  email package hasn't been fully ported to 3k, either)
- the XML libraries (continue to) parse bytes, and produce
  Unicode strings
- for the IO libraries, see above
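
The return types can be checked directly. This is a small sketch for
current Python 3 releases, not for the 3.0/3.1 code under discussion.
It assumes base64.encodebytes as the present-day spelling of
base64.encodestring, and it includes json.dumps, which returns str in
current releases, for comparison. The payload is an arbitrary sample.

```python
import base64
import binascii
import json
import marshal
import pickle

payload = b"\x00\xffspam"  # arbitrary binary sample

results = {
    "base64.encodebytes": base64.encodebytes(payload),
    "binascii.b2a_hex": binascii.b2a_hex(payload),
    "pickle.dumps (protocol 0)": pickle.dumps(payload, protocol=0),
    "marshal.dumps": marshal.dumps(payload),
    "json.dumps": json.dumps(["spam"]),
}

# Print the type each serializer hands back.
for name, value in results.items():
    print(name, "->", type(value).__name__)
```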

> If other libraries
> that do text/text encodings (e.g. binascii, mimelib, ...) use str for
> input and output

See above - most of them don't; mimetools no longer exists (replaced by
the email package)

> instead of bytes then maybe Antoine's changes are the
> right solution and I just don't know better because I'm not up to
> speed with how people write Python 3 code.

There isn't too much fresh end-user code out there, so we can't really
tell, either. As for standard library users - users will do whatever
the library forces them to do.

This is why I'm so concerned about this issue: we should get it right,
or not do it at all. I still think you would be the best person to
determine what is right.

> I'll do my best to find some time to look into Python 3 more closely
> soon, but thus far I have not been very motivated to do so because
> Python 3 isn't useful for us at work and twiddling syntax isn't a very
> interesting problem for me to solve.

And I didn't expect you to - it seems people are quite willing to do
the actual work, as long as there is some guidance.

Regards,
Martin

From martin at v.loewis.de  Fri Apr 10 00:10:18 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Fri, 10 Apr 2009 00:10:18 +0200
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE58EC.4000803@gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com>	<d38f5330904091042l415c8912r9ca12fefda7b1ce1@mail.gmail.com>	<49DE3C5B.6020308@gmail.com>
	<49DE4740.2040205@v.loewis.de>	<49DE5386.7070908@gmail.com>
	<49DE56F4.6050904@v.loewis.de> <49DE58EC.4000803@gmail.com>
Message-ID: <49DE724A.1070300@v.loewis.de>

> Also, consider that resizing has to evaluate every object, thus paging
> in all X bytes, and assigning to another 2X bytes. Cutting X by
> (potentially 3), would probably have a small but measurable effect.

I'm *very* skeptical about claims on performance in the absence of
actual measurements. Too many effects come together, so the actual
performance is difficult to predict (and, for that prediction, you
would need *at least* a work load that you want to measure - starting
bzr would be such a workload, of course).
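
A minimal measurement sketch, not a benchmark of bzr itself. It assumes
Python 3 (where intern lives in sys) and an invented workload of 25,000
distinct names; real numbers would have to come from the actual
application.

```python
import sys
import time

# Assumption: 25,000 distinct, not-yet-interned names as a stand-in workload.
names = ["name_%d" % i for i in range(25000)]

start = time.perf_counter()
interned = [sys.intern(name) for name in names]
elapsed = time.perf_counter() - start

print("interned %d strings in %.3f ms" % (len(interned), elapsed * 1000.0))
```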

Regards,
Martin

From steve at holdenweb.com  Fri Apr 10 01:56:25 2009
From: steve at holdenweb.com (Steve Holden)
Date: Thu, 09 Apr 2009 19:56:25 -0400
Subject: [Python-Dev] BLOBs in Pg
In-Reply-To: <20090409211217.GA7897@phd.pp.ru>
References: <ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<grkodk$j4p$1@ger.gmane.org>	<p04330100c603badeb135@[192.168.123.162]>	<grl78j$7sl$1@ger.gmane.org>	<p04330103c603daff3e8b@[192.168.123.162]>	<20090409172424.GD26429@phd.pp.ru>	<p04330104c603ed41892f@[192.168.123.162]>	<grlmje$q0k$1@ger.gmane.org>
	<20090409211217.GA7897@phd.pp.ru>
Message-ID: <grm1vb$npa$1@ger.gmane.org>

Oleg Broytmann wrote:
> On Thu, Apr 09, 2009 at 04:42:21PM -0400, Steve Holden wrote:
>> If I can't pass a 256-byte string into a BLOB and get it back without
>> anything like this happening then there's *something* in the chain that
>> makes the database useless.
> 
> import psycopg2
> 
> con = psycopg2.connect(database="test")
> cur = con.cursor()
> cur.execute("CREATE TABLE test (id serial, data BYTEA)")
> cur.execute('INSERT INTO test (data) VALUES (%s)', (psycopg2.Binary(''.join([chr(i) for i in range(256)])),))
> cur.execute('SELECT * FROM test ORDER BY id')
> for rec in cur.fetchall():
>    print rec[0], type(rec[1]), repr(str(rec[1]))
> 
> Result:
> 
> 1 <type 'buffer'> '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
> 
>    What am I doing wrong?
> 
> Oleg.

Corresponding with me, probably. Thank you Oleg. I feel suddenly saner
again.
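
The same round-trip check can be sketched with only the standard
library. Note that this swaps in sqlite3 and an in-memory database in
place of psycopg2 and PostgreSQL, so it illustrates the test rather than
reproducing Oleg's setup; it also assumes Python 3, where the BLOB comes
back as bytes.

```python
import sqlite3

payload = bytes(range(256))  # every possible byte value, once

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, data BLOB)")
con.execute("INSERT INTO test (data) VALUES (?)", (payload,))
(stored,) = con.execute("SELECT data FROM test ORDER BY id").fetchone()
con.close()

# True if all 256 byte values survived the trip unchanged.
print(stored == payload)
```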

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
Holden Web LLC                 http://www.holdenweb.com/
Watch PyCon on video now!          http://pycon.blip.tv/

From jake at youtube.com  Fri Apr 10 02:37:56 2009
From: jake at youtube.com (Jake McGuire)
Date: Thu, 9 Apr 2009 17:37:56 -0700
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE4740.2040205@v.loewis.de>
References: <49DE0DF6.1040900@arbash-meinel.com>	<d38f5330904091042l415c8912r9ca12fefda7b1ce1@mail.gmail.com>
	<49DE3C5B.6020308@gmail.com> <49DE4740.2040205@v.loewis.de>
Message-ID: <0C73464C-CA60-4DAA-9E7B-88D9D0F5FD42@youtube.com>

On Apr 9, 2009, at 12:06 PM, Martin v. Löwis wrote:
> Now that you brought up specific numbers, I tried to verify them,
> and found them correct (although a bit unfortunate), please see my
> test script below. Up to 21800 interned strings, the dict takes (only)
> 384kiB. It then grows, requiring 1536kiB. Whether or not having 22k
> interned strings is "typical", I still don't know.
>
> Wrt. your proposed change, I would be worried about maintainability,
> in particular if it would copy parts of the set implementation.

I connected to a random one of our processes, which has been running  
for a typical amount of time and is currently at ~300MB RSS.

(gdb) p *(PyDictObject*)interned
$2 = {ob_refcnt = 1,
       ob_type = 0x8121240,
       ma_fill = 97239,
       ma_used = 95959,
       ma_mask = 262143,
       ma_table = 0xa493c008,
       ....}

Going from 3MB to 2.25MB isn't much, but it's not nothing, either.
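
For reference, the arithmetic behind the 3MB figure can be sketched as
below. It assumes a 32-bit build where each dict entry is a hash plus
two pointers at 4 bytes apiece; the slot counts 32768 and 131072 are
inferred from the 384kiB and 1536kiB sizes quoted above, not read from
a debugger.

```python
# Assumption: 32-bit CPython 2.x, where each dict entry holds a hash,
# a key pointer and a value pointer at 4 bytes apiece.
ENTRY_BYTES = 3 * 4

def table_kib(slots):
    """Size in KiB of a dict hash table with the given number of slots."""
    return slots * ENTRY_BYTES // 1024

ma_mask = 262143   # from the gdb dump
ma_used = 95959
slots = ma_mask + 1

print(table_kib(slots))                      # table size for this process
print(table_kib(32768), table_kib(131072))   # inferred slot counts, see above
print(round(ma_used / slots, 3))             # fraction of slots in use
```

Under that assumption the table in this process comes to 3072 KiB, with
a little over a third of its slots holding live strings.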

I'd be skeptical of cache performance arguments given that the strings  
used in any particular bit of code should be spread pretty much evenly  
throughout the hash table, and 3MB seems solidly bigger than any L2  
cache I know of.  You should be able to get meaningful numbers out of  
a C profiler, but I'd be surprised to see the act of interning taking  
a noticeable amount of time.

-jake

From greg.ewing at canterbury.ac.nz  Fri Apr 10 03:01:26 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 10 Apr 2009 13:01:26 +1200
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE0DF6.1040900@arbash-meinel.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
Message-ID: <49DE9A66.2020109@canterbury.ac.nz>

John Arbash Meinel wrote:
> And when you look at the intern function, it doesn't use
> setdefault logic, it actually does a get() followed by a set(), which
> means the cost of interning is 1-2 lookups depending on likelihood, etc.

Keep in mind that intern() is called fairly rarely, mostly
only at module load time. It may not be worth attempting
to speed it up.
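
The lookup counts under discussion can be modelled in pure Python. This
is only a sketch: the real intern() is C code, and the helper names
below (CountingStr, intern_get_then_set, intern_setdefault) are made up
for illustration. Calls to __hash__ stand in for table lookups.

```python
class CountingStr(str):
    """str subclass that counts how often a dict asks for its hash."""
    hash_calls = 0

    def __hash__(self):
        CountingStr.hash_calls += 1
        return str.__hash__(self)

def intern_get_then_set(table, s):
    existing = table.get(s)   # first lookup
    if existing is not None:
        return existing
    table[s] = s              # second lookup, only on a miss
    return s

def intern_setdefault(table, s):
    return table.setdefault(s, s)

def count_hashes(func, table, s):
    """Run func(table, s) and report how many times s was hashed."""
    CountingStr.hash_calls = 0
    func(table, s)
    return CountingStr.hash_calls

miss_get_set = count_hashes(intern_get_then_set, {}, CountingStr("spam"))
hit_get_set = count_hashes(intern_get_then_set, {"spam": "spam"}, CountingStr("spam"))
miss_setdefault = count_hashes(intern_setdefault, {}, CountingStr("spam"))

print(miss_get_set, hit_get_set, miss_setdefault)
```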

-- 
Greg

From benjamin at python.org  Fri Apr 10 03:01:50 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 9 Apr 2009 20:01:50 -0500
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE9A66.2020109@canterbury.ac.nz>
References: <49DE0DF6.1040900@arbash-meinel.com>
	<49DE9A66.2020109@canterbury.ac.nz>
Message-ID: <1afaf6160904091801k2d5ffccdm700bee842bf1a1f5@mail.gmail.com>

2009/4/9 Greg Ewing <greg.ewing at canterbury.ac.nz>:
> John Arbash Meinel wrote:
>>
>> And when you look at the intern function, it doesn't use
>> setdefault logic, it actually does a get() followed by a set(), which
>> means the cost of interning is 1-2 lookups depending on likelihood, etc.
>
> Keep in mind that intern() is called fairly rarely, mostly
> only at module load time. It may not be worth attempting
> to speed it up.

That's very important, though, for a command line tool for bazaar.
Even a few fractions of a second can make a difference in user
perception of speed.

-- 
Regards,
Benjamin

From greg.ewing at canterbury.ac.nz  Fri Apr 10 03:22:10 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 10 Apr 2009 13:22:10 +1200
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE31C9.103@gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de>
	<49DE31C9.103@gmail.com>
Message-ID: <49DE9F42.5000704@canterbury.ac.nz>

John Arbash Meinel wrote:
> And the way intern is currently
> written, there is a third cost when the item doesn't exist yet, which is
> another lookup to insert the object.

That's even rarer still, since it only happens the first
time you load a piece of code that uses a given variable
name anywhere in any module.

-- 
Greg

From john.arbash.meinel at gmail.com  Fri Apr 10 03:24:04 2009
From: john.arbash.meinel at gmail.com (John Arbash Meinel)
Date: Thu, 09 Apr 2009 20:24:04 -0500
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE9F42.5000704@canterbury.ac.nz>
References: <49DE0DF6.1040900@arbash-meinel.com>
	<49DE2AD4.6090605@cheimes.de>	<49DE31C9.103@gmail.com>
	<49DE9F42.5000704@canterbury.ac.nz>
Message-ID: <49DE9FB4.9060908@gmail.com>

Greg Ewing wrote:
> John Arbash Meinel wrote:
>> And the way intern is currently
>> written, there is a third cost when the item doesn't exist yet, which is
>> another lookup to insert the object.
> 
> That's even rarer still, since it only happens the first
> time you load a piece of code that uses a given variable
> name anywhere in any module.
> 

Somewhat true, though I know it happens 25k times during startup of
bzr... And I would be a *lot* happier if startup time was 100ms instead
of 400ms.

John
=:->

From nyamatongwe at gmail.com  Fri Apr 10 03:49:04 2009
From: nyamatongwe at gmail.com (Neil Hodgson)
Date: Fri, 10 Apr 2009 11:49:04 +1000
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <5b8d13220904081857w46237b57t82d8a4006f00adbb@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com>
	<49DBD6F9.7030502@canterbury.ac.nz>
	<806d41050904071554x30dade8eva60be765af462112@mail.gmail.com>
	<5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com>
	<806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com>
	<5b8d13220904081857w46237b57t82d8a4006f00adbb@mail.gmail.com>
Message-ID: <50862ebd0904091849q7f28fa5bmeaf3b9061629a1c6@mail.gmail.com>

   cmake does not produce relative paths in its generated make and
project files. There is an option CMAKE_USE_RELATIVE_PATHS which
appears to do this but the documentation says:

"""This option does not work for more complicated projects, and
relative paths are used when possible. In general, it is not possible
to move CMake generated makefiles to a different location regardless
of the value of this variable."""

   This means that generated Visual Studio project files will not work
for other people unless a particular absolute build location is
specified for everyone which will not suit most. Each person that
wants to build Python will have to run cmake before starting Visual
Studio thus increasing the prerequisites.

   Neil

From barry at python.org  Fri Apr 10 04:26:22 2009
From: barry at python.org (Barry Warsaw)
Date: Thu, 9 Apr 2009 22:26:22 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <grkodk$j4p$1@ger.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
Message-ID: <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>

On Apr 9, 2009, at 8:07 AM, Steve Holden wrote:

> The real problem I came across in storing email in a relational database
> was the inability to store messages as Unicode. Some messages have a
> body in one encoding and an attachment in another, so the only ways to
> store the messages are either as a monolithic bytes string that gets
> parsed when the individual components are required or as a sequence of
> components in the database's preferred encoding (if you want to keep the
> original encoding most relational databases won't be able to help unless
> you store the components as bytes).
>
> All in all, as you might expect from a system that's been growing up
> since 1970 or so, it can be quite intractable.

There are really two ways to look at an email message.  It's either an  
unstructured blob of bytes, or it's a structured tree of objects.   
Those objects have headers and payload.  The payload can be of any  
type, though I think it generally breaks down into "strings" for
text/* types and bytes for anything else (not counting multiparts).

The email package isn't a perfect mapping to this, which is something  
I want to improve.  That aside, I think storing a message in a  
database means storing some or all of the headers separately from the  
byte stream (or text?) of its payload.  That's for non-multipart  
types.  It would be more complicated to represent a message tree of  
course.

It does seem to make sense to think about headers as text header names  
and text header values.  Of course, header values can contain almost  
anything and there's an encoding to bring it back to 7-bit ASCII, but  
again, you really have two views of a header value.  Which you want  
really depends on your application.

Maybe you just care about the text of both the header name and value.   
In that case, I think you want the values as unicodes, and probably  
the headers as unicodes containing only ASCII.  So your table would be  
strings in both cases.  OTOH, maybe your application cares about the  
raw underlying encoded data, in which case the header names are  
probably still strings of ASCII-ish unicodes and the values are  
bytes.  It's this distinction (and I think the competing use cases)  
that make a true Python 3.x API for email more complicated.
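
The two views of a header value can be seen with the existing
email.header helpers. A small sketch, assuming Python 3 and a made-up
Subject value:

```python
from email.header import decode_header, make_header

# The 7-bit ASCII form that travels on the wire (sample value).
raw_value = "=?utf-8?q?Caf=C3=A9_menu?="

# View 1: the underlying encoded data, as (bytes, charset) pairs.
parts = decode_header(raw_value)

# View 2: the decoded text an application would display.
text_value = str(make_header(parts))

print(parts)
print(text_value)
```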

Thinking about this stuff makes me nostalgic for the sloppy happy days  
of Python 2.x.

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 304 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/python-dev/attachments/20090409/cdf11303/attachment.pgp>

From barry at python.org  Fri Apr 10 04:29:12 2009
From: barry at python.org (Barry Warsaw)
Date: Thu, 9 Apr 2009 22:29:12 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <66887.1239289730@parc.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<66887.1239289730@parc.com>
Message-ID: <BA8C3F0A-2664-4FD4-92BF-958694BF27E1@python.org>

On Apr 9, 2009, at 11:08 AM, Bill Janssen wrote:

> Barry Warsaw <barry at python.org> wrote:
>
>> Anyway, aside from that decision, I haven't come up with an
>> elegant way to allow /output/ in both bytes and strings (input is I
>> think theoretically easier by sniffing the arguments).
>
> Probably a good thing.  It just promotes more confusion to do things
> that way, IMO.

Very possibly so.  But applications will definitely want stuff like  
the text/plain payload as a unicode, or the image/gif payload as a  
bytes (or even as a PIL image or whatever).

Not that I think the email package needs to know about every content  
type under the sun, but I do think that it should be pluggable so as  
to allow applications to more conveniently access the data that way.   
Possibly the defaults should be unicodes for any text/* type and bytes  
for everything else.
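
One way such a plug-in scheme might look, sketched on top of the
existing Message API. The HANDLERS registry and the get_content,
as_text and as_bytes helpers are hypothetical names invented here, not
part of the email package, and the two sample messages are made up.

```python
from email import message_from_bytes

def as_bytes(part):
    # Undo the Content-Transfer-Encoding only; the result is always bytes.
    return part.get_payload(decode=True)

def as_text(part):
    # Decode the bytes using the declared charset, defaulting to ASCII.
    charset = part.get_content_charset() or "ascii"
    return as_bytes(part).decode(charset)

# Hypothetical registry keyed on the MIME main type; applications could
# register e.g. an "image" handler that returns a PIL image.
HANDLERS = {"text": as_text}

def get_content(part):
    handler = HANDLERS.get(part.get_content_maintype(), as_bytes)
    return handler(part)

text_part = message_from_bytes(
    b"Content-Type: text/plain; charset=us-ascii\n\nhello")
gif_part = message_from_bytes(
    b"Content-Type: image/gif\nContent-Transfer-Encoding: base64\n\nR0lGODlh")

print(repr(get_content(text_part)), repr(get_content(gif_part)))
```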

-Barry

From barry at python.org  Fri Apr 10 04:38:11 2009
From: barry at python.org (Barry Warsaw)
Date: Thu, 9 Apr 2009 22:38:11 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
Message-ID: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>

On Apr 9, 2009, at 11:55 AM, Daniel Stutzbach wrote:

> On Thu, Apr 9, 2009 at 6:01 AM, Barry Warsaw <barry at python.org> wrote:
>> Anyway, aside from that decision, I haven't come up with an elegant
>> way to allow /output/ in both bytes and strings (input is I think
>> theoretically easier by sniffing the arguments).
>
> Won't this work? (assuming dumps() always returns a string)
>
> def dumpb(obj, encoding='utf-8', *args, **kw):
>     s = dumps(obj, *args, **kw)
>     return s.encode(encoding)
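
For reference, a runnable version of that suggestion, assuming the
json module's dumps() returns str (as it does in current Python 3
releases). The one deliberate change is that encoding is keyword-only
here, so that extra positional arguments cannot be mistaken for it.

```python
import json

def dumpb(obj, *args, encoding="utf-8", **kw):
    """Serialize obj to JSON text, then encode that text to bytes."""
    s = json.dumps(obj, *args, **kw)
    return s.encode(encoding)

ascii_result = dumpb({"a": 1})
utf8_result = dumpb("\u00e9", ensure_ascii=False)

print(ascii_result, utf8_result)
```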

So, what I'm really asking is this.  Let's say you agree that there  
are use cases for accessing a header value as either the raw encoded  
bytes or the decoded unicode.  What should this return:

 >>> message['Subject']

The raw bytes or the decoded unicode?

Okay, so you've picked one.  Now how do you spell the other way?

The Message class probably has these explicit methods:

 >>> Message.get_header_bytes('Subject')
 >>> Message.get_header_string('Subject')

(or better names... it's late and I'm tired ;).  One of those maps to  
message['Subject'] but which is the more obvious choice?

Now, setting headers.  Sometimes you have some unicode thing and  
sometimes you have some bytes.  You need to end up with bytes in the  
ASCII range and you'd like to leave the header value unencoded if so.   
But in both cases, you might have bytes or characters outside that  
range, so you need an explicit encoding, defaulting to utf-8 probably.

 >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
 >>> Message.set_header('Subject', b'Some bytes')

One of those maps to

 >>> message['Subject'] = ???

I'm open to any suggestions here!
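
To make the question concrete, here is a rough sketch of the explicit
methods, storing raw bytes and decoding on request. SketchMessage is a
hypothetical stand-in, not the real Message class; it assumes stored
values are always 7-bit ASCII on the wire and ignores folding, multiple
headers of the same name, and raw 8-bit input.

```python
from email.header import Header, decode_header, make_header

class SketchMessage:
    """Hypothetical header store: bytes inside, text on request."""

    def __init__(self):
        self._headers = {}

    def set_header(self, name, value, encoding="utf-8"):
        if isinstance(value, str):
            try:
                raw = value.encode("ascii")      # leave pure ASCII unencoded
            except UnicodeEncodeError:
                # Produce an RFC 2047 encoded word in the given charset.
                raw = Header(value, encoding).encode().encode("ascii")
        else:
            raw = bytes(value)                   # caller supplied wire bytes
        self._headers[name.lower()] = raw

    def get_header_bytes(self, name):
        return self._headers[name.lower()]

    def get_header_string(self, name):
        raw = self._headers[name.lower()].decode("ascii")
        return str(make_header(decode_header(raw)))

message = SketchMessage()
message.set_header("Subject", "Caf\u00e9 menu", encoding="utf-8")
print(message.get_header_bytes("Subject"))
print(message.get_header_string("Subject"))
```

Whichever of the two getters message['Subject'] maps to, the other
remains available under an explicit name.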
-Barry

From barry at python.org  Fri Apr 10 04:40:30 2009
From: barry at python.org (Barry Warsaw)
Date: Thu, 9 Apr 2009 22:40:30 -0400
Subject: [Python-Dev] email package Bytes vs Unicode (was Re: Dropping
	bytes "support" in json)
In-Reply-To: <grl78j$7sl$1@ger.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<grkodk$j4p$1@ger.gmane.org>
	<p04330100c603badeb135@[192.168.123.162]>
	<grl78j$7sl$1@ger.gmane.org>
Message-ID: <657BFEEA-04E3-418F-86C0-D2F80C75DB96@python.org>

On Apr 9, 2009, at 12:20 PM, Steve Holden wrote:

> PostgreSQL strongly encourages you to store text as encoded columns.
> Because emails lack an encoding it turns out this is a most inconvenient
> storage type for it. Sadly BLOBs are such a pain in PostgreSQL that it's
> easier to store the messages in external files and just use the
> relational database to index those files to retrieve content, so that's
> what I ended up doing.

That's not insane for other reasons.  Do you really want to store 10MB  
of mp3 data in your database?

Which of course reminds me that I want to add an interface, probably  
to the parser and message class, to allow an application to store  
message payloads in other than memory.  Parsing and holding onto  
messages with huge payloads can kill some applications, when you might  
not care too much about the actual payload content.

Barry

From barry at python.org  Fri Apr 10 04:41:55 2009
From: barry at python.org (Barry Warsaw)
Date: Thu, 9 Apr 2009 22:41:55 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49DE3D9F.3000902@v.loewis.de>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<49DE3D9F.3000902@v.loewis.de>
Message-ID: <C2E1CF4C-4CF2-4F6C-8861-9B82027189D7@python.org>

On Apr 9, 2009, at 2:25 PM, Martin v. Löwis wrote:

>> This is an interesting question, and something I'm struggling with for
>> the email package for 3.x.  It turns out to be pretty convenient to have
>> both a bytes and a string API, both for input and output, but I think
>> email really wants to be represented internally as bytes.  Maybe.  Or
>> maybe just for content bodies and not headers, or maybe both.  Anyway,
>> aside from that decision, I haven't come up with an elegant way to allow
>> /output/ in both bytes and strings (input is I think theoretically
>> easier by sniffing the arguments).
>
> If you allow for content-transfer-encoding: 8bit, I think there is just
> no way to represent email as text. You have to accept conversion to,
> say, base64 (or quoted-unreadable) when converting an email message to
> text.

Agreed.  But applications will want to deal with some parts of the  
message as text on the boundaries.  Internally, it should be all bytes  
(although even that is a pain to write ;).

-Barry

From aahz at pythoncraft.com  Fri Apr 10 04:52:03 2009
From: aahz at pythoncraft.com (Aahz)
Date: Thu, 9 Apr 2009 19:52:03 -0700
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
Message-ID: <20090410025203.GA199@panix.com>

On Thu, Apr 09, 2009, Barry Warsaw wrote:
>
> So, what I'm really asking is this.  Let's say you agree that there are 
> use cases for accessing a header value as either the raw encoded bytes or 
> the decoded unicode.  What should this return:
>
> >>> message['Subject']
>
> The raw bytes or the decoded unicode?

Let's make that the raw bytes by default -- we can add a parameter to
Message() to specify that the default where possible is unicode for
returned values, if that isn't too painful.

Here's my reasoning: ultimately, everyone NEEDS to understand that the
underlying transport for e-mail is bytes (similar to sockets).  We do
people no favors by papering over this too much.  We can overlay
convenience at various points, but except for text payloads, everything
should be bytes by default.  

Even for text payloads, I'm not entirely certain the default shouldn't be
bytes: consider an HTML attachment that you want to compare against the
output from a webserver.  Still, as long as it's easy to get bytes for
text payloads, I think overall I'm still leaning toward unicode for them.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

Why is this newsgroup different from all other newsgroups?

From glyph at divmod.com  Fri Apr 10 05:11:51 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 10 Apr 2009 03:11:51 -0000
Subject: [Python-Dev] the email module, text,
	and bytes (was Re:  Dropping bytes "support" in json)
In-Reply-To: <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
Message-ID: <20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com>

On 02:26 am, barry at python.org wrote:
>There are really two ways to look at an email message.  It's either an 
>unstructured blob of bytes, or it's a structured tree of objects. 
>Those objects have headers and payload.  The payload can be of any 
>type, though I think it generally breaks down into "strings" for
>text/* types and bytes for anything else (not counting multiparts).

I think this is a problematic way to model bytes vs. text; it gives text 
a special relationship to bytes which should be avoided.

IMHO the right way to think about domains like this is a multi-level 
representation.  The "low level" representation is always bytes, whether 
your MIME type is text/whatever or application/x-i-dont-know.

The thing that's "special" about text is that it's a "high level" 
representation that the standard library can know about.  But the 
'email' package ought to support being extended to support other types 
just as well.  For example, I want to ask for image/png content as 
PIL.Image objects, not bags of bytes.  Of course this presupposes some 
way for PIL itself to get at some bytes, but then you need the email 
module itself to get at the bytes to convert to text in much the same 
way.  There also needs to be layering at the level of 
bytes->base64->some different bytes->PIL->Image.  There are mail clients 
that will base64-encode unusual encodings so you have to do that same 
layering for text sometimes.

I'm also being somewhat handwavy with talk of "low" and "high" level 
representations; of course there are actually multiple levels beyond 
that.  I might want text/x-python content to show up as an AST, but the 
intermediate DOM-parsing representation really wants to operate on 
characters.  Similarly for a DOM and text/html content.  (Modulo the 
usual encoding-detection weirdness present in parsers.)

So, as long as there's a crisp definition of what layer of the MIME 
stack one is operating on, I don't think that there's really any 
ambiguity at all about what type you should be getting.
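
The layering for a base64-encoded text part can be sketched with the
current email package, assuming Python 3.2 or later for
message_from_bytes. The sample message is made up; each variable below
names one layer of the stack.

```python
import base64
from email import message_from_bytes

# A made-up text part whose ISO-8859-1 body was base64-encoded by the sender.
raw = (
    b'Content-Type: text/plain; charset="iso-8859-1"\n'
    b"Content-Transfer-Encoding: base64\n"
    b"\n"
    + base64.b64encode("caf\u00e9".encode("iso-8859-1"))
    + b"\n"
)

msg = message_from_bytes(raw)            # layer 0: bytes off the wire
wire = msg.get_payload()                 # layer 1: base64 text, still encoded
octets = msg.get_payload(decode=True)    # layer 2: "some different bytes"
text = octets.decode(msg.get_content_charset())  # layer 3: characters

print(repr(octets), repr(text))
```

A handler for another type, such as an image, would branch off at the
"octets" layer in place of the final charset decode.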

From barry at python.org  Fri Apr 10 05:03:35 2009
From: barry at python.org (Barry Warsaw)
Date: Thu, 9 Apr 2009 23:03:35 -0400
Subject: [Python-Dev] the email module, text,
	and bytes (was Re:  Dropping bytes "support" in json)
In-Reply-To: <20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
	<20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com>
Message-ID: <ACC56383-7F1B-4CB0-908F-E75E1390AE51@python.org>

On Apr 9, 2009, at 11:11 PM, glyph at divmod.com wrote:

> I think this is a problematic way to model bytes vs. text; it gives  
> text a special relationship to bytes which should be avoided.
>
> IMHO the right way to think about domains like this is a multi-level  
> representation.  The "low level" representation is always bytes,  
> whether your MIME type is text/whatever or application/x-i-dont-know.

This is a really good point, and I really should be clearer when  
describing my current thinking (sleep would help :).

> The thing that's "special" about text is that it's a "high level"  
> representation that the standard library can know about.  But the  
> 'email' package ought to support being extended to support other  
> types just as well.  For example, I want to ask for image/png  
> content as PIL.Image objects, not bags of bytes.  Of course this  
> presupposes some way for PIL itself to get at some bytes, but then  
> you need the email module itself to get at the bytes to convert to  
> text in much the same way.  There also needs to be layering at the  
> level of bytes->base64->some different bytes->PIL->Image.  There are  
> mail clients that will base64-encode unusual encodings so you have  
> to do that same layering for text sometimes.
>
> I'm also being somewhat handwavy with talk of "low" and "high" level  
> representations; of course there are actually multiple levels beyond  
> that.  I might want text/x-python content to show up as an AST, but  
> the intermediate DOM-parsing representation really wants to operate  
> on characters.  Similarly for a DOM and text/html content.  (Modulo  
> the usual encoding-detection weirdness present in parsers.)

When I was talking about supporting text/* content types as strings, I  
was definitely thinking about using basically the same plug-in or  
higher level or whatever API to do that as you might use to get PIL  
images from an image/gif.

> So, as long as there's a crisp definition of what layer of the MIME  
> stack one is operating on, I don't think that there's really any  
> ambiguity at all about what type you should be getting.

In that case, we really need the bytes-in-bytes-out-bytes-in-the-chewy- 
center API first, and build things on top of that.
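
A minimal sketch of that layering, for illustration only: bytes at the
bottom, an optional base64 step, and a pluggable converter on top.  The
names CONVERTERS, register and get_content are hypothetical and are not
part of the email package.

```python
import base64

# Hypothetical registry: MIME content type -> function(payload_bytes, params).
CONVERTERS = {}


def register(content_type):
    """Decorator that registers a converter for one content type."""
    def decorator(func):
        CONVERTERS[content_type] = func
        return func
    return decorator


@register("text/plain")
def _text_plain(payload, params):
    # Text is just one more high-level view built on top of the bytes.
    return payload.decode(params.get("charset", "us-ascii"))


def get_content(content_type, transfer_encoding, raw, params=None):
    """Undo the transfer encoding, then apply a converter if one is registered."""
    payload = base64.b64decode(raw) if transfer_encoding == "base64" else raw
    converter = CONVERTERS.get(content_type)
    if converter is None:
        return payload  # unknown type: hand back the bytes untouched
    return converter(payload, params or {})
```

An image/png converter returning a PIL image would be registered the same
way; unknown types fall through as plain bytes.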

-Barry

From barry at python.org  Fri Apr 10 05:05:37 2009
From: barry at python.org (Barry Warsaw)
Date: Thu, 9 Apr 2009 23:05:37 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <20090410025203.GA199@panix.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410025203.GA199@panix.com>
Message-ID: <663162E3-D2EB-4417-93D0-4764BC94646C@python.org>

On Apr 9, 2009, at 10:52 PM, Aahz wrote:

> On Thu, Apr 09, 2009, Barry Warsaw wrote:
>>
>> So, what I'm really asking is this.  Let's say you agree that there
>> are use cases for accessing a header value as either the raw encoded
>> bytes or the decoded unicode.  What should this return:
>>
>> >>> message['Subject']
>>
>> The raw bytes or the decoded unicode?
>
> Let's make that the raw bytes by default -- we can add a parameter to
> Message() to specify that the default where possible is unicode for
> returned values, if that isn't too painful.

I don't know whether the parameter thing will work or not, but you're  
probably right that we need to get the bytes-everywhere API first.

-Barry

From ncoghlan at gmail.com  Fri Apr 10 05:21:05 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 10 Apr 2009 13:21:05 +1000
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410025203.GA199@panix.com>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
Message-ID: <49DEBB21.70305@gmail.com>

Barry Warsaw wrote:
> I don't know whether the parameter thing will work or not, but you're
> probably right that we need to get the bytes-everywhere API first.

Given that json is a wire protocol, that sounds like the right approach
for json as well. Once bytes-everywhere works, then a text API can be
built on top of it, but it is difficult to build a bytes API on top of a
text one.

So I guess the IO library *is* the right model: bytes at the bottom of
the stack, with text as a wrapper around it (mediated by codecs).
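
A small illustration of that model using the io module as it exists in
Python 3 (the sample string and variable names are only for the example):

```python
import io

# Bytes at the bottom of the stack ...
raw = io.BytesIO()

# ... and a text layer wrapped around it, with a codec in between.
text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
text.write("café\n")
text.flush()

# What actually went "on the wire" is bytes.
wire_bytes = raw.getvalue()

# Reading back through the wrapper yields text again.
text.seek(0)
round_tripped = text.read()
```

Going the other way (a bytes API on top of a text-only one) would require
guessing the encoding that was used, which is the difficulty mentioned
above.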

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From barry at python.org  Fri Apr 10 05:23:40 2009
From: barry at python.org (Barry Warsaw)
Date: Thu, 9 Apr 2009 23:23:40 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49DEBB21.70305@gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410025203.GA199@panix.com>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
Message-ID: <0047AD0A-7B5B-4703-96D6-BD26B9752E7D@python.org>

On Apr 9, 2009, at 11:21 PM, Nick Coghlan wrote:

> Barry Warsaw wrote:
>> I don't know whether the parameter thing will work or not, but you're
>> probably right that we need to get the bytes-everywhere API first.
>
> Given that json is a wire protocol, that sounds like the right approach
> for json as well. Once bytes-everywhere works, then a text API can be
> built on top of it, but it is difficult to build a bytes API on top of a
> text one.

Agreed!

> So I guess the IO library *is* the right model: bytes at the bottom of
> the stack, with text as a wrapper around it (mediated by codecs).

Yes, that's a very interesting (and proven?) model.  I don't quite see  
how we could apply that to email and json, but it seems like there's a  
good idea there. ;)

-Barry

From tonynelson at georgeanelson.com  Fri Apr 10 05:41:58 2009
From: tonynelson at georgeanelson.com (Tony Nelson)
Date: Thu, 9 Apr 2009 23:41:58 -0400
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
Message-ID: <p04330101c6046b191e4a@[192.168.123.162]>

At 22:38 -0400 04/09/2009, Barry Warsaw wrote:
 ...
>So, what I'm really asking is this.  Let's say you agree that there
>are use cases for accessing a header value as either the raw encoded
>bytes or the decoded unicode.  What should this return:
>
> >>> message['Subject']
>
>The raw bytes or the decoded unicode?

That's an easy one:  Subject: is an unstructured header, so it must be
text, thus Unicode.  We're looking at a high-level representation of an
email message, with parsed header fields and a MIME message tree.

>Okay, so you've picked one.  Now how do you spell the other way?

message.get_header_bytes('Subject')

Oh, I see that's what you picked.

>The Message class probably has these explicit methods:
>
> >>> Message.get_header_bytes('Subject')
> >>> Message.get_header_string('Subject')
>
>(or better names... it's late and I'm tired ;).  One of those maps to
>message['Subject'] but which is the more obvious choice?

Structured header fields are more of a problem.  Any header with addresses
should return a list of addresses.  I think the default return type should
depend on the data type.  To get an explicit bytes or string or list of
addresses, be explicit; otherwise, for convenience, return the appropriate
type for the particular header field name.

>Now, setting headers.  Sometimes you have some unicode thing and
>sometimes you have some bytes.  You need to end up with bytes in the
>ASCII range and you'd like to leave the header value unencoded if so.
>But in both cases, you might have bytes or characters outside that
>range, so you need an explicit encoding, defaulting to utf-8 probably.

Never for header fields.  The default is always RFC 2047, unless it isn't,
say for params.

The Message class should create an object of the appropriate subclass of
Header based on the name (or use the existing object, see other
discussion), and that should inspect its argument and DTRT or complain.
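
For reference, a short sketch of what RFC 2047 encoding of a header value
looks like with the existing email.header module.  The sample subject is
made up, and whether base64 or quoted-printable is chosen is left to the
library, so the example does not depend on it.

```python
from email.header import Header, decode_header, make_header

subject = "Café au lait"

# Encode: non-ASCII text becomes an ASCII-only RFC 2047 "encoded word".
encoded = Header(subject, charset="utf-8").encode()

# Decode: parse the encoded word(s) and rebuild the text.
decoded = str(make_header(decode_header(encoded)))
```

The encoded form is plain ASCII beginning with "=?utf-8?", and decoding
it gives back the original text.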

>
> >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
> >>> Message.set_header('Subject', b'Some bytes')
>
>One of those maps to
>
> >>> message['Subject'] = ???

The expected data type should depend on the header field.  For Subject:, it
should be bytes to be parsed or verbatim text.  For To:, it should be a
list of addresses or bytes or text to be parsed.

The email package should be pythonic, and not require deep understanding of
dozens of RFCs to use properly.  Users don't need to know about the raw
bytes; that's the whole point of MIME and any email package.  It should be
easy to set header fields with their natural data types, and doing it with
bad data should produce an error.  This may require a bit more care in the
message parser, to always produce a parsed message with defects.
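
A rough sketch of "return the appropriate type for the particular header
field name".  The get_header function and the ADDRESS_FIELDS set are
hypothetical, and the plain dict of raw values stands in for a real
Message; only email.header and email.utils are existing library calls.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses

# Assumption for this sketch: these fields carry address lists.
ADDRESS_FIELDS = {"from", "to", "cc", "bcc", "reply-to"}


def decode_text(raw_value):
    """Decode any RFC 2047 encoded words in a raw (ASCII) header value."""
    return str(make_header(decode_header(raw_value)))


def get_header(headers, name):
    """Return a list of (name, address) pairs for address fields, else text."""
    raw = headers[name]
    if name.lower() in ADDRESS_FIELDS:
        return [(decode_text(real) if real else "", addr)
                for real, addr in getaddresses([raw])]
    return decode_text(raw)


headers = {
    "Subject": "=?utf-8?q?Caf=C3=A9?=",
    "To": "Ann <ann@example.com>, bob@example.com",
}
subject = get_header(headers, "Subject")
recipients = get_header(headers, "To")
```

Here subject is a text string and recipients is a list of (display name,
address) pairs; the raw values in the dict remain available for callers
that want the undecoded form.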
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>

From mike.klaas at gmail.com  Fri Apr 10 05:42:37 2009
From: mike.klaas at gmail.com (Mike Klaas)
Date: Thu, 9 Apr 2009 20:42:37 -0700
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE9FB4.9060908@gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
	<49DE2AD4.6090605@cheimes.de>	<49DE31C9.103@gmail.com>
	<49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com>
Message-ID: <99425C9C-BE44-4564-84DC-3BAF370CFAE0@gmail.com>

On 9-Apr-09, at 6:24 PM, John Arbash Meinel wrote:

> Greg Ewing wrote:
>> John Arbash Meinel wrote:
>>> And the way intern is currently
>>> written, there is a third cost when the item doesn't exist yet, which is
>>> another lookup to insert the object.
>>
>> That's even rarer still, since it only happens the first
>> time you load a piece of code that uses a given variable
>> name anywhere in any module.
>>
>
> Somewhat true, though I know it happens 25k times during startup of
> bzr... And I would be a *lot* happier if startup time was 100ms instead
> of 400ms.

I don't want to quash your idealism too severely, but it is extremely  
unlikely that you are going to get anywhere near that kind of speed up  
by tweaking string interning.  25k times doing anything (computation)  
just isn't all that much.

$ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' 'for x in  
xrange(25000): d.get(x)'
100 loops, best of 3: 8.28 msec per loop

Perhaps this isn't representative (int hashing is ridiculously cheap,  
for instance), but the dict itself is far bigger than the dict you are  
dealing with and such would have similar cache-busting properties.   
And yet, 25k accesses (plus python->c dispatching costs which you are  
paying with interning) consume only ~10ms.  You could do more good by  
eliminating a handful of disk seeks by reducing the number of imported  
modules...

-Mike

From guido at python.org  Fri Apr 10 05:55:34 2009
From: guido at python.org (Guido van Rossum)
Date: Thu, 9 Apr 2009 20:55:34 -0700
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com> 
	<grgf59$j7c$1@ger.gmane.org>
	<4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> 
	<b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com> 
	<4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> 
	<ca471dc20904081051p5a53780bkd0c46644444d6b6c@mail.gmail.com> 
	<4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com>
Message-ID: <ca471dc20904092055n39930562g39d99e452391036b@mail.gmail.com>

On Wed, Apr 8, 2009 at 9:31 PM, Michele Simionato
<michele.simionato at gmail.com> wrote:
> Then perhaps you misunderstand the goal of the decorator module.
> The raison d'etre of the module is to PRESERVE the signature:
> update_wrapper unfortunately *changes* it.
>
> When confronted with a library which I do not know, I often run
> pydoc, or sphinx, or a custom made documentation tool over it to extract the
> signature of functions.

Ah, I see. Personally I rarely trust automatically extracted
documentation -- too often in my experience it is out of date or
simply absent. Extracting the signatures in theory wouldn't lie, but
in practice I still wouldn't trust it -- not only because of what
decorators might or might not do, but because it might still be
misleading. Call me old-fashioned, but I prefer to read the source
code.

> For instance, if I see a method
> get_user(self, username) I have a good hint about what it is supposed
> to do. But if the library (say a web framework) uses non signature-preserving
> decorators, my documentation tool says to me that there is function
> get_user(*args, **kwargs) which frankly is not enough [this is the
> optimistic case, when the author of the decorator has taken care
> to preserve the name of the original function].

But seeing the decorator is often essential for understanding what
goes on! Even if the decorator preserves the signature (in truth or
according to inspect), many decorators *do* something, and it's important
to know how a function is decorated. For example, I work a lot with a
small internal framework at Google whose decorators can raise
exceptions and set instance variables; they also help me understand
under which conditions a method can be called.

> I *hate* losing information about the true signature of functions, since I also
> use IPython, Python help, etc. a lot.

I guess we just have different styles. That's fine.

>>> I must admit that while I still like decorators, I do like them as
>>> much as in the past.
>
> Of course there was a missing NOT in this sentence, but you all understood
> the intended meaning.
>
>> (All this BTW is not to say that I don't trust you with commit
>> privileges if you were to be interested in contributing. I just don't
>> think that adding that particular decorator module to the stdlib would
>> be wise. It can be debated though.)
>
> Fine. As I have repeated many times, that particular module was never
> meant for inclusion in the standard library.

Then perhaps it shouldn't -- I haven't looked but if you don't plan
stdlib inclusion it is often the case that the API style and/or
implementation details make stdlib inclusion unrealistic. (Though
admittedly some older modules wouldn't be accepted by today's
standards either -- and I'm not just talking PEP-8 compliance! :-)

> But I feel strongly about
> the possibility of being able to preserve (not change!) the function
> signature.

That could be added to functools if enough people want it.

> I do not think everybody disagrees with your point here. My point still
> stands, though: objects should not lie about their signature, especially
> during debugging and when generating documentation from code.

Source code never lies. Debuggers should make access to the source
code a key point. And good documentation should be written by a human,
not automatically cobbled together from source code and a few doc
strings.
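
To make the signature question concrete, here is a small sketch using
functools.wraps together with inspect.signature as found in current
Python 3 releases, where introspection follows the __wrapped__ attribute
that functools.wraps sets.  The decorators and get_user functions are
invented for the example.

```python
import functools
import inspect


def naive(func):
    """Decorator that hides the wrapped function's signature."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def preserving(func):
    """Same wrapper, but with metadata copied over by functools.wraps."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@naive
def get_user_naive(username):
    return {"name": username}


@preserving
def get_user(username):
    return {"name": username}


naive_sig = str(inspect.signature(get_user_naive))
kept_sig = str(inspect.signature(get_user))
```

With the naive decorator the reported signature is (*args, **kwargs);
with functools.wraps it is (username).  Note that only the reported
signature changes: the wrapper itself still accepts any arguments and
errors surface inside the call.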

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tonynelson at georgeanelson.com  Fri Apr 10 05:59:54 2009
From: tonynelson at georgeanelson.com (Tony Nelson)
Date: Thu, 9 Apr 2009 23:59:54 -0400
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
Message-ID: <p04330100c6046a4bedc6@[192.168.123.162]>

At 22:26 -0400 04/09/2009, Barry Warsaw wrote:

>There are really two ways to look at an email message.  It's either an
>unstructured blob of bytes, or it's a structured tree of objects.
>Those objects have headers and payload.  The payload can be of any
>type, though I think it generally breaks down into "strings" for text/
>* types and bytes for anything else (not counting multiparts).
>
>The email package isn't a perfect mapping to this, which is something
>I want to improve.  That aside, I think storing a message in a
>database means storing some or all of the headers separately from the
>byte stream (or text?) of its payload.  That's for non-multipart
>types.  It would be more complicated to represent a message tree of
>course.

Storing an email message in a database does mean storing some of the header
fields as database fields, but the set of email header fields is open, so
any "unused" fields in a message must be stored elsewhere.  It isn't useful
to just have a bag of name/value pairs in a table.  General message MIME
payload trees don't map well to a database either, unless one wants to get
very relational.  Sometimes the database needs to represent the entire
email message, header fields and MIME tree, but only if it is an email
program and usually not even then.  Usually, the database has a specific
purpose, and can be designed for the data it cares about; it may choose to
keep the original message as bytes.

>It does seem to make sense to think about headers as text header names
>and text header values.  Of course, header values can contain almost
>anything and there's an encoding to bring it back to 7-bit ASCII, but
>again, you really have two views of a header value.  Which you want
>really depends on your application.

I think of header fields as having text-like names (the set of allowed
characters is more than just text, though defined headers don't make use of
that), but the data is either bytes or it should be parsed into something
appropriate:  text for unstructured fields like Subject:, a list of
addresses for address fields like To:.  Many of the structured header
fields have a reasonable mapping to text; certainly this is true for address
header fields.  Content-Type header fields are barely text, they can be so
convolutedly structured, but I suppose one could flatten one of them to
text instead of bytes if the user wanted.  It's not very useful, though,
except for debugging (either by the programmer or the recipient who wants
to know what was cleaned from the message).

>Maybe you just care about the text of both the header name and value.
>In that case, I think you want the values as unicodes, and probably
>the headers as unicodes containing only ASCII.  So your table would be
>strings in both cases.  OTOH, maybe your application cares about the
>raw underlying encoded data, in which case the header names are
>probably still strings of ASCII-ish unicodes and the values are
>bytes.  It's this distinction (and I think the competing use cases)
>that make a true Python 3.x API for email more complicated.

If a database stores the Subject: header field, it would be as text.  The
various recipient address fields are a one message to many names and
addresses mapping, and need a related table of name/address fields, with
each field being text.  The original message (or whatever part of it one
preserves) should be bytes.  I don't think this complicates the email
package API; rather, it just shows where generality is needed.

>Thinking about this stuff makes me nostalgic for the sloppy happy days
>of Python 2.x

You now have the opportunity to finally unsnarl that mess.  It is not an
insurmountable opportunity.
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>

From jyasskin at gmail.com  Fri Apr 10 06:04:09 2009
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Thu, 9 Apr 2009 21:04:09 -0700
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE9FB4.9060908@gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de>
	<49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz>
	<49DE9FB4.9060908@gmail.com>
Message-ID: <5d44f72f0904092104y66073939q2ea4ea87937bef69@mail.gmail.com>

On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel
<john.arbash.meinel at gmail.com> wrote:
> Greg Ewing wrote:
>> John Arbash Meinel wrote:
>>> And the way intern is currently
>>> written, there is a third cost when the item doesn't exist yet, which is
>>> another lookup to insert the object.
>>
>> That's even rarer still, since it only happens the first
>> time you load a piece of code that uses a given variable
>> name anywhere in any module.
>>
>
> Somewhat true, though I know it happens 25k times during startup of
> bzr... And I would be a *lot* happier if startup time was 100ms instead
> of 400ms.

I think you have plenty of a case to try it out. If you code it up and
it doesn't speed anything up, well then we've learned something, and
maybe it'll be useful anyway for the memory savings. If it does speed
things up, well then Python's faster. I wouldn't waste time arguing
about it before you have the change written.

Good luck!
Jeffrey

From collinw at gmail.com  Fri Apr 10 06:07:54 2009
From: collinw at gmail.com (Collin Winter)
Date: Thu, 9 Apr 2009 21:07:54 -0700
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE9FB4.9060908@gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de>
	<49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz>
	<49DE9FB4.9060908@gmail.com>
Message-ID: <43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com>

On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel
<john.arbash.meinel at gmail.com> wrote:
> Greg Ewing wrote:
>> John Arbash Meinel wrote:
>>> And the way intern is currently
>>> written, there is a third cost when the item doesn't exist yet, which is
>>> another lookup to insert the object.
>>
>> That's even rarer still, since it only happens the first
>> time you load a piece of code that uses a given variable
>> name anywhere in any module.
>>
>
> Somewhat true, though I know it happens 25k times during startup of
> bzr... And I would be a *lot* happier if startup time was 100ms instead
> of 400ms.

Quite so. We have a number of internal tools, and they find that
frequently just starting up Python takes several times the duration of
the actual work unit itself. I'd be very interested to review any
patches you come up with to improve start-up time; so far on this
thread, there's been a lot of theory and not much practice. I'd
approach this iteratively: first replace the dict with a set, then if
that bears fruit, consider a customized data structure; if that bears
fruit, etc.

Good luck, and be sure to let us know what you find,
Collin Winter

From guido at python.org  Fri Apr 10 06:26:53 2009
From: guido at python.org (Guido van Rossum)
Date: Thu, 9 Apr 2009 21:26:53 -0700
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de>
	<49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> 
	<49DE9FB4.9060908@gmail.com>
	<43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com>
Message-ID: <ca471dc20904092126t60b8b22cma9c23429ba53d7be@mail.gmail.com>

On Thu, Apr 9, 2009 at 9:07 PM, Collin Winter <collinw at gmail.com> wrote:
> On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel <john.arbash.meinel at gmail.com> wrote:

> >And I would be a *lot* happier if startup time was 100ms instead
> > of 400ms.
>
> Quite so. We have a number of internal tools, and they find that
> frequently just starting up Python takes several times the duration of
> the actual work unit itself. I'd be very interested to review any
> patches you come up with to improve start-up time; so far on this
> thread, there's been a lot of theory and not much practice. I'd
> approach this iteratively: first replace the dict with a set, then if
> that bears fruit, consider a customized data structure; if that bears
> fruit, etc.
>
> Good luck, and be sure to let us know what you find,

Just to add some skepticism, has anyone done any kind of
instrumentation of bzr start-up behavior?  IIRC every time I was asked
to reduce the start-up cost of some Python app, the cause was too many
imports, and the solution was either to speed up import itself (.pyc
files were the first thing ever that came out of that -- importing
from a single .zip file is one of the more recent tricks) or to reduce
the number of modules imported at start-up (or both :-). Heavy-weight
frameworks are usually the root cause, but usually there's nothing
that can be done about that by the time you've reached this point. So,
amen on the good luck, but please start with a bit of analysis.
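
One simple way to start that analysis is to measure what a single import
drags in.  This is a sketch only; profile_import is a hypothetical helper
and "json" merely stands in for the application's top-level module.

```python
import importlib
import sys
import time


def profile_import(module_name):
    """Import a module; return (seconds taken, list of newly loaded modules)."""
    before = set(sys.modules)
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    new_modules = sorted(set(sys.modules) - before)
    return elapsed, new_modules


elapsed, new_modules = profile_import("json")
```

If the module was already imported, the list is empty and the time is
close to zero, so this is best run in a fresh interpreter.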

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Apr 10 06:34:19 2009
From: guido at python.org (Guido van Rossum)
Date: Thu, 9 Apr 2009 21:34:19 -0700
Subject: [Python-Dev] Adding new features to Python 2.x (PEP 382:
	Namespace Packages)
In-Reply-To: <20090409125312.GB1909@panix.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> 
	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> 
	<49DB4624.604@egenix.com> <49DBA78F.7010904@v.loewis.de>
	<49DDD6AD.9020708@gmail.com> <20090409125312.GB1909@panix.com>
Message-ID: <ca471dc20904092134q20ed36e5s2411dd0676aa4abb@mail.gmail.com>

On Thu, Apr 9, 2009 at 5:53 AM, Aahz <aahz at pythoncraft.com> wrote:
> On Thu, Apr 09, 2009, Nick Coghlan wrote:
>>
>> Martin v. Löwis wrote:
>>>> Such a policy would then translate to a dead end for Python 2.x
>>>> based applications.
>>>
>>> 2.x based applications *are* in a dead end, with the only exit
>>> being portage to 3.x.
>>
>> The actual end of the dead end just happens to be in 2013 or so :)
>
> More like 2016 or 2020 -- as of January, my former employer was still
> using Python 2.3, and I wouldn't be surprised if 1.5.2 was still out in
> the wilds.  The transition to 3.x is more extreme, and lots of people
> will continue making do for years after any formal support is dropped.

There's nothing wrong with that. People using 1.5.2 today certainly
aren't asking for support, and people using 2.3 probably aren't
expecting much either. That's fine, those Python versions are as
stable as the rest of their environment. (I betcha they're still using
GCC 2.96 too, though they probably don't have any reason to build a
new Python binary from source. :-)

People *will* be using 2.6 well past 2013. But will they care about
the Python community actively supporting it? Of course not! Anything
we did would probably break something for them.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From john.arbash.meinel at gmail.com  Fri Apr 10 06:38:55 2009
From: john.arbash.meinel at gmail.com (John Arbash Meinel)
Date: Thu, 09 Apr 2009 23:38:55 -0500
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <99425C9C-BE44-4564-84DC-3BAF370CFAE0@gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
	<49DE2AD4.6090605@cheimes.de>	<49DE31C9.103@gmail.com>
	<49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com>
	<99425C9C-BE44-4564-84DC-3BAF370CFAE0@gmail.com>
Message-ID: <49DECD5F.7@gmail.com>

...
>> Somewhat true, though I know it happens 25k times during startup of
>> bzr... And I would be a *lot* happier if startup time was 100ms instead
>> of 400ms.
> 
> I don't want to quash your idealism too severely, but it is extremely
> unlikely that you are going to get anywhere near that kind of speed up
> by tweaking string interning.  25k times doing anything (computation)
> just isn't all that much.
> 
> $ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' 'for x in
> xrange(25000): d.get(x)'
> 100 loops, best of 3: 8.28 msec per loop
> 
> Perhaps this isn't representative (int hashing is ridiculously cheap,
> for instance), but the dict itself is far bigger than the dict you are
> dealing with and such would have similar cache-busting properties.  And
> yet, 25k accesses (plus python->c dispatching costs which you are paying
> with interning) consume only ~10ms.  You could do more good by
> eliminating a handful of disk seeks by reducing the number of imported
> modules...
> 
> -Mike
> 

You're also using timeit over the same set of 25k keys, which means it
only has to load that subset. And as you are using identical runs each
time, those keys are already loaded into your cache lines... And given
how hash(int) works, they are all sequential in memory, and all 10M in
your original set have 0 collisions. Actually, at 10M, you'll have a
dict of size 20M entries, and the first 10M entries will be full, and
the trailing 10M entries will all be empty.

That said, you're right, the benefits of a smaller structure are going
to be small. I'll just point out that if I just do a small tweak to your
timing and do:

$ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' 'for x in
  xrange(25000): d.get(x)'
100 loops, best of 3: 6.27 msec per loop

So slightly faster than yours, *but*, let's try a much smaller dict:

$ python -mtimeit -s 'd=dict.fromkeys(xrange(25000))' 'for x in
  xrange(25000): d.get(x)'
100 loops, best of 3: 6.35 msec per loop

Pretty much the same time. Well within the noise margin. But if I go
back to the "big dict" and actually select 25k keys across the whole set:

$ TIMEIT -s 'd=dict.fromkeys(xrange(10000000));' \
 -s 'keys=range(0,10000000,10000000/25000)' \
 'for x in keys: d.get(x)'
100 loops, best of 3: 13.1 msec per loop

Now I'm still accessing 25k keys, but I'm doing it across the whole
range, and suddenly the time *doubled*.

What about slightly more random access:
$ TIMEIT -s 'import random; d=dict.fromkeys(xrange(10000000));' \
 -s 'bits = range(0, 10000000, 400); random.shuffle(bits)' \
 'for x in bits: d.get(x)'
100 loops, best of 3: 15.5 msec per loop

Not as big of a difference as I thought it would be... But I bet if
there was a way to put the random shuffle in the inner loop, so you
weren't accessing the same identical 25k keys internally, you might get
more interesting results.

As for other bits about exercising caches:

$ shuffle(range(0, 10000000, 400))
100 loops, best of 3: 15.5 msec per loop

$ shuffle(range(0, 10000000, 40))
10 loops, best of 3: 175 msec per loop

10x more keys, costs 11.3x, pretty close to linear.

$ shuffle(range(0, 10000000, 10))
10 loops, best of 3: 739 msec per loop

4x the keys, 4.2x the time, starting to get more into nonlinear effects.

Anyway, you're absolutely right. intern() overhead is a tiny fraction of
'import bzrlib.*' time, so I don't expect to see amazing results. That
said, accessing 25k keys in a smaller structure is 2x faster than
accessing 25k keys spread across a larger structure.
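
For anyone who wants to repeat the dense-versus-spread comparison on
Python 3, here is a sketch using the timeit module directly.  The
function name and default sizes are assumptions for the example, the
default dict is smaller than the 10M-entry one above to keep memory use
modest, and the dict layout in current CPython differs from 2009, so the
numbers will not match the ones quoted here.

```python
import timeit


def time_lookups(dict_size=1_000_000, n_keys=25_000, number=20):
    """Return (dense, spread): best seconds per pass of n_keys dict lookups."""
    d = dict.fromkeys(range(dict_size))
    step = dict_size // n_keys  # assumes n_keys <= dict_size

    # Same number of keys, either packed together or spread over the table.
    dense = list(range(n_keys))
    spread = list(range(0, dict_size, step))[:n_keys]

    def run(keys):
        get = d.get
        for key in keys:
            get(key)

    def best(keys):
        timings = timeit.repeat(lambda: run(keys), number=number, repeat=3)
        return min(timings) / number

    return best(dense), best(spread)
```

Calling time_lookups() returns the two per-pass timings; as noted above,
each run reuses the same keys, so cache effects are likely understated.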

John
=:->

From glyph at divmod.com  Fri Apr 10 07:19:02 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 10 Apr 2009 05:19:02 -0000
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
Message-ID: <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>

On 02:38 am, barry at python.org wrote:
>So, what I'm really asking is this.  Let's say you agree that there 
>are use cases for accessing a header value as either the raw encoded 
>bytes or the decoded unicode.  What should this return:
>
> >>> message['Subject']
>
>The raw bytes or the decoded unicode?

My personal preference would be to just deprecate this API and get 
rid of it, replacing it with a slightly more explicit one.

    message.headers['Subject']
    message.bytes_headers['Subject']

>Now, setting headers.  Sometimes you have some unicode thing and 
>sometimes you have some bytes.  You need to end up with bytes in the 
>ASCII range and you'd like to leave the header value unencoded if so. 
>But in both cases, you might have bytes or characters outside that 
>range, so you need an explicit encoding, defaulting to utf-8 probably.

    message.headers['Subject'] = 'Some text'

should be equivalent to

    message.headers['Subject'] = Header('Some text')

My preference would be that

    message.headers['Subject'] = b'Some Bytes'

would simply raise an exception.  If you've got some bytes, you should 
instead do

    message.bytes_headers['Subject'] = b'Some Bytes'

or

    message.headers['Subject'] = Header(bytes=b'Some Bytes',
                                        encoding='utf-8')

Explicit is better than implicit, right?
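
A rough sketch of how the two views could share one store follows. The
classes Message, _TextView and _BytesView are hypothetical, the Header
wrapper is left out, and values are simply encoded with one codec, which
ignores RFC 2047 encoded-words entirely; this only illustrates the
"text view rejects bytes" behaviour described above.

```python
class _TextView:
    """Text-only access: str in, str out, bytes rejected."""

    def __init__(self, raw, encoding):
        self._raw = raw
        self._encoding = encoding

    def __getitem__(self, name):
        return self._raw[name].decode(self._encoding)

    def __setitem__(self, name, value):
        if not isinstance(value, str):
            raise TypeError("headers takes str; use bytes_headers for bytes")
        self._raw[name] = value.encode(self._encoding)


class _BytesView:
    """Raw access: bytes in, bytes out, str rejected."""

    def __init__(self, raw):
        self._raw = raw

    def __getitem__(self, name):
        return self._raw[name]

    def __setitem__(self, name, value):
        if not isinstance(value, bytes):
            raise TypeError("bytes_headers takes bytes; use headers for str")
        self._raw[name] = value


class Message:
    """Hypothetical message whose two header views share one bytes store."""

    def __init__(self, encoding="utf-8"):
        self._raw = {}
        self.headers = _TextView(self._raw, encoding)
        self.bytes_headers = _BytesView(self._raw)


message = Message()
message.headers['Subject'] = 'Some text'
print(message.bytes_headers['Subject'])
```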

From glyph at divmod.com  Fri Apr 10 07:28:36 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 10 Apr 2009 05:28:36 -0000
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49DEBB21.70305@gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410025203.GA199@panix.com>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
Message-ID: <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>

On 03:21 am, ncoghlan at gmail.com wrote:
>Barry Warsaw wrote:

>>I don't know whether the parameter thing will work or not, but you're
>>probably right that we need to get the bytes-everywhere API first.

>Given that json is a wire protocol, that sounds like the right approach
>for json as well. Once bytes-everywhere works, then a text API can be
>built on top of it, but it is difficult to build a bytes API on top of
>a text one.

I wish I could agree, but JSON isn't really a wire protocol.  According 
to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the 
serialization of structured data".  There are some notes about encoding, 
but it is very clearly described in terms of unicode code points.

>So I guess the IO library *is* the right model: bytes at the bottom of
>the stack, with text as a wrapper around it (mediated by codecs).

In email's case this is true, but in JSON's case it's not.  JSON is a 
format defined as a sequence of code points; MIME is defined as a 
sequence of octets.
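
To make the distinction concrete, here is a small sketch using the json
module as it behaves in current Python 3: dumps() returns text, and
turning that text into octets is a separate, explicit step. The sample
data is made up for illustration.

```python
import json

# Serialization produces text (a sequence of code points).
text = json.dumps({"name": "Gl\u00fcck"}, ensure_ascii=False)
print(type(text).__name__)

# Putting it on the wire is a separate step with an explicit encoding.
wire = text.encode("utf-8")
print(type(wire).__name__)

# The receiving side decodes first, then parses text.
restored = json.loads(wire.decode("utf-8"))
print(restored == {"name": "Gl\u00fcck"})
```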

From turnbull at sk.tsukuba.ac.jp  Fri Apr 10 07:22:04 2009
From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Fri, 10 Apr 2009 14:22:04 +0900
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
Message-ID: <87zlepf5hf.fsf@xemacs.org>

Barry Warsaw writes:

 > There are really two ways to look at an email message.  It's either an  
 > unstructured blob of bytes, or it's a structured tree of objects.

Indeed!

 > Those objects have headers and payload.  The payload can be of any  
 > type, though I think it generally breaks down into "strings" for text/ 
 > * types and bytes for anything else (not counting multiparts).

*sigh*  Why are you back-tracking?

The payload should be of an appropriate *object* type.  Atomic object
types will have their content stored as string or bytes [nb I use
Python 3 terminology throughout].  Composite types (multipart/*) won't
need string or bytes attributes AFAICS.

Start by implementing the application/octet-stream and
text/plain;charset=utf-8 object types, of course.

 > It does seem to make sense to think about headers as text header names  
 > and text header values.

I disagree.  IMHO, structured header types should have object values,
and something like

message['to'] = "Barry 'da FLUFL' Warsaw <barry at python.org>"

should be smart enough to detect that it's a string and attempt to
(flexibly) parse it into a fullname and a mailbox adding escapes, etc.
Whether these should be structured objects or they can be strings or
bytes, I'm not sure (probably bytes, not strings, though -- see next
example).  OTOH

message['to'] = b'''"Barry 'da.FLUFL' Warsaw" <barry at python.org>'''

should assume that the client knows what they are doing, and should
parse it strictly (and I mean "be a real bastard", eg, raise an
exception on any non-ASCII octet), merely dividing it into fullname
and mailbox, and caching the bytes for later insertion in a
wire-format message.

 > In that case, I think you want the values as unicodes, and probably  
 > the headers as unicodes containing only ASCII.  So your table would be  
 > strings in both cases.  OTOH, maybe your application cares about the  
 > raw underlying encoded data, in which case the header names are  
 > probably still strings of ASCII-ish unicodes and the values are  
 > bytes.  It's this distinction (and I think the competing use cases)  
 > that make a true Python 3.x API for email more complicated.

I don't see why you can't have the email API be specific, with
message['to'] always returning a structured_header object (or maybe
even more specifically an address_header object), and methods like

message['to'].build_header_as_text()

which returns

"""To: "Barry 'da.FLUFL' Warsaw" <barry at python.org>"""

and

message['to'].build_header_in_wire_format()

which returns

b"""To: "Barry 'da.FLUFL' Warsaw" <barry at python.org>"""

Then have email.textview.Message and email.wireview.Message which
provide a simple interface where message['to'] would invoke
.build_header_as_text() and .build_header_in_wire_format()
respectively.
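
A minimal sketch of such a structured header object, under some loud
assumptions: the class name AddressHeader is hypothetical, the address
barry@example.org is a placeholder, and parsing is delegated to the real
email.utils.parseaddr/formataddr helpers rather than to a strict parser
of the kind described above. Only the two build_* method names come from
the proposal itself.

```python
from email.utils import formataddr, parseaddr


class AddressHeader:
    """Hypothetical structured header holding a fullname and a mailbox."""

    def __init__(self, name, value):
        if isinstance(value, bytes):
            # Strict path: any non-ASCII octet raises UnicodeDecodeError.
            value = value.decode("ascii")
        self.name = name
        self.fullname, self.mailbox = parseaddr(value)

    def build_header_as_text(self):
        # formataddr adds the quoting that the display name needs.
        return f"{self.name}: {formataddr((self.fullname, self.mailbox))}"

    def build_header_in_wire_format(self):
        # formataddr RFC 2047-encodes non-ASCII names, so this stays ASCII.
        return self.build_header_as_text().encode("ascii")


to = AddressHeader("To", "\"Barry 'da.FLUFL' Warsaw\" <barry@example.org>")
print(to.build_header_as_text())
print(to.build_header_in_wire_format())
```

The textview/wireview Message classes would then only differ in which of
the two build methods they call on lookup.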

 > Thinking about this stuff makes me nostalgic for the sloppy happy days  
 > of Python 2.x

Er, yeah.

Nostalgic-for-the-BITNET-days-where-everything-was-Just-EBCDIC-ly y'rs,

From fetchinson at googlemail.com  Fri Apr 10 07:21:22 2009
From: fetchinson at googlemail.com (Daniel Fetchinson)
Date: Thu, 9 Apr 2009 22:21:22 -0700
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <ca471dc20904092055n39930562g39d99e452391036b@mail.gmail.com>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>
	<grgf59$j7c$1@ger.gmane.org>
	<4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com>
	<b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com>
	<4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com>
	<ca471dc20904081051p5a53780bkd0c46644444d6b6c@mail.gmail.com>
	<4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com>
	<ca471dc20904092055n39930562g39d99e452391036b@mail.gmail.com>
Message-ID: <fbe2e2100904092221w55aac48dhc7c21a1fd87df36c@mail.gmail.com>

>> Then perhaps you misunderstand the goal of the decorator module.
>> The raison d'etre of the module is to PRESERVE the signature:
>> update_wrapper unfortunately *changes* it.
>>
>> When confronted with a library which I do not know, I often run
>> pydoc, or sphinx, or a custom made documentation tool over it, to extract
>> the signature of functions.
>
> Ah, I see. Personally I rarely trust automatically extracted
> documentation -- too often in my experience it is out of date or
> simply absent. Extracting the signatures in theory wouldn't lie, but
> in practice I still wouldn't trust it -- not only because of what
> decorators might or might not do, but because it might still be
> misleading. Call me old-fashioned, but I prefer to read the source
> code.
>
>  For instance, if I see a method
>> get_user(self, username) I have a good hint about what it is supposed
>> to do. But if the library (say a web framework) uses non signature-preserving
>> decorators, my documentation tool says to me that there is function
>> get_user(*args, **kwargs) which frankly is not enough [this is the
>> optimistic case, when the author of the decorator has taken care
>> to preserve the name of the original function].
>
> But seeing the decorator is often essential for understanding what
> goes on! Even if the decorator preserves the signature (in truth or
> according to inspect), many decorators *do* something, and it's important
> to know how a function is decorated. For example, I work a lot with a
> small internal framework at Google whose decorators can raise
> exceptions and set instance variables; they also help me understand
> under which conditions a method can be called.
>
>>  I *hate* losing information about the true signature of functions, since
>> I also use IPython, Python help, etc. a lot.
>
> I guess we just have different styles. That's fine.
>
>>>> I must admit that while I still like decorators, I do like them as
>>>> much as in the past.
>>
>> Of course there was a missing NOT in this sentence, but you all understood
>> the intended meaning.
>>
>>> (All this BTW is not to say that I don't trust you with commit
>>> privileges if you were to be interested in contributing. I just don't
>>> think that adding that particular decorator module to the stdlib would
>>> be wise. It can be debated though.)
>>
>> Fine. As I have repeated many times, that particular module was never
>> meant for inclusion in the standard library.
>
> Then perhaps it shouldn't -- I haven't looked but if you don't plan
> stdlib inclusion it is often the case that the API style and/or
> implementation details make stdlib inclusion unrealistic. (Though
> admittedly some older modules wouldn't be accepted by today's
> standards either -- and I'm not just talking PEP-8 compliance! :-)
>
>> But I feel strongly about
>> the possibility of being able to preserve (not change!) the function
>> signature.
>
> That could be added to functools if enough people want it.

My original suggestion for inclusion in stdlib was motivated by this
reason alone: I'd like to see an official one way of preserving
function signatures by decorators. If there are better ways of doing
it than the decorator module, that's totally fine, but there should be
one.
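
For illustration, here is what the difference looks like to an
introspection tool. This sketch relies on behaviour of later Python 3
releases (functools.wraps records __wrapped__ and inspect.signature
follows it), not on anything available when this thread was written; the
decorator names naive and preserving are made up.

```python
import functools
import inspect


def naive(func):
    """Decorator that hides the original signature."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def preserving(func):
    """Decorator that keeps name, docstring and reported signature."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@naive
def get_user_a(self, username):
    return username


@preserving
def get_user_b(self, username):
    return username


print(get_user_a.__name__, inspect.signature(get_user_a))
print(get_user_b.__name__, inspect.signature(get_user_b))
```

Note that the wrapper itself still accepts arbitrary arguments at call
time; only the reported signature is preserved, which is a weaker
guarantee than what the decorator module provides.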

Cheers,
Daniel

>> I do not think everybody disagrees with your point here. My point still
>> stands, though: objects should not lie about their signature, especially
>> during  debugging and when generating documentation from code.
>
> Source code never lies. Debuggers should make access to the source
> code a key point. And good documentation should be written by a human,
> not automatically cobbled together from source code and a few doc
> strings.

-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown

From sylvain.thenault at logilab.fr  Fri Apr 10 09:49:00 2009
From: sylvain.thenault at logilab.fr (Sylvain =?utf-8?B?VGjDqW5hdWx0?=)
Date: Fri, 10 Apr 2009 09:49:00 +0200
Subject: [Python-Dev] BLOBs in Pg
In-Reply-To: <49DE3902.70103@holdenweb.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<p04330100c603badeb135@[192.168.123.162]>
	<grl78j$7sl$1@ger.gmane.org>
	<p04330103c603daff3e8b@[192.168.123.162]>
	<20090409172424.GD26429@phd.pp.ru> <49DE3902.70103@holdenweb.com>
Message-ID: <20090410074900.GB21832@lupus.logilab.fr>

On 09 avril 14:05, Steve Holden wrote:
> Oleg Broytmann wrote:
> > On Thu, Apr 09, 2009 at 01:14:21PM -0400, Tony Nelson wrote:
> >> I use MySQL, but sort of intend to learn PostgreSQL.  I didn't know that
> >> PostgreSQL has no real support for BLOBs.
> > 
> >    I think it has - BYTEA data type.
> > 
> But the Python DB adapters appear to require some fairly hairy escaping
> of the data to make it usable with the cursor execute() method. IMHO you
> shouldn't have to escape data that is passed for insertion via a
> parameterized query.

can't you simply use dbmodule.Binary to do the job?
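
A small sketch of what that looks like, assuming a DB-API 2.0 module.
sqlite3 is swapped in here purely so the example is self-contained (the
thread is about PostgreSQL adapters, whose placeholder style differs);
the table name blobs is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")

# Every possible byte value, including NUL and quote characters.
payload = bytes(range(256))

# Binary() wraps the data; the parameterized query needs no manual escaping.
conn.execute("INSERT INTO blobs (data) VALUES (?)", (sqlite3.Binary(payload),))

(stored,) = conn.execute("SELECT data FROM blobs").fetchone()
print(bytes(stored) == payload)
conn.close()
```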

-- 
Sylvain Thénault                               LOGILAB, Paris (France)
Formations Python, Debian, Méth. Agiles: http://www.logilab.fr/formations
Développement logiciel sur mesure:       http://www.logilab.fr/services
CubicWeb, the semantic web framework:    http://www.cubicweb.org

From ncoghlan at gmail.com  Fri Apr 10 10:40:28 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 10 Apr 2009 18:40:28 +1000
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410025203.GA199@panix.com>	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
Message-ID: <49DF05FC.9040208@gmail.com>

glyph at divmod.com wrote:
> On 03:21 am, ncoghlan at gmail.com wrote:
>> Given that json is a wire protocol, that sounds like the right approach
>> for json as well. Once bytes-everywhere works, then a text API can be
>> built on top of it, but it is difficult to build a bytes API on top of a
>> text one.
> 
> I wish I could agree, but JSON isn't really a wire protocol.  According
> to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the
> serialization of structured data".  There are some notes about encoding,
> but it is very clearly described in terms of unicode code points.

Ah, my apologies - if the RFC defines things such that the native format
is Unicode, then yes, the appropriate Python 3.x data type for the base
implementation would indeed be strings.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Fri Apr 10 10:52:40 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 10 Apr 2009 18:52:40 +1000
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <ca471dc20904092126t60b8b22cma9c23429ba53d7be@mail.gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
	<49DE2AD4.6090605@cheimes.de>	<49DE31C9.103@gmail.com>
	<49DE9F42.5000704@canterbury.ac.nz>
	<49DE9FB4.9060908@gmail.com>	<43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com>
	<ca471dc20904092126t60b8b22cma9c23429ba53d7be@mail.gmail.com>
Message-ID: <49DF08D8.9080806@gmail.com>

Guido van Rossum wrote:
> Just to add some skepticism, has anyone done any kind of
> instrumentation of bzr start-up behavior?  IIRC every time I was asked
> to reduce the start-up cost of some Python app, the cause was too many
> imports, and the solution was either to speed up import itself (.pyc
> files were the first thing ever that came out of that -- importing
> from a single .zip file is one of the more recent tricks) or to reduce
> the number of modules imported at start-up (or both :-). Heavy-weight
> frameworks are usually the root cause, but usually there's nothing
> that can be done about that by the time you've reached this point. So,
> amen on the good luck, but please start with a bit of analysis.

This problem (slow application startup times due to too many imports at
startup, which in turn can be due to top level imports for library
or framework functionality that a given application doesn't actually
use) is actually the main reason I sometimes wish for a nice, solid lazy
module import mechanism that manages to avoid the potential deadlock
problems created by using import statements inside functions.

Providing a clean API and implementation for that functionality is a
pretty tough nut to crack though, so I'm not holding my breath...
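
For reference, a sketch of one possible shape for this, using
importlib.util.LazyLoader, which only appeared in much later Python 3
releases. The helper name lazy_import is an assumption, and this makes no
claim to address the deadlock concerns mentioned above.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose code only runs on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(f"no module named {name!r}")
    # Wrap the real loader so execution is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the lazy module, does not run it yet
    return module


colorsys = lazy_import("colorsys")
# The module body is executed here, on first attribute access.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```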

Cheers,
Nick.

P.S. It's only an occasional fairly idle wish for me though, or I'd have
at least tried to come up with something myself by now.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From robert.collins at canonical.com  Fri Apr 10 11:19:39 2009
From: robert.collins at canonical.com (Robert Collins)
Date: Fri, 10 Apr 2009 19:19:39 +1000
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <ca471dc20904092126t60b8b22cma9c23429ba53d7be@mail.gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
	<49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com>
	<49DE9F42.5000704@canterbury.ac.nz>  <49DE9FB4.9060908@gmail.com>
	<43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com>
	<ca471dc20904092126t60b8b22cma9c23429ba53d7be@mail.gmail.com>
Message-ID: <1239355179.2892.224.camel@lifeless-64>

On Thu, 2009-04-09 at 21:26 -0700, Guido van Rossum wrote:

> Just to add some skepticism, has anyone done any kind of
> instrumentation of bzr start-up behavior?

We sure have. 'bzr --profile-imports' reports on the time to import
different modules (both cumulative and individually).

We have a lazy module loader that allows us to defer loading modules we
might not use (though if they are needed we are in fact going to pay for
loading them eventually).

We monkeypatch the standard library where modules we want are
unreasonably expensive to import (for instance by making a regex we
wouldn't use be lazy compiled rather than compiled at import time).
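
The lazy-regex idea can be sketched as below. This is an illustrative
stand-in, not bzrlib's actual implementation; the class name LazyRegex
and its attributes are made up for the example.

```python
import re


class LazyRegex:
    """Proxy that compiles its pattern on first use, not at import time."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None

    def _get(self):
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself,
        # e.g. search, match, sub: forward them to the real pattern.
        return getattr(self._get(), name)


number_re = LazyRegex(r"\d+")   # cheap: nothing is compiled yet
print(number_re._compiled is None)
print(number_re.search("abc 42").group())
```

Monkeypatching would then amount to assigning such a proxy over the
module-level compiled pattern in the module being patched.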

>   IIRC every time I was asked
> to reduce the start-up cost of some Python app, the cause was too many
> imports, and the solution was either to speed up import itself (.pyc
> files were the first thing ever that came out of that -- importing
> from a single .zip file is one of the more recent tricks) or to reduce
> the number of modules imported at start-up (or both :-). Heavy-weight
> frameworks are usually the root cause, but usually there's nothing
> that can be done about that by the time you've reached this point. So,
> amen on the good luck, but please start with a bit of analysis.

Certainly, import time is part of it:
robertc at lifeless-64:~$ python -m timeit -s 'import sys; import bzrlib.errors' \
    "del sys.modules['bzrlib.errors']; import bzrlib.errors"
10 loops, best of 3: 18.7 msec per loop

(errors.py is 3027 lines long with 347 exception classes).
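
The same measurement can be scripted from within Python. This is a
sketch assuming the module is safe to re-execute; the helper name
time_fresh_import is made up, and colorsys stands in for bzrlib.errors so
that the example runs anywhere.

```python
import importlib
import sys
import timeit


def time_fresh_import(name, repeat=3, number=10):
    """Average seconds per hot-cache re-import of an already loaded module."""
    importlib.import_module(name)  # make sure the .pyc and OS caches are warm

    def reimport():
        # Dropping the sys.modules entry forces the module body to run again.
        del sys.modules[name]
        importlib.import_module(name)

    return min(timeit.repeat(reimport, number=number, repeat=repeat)) / number


print(f"{time_fresh_import('colorsys') * 1000:.3f} msec per import")
```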

We've also looked lower - python does a lot of stat operations searching
for imports and determining if the pyc is up to date; these appear to
only really matter on cold-cache imports (but they matter a lot then);
in hot-cache situations they are insignificant.

Uhm, there's probably more - but I just wanted to note that we have done
quite a bit of analysis. I think a large chunk of our problem is having
too much code loaded when only a small fraction will be used in any one
operation. Consider importing bzrlib.errors - 10% of the startup time
for 'bzr help'. In any operation only a few of those exceptions will be
used - and typically 0.

-Rob

From solipsis at pitrou.net  Fri Apr 10 13:41:07 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 10 Apr 2009 11:41:07 +0000 (UTC)
Subject: [Python-Dev] Dropping bytes "support" in json
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410025203.GA199@panix.com>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
Message-ID: <loom.20090410T114021-346@post.gmane.org>

<glyph <at> divmod.com> writes:
> 
> In email's case this is true, but in JSON's case it's not.  JSON is a 
> format defined as a sequence of code points; MIME is defined as a 
> sequence of octets.

Another way to look at it is that JSON is a subset of Javascript, and as such is
text rather than bytes.

Regards

Antoine.

From solipsis at pitrou.net  Fri Apr 10 13:52:00 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 10 Apr 2009 11:52:00 +0000 (UTC)
Subject: [Python-Dev] Rethinking intern() and its data structure
References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de>
	<49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz>
	<49DE9FB4.9060908@gmail.com>
	<43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com>
	<ca471dc20904092126t60b8b22cma9c23429ba53d7be@mail.gmail.com>
	<1239355179.2892.224.camel@lifeless-64>
Message-ID: <loom.20090410T114654-896@post.gmane.org>

Robert Collins <robert.collins <at> canonical.com> writes:
> 
> (errors.py is 3027 lines long with 347 exception classes).

347 exception classes? Perhaps your framework is over-engineered.

Similarly, when using a heavy Web framework, reloading a Web app can take
several seconds... but I won't blame Python for that.

Regards

Antoine.

From p.f.moore at gmail.com  Fri Apr 10 13:53:47 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 10 Apr 2009 12:53:47 +0100
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49DF05FC.9040208@gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410025203.GA199@panix.com>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<49DF05FC.9040208@gmail.com>
Message-ID: <79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com>

2009/4/10 Nick Coghlan <ncoghlan at gmail.com>:
> glyph at divmod.com wrote:
>> On 03:21 am, ncoghlan at gmail.com wrote:
>>> Given that json is a wire protocol, that sounds like the right approach
>>> for json as well. Once bytes-everywhere works, then a text API can be
>>> built on top of it, but it is difficult to build a bytes API on top of a
>>> text one.
>>
>> I wish I could agree, but JSON isn't really a wire protocol.  According
>> to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the
>> serialization of structured data".  There are some notes about encoding,
>> but it is very clearly described in terms of unicode code points.
>
> Ah, my apologies - if the RFC defines things such that the native format
> is Unicode, then yes, the appropriate Python 3.x data type for the base
> implementation would indeed be strings.

Indeed, the RFC seems to clearly imply that loads should take a
Unicode string, dumps should produce one, and load/dump should work in
terms of text files (not byte files).

On the other hand, further down in the document:

"""
3.  Encoding

   JSON text SHALL be encoded in Unicode.  The default encoding is
   UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.
"""

This is at best confused (in my utterly non-expert opinion :-)) as
Unicode isn't an encoding...

I would guess that what the RFC is trying to say is that JSON is text
(Unicode) and where a byte stream purporting to be JSON is encountered
without a defined encoding, this is how to guess one.

That implies that loads can/should also allow bytes as input, applying
the given algorithm to guess an encoding. And similarly load
can/should accept a byte stream, on the same basis. (There's no need
to allow the possibility of accepting bytes plus an encoding - in that
case the user should decode the bytes before passing Unicode to the
JSON module).
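
The guessing rule quoted from the RFC is small enough to sketch. The
function names guess_json_encoding and loads_bytes are hypothetical, the
fallback for inputs shorter than four octets is my own assumption, and
byte order marks are not handled.

```python
import json


def guess_json_encoding(data):
    """Guess the encoding from the null pattern in the first four octets."""
    if len(data) < 4:
        return "utf-8"  # assumption: too short to tell, use the default
    nulls = tuple(octet == 0 for octet in data[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(nulls, "utf-8")           # xx xx xx xx


def loads_bytes(data):
    """Decode a byte string using the guessed encoding, then parse the text."""
    return json.loads(data.decode(guess_json_encoding(data)))


for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    encoded = '{"a": 1}'.encode(codec)
    print(codec, guess_json_encoding(encoded), loads_bytes(encoded))
```

This works because the RFC guarantees that the first two characters of a
JSON text are ASCII, so each encoding leaves a distinct pattern of zero
octets.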

An alternative might be for the JSON module to register a special
encoding ('JSON-guess'?) which captures the rules here. Then there's
no need for special bytes parameter handling.

Of course, this is all from a native English speaker, who therefore
has no idea of the real life issues involved in Unicode :-)

Paul.

From robert.collins at canonical.com  Fri Apr 10 14:16:30 2009
From: robert.collins at canonical.com (Robert Collins)
Date: Fri, 10 Apr 2009 22:16:30 +1000
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <loom.20090410T114654-896@post.gmane.org>
References: <49DE0DF6.1040900@arbash-meinel.com>
	<49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com>
	<49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com>
	<43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com>
	<ca471dc20904092126t60b8b22cma9c23429ba53d7be@mail.gmail.com>
	<1239355179.2892.224.camel@lifeless-64>
	<loom.20090410T114654-896@post.gmane.org>
Message-ID: <1239365790.2892.229.camel@lifeless-64>

On Fri, 2009-04-10 at 11:52 +0000, Antoine Pitrou wrote:
> Robert Collins <robert.collins <at> canonical.com> writes:
> > 
> > (errors.py is 3027 lines long with 347 exception classes).
> 
> 347 exception classes? Perhaps your framework is over-engineered.
> 
> Similarly, when using a heavy Web framework, reloading a Web app can take
> several seconds... but I won't blame Python for that.

Well, we've added exceptions as we needed them. This isn't much
different to errno in C programs; the errno range has expanded as people
have wanted to signal that specific situations have arisen. The key
thing for us is to have both something that can be caught (for library
users of bzrlib) and something that can be formatted with variable
substitution (for displaying to users). If there are better ways to
approach this in python than what we've done, that would be great.
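
A sketch of that pattern, for readers who have not seen it: one class per
situation so that library users can catch it, plus a format string filled
in from keyword arguments for display. The names FormattedError,
PathNotFound and _fmt here are illustrative and are not claimed to match
bzrlib's real classes.

```python
class FormattedError(Exception):
    """Base class: catchable by type, printable via a format template."""

    _fmt = "Unknown error"

    def __init__(self, **params):
        super().__init__()
        self.params = params
        # Expose each parameter as an attribute for programmatic use.
        for key, value in params.items():
            setattr(self, key, value)

    def __str__(self):
        return self._fmt % self.params


class PathNotFound(FormattedError):
    _fmt = "No such file: %(path)r"


try:
    raise PathNotFound(path="missing.txt")
except FormattedError as exc:
    print(exc)        # message for the user
    print(exc.path)   # structured data for the caller
```

Each extra class costs a class object at import time, which is where the
startup cost discussed in this thread comes from.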

-Rob

From martin at v.loewis.de  Fri Apr 10 14:55:45 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Apr 2009 14:55:45 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <loom.20090410T114021-346@post.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410025203.GA199@panix.com>	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>	<49DEBB21.70305@gmail.com>	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<loom.20090410T114021-346@post.gmane.org>
Message-ID: <49DF41D1.7030003@v.loewis.de>

>> In email's case this is true, but in JSON's case it's not.  JSON is a 
>> format defined as a sequence of code points; MIME is defined as a 
>> sequence of octets.
> 
> Another way to look at it is that JSON is a subset of Javascript, and as such is
> text rather than bytes.

I don't think this can be approached from a theoretical point of view.
Instead, what matters is how users want to use it.

Regards,
Martin

From fuzzyman at voidspace.org.uk  Fri Apr 10 14:57:43 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 10 Apr 2009 13:57:43 +0100
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <ca471dc20904092055n39930562g39d99e452391036b@mail.gmail.com>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>
	<grgf59$j7c$1@ger.gmane.org>	<4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com>
	<b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com>
	<4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com>
	<ca471dc20904081051p5a53780bkd0c46644444d6b6c@mail.gmail.com>
	<4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com>
	<ca471dc20904092055n39930562g39d99e452391036b@mail.gmail.com>
Message-ID: <49DF4247.7060504@voidspace.org.uk>

Guido van Rossum wrote:
> On Wed, Apr 8, 2009 at 9:31 PM, Michele Simionato
> <michele.simionato at gmail.com> wrote:
>   
>> Then perhaps you misunderstand the goal of the decorator module.
>> The raison d'etre of the module is to PRESERVE the signature:
>> update_wrapper unfortunately *changes* it.
>>
>> When confronted with a library which I do not know, I often run
>> pydoc, or sphinx, or a custom made documentation tool over it, to extract the
>> signature of functions.
>>     
>
> Ah, I see. Personally I rarely trust automatically extracted
> documentation -- too often in my experience it is out of date or
> simply absent. Extracting the signatures in theory wouldn't lie, but
> in practice I still wouldn't trust it -- not only because of what
> decorators might or might not do, but because it might still be
> misleading. Call me old-fashioned, but I prefer to read the source
> code.
>   

If you auto-generate API documentation by introspection (which we do at 
Resolver Systems) then preserving signatures can also be important. 
Interactive use (support for help), and more straightforward tracebacks 
in the event of usage errors are other reasons to want to preserve 
signatures and function name.

>  For instance, if I see a method
>   
>> get_user(self, username) I have a good hint about what it is supposed
>> to do. But if the library (say a web framework) uses non signature-preserving
>> decorators, my documentation tool says to me that there is function
>> get_user(*args, **kwargs) which frankly is not enough [this is the
>> optimistic case, when the author of the decorator has taken care
>> to preserve the name of the original function].
>>     
>
> But seeing the decorator is often essential for understanding what
> goes on! Even if the decorator preserves the signature (in truth or
> according to inspect), many decorators *do* something, and it's important
> to know how a function is decorated. For example, I work a lot with a
> small internal framework at Google whose decorators can raise
> exceptions and set instance variables; they also help me understand
> under which conditions a method can be called.
>   

Having methods renamed to 'wrapped' and their signature changed to 
*args, **kwargs may tell you there *is* a decorator but doesn't give you 
any useful information about what it does. If you look at the code then 
the decorator is obvious (whether or not it mangles the method)...
> [+1]
>> But I feel strongly about
>> the possibility of being able to preserve (not change!) the function
>> signature.
>>     
>
> That could be added to functools if enough people want it.
>
>   

+1

Michael

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From barry at python.org  Fri Apr 10 15:31:46 2009
From: barry at python.org (Barry Warsaw)
Date: Fri, 10 Apr 2009 09:31:46 -0400
Subject: [Python-Dev] Python 2.6.2 final
Message-ID: <776F906E-418C-4A2C-8C6C-2B0036B49AFA@python.org>

I wanted to cut Python 2.6.2 final tonight, but for family reasons I  
won't be able to do so until Monday.  Please be conservative in any  
commits to the 2.6 branch between now and then.

bugs.python.org is apparently down right now, but I set issue 5724 to  
release blocker for 2.6.2.  This is waiting for input from Mark  
Dickinson, and it relates to test_cmath failing on Solaris 10.  If  
Mark fixes that, he's welcome to commit it, otherwise I will remove  
the release blocker tag on the issue and release 2.6.2 anyway.

Plan on me tagging 2.6.2 final Sunday evening.

Cheers,
-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 304 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/python-dev/attachments/20090410/efa8c63a/attachment.pgp>

From bill.hoffman at kitware.com  Fri Apr 10 16:13:30 2009
From: bill.hoffman at kitware.com (Bill Hoffman)
Date: Fri, 10 Apr 2009 10:13:30 -0400
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
In-Reply-To: <50862ebd0904091849q7f28fa5bmeaf3b9061629a1c6@mail.gmail.com>
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>	<18907.17310.201358.697994@montanaro.dyndns.org>	<5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com>	<49DBD6F9.7030502@canterbury.ac.nz>	<806d41050904071554x30dade8eva60be765af462112@mail.gmail.com>	<5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com>	<806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com>	<5b8d13220904081857w46237b57t82d8a4006f00adbb@mail.gmail.com>
	<50862ebd0904091849q7f28fa5bmeaf3b9061629a1c6@mail.gmail.com>
Message-ID: <49DF540A.9030808@kitware.com>

Neil Hodgson wrote:
>    cmake does not produce relative paths in its generated make and
> project files. There is an option CMAKE_USE_RELATIVE_PATHS which
> appears to do this but the documentation says:
> 
> """This option does not work for more complicated projects, and
> relative paths are used when possible. In general, it is not possible
> to move CMake generated makefiles to a different location regardless
> of the value of this variable."""
> 
>    This means that generated Visual Studio project files will not work
> for other people unless a particular absolute build location is
> specified for everyone which will not suit most. Each person that
> wants to build Python will have to run cmake before starting Visual
> Studio thus increasing the prerequisites.
> 

This is true.  CMake does not generate stand alone transferable 
projects. CMake must be installed on the machine where the compilation 
is done.  CMake will automatically re-run if any of the inputs are 
changed, and have visual studio re-load the project, and CMake can be 
used for simple cross platform commands like file copy and other 
operations so that the build files do not depend on shell commands or 
anything system specific.

-Bill

From ncoghlan at gmail.com  Fri Apr 10 16:53:00 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 11 Apr 2009 00:53:00 +1000
Subject: [Python-Dev] decorator module in stdlib?
In-Reply-To: <ca471dc20904092055n39930562g39d99e452391036b@mail.gmail.com>
References: <fbe2e2100904062255x8b443c1p501f60b90af4e826@mail.gmail.com>
	<grgf59$j7c$1@ger.gmane.org>	<4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com>
	<b8e622740904072310sa899e4ele7b234584be4f1df@mail.gmail.com>
	<4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com>
	<ca471dc20904081051p5a53780bkd0c46644444d6b6c@mail.gmail.com>
	<4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com>
	<ca471dc20904092055n39930562g39d99e452391036b@mail.gmail.com>
Message-ID: <49DF5D4C.5040607@gmail.com>

Guido van Rossum wrote:
> On Wed, Apr 8, 2009 at 9:31 PM, Michele Simionato
>> But I feel strongly about
>> the possibility of being able to preserve (not change!) the function
>> signature.
> 
> That could be added to functools if enough people want it.

No objection in principle here - it's just hard to do cleanly without
PEP 362's __signature__ attribute to underpin it. Without that as a
basis, I expect you'd end up being forced to do something similar to
what Michele does in the decorator module - inspect the function being
wrapped and then use exec to generate a wrapper with a matching signature.
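
A rough sketch of that exec-based approach, assuming Python 3 and only
plain positional-or-keyword parameters with literal defaults. The names
`preserve_signature` and `trace` are hypothetical; this is not the
decorator module's actual implementation, just the general idea:

```python
import inspect

def preserve_signature(caller):
    """Hypothetical helper: caller(func, *args, **kwargs) does the work."""
    def decorator(func):
        sig = inspect.signature(func)
        # Forward every parameter by name; this only works for plain
        # positional-or-keyword parameters (an assumption of this sketch).
        call_args = ", ".join("{0}={0}".format(n) for n in sig.parameters)
        source = "def {name}{sig}:\n    return _caller(_func, {args})\n".format(
            name=func.__name__, sig=str(sig), args=call_args)
        namespace = {"_caller": caller, "_func": func}
        exec(source, namespace)  # generate a wrapper with a matching signature
        wrapper = namespace[func.__name__]
        wrapper.__doc__ = func.__doc__
        return wrapper
    return decorator

calls = []

def trace(func, *args, **kwargs):
    # Example caller: record the call, then delegate.
    calls.append((func.__name__, args, kwargs))
    return func(*args, **kwargs)

@preserve_signature(trace)
def get_user(name, retries=3):
    return (name, retries)

result = get_user("bob")
arg_names = inspect.getfullargspec(get_user).args
```

Defaults are reproduced through their repr, so anything beyond simple
literals (or annotations that need names in scope) would need more care.
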

Another nice introspection enhancement might be to give class and
function objects writable __file__ and __line__ attributes (initially
set appropriately by the compiler) and have the inspect module use
those when they're available. Then functools.update_wrapper() could be
adjusted to copy those attributes, meaning that the wrapper function
would point back to the original (decorated) function for the source
code, rather than to the definition of the wrapper (note that the actual
wrapper code could still be found by looking at the metadata on the
function's __code__ attribute).

Unfortunately-ideas-aren't-working-code'ly,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From __peter__ at web.de  Fri Apr 10 10:58:56 2009
From: __peter__ at web.de (Peter Otten)
Date: Fri, 10 Apr 2009 10:58:56 +0200
Subject: [Python-Dev] Rethinking intern() and its data structure
References: <49DE0DF6.1040900@arbash-meinel.com>
	<49DE2AD4.6090605@cheimes.de>	<49DE31C9.103@gmail.com>
	<49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com>
	<99425C9C-BE44-4564-84DC-3BAF370CFAE0@gmail.com>
	<49DECD5F.7@gmail.com>
Message-ID: <grn1og$l2u$1@ger.gmane.org>

John Arbash Meinel wrote:

> Not as big of a difference as I thought it would be... But I bet if
> there was a way to put the random shuffle in the inner loop, so you
> weren't accessing the same identical 25k keys internally, you might get
> more interesting results.

You can prepare a few random samples during startup:

$ python -m timeit -s"from random import sample; d = dict.fromkeys(xrange(10**7)); nextrange = iter([sample(xrange(10**7),25000) for i in range(200)]).next" "for x in nextrange(): d.get(x)"
10 loops, best of 3: 20.2 msec per loop

To put it into perspective:

$ python -m timeit -s"d = dict.fromkeys(xrange(10**7)); nextrange = iter([range(25000)]*200).next" "for x in nextrange(): d.get(x)"
100 loops, best of 3: 10.9 msec per loop
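
For anyone who wants to repeat the comparison from a script, here is a
rough Python 3 equivalent using the timeit module directly. The sizes are
scaled down from the commands above (an assumption, to keep it quick), so
the absolute numbers will not match:

```python
import random
import timeit

SIZE = 10**5   # the commands above use 10**7
BATCH = 2500   # the commands above use 25000

d = dict.fromkeys(range(SIZE))

# Prepare the key batches up front so sampling is not part of the timing.
random_batches = [random.sample(range(SIZE), BATCH) for _ in range(20)]
sequential_batches = [list(range(BATCH))] * 20

def lookup_all(batches):
    for batch in batches:
        for key in batch:
            d.get(key)

random_time = timeit.timeit(lambda: lookup_all(random_batches), number=5)
sequential_time = timeit.timeit(lambda: lookup_all(sequential_batches), number=5)
print("random: %.4fs  sequential: %.4fs" % (random_time, sequential_time))
```
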

Peter

From a.badger at gmail.com  Fri Apr 10 16:56:20 2009
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 10 Apr 2009 07:56:20 -0700
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <1239355179.2892.224.camel@lifeless-64>
References: <49DE0DF6.1040900@arbash-meinel.com>	<49DE2AD4.6090605@cheimes.de>
	<49DE31C9.103@gmail.com>	<49DE9F42.5000704@canterbury.ac.nz>
	<49DE9FB4.9060908@gmail.com>	<43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com>	<ca471dc20904092126t60b8b22cma9c23429ba53d7be@mail.gmail.com>
	<1239355179.2892.224.camel@lifeless-64>
Message-ID: <49DF5E14.3030108@gmail.com>

Robert Collins wrote:

> Certainly, import time is part of it:
> robertc at lifeless-64:~$ python -m timeit -s 'import sys; import bzrlib.errors' "del sys.modules['bzrlib.errors']; import bzrlib.errors"
> 10 loops, best of 3: 18.7 msec per loop
> 
> (errors.py is 3027 lines long with 347 exception classes).
> 
> We've also looked lower - python does a lot of stat operations searching
> for imports and determining if the pyc is up to date; these appear to
> only really matter on cold-cache imports (but they matter a lot then);
> in hot-cache situations they are insignificant.
> 
Tarek, Georg, and I talked about a way to do both multi-version and
speedup of this exact problem with import in the future at pycon.  I had
to leave before the hackfest got started, though, so I don't know where
the idea went from there.  Tarek, did this idea progress any?

-Toshio

From foom at fuhm.net  Fri Apr 10 17:08:04 2009
From: foom at fuhm.net (James Y Knight)
Date: Fri, 10 Apr 2009 11:08:04 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
Message-ID: <A286FA62-B1F0-4DB4-BC38-9D1E0F85A92A@fuhm.net>

On Apr 9, 2009, at 10:38 PM, Barry Warsaw wrote:
> So, what I'm really asking is this.  Let's say you agree that there  
> are use cases for accessing a header value as either the raw encoded  
> bytes or the decoded unicode.

As I said in the thread having nearly the same exact discussion on
web-sig, except about WSGI headers...

> What should this return:
>
> >>> message['Subject']
>
> The raw bytes or the decoded unicode?

Until you write a parser for every header, you simply cannot decode to  
unicode. The only sane choices are:
1) raw bytes
2) parsed structured data

There's no "decoded to unicode but not parsed" option: that's doing  
things in the wrong order. If you RFC2047-decode the header before  
doing tokenization and parsing, you will just have a *broken*  
implementation.

Here's an example where it matters. If you decode the RFC2047 part  
before parsing, you'd decide that there's two recipients to the  
message. There aren't. "<broken at example.com>, " is the display-name of  
"actual at example.com", not a second recipient.

   To: =?UTF-8?B?PGJyb2tlbkBleGFtcGxlLmNvbT4sIA==?= <actual at example.com>
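
The difference can be reproduced with the email package from a current
Python 3 (a sketch, not part of the original argument). The addresses are
written with a real "@" here rather than the archive's " at " form, and
`decode_words` is a small helper made up for the example:

```python
from email.header import decode_header
from email.utils import getaddresses, parseaddr

raw = "=?UTF-8?B?PGJyb2tlbkBleGFtcGxlLmNvbT4sIA==?= <actual@example.com>"

def decode_words(value):
    # RFC 2047-decode every encoded-word in value and join the pieces.
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)

# Wrong order: decode first, then parse the address list.
wrong = [addr for _, addr in getaddresses([decode_words(raw)])]

# Right order: parse into display-name and address, then decode the name.
display_raw, address = parseaddr(raw)
display = decode_words(display_raw)

print(wrong)
print(repr(display), address)
```

Decoding first yields two apparent recipients; parsing first yields a
single recipient whose display-name merely looks like an address.
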

Here's a quote from RFC2047:
> NOTE: Decoding and display of encoded-words occurs *after* a  
> structured field body is parsed into tokens. It is therefore  
> possible to hide 'special' characters in encoded-words which, when  
> displayed, will be indistinguishable from 'special' characters in  
> the surrounding text. For this and other reasons, it is NOT generally
> possible to translate a message header containing 'encoded-word's to
> an unencoded form which can be parsed by an RFC 822 mail reader.
And another quote for good measure:
> (2) Any header field not defined as '*text' should be parsed  
> according to the syntax rules for that header field. However, any  
> 'word' that appears within a 'phrase' should be treated as an  
> 'encoded-word' if it meets the syntax rules in section 2. Otherwise  
> it should be treated as an ordinary 'word'.

Now, I suppose there's also a third possibility:
3) US-ASCII-only strings, unmolested except for doing  
a .decode('ascii'). That'll give you a string all right, but it's  
really just cheating. It's not actually a text string in any  
meaningful sense.

(in all this I'm assuming your question is not about the "Subject"  
header in particular; that is of course just unstructured text so the  
parse step doesn't actually do anything...).

James

From stephen at xemacs.org  Fri Apr 10 17:38:03 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 11 Apr 2009 00:38:03 +0900
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410025203.GA199@panix.com>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<49DF05FC.9040208@gmail.com>
	<79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com>
Message-ID: <87vdpcfrj8.fsf@xemacs.org>

Paul Moore writes:

 > On the other hand, further down in the document:
 > 
 > """
 > 3.  Encoding
 > 
 >    JSON text SHALL be encoded in Unicode.  The default encoding is
 >    UTF-8.
 > 
 >    Since the first two characters of a JSON text will always be ASCII
 >    characters [RFC0020], it is possible to determine whether an octet
 >    stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
 >    at the pattern of nulls in the first four octets.
 > """
 > 
 > This is at best confused (in my utterly non-expert opinion :-)) as
 > Unicode isn't an encoding...

The word "encoding" (by itself) does not have a standard definition
AFAIK.  However, since Unicode *is* a "coded character set" (plus a
bunch of hairy usage rules), there's nothing wrong with saying "text
is encoded in Unicode".  The RFC 2130 and Unicode TR#17 taxonomies are
annoyingly verbose and pedantic, to say the least.

So what is being said there (in UTR#17 terminology) is

(1) JSON is *text*, that is, a sequence of characters.
(2) The abstract repertoire and coded character set are defined by the
    Unicode standard.
(3) The default transfer encoding syntax is UTF-8.

 > That implies that loads can/should also allow bytes as input, applying
 > the given algorithm to guess an encoding.

It's not a guess, unless the data stream is corrupt---or nonconforming.
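
For concreteness, the null-pattern test described in the quoted section of
the RFC can be sketched like this. The function name is made up, and this
is not something the json module provides; it simply maps the five
patterns onto Python codec names:

```python
import json

def detect_json_encoding(data):
    """Pick a UTF from the pattern of nulls in the first four octets."""
    head = data[:4]
    if len(head) < 4:
        return "utf-8"  # too short to show any other pattern
    nulls = tuple(octet == 0 for octet in head)
    if nulls == (True, True, True, False):
        return "utf-32-be"   # 00 00 00 xx
    if nulls == (True, False, True, False):
        return "utf-16-be"   # 00 xx 00 xx
    if nulls == (False, True, True, True):
        return "utf-32-le"   # xx 00 00 00
    if nulls == (False, True, False, True):
        return "utf-16-le"   # xx 00 xx 00
    return "utf-8"           # xx xx xx xx

text = '{"a": 1}'
detected = {
    codec: detect_json_encoding(text.encode(codec))
    for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
}
decoded = json.loads(text.encode("utf-16-be").decode(detected["utf-16-be"]))
```

This assumes conforming input with no byte order mark, which is exactly
the case where the result is determined rather than guessed.
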

But it should not be the JSON package's responsibility to deal with
corruption or non-conformance (eg, ISO-8859-15-encoded programs).
That's the whole point of specifying the coded character set in the
standard in the first place.  I think it's a bad idea for any of the core
JSON API to accept or produce bytes in any language that provides a
Unicode string type.

That doesn't mean Python's module shouldn't provide convenience
functions to read and write JSON serialized as UTF-8 (in fact, that
*should* be done, IMO) and/or other UTFs (I'm not so happy about
that).  But those who write programs using them should not report bugs
until they've checked out and eliminated the possibility of an
encoding screwup!

From bob at redivi.com  Fri Apr 10 17:55:25 2009
From: bob at redivi.com (Bob Ippolito)
Date: Fri, 10 Apr 2009 08:55:25 -0700
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <87vdpcfrj8.fsf@xemacs.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410025203.GA199@panix.com>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<49DF05FC.9040208@gmail.com>
	<79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com>
	<87vdpcfrj8.fsf@xemacs.org>
Message-ID: <6a36e7290904100855x7ce48f2ege72b4825fd792579@mail.gmail.com>

On Fri, Apr 10, 2009 at 8:38 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Paul Moore writes:
>
>  > On the other hand, further down in the document:
>  >
>  > """
>  > 3.  Encoding
>  >
>  >    JSON text SHALL be encoded in Unicode.  The default encoding is
>  >    UTF-8.
>  >
>  >    Since the first two characters of a JSON text will always be ASCII
>  >    characters [RFC0020], it is possible to determine whether an octet
>  >    stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
>  >    at the pattern of nulls in the first four octets.
>  > """
>  >
>  > This is at best confused (in my utterly non-expert opinion :-)) as
>  > Unicode isn't an encoding...
>
> The word "encoding" (by itself) does not have a standard definition
> AFAIK.  However, since Unicode *is* a "coded character set" (plus a
> bunch of hairy usage rules), there's nothing wrong with saying "text
> is encoded in Unicode".  The RFC 2130 and Unicode TR#17 taxonomies are
> annoyingly verbose and pedantic, to say the least.
>
> So what is being said there (in UTR#17 terminology) is
>
> (1) JSON is *text*, that is, a sequence of characters.
> (2) The abstract repertoire and coded character set are defined by the
>     Unicode standard.
> (3) The default transfer encoding syntax is UTF-8.
>
>  > That implies that loads can/should also allow bytes as input, applying
>  > the given algorithm to guess an encoding.
>
> It's not a guess, unless the data stream is corrupt---or nonconforming.
>
> But it should not be the JSON package's responsibility to deal with
> corruption or non-conformance (eg, ISO-8859-15-encoded programs).
> That's the whole point of specifying the coded character set in the
> standard in the first place.  I think it's a bad idea for any of the core
> JSON API to accept or produce bytes in any language that provides a
> Unicode string type.
>
> That doesn't mean Python's module shouldn't provide convenience
> functions to read and write JSON serialized as UTF-8 (in fact, that
> *should* be done, IMO) and/or other UTFs (I'm not so happy about
> that).  But those who write programs using them should not report bugs
> until they've checked out and eliminated the possibility of an
> encoding screwup!

The current implementation doesn't do any encoding guesswork and I
have no intention to allow that as a feature. The input must be
unicode, UTF-8 bytes, or an encoding must be specified.

Personally, most of my experience with JSON is as a wire protocol and thus
bytes, so the obvious function to encode json should do that. There
probably should be another function to get unicode output, but nobody
has ever asked for that in the Python 2.x version. They either want
the default behavior (encoding as ASCII str which can be used as
unicode due to implementation details of Python 2.x) or encoding as a
more compact UTF-8 str (without escaping non-ASCII code points).
Perhaps Python 3 users would ask for a unicode output when decoding
though.
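
To make the two output styles concrete, here is how they look with the
Python 3 json module (a sketch; the sample value is arbitrary). The
default escapes everything outside ASCII, while ensure_ascii=False
followed by an explicit encode gives the more compact UTF-8 form:

```python
import json

value = {"name": "L\u00f6wis"}

# Default behaviour: pure-ASCII text, non-ASCII code points escaped.
ascii_text = json.dumps(value)

# Compact alternative: keep the code points, then encode for the wire.
utf8_bytes = json.dumps(value, ensure_ascii=False).encode("utf-8")

print(ascii_text)
print(utf8_bytes)
```

Both forms decode back to the same object; they differ only in size and
in whether the result is text or bytes.
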

-bob

From martin at v.loewis.de  Fri Apr 10 18:11:26 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Apr 2009 18:11:26 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <87vdpcfrj8.fsf@xemacs.org>
References: <loom.20090408T110540-221@post.gmane.org>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410025203.GA199@panix.com>	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>	<49DEBB21.70305@gmail.com>	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>	<49DF05FC.9040208@gmail.com>	<79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com>
	<87vdpcfrj8.fsf@xemacs.org>
Message-ID: <49DF6FAE.3040602@v.loewis.de>

> (3) The default transfer encoding syntax is UTF-8.

Notice that the RFC is partially irrelevant. It only applies
to the application/json mime type, and JSON is used in various
other protocols, using various other encodings.

> I think it's a bad idea for any of the core
> JSON API to accept or produce bytes in any language that provides a
> Unicode string type.

So how do you integrate the encoding detection that the RFC suggests
to be done?

Regards,
Martin

From janssen at parc.com  Fri Apr 10 18:35:44 2009
From: janssen at parc.com (Bill Janssen)
Date: Fri, 10 Apr 2009 09:35:44 PDT
Subject: [Python-Dev] [Email-SIG]  the email module, text,
	and bytes (was Re: Dropping bytes "support" in json)
In-Reply-To: <ACC56383-7F1B-4CB0-908F-E75E1390AE51@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
	<20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com>
	<ACC56383-7F1B-4CB0-908F-E75E1390AE51@python.org>
Message-ID: <92023.1239381344@parc.com>

Barry Warsaw <barry at python.org> wrote:

> In that case, we really need the
> bytes-in-bytes-out-bytes-in-the-chewy-center API first, and build
> things on top of that.

Yep.

Bill

From barry at python.org  Fri Apr 10 18:56:09 2009
From: barry at python.org (Barry Warsaw)
Date: Fri, 10 Apr 2009 12:56:09 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>
Message-ID: <F40AE8EC-08CC-4634-AA82-264587552F47@python.org>

On Apr 10, 2009, at 1:19 AM, glyph at divmod.com wrote:

> On 02:38 am, barry at python.org wrote:
>> So, what I'm really asking is this.  Let's say you agree that there  
>> are use cases for accessing a header value as either the raw  
>> encoded bytes or the decoded unicode.  What should this return:
>>
>> >>> message['Subject']
>>
>> The raw bytes or the decoded unicode?
>
> My personal preference would be to just deprecate this API, and  
> get rid of it, replacing it with a slightly more explicit one.
>
>   message.headers['Subject']
>   message.bytes_headers['Subject']

This is pretty darn clever Glyph.  Stop that! :)

I'm not 100% sure I like the name .bytes_headers or that .headers  
should be the decoded header (rather than have .headers return the  
bytes thingie and say .decoded_headers return the decoded thingies),  
but I do like the general approach.

>> Now, setting headers.  Sometimes you have some unicode thing and  
>> sometimes you have some bytes.  You need to end up with bytes in  
>> the ASCII range and you'd like to leave the header value unencoded  
>> if so. But in both cases, you might have bytes or characters  
>> outside that range, so you need an explicit encoding, defaulting to  
>> utf-8 probably.
>
>   message.headers['Subject'] = 'Some text'
>
> should be equivalent to
>
>   message.headers['Subject'] = Header('Some text')

Yes, absolutely.  I think we're all in general agreement that header  
values should be instances of Header, or subclasses thereof.

> My preference would be that
>
>   message.headers['Subject'] = b'Some Bytes'
>
> would simply raise an exception.  If you've got some bytes, you  
> should instead do
>
>   message.bytes_headers['Subject'] = b'Some Bytes'
>
> or
>
>   message.headers['Subject'] = Header(bytes=b'Some Bytes', encoding='utf-8')
>
> Explicit is better than implicit, right?

Yes.
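
To pin down what is being agreed to, here is a toy sketch of the two views
over one header store. Everything in it is hypothetical: none of these
classes is the email package's API, and the keyword is spelled `raw`
rather than `bytes` only to avoid shadowing the builtin:

```python
class Header:
    """Hypothetical header value that always holds decoded text."""

    def __init__(self, text=None, *, raw=None, encoding="utf-8"):
        if (text is None) == (raw is None):
            raise TypeError("pass exactly one of text or raw")
        if raw is not None:
            if not isinstance(raw, bytes):
                raise TypeError("raw must be bytes")
            text = raw.decode(encoding)
        if not isinstance(text, str):
            raise TypeError("text must be str; use bytes_headers for bytes")
        self.text = text
        self.encoding = encoding

    def encode(self):
        return self.text.encode(self.encoding)


class _TextHeaders:
    def __init__(self, store):
        self._store = store

    def __setitem__(self, name, value):
        # Plain str is wrapped in Header; bytes are rejected by Header.
        self._store[name] = value if isinstance(value, Header) else Header(value)

    def __getitem__(self, name):
        return self._store[name].text


class _BytesHeaders:
    def __init__(self, store):
        self._store = store

    def __setitem__(self, name, value):
        self._store[name] = Header(raw=value)

    def __getitem__(self, name):
        return self._store[name].encode()


class Message:
    def __init__(self):
        store = {}
        self.headers = _TextHeaders(store)
        self.bytes_headers = _BytesHeaders(store)


message = Message()
message.headers['Subject'] = 'Some text'
as_bytes = message.bytes_headers['Subject']

try:
    message.headers['Subject'] = b'Some Bytes'
    rejected = False
except TypeError:
    rejected = True
```
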

Again, I really like the general idea, if I might quibble about some  
of the details.  Thanks for a great suggestion.

-Barry

From fumanchu at aminus.org  Fri Apr 10 18:47:11 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Fri, 10 Apr 2009 09:47:11 -0700
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
Message-ID: <1239382031.8682.11.camel@haku>

On Thu, 2009-04-09 at 22:38 -0400, Barry Warsaw wrote:
> On Apr 9, 2009, at 11:55 AM, Daniel Stutzbach wrote:
> 
> > On Thu, Apr 9, 2009 at 6:01 AM, Barry Warsaw <barry at python.org> wrote:
> > Anyway, aside from that decision, I haven't come up with an elegant  
> > way to allow /output/ in both bytes and strings (input is I think  
> > theoretically easier by sniffing the arguments).
> >
> > Won't this work? (assuming dumps() always returns a string)
> >
> > def dumpb(obj, encoding='utf-8', *args, **kw):
> >     s = dumps(obj, *args, **kw)
> >     return s.encode(encoding)
> 
> So, what I'm really asking is this.  Let's say you agree that there  
> are use cases for accessing a header value as either the raw encoded  
> bytes or the decoded unicode.  What should this return:
> 
>  >>> message['Subject']
> 
> The raw bytes or the decoded unicode?
> 
> Okay, so you've picked one.  Now how do you spell the other way?
> 
> The Message class probably has these explicit methods:
> 
>  >>> Message.get_header_bytes('Subject')
>  >>> Message.get_header_string('Subject')
> 
> (or better names... it's late and I'm tired ;).  One of those maps to  
> message['Subject'] but which is the more obvious choice?
> 
> Now, setting headers.  Sometimes you have some unicode thing and  
> sometimes you have some bytes.  You need to end up with bytes in the  
> ASCII range and you'd like to leave the header value unencoded if so.   
> But in both cases, you might have bytes or characters outside that  
> range, so you need an explicit encoding, defaulting to utf-8 probably.
> 
>  >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
>  >>> Message.set_header('Subject', b'Some bytes')
> 
> One of those maps to
> 
>  >>> message['Subject'] = ???
> 
> I'm open to any suggestions here!

Syntactically, there's no sense in providing:

    Message.set_header('Subject', 'Some text', encoding='utf-16')

...since you could more clearly write the same as:

    Message.set_header('Subject', 'Some text'.encode('utf-16'))

The only interesting case is if you provided a *default* encoding, so that:

    Message.default_header_encoding = 'utf-16'
    Message.set_header('Subject', 'Some text')

...has the same effect.

But it would be far easier to do all the encoding at once in an output()
or serialize() method. Do different headers need different encodings? If
so, make message['Subject'] a subclass of str and give it an .encoding
attribute (with a default). If not, Message.header_encoding should be
sufficient.
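
A rough sketch of that idea, with made-up names (`HeaderValue`,
`serialize`) and deliberately ignoring RFC 2047 and line folding; it only
shows where the encoding step would sit, not how to produce valid mail:

```python
class HeaderValue(str):
    """Hypothetical str subclass that remembers its output encoding."""

    encoding = "utf-8"  # class-level default

    def __new__(cls, text, encoding=None):
        obj = super().__new__(cls, text)
        if encoding is not None:
            obj.encoding = encoding
        return obj


def serialize(headers):
    # All encoding happens here, once, at output time.
    lines = []
    for name, value in headers.items():
        encoding = getattr(value, "encoding", HeaderValue.encoding)
        lines.append(name.encode("ascii") + b": " + value.encode(encoding))
    return b"\r\n".join(lines)


default_output = serialize({"Subject": HeaderValue("Some text")})
latin1_output = serialize({"Subject": HeaderValue("caf\u00e9", encoding="latin-1")})
```
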

Robert Brewer
fumanchu at aminus.org

From barry at python.org  Fri Apr 10 19:08:26 2009
From: barry at python.org (Barry Warsaw)
Date: Fri, 10 Apr 2009 13:08:26 -0400
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <p04330101c6046b191e4a@[192.168.123.162]>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<p04330101c6046b191e4a@[192.168.123.162]>
Message-ID: <595A42B2-0D3B-4886-960B-F16D50D0CC5A@python.org>

On Apr 9, 2009, at 11:41 PM, Tony Nelson wrote:

> At 22:38 -0400 04/09/2009, Barry Warsaw wrote:
> ...
>> So, what I'm really asking is this.  Let's say you agree that there
>> are use cases for accessing a header value as either the raw encoded
>> bytes or the decoded unicode.  What should this return:
>>
>>>>> message['Subject']
>>
>> The raw bytes or the decoded unicode?
>
> That's an easy one:  Subject: is an unstructured header, so it must be
> text, thus Unicode.  We're looking at a high-level representation of an
> email message, with parsed header fields and a MIME message tree.

I'm liking Glyph's suggestion here.  We'll probably have to support  
the message['Subject'] API for backward compatibility, but in that  
case it really should be a bytes API.

>> (or better names... it's late and I'm tired ;).  One of those maps to
>> message['Subject'] but which is the more obvious choice?
>
> Structured header fields are more of a problem.  Any header with
> addresses should return a list of addresses.  I think the default return
> type should depend on the data type.  To get an explicit bytes or string
> or list of addresses, be explicit; otherwise, for convenience, return the
> appropriate type for the particular header field name.

Yes, structured headers are trickier.  In a separate message, James  
Knight makes some excellent points, which I agree with.  However the  
email package obviously cannot support every type of structured header  
possible.  It must support this through extensibility.

The obvious way is through inheritance (i.e. subclasses of Header),  
but in my experience, using inheritance of the Message class really  
doesn't work very well.  You need to pass around factories to parsing  
functions and your application tends to have its own hierarchy of  
subclasses for whatever extra things it needs.  ISTM that subclassing  
is simply not the right pattern to support extensibility in the  
Message objects or Header objects.  Yes, this leads me to think that  
all the MIME* subclasses are essentially /wrong/.

Having said all that, the email package must support structured  
headers.  Look at the insanity which is the current folding whitespace  
splitting and the impossibility of the current code to do the right  
thing for say Subject headers and Received headers, and you begin to  
see why it must be possible to extend this stuff.
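
One way to get that extensibility without subclassing, purely as a sketch:
a registry of per-header parser functions, keyed by field name. The names
`register_header_parser` and `parse_header` are invented for illustration
and are not an existing or planned email package API:

```python
from email.utils import getaddresses

_parsers = {}

def register_header_parser(name):
    """Register a function that turns raw header bytes into structured data."""
    def decorator(func):
        _parsers[name.lower()] = func
        return func
    return decorator

@register_header_parser("To")
def parse_address_list(raw):
    # Assumes an ASCII-only field body; real code would need RFC 2047 too.
    return [addr for _, addr in getaddresses([raw.decode("ascii")])]

def parse_header(name, raw):
    # Unknown headers fall back to the raw bytes, untouched.
    parser = _parsers.get(name.lower())
    return parser(raw) if parser is not None else raw

recipients = parse_header("To", b"a@example.com, b@example.com")
unknown = parse_header("X-Unknown", b"xyz")
```

Applications could then add parsers for their own headers without having
to pass Message or Header subclasses through every parsing function.
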

>> Now, setting headers.  Sometimes you have some unicode thing and
>> sometimes you have some bytes.  You need to end up with bytes in the
>> ASCII range and you'd like to leave the header value unencoded if so.
>> But in both cases, you might have bytes or characters outside that
>> range, so you need an explicit encoding, defaulting to utf-8 probably.
>
> Never for header fields.  The default is always RFC 2047, unless it
> isn't, say for params.
>
> The Message class should create an object of the appropriate subclass of
> Header based on the name (or use the existing object, see other
> discussion), and that should inspect its argument and DTRT or complain.

>>>>> Message.set_header('Subject', 'Some text', encoding='utf-8')
>>>>> Message.set_header('Subject', b'Some bytes')
>>
>> One of those maps to
>>
>>>>> message['Subject'] = ???
>
> The expected data type should depend on the header field.  For Subject:,
> it should be bytes to be parsed or verbatim text.  For To:, it should be
> a list of addresses or bytes or text to be parsed.

At a higher level, yes.  At the low level, it has to be bytes.

> The email package should be pythonic, and not require deep understanding
> of dozens of RFCs to use properly.  Users don't need to know about the
> raw bytes; that's the whole point of MIME and any email package.  It
> should be easy to set header fields with their natural data types, and
> doing it with bad data should produce an error.  This may require a bit
> more care in the message parser, to always produce a parsed message with
> defects.

I agree that we should have some higher level APIs that make it easy  
to compose email messages, and probably easy-ish to parse a byte  
stream into an email message tree.  But we can't build those without  
the lower level raw support.  I'm also convinced that this lower level  
will be the domain of those crazy enough to have the RFCs tattooed to  
the back of their eyelids.

-Barry

From barry at python.org  Fri Apr 10 19:12:48 2009
From: barry at python.org (Barry Warsaw)
Date: Fri, 10 Apr 2009 13:12:48 -0400
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <p04330100c6046a4bedc6@[192.168.123.162]>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
	<p04330100c6046a4bedc6@[192.168.123.162]>
Message-ID: <50EC006F-CF96-45F4-AD71-73B9DE7E510E@python.org>

On Apr 9, 2009, at 11:59 PM, Tony Nelson wrote:

>> Thinking about this stuff makes me nostalgic for the sloppy happy
>> days of Python 2.x
>
> You now have the opportunity to finally unsnarl that mess.  It is not an
> insurmountable opportunity.

No, it's just a full time job <wink>.  Now where did I put that
hack-drink-coffee-twitter clone?

-Barry

From barry at python.org  Fri Apr 10 19:21:45 2009
From: barry at python.org (Barry Warsaw)
Date: Fri, 10 Apr 2009 13:21:45 -0400
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <87zlepf5hf.fsf@xemacs.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
	<87zlepf5hf.fsf@xemacs.org>
Message-ID: <67879F1D-B386-4B9B-8203-86DB977BD7FF@python.org>

On Apr 10, 2009, at 1:22 AM, Stephen J. Turnbull wrote:

>> Those objects have headers and payload.  The payload can be of any
>> type, though I think it generally breaks down into "strings" for
>> text/* types and bytes for anything else (not counting multiparts).
>
> *sigh*  Why are you back-tracking?

I'm not.  Sleep deprivation only makes it seem like that.

> The payload should be of an appropriate *object* type.  Atomic object
> types will have their content stored as string or bytes [nb I use
> Python 3 terminology throughout].  Composite types (multipart/*) won't
> need string or bytes attributes AFAICS.

Yes, agreed.

> Start by implementing the application/octet-stream and
> text/plain;charset=utf-8 object types, of course.

Yes.  See my lament about using inheritance for this.

>> It does seem to make sense to think about headers as text header names
>> and text header values.
>
> I disagree.  IMHO, structured header types should have object values,
> and something like

While I agree, there's still a need for a higher level API that makes
it easy to do the simple things.

> message['to'] = "Barry 'da FLUFL' Warsaw <barry at python.org>"
>
> should be smart enough to detect that it's a string and attempt to
> (flexibly) parse it into a fullname and a mailbox adding escapes, etc.
> Whether these should be structured objects or they can be strings or
> bytes, I'm not sure (probably bytes, not strings, though -- see next
> example).  OTOH
>
> message['to'] = b'''"Barry 'da.FLUFL' Warsaw" <barry at python.org>'''
>
> should assume that the client knows what they are doing, and should
> parse it strictly (and I mean "be a real bastard", eg, raise an
> exception on any non-ASCII octet), merely dividing it into fullname
> and mailbox, and caching the bytes for later insertion in a
> wire-format message.

I agree that the Message class needs to be strict.  A parser needs to  
be lenient; see the .defects attribute introduced in the current email  
package.  Oh, and this reminds me that we still haven't talked about  
idempotency.  That's an important principle in the current email  
package, but do we need to give up on that?
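
The lenient half of that split can be seen in the standard library parser.
What follows is only a minimal sketch, assuming a deliberately malformed
input (a message that claims to be multipart but supplies no boundary
parameter): the parser returns a message object rather than raising, and
records what it found on .defects.

```python
import email
from email import errors

# A deliberately broken message: it claims to be multipart but gives no boundary.
raw = "Content-Type: multipart/mixed\nSubject: broken\n\nnot really multipart\n"

# The parser is lenient: it returns a message object instead of raising.
msg = email.message_from_string(raw)

# Problems found while parsing are recorded on the message, not thrown.
defect_names = [type(d).__name__ for d in msg.defects]
has_boundary_defect = any(
    isinstance(d, errors.NoBoundaryInMultipartDefect) for d in msg.defects
)
print(defect_names, has_boundary_defect)
```

A strict Message class would presumably refuse the same data when an
application tries to set it directly; that part is not shown here.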

>> In that case, I think you want the values as unicodes, and probably
>> the headers as unicodes containing only ASCII.  So your table would be
>> strings in both cases.  OTOH, maybe your application cares about the
>> raw underlying encoded data, in which case the header names are
>> probably still strings of ASCII-ish unicodes and the values are
>> bytes.  It's this distinction (and I think the competing use cases)
>> that make a true Python 3.x API for email more complicated.
>
> I don't see why you can't have the email API be specific, with
> message['to'] always returning a structured_header object (or maybe
> even more specifically an address_header object), and methods like
>
> message['to'].build_header_as_text()
>
> which returns
>
> """To: "Barry 'da.FLUFL' Warsaw" <barry at python.org>"""
>
> and
>
> message['to'].build_header_in_wire_format()
>
> which returns
>
> b"""To: "Barry 'da.FLUFL' Warsaw" <barry at python.org>"""
>
> Then have email.textview.Message and email.wireview.Message which
> provide a simple interface where message['to'] would invoke
> .build_header_as_text() and .build_header_in_wire_format()
> respectively.

This seems similar to Glyph's basic idea, but with a different spelling.
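
The structured-header spelling quoted above can be sketched roughly as
follows.  This is an illustration under assumptions, not an existing API:
the AddressHeader class is hypothetical, the address is a placeholder, and
only the two method names come from the quoted proposal.  The parsing and
formatting helpers are the real email.utils functions.

```python
from email.utils import formataddr, parseaddr


class AddressHeader:
    """Hypothetical structured header holding one parsed address."""

    def __init__(self, name, value):
        self.name = name
        # parseaddr splits 'Full Name <mailbox>' into its two parts.
        self.fullname, self.mailbox = parseaddr(value)

    def build_header_as_text(self):
        # Text view: a str, suitable for display.
        return "%s: %s" % (self.name, formataddr((self.fullname, self.mailbox)))

    def build_header_in_wire_format(self):
        # Wire view: bytes.  This sketch only handles ASCII; a real
        # implementation would RFC 2047-encode non-ASCII display names.
        return self.build_header_as_text().encode("ascii")


to = AddressHeader("To", '"Barry Warsaw" <barry@example.org>')
print(to.build_header_as_text())
print(to.build_header_in_wire_format())
```

The textview and wireview Message classes from the proposal would then
simply pick one of the two methods when message['to'] is read.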

>> Thinking about this stuff makes me nostalgic for the sloppy happy
>> days of Python 2.x
>
> Er, yeah.
>
> Nostalgic-for-the-BITNET-days-where-everything-was-Just-EBCDIC-ly y'rs,

Can I have my uucp address back now?
-Barry

From v+python at g.nevcal.com  Fri Apr 10 20:00:54 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Fri, 10 Apr 2009 11:00:54 -0700
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <F40AE8EC-08CC-4634-AA82-264587552F47@python.org>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>
	<F40AE8EC-08CC-4634-AA82-264587552F47@python.org>
Message-ID: <49DF8956.5050501@g.nevcal.com>

On approximately 4/10/2009 9:56 AM, came the following characters from 
the keyboard of Barry Warsaw:
> On Apr 10, 2009, at 1:19 AM, glyph at divmod.com wrote:
>> On 02:38 am, barry at python.org wrote:
>>> So, what I'm really asking is this.  Let's say you agree that there 
>>> are use cases for accessing a header value as either the raw encoded 
>>> bytes or the decoded unicode.  What should this return:
>>>
>>> >>> message['Subject']
>>>
>>> The raw bytes or the decoded unicode?
>>
>> My personal preference would be to just deprecate this API, and 
>> get rid of it, replacing it with a slightly more explicit one.
>>
>>   message.headers['Subject']
>>   message.bytes_headers['Subject']
>
> This is pretty darn clever Glyph.  Stop that! :)
>
> I'm not 100% sure I like the name .bytes_headers or that .headers 
> should be the decoded header (rather than have .headers return the 
> bytes thingie and say .decoded_headers return the decoded thingies), 
> but I do like the general approach.

If one name has to be longer than the other, it should be the bytes 
version.  Real user code is more likely to want to use the text version, 
and hopefully there will be more of that type of code than 
implementations using bytes.

Of course, one could use message.header and message.bythdr and they'd be 
the same length.
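
For concreteness, the two-attribute spelling might look something like the
sketch below.  The Message class here is hypothetical and assumes the raw
bytes are what gets stored, with the text view derived on demand; only the
attribute names .headers and .bytes_headers come from the thread.  The
decoding uses the real email.header helpers.

```python
from email.header import decode_header, make_header


class Message:
    """Hypothetical message storing raw header bytes, with two views."""

    def __init__(self):
        # Raw, still-encoded header values keyed by header name.
        self.bytes_headers = {}

    @property
    def headers(self):
        # Decoded view: RFC 2047 encoded-words become readable text.
        return {
            name: str(make_header(decode_header(raw.decode("ascii"))))
            for name, raw in self.bytes_headers.items()
        }


message = Message()
message.bytes_headers["Subject"] = b"=?utf-8?q?Caf=C3=A9?="
print(message.bytes_headers["Subject"])
print(message.headers["Subject"])
```

Whichever name ends up shorter, the point of the split is that the caller
states which representation is wanted instead of message['Subject'] having
to guess.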

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From fuzzyman at voidspace.org.uk  Fri Apr 10 20:06:13 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 10 Apr 2009 19:06:13 +0100
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <49DF8956.5050501@g.nevcal.com>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>	<F40AE8EC-08CC-4634-AA82-264587552F47@python.org>
	<49DF8956.5050501@g.nevcal.com>
Message-ID: <49DF8A95.4010700@voidspace.org.uk>

Glenn Linderman wrote:
> On approximately 4/10/2009 9:56 AM, came the following characters from 
> the keyboard of Barry Warsaw:
>> On Apr 10, 2009, at 1:19 AM, glyph at divmod.com wrote:
>>> On 02:38 am, barry at python.org wrote:
>>>> So, what I'm really asking is this.  Let's say you agree that there 
>>>> are use cases for accessing a header value as either the raw 
>>>> encoded bytes or the decoded unicode.  What should this return:
>>>>
>>>> >>> message['Subject']
>>>>
>>>> The raw bytes or the decoded unicode?
>>>
>>> My personal preference would be to just deprecate this API, and 
>>> get rid of it, replacing it with a slightly more explicit one.
>>>
>>>   message.headers['Subject']
>>>   message.bytes_headers['Subject']
>>
>> This is pretty darn clever Glyph.  Stop that! :)
>>
>> I'm not 100% sure I like the name .bytes_headers or that .headers 
>> should be the decoded header (rather than have .headers return the 
>> bytes thingie and say .decoded_headers return the decoded thingies), 
>> but I do like the general approach.
>
> If one name has to be longer than the other, it should be the bytes 
> version.  Real user code is more likely to want to use the text 
> version, and hopefully there will be more of that type of code than 
> implementations using bytes.
>
> Of course, one could use message.header and message.bythdr and they'd 
> be the same length.
>
>
Shouldn't headers always be text?

Michael

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From stephen at xemacs.org  Fri Apr 10 20:13:35 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 11 Apr 2009 03:13:35 +0900
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49DF6FAE.3040602@v.loewis.de>
References: <loom.20090408T110540-221@post.gmane.org>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410025203.GA199@panix.com>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<49DF05FC.9040208@gmail.com>
	<79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com>
	<87vdpcfrj8.fsf@xemacs.org> <49DF6FAE.3040602@v.loewis.de>
Message-ID: <87r600fkc0.fsf@xemacs.org>

"Martin v. L?wis" writes:

 > > (3) The default transfer encoding syntax is UTF-8.
 > 
 > Notice that the RFC is partially irrelevant. It only applies
 > to the application/json mime type, and JSON is used in various
 > other protocols, using various other encodings.

Sure.  That's their problem.  In Python, Unicode is the native
encoding, and we have codecs to deal with the outside world, no?  That
happens to match very well not only with RFC 4627, but the sidebar on
json.org that defines JSON.

 > > I think it's a bad idea for any of the core JSON API to accept or
 > > produce bytes in any language that provides a Unicode string type.
 > 
 > So how do you integrate the encoding detection that the RFC suggests
 > to be done?

I suggest you don't.  That's mission creep.  Think about writing tests
for it, and remember that out in the wild those "various other
encodings" almost certainly include Shift JIS, Big5, and KOI8-R.  Both
those considerations point to "er, let's delegate detection and
en/decoding to the nice folks who maintain the codec suite."  Where
it's embedded in some other protocol which specifies a TES, the TES
can be implemented there, too.

As I wrote earlier, I don't see anything wrong with providing a
wrapper module that deals with some default/common/easy cases.  But
I'd stick it in the contrib directory.
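
Such a wrapper could be as small as the sketch below.  The function names
loads_bytes and dumps_bytes are hypothetical, and the explicit encoding
argument is an assumption standing in for whatever the enclosing protocol
specifies; the core json calls only ever see str.

```python
import json


def loads_bytes(data, encoding="utf-8"):
    """Hypothetical wrapper: decode with a codec, then parse text."""
    # The codec layer owns the bytes-to-text step ...
    text = data.decode(encoding)
    # ... and the core JSON API only ever sees str.
    return json.loads(text)


def dumps_bytes(obj, encoding="utf-8"):
    """Hypothetical wrapper: serialize to text, then encode with a codec."""
    return json.dumps(obj, ensure_ascii=False).encode(encoding)


# Round trip through a non-Unicode encoding chosen by the caller.
payload = dumps_bytes({"title": "日本語"}, encoding="shift_jis")
restored = loads_bytes(payload, encoding="shift_jis")
print(restored)
```

No encoding detection is attempted; a caller who does not know the encoding
has a codec problem, not a JSON problem.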

From barry at python.org  Fri Apr 10 20:55:23 2009
From: barry at python.org (Barry Warsaw)
Date: Fri, 10 Apr 2009 14:55:23 -0400
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <49DF8956.5050501@g.nevcal.com>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>
	<F40AE8EC-08CC-4634-AA82-264587552F47@python.org>
	<49DF8956.5050501@g.nevcal.com>
Message-ID: <71E1EA03-6E24-4A28-A47A-4EA2D501CC6D@python.org>

On Apr 10, 2009, at 2:00 PM, Glenn Linderman wrote:

> If one name has to be longer than the other, it should be the bytes  
> version.  Real user code is more likely to want to use the text  
> version, and hopefully there will be more of that type of code than  
> implementations using bytes.

I'm not sure we know that yet, actually.  Nothing written for Python 2  
counts, and email is too broken in 3 for any sane person to be writing  
such code for Python 3.

> Of course, one could use message.header and message.bythdr and  
> they'd be the same length.

I was trying to figure out what  a 'thdr' was that we'd want to index  
'by' it. :)

-Barry

From barry at python.org  Fri Apr 10 20:55:56 2009
From: barry at python.org (Barry Warsaw)
Date: Fri, 10 Apr 2009 14:55:56 -0400
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <49DF8A95.4010700@voidspace.org.uk>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>	<F40AE8EC-08CC-4634-AA82-264587552F47@python.org>
	<49DF8956.5050501@g.nevcal.com> <49DF8A95.4010700@voidspace.org.uk>
Message-ID: <BBE1C6FA-6DA7-4E61-ABB7-7276AA998872@python.org>

On Apr 10, 2009, at 2:06 PM, Michael Foord wrote:

> Shouldn't headers always be text?

/me weeps

From stephen at xemacs.org  Fri Apr 10 21:04:22 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 11 Apr 2009 04:04:22 +0900
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <67879F1D-B386-4B9B-8203-86DB977BD7FF@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
	<87zlepf5hf.fsf@xemacs.org>
	<67879F1D-B386-4B9B-8203-86DB977BD7FF@python.org>
Message-ID: <87prfkfhzd.fsf@xemacs.org>

Shouldn't this thread move lock stock and .signature to email-sig?

Barry Warsaw writes:

 > >> It does seem to make sense to think about headers as text header
 > >> names and text header values.
 > >
 > > I disagree.  IMHO, structured header types should have object values,
 > > and something like
 > 
 > While I agree, there's still a need for a higher level API that makes
 > it easy to do the simple things.

Sure.  I'm suggesting that the way to determine whether something is
simple or not is by whether it falls out naturally from correct
structure.  Ie, no operations that only a Cirque du Soleil juggler can
perform are allowed.

 > I agree that the Message class needs to be strict.  A parser needs to  
 > be lenient;

Not always.  The Postel Principle only applies to stuph coming in off
the wire.  But we're *also* going to be parsing pseudo-email
components that are being handed to us by applications (eg, the
perennial control-character-in-the-unremovable-address Mailman bug).
Our parser should Just Say No to that crap.
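
A strict check of that kind might look like the following sketch.  The
function name and the exact rule set (reject non-ASCII octets and control
characters other than horizontal tab) are assumptions made for illustration,
not a statement of what the email package does.

```python
def check_header_value(value):
    """Hypothetical strict check for application-supplied header bytes."""
    for position, octet in enumerate(value):
        if octet >= 0x80:
            # Non-ASCII octets are never valid in a raw header.
            raise ValueError(
                "non-ASCII octet 0x%02x at offset %d" % (octet, position))
        if (octet < 0x20 and octet != 0x09) or octet == 0x7F:
            # Control characters other than tab are rejected outright.
            raise ValueError(
                "control character 0x%02x at offset %d" % (octet, position))
    return value


good = check_header_value(b"Barry Warsaw <barry@example.org>")

try:
    check_header_value(b"bad\x00user@example.org")
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)
```

Data arriving off the wire would go through the lenient parser instead and
pick up a defect rather than an exception.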

 > see the .defects attribute introduced in the current email  
 > package.  Oh, and this reminds me that we still haven't talked about  
 > idempotency.  That's an important principle in the current email  
 > package, but do we need to give up on that?

"Idempotency"?  I'm not sure what that means in the context of the
email package ... multiplication by zero?<wink>  Do you mean that
.parse().to_wire() should be idempotent?  Yes, I think that's a good
idea, and it shouldn't be too hard to implement by (optionally?)
caching the whole original message or individual components (headers
with all whitespace including folding cached verbatim, etc).  I think
caching has to be done, since stuff like "did the original fold with a
leading tab or a leading space, and at what column" and so on seems
kind of pointless to encode as attributes on Header objects.
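
The caching idea can be sketched as below.  CachedHeader is a hypothetical
class, and the rule that any edit discards the cached bytes is an assumption
about how such a cache would be invalidated.

```python
class CachedHeader:
    """Hypothetical header that remembers its original wire bytes."""

    def __init__(self, raw_line):
        # Keep the folded line exactly as received, whitespace and all.
        self._raw = raw_line
        name, _, value = raw_line.partition(b":")
        self.name = name.decode("ascii")
        # Unfold for the parsed view: drop CRLF, keep the folding whitespace.
        self._value = value.replace(b"\r\n", b"").strip().decode("ascii")

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        # An edit invalidates the cache; the header must be regenerated.
        self._value = new_value
        self._raw = None

    def to_wire(self):
        if self._raw is not None:
            return self._raw  # untouched: emit the original bytes verbatim
        return ("%s: %s\r\n" % (self.name, self._value)).encode("ascii")


original = b"Subject: a folded\r\n\tsubject line\r\n"
header = CachedHeader(original)
print(header.to_wire() == original)
```

With this shape the fold style and column never need to become attributes;
they survive only as long as nobody touches the header.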

[Description of MessageTextView and MessageWireView elided.]

 > This seems similar to Glyph's basic idea, but with a different spelling.

Yes.  I don't much care which way it's done, and Glyph's style of
spelling is more explicit.  But I was thinking in terms of the number
of people who are surely going to sing "Mama don' 'low no Unicodes
roun' here" and squeal "codec WTF?! outta mah face, man!"

From stephen at xemacs.org  Fri Apr 10 21:06:59 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 11 Apr 2009 04:06:59 +0900
Subject: [Python-Dev] [Email-SIG]  the email module, text,
	and bytes (was Re: Dropping bytes "support" in json)
In-Reply-To: <92023.1239381344@parc.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
	<20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com>
	<ACC56383-7F1B-4CB0-908F-E75E1390AE51@python.org>
	<92023.1239381344@parc.com>
Message-ID: <87ocv4fhv0.fsf@xemacs.org>

Bill Janssen writes:
 > Barry Warsaw <barry at python.org> wrote:
 > 
 > > In that case, we really need the
 > > bytes-in-bytes-out-bytes-in-the-chewy-center API first, and build
 > > things on top of that.
 > 
 > Yep.

Uh, I hate to rain on a parade, but isn't that how we arrived at the
*current* email package?

From pje at telecommunity.com  Fri Apr 10 21:05:17 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 10 Apr 2009 15:05:17 -0400
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DF08D8.9080806@gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de>
	<49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz>
	<49DE9FB4.9060908@gmail.com>
	<43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com>
	<ca471dc20904092126t60b8b22cma9c23429ba53d7be@mail.gmail.com>
	<49DF08D8.9080806@gmail.com>
Message-ID: <20090410190248.913033A4063@sparrow.telecommunity.com>

At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote:
>This problem (slow application startup times due to too many imports at
>startup, which can in turn be due to top level imports for library
>or framework functionality that a given application doesn't actually
>use) is actually the main reason I sometimes wish for a nice, solid lazy
>module import mechanism that manages to avoid the potential deadlock
>problems created by using import statements inside functions.

Have you tried http://pypi.python.org/pypi/Importing ? Or more 
specifically, http://peak.telecommunity.com/DevCenter/Importing#lazy-imports ?

It does of course use the import lock, but as long as your top-level 
module code doesn't acquire locks (directly or indirectly), it 
shouldn't be possible to deadlock.  (Or more precisely, to add any 
*new* deadlocks that you didn't already have.)
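
As a point of comparison only, later versions of the standard library
(Python 3.5 and up) offer importlib.util.LazyLoader, which gives a similar
effect.  The sketch below uses that class rather than the Importing package
mentioned above, and the lazy_import helper name is hypothetical.  It makes
no claim about the locking behaviour discussed in this message.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose code runs on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]  # already imported; nothing to defer
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError("no module named %r" % name)
    # Wrap the real loader so execution is postponed.
    spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # Sets up the deferral; the module body does not run yet.
    spec.loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")
# The module body executes here, on first attribute access.
hue, saturation, value = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
print(hue, saturation, value)
```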

From barry at python.org  Fri Apr 10 21:04:01 2009
From: barry at python.org (Barry Warsaw)
Date: Fri, 10 Apr 2009 15:04:01 -0400
Subject: [Python-Dev] [Email-SIG]  the email module, text,
	and bytes (was Re: Dropping bytes "support" in json)
In-Reply-To: <87ocv4fhv0.fsf@xemacs.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org>
	<1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
	<20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com>
	<ACC56383-7F1B-4CB0-908F-E75E1390AE51@python.org>
	<92023.1239381344@parc.com> <87ocv4fhv0.fsf@xemacs.org>
Message-ID: <F87C1713-27A1-4D2B-BA42-1AC70B77073C@python.org>

On Apr 10, 2009, at 3:06 PM, Stephen J. Turnbull wrote:

> Bill Janssen writes:
>> Barry Warsaw <barry at python.org> wrote:
>>
>>> In that case, we really need the
>>> bytes-in-bytes-out-bytes-in-the-chewy-center API first, and build
>>> things on top of that.
>>
>> Yep.
>
> Uh, I hate to rain on a parade, but isn't that how we arrived at the
> *current* email package?

Not really.  We got here because <ahem>we</ahem> were too damn sloppy  
about the distinction.

I'm going to remove python-dev from subsequent follow ups.  Please  
join us at email-sig for further discussion.

Barry

From aahz at pythoncraft.com  Fri Apr 10 21:05:56 2009
From: aahz at pythoncraft.com (Aahz)
Date: Fri, 10 Apr 2009 12:05:56 -0700
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <BBE1C6FA-6DA7-4E61-ABB7-7276AA998872@python.org>
References: <ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>
	<F40AE8EC-08CC-4634-AA82-264587552F47@python.org>
	<49DF8956.5050501@g.nevcal.com> <49DF8A95.4010700@voidspace.org.uk>
	<BBE1C6FA-6DA7-4E61-ABB7-7276AA998872@python.org>
Message-ID: <20090410190555.GA5843@panix.com>

On Fri, Apr 10, 2009, Barry Warsaw wrote:
> On Apr 10, 2009, at 2:06 PM, Michael Foord wrote:
>>
>> Shouldn't headers always be text?
>
> /me weeps

/me hands Barry a hankie
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

Why is this newsgroup different from all other newsgroups?

From turnbull at sk.tsukuba.ac.jp  Fri Apr 10 21:22:09 2009
From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Sat, 11 Apr 2009 04:22:09 +0900
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <1239382031.8682.11.camel@haku>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<1239382031.8682.11.camel@haku>
Message-ID: <87myaofh5q.fsf@xemacs.org>

Robert Brewer writes:

 > Syntactically, there's no sense in providing:
 > 
 >     Message.set_header('Subject', 'Some text', encoding='utf-16')
 > 
 > ...since you could more clearly write the same as:
 > 
 >     Message.set_header('Subject', 'Some text'.encode('utf-16'))

Which you now must *parse* and guess the encoding to determine how to
RFC-2047-encode the binary mush.  I think the encoding parameter is
necessary here.
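
A small sketch of why the label matters, using the existing
email.header.Header class.  The encode_subject helper is hypothetical; it
stands in for the set_header(..., encoding=...) spelling under discussion.

```python
from email.header import Header, decode_header


def encode_subject(text, encoding):
    """Hypothetical helper: RFC 2047-encode text, labelling its charset."""
    return Header(text, charset=encoding).encode()


wire_value = encode_subject("Grüße", "utf-8")
# The encoded-word carries its charset label, so decoding needs no guessing.
(raw_bytes, charset), = decode_header(wire_value)
decoded = raw_bytes.decode(charset)

# Pre-encoded bytes carry no such label: the same octets read differently
# under different codecs, so the receiver would have to guess.
mush = "Some text".encode("utf-16")
print(wire_value, decoded, mush.decode("utf-16") == mush.decode("latin-1"))
```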

 > But it would be far easier to do all the encoding at once in an
 > output() or serialize() method. Do different headers need different
 > encodings?

You can have multiple encodings within a single header (and a naïve
algorithm might very well encode "The price of Gödel-Escher-Bach is
€25" as "The price of =?ISO-8859-1?Q?G=F6del-Escher-Bach?= is
=?ISO-8859-15?Q?=A425?=").
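
The mixed-charset header above can be taken apart with the existing
email.header functions.  This is only a sketch of the decoding direction;
it does not show how an encoder would choose the charsets.

```python
from email.header import decode_header, make_header

raw = ("The price of =?ISO-8859-1?Q?G=F6del-Escher-Bach?= is "
       "=?ISO-8859-15?Q?=A425?=")

# decode_header yields (bytes, charset) chunks; charset is None for plain ASCII.
chunks = decode_header(raw)
charsets = {charset for _, charset in chunks}

# make_header decodes each chunk with its own charset and joins the result.
text = str(make_header(chunks))
print(charsets)
print(text)
```

One header value, three differently labelled chunks: a single per-header
.encoding attribute cannot describe this.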

 > If so, make message['Subject'] a subclass of str and give it an
 > .encoding attribute (with a default).

But if you've set the .encoding attribute, you don't need to encode
'Some text'; .set_header() can take care of it for you.  And what
about the possibility that the encoding attributes disagree with the
argument you passed to the codec?

From ctb at msu.edu  Fri Apr 10 22:38:09 2009
From: ctb at msu.edu (C. Titus Brown)
Date: Fri, 10 Apr 2009 13:38:09 -0700
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
Message-ID: <20090410203809.GA24530@idyll.org>

Hi all,

this year we have 10-12 GSoC applications that I've put in the "relevant
to core Python development" category.  These projects, if mentors etc
are found, are *guaranteed* a slot under the PSF GSoC umbrella.  As
backup GSoC admin and general busybody, I've taken on the work of
coordinating these as a special subgroup within the PSF GSoC, and I
thought it would be good to mention them to python-dev.

Note that all of them have been run by a few different committers,
including Martin, Tarek, Benjamin, and Brett, and they've been obliging
enough to triage a few of them.  Thanks, guys!

Here's what's left after that triage.  Note that except for the four at
the top, these have all received positive support from *someone* who is
a committer and I don't think we need to discuss them here -- patches
etc. can go through normal "python-dev" channels during the course of the
summer.

I am looking for feedback on the first four, though.  Can these
reasonably be considered "core" priorities for Python?  Remember, this
"costs" us something in the sense of preferring these over Python
subprojects like (random example) Cython, NumPy, PySoy, Tahoe, Gajim,
etc.

---

Questionable "core":

2x "port NumPy to py3k" -- NumPy is a major Python module and porting it
	to py3k fits with Guido's request that "more stuff get ported".
	To be clear, I don't think anyone expects all of NumPy to get
	ported this summer, but these students will work through issues
	associated with porting big chunks o' code to py3k.

	One medium/strong proposal, one medium/weak proposal.

Comments/thoughts?

2x "improve testing tools for py3k" -- variously focus on improving test
	coverage and testing wrappers.

	One proposes to provide a nice wrapper to make nose and py.test
	capable of running the regrtests, which (with no change to
	regrtest) would let people run tests in parallel, distribute or
	run tests across multiple machines (including Snakebite), tag
	and run subsets of tests with personal and/or public tags, and
	otherwise take advantage of many of the nice features of nose
	and py.test.

	The other proposes to measure & increase the code coverage of
	the py3k tests in both Python and C, integrate across multiple
	machines, and otherwise provide a nice set of integrated reports
	that anyone can generate on their own machines.  This proposal,
	in particular, could move smoothly towards the effort to produce
	a "Python-wide" test suite for CPython/IronPython/PyPy/Jython.
	(This wasn't integrated into the proposal because I only found
	out about it after the proposals were due.)

	I personally think that both testing proposals are good, and
	they grew out of conversations I had with Brett, who thinks that
	the general ideas are good.  So, err, I'm looking for pushback,
	I guess ;).  I can expand on these ideas a bit if people are
	interested.

	Both proposals are medium at least, and I've personally been
	positively impressed with the student interaction.

Comments/thoughts?

---

Unquestionably "core" by my criteria above:

3to2 tool -- 'nuff said.

subprocess improvement -- integrating, testing, and proposing some of
	the various subprocess improvements that have passed across this
	list & the bug tracker

IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker
	issues relating to IDLE and Tkinter.

roundup VCS integration / build tools to support core development --
	a single student proposed both of these and has received some
	support.  See http://slexy.org/view/s2pFgWxufI for details.

sphinx framework improvement -- support for per-paragraph comments and
	user/developer interface for submitting/committing fixes 

2x "keyring package" -- see
http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/.
The poorer one of these will probably be axed unless Tarek gives it
strong support.

--

--titus
-- 
C. Titus Brown, ctb at msu.edu

From ggpolo at gmail.com  Fri Apr 10 22:53:23 2009
From: ggpolo at gmail.com (Guilherme Polo)
Date: Fri, 10 Apr 2009 17:53:23 -0300
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <20090410203809.GA24530@idyll.org>
References: <20090410203809.GA24530@idyll.org>
Message-ID: <ac2200130904101353j2fe0dd21o77f01894a520fbd3@mail.gmail.com>

On Fri, Apr 10, 2009 at 5:38 PM, C. Titus Brown <ctb at msu.edu> wrote:
> Hi all,
>
> this year we have 10-12 GSoC applications that I've put in the "relevant
> to core Python development" category.  These projects, if mentors etc
> are found, are *guaranteed* a slot under the PSF GSoC umbrella.  As
> backup GSoC admin and general busybody, I've taken on the work of
> coordinating these as a special subgroup within the PSF GSoC, and I
> thought it would be good to mention them to python-dev.
>
> Note that all of them have been run by a few different committers,
> including Martin, Tarek, Benjamin, and Brett, and they've been obliging
> enough to triage a few of them.  Thanks, guys!
>
> Here's what's left after that triage.
> .
> .
>
> IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker
>        issues relating to IDLE and Tkinter.
>

Is it important, for the discussion, to mention that it also involves
testing this area (idle and tkinter), Titus?  I'm considering this
more important than "just" dealing with the tracker issues.

> --titus
> --
> C. Titus Brown, ctb at msu.edu

Regards,

-- 
-- Guilherme H. Polo Goncalves

From ctb at msu.edu  Fri Apr 10 23:02:26 2009
From: ctb at msu.edu (C. Titus Brown)
Date: Fri, 10 Apr 2009 14:02:26 -0700
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <ac2200130904101353j2fe0dd21o77f01894a520fbd3@mail.gmail.com>
References: <20090410203809.GA24530@idyll.org>
	<ac2200130904101353j2fe0dd21o77f01894a520fbd3@mail.gmail.com>
Message-ID: <20090410210226.GB13018@idyll.org>

On Fri, Apr 10, 2009 at 05:53:23PM -0300, Guilherme Polo wrote:
-> >
-> > IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker
-> >        issues relating to IDLE and Tkinter.
-> >
-> 
-> Is it important, for the discussion, to mention that it also involves
-> testing this area (idle and tkinter), Titus ? I'm considering this
-> more important than "just" dealing with the tracker issues.

What, I tell you that your app is going to be accepted and we shouldn't
argue about it, and you want to argue about it? ;)

--titus
-- 
C. Titus Brown, ctb at msu.edu

From tjreedy at udel.edu  Fri Apr 10 23:05:17 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 10 Apr 2009 17:05:17 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410025203.GA199@panix.com>	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
Message-ID: <grocad$e1c$1@ger.gmane.org>

glyph at divmod.com wrote:
> 
> On 03:21 am, ncoghlan at gmail.com wrote:
>> Barry Warsaw wrote:
> 
>>> I don't know whether the parameter thing will work or not, but you're
>>> probably right that we need to get the bytes-everywhere API first.
> 
>> Given that json is a wire protocol, that sounds like the right approach
>> for json as well. Once bytes-everywhere works, then a text API can be
>> built on top of it, but it is difficult to build a bytes API on top of a
>> text one.
> 
> I wish I could agree, but JSON isn't really a wire protocol.  According 
> to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the 
> serialization of structured data".  There are some notes about encoding, 
> but it is very clearly described in terms of unicode code points.
>> So I guess the IO library *is* the right model: bytes at the bottom of
>> the stack, with text as a wrapper around it (mediated by codecs).
> 
> In email's case this is true, but in JSON's case it's not.  JSON is a 
> format defined as a sequence of code points; MIME is defined as a 
> sequence of octets.

What is the 'bytes support' issue for json?  Is it about content within 
a json text? Or about the transport format of a json text?

Reading rfc4627, a json text is a unicode string representation of an 
instance of one of 6 classes.  In Python terms, they are NoneType, bool, 
numbers (int, float, decimal?), (unicode) str, list, and [string-keyed] 
dict.  The representation is nearly identical to Python's literals and 
displays.

For transport, the encoding SHALL be one of UTF-8, -16LE/BE, -32LE/BE, 
with UTF-8 the 'default'.

So a json parser (a restricted eval()) tokenizes and parses a stream of 
unicode chars which in Python could come from either a unicode string or 
decoded bytes object.  The bytes decoding could be either bulk or 
incremental.

Similarly, a json generator (a repr()-like function) produces a stream 
of unicode chars which again could be optionally encoded to bytes, 
either incrementally or in bulk.
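
A rough sketch of the bulk versus incremental decoding mentioned above. 
The helper name decode_chunks and the sample payload are made up for 
illustration; codecs.getincrementaldecoder and json.loads are the 
stdlib calls.  The chunk boundary is deliberately placed inside a 
multi-byte character, which is the case a naive per-chunk decode would 
get wrong.

```python
import codecs
import json


def decode_chunks(chunks, encoding="utf-8"):
    """Decode an iterable of bytes chunks into one str, incrementally."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True makes the decoder complain about a truncated sequence.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Hypothetical payload as it might arrive off the wire.
payload = '{"caf\u00e9": [1, 2.5, null]}'.encode("utf-8")

# Split in the middle of the two-byte UTF-8 sequence for the accented 'e'.
chunks = [payload[:6], payload[6:]]

bulk = json.loads(payload.decode("utf-8"))    # bulk decode, then parse
incremental = json.loads(decode_chunks(chunks))  # incremental decode, then parse
```

Either way the parser itself only ever sees unicode characters; the 
bytes handling sits entirely in front of it.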

The standard does not specify any correspondence between representations 
and domain objects.  For Python, making 'null', 'true', and 'false' 
inter-convert with None, True, False is obvious.  Numbers are slightly 
more problematic.  A generator could produce decimal literals from 
both floats and decimals but without a non-json extension, a parser 
could only convert back to one, so the other would not round-trip. (Int 
could be handled by the presence or absence of '.0'.)  Similarly, tuples 
could be represented, like lists, as json square-bracketed arrays, but 
they would be converted back to lists, not tuples, unless a non-json 
extension were used.

So the two possible bytes-support content issues I see are how to 
represent them as legal json strings and/or whether some device should 
be added to make them round-trip.  But as indicated above, these two 
issues are not unique to bytes.
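
To make the round-trip point concrete, here is a small sketch using 
the stdlib json module.  The dictionary contents are invented for the 
example, and the behaviour shown is that of the json module as I 
understand it, not something the RFC mandates.

```python
import json

# One value for each kind of object discussed above (sample data only).
original = {
    "none": None,
    "flag": True,
    "count": 3,        # int: written without '.0'
    "ratio": 3.0,      # float: written with '.0'
    "name": "spam",
    "point": (1, 2),   # tuple: written as a JSON array
}

text = json.dumps(original)   # generator: Python objects -> JSON text
restored = json.loads(text)   # parser: JSON text -> Python objects

# The tuple does not survive the round trip; it comes back as a list.
point_type_before = type(original["point"])
point_type_after = type(restored["point"])
```

Here point_type_before is tuple and point_type_after is list, while 
the int and the float keep their types because of the '.0'.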

Terry Jan Reedy

From tleeuwenburg at gmail.com  Fri Apr 10 23:26:12 2009
From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg)
Date: Sat, 11 Apr 2009 07:26:12 +1000
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <20090410203809.GA24530@idyll.org>
References: <20090410203809.GA24530@idyll.org>
Message-ID: <43c8685c0904101426q3796d459w2510d236f5f5831@mail.gmail.com>

Well, I think Numpy is of huge importance to a major Python user segment,
the scientific community. I don't know if that makes it 'core', but I
strongly agree that it's important.
Better testing is always useful, and more "core", but IMO less important.

-T

On Sat, Apr 11, 2009 at 6:38 AM, C. Titus Brown <ctb at msu.edu> wrote:

> Hi all,
>
> this year we have 10-12 GSoC applications that I've put in the "relevant
> to core Python development" category.  These projects, if mentors etc
> are found, are *guaranteed* a slot under the PSF GSoC umbrella.  As
> backup GSoC admin and general busybody, I've taken on the work of
> coordinating these as a special subgroup within the PSF GSoC, and I
> thought it would be good to mention them to python-dev.
>
> Note that all of them have been run by a few different committers,
> including Martin, Tarek, Benjamin, and Brett, and they've been obliging
> enough to triage a few of them.  Thanks, guys!
>
> Here's what's left after that triage.  Note that except for the four at
> the top, these have all received positive support from *someone* who is
> a committer and I don't think we need to discuss them here -- patches
> etc. can go through normal "python-dev" channels during the course of the
> summer.
>
> I am looking for feedback on the first four, though.  Can these
> reasonably be considered "core" priorities for Python?  Remember, this
> "costs" us something in the sense of preferring these over Python
> subprojects like (random example) Cython, NumPy, PySoy, Tahoe, Gajim,
> etc.
>
> ---
>
> Questionable "core":
>
> 2x "port NumPy to py3k" -- NumPy is a major Python module and porting it
>        to py3k fits with Guido's request that "more stuff get ported".
>        To be clear, I don't think anyone expects all of NumPy to get
>        ported this summer, but these students will work through issues
>        associated with porting big chunks o' code to py3k.
>
>        One medium/strong proposal, one medium/weak proposal.
>
> Comments/thoughts?
>
> 2x "improve testing tools for py3k" -- variously focus on improving test
>        coverage and testing wrappers.
>
>        One proposes to provide a nice wrapper to make nose and py.test
>        capable of running the regrtests, which (with no change to
>        regrtest) would let people run tests in parallel, distribute or
>        run tests across multiple machines (including Snakebite), tag
>        and run subsets of tests with personal and/or public tags, and
>        otherwise take advantage of many of the nice features of nose
>        and py.test.
>
>        The other proposes to measure & increase the code coverage of
>        the py3k tests in both Python and C, integrate across multiple
>        machines, and otherwise provide a nice set of integrated reports
>        that anyone can generate on their own machines.  This proposal,
>        in particular, could move smoothly towards the effort to produce
>        a "Python-wide" test suite for CPython/IronPython/PyPy/Jython.
>        (This wasn't integrated into the proposal because I only found
>        out about it after the proposals were due.)
>
>        I personally think that both testing proposals are good, and
>        they grew out of conversations I had with Brett, who thinks that
>        the general ideas are good.  So, err, I'm looking for pushback,
>        I guess ;).  I can expand on these ideas a bit if people are
>        interested.
>
>        Both proposals are medium at least, and I've personally been
>        positively impressed with the student interaction.
>
> Comments/thoughts?
>
> ---
>
> Unquestionably "core" by my criteria above:
>
> 3to2 tool -- 'nuff said.
>
> subprocess improvement -- integrating, testing, and proposing some of
>        the various subprocess improvements that have passed across this
>        list & the bug tracker
>
> IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker
>        issues relating to IDLE and Tkinter.
>
> roundup VCS integration / build tools to support core development --
>        a single student proposed both of these and has received some
>        support.  See http://slexy.org/view/s2pFgWxufI for details.
>
> sphinx framework improvement -- support for per-paragraph comments and
>        user/developer interface for submitting/committing fixes
>
> 2x "keyring package" -- see
>
> http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/
> .
> The poorer one of these will probably be axed unless Tarek gives it
> strong support.
>
> --
>
> --titus
> --
> C. Titus Brown, ctb at msu.edu
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/tleeuwenburg%40gmail.com
>

-- 
--------------------------------------------------
Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe everything you think"

From ggpolo at gmail.com  Fri Apr 10 23:39:46 2009
From: ggpolo at gmail.com (Guilherme Polo)
Date: Fri, 10 Apr 2009 18:39:46 -0300
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <20090410210226.GB13018@idyll.org>
References: <20090410203809.GA24530@idyll.org>
	<ac2200130904101353j2fe0dd21o77f01894a520fbd3@mail.gmail.com> 
	<20090410210226.GB13018@idyll.org>
Message-ID: <ac2200130904101439m7e706766x206a45db15496516@mail.gmail.com>

On Fri, Apr 10, 2009 at 6:02 PM, C. Titus Brown <ctb at msu.edu> wrote:
> On Fri, Apr 10, 2009 at 05:53:23PM -0300, Guilherme Polo wrote:
> -> >
> -> > IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker
> -> >        issues relating to IDLE and Tkinter.
> -> >
> ->
> -> Is it important, for the discussion, to mention that it also involves
> -> testing this area (idle and tkinter), Titus ? I'm considering this
> -> more important than "just" dealing with the tracker issues.
>
> What, I tell you that your app is going to be accepted and we shouldn't
> argue about it, and you want to argue about it? ;)
>

Oh awesome then :) I think I misread part of your original email.

> --titus
> --
> C. Titus Brown, ctb at msu.edu
>

-- 
-- Guilherme H. Polo Goncalves

From benjamin at python.org  Sat Apr 11 01:05:02 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Fri, 10 Apr 2009 18:05:02 -0500
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <20090410203809.GA24530@idyll.org>
References: <20090410203809.GA24530@idyll.org>
Message-ID: <1afaf6160904101605l2235f906if36aa79703cd9fd7@mail.gmail.com>

2009/4/10 C. Titus Brown <ctb at msu.edu>:
> 2x "improve testing tools for py3k" -- variously focus on improving test
>        coverage and testing wrappers.
>
>        One proposes to provide a nice wrapper to make nose and py.test
>        capable of running the regrtests, which (with no change to
>        regrtest) would let people run tests in parallel, distribute or
>        run tests across multiple machines (including Snakebite), tag
>        and run subsets of tests with personal and/or public tags, and
>        otherwise take advantage of many of the nice features of nose
>        and py.test.
>
>        The other proposes to measure & increase the code coverage of
>        the py3k tests in both Python and C, integrate across multiple
>        machines, and otherwise provide a nice set of integrated reports
>        that anyone can generate on their own machines.  This proposal,
>        in particular, could move smoothly towards the effort to produce
>        a "Python-wide" test suite for CPython/IronPython/PyPy/Jython.
>        (This wasn't integrated into the proposal because I only found
>        out about it after the proposals were due.)
>
>        I personally think that both testing proposals are good, and
>        they grew out of conversations I had with Brett, who thinks that
>        the general ideas are good.  So, err, I'm looking for pushback,
>        I guess ;).  I can expand on these ideas a bit if people are
>        interested.
>
>        Both proposals are medium at least, and I've personally been
>        positively impressed with the student interaction.

To me, both of those proposals seem to say "measure and improve test
coverage" or "nose integration" with a severe lack of specific details.
Especially the nose plugin one seems like very little work. (Running
default nose in the test directory in fact works fairly well.)

Another small nit is that they should address Python 2.x, too.

-- 
Regards,
Benjamin

From ctb at msu.edu  Sat Apr 11 01:35:24 2009
From: ctb at msu.edu (C. Titus Brown)
Date: Fri, 10 Apr 2009 16:35:24 -0700
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <1afaf6160904101605l2235f906if36aa79703cd9fd7@mail.gmail.com>
References: <20090410203809.GA24530@idyll.org>
	<1afaf6160904101605l2235f906if36aa79703cd9fd7@mail.gmail.com>
Message-ID: <20090410233524.GA18347@idyll.org>

On Fri, Apr 10, 2009 at 06:05:02PM -0500, Benjamin Peterson wrote:
-> 2009/4/10 C. Titus Brown <ctb at msu.edu>:
-> > 2x "improve testing tools for py3k" -- variously focus on improving test
-> >        coverage and testing wrappers.
-> >
-> >        One proposes to provide a nice wrapper to make nose and py.test
-> >        capable of running the regrtests, which (with no change to
-> >        regrtest) would let people run tests in parallel, distribute or
-> >        run tests across multiple machines (including Snakebite), tag
-> >        and run subsets of tests with personal and/or public tags, and
-> >        otherwise take advantage of many of the nice features of nose
-> >        and py.test.
-> >
-> >        The other proposes to measure & increase the code coverage of
-> >        the py3k tests in both Python and C, integrate across multiple
-> >        machines, and otherwise provide a nice set of integrated reports
-> >        that anyone can generate on their own machines.  This proposal,
-> >        in particular, could move smoothly towards the effort to produce
-> >        a "Python-wide" test suite for CPython/IronPython/PyPy/Jython.
-> >        (This wasn't integrated into the proposal because I only found
-> >        out about it after the proposals were due.)
-> >
-> >        I personally think that both testing proposals are good, and
-> >        they grew out of conversations I had with Brett, who thinks that
-> >        the general ideas are good.  So, err, I'm looking for pushback,
-> >        I guess ;).  I can expand on these ideas a bit if people are
-> >        interested.
-> >
-> >        Both proposals are medium at least, and I've personally been
-> >        positively impressed with the student interaction.
-> 
-> To me, both of those proposals seem to say "measure and improve test
-> coverage" or "nose integration" with a severe lack of specific details.
-> Especially the nose plugin one seems like very little work. (Running
-> default nose in the test directory in fact works fairly well.)

...fairly, yes ;).  But not perfectly.  And certainly not with
equivalent guarantees to regrtest, which is really what Python
developers need.  Tracking down the corner cases, writing up examples,
setting up tags, getting multiprocess to work properly, and making sure
that coverage recording works properly, and then getting people to try
it out on THEIR machines, is likely to be a lot of work.

The plugin ecosystem for nose is growing daily and supporting that for
core would be fantastic; extending it to py.test (whose plugin interface
is now mostly compatible with nose) would be even better.

The lack of detail on the code coverage is intentional, IMO.  It's
non-trivial to get a full handle on C code coverage integrated with
Python code coverage -- or at least it has been for me -- so I supported
the student focusing on first writing robust coverage analysis tools,
and only then deciding what to "hit" with more tests.  I will encourage
the student to talk to this list (or the "tests" list in the stdlib sig)
in order to target areas that are more relevant to people.

I have had a hard time getting a good sense of what core code is well
tested and what is not well tested, across various platforms.  While
Walter's C/Python integrated code coverage site is nice, it would be
even nicer to have a way to generate all that information within any
particular checkout on a real-time basis.  Doing so in the context of
Snakebite would be icing... and I think it's worth supporting in core,
especially if it can be done without any changes *to* core.

-> Another small nit is that they should address Python 2.x, too.

I asked that they focus on EITHER 2.x or 3.x, since "too broad" is an
equally valid criticism.  Certainly 3.x is the future so I thought
focusing on increasing code coverage, and especially C code coverage,
could best be applied to 3.x.

cheers,
--titus
--
C. Titus Brown, ctb at msu.edu

From jackdied at gmail.com  Sat Apr 11 01:53:56 2009
From: jackdied at gmail.com (Jack diederich)
Date: Fri, 10 Apr 2009 19:53:56 -0400
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <20090410203809.GA24530@idyll.org>
References: <20090410203809.GA24530@idyll.org>
Message-ID: <b8e622740904101653g4e44e4bmffb5f664ba493756@mail.gmail.com>

On Fri, Apr 10, 2009 at 4:38 PM, C. Titus Brown <ctb at msu.edu> wrote:
[megasnip]
> roundup VCS integration / build tools to support core development --
>        a single student proposed both of these and has received some
>        support.  See http://slexy.org/view/s2pFgWxufI for details.

From the listed webpage I have no idea what he is promising (a
combination of very high level and very low level tasks).  If he is
offering all the same magic for Hg that Trac does for SVN (autolinking
"r2001" text to patches, for example) then I'm +1.  That should be
cake even for a student project.

He says vague things about patches too, but I'm not sure what.  If he
wanted to make that into a 'patchbot' that just applied every patch in
isolation and ran 'make && make test' and posted results in the
tracker I'd be a happy camper.

But maybe those are goals for next year, because I'm not quite sure
what the proposal is.

-Jack

From greg.ewing at canterbury.ac.nz  Sat Apr 11 02:41:14 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 11 Apr 2009 12:41:14 +1200
Subject: [Python-Dev] Lazy importing (was Rethinking intern() and its data
	structure)
In-Reply-To: <49DF08D8.9080806@gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de>
	<49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz>
	<49DE9FB4.9060908@gmail.com>
	<43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com>
	<ca471dc20904092126t60b8b22cma9c23429ba53d7be@mail.gmail.com>
	<49DF08D8.9080806@gmail.com>
Message-ID: <49DFE72A.5010905@canterbury.ac.nz>

Nick Coghlan wrote:

> I sometimes wish for a nice, solid lazy
> module import mechanism that manages to avoid the potential deadlock
> problems created by using import statements inside functions.

I created an ad-hoc one of these for PyGUI recently.
I can send you the code if you're interested.

I didn't have any problems with deadlocks, but I
did find one rather annoying problem. It seems that
an exception occurring at certain times during the
import process gets swallowed and turned into a
generic ImportError. I had to resort to catching
exceptions and printing my own traceback in order
to diagnose missing auto-imported names.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Sat Apr 11 02:51:29 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 11 Apr 2009 12:51:29 +1200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410025203.GA199@panix.com>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<49DF05FC.9040208@gmail.com>
	<79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com>
Message-ID: <49DFE991.8090605@canterbury.ac.nz>

Paul Moore wrote:

> 3.  Encoding
> 
>    JSON text SHALL be encoded in Unicode.  The default encoding is
>    UTF-8.
> 
> This is at best confused (in my utterly non-expert opinion :-)) as
> Unicode isn't an encoding...

I'm inclined to agree. I'd go further and say that if JSON
is really meant to be a text format, the standard has no
business mentioning encodings at all.

The reason you use a text format in the first place is that
you have some way of transmitting text, and you want to
send something that isn't text. In that situation, the
encoding is already determined by whatever means you're
using to send the text.

-- 
Greg

From brendan at kublai.com  Sat Apr 11 02:52:01 2009
From: brendan at kublai.com (Brendan Cully)
Date: Fri, 10 Apr 2009 17:52:01 -0700
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <20090410190248.913033A4063@sparrow.telecommunity.com>
References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de>
	<49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz>
	<49DE9FB4.9060908@gmail.com>
	<43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com>
	<ca471dc20904092126t60b8b22cma9c23429ba53d7be@mail.gmail.com>
	<49DF08D8.9080806@gmail.com>
	<20090410190248.913033A4063@sparrow.telecommunity.com>
Message-ID: <20090411005201.GD7706@kremvax.cs.ubc.ca>

On Friday, 10 April 2009 at 15:05, P.J. Eby wrote:
> At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote:
>> This problem (slow application startup times due to too many imports at
>> startup, which in turn can be due to top level imports for library
>> or framework functionality that a given application doesn't actually
>> use) is actually the main reason I sometimes wish for a nice, solid lazy
>> module import mechanism that manages to avoid the potential deadlock
>> problems created by using import statements inside functions.

I'd love to see that too. I imagine it would be beneficial for many
python applications.

> Have you tried http://pypi.python.org/pypi/Importing ? Or more  
> specifically, 
> http://peak.telecommunity.com/DevCenter/Importing#lazy-imports ?

Here's what we do in Mercurial, which is a little more user-friendly,
but possibly too magical for general use (but provides us a very nice
speedup):

http://www.selenic.com/repo/index.cgi/hg/file/tip/mercurial/demandimport.py#l1

It's nice and small, and it is invisible to the rest of the code, but
it's probably too aggressive for all users. The biggest problem is
probably that ImportErrors are deferred until first access, which
trips up modules that do things like

try:
  import foo
except ImportError:
  import fallback as foo

of which there are a few. The mercurial module maintains a blacklist
as a bandaid, but it'd be great to have a real fix.
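
To illustrate why the try/except idiom trips over deferred errors, 
here is a toy sketch.  LazyModule and lazy_import are hypothetical 
names written for this example only; this is not how Mercurial's 
demandimport is implemented, just the same general idea in miniature.

```python
import importlib


class LazyModule:
    """Toy stand-in for a module that is imported on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only reached for attributes not found on the proxy itself,
        # so this is where the real import finally happens.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


def lazy_import(name):
    """Return a proxy immediately; nothing is imported yet."""
    return LazyModule(name)


try:
    # The module does not exist, but no ImportError is raised here...
    foo = lazy_import("no_such_module_for_demo")
except ImportError:
    # ...so the fallback branch is never taken.
    foo = lazy_import("json")

deferred_error = None
try:
    foo.dumps  # first access triggers the real import, which fails now
except ImportError as exc:
    deferred_error = exc
```

The error surfaces at the first attribute access, far away from the 
try block that was supposed to handle it.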

From guido at python.org  Sat Apr 11 04:11:44 2009
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Apr 2009 19:11:44 -0700
Subject: [Python-Dev] [Email-SIG] the email module, text,
	and bytes (was 	Re: Dropping bytes "support" in json)
In-Reply-To: <F87C1713-27A1-4D2B-BA42-1AC70B77073C@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<loom.20090409T043042-835@post.gmane.org> 
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<grkodk$j4p$1@ger.gmane.org> 
	<1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org>
	<20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com>
	<ACC56383-7F1B-4CB0-908F-E75E1390AE51@python.org>
	<92023.1239381344@parc.com> <87ocv4fhv0.fsf@xemacs.org>
	<F87C1713-27A1-4D2B-BA42-1AC70B77073C@python.org>
Message-ID: <ca471dc20904101911u3010c268s61fd72c66cb7f007@mail.gmail.com>

On Fri, Apr 10, 2009 at 12:04 PM, Barry Warsaw <barry at python.org> wrote:
> On Apr 10, 2009, at 3:06 PM, Stephen J. Turnbull wrote:
>
>> Bill Janssen writes:
>>>
>>> Barry Warsaw <barry at python.org> wrote:
>>>
>>>> In that case, we really need the
>>>> bytes-in-bytes-out-bytes-in-the-chewy-
>>>> center API first, and build things on top of that.
>>>
>>> Yep.
>>
>> Uh, I hate to rain on a parade, but isn't that how we arrived at the
>> *current* email package?
>
> Not really.  We got here because <ahem>we</ahem> were too damn sloppy about
> the distinction.

Agreed. I take full responsibility -- the str/unicode approach we
introduced in 2.0 seemed like the best thing we could do at the time,
but in retrospect it would've been better if we'd left str alone and
introduced a unicode type that was truly distinct -- like str in 3.0.
The email package is not the only system that ended up with a muddled
distinction between the two as a result.

> I'm going to remove python-dev from subsequent follow ups.  Please join us
> at email-sig for further discussion.
>
>Barry

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Apr 11 04:16:35 2009
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Apr 2009 19:16:35 -0700
Subject: [Python-Dev] Going off-line for a week
Message-ID: <ca471dc20904101916x66dc88a5jc0d4166d455c71bd@mail.gmail.com>

Folks, I'm going off-line for a week to enjoy a family vacation. When
I come back I'll probably just archive most email unread, so now's
your chance to add braces to the language. :-)

Not-yet-retiring-ly y'rs,

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Sat Apr 11 05:06:25 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 11 Apr 2009 05:06:25 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <grocad$e1c$1@ger.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410025203.GA199@panix.com>	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>	<49DEBB21.70305@gmail.com>	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<grocad$e1c$1@ger.gmane.org>
Message-ID: <49E00931.6050107@v.loewis.de>

>> In email's case this is true, but in JSON's case it's not.  JSON is a
>> format defined as a sequence of code points; MIME is defined as a
>> sequence of octets.
> 
> What is the 'bytes support' issue for json?  Is it about content within
> a json text? Or about the transport format of a json text?

The question is whether the json parsing should take bytes or str as
input, and whether the json marshalling should produce bytes or str.
More specifically, the question is whether it is ok to drop bytes.

I personally think that it needs to support bytes, and that perhaps
str support is optional (as you could always explicitly encode the
str as UTF-8 before passing it to the JSON parser, if you somehow
managed to get a str of JSON to parse).
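
A sketch of what that would look like from the caller's side.  It 
assumes a Python whose json.loads accepts bytes directly (true of 
Python 3.6 and later, not of the 3.0 code under discussion), and the 
sample text is invented.

```python
import json

# A str of JSON that the caller somehow ended up with (sample data).
text = '{"name": "L\u00f6wis", "ok": true}'

# Explicitly encode the str as UTF-8 before handing it to the parser.
wire = text.encode("utf-8")
from_bytes = json.loads(wire)

# For comparison: parsing the str directly gives the same objects.
from_str = json.loads(text)

# Going the other way, the marshalled str is encoded for the wire.
out_bytes = json.dumps(from_str, ensure_ascii=False).encode("utf-8")
```

With a bytes-only API the two encode() calls are the whole cost of 
having a str in hand.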

However, I really think that this question cannot be answered by
reading the RFC. It should be answered by verifying how people use
the json library in 2.x.

> The standard does not specify any correspondence between representations
> and domain objects

And that is not the issue at all; nobody is debating what output the
parsing should produce.

Regards,
Martin

From thiagoharry at riseup.net  Sat Apr 11 03:58:26 2009
From: thiagoharry at riseup.net (Harry (Thiago Leucz Astrizi))
Date: Fri, 10 Apr 2009 22:58:26 -0300 (BRT)
Subject: [Python-Dev] Needing help to change the grammar
Message-ID: <thiagoharry.1239415106.squirrel@tern.riseup.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello everybody. My name is Thiago and currently I'm working as a
teacher in a high school in Brazil. I have plans to offer in the
school a programming course to the students, but I had some problems
to find a good language. As a Python programmer, I really like the
language's syntax and I think that Python is very good to teach
programming. But there's a little problem: the commands and keywords
are in english and this can be an obstacle to the teenagers that could
enter in the course.

Because of this, I decided to create a Python version with keywords in
portuguese and with some modifications in the grammar to be more
portuguese-like. To this, I'm using Python 3.0.1 source code.

I already read PEP 306 (How to Change Python's Grammar) and changed
the suggested files. My changes currently are working properly except
for one thing: the "comp_op". The code that in english Python is
written as "is not", in portuguese Python shall be "não é". Besides
the translations to the words "is" and "not", I'm also changing the
order in which they appear letting "not" before "is".

It appears to be a simple change, but strangely, I'm not being able to
perform it. I already made correct modifications in Grammar/Grammar
file, the new keywords already appear in Lib/keyword.py and I also
changed the function validate_comp_op in Modules/parsermodule.c:

static int
validate_comp_op(node *tree)
{
(...)
    else if ((res = validate_numnodes(tree, 2, "comp_op")) != 0) {
        res = (validate_ntype(CHILD(tree, 0), NAME)
               && validate_ntype(CHILD(tree, 1), NAME)
               && (((strcmp(STR(CHILD(tree, 0)), "não") == 0)
                    && (strcmp(STR(CHILD(tree, 1)), "é") == 0))
                   || ((strcmp(STR(CHILD(tree, 0)), "não") == 0)
                       && (strcmp(STR(CHILD(tree, 1)), "em") == 0))));
        if (!res && !PyErr_Occurred())
            err_string("operador de comparação desconhecido");
    }
    return (res);
}

I also looked in the other files proposed in the PEP, but I didn't find
anything in them that I recognized as needing changes.

But when I type "make" to compile the new language, the following
error appears in Lib/encodings/__init__.py (which I already
translated to the portuguese Python):

harry at skynet:~/Python-3.0.1$ make
Fatal Python error: Py_Initialize:
      can't initialize sys standard streams File
"/home/harry/Python-3.0.1/Lib/encodings/__init__.py", line 73
se entry não é _unknown: ^ SyntaxError: invalid syntax

The comp_op doesn't work! I don't know what else to change. Perhaps
there's some file that I should modify but didn't pay enough attention
to... Does anybody have an idea of what I should do?
Thanks a lot.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFJ3/eTmNGEzq1zP84RAh5vAJ492eVFgbR5KCCJNdTJOIR/Xtfb0ACdE0NG
Yxnxmo9yjOL6H8J93nPBcJs=
=6VLu
-----END PGP SIGNATURE-----

From skippy.hammond at gmail.com  Sat Apr 11 06:36:00 2009
From: skippy.hammond at gmail.com (Mark Hammond)
Date: Sat, 11 Apr 2009 14:36:00 +1000
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49E00931.6050107@v.loewis.de>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410025203.GA199@panix.com>	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>	<49DEBB21.70305@gmail.com>	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>	<grocad$e1c$1@ger.gmane.org>
	<49E00931.6050107@v.loewis.de>
Message-ID: <49E01E30.8060302@gmail.com>

[Dropping email sig]

On 11/04/2009 1:06 PM, "Martin v. Löwis" wrote:

> However, I really think that this question cannot be answered by
> reading the RFC. It should be answered by verifying how people use
> the json library in 2.x.

In the absence of anything more formal, here are 2 anecdotes:

* The python-twitter package seems to:
   - Use dumps() mainly to get string objects.  It uses it both for 
__str__, and for an API called 'AsJsonString' - the intent of this seems 
to be to provide strings for the consumer of the twitter API - it's not 
clear how such consumers would use them.  Note that this API doesn't 
seem to need to 'write' json objects, else I suspect they would then be 
expecting dumps to return bytes to put on the wire.  They expect loads 
to accept the bytes they are reading directly off the wire.

* couchdb's wrappers use these functions purely as bytes - they are 
either decoding an application/json object from the bits they read, or 
they are encoding it to use directly in the body of a request (or even 
directly in the URL of the request!)

I find myself conflicted.  On one hand I believe the most common use of 
json will be to exchange data with something inherently byte-based.  On 
the other hand though, json itself seems to be naturally "stringy" and 
the most natural interface for a casual user would be strings.

I'm personally leaning slightly towards strings, putting the burden on 
bytes-users of json to explicitly use the appropriate encoding, even in 
cases where it *must* be utf8.  On the other hand, I'm too lazy to dig 
back through this large thread, but I seem to recall a suggestion that 
using bytes would be significantly faster.  If that is true, I'd be 
happy to settle for bytes as I believe the most common *actual* use of 
json will be via things like the twitter and couch libraries - and may 
even be a key bottleneck for such libraries - so people will not be 
directly exposed to its interface...
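
A minimal sketch of the "strings" option weighed above, assuming a
str-based json API (which is how dumps() behaves in later Python 3
releases). The payload is hypothetical and only there for illustration;
the point is where the explicit encoding step falls for a bytes user:

```python
import json

# Hypothetical payload, for illustration only.
payload = {"user": "example", "text": "naïve café"}

# dumps() hands back text (str)...
text = json.dumps(payload)

# ...so a bytes-oriented caller (an HTTP client, say) encodes explicitly.
wire = text.encode("utf-8")

# On the way back in, decode first, then parse.
round_trip = json.loads(wire.decode("utf-8"))
```

The encode()/decode() pair is the "burden on bytes-users" referred to
above; a casual user who only wants strings never sees it.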

Cheers,

Mark

From martin at v.loewis.de  Sat Apr 11 07:45:49 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 11 Apr 2009 07:45:49 +0200
Subject: [Python-Dev] Needing help to change the grammar
In-Reply-To: <thiagoharry.1239415106.squirrel@tern.riseup.net>
References: <thiagoharry.1239415106.squirrel@tern.riseup.net>
Message-ID: <49E02E8D.5090005@v.loewis.de>

> It appears to be a simple change, but strangely, I'm not being able to
> perform it. I already made correct modifications in Grammar/Grammar
> file, the new keywords already appear in Lib/keyword.py and I also
> changed the function validate_comp_op in Modules/parsermodule.c:
> 
> static int
> validate_comp_op(node *tree)
> {
> (...)
>     else if ((res = validate_numnodes(tree, 2, "comp_op")) != 0) {
>         res = (validate_ntype(CHILD(tree, 0), NAME)
>                && validate_ntype(CHILD(tree, 1), NAME)
>                && (((strcmp(STR(CHILD(tree, 0)), "não") == 0)
>                     && (strcmp(STR(CHILD(tree, 1)), "é") == 0))
>                    || ((strcmp(STR(CHILD(tree, 0)), "não") == 0)
>                        && (strcmp(STR(CHILD(tree, 1)), "em") == 0))));
>         if (!res && !PyErr_Occurred())
>             err_string("operador de comparação desconhecido");
>     }
>     return (res);
> }
> 

Notice that Python source is represented in UTF-8 in the parser.
It might be that the C source code has a different encoding, which
would cause the strcmp to fail.
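
A small sketch of why the byte-wise comparison can fail, assuming (purely
as an example) that the C file was saved as Latin-1 while the parser
hands over UTF-8. The same keyword then has two different byte
sequences, and strcmp() compares bytes, not characters:

```python
keyword = "não"

# What the parser sees: Python source text represented as UTF-8.
as_utf8 = keyword.encode("utf-8")

# What a C string literal would contain if the .c file were Latin-1
# (an assumed encoding, chosen only for illustration).
as_latin1 = keyword.encode("latin-1")

# Same characters, different bytes, so a byte-wise compare reports a mismatch.
same_bytes = as_utf8 == as_latin1
```

Here as_utf8 is three characters encoded in four bytes, while as_latin1
is three bytes, so the two can never compare equal.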

Regards,
Martin

From martin at v.loewis.de  Sat Apr 11 07:49:50 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 11 Apr 2009 07:49:50 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49E01E30.8060302@gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410025203.GA199@panix.com>	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>	<49DEBB21.70305@gmail.com>	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>	<grocad$e1c$1@ger.gmane.org>	<49E00931.6050107@v.loewis.de>
	<49E01E30.8060302@gmail.com>
Message-ID: <49E02F7E.6010605@v.loewis.de>

> I'm personally leaning slightly towards strings, putting the burden on
> bytes-users of json to explicitly use the appropriate encoding, even in
> cases where it *must* be utf8.  On the other hand, I'm too lazy to dig
> back through this large thread, but I seem to recall a suggestion that
> using bytes would be significantly faster. 

Not sure whether it would be *significantly* faster, but yes, Bob wrote
an accelerator for parsing out of a byte string to make it really fast;
IIRC, he claims that it is faster than pickling.

Regards,
Martin

From martin at v.loewis.de  Sat Apr 11 08:13:35 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 11 Apr 2009 08:13:35 +0200
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <20090410203809.GA24530@idyll.org>
References: <20090410203809.GA24530@idyll.org>
Message-ID: <49E0350F.8040506@v.loewis.de>

> 2x "keyring package" -- see
> http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/.
> The poorer one of these will probably be axed unless Tarek gives it
> strong support.

I don't think these are good "core" projects. Even if the students come
up with a complete solution, it shouldn't be integrated with the
standard library right away. Instead, it should have a life outside the
standard library, and be considered for inclusion only if the user
community wants it.

I'm also skeptical that this is a good SoC project in the first place.
Coming up with a wrapper for, say, Apple Keychain, could be a good
project. Coming up with a unifying API for all keychains is out of
scope, IMO; various past attempts at unifying APIs have demonstrated
that creating them is difficult, and might require writing a PEP
(whose acceptance then might not happen within a summer).

Regards,
Martin

From jackdied at gmail.com  Sat Apr 11 08:20:24 2009
From: jackdied at gmail.com (Jack diederich)
Date: Sat, 11 Apr 2009 02:20:24 -0400
Subject: [Python-Dev] Needing help to change the grammar
In-Reply-To: <thiagoharry.1239415106.squirrel@tern.riseup.net>
References: <thiagoharry.1239415106.squirrel@tern.riseup.net>
Message-ID: <b8e622740904102320na23074fod4317de771d80254@mail.gmail.com>

On Fri, Apr 10, 2009 at 9:58 PM, Harry (Thiago Leucz Astrizi)
<thiagoharry at riseup.net> wrote:
>
> Hello everybody. My name is Thiago and currently I'm working as a
> teacher in a high school in Brazil. I have plans to offer in the
> school a programming course to the students, but I had some problems
> to find a good language. As a Python programmer, I really like the
> language's syntax and I think that Python is very good for teaching
> programming. But there's a little problem: the commands and keywords
> are in English and this can be an obstacle to the teenagers that could
> enroll in the course.
>
> Because of this, I decided to create a Python version with keywords in
> Portuguese and with some modifications in the grammar to be more
> Portuguese-like. For this, I'm using Python 3.0.1 source code.

I love the idea (and most recently edited PEP 306) so here are a few
suggestions:

Brazil has many python programmers so you might be able to make quick
progress by asking them for volunteer time.

To bug-hunt your technical problem: try switching the "not is"
operator to include an underscore, "not_is".  The python LL(1) grammar
checker works for python but isn't robust, and does miss some grammar
ambiguities.  Making the operator a single word might reveal a bug in
the parser.

Please consider switching your students to 'real' python part way
through the course.  If they want to use the vast amount of python
code on the internet as examples they will need to know the few
English keywords.

Also - most python core developers are not native English speakers and
do OK :)  PyCon speakers are about 25% non-native English speakers and
EuroPython speakers are about the reverse (my rough estimate - I'd
love to see some hard numbers).

Keep up the Good Work,

-Jack

From ncoghlan at gmail.com  Sat Apr 11 09:09:33 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 11 Apr 2009 17:09:33 +1000
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <71E1EA03-6E24-4A28-A47A-4EA2D501CC6D@python.org>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>	<F40AE8EC-08CC-4634-AA82-264587552F47@python.org>	<49DF8956.5050501@g.nevcal.com>
	<71E1EA03-6E24-4A28-A47A-4EA2D501CC6D@python.org>
Message-ID: <49E0422D.10704@gmail.com>

Barry Warsaw wrote:
>> Of course, one could use message.header and message.bythdr and they'd
>> be the same length.
> 
> I was trying to figure out what  a 'thdr' was that we'd want to index
> 'by' it. :)

In the discussions about os.environ, the suggested approach was to just
tack a 'b' onto the end of the name to get the bytes version (i.e.
os.environb).

That aligns nicely with the b"" prefix for bytes literals, and isn't
much of a typing or reading burden when dealing with the bytes API
instead of the text one.

A similar naming scheme (i.e. msg.headers and msg.headersb) would
probably work for email as well.
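
For reference, a sketch of how the "tack a 'b' on the end" scheme looks
for os.environ, assuming a later Python 3 release in which os.environb
and os.supports_bytes_environ exist. The variable name PYDEV_DEMO is
made up for the example, and msg.headersb remains only a suggestion:

```python
import os

# Set a variable through the text API (hypothetical name, for illustration).
os.environ["PYDEV_DEMO"] = "text value"

if os.supports_bytes_environ:
    # Same variable, read through the bytes API: bytes key, bytes value.
    raw = os.environb[b"PYDEV_DEMO"]
else:
    # Platforms without a native bytes environment (e.g. Windows):
    # fall back to encoding the text value with the filesystem encoding.
    raw = os.fsencode(os.environ["PYDEV_DEMO"])
```

The two mappings are views of the same environment, so the only thing
the trailing 'b' changes is the type of the keys and values.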

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net  Sat Apr 11 10:12:23 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 11 Apr 2009 08:12:23 +0000 (UTC)
Subject: [Python-Dev] Dropping bytes "support" in json
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410025203.GA199@panix.com>	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>	<49DEBB21.70305@gmail.com>	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>	<grocad$e1c$1@ger.gmane.org>	<49E00931.6050107@v.loewis.de>
	<49E01E30.8060302@gmail.com> <49E02F7E.6010605@v.loewis.de>
Message-ID: <loom.20090411T080102-471@post.gmane.org>

Martin v. Löwis <martin <at> v.loewis.de> writes:
> 
> Not sure whether it would be *significantly* faster, but yes, Bob wrote
> an accelerator for parsing out of a byte string to make it really fast;
> IIRC, he claims that it is faster than pickling.

Isn't premature optimization the root of all evil?

Besides, the fact that many values in a typical JSON object will be strings, and
must be encoded from/decoded to unicode objects in py3k, suggests that
accepting/outputting unicode as default is the laziest (i.e. the best) choice
performance-wise.

But you don't have to trust me: look at the quick numbers I've posted. The py3k
version (in the str-only incarnation I've proposed) is sometimes actually faster
than the trunk version:
http://mail.python.org/pipermail/python-dev/2009-April/088498.html

Regards

Antoine.

From stephen at xemacs.org  Sat Apr 11 10:35:01 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 11 Apr 2009 17:35:01 +0900
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49DFE991.8090605@canterbury.ac.nz>
References: <loom.20090408T110540-221@post.gmane.org>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410025203.GA199@panix.com>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<49DF05FC.9040208@gmail.com>
	<79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com>
	<49DFE991.8090605@canterbury.ac.nz>
Message-ID: <87myantwp6.fsf@xemacs.org>

Greg Ewing writes:

 > The reason you use a text format in the first place is that
 > you have some way of transmitting text, and you want to
 > send something that isn't text. In that situation, the
 > encoding is already determined by whatever means you're
 > using to send the text.

Determined, yes, but all too often in a nondeterministic way.  That's
precisely the problem that the spec is trying to avert.  People often
schlep "text" around as if that were well-defined, forcing receivers
to guess what is meant.  Having a spec isn't going to stop them, but
at least you can lash them with a wet noodle.

The specification of at least the abstract character repertoire and
coded character set also allows implementers like Python to proceed
confidently with their usual internal encoding.

From chris at simplistix.co.uk  Sat Apr 11 11:17:27 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 11 Apr 2009 10:17:27 +0100
Subject: [Python-Dev] How do I update http://www.python.org/dev/faq?
Message-ID: <49E06027.60409@simplistix.co.uk>

Hi All,

How do I update the faq on the website?

This section:

http://python.org/dev/faq/#how-to-test-a-patch

...could do with fleshing out from this discussion:

http://mail.python.org/pipermail/python-dev/2009-March/086771.html

...and the link to:

http://www.python.org/doc/lib/module-test.html

...still ends up at the 2.5.2 docs.

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From chris at simplistix.co.uk  Sat Apr 11 12:12:31 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 11 Apr 2009 11:12:31 +0100
Subject: [Python-Dev] Test failures on Python 2.7 (trunk)
Message-ID: <49E06D0F.2080905@simplistix.co.uk>

Hi All,

Got these when running from checkout on Mac OS:

Could not find '/Users/chris/py2k/Lib/test' in sys.path to remove it
...
test test_asynchat produced unexpected output:
**********************************************************************
error: uncaptured python exception, closing channel 
<test.test_asynchat.echo_client at 0x263a4b8> (<class 
'socket.error'>:[Errno 9] Bad file descriptor 
[/Users/chris/py2k/Lib/asyncore.py|readwrite|107] 
[/Users/chris/py2k/Lib/asyncore.py|handle_expt_event|441] 
[<string>|getsockopt|1] [/Users/chris/py2k/Lib/socket.py|_dummy|165])
...(lots of repeats of the above)
**********************************************************************
test_asyncore
test test_asyncore failed -- Traceback (most recent call last):
   File "/Users/chris/py2k/Lib/test/test_asyncore.py", line 144, in 
test_readwrite
     self.assertEqual(tobj.read, True)
AssertionError: False != True
...
test test_macostools failed -- Traceback (most recent call last):
   File "/Users/chris/py2k/Lib/test/test_macostools.py", line 90, in 
test_mkalias_relative
     macostools.mkalias(test_support.TESTFN, TESTFN2, sys.prefix)
   File "/Users/chris/py2k/Lib/plat-mac/macostools.py", line 40, in mkalias
     relativefsr = File.FSRef(relative)
Error: (-35, 'no such volume')

Should I expect these? If so, why?

cheers,

Chris

From chris at simplistix.co.uk  Sat Apr 11 12:14:32 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 11 Apr 2009 11:14:32 +0100
Subject: [Python-Dev] Test failure on Py3k branch
Message-ID: <49E06D88.6060507@simplistix.co.uk>

Hi All,

Also got the following failure from a py3k checkout:

test test_cmd_line failed -- Traceback (most recent call last):
   File "/Users/chris/py3k/Lib/test/test_cmd_line.py", line 143, in 
test_run_code
     0)
AssertionError: 1 != 0

Should I expect this or does someone owe beer? ;-)

Chris

From mario.danic at gmail.com  Sat Apr 11 12:21:18 2009
From: mario.danic at gmail.com (Mario)
Date: Sat, 11 Apr 2009 12:21:18 +0200
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <b8e622740904101653g4e44e4bmffb5f664ba493756@mail.gmail.com>
References: <20090410203809.GA24530@idyll.org>
	<b8e622740904101653g4e44e4bmffb5f664ba493756@mail.gmail.com>
Message-ID: <79957db20904110321n58e50f3o4d8ede6ffc97070c@mail.gmail.com>

>
>
> He says vague things about patches too, but I'm not sure what.  If he
> wanted to make that into a 'patchbot' that just applied every patch in
> isolation and ran 'make && make test' and posted results in the
> tracker I'd be a happy camper.
>
>
Jack, how about you write that idea down on the wiki page mentioned in the
proposal, along with the use case? Following that, I'll see if I can do
anything about it to make it a reality.

Cheers,
M.

From dickinsm at gmail.com  Sat Apr 11 12:39:08 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sat, 11 Apr 2009 11:39:08 +0100
Subject: [Python-Dev] Test failure on Py3k branch
In-Reply-To: <49E06D88.6060507@simplistix.co.uk>
References: <49E06D88.6060507@simplistix.co.uk>
Message-ID: <5c6f2a5d0904110339l1f614e0hfaeb0f253c8eede@mail.gmail.com>

On Sat, Apr 11, 2009 at 11:14 AM, Chris Withers <chris at simplistix.co.uk> wrote:
> Also got the following failure from a py3k checkout:
>
> test test_cmd_line failed -- Traceback (most recent call last):
>  File "/Users/chris/py3k/Lib/test/test_cmd_line.py", line 143, in
> test_run_code
>    0)
> AssertionError: 1 != 0

Are you on OS X?  This looks like

http://bugs.python.org/issue4388

Mark

From chris at simplistix.co.uk  Sat Apr 11 12:41:19 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 11 Apr 2009 11:41:19 +0100
Subject: [Python-Dev] Test failure on Py3k branch
In-Reply-To: <5c6f2a5d0904110339l1f614e0hfaeb0f253c8eede@mail.gmail.com>
References: <49E06D88.6060507@simplistix.co.uk>
	<5c6f2a5d0904110339l1f614e0hfaeb0f253c8eede@mail.gmail.com>
Message-ID: <49E073CF.5080702@simplistix.co.uk>

Mark Dickinson wrote:
> On Sat, Apr 11, 2009 at 11:14 AM, Chris Withers <chris at simplistix.co.uk> wrote:
>> Also got the following failure from a py3k checkout:
>>
>> test test_cmd_line failed -- Traceback (most recent call last):
>>  File "/Users/chris/py3k/Lib/test/test_cmd_line.py", line 143, in
>> test_run_code
>>    0)
>> AssertionError: 1 != 0
> 
> Are you on OS X?  This looks like
> 
> http://bugs.python.org/issue4388

Yup, that looks like it.

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From chris at simplistix.co.uk  Sat Apr 11 12:41:59 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 11 Apr 2009 11:41:59 +0100
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <gr5p9b$sba$1@ger.gmane.org>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>	<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>	<49D35A39.7020507@simplistix.co.uk>	<Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net>	<49D52B2C.5050509@simplistix.co.uk>	<ca471dc20904021418y617f1bfdja3173c41a1451423@mail.gmail.com>	<49D52C5B.7010506@simplistix.co.uk>	<ca471dc20904021449x659f930bn7b61feec6b640752@mail.gmail.com>	<49D63465.80401@simplistix.co.uk>
	<gr5p9b$sba$1@ger.gmane.org>
Message-ID: <49E073F7.9060309@simplistix.co.uk>

Steve Holden wrote:
>> Anything using an exec 
> 
> that can be done in some other (more pythonic way)

There's *always* another way ;-)

>> is broken by definition ;-)
>>
>> Benjamin?
>>
> We've just had a fairly clear demonstration that small semantic changes
> to the language can leave unexpected areas borked.

Oh? I don't follow...

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From ncoghlan at gmail.com  Sat Apr 11 13:10:40 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 11 Apr 2009 21:10:40 +1000
Subject: [Python-Dev] Test failures on Python 2.7 (trunk)
In-Reply-To: <49E06D0F.2080905@simplistix.co.uk>
References: <49E06D0F.2080905@simplistix.co.uk>
Message-ID: <49E07AB0.6020301@gmail.com>

Chris Withers wrote:
> Hi All,
> 
> Got these when running from checkout on Mac OS:
> 
> Could not find '/Users/chris/py2k/Lib/test' in sys.path to remove it
> ...
> test test_asynchat produced unexpected output:
> **********************************************************************
> error: uncaptured python exception, closing channel
> <test.test_asynchat.echo_client at 0x263a4b8> (<class
> 'socket.error'>:[Errno 9] Bad file descriptor
> [/Users/chris/py2k/Lib/asyncore.py|readwrite|107]
> [/Users/chris/py2k/Lib/asyncore.py|handle_expt_event|441]
> [<string>|getsockopt|1] [/Users/chris/py2k/Lib/socket.py|_dummy|165])
> ...(lots of repeats of the above)
> **********************************************************************
> test_asyncore
> test test_asyncore failed -- Traceback (most recent call last):
>   File "/Users/chris/py2k/Lib/test/test_asyncore.py", line 144, in
> test_readwrite
>     self.assertEqual(tobj.read, True)
> AssertionError: False != True

I'm getting the asyncore failure on Linux as well (no unexpected output
though - just the final exception).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From chris at simplistix.co.uk  Sat Apr 11 13:23:18 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 11 Apr 2009 12:23:18 +0100
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <49D9FD15.9030406@simplistix.co.uk>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>		<ca471dc20903312025sea732faucaeb3b5ad1b2eec2@mail.gmail.com>		<49D35A39.7020507@simplistix.co.uk>		<Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net>		<49D52B2C.5050509@simplistix.co.uk>		<ca471dc20904021418y617f1bfdja3173c41a1451423@mail.gmail.com>		<49D52C5B.7010506@simplistix.co.uk>		<ca471dc20904021449x659f930bn7b61feec6b640752@mail.gmail.com>		<49D63465.80401@simplistix.co.uk>	<1afaf6160904031427p7fa95d07q340fd54cb7c34963@mail.gmail.com>
	<49D9FD15.9030406@simplistix.co.uk>
Message-ID: <49E07DA6.2010100@simplistix.co.uk>

Chris Withers wrote:
> Benjamin Peterson wrote:
>>>>> Assuming it breaks no tests, would there be objection to me committing the
>>>>> above change to the Python 3 trunk?
>>>> That's up to Benjamin. Personally, I live by "if it ain't broke, don't
>>>> fix it." :-)
>>> Anything using an exec is broken by definition ;-)
>>
>> "practicality beats purity"
>>
>>> Benjamin?
>>
>> +0
> 
> OK, well, I'll use it as my first "test commit" when I get a chance :-)

Actually, this was gone on the py3k branch already.

I've committed the fix to trunk, is there anything else I need to do?

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From dickinsm at gmail.com  Sat Apr 11 14:20:28 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sat, 11 Apr 2009 13:20:28 +0100
Subject: [Python-Dev] Python 2.6.2 final
In-Reply-To: <776F906E-418C-4A2C-8C6C-2B0036B49AFA@python.org>
References: <776F906E-418C-4A2C-8C6C-2B0036B49AFA@python.org>
Message-ID: <5c6f2a5d0904110520o2ea97af9t4cd18a168db795d5@mail.gmail.com>

On Fri, Apr 10, 2009 at 2:31 PM, Barry Warsaw <barry at python.org> wrote:
> bugs.python.org is apparently down right now, but I set issue 5724 to
> release blocker for 2.6.2.  This is waiting for input from Mark Dickinson,
> and it relates to test_cmath failing on Solaris 10.

I'd prefer to leave this alone for 2.6.2.  There's a fix posted to the issue
tracker, but it's not entirely trivial and I think the risk of accidental
breakage outweighs the niceness of seeing 'all tests passed' on
Solaris.

Mark

From chris at simplistix.co.uk  Sat Apr 11 14:33:33 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 11 Apr 2009 13:33:33 +0100
Subject: [Python-Dev] email header encoding
In-Reply-To: <87myaofh5q.fsf@xemacs.org>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<1239382031.8682.11.camel@haku>
	<87myaofh5q.fsf@xemacs.org>
Message-ID: <49E08E1D.6070207@simplistix.co.uk>

Stephen J. Turnbull wrote:
> Robert Brewer writes:
> 
>  > Syntactically, there's no sense in providing:
>  > 
>  >     Message.set_header('Subject', 'Some text', encoding='utf-16')
>  > 
>  > ...since you could more clearly write the same as:
>  > 
>  >     Message.set_header('Subject', 'Some text'.encode('utf-16'))
> 
> Which you now must *parse* and guess the encoding to determine how to
> RFC-2047-encode the binary mush.  I think the encoding parameter is
> necessary here.

Indeed.

>  > But it would be far easier to do all the encoding at once in an
>  > output() or serialize() method. Do different headers need different
>  > encodings?
> 
> You can have multiple encodings within a single header (and a naïve

"can" and "should" are two very different things.
When is it even a good idea to have more than one encoding in a single 
header?
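
For what "can" looks like in practice, here is a rough sketch using the
existing email.header module. It says nothing about whether one
*should* do this; the sample strings and the choice of charsets are
assumptions made up for the example:

```python
from email.header import Header, decode_header

# Build one header value out of two chunks with different charsets.
h = Header()
h.append("Grüße", charset="iso-8859-1")
h.append("こんにちは", charset="utf-8")

# RFC 2047 form: one encoded-word per chunk, each naming its own charset.
encoded = h.encode()

# Parsing it back yields one (bytes, charset) pair per encoded-word.
parts = decode_header(encoded)
charsets = [charset for _, charset in parts]
```

A parser therefore has to cope with more than one charset per header
even if no well-behaved sender would choose to produce that.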

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From chris at simplistix.co.uk  Sat Apr 11 14:39:40 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 11 Apr 2009 13:39:40 +0100
Subject: [Python-Dev] headers api for email package
In-Reply-To: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
Message-ID: <49E08F8C.5030205@simplistix.co.uk>

Barry Warsaw wrote:
>  >>> message['Subject']
> 
> The raw bytes or the decoded unicode?

A header object.

> Okay, so you've picked one.  Now how do you spell the other way?

str(message['Subject'])
bytes(message['Subject'])

> Now, setting headers.  Sometimes you have some unicode thing and 
> sometimes you have some bytes.  You need to end up with bytes in the 
> ASCII range and you'd like to leave the header value unencoded if so.  
> But in both cases, you might have bytes or characters outside that 
> range, so you need an explicit encoding, defaulting to utf-8 probably.
> 
>  >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
>  >>> Message.set_header('Subject', b'Some bytes')

Where you just want "a damned valid email and stop making my life hard!":

Message['Subject']='Some text'

Where you care about what encoding is used:

Message['Subject']=Header('Some text',encoding='utf-8')

If you have bytes, for whatever reason:

Message['Subject']=b'some bytes'.decode('utf-8')

...because only you know what encoding those bytes use!

> One of those maps to
> 
>  >>> message['Subject'] = ???

...should only accept text or a Header object.
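
A rough sketch of how close the existing email package already gets to
this, assuming the stdlib Header class and the default Message
behaviour. Note that the real keyword is charset=, not the encoding=
spelling used in the examples above, and that bytes(header) is part of
the proposal rather than something this sketch relies on:

```python
from email.header import Header
from email.message import Message

msg = Message()

# "I care about the encoding": hand over a Header object.
msg["Subject"] = Header("Some text", charset="utf-8")

subject = msg["Subject"]   # the Header object comes back as stored
text = str(subject)        # decoded text view
wire = subject.encode()    # RFC 2047 encoded form, as a str

# "I have bytes": only the caller knows their encoding, so decode first.
msg["X-Note"] = b"some bytes".decode("utf-8")
note = msg["X-Note"]
```

The str() call covers the text half of the proposal; a bytes() spelling
for the wire form would be the new part.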

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From chris at simplistix.co.uk  Sat Apr 11 14:41:46 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 11 Apr 2009 13:41:46 +0100
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <49E0422D.10704@gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>	<F40AE8EC-08CC-4634-AA82-264587552F47@python.org>	<49DF8956.5050501@g.nevcal.com>	<71E1EA03-6E24-4A28-A47A-4EA2D501CC6D@python.org>
	<49E0422D.10704@gmail.com>
Message-ID: <49E0900A.3000302@simplistix.co.uk>

Nick Coghlan wrote:
> Barry Warsaw wrote:
>>> Of course, one could use message.header and message.bythdr and they'd
>>> be the same length.
>> I was trying to figure out what  a 'thdr' was that we'd want to index
>> 'by' it. :)
> 
> In the discussions about os.environ, the suggested approach was to just
> tack a 'b' onto the end of the name to get the bytes version (i.e.
> os.environb).
> 
> That aligns nicely with the b"" prefix for bytes literals, and isn't
> much of a typing or reading burden when dealing with the bytes API
> instead of the text one.
> 
> A similar naming scheme (i.e. msg.headers and msg.headersb) would
> probably work for email as well.

That just feels nasty though :-(

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From chris at simplistix.co.uk  Sat Apr 11 14:46:18 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 11 Apr 2009 13:46:18 +0100
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>
Message-ID: <49E0911A.9040809@simplistix.co.uk>

glyph at divmod.com wrote:
> 
> My preference would be that
> 
>    message.headers['Subject'] = b'Some Bytes'
> 
> would simply raise an exception.  If you've got some bytes, you should 
> instead do
> 
>    message.bytes_headers['Subject'] = b'Some Bytes'

Remind me again why you need to differentiate between headers and 
bytes_headers?

I think bytes headers are evil. If you don't know the encoding when you 
have one, who does or ever will?

>    message.headers['Subject'] = Header(bytes=b'Some Bytes', 
> encoding='utf-8')
> 
> Explicit is better than implicit, right?

Indeed, and the case for the above would be to keep idempotence of 
incoming messages in applications like mailman...

...otherwise we could just decode them and be done with it.

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From aahz at pythoncraft.com  Sat Apr 11 15:01:04 2009
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 11 Apr 2009 06:01:04 -0700
Subject: [Python-Dev] How do I update http://www.python.org/dev/faq?
In-Reply-To: <49E06027.60409@simplistix.co.uk>
References: <49E06027.60409@simplistix.co.uk>
Message-ID: <20090411130104.GB15750@panix.com>

On Sat, Apr 11, 2009, Chris Withers wrote:
>
> How do I update the faq on the website?

Brett Cannon has been the primary maintainer, but he's offline for a
while; are you interested in picking up the task?  If yes, please
subscribe to pydotorg at python.org and then send in your SSH key to request
commit access to the website.

Otherwise, please send your suggested updates to pydotorg.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

Why is this newsgroup different from all other newsgroups?

From rdmurray at bitdance.com  Sat Apr 11 15:14:30 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Sat, 11 Apr 2009 09:14:30 -0400 (EDT)
Subject: [Python-Dev] Test failures on Python 2.7 (trunk)
In-Reply-To: <49E07AB0.6020301@gmail.com>
References: <49E06D0F.2080905@simplistix.co.uk> <49E07AB0.6020301@gmail.com>
Message-ID: <Pine.LNX.4.64.0904110907150.26362@kimball.webabinitio.net>

On Sat, 11 Apr 2009 at 21:10, Nick Coghlan wrote:
> Chris Withers wrote:
>> Hi All,
>>
>> Got these when running from checkout on Mac OS:
>>
>> Could not find '/Users/chris/py2k/Lib/test' in sys.path to remove it
>> ...
>> test test_asynchat produced unexpected output:
>> **********************************************************************
>> error: uncaptured python exception, closing channel
>> <test.test_asynchat.echo_client at 0x263a4b8> (<class
>> 'socket.error'>:[Errno 9] Bad file descriptor
>> [/Users/chris/py2k/Lib/asyncore.py|readwrite|107]
>> [/Users/chris/py2k/Lib/asyncore.py|handle_expt_event|441]
>> [<string>|getsockopt|1] [/Users/chris/py2k/Lib/socket.py|_dummy|165])
>> ...(lots of repeats of the above)
>> **********************************************************************
>> test_asyncore
>> test test_asyncore failed -- Traceback (most recent call last):
>>   File "/Users/chris/py2k/Lib/test/test_asyncore.py", line 144, in
>> test_readwrite
>>     self.assertEqual(tobj.read, True)
>> AssertionError: False != True
>
> I'm getting the asyncore failure on Linux as well (no unexpected output
> though - just the final exception).

Ditto.  I looked at that asyncore traceback yesterday.  The way that
the flags argument to the readwrite call is propagated to the object
was changed, but the tests were not updated to match.  I haven't yet
gotten as far as figuring out why the changes were made, but svn blames
josiah.carlson for the changes (or at least the most recent ones).

--David

From benjamin at python.org  Sat Apr 11 15:21:23 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Sat, 11 Apr 2009 08:21:23 -0500
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <49E07DA6.2010100@simplistix.co.uk>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<Pine.LNX.4.64.0904011058000.26362@kimball.webabinitio.net>
	<49D52B2C.5050509@simplistix.co.uk>
	<ca471dc20904021418y617f1bfdja3173c41a1451423@mail.gmail.com>
	<49D52C5B.7010506@simplistix.co.uk>
	<ca471dc20904021449x659f930bn7b61feec6b640752@mail.gmail.com>
	<49D63465.80401@simplistix.co.uk>
	<1afaf6160904031427p7fa95d07q340fd54cb7c34963@mail.gmail.com>
	<49D9FD15.9030406@simplistix.co.uk>
	<49E07DA6.2010100@simplistix.co.uk>
Message-ID: <1afaf6160904110621g5d3e05bap63747462bad9f92b@mail.gmail.com>

2009/4/11 Chris Withers <chris at simplistix.co.uk>:
> Actually, this was gone on the py3k branch already.
>
> I've committed the fix to trunk, is there anything else I need to do?

Since it's not in py3k, I think not.

-- 
Regards,
Benjamin

From stephen at xemacs.org  Sat Apr 11 16:19:32 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 11 Apr 2009 23:19:32 +0900
Subject: [Python-Dev] email header encoding
In-Reply-To: <49E08E1D.6070207@simplistix.co.uk>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<1239382031.8682.11.camel@haku> <87myaofh5q.fsf@xemacs.org>
	<49E08E1D.6070207@simplistix.co.uk>
Message-ID: <87vdpbjmrv.fsf@xemacs.org>

Chris Withers writes:

 > When is it even a good idea to have more than one encoding in a single 
 > header?

I'd be happy to discuss that on email-sig, but it's really OT for
Python-Dev at this point.

From g.brandl at gmx.net  Sat Apr 11 20:12:34 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 11 Apr 2009 20:12:34 +0200
Subject: [Python-Dev] PyCFunction_* Missing
In-Reply-To: <7c1ab96d0904080504o3b58b1bdvedd31ac872239921@mail.gmail.com>
References: <7c1ab96d0904080504o3b58b1bdvedd31ac872239921@mail.gmail.com>
Message-ID: <grqmiq$gq4$1@ger.gmane.org>

Campbell Barton schrieb:
> Hi, I just noticed that the new Python 2.6.2 docs don't have any reference to
> * PyCFunction_New
> * PyCFunction_NewEx
> * PyCFunction_Check
> * PyCFunction_Call
> 
> Of course these are still in the source code, but I'm wondering whether it
> is intentional that these functions should be for internal use only?

I don't think so. PyCFunctions are mentioned in the C API reference, so
it seems that these functions simply fall into the regrettably quite large
category of public API functions that aren't documented yet.

Please open a tracker item and assign it to me, so that I don't forget this.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From ctb at msu.edu  Sat Apr 11 20:16:33 2009
From: ctb at msu.edu (C. Titus Brown)
Date: Sat, 11 Apr 2009 11:16:33 -0700
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <79957db20904110321n58e50f3o4d8ede6ffc97070c@mail.gmail.com>
References: <20090410203809.GA24530@idyll.org>
	<b8e622740904101653g4e44e4bmffb5f664ba493756@mail.gmail.com>
	<79957db20904110321n58e50f3o4d8ede6ffc97070c@mail.gmail.com>
Message-ID: <20090411181633.GG7768@idyll.org>

On Sat, Apr 11, 2009 at 12:21:18PM +0200, Mario wrote:
-> > He says vague things about patches too, but I'm not sure what.  If he
-> > wanted to make that into a 'patchbot' that just applied every patch in
-> > isolation and ran 'make && make test' and posted results in the
-> > tracker I'd be a happy camper.
-> >
-> Jack, how about you write that idea down on the wiki page mentioned in the
-> proposal, along with the use case? Following that, I'll see if I can do
-> anything about it to make it a reality.

We had a GSoC student two years back who worked on something like this;
his name is Michal Kwiatkowski.  He probably has the code working
somewhere.

It's a nontrivial problem if you want to do it properly with VMs etc.

cheers,
--titus
-- 
C. Titus Brown, ctb at msu.edu

From ctb at msu.edu  Sat Apr 11 20:21:14 2009
From: ctb at msu.edu (C. Titus Brown)
Date: Sat, 11 Apr 2009 11:21:14 -0700
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <49E0350F.8040506@v.loewis.de>
References: <20090410203809.GA24530@idyll.org> <49E0350F.8040506@v.loewis.de>
Message-ID: <20090411182114.GH7768@idyll.org>

On Sat, Apr 11, 2009 at 08:13:35AM +0200, "Martin v. Löwis" wrote:
-> > 2x "keyring package" -- see
-> > http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/.
-> > The poorer one of these will probably be axed unless Tarek gives it
-> > strong support.
-> 
-> I don't think these are good "core" projects. Even if the students come
-> up with a complete solution, it shouldn't be integrated with the
-> standard library right away. Instead, it should have a life outside the
-> standard library, and be considered for inclusion only if the user
-> community wants it.

Tarek has said he can put it into distutils on a trial basis, although
I'm sure that'll depend on what the student comes up with.

I'm using "core projects" as a shorthand for projects that directly
address the core development environment, the stdlib, and priorities of
committers on python-dev.  Tarek is a committer, and it sounded like
you, Jim, and Georg were all interested in this project, too -- that
pushes it well into "core" territory IMO.

-> I'm also skeptical that this is a good SoC project in the first place.
-> Coming up with a wrapper for, say, Apple Keychain, could be a good
-> project. Coming up with a unifying API for all keychains is out of
-> scope, IMO; various past attempts at unifying APIs have demonstrated
-> that creating them is difficult, and might require writing a PEP
-> (whose acceptance then might not happen within a summer).

Well, that's a more unassailable argument and one I agree with ;).

cheers,
--titus
-- 
C. Titus Brown, ctb at msu.edu

From ziade.tarek at gmail.com  Sat Apr 11 20:41:09 2009
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Sat, 11 Apr 2009 20:41:09 +0200
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <20090411182114.GH7768@idyll.org>
References: <20090410203809.GA24530@idyll.org> <49E0350F.8040506@v.loewis.de>
	<20090411182114.GH7768@idyll.org>
Message-ID: <94bdd2610904111141k71ba5168g81f64da88fa066e5@mail.gmail.com>

> -> I'm also skeptical that this is a good SoC project in the first place.

What is a good SoC project from your point of view ?

> -> Coming up with a wrapper for, say, Apple Keychain, could be a good
> -> project. Coming up with a unifying API for all keychains is out of
> -> scope, IMO; various past attempts at unifying APIs have demonstrated
> -> that creating them is difficult, and might require writing a PEP
> -> (whose acceptance then might not happen within a summer).
>
> Well, that's a more unassailable argument and one I agree with ;).

For this case, the student's work is not "dumb" work consisting of writing code
for an already thought-out PEP...

Part of the work will consist of working on a PEP-like document, and on
building APIs for various keychains to see if we can have a unified one.
I doubt the PEP-like document can be written before prototype APIs
for various keychains have been written.

At the end of the summer, if we come up with a nice unified API, I'd
like to include it in Distutils for the "register" command, and maybe
write a PEP to have it as part of the standard library, because it makes
sense to have this kind of feature imho.

Tarek

From ziade.tarek at gmail.com  Sat Apr 11 21:13:10 2009
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Sat, 11 Apr 2009 21:13:10 +0200
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <94bdd2610904111141k71ba5168g81f64da88fa066e5@mail.gmail.com>
References: <20090410203809.GA24530@idyll.org> <49E0350F.8040506@v.loewis.de>
	<20090411182114.GH7768@idyll.org>
	<94bdd2610904111141k71ba5168g81f64da88fa066e5@mail.gmail.com>
Message-ID: <94bdd2610904111213q141093a6td3368cd8d370b317@mail.gmail.com>

Ok what about this then: I am changing the scope a little bit, and I
think the students will be fine with this change since it's the same work.

"The project will consist of creating a plugin system into Distutils
to be able to store and retrieve the username/password
used by some commands, without having to store it in *clear text* in
the .pypirc file anymore.

The student will also provide some plugins for a maximum number of
existing keyring systems.
Some of these plugins might be included in Distutils, and some of them
in a third-party package.
"

Regards
Tarek

From martin at v.loewis.de  Sun Apr 12 00:19:04 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 12 Apr 2009 00:19:04 +0200
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <20090411182114.GH7768@idyll.org>
References: <20090410203809.GA24530@idyll.org> <49E0350F.8040506@v.loewis.de>
	<20090411182114.GH7768@idyll.org>
Message-ID: <49E11758.7070804@v.loewis.de>

> I'm using "core projects" as a shorthand for projects that directly
> address the core development environment, the stdlib, and priorities of
> committers on python-dev.  Tarek is a committer, and it sounded like
> you, Jim, and Georg were all interested in this project, too -- that
> pushes it well into "core" territory IMO.

I understand why Tarek wants it, and I can sympathise with that: to
protect PyPI passwords better (they are currently stored on disk in
plain text).

Putting it into distutils might not make it "official API", but then,
I think it ought to be official API, since PyPI would be just one
(minor) application of it; Python also features a netrc module (which
probably nobody uses).

So I think it would be good to have a discussion upfront whether this
should be added to the library after the summer is over (assuming it
actually works by then). Decision to accept it or not as a SoC project
is independent, but if accepted, the student should well understand
the outcome of this discussion.

Regards,
Martin

From martin at v.loewis.de  Sun Apr 12 00:36:39 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 12 Apr 2009 00:36:39 +0200
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <94bdd2610904111141k71ba5168g81f64da88fa066e5@mail.gmail.com>
References: <20090410203809.GA24530@idyll.org> <49E0350F.8040506@v.loewis.de>	
	<20090411182114.GH7768@idyll.org>
	<94bdd2610904111141k71ba5168g81f64da88fa066e5@mail.gmail.com>
Message-ID: <49E11B77.5040408@v.loewis.de>

Tarek Ziadé wrote:
>> -> I'm also skeptical that this is a good SoC project in the first place.
> 
> What is a good SoC project from your point of view ?

As a core project - tricky. Implement some long-standing complex feature
request, or fix a pile of outstanding bug reports for a module (like
the IDLE proposal). I liked the outcome of last year's "memory
profiling" project: the student added sys.getsizeof (with much
mentoring on my side), and created a profiling library and application
that wasn't added to the core. The latter part is a biased outcome
(as I originally hoped to get something that would become part of the
standard library - but gave up on this quickly, as way too much design
went into that library); the useful core contribution (getsizeof) took
a considerable amount of learning, and still had a few tricky design
issues to resolve.

In short, there must be a realistic chance that the code gets actually
used. Chances for a from-scratch library to be used are nearly zero, so
from-scratch libraries are not good projects.

In case you wonder why I give it nearly zero chance: I keep telling
long-term contributors that libraries have to be field-tested before
being considered for inclusion, and sometimes, even field-testing is
not enough (think setuptools). If SoC students get to short-cut the
process, that would send a wrong message to contributors and users.

> Part of the work will consist of working on a PEP-like document, and on
> building APIs for various keychains and see if we can have an unified one.
> I doubt the PEP-like document can be written before writing prototypes APIs
> for various keychains has been done.

That's certainly true. That's why I think it is a much larger project:
- write different wrappers
- come up with a unifying API
- field-test it for actual applications
- write a PEP

This could easily take a few years to get right (unless the actual
authors of the various keychain implementations get together, define
a common C API, which then a Python module just needs to wrap).

> At the end of the summer, if we come up with a nice unified API, I'd
> like to include
> it to Distutils for the "register" command, and maybe write a PEP to have it
> as part of the standard library because it makes sense to have this kind
> of feature imho.

I completely agree that this is useful functionality to have, and I
also agree it *eventually* belongs in the standard library.

I just don't like the idea of bypassing the proper process by making
it part of distutils. This model (I need it, so I add it) made both
distutils and setuptools so unmaintainable.

Regards,
Martin

From martin at v.loewis.de  Sun Apr 12 00:38:51 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 12 Apr 2009 00:38:51 +0200
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <94bdd2610904111213q141093a6td3368cd8d370b317@mail.gmail.com>
References: <20090410203809.GA24530@idyll.org> <49E0350F.8040506@v.loewis.de>	
	<20090411182114.GH7768@idyll.org>	
	<94bdd2610904111141k71ba5168g81f64da88fa066e5@mail.gmail.com>
	<94bdd2610904111213q141093a6td3368cd8d370b317@mail.gmail.com>
Message-ID: <49E11BFB.3070102@v.loewis.de>

> The student will also provide some plugins for a maximum number of
> existing keyring systems.
> Some of these plugins might be included in Distutils, and some of them
> in a third-party package.

This is slightly better, but see my previous message (that is feature
creep in distutils, and likely, people will start using the distutils
implementation as if it were official API). Also, if you want it
pluggable, you will likely come up with *another* ad-hoc plugin system.

Regards,
Martin

From greg.ewing at canterbury.ac.nz  Sun Apr 12 01:49:00 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 12 Apr 2009 11:49:00 +1200
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <49E0900A.3000302@simplistix.co.uk>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>
	<F40AE8EC-08CC-4634-AA82-264587552F47@python.org>
	<49DF8956.5050501@g.nevcal.com>
	<71E1EA03-6E24-4A28-A47A-4EA2D501CC6D@python.org>
	<49E0422D.10704@gmail.com> <49E0900A.3000302@simplistix.co.uk>
Message-ID: <49E12C6C.4020607@canterbury.ac.nz>

Chris Withers wrote:
> Nick Coghlan wrote:
> 
>> A similar naming scheme (i.e. msg.headers and msg.headersb) would
>> probably work for email as well.
> 
> That just feels nasty though :-(

It does tend to look like a typo to me. Inserting an
underscore (headers_b) would make it look less
accidental.

-- 
Greg

From brian.curtin at gmail.com  Sun Apr 12 02:12:37 2009
From: brian.curtin at gmail.com (curtin@acm.org)
Date: Sat, 11 Apr 2009 19:12:37 -0500
Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json
In-Reply-To: <49E0900A.3000302@simplistix.co.uk>
References: <loom.20090408T110540-221@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>
	<F40AE8EC-08CC-4634-AA82-264587552F47@python.org>
	<49DF8956.5050501@g.nevcal.com>
	<71E1EA03-6E24-4A28-A47A-4EA2D501CC6D@python.org>
	<49E0422D.10704@gmail.com> <49E0900A.3000302@simplistix.co.uk>
Message-ID: <cf9f31f20904111712o55a0e770vc44bbd28e7880f8d@mail.gmail.com>

FWIW, that is also the way things are done in the pickle/cPickle module:
dump/dumps and load/loads differentiate between the file-object and
string ways of using that functionality.
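
For reference, a small sketch of that naming convention using the standard
pickle module. Note that in Python 3 the "s" variants work on bytes rather
than str; the sample data is made up for illustration.

```python
import io
import pickle

data = {'subject': 'hello', 'size': 3}

# dumps/loads: the "s" suffix means "work on an in-memory object"
blob = pickle.dumps(data)
from_blob = pickle.loads(blob)

# dump/load: no suffix means "work on a file-like object"
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)
from_file = pickle.load(buffer)

print(from_blob == from_file == data)
```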

On Sat, Apr 11, 2009 at 7:41 AM, Chris Withers <chris at simplistix.co.uk>wrote:

> Nick Coghlan wrote:
>
>> Barry Warsaw wrote:
>>
>>> Of course, one could use message.header and message.bythdr and they'd
>>>> be the same length.
>>>>
>>> I was trying to figure out what  a 'thdr' was that we'd want to index
>>> 'by' it. :)
>>>
>>
>> In the discussions about os.environ, the suggested approach was to just
>> tack a 'b' onto the end of the name to get the bytes version (i.e.
>> os.environb).
>>
>> That aligns nicely with the b"" prefix for bytes literals, and isn't
>> much of a typing or reading burden when dealing with the bytes API
>> instead of the text one.
>>
>> A similar naming scheme (i.e. msg.headers and msg.headersb) would
>> probably work for email as well.
>>
>
> That just feels nasty though :-(
>
> Chris
>
> --
> Simplistix - Content Management, Zope & Python Consulting
>           - http://www.simplistix.co.uk

From skippy.hammond at gmail.com  Sun Apr 12 04:29:24 2009
From: skippy.hammond at gmail.com (Mark Hammond)
Date: Sun, 12 Apr 2009 12:29:24 +1000
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <loom.20090411T080102-471@post.gmane.org>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410025203.GA199@panix.com>	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>	<49DEBB21.70305@gmail.com>	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>	<grocad$e1c$1@ger.gmane.org>	<49E00931.6050107@v.loewis.de>	<49E01E30.8060302@gmail.com>
	<49E02F7E.6010605@v.loewis.de>
	<loom.20090411T080102-471@post.gmane.org>
Message-ID: <49E15204.3000401@gmail.com>

On 11/04/2009 6:12 PM, Antoine Pitrou wrote:
> Martin v. Löwis <martin at v.loewis.de> writes:
>> Not sure whether it would be *significantly* faster, but yes, Bob wrote
>> an accelerator for parsing out of a byte string to make it really fast;
>> IIRC, he claims that it is faster than pickling.
>
> Isn't premature optimization the root of all evil?
>
> Besides, the fact that many values in a typical JSON object will be strings, and
> must be encoded from/decoded to unicode objects in py3k, suggests that
> accepting/outputting unicode as default is the laziest (i.e. the best) choice
> performance-wise.

I don't see it as premature optimization, but rather trying to ensure 
the interface/api best suits the actual use cases.

> But you don't have to trust me: look at the quick numbers I've posted. The py3k
> version (in the str-only incarnation I've proposed) is sometimes actually faster
> than the trunk version:
> http://mail.python.org/pipermail/python-dev/2009-April/088498.html

But if all *actual* use-cases involve moving to and from utf8 encoded 
bytes, I'm not sure that little example is particularly useful.  In 
those use-cases, I'd be surprised if there weren't significant time and 
space benefits in not asking apps to use an 'intermediate' string object 
before getting the bytes they need, particularly when the payload may be 
a significant size.

Assuming the above is all true, I'd see choosing bytes less as a 
premature optimization and more a design choice which best supports 
actual use.  So to my mind the only real question is whether the above 
*is* true, or if there are common use-cases which don't involve 
utf8-off/on-the-wire...
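
To make the 'intermediate' string concrete, here is a rough sketch of what
an application does with a str-only json API when the data travels as utf8
bytes. The payload is invented for illustration; only standard json calls
are used.

```python
import json

payload = {'name': 'Zoë', 'tags': ['a', 'b']}

# Sending: object -> intermediate str -> bytes on the wire
text = json.dumps(payload, ensure_ascii=False)
wire = text.encode('utf-8')

# Receiving: bytes off the wire -> intermediate str -> object
decoded = json.loads(wire.decode('utf-8'))

print(type(text).__name__, type(wire).__name__, decoded == payload)
```

Each direction allocates one extra copy of the whole document, which is the
cost in question for large payloads.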

Cheers,

Mark

From ron.duplain at gmail.com  Sun Apr 12 04:58:07 2009
From: ron.duplain at gmail.com (Ron DuPlain)
Date: Sat, 11 Apr 2009 22:58:07 -0400
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <20090410203809.GA24530@idyll.org>
References: <20090410203809.GA24530@idyll.org>
Message-ID: <2b485bad0904111958o7008ae4u582604437afa9b6d@mail.gmail.com>

On Fri, Apr 10, 2009 at 4:38 PM, C. Titus Brown <ctb at msu.edu> wrote:
> Unquestionably "core" by my criteria above:
>
> 3to2 tool -- 'nuff said.

I worked on the 3to2 tool during the sprint last week at PyCon.  I can
chip in for GSoC in the event it does get picked up.

-Ron

PS - I'm out of town next week for a family vacation, returning online
the week of 20 Apr.

From mrts.pydev at gmail.com  Sun Apr 12 12:40:12 2009
From: mrts.pydev at gmail.com (=?ISO-8859-1?Q?Mart_S=F5mermaa?=)
Date: Sun, 12 Apr 2009 13:40:12 +0300
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<49CD2930.4080307@cornell.edu> <gqjnti$qes$1@ger.gmane.org>
	<91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com>
	<ca471dc20903271926y61f16740h8c3f29e4a1e4c376@mail.gmail.com>
	<ad1f81530903300304m796e75dmc942d38c015e4fc6@mail.gmail.com>
	<49D09ECF.5090407@trueblade.com>
	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>
	<49D0ACD5.5090209@gmail.com>
	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>
Message-ID: <ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>

The general consensus in python-ideas is that the following is needed, so I
am bringing it to python-dev for final discussion before I file a feature
request on bugs.python.org.

Proposal: add add_query_params() for appending query parameters to a URL to
urllib.parse and urlparse.

Implementation:
http://github.com/mrts/qparams/blob/83d1ec287ec10934b5e637455819cf796b1b421c/qparams.py
(feel free to fork and comment).

Behaviour (longish, guided by "simple things are simple, complex things
possible"):

In the simplest form, parameters can be passed via keyword arguments:

    >>> add_query_params('foo', bar='baz')
    'foo?bar=baz'

    >>> add_query_params('http://example.com/a/b/c?a=b', b='d')
    'http://example.com/a/b/c?a=b&b=d'

Note that '/', if given in arguments, is encoded:

    >>> add_query_params('http://example.com/a/b/c?a=b', b='d', foo='/bar')
    'http://example.com/a/b/c?a=b&b=d&foo=%2Fbar'

Duplicates are discarded:

    >>> add_query_params('http://example.com/a/b/c?a=b', a='b')
    'http://example.com/a/b/c?a=b'

    >>> add_query_params('http://example.com/a/b/c?a=b&c=q', a='b', b='d',
    ...  c='q')
    'http://example.com/a/b/c?a=b&c=q&b=d'

But different values for the same key are supported:

    >>> add_query_params('http://example.com/a/b/c?a=b', a='c', b='d')
    'http://example.com/a/b/c?a=b&a=c&b=d'

Pass different values for a single key in a list (again, duplicates are
removed):

    >>> add_query_params('http://example.com/a/b/c?a=b', a=('q', 'b', 'c'),
    ... b='d')
    'http://example.com/a/b/c?a=b&a=q&a=c&b=d'

Keys with no value are respected, pass ``None`` to create one:

    >>> add_query_params('http://example.com/a/b/c?a', b=None)
    'http://example.com/a/b/c?a&b'

But if a value is given, the empty key is considered a duplicate (i.e. the
case of a&a=b is considered nonsensical):

    >>> add_query_params('http://example.com/a/b/c?a', a='b', c=None)
    'http://example.com/a/b/c?a=b&c'

If you need to pass in key names that are not allowed in keyword arguments,
pass them via a dictionary in second argument:

    >>> add_query_params('foo', {"+'|äüö": 'bar'})
    'foo?%2B%27%7C%C3%A4%C3%BC%C3%B6=bar'

Order of original parameters is retained, although similar keys are grouped
together. Order of keyword arguments is not (and can not be) retained:

    >>> add_query_params('foo?a=b&b=c&a=b&a=d', a='b')
    'foo?a=b&a=d&b=c'

    >>> add_query_params('http://example.com/a/b/c?a=b&q=c&e=d',
    ... x='y', e=1, o=2)
    'http://example.com/a/b/c?a=b&q=c&e=d&e=1&x=y&o=2'

If you need to retain the order of the added parameters, use an
:class:`OrderedDict` as the second argument (*params_dict*):

    >>> from collections import OrderedDict
    >>> od = OrderedDict()
    >>> od['xavier'] = 1
    >>> od['abacus'] = 2
    >>> od['janus'] = 3
    >>> add_query_params('http://example.com/a/b/c?a=b', od)
    'http://example.com/a/b/c?a=b&xavier=1&abacus=2&janus=3'

If both *params_dict* and keyword arguments are provided, values from the
former are used before the latter:

    >>> add_query_params('http://example.com/a/b/c?a=b', od, xavier=1.1,
    ... zorg='a', alpha='b', watt='c', borg='d')
    'http://example.com/a/b/c?a=b&xavier=1&xavier=1.1&abacus=2&janus=3&zorg=a&borg=d&watt=c&alpha=b'

Do nothing with a single argument:

    >>> add_query_params('a')
    'a'

    >>> add_query_params('arbitrary strange stuff?öäüõ*()+-=42')
    'arbitrary strange stuff?\xc3\xb6\xc3\xa4\xc3\xbc\xc3\xb5*()+-=42'

From brian at sweetapp.com  Sun Apr 12 12:49:37 2009
From: brian at sweetapp.com (Brian Quinlan)
Date: Sun, 12 Apr 2009 11:49:37 +0100
Subject: [Python-Dev] Possible py3k io wierdness
In-Reply-To: <49DA4648.9070204@sweetapp.com>
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>	<loom.20090404T231154-979@post.gmane.org>	<49D874E4.6030602@sweetapp.com>	<loom.20090405T102812-215@post.gmane.org>	<3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net>	<49D9A669.9010008@sweetapp.com>	<49D9E3E0.2060408@gmail.com>
	<49DA4648.9070204@sweetapp.com>
Message-ID: <49E1C741.5020604@sweetapp.com>

I've added a new proposed patch to:
http://bugs.python.org/issue5700

The idea is:
- only IOBase implements close() (though a subclass can override close
   without causing problems so long as it calls super().close() or
   calls .flush() and ._close() directly)
- change IOBase.close to call .flush() and then ._close()
- .flush() invokes super().flush() in every class except IOBase
- ._close() invokes super()._close() in every class except IOBase
- FileIO is implemented in Python in _pyio.py so that it can have the
   same base class as the other Python-implemented files classes
- tests verify that .flush() is not called after the file is closed
- tests verify that ._close()/.flush() calls are propagated correctly

One nice side effect is that inheritance is a lot easier and MI works as 
expected, i.e.

class DebugClass(IOBase):
   def flush(self):
     print(<some debug info>)
     super().flush()
   def _close(self):
     print(<some debug info>)
     super()._close()

class MyClass(FileIO, DebugClass): # whatever order makes sense
   ...

m = MyClass(...)
m.close()
# Will call:
#   IOBase.close()
#   DebugClass.flush()  # FileIO has no .flush method
#   IOBase.flush()
#   FileIO._close()
#   DebugClass._close()
#   IOBase._close()
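
A runnable toy version of the same call chain, for anyone who wants to see
the MRO at work. ToyIOBase and ToyFileIO are hypothetical stand-ins, not the
real io classes, and the `calls` list exists only to record the order; the
sketch assumes nothing about the actual patch beyond the rules listed above.

```python
calls = []  # records the order in which methods run


class ToyIOBase:
    """Stand-in for IOBase: the only class that implements close()."""

    def close(self):
        calls.append('ToyIOBase.close')
        self.flush()
        self._close()

    def flush(self):
        calls.append('ToyIOBase.flush')      # end of the flush chain

    def _close(self):
        calls.append('ToyIOBase._close')     # end of the _close chain


class ToyFileIO(ToyIOBase):
    """Stand-in for FileIO: defines _close() but no flush()."""

    def _close(self):
        calls.append('ToyFileIO._close')
        super()._close()


class DebugClass(ToyIOBase):
    def flush(self):
        calls.append('DebugClass.flush')
        super().flush()

    def _close(self):
        calls.append('DebugClass._close')
        super()._close()


class MyClass(ToyFileIO, DebugClass):
    pass


MyClass().close()
print('\n'.join(calls))
```

Because every override calls super(), each class in the MRO gets exactly one
turn per method, regardless of the order of the base classes.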

Cheers,
Brian

From mrts.pydev at gmail.com  Sun Apr 12 15:15:46 2009
From: mrts.pydev at gmail.com (=?ISO-8859-1?Q?Mart_S=F5mermaa?=)
Date: Sun, 12 Apr 2009 16:15:46 +0300
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <49E1DD5A.30405@improva.dk>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com>
	<ca471dc20903271926y61f16740h8c3f29e4a1e4c376@mail.gmail.com>
	<ad1f81530903300304m796e75dmc942d38c015e4fc6@mail.gmail.com>
	<49D09ECF.5090407@trueblade.com>
	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>
	<49D0ACD5.5090209@gmail.com>
	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<49E1DD5A.30405@improva.dk>
Message-ID: <ad1f81530904120615o92e786cv184716098887c33a@mail.gmail.com>

On Sun, Apr 12, 2009 at 3:23 PM, Jacob Holm <jh at improva.dk> wrote:

> Hi Mart
>
>    >>> add_query_params('http://example.com/a/b/c?a=b', b='d', foo='/bar')
>>    'http://example.com/a/b/c?a=b&b=d&foo=%2Fbar'
>>
>> Duplicates are discarded:
>>
>
> Why discard duplicates?  They are valid and have a well-defined meaning.

The bad thing about reasoning about query strings is that there is no
comprehensive documentation about their meaning. Both RFC 1738 and RFC 3986
are rather vague on that matter. But I agree that duplicates actually have a
meaning (an ordered list of identical values), so I'll remove the bits that
prune them unless anyone objects (which I doubt).

>> But if a value is given, the empty key is considered a duplicate (i.e. the
>> case of a&a=b is considered nonsensical):
>>
>
> Again, it is a valid url and this will change its meaning.  Why?

I'm uncertain whether a&a=b has a meaning, but don't see any harm in
supporting it, so I'll add the feature.

>>    >>> add_query_params('http://example.com/a/b/c?a', a='b', c=None)
>>    'http://example.com/a/b/c?a=b&c'
>>
>> If you need to pass in key names that are not allowed in keyword
>> arguments,
>> pass them via a dictionary in second argument:
>>
>>    >>> add_query_params('foo', {"+'|äüö": 'bar'})
>>    'foo?%2B%27%7C%C3%A4%C3%BC%C3%B6=bar'
>>
>> Order of original parameters is retained, although similar keys are
>> grouped
>> together.
>>
>
> Why the grouping?  Is it a side effect of your desire to discard
> duplicates?   Changing the order like that changes the meaning of the url.
>  A concrete case where the order of field names matters is the ":records"
> converter in http://pypi.python.org/pypi/zope.httpform/1.0.1 (a small
> independent package extracted from the form handling code in zope).

It's also related to duplicate handling, but it mostly relates to the data
structure used in the initial implementation (an OrderedDict). Re-grouping
is removed now and not having to deal with duplicates simplified the code
considerably (using a simple list of key-value tuples now).
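
A minimal sketch of the list-of-tuples approach, assuming Python 3's
urllib.parse. This is not the code from the qparams repository, and
append_query_params is a hypothetical name used only for illustration.
Existing pairs keep their order and duplicates are preserved:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_query_params(url, new_pairs):
    """Append (key, value) pairs, keeping existing order and duplicates."""
    parts = urlsplit(url)
    # parse_qsl returns a list of (key, value) tuples in query order.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.extend(new_pairs)
    return urlunsplit(parts._replace(query=urlencode(pairs)))


print(append_query_params("http://example.com/a/b/c?a=b",
                          [("b", "d"), ("foo", "/bar")]))
print(append_query_params("http://example.com/?a=1", [("a", "1")]))
```

One limitation of this sketch: a bare key such as "?a" comes back from
parse_qsl as ("a", "") and is written out again as "a=", so it does not
round-trip the valueless form discussed above.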

> If you change it to keep duplicates and not unnecessarily mangle the field
> order I am +1, else I am -0.

Thanks for your input! Changes pushed to github (see the updated behaviour
there as well):

http://github.com/mrts/qparams/blob/4f32670b55082f8d0ef01c33524145c3264c161a/qparams.py

MS

From thiagoharry at riseup.net  Sun Apr 12 21:09:22 2009
From: thiagoharry at riseup.net (Harry (Thiago Leucz Astrizi))
Date: Sun, 12 Apr 2009 16:09:22 -0300 (BRT)
Subject: [Python-Dev] Needing help to change the grammar
In-Reply-To: <b8e622740904102320na23074fod4317de771d80254@mail.gmail.com>
References: <thiagoharry.1239415106.squirrel@tern.riseup.net>
	<b8e622740904102320na23074fod4317de771d80254@mail.gmail.com>
Message-ID: <thiagoharry.1239563362.squirrel@swift.riseup.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Written by "Martin v. Löwis" <martin at v.loewis.de>:
> Notice that Python source is represented in UTF-8 in the parser.  It
> might be that the C source code has a different encoding, which
> would cause the strcmp to fail.

No, all the files in the source code were already in UTF-8. My system
is configured to treat UTF-8 as the default encoding. This is not an
encoding problem.

Written by "Jack Diederich" <jackdied at gmail.com>:
> I love the idea (and most recently edited PEP 306) so here are a few
> suggestions;
>
> Brazil has many python programmers so you might be able to make
> quick progress by asking them for volunteer time.

Yes, I plan to ask for help on the Brazilian Python mailing list
when I finish preparing the C source code for this project. Then I
expect to receive help translating the Python modules for this new
language. There's a lot of work to do.

> To bug-hunt your technical problem: try switching the "not is"
> operator to include an underscore "not_is."  The python LL(1)
> grammar checker works for python but isn't robust, and does miss
> some grammar ambiguities.  Making the operator a single word might
> reveal a bug in the parser.

Thanks for the advice, you almost guessed what went wrong. I ran some
tests and have already found the problem. When I change
Grammar/Grammar, Python/ast.c and Modules/parsermodule.c to turn
"is not" into "not is", everything works fine and I get a new Python
version where "a is not None" is wrong and "a not is None" is
right. But when I translate this to "não é", a SyntaxError always
occurs. So the problem really is in the grammar checker, which
can't handle some accented letters.

Well, knowing where the problem is, I think that I can try to solve it
by myself. Thanks again.

> Please consider switching your students to 'real' python part way
> through the course.  If they want to use the vast amount of python
> code on the internet as examples they will need to know the few
> English keywords.
>
> Also - most python core developers are not native English speakers
> and do OK :) PyCon speakers are about 25% non-native English
> speakers and EuroPython speakers are about the reverse (my rough
> estimate - I'd love to see some hard numbers).

Yes, I know. For a more "serious" programmer, it's essential to have a
basic understanding of English, and it would be better for him to start
with the real Python. But my intent is not to replace Python in
Brazil, but to create a new language that could be learned easily by
younger people for educational purposes. My intent is to show them how
computer software works. But I will certainly warn my students that to
take programming more seriously, it's important to learn how to
program in some other language, like the original Python. But thanks
for the advice.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFJ4jrjmNGEzq1zP84RAvikAJ4k25vufyWWiDvj3HFZ7Q4M38zCjgCglBGC
dPQTd7mBuswKbNstpJqRuFE=
=xApj
-----END PGP SIGNATURE-----

From tjreedy at udel.edu  Sun Apr 12 22:30:28 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 12 Apr 2009 16:30:28 -0400
Subject: [Python-Dev] Needing help to change the grammar
In-Reply-To: <thiagoharry.1239563362.squirrel@swift.riseup.net>
References: <thiagoharry.1239415106.squirrel@tern.riseup.net>	<b8e622740904102320na23074fod4317de771d80254@mail.gmail.com>
	<thiagoharry.1239563362.squirrel@swift.riseup.net>
Message-ID: <grtj13$ma0$1@ger.gmane.org>

Harry (Thiago Leucz Astrizi) wrote:

> Yes, I plan to ask for help on the Brazilian Python mailing list
> when I finish preparing the C source code for this project. Then I
> expect to receive help translating the Python modules for this new
> language. There's a lot of work to do.

For beginners, there are only a few modules that you really need to do
this for.  Trying to convert the entire stdlib, let alone other stuff
on PyPI, strikes me as foolish.

...

> Yes, I know. For a more "serious" programmer, it's essential to have a
> basic understanding of English, and it would be better for him to start
> with the real Python. But my intent is not to replace Python in
> Brazil, but to create a new language that could be learned easily by
> younger people for educational purposes. My intent is to show them how
> computer software works. But I will certainly warn my students that to
> take programming more seriously, it's important to learn how to
> program in some other language, like the original Python. But thanks
> for the advice.

If possible, and I presume it is, make your interpreter dual language.
Source code in .py files is parsed as now (and modules compile to .pyc).
Source in .pyb (python-brazil) is parsed with your new parser
and gets a Brazilian equivalent of the builtins, but uses the same AST and
bytecode.  Bytecode is neither English nor Brazilian ;-).  This would
give your students access to the whole world of Python modules and allow
those who want to move to normal English-based international Python to
do so without obsoleting their existing work.
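
A rough sketch of the "same AST and bytecode" idea, done at the token
level in pure Python rather than by changing Grammar/Grammar. The
KEYWORDS_PT table and the translate() helper are hypothetical and are
not taken from the project under discussion:

```python
import io
import tokenize

# Hypothetical Portuguese-to-English keyword table (illustration only).
KEYWORDS_PT = {
    "defina": "def",
    "retorne": "return",
    "se": "if",
    "senão": "else",
    "enquanto": "while",
}


def translate(source):
    """Rewrite translated keywords to the standard ones, token by token."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if tok.type == tokenize.NAME:
            text = KEYWORDS_PT.get(text, text)
        tokens.append((tok.type, text))
    # With (type, string) pairs, untokenize emits equivalent source code,
    # although the spacing may differ from the input.
    return tokenize.untokenize(tokens)


source = "defina dobro(x):\n    retorne x * 2\n"
namespace = {}
exec(compile(translate(source), "<pyb>", "exec"), namespace)
print(namespace["dobro"](21))
```

Once translated, the source compiles to ordinary bytecode, so such a
module could import and be imported by regular Python code. A
one-to-one table like this cannot reorder multi-word operators such as
"is not", and it would also rewrite ordinary identifiers that happen to
match a table entry.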

Terry Jan Reedy

PS. Since this thread is not about developing Python itself, it would be 
more appropriate on the python-ideas list if continued much further.

PPS Once unicode identifiers were allowed, I considered it inevitable 
that people would also want native-language keywords, especially for 
younger students.  So I expected a project like yours, though I expected 
the first to be in Asia.  I think dual language versions, if possible, 
would be the way to do this without ghettoizing the national versions. 
But as I said, a general discussion of this belongs on python-ideas.

From l.mastrodomenico at gmail.com  Sun Apr 12 22:59:09 2009
From: l.mastrodomenico at gmail.com (Lino Mastrodomenico)
Date: Sun, 12 Apr 2009 22:59:09 +0200
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <ad1f81530904120615o92e786cv184716098887c33a@mail.gmail.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ca471dc20903271926y61f16740h8c3f29e4a1e4c376@mail.gmail.com>
	<ad1f81530903300304m796e75dmc942d38c015e4fc6@mail.gmail.com>
	<49D09ECF.5090407@trueblade.com>
	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>
	<49D0ACD5.5090209@gmail.com>
	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<49E1DD5A.30405@improva.dk>
	<ad1f81530904120615o92e786cv184716098887c33a@mail.gmail.com>
Message-ID: <cc93256f0904121359s48957c48tffa6ff90ecfdbcf@mail.gmail.com>

2009/4/12 Mart Sõmermaa <mrts.pydev at gmail.com>:
> The bad thing about reasoning about query strings is that there is no
> comprehensive documentation about their meaning. Both RFC 1738 and RFC 3986
> are rather vague in that matter.

FYI the HTML5 spec (http://whatwg.org/html5 ) may have a better
contact with reality than the RFCs.

From a quick scan, two sections that may be relevant are "4.10.16.3
Form submission algorithm":

<http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html#form-submission-algorithm>

and "4.10.16.4 URL-encoded form data":

<http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html#url-encoded-form-data>

-- 
Lino Mastrodomenico

From cs at zip.com.au  Sun Apr 12 23:17:46 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Mon, 13 Apr 2009 07:17:46 +1000
Subject: [Python-Dev] Proposed addtion to urllib.parse in 3.1 (and
	urlparse in 2.7)
In-Reply-To: <ad1f81530904120615o92e786cv184716098887c33a@mail.gmail.com>
Message-ID: <20090412211746.GA23767@cskk.homeip.net>

On 12Apr2009 16:15, Mart Sõmermaa <mrts.pydev at gmail.com> wrote:
| On Sun, Apr 12, 2009 at 3:23 PM, Jacob Holm <jh at improva.dk> wrote:
| > Hi Mart
| >>    >>> add_query_params('http://example.com/a/b/c?a=b', b='d', foo='/bar')
| >>    'http://example.com/a/b/c?a=b&b=d&foo=%2Fbar'
| >>
| >> Duplicates are discarded:
| >
| > Why discard duplicates?  They are valid and have a well-defined meaning.
| 
| The bad thing about reasoning about query strings is that there is no
| comprehensive documentation about their meaning. Both RFC 1738 and RFC 3986
| are rather vague in that matter. But I agree that duplicates actually have a
| meaning (an ordered list of identical values), so I'll remove the bits that
| prune them unless anyone opposes (which I doubt).

+1 from me, with the following suggestion: it's probably worth adding
to the doco that people working with dict-style query_string params should
probably make a dict or OrderedDict and use:

  add_query_params(..., **the_dict)

just to make the other use case obvious.

An alternative would be to have add_ and append_ methods with set and
list behaviour. Feels a little like API bloat, though the convenience
function can be nice.

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

The wonderous pulp and fibre of the brain had been substituted by brass and
iron; he had taught wheelwork to think. - Harry Wilmot Buxton 1832,
                referring to Charles Babbage and his difference engine.

From tonynelson at georgeanelson.com  Sun Apr 12 23:41:00 2009
From: tonynelson at georgeanelson.com (Tony Nelson)
Date: Sun, 12 Apr 2009 17:41:00 -0400
Subject: [Python-Dev] Needing help to change the grammar
In-Reply-To: <grtj13$ma0$1@ger.gmane.org>
References: <thiagoharry.1239415106.squirrel@tern.riseup.net>
	<b8e622740904102320na23074fod4317de771d80254@mail.gmail.com>
	<thiagoharry.1239563362.squirrel@swift.riseup.net>
	<grtj13$ma0$1@ger.gmane.org>
Message-ID: <p04330107c6080d82eb60@[192.168.123.162]>

At 16:30 -0400 04/12/2009, Terry Reedy wrote:
 ...
>  Source in .pyb (python-brazil) is parsed with your new parser,
 ...

In case anyone ever does this again, I suggest that the extension be the
language and optionally country code:

    .py_pt  or  .py_pt_BR
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>

From solipsis at pitrou.net  Sun Apr 12 23:56:58 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 12 Apr 2009 21:56:58 +0000 (UTC)
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<49CD2930.4080307@cornell.edu> <gqjnti$qes$1@ger.gmane.org>
	<91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com>
	<ca471dc20903271926y61f16740h8c3f29e4a1e4c376@mail.gmail.com>
	<ad1f81530903300304m796e75dmc942d38c015e4fc6@mail.gmail.com>
	<49D09ECF.5090407@trueblade.com>
	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>
	<49D0ACD5.5090209@gmail.com>
	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
Message-ID: <loom.20090412T215625-611@post.gmane.org>

Mart Sõmermaa <mrts.pydev <at> gmail.com> writes:
> 
> Proposal: add add_query_params() for appending query parameters to an URL to
> urllib.parse and urlparse.

Is there anything to /remove/ a query parameter?

From v+python at g.nevcal.com  Mon Apr 13 00:11:26 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Sun, 12 Apr 2009 15:11:26 -0700
Subject: [Python-Dev] Needing help to change the grammar
In-Reply-To: <p04330107c6080d82eb60@[192.168.123.162]>
References: <thiagoharry.1239415106.squirrel@tern.riseup.net>	<b8e622740904102320na23074fod4317de771d80254@mail.gmail.com>	<thiagoharry.1239563362.squirrel@swift.riseup.net>	<grtj13$ma0$1@ger.gmane.org>
	<p04330107c6080d82eb60@[192.168.123.162]>
Message-ID: <49E2670E.3070705@g.nevcal.com>

On approximately 4/12/2009 2:41 PM, came the following characters from 
the keyboard of Tony Nelson:
> At 16:30 -0400 04/12/2009, Terry Reedy wrote:
>  ...
>>  Source in .pyb (python-brazil) is parsed with your new parser,
>  ...
> 
> In case anyone ever does this again, I suggest that the extension be the
> language and optionally country code:
> 
>     .py_pt  or  .py_pt_BR

Wouldn't that be a good idea for this implementation too?  It sounds 
like it is not-yet-released, as it is also not-yet-bug-free.

And actually, wouldn't it be nice if international keywords could be 
accepted as alternates if one just said

import pt_BR

An implementation along that line, except for things like reversing the 
order of "not" and "is", would allow the next national language 
customization to be done by just recoding the pt_BR module, renaming to 
pt_it or pt_fr or pt_no and translating a bunch of strings, no?

Probably it would be sufficient to allow for one language at a time, per 
module.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From rasky at develer.com  Mon Apr 13 00:55:21 2009
From: rasky at develer.com (Giovanni Bajo)
Date: Sun, 12 Apr 2009 22:55:21 +0000 (UTC)
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<gqovsd$rnv$1@ger.gmane.org>
	<806d41050903301034i8018472pbdb3f550a1629886@mail.gmail.com>
	<49D110AD.3080703@cheimes.de>
Message-ID: <grtrgp$8bf$1@ger.gmane.org>

On Mon, 30 Mar 2009 20:34:21 +0200, Christian Heimes wrote:

> Hallo Alexander!
> 
> Alexander Neundorf wrote:
>> This of course depends on the definition of "as good as" ;-) Well, I
>> have met Windows-only developers which use CMake because it is able to
>> generate project files for different versions of Visual Studio, and
>> praise it for that.
> 
> So far I haven't heard any complaints about or feature requests for the
> project files. ;)

In fact, I have had one.

I asked for all those big CJK codecs to be moved out of python2x.dll because
they are large and make self-contained distributions
(aka: py2exe/pyinstaller) far larger than would normally be required.

I was told that it would be inconvenient to do so because
the build system is made by hand and it's hard to generate project
files for each third party module.

Were those project files generated automatically, changing between 
external modules within or outside python2x dll would be a one-line 
switch in CMakeLists.txt (or similar).
-- 
Giovanni Bajo
Develer S.r.l.
http://www.develer.com

From rasky at develer.com  Mon Apr 13 01:00:04 2009
From: rasky at develer.com (Giovanni Bajo)
Date: Sun, 12 Apr 2009 23:00:04 +0000 (UTC)
Subject: [Python-Dev] Evaluated cmake as an autoconf replacement
References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com>
	<85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com>
	<18907.17310.201358.697994@montanaro.dyndns.org>
	<5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com>
	<49DBD6F9.7030502@canterbury.ac.nz>
	<806d41050904071554x30dade8eva60be765af462112@mail.gmail.com>
	<5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com>
	<806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com>
	<5b8d13220904081857w46237b57t82d8a4006f00adbb@mail.gmail.com>
	<50862ebd0904091849q7f28fa5bmeaf3b9061629a1c6@mail.gmail.com>
Message-ID: <grtrpk$8bf$2@ger.gmane.org>

On Fri, 10 Apr 2009 11:49:04 +1000, Neil Hodgson wrote:

>    This means that generated Visual Studio project files will not work
> for other people unless a particular absolute build location is
> specified for everyone which will not suit most. Each person that wants
> to build Python will have to run cmake before starting Visual Studio
> thus increasing the prerequisites.

Given that we're now stuck with using whatever Visual Studio version the 
Python maintainers decided to use, I don't see this as a problem. As in: 
there is already a far larger and invasive dependency. 

CMake is readily available on all platforms, and it can be installed in a 
couple of seconds.
-- 
Giovanni Bajo
Develer S.r.l.
http://www.develer.com

From asmodai at in-nomine.org  Mon Apr 13 10:09:08 2009
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Mon, 13 Apr 2009 10:09:08 +0200
Subject: [Python-Dev] UTF-8 Decoder
Message-ID: <20090413080908.GM13110@nexus.in-nomine.org>

[Note: I haven't looked thoroughly at our handling yet, so hence I raise the
question.]

This got posted on the Unicode list. Does it seem interesting for Python
itself? The UTF-8 to UTF-16 transcoding might be.

http://bjoern.hoehrmann.de/utf-8/decoder/dfa/

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Whenever you meet difficult situations dash forward bravely and joyfully...

From mrts.pydev at gmail.com  Mon Apr 13 11:29:46 2009
From: mrts.pydev at gmail.com (Mart Sõmermaa)
Date: Mon, 13 Apr 2009 12:29:46 +0300
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <loom.20090412T215625-611@post.gmane.org>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com>
	<ca471dc20903271926y61f16740h8c3f29e4a1e4c376@mail.gmail.com>
	<ad1f81530903300304m796e75dmc942d38c015e4fc6@mail.gmail.com>
	<49D09ECF.5090407@trueblade.com>
	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>
	<49D0ACD5.5090209@gmail.com>
	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
Message-ID: <ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>

On Mon, Apr 13, 2009 at 12:56 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> Mart Sõmermaa <mrts.pydev <at> gmail.com> writes:
> >
> > Proposal: add add_query_params() for appending query parameters to an URL
> > to urllib.parse and urlparse.
>
> Is there anything to /remove/ a query parameter?

I'd say this is outside the scope of add_query_params().

As for the duplicate handling, I've implemented a threefold strategy that
should address all use cases raised before:

 def add_query_params(*args, **kwargs):
    """
    add_query_params(url, [allow_dups, [args_dict, [separator]]], **kwargs)

    Appends query parameters to an URL and returns the result.

    :param url: the URL to update, a string.
    :param allow_dups: if
        * True: plainly append new parameters, allowing all duplicates
          (default),
        * False: disallow duplicates in values and regroup keys so that
          different values for the same key are adjacent,
        * None: disallow duplicates in keys -- each key can have a single
          value and later values override the value (like dict.update()).
    :param args_dict: optional dictionary of parameters, default is {}.
    :param separator: either ';' or '&', the separator between key-value
        pairs, default is '&'.
    :param kwargs: parameters as keyword arguments.

    :return: original URL with updated query parameters or the original URL
        unchanged if no parameters given.
    """

The commit is

http://github.com/mrts/qparams/blob/b9bdbec46bf919d142ff63e6b2b822b5d57b6f89/qparams.py

An extensive description of the behaviour is in the doctests.
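
The three allow_dups modes from the docstring could look roughly like
this. It is a sketch assuming Python 3.7+ (dicts keep insertion order)
and urllib.parse, not the qparams implementation, and merge_params is a
hypothetical name. For brevity it takes new parameters as a list of
pairs and leaves out args_dict and separator:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def merge_params(url, new_pairs, allow_dups=True):
    """Merge (key, value) pairs into url's query string.

    allow_dups=True:  append everything, keeping all duplicates.
    allow_dups=False: drop repeated values, group values by key.
    allow_dups=None:  one value per key, later values win.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True) + list(new_pairs)

    if allow_dups is None:
        merged = {}
        for key, value in pairs:
            merged[key] = value  # later value overrides, like dict.update()
        pairs = list(merged.items())
    elif allow_dups is False:
        grouped = {}
        for key, value in pairs:
            values = grouped.setdefault(key, [])
            if value not in values:
                values.append(value)
        pairs = [(key, value)
                 for key, values in grouped.items()
                 for value in values]

    return urlunsplit(parts._replace(query=urlencode(pairs)))


url = "http://example.com/?a=1&b=2"
print(merge_params(url, [("a", "3"), ("a", "1")], allow_dups=True))
print(merge_params(url, [("a", "3"), ("a", "1")], allow_dups=False))
print(merge_params(url, [("a", "3")], allow_dups=None))
```

Keys stay in order of first appearance in all three modes; only the
handling of repeated keys and values differs.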

From solipsis at pitrou.net  Mon Apr 13 12:19:13 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 13 Apr 2009 10:19:13 +0000 (UTC)
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com>
	<ca471dc20903271926y61f16740h8c3f29e4a1e4c376@mail.gmail.com>
	<ad1f81530903300304m796e75dmc942d38c015e4fc6@mail.gmail.com>
	<49D09ECF.5090407@trueblade.com>
	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>
	<49D0ACD5.5090209@gmail.com>
	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
Message-ID: <loom.20090413T101627-359@post.gmane.org>

Mart Sõmermaa <mrts.pydev <at> gmail.com> writes:
> 
> On Mon, Apr 13, 2009 at 12:56 AM, Antoine Pitrou <solipsis <at> pitrou.net> wrote:
> > Mart Sõmermaa <mrts.pydev <at> gmail.com> writes:
> > >
> > > Proposal: add add_query_params() for appending query parameters to an URL
> > > to urllib.parse and urlparse.
> >
> > Is there anything to /remove/ a query parameter?
> 
> I'd say this is outside the scope of add_query_params().

Given the name of the proposed function, sure. But it sounds a bit weird to
have a function dedicated to adding parameters and nothing to remove them.

You could e.g. rename the function to update_query_params() and decide that
every parameter whose specified value is None must actually be removed from
the URL.
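
One way the suggested None-means-remove convention might look, sketched
with Python 3's urllib.parse. The function does not exist in the
standard library; the name comes from the suggestion above and the
behaviour is an assumption:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def update_query_params(url, **params):
    """Append params to url; a value of None removes that key instead."""
    parts = urlsplit(url)
    to_remove = {key for key, value in params.items() if value is None}
    # Keep existing pairs unless their key was explicitly set to None.
    pairs = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key not in to_remove]
    pairs += [(key, value) for key, value in params.items()
              if value is not None]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


url = "http://example.com/list?sort=name&page=2"
print(update_query_params(url, sort=None))
print(update_query_params(url, sort=None, filter="x"))
```

This convention would conflict with the earlier add_query_params
example, where c=None produces a key without a value, so only one of the
two meanings of None could be kept.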

Regards

Antoine.

From fuzzyman at voidspace.org.uk  Mon Apr 13 13:53:10 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Mon, 13 Apr 2009 12:53:10 +0100
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
 3.1 (and urlparse in 2.7)
In-Reply-To: <loom.20090413T101627-359@post.gmane.org>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>	<91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com>	<ca471dc20903271926y61f16740h8c3f29e4a1e4c376@mail.gmail.com>	<ad1f81530903300304m796e75dmc942d38c015e4fc6@mail.gmail.com>	<49D09ECF.5090407@trueblade.com>	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>	<49D0ACD5.5090209@gmail.com>	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>	<loom.20090412T215625-611@post.gmane.org>	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<loom.20090413T101627-359@post.gmane.org>
Message-ID: <49E327A6.3000801@voidspace.org.uk>

Antoine Pitrou wrote:
> Mart Sõmermaa <mrts.pydev <at> gmail.com> writes:
>> On Mon, Apr 13, 2009 at 12:56 AM, Antoine Pitrou <solipsis <at> pitrou.net> wrote:
>>> Mart Sõmermaa <mrts.pydev <at> gmail.com> writes:
>>>> Proposal: add add_query_params() for appending query parameters to an URL
>>>> to urllib.parse and urlparse.
>>>
>>> Is there anything to /remove/ a query parameter?
>>
>> I'd say this is outside the scope of add_query_params().
>
> Given the name of the proposed function, sure. But it sounds a bit weird to
> have a function dedicated to adding parameters and nothing to remove them.

Weird or not, is there actually a *need* to remove query parameters?

Michael

> You could e.g. rename the function to update_query_params() and decide that
> every parameter whose specified value is None must actually be removed from
> the URL.
>
> Regards
>
> Antoine.

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From solipsis at pitrou.net  Mon Apr 13 14:01:51 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 13 Apr 2009 12:01:51 +0000 (UTC)
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>	<91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com>	<ca471dc20903271926y61f16740h8c3f29e4a1e4c376@mail.gmail.com>	<ad1f81530903300304m796e75dmc942d38c015e4fc6@mail.gmail.com>	<49D09ECF.5090407@trueblade.com>	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>	<49D0ACD5.5090209@gmail.com>	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>	<loom.20090412T215625-611@post.gmane.org>	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<loom.20090413T101627-359@post.gmane.org>
	<49E327A6.3000801@voidspace.org.uk>
Message-ID: <loom.20090413T120051-834@post.gmane.org>

Michael Foord <fuzzyman <at> voidspace.org.uk> writes:
> 
> Weird or not, is there actually a *need* to remove query parameters?

Say you are filtering or sorting data based on some URL parameters. If the user
wants to remove one of those filters, you have to remove the corresponding query
parameter.

Regards

Antoine.

From orsenthil at gmail.com  Mon Apr 13 14:22:05 2009
From: orsenthil at gmail.com (Senthil Kumaran)
Date: Mon, 13 Apr 2009 17:52:05 +0530
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <loom.20090413T120051-834@post.gmane.org>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>
	<49D0ACD5.5090209@gmail.com>
	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<loom.20090413T101627-359@post.gmane.org>
	<49E327A6.3000801@voidspace.org.uk>
	<loom.20090413T120051-834@post.gmane.org>
Message-ID: <7c42eba10904130522r2dbaef23ja5e785a2206177d9@mail.gmail.com>

On Mon, Apr 13, 2009 at 5:31 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Say you are filtering or sorting data based on some URL parameters. If the user
> wants to remove one of those filters, you have to remove the corresponding query
> parameter.

This is a use case, and possibly a hypothetical one, that a programmer
might run into in special situations.
There are lots of such use cases for which urllib.parse or urlparse
have been used.

But my question about this proposal is: do we have a good RFC
specification to implement it against?
If not, and if we just go by practical needs, then eventually
we will end up with bugs or feature requests in this which will take a
lot of discussion and time to get fixed.

Someone suggested reading the HTML 5 spec instead of the RFCs for this
request. I have yet to do that, but my opinion with respect to additions
to the url* modules is that backing by RFCs would be the best way to go and
maintain.

-- 
Senthil

From tino at wildenhain.de  Mon Apr 13 14:33:08 2009
From: tino at wildenhain.de (Tino Wildenhain)
Date: Mon, 13 Apr 2009 14:33:08 +0200
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse
 in	3.1 (and urlparse in 2.7)
In-Reply-To: <7c42eba10904130522r2dbaef23ja5e785a2206177d9@mail.gmail.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>	<49D0ACD5.5090209@gmail.com>	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>	<loom.20090412T215625-611@post.gmane.org>	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>	<loom.20090413T101627-359@post.gmane.org>	<49E327A6.3000801@voidspace.org.uk>	<loom.20090413T120051-834@post.gmane.org>
	<7c42eba10904130522r2dbaef23ja5e785a2206177d9@mail.gmail.com>
Message-ID: <49E33104.2040302@wildenhain.de>

Hi,

Senthil Kumaran wrote:
> On Mon, Apr 13, 2009 at 5:31 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Say you are filtering or sorting data based on some URL parameters. If the user
>> wants to remove one of those filters, you have to remove the corresponding query
>> parameter.
> 
> This is a use case, and possibly a hypothetical one, that a programmer
> might run into in special situations.
> There are lots of such use cases for which urllib.parse or urlparse
> have been used.
> 
> But my question about this proposal is: do we have a good RFC
> specification to implement it against?
> If not, and if we just go by practical needs, then eventually
> we will end up with bugs or feature requests in this which will take a
> lot of discussion and time to get fixed.
> 
> Someone suggested reading the HTML 5 spec instead of the RFCs for this
> request. I have yet to do that, but my opinion with respect to additions
> to the url* modules is that backing by RFCs would be the best way to go and
> maintain.

I'd rather see an ordered-dict-like object returned by urlparse
for parameters; this would make extra methods superfluous.

Also note that you might need to specify the encoding
of the data somewhere (most of the time it's UTF-8, but it depends on the
encoding used in the form page).
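
A small sketch of both points, assuming a modern Python 3, where
urllib.parse.urlencode, parse_qsl and parse_qs accept an encoding
argument. It shows a Latin-1 encoded form value round-tripping, and
parse_qs grouping repeated keys into lists:

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

# Encode as a Latin-1 form page would: "õ" becomes the single byte 0xF5.
query = urlencode([("name", "Sõmermaa"), ("tag", "a"), ("tag", "b")],
                  encoding="latin-1")
print(query)

# Decoding has to use the same encoding to get the original text back.
pairs = parse_qsl(query, encoding="latin-1")
print(pairs)

# parse_qs returns a dict mapping each key to the list of its values.
grouped = parse_qs(query, encoding="latin-1")
print(grouped)
```

Decoding the same query with the default UTF-8 would not raise an
error; the invalid byte is replaced, so the mismatch can go unnoticed.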

A nice add-on would actually be a template form object which holds all
the expected items and their types (and whether they are optional) with
little wrappers for common types (int, float, string, list, ...) which
raise helpful exceptions when an item is used somewhere without being
filled in or given a default, or when the data is wrong for its type.

Otoh, this might get a bit too much in direction of a web app framework.

Regards
Tino

From barry at python.org  Mon Apr 13 16:01:14 2009
From: barry at python.org (Barry Warsaw)
Date: Mon, 13 Apr 2009 10:01:14 -0400
Subject: [Python-Dev] Python 2.6.2 final
In-Reply-To: <5c6f2a5d0904110520o2ea97af9t4cd18a168db795d5@mail.gmail.com>
References: <776F906E-418C-4A2C-8C6C-2B0036B49AFA@python.org>
	<5c6f2a5d0904110520o2ea97af9t4cd18a168db795d5@mail.gmail.com>
Message-ID: <3E33F52B-DC06-44D5-BC91-68F4D6AD5300@python.org>

On Apr 11, 2009, at 8:20 AM, Mark Dickinson wrote:

> On Fri, Apr 10, 2009 at 2:31 PM, Barry Warsaw <barry at python.org>  
> wrote:
>> bugs.python.org is apparently down right now, but I set issue 5724 to
>> release blocker for 2.6.2.  This is waiting for input from Mark  
>> Dickinson,
>> and it relates to test_cmath failing on Solaris 10.
>
> I'd prefer to leave this alone for 2.6.2.  There's a fix posted to  
> the issue
> tracker, but it's not entirely trivial and I think the risk of  
> accidental
> breakage outweighs the niceness of seeing 'all tests passed' on
> Solaris.

Agreed.  I've knocked this back to 'high' priority and accepted it for  
2.6.3.  Mark, feel free to apply it after 2.6.2 is tagged (which  
should be in about 8 hours or 2200 UTC today).

-Barry


From barry at python.org  Mon Apr 13 16:11:09 2009
From: barry at python.org (Barry Warsaw)
Date: Mon, 13 Apr 2009 10:11:09 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <A286FA62-B1F0-4DB4-BC38-9D1E0F85A92A@fuhm.net>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<A286FA62-B1F0-4DB4-BC38-9D1E0F85A92A@fuhm.net>
Message-ID: <CBA855B3-2806-469D-A4A6-8AF279607A52@python.org>

On Apr 10, 2009, at 11:08 AM, James Y Knight wrote:

> Until you write a parser for every header, you simply cannot decode  
> to unicode. The only sane choices are:
> 1) raw bytes
> 2) parsed structured data

The email package does not need a parser for every header, but it  
should provide a framework that applications (or third party  
libraries) can use to extend the built-in header parsers.  A bare  
minimum for functionality requires a Content-Type parser.  I think the  
email package should also include an address header (Originator,  
Destination) parser, and a Message-ID header parser.  Possibly  
others.  The default would probably be some unstructured parser for  
headers like Subject.
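
One possible shape for such a framework, as a rough sketch only: a registry
that maps header names to parser functions, with an unstructured fallback.
The names HEADER_PARSERS, register_parser and parse_header are hypothetical;
only email.utils.getaddresses() is an existing stdlib function.

```python
from email.utils import getaddresses

# Hypothetical registry: lower-cased header name -> parser function.
HEADER_PARSERS = {}


def register_parser(*names):
    """Decorator that registers a parser for one or more header names."""
    def decorator(func):
        for name in names:
            HEADER_PARSERS[name.lower()] = func
        return func
    return decorator


@register_parser("From", "To", "Cc")
def parse_addresses(value):
    # getaddresses() returns a list of (realname, address) tuples.
    return getaddresses([value])


def parse_unstructured(value):
    # Default for headers such as Subject: collapse folding whitespace.
    return " ".join(value.split())


def parse_header(name, value):
    """Dispatch to a registered parser, else treat the value as unstructured."""
    parser = HEADER_PARSERS.get(name.lower(), parse_unstructured)
    return parser(value)


print(parse_header("To", "Barry <barry@example.org>, foo@example.com"))
```

Applications or third-party libraries could then call register_parser() for
their own headers without touching the built-in ones.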

-Barry


From barry at python.org  Mon Apr 13 16:14:04 2009
From: barry at python.org (Barry Warsaw)
Date: Mon, 13 Apr 2009 10:14:04 -0400
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <49DF8956.5050501@g.nevcal.com>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>
	<F40AE8EC-08CC-4634-AA82-264587552F47@python.org>
	<49DF8956.5050501@g.nevcal.com>
Message-ID: <7DF370A6-88E4-4710-9CF8-B0B3D7249383@python.org>

On Apr 10, 2009, at 2:00 PM, Glenn Linderman wrote:

> If one name has to be longer than the other, it should be the bytes  
> version.  Real user code is more likely to want to use the text  
> version, and hopefully there will be more of that type of code than  
> implementations using bytes.
>
> Of course, one could use message.header and message.bythdr and  
> they'd be the same length.

Actually, thinking about this over the weekend, it's much better for  
message['subject'] to return a Header instance in all cases.  Use  
bytes(header) to get the raw bytes.

A good API for getting the parsed and decoded header values needs to  
take into account that it won't always be a string.  For unstructured  
headers like Subject, str(header) would work just fine.  For an  
Originator or Destination address, what does str(header) return?  And  
what would be the API for getting the set of realname/addresses out of  
the header?
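
To make the str()/bytes() split concrete, here is a small sketch for the
unstructured case. SketchHeader is a hypothetical class, not the email
package's Header; it assumes the value is stored exactly as it arrived on
the wire and uses the existing decode_header()/make_header() helpers for
the text view.

```python
from email.header import decode_header, make_header


class SketchHeader:
    """Hypothetical header value object; not the email package's Header."""

    def __init__(self, raw):
        self._raw = bytes(raw)  # value exactly as it appeared on the wire

    def __bytes__(self):
        # Raw view: hand back the stored bytes untouched.
        return self._raw

    def __str__(self):
        # Text view: decode any RFC 2047 encoded-words for presentation.
        text = self._raw.decode("ascii", errors="replace")
        return str(make_header(decode_header(text)))


subject = SketchHeader(b"=?utf-8?q?Caf=C3=A9_menu?=")
print(bytes(subject))
print(str(subject))
```

For address headers the open question above remains: a single str() is
probably not enough, and a structured accessor would be needed.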

-Barry


From barry at python.org  Mon Apr 13 16:28:32 2009
From: barry at python.org (Barry Warsaw)
Date: Mon, 13 Apr 2009 10:28:32 -0400
Subject: [Python-Dev] headers api for email package
In-Reply-To: <49E08F8C.5030205@simplistix.co.uk>
References: <loom.20090408T110540-221@post.gmane.org>	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>	<loom.20090409T043042-835@post.gmane.org>	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<49E08F8C.5030205@simplistix.co.uk>
Message-ID: <FD0E133D-4944-4FE6-B3FD-865947F48E2F@python.org>

On Apr 11, 2009, at 8:39 AM, Chris Withers wrote:

> Barry Warsaw wrote:
>> >>> message['Subject']
>> The raw bytes or the decoded unicode?
>
> A header object.

Yep.  You got there before I did. :)

>> Okay, so you've picked one.  Now how do you spell the other way?
>
> str(message['Subject'])

Yes for unstructured headers like Subject.  For structured headers...  
hmm.

> bytes(message['Subject'])

Yes.

>> Now, setting headers.  Sometimes you have some unicode thing and  
>> sometimes you have some bytes.  You need to end up with bytes in  
>> the ASCII range and you'd like to leave the header value unencoded  
>> if so.  But in both cases, you might have bytes or characters  
>> outside that range, so you need an explicit encoding, defaulting to  
>> utf-8 probably.
>> >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
>> >>> Message.set_header('Subject', b'Some bytes')
>
> Where you just want "a damned valid email and stop making my life  
> hard!":
>
> Message['Subject']='Some text'

Yes.  In which case I propose we guess the encoding as 1) ascii, 2)  
utf-8, 3) wtf?

> Where you care about what encoding is used:
>
> Message['Subject']=Header('Some text',encoding='utf-8')

Yes.

> If you have bytes, for whatever reason:
>
> Message['Subject']=b'some bytes'.decode('utf-8')
>
> ...because only you know what encoding those bytes use!

So you're saying that __setitem__() should not accept raw bytes?

-Barry


From martin at v.loewis.de  Mon Apr 13 16:44:36 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 13 Apr 2009 16:44:36 +0200
Subject: [Python-Dev] Contributor Agreements for Patches - was
 [Jython-dev] Jython on Google AppEngine!
In-Reply-To: <d03bb4010904080850h7eece089i1e279b4a5f01cbb3@mail.gmail.com>
References: <d03bb4010904080850h7eece089i1e279b4a5f01cbb3@mail.gmail.com>
Message-ID: <49E34FD4.3060809@v.loewis.de>

>     * What is the scope of a patch that requires a contributor
>       agreement?

Van's advice is as follows:

There is no definite ruling on what constitutes "work" that is
copyright-protected; estimates vary between 10 and 50 lines.
Establishing a rule based on line limits is not supported by
law. Formally, to be on the safe side, paperwork would be needed
for any contribution (no matter how small); this is tedious and
probably unnecessary, as the risk of somebody suing is small.
Also, in that case, there would be a strong case for an implied
license.

So his recommendation is to put the words

"By submitting a patch or bug report, you agree to license it under the
Apache Software License, v. 2.0, and further agree that it may be
relicensed as necessary for inclusion in Python or other downstream
projects."

into the tracker; this should be sufficient for most cases. For
committers, we should continue to require contributor forms.

Contributor forms can be electronic, but they need to name the
parties, include a signature (including electronic), and include
a company contribution agreement as necessary.

Regards,
Martin

P.S. I'm sure Van will jump in if I misunderstood parts of this.

From thobes at gmail.com  Mon Apr 13 17:45:11 2009
From: thobes at gmail.com (Tobias Ivarsson)
Date: Mon, 13 Apr 2009 17:45:11 +0200
Subject: [Python-Dev] Contributor Agreements for Patches - was
	[Jython-dev] Jython on Google AppEngine!
In-Reply-To: <49E34FD4.3060809@v.loewis.de>
References: <d03bb4010904080850h7eece089i1e279b4a5f01cbb3@mail.gmail.com>
	<49E34FD4.3060809@v.loewis.de>
Message-ID: <9997d5e60904130845t7c1f636cof05cfb86d20c9c29@mail.gmail.com>

On Mon, Apr 13, 2009 at 4:44 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> >     * What is the scope of a patch that requires a contributor
> >       agreement?
>
> Van's advice is as follows:
>
> There is no definite ruling on what constitutes "work" that is
> copyright-protected; estimates vary between 10 and 50 lines.
> Establishing a rule based on line limits is not supported by
> law. Formally, to be on the safe side, paperwork would be needed
> for any contribution (no matter how small); this is tedious and
> probably unnecessary, as the risk of somebody suing is small.
> Also, in that case, there would be a strong case for an implied
> license.
>
> So his recommendation is to put the words
>
> "By submitting a patch or bug report, you agree to license it under the
> Apache Software License, v. 2.0, and further agree that it may be
> relicensed as necessary for inclusion in Python or other downstream
> projects."
>
> into the tracker; this should be sufficient for most cases. For
> committers, we should continue to require contributor forms.

Sounds great to me.

Cheers,
Tobias

From rdmurray at bitdance.com  Mon Apr 13 17:49:35 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Mon, 13 Apr 2009 11:49:35 -0400 (EDT)
Subject: [Python-Dev] headers api for email package
In-Reply-To: <FD0E133D-4944-4FE6-B3FD-865947F48E2F@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<49E08F8C.5030205@simplistix.co.uk>
	<FD0E133D-4944-4FE6-B3FD-865947F48E2F@python.org>
Message-ID: <Pine.LNX.4.64.0904131117510.26362@kimball.webabinitio.net>

On Mon, 13 Apr 2009 at 10:28, Barry Warsaw wrote:
> On Apr 11, 2009, at 8:39 AM, Chris Withers wrote:
>
>> Barry Warsaw wrote:
>> > > > >  message['Subject']
>> > The raw bytes or the decoded unicode?
>> 
>> A header object.
>
> Yep.  You got there before I did. :)

+1

>> > Okay, so you've picked one.  Now how do you spell the other way?
>> 
>> str(message['Subject'])
>
> Yes for unstructured headers like Subject.  For structured headers... hmm.

Some "reasonable" printable interpretation that has no semantic meaning?

>> bytes(message['Subject'])
>
> Yes.
>
>> > Now, setting headers.  Sometimes you have some unicode thing and 
>> > sometimes you have some bytes.  You need to end up with bytes in the 
>> > ASCII range and you'd like to leave the header value unencoded if so. 
>> > But in both cases, you might have bytes or characters outside that range, 
>> > so you need an explicit encoding, defaulting to utf-8 probably.
>> > > > >  Message.set_header('Subject', 'Some text', encoding='utf-8')
>> > > > >  Message.set_header('Subject', b'Some bytes')
>> 
>> Where you just want "a damned valid email and stop making my life hard!":
>> 
>> Message['Subject']='Some text'
>
> Yes.  In which case I propose we guess the encoding as 1) ascii, 2) utf-8, 3) 
> wtf?

Given some usenet postings I've just dealt with, (3) appears to
sometimes be spelled 'x-unknown' and sometimes (in the most recent case)
'unknown-8bit'.  A quick google turns up a hit on RFC1428 for the latter,
and a bunch of trouble tickets for the former...so I think 'wtf' is
correctly spelled 'unknown-8bit'.

However, it's not supposed to be used by mail composers, who are
expected to know the encoding.  It's for mail gateways that are
transforming something and don't know the encoding.  I'm not
sure what this means for the email module, which certainly
will be used in mail gateways... maybe it's the responsibility
of the application code to explicitly say 'unknown encoding'?

>> Where you care about what encoding is used:
>> 
>> Message['Subject']=Header('Some text',encoding='utf-8')
>
> Yes.
>
>> If you have bytes, for whatever reason:
>> 
>> Message['Subject']=b'some bytes'.decode('utf-8')
>> 
>> ...because only you know what encoding those bytes use!
>
> So you're saying that __setitem__() should not accept raw bytes?

If I'm understanding things correctly, if it did accept bytes the
person using that interface would need to do whatever encoding (eg:
encoded-word) was needed, so the interface should check that the byte
string is 8 bit clean.  But having some sort of 'setraw' method on Header
might be better for that case.

--David

From daniel at stutzbachenterprises.com  Mon Apr 13 18:11:35 2009
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Mon, 13 Apr 2009 11:11:35 -0500
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49E00931.6050107@v.loewis.de>
References: <loom.20090408T110540-221@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410025203.GA199@panix.com>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<grocad$e1c$1@ger.gmane.org> <49E00931.6050107@v.loewis.de>
Message-ID: <eae285400904130911y79d4e3c0m21e69370ac1f9445@mail.gmail.com>

On Fri, Apr 10, 2009 at 10:06 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> However, I really think that this question cannot be answered by
> reading the RFC. It should be answered by verifying how people use
> the json library in 2.x.
>

I use the json module in 2.6 to communicate with a C# JSON library and a
JavaScript JSON library.  The C# and JavaScript libraries produce and
consume the equivalent of str, not the equivalent of bytes.

Yes, the data eventually has to go over a socket as bytes, but that's often
handled by a different layer of code.

For JavaScript, data is typically received via XMLHttpRequest(), which
automatically figures out the encoding from the HTTP headers and/or other
information (defaulting to UTF-8) and returns a str-like object that I pass
to the JavaScript JSON library.

For C#, I wrap the socket in a StreamReader object, which decodes the byte
stream into a string stream (similar to Python's new TextIOWrapper class).
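
The Python analogue of that layering might look like the sketch below. It
assumes Python 3 and uses an in-memory BytesIO as a stand-in for the socket;
the payload is made up for the demonstration.

```python
import io
import json

# Stand-in for the bytes arriving from a socket (an assumption for the demo).
byte_stream = io.BytesIO('{"name": "caf\u00e9"}'.encode("utf-8"))

# TextIOWrapper decodes the byte stream into text, so json only sees str.
text_stream = io.TextIOWrapper(byte_stream, encoding="utf-8")
data = json.load(text_stream)
print(data["name"])
```

The decoding step lives in the wrapper, and the JSON layer never handles
bytes directly.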

Hope that helps,

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

From walter at livinglogic.de  Mon Apr 13 18:39:06 2009
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Mon, 13 Apr 2009 18:39:06 +0200
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <20090410233524.GA18347@idyll.org>
References: <20090410203809.GA24530@idyll.org>	<1afaf6160904101605l2235f906if36aa79703cd9fd7@mail.gmail.com>
	<20090410233524.GA18347@idyll.org>
Message-ID: <49E36AAA.6050903@livinglogic.de>

C. Titus Brown wrote:

> [...]
> I have had a hard time getting a good sense of what core code is well
> tested and what is not well tested, across various platforms.  While
> Walter's C/Python integrated code coverage site is nice, it would be
> even nicer to have a way to generate all that information within any
> particular checkout on a real-time basis.

This might have to be done incrementally. Creating the output for
http://coverage.livinglogic.de/ takes about 90 minutes. This breaks down
like this:

Downloading: 2sec
Unpacking: 3sec
Configuring: 30sec
Compiling: 1min
Running the test suite: 1hour
Reading coverage files: 8sec
Generating HTML files: 30min

> Doing so in the context of
> Snakebite would be icing... and I think it's worth supporting in core,
> especially if it can be done without any changes *to* core.

The only thing we'd probably need in core is a way to configure Python
to run with code coverage. The coverage script does this by patching the
makefile.

Running the code coverage script on Snakebite would be awesome. The
script is available from here:

    http://pypi.python.org/pypi/pycoco

> -> Another small nit is that they should address Python 2.x, too.
> 
> I asked that they focus on EITHER 2.x or 3.x, since "too broad" is an
> equally valid criticism.  Certainly 3.x is the future so I thought
> focusing on increasing code coverage, and especially C code coverage,
> could best be applied to 3.x.

Servus,
   Walter

From stephen at xemacs.org  Mon Apr 13 19:15:20 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 14 Apr 2009 02:15:20 +0900
Subject: [Python-Dev] [Email-SIG]  headers api for email package
In-Reply-To: <FD0E133D-4944-4FE6-B3FD-865947F48E2F@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<49E08F8C.5030205@simplistix.co.uk>
	<FD0E133D-4944-4FE6-B3FD-865947F48E2F@python.org>
Message-ID: <873accv5jr.fsf@xemacs.org>

Barry Warsaw writes:
 > On Apr 11, 2009, at 8:39 AM, Chris Withers wrote:
 > 
 > > Barry Warsaw wrote:
 > >> >>> message['Subject']
 > >> The raw bytes or the decoded unicode?
 > >
 > > A header object.
 > 
 > Yep.  You got there before I did. :)
 > 
 > >> Okay, so you've picked one.  Now how do you spell the other way?
 > >
 > > str(message['Subject'])
 > 
 > Yes for unstructured headers like Subject.  For structured headers...  
 > hmm.

Well, suppose we get really radical here.  *People* see email as
(rich-)text.  So ... message['Subject'] returns an object, partly to
be consistent with more complex headers' APIs, but partly to remind us
that nothing in email is as simple as it seems.  Now,
str(message['Subject']) is really for presentation to the user, right?
OK, so let's make it a presentation function!  Decode the MIME-words,
optionally unfold folded lines, optionally compress spaces, etc.  This
by default returns the subject field as a single, possibly quite long,
line.  Then a higher-level API can rewrap it, add fonts etc, for fancy
presentation.  This also suggests that we don't want the field tag (ie,
"Subject") to be part of this value.

Of course a *really* smart higher-level API would access structured
headers based on their structure, not on the one-size-fits-all str()
conversion.

Then MTAs see email as a string of octets.  So guess what:

 > > bytes(message['Subject'])

gives wire format.  Yow!  I think I'm just joking.  Right?

 > >> Now, setting headers.  Sometimes you have some unicode thing and  
 > >> sometimes you have some bytes.  You need to end up with bytes in  
 > >> the ASCII range and you'd like to leave the header value unencoded  
 > >> if so.  But in both cases, you might have bytes or characters  
 > >> outside that range, so you need an explicit encoding, defaulting to  
 > >> utf-8 probably.
 > >> >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
 > >> >>> Message.set_header('Subject', b'Some bytes')
 > >
 > > Where you just want "a damned valid email and stop making my life  
 > > hard!":

-1  I mean, yeah, Brother, I feel your pain but it just isn't that
easy.  If that were feasible, it would be *criminal* to have a
.set_header() method at all!  In fact,

 > > Message['Subject']='Some text'

is going to (a) need to take *only* unicodes, or (b) raise Exceptions
at the slightest provocation when handed bytes.

And things only get worse if you try to provide this interface for say
"From" (let alone "Content-Type").  Is it really worth doing the
mapping interface if it's only usable with free-form headers (ie, only
Subject among the commonly used headers)?

 > Yes.  In which case I propose we guess the encoding as 1) ascii, 2)  
 > utf-8, 3) wtf?

Uh, what guessing?  If you don't know what you have but you believe it
to be a valid header field, then presumably you got it off the wire
and it's still in bytes and you just spit it out on the wire without
trying to decode or encode it.  But as I already said, I think that's
a bad idea.  Otherwise, you should have a unicode, and you simply look
at the range of the string.  If it fits in ASCII, Bob's your uncle.
If not, Bob's your aunt (and you use UTF-8).
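
As a sketch of that rule (assuming the input is already a str): pick the
charset from the range of the text and let the existing email.header.Header
class do the encoding. The function names choose_charset and make_subject
are made up for illustration.

```python
from email.header import Header


def choose_charset(text):
    """Return 'us-ascii' if text is pure ASCII, else 'utf-8' (sketch)."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return "utf-8"
    return "us-ascii"


def make_subject(text):
    # Header() emits RFC 2047 encoded-words only when the charset needs them.
    return Header(text, charset=choose_charset(text)).encode()


print(make_subject("Some text"))
print(make_subject("Caf\u00e9 menu"))
```

No guessing is involved: the only decision is whether the characters fit
in ASCII.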

 > > Where you care about what encoding is used:
 > >
 > > Message['Subject']=Header('Some text',encoding='utf-8')
 > 
 > Yes.
 > 
 > > If you have bytes, for whatever reason:
 > >
 > > Message['Subject']=b'some bytes'.decode('utf-8')
 > >
 > > ...because only you know what encoding those bytes use!
 > 
 > So you're saying that __setitem__() should not accept raw bytes?

How do you distinguish "raw" bytes from "encoded bytes"?
__setitem__() shouldn't accept bytes at all.  There should be an API
which sets a .formatted_for_the_wire member, and it should have a
"validate" option (ie, when true the API attempts to parse the header
and raises an exception if it fails to do so; when false, it assumes
you know what you're doing and will send out the bytes verbatim).
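
A rough sketch of what such an API could look like. Everything here is
hypothetical (the class, the method name, and especially the validation,
which is only a stand-in for real header parsing: it requires a
"Name: value" shape and 7-bit bytes).

```python
class WireHeader:
    """Hypothetical container for a header field kept in wire format."""

    def __init__(self):
        self.formatted_for_the_wire = None

    def set_wire_bytes(self, raw, validate=True):
        """Store raw bytes verbatim, optionally checking them first."""
        if not isinstance(raw, (bytes, bytearray)):
            raise TypeError("wire format must be bytes")
        if validate:
            self._check(bytes(raw))
        self.formatted_for_the_wire = bytes(raw)

    @staticmethod
    def _check(raw):
        # Stand-in for real parsing: require "Name: value" in 7-bit ASCII.
        name, sep, _value = raw.partition(b":")
        if not sep or not name or any(b > 127 for b in raw):
            raise ValueError("not a well-formed ASCII header field")


header = WireHeader()
header.set_wire_bytes(b"Subject: hello")
print(header.formatted_for_the_wire)
```

With validate=False the bytes are stored and later sent verbatim, which
matches the "you know what you're doing" case.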

From martin at v.loewis.de  Mon Apr 13 19:19:18 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 13 Apr 2009 19:19:18 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <eae285400904130911y79d4e3c0m21e69370ac1f9445@mail.gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>	
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>	
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>	
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	
	<20090410025203.GA199@panix.com>	
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>	
	<49DEBB21.70305@gmail.com>	
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>	
	<grocad$e1c$1@ger.gmane.org> <49E00931.6050107@v.loewis.de>
	<eae285400904130911y79d4e3c0m21e69370ac1f9445@mail.gmail.com>
Message-ID: <49E37416.5030802@v.loewis.de>

> I use the json module in 2.6 to communicate with a C# JSON library and a
> JavaScript JSON library.  The C# and JavaScript libraries produce and
> consume the equivalent of str, not the equivalent of bytes.

I assume there is a TCP connection between the json module and the
C#/JavaScript libraries?

If so, it doesn't matter what representation these implementations chose
to use.

> Hope that helps,

Maybe I misunderstood, and you are *not* communicating over the wire.
In this case, I'm puzzled how you get the data from Python to the C#
JSON library, or to the JavaScript library.

Regards,
Martin

From steven.bethard at gmail.com  Mon Apr 13 19:23:45 2009
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 13 Apr 2009 10:23:45 -0700
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ca471dc20903271926y61f16740h8c3f29e4a1e4c376@mail.gmail.com>
	<ad1f81530903300304m796e75dmc942d38c015e4fc6@mail.gmail.com>
	<49D09ECF.5090407@trueblade.com>
	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>
	<49D0ACD5.5090209@gmail.com>
	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
Message-ID: <d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>

On Mon, Apr 13, 2009 at 2:29 AM, Mart Sõmermaa <mrts.pydev at gmail.com> wrote:
>
>
> On Mon, Apr 13, 2009 at 12:56 AM, Antoine Pitrou <solipsis at pitrou.net>
> wrote:
>>
>> Mart Sõmermaa <mrts.pydev <at> gmail.com> writes:
>> >
>> > Proposal: add add_query_params() for appending query parameters to an
>> > URL to
>> urllib.parse and urlparse.
>>
>> Is there anything to /remove/ a query parameter?
>
> I'd say this is outside the scope of add_query_params().
>
> As for the duplicate handling, I've implemented a threefold strategy that
> should address all use cases raised before:
>
> def add_query_params(*args, **kwargs):
>     """
>     add_query_parms(url, [allow_dups, [args_dict, [separator]]], **kwargs)
>
>     Appends query parameters to an URL and returns the result.
>
>     :param url: the URL to update, a string.
>     :param allow_dups: if
>         * True: plainly append new parameters, allowing all duplicates
>           (default),
>         * False: disallow duplicates in values and regroup keys so that
>           different values for the same key are adjacent,
>         * None: disallow duplicates in keys -- each key can have a single
>           value and later values override the value (like dict.update()).

Unnamed flag parameters are unfriendly to the reader. If I see something like:

  add_query_params(url, True, dict(a=b, c=d))

I can pretty much guess what the first and third arguments are, but I
have no clue for the second. Even if I have read the documentation
before, I may not remember whether the middle argument is "allow_dups"
or "keep_dups".
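
In Python 3, one way to force the flag to be named at the call site is a
keyword-only argument. The sketch below is a simplified stand-in for the
proposed function, not Mart's implementation: it handles only the True and
False cases of allow_dups, and the params argument name is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_params(url, params=None, *, allow_dups=True, **kwargs):
    """Append query parameters; allow_dups must be passed by keyword.

    Simplified sketch: only the True/False cases of allow_dups are handled.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    new_pairs = list((params or {}).items()) + list(kwargs.items())
    for pair in new_pairs:
        # With allow_dups=False, skip key/value pairs already present.
        if allow_dups or pair not in pairs:
            pairs.append(pair)
    return urlunsplit(parts._replace(query=urlencode(pairs)))


print(add_query_params("http://example.com/?a=1", {"b": "2"}, allow_dups=False))
```

A call such as add_query_params(url, {}, True) now raises TypeError, so the
reader always sees allow_dups=... spelled out.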

Steve

>     :param args_dict: optional dictionary of parameters, default is {}.
>     :param separator: either ';' or '&', the separator between key-value
>         pairs, default is '&'.
>     :param kwargs: parameters as keyword arguments.
>
>     :return: original URL with updated query parameters or the original URL
>         unchanged if no parameters given.
>     """
>
> The commit is
>
> http://github.com/mrts/qparams/blob/b9bdbec46bf919d142ff63e6b2b822b5d57b6f89/qparams.py
>
> extensive description of the behaviour is in the doctests.

From steve at pearwood.info  Mon Apr 13 20:32:25 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 14 Apr 2009 04:32:25 +1000
Subject: [Python-Dev] [Email-SIG]  headers api for email package
In-Reply-To: <873accv5jr.fsf@xemacs.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<FD0E133D-4944-4FE6-B3FD-865947F48E2F@python.org>
	<873accv5jr.fsf@xemacs.org>
Message-ID: <200904140432.25953.steve@pearwood.info>

On Tue, 14 Apr 2009 03:15:20 am Stephen J. Turnbull wrote:

> *People* see email as (rich-)text.

We do?

It's not clear what you actually mean by "(rich-)text". In the context 
of email, I understand it to mean HTML in the body, web-bugs, security 
exploits, 36pt hot-pink bold text on a lime-green background, and all 
the other wonderful things modern mail clients let you put in your 
email. But as far as I know, no mail client tries to render HTML tags 
inside mail headers, so you're probably not talking about HTML 
rich-text. I guess you mean Unicode characters. Am I right?

Now, correct me if I'm wrong, but I don't think mail headers can 
actually be anything *but* bytes. I see that my mail client, at least, 
sends bytes in the Subject header. If I try to send characters, e.g. 
the subject header "Testing-β-" (without the quotes), what actually 
gets sent is the bytes "=?utf-8?q?Testing-=CE=B2-?=" (again without the 
quotation marks). This seems to be covered by RFC 2047:

http://tools.ietf.org/html/rfc2047
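
The same round trip can be reproduced with the stdlib, as a sketch assuming
Python 3's email.header module. The exact encoded form (Q or base64) is
chosen by the library, so the example checks only that the wire value is
pure ASCII and decodes back to the original text.

```python
from email.header import Header, decode_header, make_header

# Subject containing a beta, matching the =CE=B2 bytes quoted above.
subject = "Testing-\u03b2-"

# Header.encode() produces the RFC 2047 form, which is pure ASCII.
wire_value = Header(subject, charset="utf-8").encode()
wire_bytes = wire_value.encode("ascii")

# Decoding the encoded-word recovers the original characters.
round_trip = str(make_header(decode_header(wire_value)))
print(wire_bytes)
print(round_trip == subject)
```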

If you're proposing converting those bytes into characters, that's all 
very well and good, but what's your strategy for dealing with the 
inevitable wrongly-formatted headers? If the header can't be correctly 
decoded into text, there still needs to be a way to get to the raw 
bytes. Apart from (e.g.) mail processing apps like SpamBayes which will 
want to inspect the raw bytes, mail readers will need to deal with 
badly formatted mail. The RFC states:

"However, a mail reader MUST NOT prevent the display or handling of a 
message because an 'encoded-word' is incorrectly formed."

[...]
> Then MTAs see email as a string of octets.  So guess what:
>
>  > > bytes(message['Subject'])
>
> gives wire format.  Yow!  I think I'm just joking.  Right?

Er, I'm not sure. Are you joking? I hope not, because it is important to 
be able to get to the raw, unmodified bytes that the MTA sees, without 
all the fancy processing you suggest.

[...]
> Otherwise, you should have a unicode, and you simply look
> at the range of the string.  If it fits in ASCII, Bob's your uncle.
> If not, Bob's your aunt (and you use UTF-8).

Again, correct me if I'm wrong, but *all* valid mail headers must fit in 
ASCII. RFC 5335 defines an experimental approach to allowing full 
Unicode in mail headers, but surely it's going to be a while before 
that's common, let alone standard.

http://tools.ietf.org/html/rfc5335

-- 
Steven D'Aprano

From daniel at stutzbachenterprises.com  Mon Apr 13 20:42:41 2009
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Mon, 13 Apr 2009 13:42:41 -0500
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49E37416.5030802@v.loewis.de>
References: <loom.20090408T110540-221@post.gmane.org>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410025203.GA199@panix.com>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<grocad$e1c$1@ger.gmane.org> <49E00931.6050107@v.loewis.de>
	<eae285400904130911y79d4e3c0m21e69370ac1f9445@mail.gmail.com>
	<49E37416.5030802@v.loewis.de>
Message-ID: <eae285400904131142u31652e19jb5b4a68c5f4241ea@mail.gmail.com>

On Mon, Apr 13, 2009 at 12:19 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> > I use the json module in 2.6 to communicate with a C# JSON library and a
> > JavaScript JSON library.  The C# and JavaScript libraries produce and
> > consume the equivalent of str, not the equivalent of bytes.
>
> I assume there is a TCP connection between the json module and the
> C#/JavaScript libraries?
>

Yes, there's a TCP connection.  Sorry for not making that clear to begin
with.

I also sometimes store JSON objects in a database.  In that case, I pass
strings to the database API which stores them in a TEXT field.  Obviously
somewhere they get encoded to bytes, but that's handled by the database.

> If so, it doesn't matter what representation these implementations chose
> to use.

True, I can always convert from bytes to str or vice versa.  Sometimes it is
illustrative to see how others have chosen to solve the same problem.  The
JSON specification and other implementations serialize an object to a
string.  Python's json.dumps() needs to either return a str or let the user
specify an encoding.

At least one of these two needs to work:

json.dumps({}).encode('utf-16le')  # dumps() returns str
'{\x00}\x00'

json.dumps({}, encoding='utf-16le')  # dumps() returns bytes
'{\x00}\x00'

In 2.6, the first one works.  The second incorrectly returns '{}'.
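
For comparison, here is a minimal sketch of the first form on Python 3,
assuming json.dumps() returns a str and the caller picks the wire encoding.
The variable names are illustrative only.

```python
import json

# dumps() produces text; choosing the byte encoding is a separate, explicit step.
text = json.dumps({})
data = text.encode("utf-16le")

# Decoding with the same codec recovers the original text and object.
round_tripped = json.loads(data.decode("utf-16le"))
```

Keeping the two steps separate means any codec works, not only the one the
serializer happens to know about.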

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

From foom at fuhm.net  Mon Apr 13 21:11:37 2009
From: foom at fuhm.net (James Y Knight)
Date: Mon, 13 Apr 2009 15:11:37 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <CBA855B3-2806-469D-A4A6-8AF279607A52@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<A286FA62-B1F0-4DB4-BC38-9D1E0F85A92A@fuhm.net>
	<CBA855B3-2806-469D-A4A6-8AF279607A52@python.org>
Message-ID: <25BB706E-C155-451B-AE18-7A8C83824FD6@fuhm.net>

On Apr 13, 2009, at 10:11 AM, Barry Warsaw wrote:
> The email package does not need a parser for every header, but it  
> should provide a framework that applications (or third party  
> libraries) can use to extend the built-in header parsers.  A bare  
> minimum for functionality requires a Content-Type parser.  I think  
> the email package should also include an address header (Originator,  
> Destination) parser, and a Message-ID header parser.  Possibly others.

Sure, that's fine...

> The default would probably be some unstructured parser for headers  
> like Subject.

But for unknown headers, it's not a useful choice to return a "str"  
object. "str" is just one possible structured data representation for  
a header: there's no correct useful decoding of all headers into str.  
Of course for the "Subject" header, str is the correct result type,  
but that's not a default, that's explicit support for "Subject". You  
can't correctly decode "To" into a str, so what makes you think you  
can decode "X-Gabazaborph" into str?

The only useful and correct representation for unknown (or  
unimplemented) headers is the raw bytes.
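
A rough sketch of that idea, not the email package's actual API: a registry
maps known header names to parsers, and anything unregistered comes back as
the raw bytes. The names PARSERS, parse_header, parse_unstructured and
parse_addresses are hypothetical; only email.utils.getaddresses is a real
standard-library call.

```python
from email.utils import getaddresses


def parse_unstructured(raw: bytes) -> str:
    # Assumption: the header is plain ASCII text, as for a simple Subject.
    return raw.decode("ascii")


def parse_addresses(raw: bytes):
    # Returns a list of (display name, address) tuples.
    return getaddresses([raw.decode("ascii")])


# Hypothetical registry: explicit support per header, no generic default.
PARSERS = {
    "subject": parse_unstructured,
    "to": parse_addresses,
}


def parse_header(name: str, raw: bytes):
    parser = PARSERS.get(name.lower())
    # Unknown or unimplemented headers are handed back untouched.
    return parser(raw) if parser else raw
```

Applications or third-party libraries could extend such a registry for
their own headers without the package guessing at a decoding.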

James

From martin at v.loewis.de  Mon Apr 13 22:02:17 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 13 Apr 2009 22:02:17 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <eae285400904131142u31652e19jb5b4a68c5f4241ea@mail.gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>	
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>	
	<20090410025203.GA199@panix.com>	
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>	
	<49DEBB21.70305@gmail.com>	
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>	
	<grocad$e1c$1@ger.gmane.org> <49E00931.6050107@v.loewis.de>	
	<eae285400904130911y79d4e3c0m21e69370ac1f9445@mail.gmail.com>	
	<49E37416.5030802@v.loewis.de>
	<eae285400904131142u31652e19jb5b4a68c5f4241ea@mail.gmail.com>
Message-ID: <49E39A49.9070507@v.loewis.de>

> Yes, there's a TCP connection.  Sorry for not making that clear to begin
> with.
> 
>     If so, it doesn't matter what representation these implementations chose
>     to use.
> 
> 
> True, I can always convert from bytes to str or vice versa.

I think you are missing the point. It will not be necessary to convert.
You can write the JSON into the TCP connection in Python, and it will
come out just fine as strings in C# and JavaScript. This
is how middleware works - it abstracts from programming languages, and
allows for different representations in different languages, in a
manner invisible to the participating processes.

> At least one of these two needs to work:
> 
> json.dumps({}).encode('utf-16le')  # dumps() returns str
> '{\x00}\x00'
> 
> json.dumps({}, encoding='utf-16le')  # dumps() returns bytes
> '{\x00}\x00'
> 
> In 2.6, the first one works.  The second incorrectly returns '{}'.

Ok, that might be a bug in the JSON implementation - but you shouldn't
be using utf-16le, anyway. Use UTF-8 always, and it will work fine.

The question is: which of them is more appropriate if what you want
is bytes? I argue that the second form is better, since it saves you
an encode invocation.
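
To make the wire-level view concrete, here is a small sketch using a local
socket pair as a stand-in for the TCP connection (an assumption for the
sake of a self-contained example). Only UTF-8 bytes cross the connection;
each end decides for itself how to represent the text.

```python
import json
import socket

# A connected pair of sockets stands in for the two ends of a TCP link.
sender, receiver = socket.socketpair()

message = {"name": "L\u00f6wis", "ok": True}
sender.sendall(json.dumps(message).encode("utf-8"))
sender.close()  # closing signals end of stream to the reader

chunks = []
while True:
    chunk = receiver.recv(4096)
    if not chunk:
        break
    chunks.append(chunk)
receiver.close()

# The receiving side decodes the bytes and parses the JSON text.
received = json.loads(b"".join(chunks).decode("utf-8"))
```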

Regards,
Martin

From mrts.pydev at gmail.com  Mon Apr 13 22:14:50 2009
From: mrts.pydev at gmail.com (=?ISO-8859-1?Q?Mart_S=F5mermaa?=)
Date: Mon, 13 Apr 2009 23:14:50 +0300
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530903300304m796e75dmc942d38c015e4fc6@mail.gmail.com>
	<49D09ECF.5090407@trueblade.com>
	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>
	<49D0ACD5.5090209@gmail.com>
	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
Message-ID: <ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>

On Mon, Apr 13, 2009 at 8:23 PM, Steven Bethard
<steven.bethard at gmail.com> wrote:
>
> On Mon, Apr 13, 2009 at 2:29 AM, Mart Sõmermaa <mrts.pydev at gmail.com> wrote:
> >
> >
> > On Mon, Apr 13, 2009 at 12:56 AM, Antoine Pitrou <solipsis at pitrou.net>
> > wrote:
> >>
> >> Mart Sõmermaa <mrts.pydev <at> gmail.com> writes:
> >> >
> >> > Proposal: add add_query_params() for appending query parameters to an
> >> > URL to
> >> urllib.parse and urlparse.
> >>
> >> Is there anything to /remove/ a query parameter?
> >
> > I'd say this is outside the scope of add_query_params().
> >
> > As for the duplicate handling, I've implemented a threefold strategy that
> > should address all use cases raised before:
> >
> >  def add_query_params(*args, **kwargs):
> >     """
> >     add_query_params(url, [allow_dups, [args_dict, [separator]]], **kwargs)
> >
> >     Appends query parameters to an URL and returns the result.
> >
> >     :param url: the URL to update, a string.
> >     :param allow_dups: if
> >         * True: plainly append new parameters, allowing all duplicates
> >           (default),
> >         * False: disallow duplicates in values and regroup keys so that
> >           different values for the same key are adjacent,
> >         * None: disallow duplicates in keys -- each key can have a single
> >           value and later values override the value (like dict.update()).
>
> Unnamed flag parameters are unfriendly to the reader. If I see something like:
>
>  add_query_params(url, True, dict(a=b, c=d))
>
> I can pretty much guess what the first and third arguments are, but I
> have no clue for the second. Even if I have read the documentation
> before, I may not remember whether the middle argument is "allow_dups"
> or "keep_dups".

Keyword arguments are already used for specifying the arguments to the
query, so naming can't be used. Someone may need an 'allow_dups' key
in their query and forget to pass it in params_dict.

A default behaviour should be found that works according to most
user's expectations so that they don't need to use the positional
arguments generally.

Antoine Pitrou wrote:
> You could e.g. rename the function to update_query_params() and decide that
> every parameter whose specified value is None must actually be removed from
> the URL.

I agree that removing parameters is useful. Currently, None is used
for signifying a key with no value. Instead, booleans could be used:
if a key's value is True (the object itself, not just any value that
evaluates to true), it is a key with no value; if False (under the same
identity restriction), it should be removed from the query if present.
None should not be treated specially under that scheme. As an example:

>>> update_query_params('http://example.com/?q=foo', q=False, a=True, b='c', d=None)
'http://example.com/?a&b=c&d=None'

However,
1) I'm not sure about the implications of 'foo is True', I have never
used it and PEP 8 explicitly warns against it -- does it work
consistently across different Python implementations? (I assume it
should, on the grounds that True is a singleton just like None.)
2) the API gets overly complicated -- as per the complaint above, it's
usability-challenged already.
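
A possible sketch of the boolean scheme, written against Python 3's
urllib.parse. This is not the proposed patch: update_query_params here is a
hypothetical stand-in, and dropping every existing key that also appears in
the keyword arguments is an assumption about what "update" should mean.

```python
from urllib.parse import parse_qsl, quote_plus, urlsplit, urlunsplit


def update_query_params(url, **params):
    """Hypothetical sketch: True = bare key, False = remove the key."""
    parts = urlsplit(url)
    # Keep existing pairs whose key is not being updated or removed.
    # A blank existing value is treated as a bare key (None).
    items = [
        (key, value or None)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in params
    ]
    for key, value in params.items():
        if value is False:   # identity test, so 0 is not affected
            continue         # the key is removed
        if value is True:    # identity test, so 1 is not affected
            items.append((key, None))
        else:
            items.append((key, str(value)))
    query = "&".join(
        quote_plus(key) if value is None
        else quote_plus(key) + "=" + quote_plus(value)
        for key, value in items
    )
    return urlunsplit(parts._replace(query=query))
```

The identity tests are what keep integer values such as 0 and 1 from being
mistaken for the flags.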

From bob at redivi.com  Mon Apr 13 22:28:26 2009
From: bob at redivi.com (Bob Ippolito)
Date: Mon, 13 Apr 2009 13:28:26 -0700
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49E39A49.9070507@v.loewis.de>
References: <loom.20090408T110540-221@post.gmane.org>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<grocad$e1c$1@ger.gmane.org> <49E00931.6050107@v.loewis.de>
	<eae285400904130911y79d4e3c0m21e69370ac1f9445@mail.gmail.com>
	<49E37416.5030802@v.loewis.de>
	<eae285400904131142u31652e19jb5b4a68c5f4241ea@mail.gmail.com>
	<49E39A49.9070507@v.loewis.de>
Message-ID: <6a36e7290904131328u6d4d3c20g6e12e0fd893523a2@mail.gmail.com>

On Mon, Apr 13, 2009 at 1:02 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Yes, there's a TCP connection.  Sorry for not making that clear to begin
>> with.
>>
>>     If so, it doesn't matter what representation these implementations chose
>>     to use.
>>
>>
>> True, I can always convert from bytes to str or vice versa.
>
> I think you are missing the point. It will not be necessary to convert.
> You can write the JSON into the TCP connection in Python, and it will
> come out just fine as strings in C# and JavaScript. This
> is how middleware works - it abstracts from programming languages, and
> allows for different representations in different languages, in a
> manner invisible to the participating processes.
>
>> At least one of these two needs to work:
>>
>> json.dumps({}).encode('utf-16le')  # dumps() returns str
>> '{\x00}\x00'
>>
>> json.dumps({}, encoding='utf-16le')  # dumps() returns bytes
>> '{\x00}\x00'
>>
>> In 2.6, the first one works.  The second incorrectly returns '{}'.
>
> Ok, that might be a bug in the JSON implementation - but you shouldn't
> be using utf-16le, anyway. Use UTF-8 always, and it will work fine.
>
> The question is: which of them is more appropriate if what you want
> is bytes? I argue that the second form is better, since it saves you
> an encode invocation.

It's not a bug in dumps, it's a matter of not reading the
documentation. The encoding parameter of dumps decides how byte
strings should be interpreted, not what the output encoding is.

The output of json/simplejson dumps for Python 2.x is either an ASCII
bytestring (default) or a unicode string (when ensure_ascii=False).
This is very practical in 2.x because an ASCII bytestring can be
treated as either text or bytes in most situations, isn't going to get
mangled over any kind of encoding mismatch (as long as it's an ASCII
superset), and skips an encoding step if getting sent over the wire.

>>> simplejson.dumps(['\x00f\x00o\x00o'], encoding='utf-16be')
'["foo"]'
>>> simplejson.dumps(['\x00f\x00o\x00o'], encoding='utf-16be', ensure_ascii=False)
u'["foo"]'
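
For readers on Python 3, the same ensure_ascii switch can be sketched as
follows. Both calls return str there; the flag only controls whether
non-ASCII characters are escaped. The sample data is made up.

```python
import json

data = ["caf\u00e9"]

# Default: non-ASCII characters are written as \uXXXX escapes,
# so the result is pure ASCII text.
escaped = json.dumps(data)

# ensure_ascii=False keeps the characters as they are.
literal = json.dumps(data, ensure_ascii=False)
```

Either form parses back to the same object, which is why the escaped
default survives an encoding mismatch on the wire.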

-bob

From daniel at stutzbachenterprises.com  Mon Apr 13 22:32:17 2009
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Mon, 13 Apr 2009 15:32:17 -0500
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <6a36e7290904131328u6d4d3c20g6e12e0fd893523a2@mail.gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<grocad$e1c$1@ger.gmane.org> <49E00931.6050107@v.loewis.de>
	<eae285400904130911y79d4e3c0m21e69370ac1f9445@mail.gmail.com>
	<49E37416.5030802@v.loewis.de>
	<eae285400904131142u31652e19jb5b4a68c5f4241ea@mail.gmail.com>
	<49E39A49.9070507@v.loewis.de>
	<6a36e7290904131328u6d4d3c20g6e12e0fd893523a2@mail.gmail.com>
Message-ID: <eae285400904131332h432bd09fi7eadc0fb957dbede@mail.gmail.com>

On Mon, Apr 13, 2009 at 3:28 PM, Bob Ippolito <bob at redivi.com> wrote:

> It's not a bug in dumps, it's a matter of not reading the
> documentation. The encoding parameter of dumps decides how byte
> strings should be interpreted, not what the output encoding is.
>

You're right; I apologize for not reading more closely.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

From daniel at stutzbachenterprises.com  Mon Apr 13 23:25:28 2009
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Mon, 13 Apr 2009 16:25:28 -0500
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49E39A49.9070507@v.loewis.de>
References: <loom.20090408T110540-221@post.gmane.org>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<grocad$e1c$1@ger.gmane.org> <49E00931.6050107@v.loewis.de>
	<eae285400904130911y79d4e3c0m21e69370ac1f9445@mail.gmail.com>
	<49E37416.5030802@v.loewis.de>
	<eae285400904131142u31652e19jb5b4a68c5f4241ea@mail.gmail.com>
	<49E39A49.9070507@v.loewis.de>
Message-ID: <eae285400904131425t6c335753xaa85b3a5ac05c804@mail.gmail.com>

On Mon, Apr 13, 2009 at 3:02 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> > True, I can always convert from bytes to str or vice versa.
>
> I think you are missing the point. It will not be necessary to convert.

Sometimes I want bytes and sometimes I want str.  I am going to be
converting some of the time. ;-)

Below is a basic CGI application that assumes that the json module works
with str, not bytes.  How would you write it if the json module did not
support returning a str?

print("Content-Type: application/json; charset=utf-8")
input_object = json.loads(sys.stdin.read())
output_object = do_some_work(input_object)
print(json.dumps(output_object))
print()

> The question is: which of them is more appropriate if what you want
> is bytes? I argue that the second form is better, since it saves you
> an encode invocation.
>

If what you want is bytes, encoding has to happen somewhere.  If the json
module has some optimizations to do the encoding at the same time as the
serialization, great.  However, based on the original post of this thread,
it sounds like that code doesn't exist or doesn't work correctly.

What's the benefit of preventing users from getting a str out if that's what
they want?

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

From greg.ewing at canterbury.ac.nz  Tue Apr 14 01:12:51 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 14 Apr 2009 11:12:51 +1200
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
 3.1 (and urlparse in 2.7)
In-Reply-To: <loom.20090413T120051-834@post.gmane.org>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com>
	<ca471dc20903271926y61f16740h8c3f29e4a1e4c376@mail.gmail.com>
	<ad1f81530903300304m796e75dmc942d38c015e4fc6@mail.gmail.com>
	<49D09ECF.5090407@trueblade.com>
	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>
	<49D0ACD5.5090209@gmail.com>
	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<loom.20090413T101627-359@post.gmane.org>
	<49E327A6.3000801@voidspace.org.uk>
	<loom.20090413T120051-834@post.gmane.org>
Message-ID: <49E3C6F3.9040400@canterbury.ac.nz>

Antoine Pitrou wrote:

> Say you are filtering or sorting data based on some URL parameters. If the user
> wants to remove one of those filters, you have to remove the corresponding query
> parameter.

For an application like that, I would be keeping the
parameters as a list or some other structured way and
only converting them to a URL when needed.
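
A small illustration of that approach, with made-up filter names and a
hypothetical build_url helper: the filters live in a dict, and the URL is
derived from it only at the point where one is needed.

```python
from urllib.parse import urlencode


def build_url(base, params):
    # Hypothetical helper: append a query string only if there are parameters.
    return base + ("?" + urlencode(params) if params else "")


filters = {"sort": "date", "author": "guido"}
with_both = build_url("http://example.com/list", filters)

# Removing a filter is an ordinary dict operation; no URL surgery required.
del filters["author"]
with_one = build_url("http://example.com/list", filters)
```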

-- 
Greg

From greg.ewing at canterbury.ac.nz  Tue Apr 14 01:27:42 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 14 Apr 2009 11:27:42 +1200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <CBA855B3-2806-469D-A4A6-8AF279607A52@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<A286FA62-B1F0-4DB4-BC38-9D1E0F85A92A@fuhm.net>
	<CBA855B3-2806-469D-A4A6-8AF279607A52@python.org>
Message-ID: <49E3CA6E.1070501@canterbury.ac.nz>

Barry Warsaw wrote:
> The default 
> would probably be some unstructured parser for  headers like Subject.

Only for headers known to be unstructured, I think.
Completely unknown headers should be available only
as bytes.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Tue Apr 14 01:28:24 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 14 Apr 2009 11:28:24 +1200
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <7DF370A6-88E4-4710-9CF8-B0B3D7249383@python.org>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>
	<F40AE8EC-08CC-4634-AA82-264587552F47@python.org>
	<49DF8956.5050501@g.nevcal.com>
	<7DF370A6-88E4-4710-9CF8-B0B3D7249383@python.org>
Message-ID: <49E3CA98.3090504@canterbury.ac.nz>

Barry Warsaw wrote:
> For an  
> Originator or Destination address, what does str(header) return?

It should be an error, I think.

-- 
Greg

From alexandre at peadrop.com  Tue Apr 14 01:44:38 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 13 Apr 2009 19:44:38 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <eae285400904131425t6c335753xaa85b3a5ac05c804@mail.gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<49DEBB21.70305@gmail.com> 
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<grocad$e1c$1@ger.gmane.org> <49E00931.6050107@v.loewis.de> 
	<eae285400904130911y79d4e3c0m21e69370ac1f9445@mail.gmail.com> 
	<49E37416.5030802@v.loewis.de>
	<eae285400904131142u31652e19jb5b4a68c5f4241ea@mail.gmail.com> 
	<49E39A49.9070507@v.loewis.de>
	<eae285400904131425t6c335753xaa85b3a5ac05c804@mail.gmail.com>
Message-ID: <acd65fa20904131644x3d987278y36696371fc324c54@mail.gmail.com>

On Mon, Apr 13, 2009 at 5:25 PM, Daniel Stutzbach
<daniel at stutzbachenterprises.com> wrote:
> On Mon, Apr 13, 2009 at 3:02 PM, "Martin v. Löwis" <martin at v.loewis.de>
> wrote:
>>
>> > True, I can always convert from bytes to str or vice versa.
>>
>> I think you are missing the point. It will not be necessary to convert.
>
> Sometimes I want bytes and sometimes I want str.  I am going to be
> converting some of the time. ;-)
>
> Below is a basic CGI application that assumes that the json module works
> with str, not bytes.  How would you write it if the json module did not
> support returning a str?
>
> print("Content-Type: application/json; charset=utf-8")
> input_object = json.loads(sys.stdin.read())
> output_object = do_some_work(input_object)
> print(json.dumps(output_object))
> print()
>

Like this?

print("Content-Type: application/json; charset=utf-8")
input_object = json.loads(sys.stdin.buffer.read())
output_object = do_some_work(input_object)
sys.stdout.buffer.write(json.dumps(output_object))

-- Alexandre

From rdmurray at bitdance.com  Tue Apr 14 01:46:20 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Mon, 13 Apr 2009 19:46:20 -0400 (EDT)
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <49E3CA98.3090504@canterbury.ac.nz>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>
	<F40AE8EC-08CC-4634-AA82-264587552F47@python.org>
	<49DF8956.5050501@g.nevcal.com>
	<7DF370A6-88E4-4710-9CF8-B0B3D7249383@python.org>
	<49E3CA98.3090504@canterbury.ac.nz>
Message-ID: <Pine.LNX.4.64.0904131945010.26362@kimball.webabinitio.net>

On Tue, 14 Apr 2009 at 11:28, Greg Ewing wrote:

> Barry Warsaw wrote:
>>  For an  Originator or Destination address, what does str(header) return?
>
> It should be an error, I think.

That doesn't make sense to me.  str(<arbitrary object>) should return
_something_.

--David

From greg.ewing at canterbury.ac.nz  Tue Apr 14 01:59:55 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 14 Apr 2009 11:59:55 +1200
Subject: [Python-Dev] [Email-SIG]  Dropping bytes "support" in json
In-Reply-To: <Pine.LNX.4.64.0904131945010.26362@kimball.webabinitio.net>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com>
	<F40AE8EC-08CC-4634-AA82-264587552F47@python.org>
	<49DF8956.5050501@g.nevcal.com>
	<7DF370A6-88E4-4710-9CF8-B0B3D7249383@python.org>
	<49E3CA98.3090504@canterbury.ac.nz>
	<Pine.LNX.4.64.0904131945010.26362@kimball.webabinitio.net>
Message-ID: <49E3D1FB.9000205@canterbury.ac.nz>

R. David Murray wrote:

> That doesn't make sense to me.  str(<arbitrary object>) should return
> _something_.

Well, it might return something like "<AddressList
object at 0x123456>". But you shouldn't rely on it
to give you anything useful for an arbitrary header.
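
As a small illustration, with a hypothetical AddressList class that is not
part of the email package: str() on an object that defines no __str__ falls
back to the default representation, so callers who want readable text have
to ask for it explicitly.

```python
class AddressList:
    """Hypothetical structured value for an address header."""

    def __init__(self, addresses):
        self.addresses = list(addresses)

    def format(self):
        # Explicit, purpose-specific rendering chosen by the caller.
        return ", ".join(self.addresses)


header = AddressList(["alice@example.com", "bob@example.com"])
default_text = str(header)      # default object representation
explicit_text = header.format()
```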

-- 
Greg

From solipsis at pitrou.net  Tue Apr 14 01:58:27 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 13 Apr 2009 23:58:27 +0000 (UTC)
Subject: [Python-Dev] Dropping bytes "support" in json
References: <loom.20090408T110540-221@post.gmane.org>
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<grocad$e1c$1@ger.gmane.org> <49E00931.6050107@v.loewis.de>
	<eae285400904130911y79d4e3c0m21e69370ac1f9445@mail.gmail.com>
	<49E37416.5030802@v.loewis.de>
	<eae285400904131142u31652e19jb5b4a68c5f4241ea@mail.gmail.com>
	<49E39A49.9070507@v.loewis.de>
	<6a36e7290904131328u6d4d3c20g6e12e0fd893523a2@mail.gmail.com>
Message-ID: <loom.20090413T235423-643@post.gmane.org>

Bob Ippolito <bob <at> redivi.com> writes:
> 
> The output of json/simplejson dumps for Python 2.x is either an ASCII
> bytestring (default) or a unicode string (when ensure_ascii=False).
> This is very practical in 2.x because an ASCII bytestring can be
> treated as either text or bytes in most situations, isn't going to get
> mangled over any kind of encoding mismatch (as long as it's an ASCII
> superset), and skips an encoding step if getting sent over the wire.

Which means that the json module already deals with text rather than bytes,
apart from the optimization that pure ASCII text is returned as 8-bit strings.

Regards

Antoine.

From greg.ewing at canterbury.ac.nz  Tue Apr 14 02:00:19 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 14 Apr 2009 12:00:19 +1200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <acd65fa20904131644x3d987278y36696371fc324c54@mail.gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<grocad$e1c$1@ger.gmane.org> <49E00931.6050107@v.loewis.de>
	<eae285400904130911y79d4e3c0m21e69370ac1f9445@mail.gmail.com>
	<49E37416.5030802@v.loewis.de>
	<eae285400904131142u31652e19jb5b4a68c5f4241ea@mail.gmail.com>
	<49E39A49.9070507@v.loewis.de>
	<eae285400904131425t6c335753xaa85b3a5ac05c804@mail.gmail.com>
	<acd65fa20904131644x3d987278y36696371fc324c54@mail.gmail.com>
Message-ID: <49E3D213.4030801@canterbury.ac.nz>

Alexandre Vassalotti wrote:

>>print("Content-Type: application/json; charset=utf-8")
>>input_object = json.loads(sys.stdin.read())
>>output_object = do_some_work(input_object)
>>print(json.dumps(output_object))
>>print()

That assumes the encoding being used by stdout has
ascii as a subset.
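
One way to sidestep the dependence on stdout's encoding is to build the
whole response as bytes with an explicit codec. The sketch below assumes a
str-returning json module; handle and do_some_work are hypothetical names,
and do_some_work is just a placeholder for the application logic.

```python
import json


def do_some_work(obj):
    # Placeholder for the application logic (hypothetical).
    return {"echo": obj}


def handle(request_body: bytes) -> bytes:
    input_object = json.loads(request_body.decode("utf-8"))
    output_object = do_some_work(input_object)
    # The header names UTF-8, so the body is encoded as UTF-8 explicitly.
    header = b"Content-Type: application/json; charset=utf-8\r\n\r\n"
    return header + json.dumps(output_object).encode("utf-8")
```

In a CGI script this would be driven by something like
sys.stdout.buffer.write(handle(sys.stdin.buffer.read())), so no text-layer
encoding is involved at either end.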

-- 
Greg

From eric at trueblade.com  Tue Apr 14 02:05:34 2009
From: eric at trueblade.com (Eric Smith)
Date: Mon, 13 Apr 2009 20:05:34 -0400
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
Message-ID: <49E3D34E.8040705@trueblade.com>

Mark has uploaded our newest work to Rietveld, again at 
http://codereview.appspot.com/33084/show. Since the last version, Mark 
has added 387 support (and other fixes) and I've added localized 
formatting ('n') back in as well as ',' formatting for float and int. I 
think this addresses all open issues. If you have time, please review 
the code on Rietveld.
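
For anyone trying the branch, a quick sketch of the behaviour in question
as it looks on a Python 3.1 or later interpreter (the values are arbitrary
examples; 'n' is left out because its output depends on the locale):

```python
x = 0.1

# The short repr is the shortest string that still round-trips to x.
short = repr(x)
round_trips = float(short) == x

# ',' requests thousands separators for both int and float.
grouped_int = format(1234567, ",")
grouped_float = format(1234567.5, ",")
```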

We believe we're ready to merge this back into the py3k branch. Pending 
any comments here or on Rietveld, we'll do the merge in the next day or so.

Before then, if anyone could build and test the py3k-short-float-repr 
branch on any of the following machines, that would be great:

Windows (preferably 64-bit)
Itanium
Old Intel/Linux (e.g., the snakebite nitrogen box)
Something bigendian, like a G4 Mac

We're pretty well tested on x86 Mac and Linux, and I've run it once on 
my Windows 32-bit machine.

I have a Snakebite account, and I'll try running on nitrogen once I 
figure out how to log in again.

I just had Itanium and PPC buildbots test our branch, and they both 
succeeded (or at least failed with errors not related to our changes).

Eric.

From benjamin at python.org  Tue Apr 14 02:54:29 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 13 Apr 2009 19:54:29 -0500
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <49E3D34E.8040705@trueblade.com>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
Message-ID: <1afaf6160904131754v414a855coff27921490de2a0a@mail.gmail.com>

2009/4/13 Eric Smith <eric at trueblade.com>:
> Mark has uploaded our newest work to Rietveld, again at
> http://codereview.appspot.com/33084/show. Since the last version, Mark has
> added 387 support (and other fixes) and I've added localized formatting
> ('n') back in as well as ',' formatting for float and int. I think this
> addresses all open issues. If you have time, please review the code on
> Rietveld.
>
> We believe we're ready to merge this back into the py3k branch. Pending any
> comments here or on Rietveld, we'll do the merge in the next day or so.

Cool. Will you use svnmerge.py to integrate the branch? After having
some odd behavior merging the io-c branch, I suggest you just apply a
patch to the py3k branch.

-- 
Regards,
Benjamin

From eric at trueblade.com  Tue Apr 14 03:14:50 2009
From: eric at trueblade.com (Eric Smith)
Date: Mon, 13 Apr 2009 21:14:50 -0400
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <1afaf6160904131754v414a855coff27921490de2a0a@mail.gmail.com>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>	
	<49E3D34E.8040705@trueblade.com>
	<1afaf6160904131754v414a855coff27921490de2a0a@mail.gmail.com>
Message-ID: <49E3E38A.6040204@trueblade.com>

Benjamin Peterson wrote:
> Cool. Will you use svnmerge.py to integrate the branch? After having
> some odd behavior merging the io-c branch, I suggest you just apply a
> patch to the py3k branch.

We're just going to apply 2 patches, without using svnmerge. First we'll 
add new files and the configure changes. Once we're sure that builds 
everywhere, then the second step will actually hook in the new functions 
and will have the formatting changes.

From steven.bethard at gmail.com  Tue Apr 14 03:19:27 2009
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 13 Apr 2009 18:19:27 -0700
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<49D09ECF.5090407@trueblade.com>
	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>
	<49D0ACD5.5090209@gmail.com>
	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
Message-ID: <d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>

On Mon, Apr 13, 2009 at 1:14 PM, Mart Sõmermaa <mrts.pydev at gmail.com> wrote:
> On Mon, Apr 13, 2009 at 8:23 PM, Steven Bethard <steven.bethard at gmail.com> wrote:
>> On Mon, Apr 13, 2009 at 2:29 AM, Mart Sõmermaa <mrts.pydev at gmail.com> wrote:
>> > As for the duplicate handling, I've implemented a threefold strategy that
>> > should address all use cases raised before:
>> >
>> >  def add_query_params(*args, **kwargs):
>> >     """
>> >     add_query_params(url, [allow_dups, [args_dict, [separator]]], **kwargs)
>> >
>> >     Appends query parameters to an URL and returns the result.
>> >
>> >     :param url: the URL to update, a string.
>> >     :param allow_dups: if
>> >         * True: plainly append new parameters, allowing all duplicates
>> >           (default),
>> >         * False: disallow duplicates in values and regroup keys so that
>> >           different values for the same key are adjacent,
>> >         * None: disallow duplicates in keys -- each key can have a single
>> >           value and later values override the value (like dict.update()).
>>
>> Unnamed flag parameters are unfriendly to the reader. If I see something like:
>>
>>  add_query_params(url, True, dict(a=b, c=d))
>>
>> I can pretty much guess what the first and third arguments are, but I
>> have no clue for the second. Even if I have read the documentation
>> before, I may not remember whether the middle argument is "allow_dups"
>> or "keep_dups".
>
> Keyword arguments are already used for specifying the arguments to the
> query, so naming can't be used. Someone may need an 'allow_dups' key
> in their query and forget to pass it in params_dict.
>
> A default behaviour should be found that works according to most
> user's expectations so that they don't need to use the positional
> arguments generally.

I believe the usual Python approach here is to have two variants of
the function, add_query_params and add_query_params_no_dups (or
whatever you want to name them). That way the flag parameter is
"named" right in the function name.
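
As a rough sketch of what that could look like (not Mart's actual
implementation; the helper _with_query and the exact semantics of the
"no dups" variant are assumptions made for illustration, and the
positional-only "url, /" syntax needs Python 3.8 or later):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _with_query(url, pairs):
    # Hypothetical helper: rebuild the URL with a new query string.
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query=urlencode(pairs)))


def add_query_params(url, /, **params):
    """Append params to the query string, keeping any duplicates."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    pairs.extend(params.items())
    return _with_query(url, pairs)


def add_query_params_no_dups(url, /, **params):
    """Append params; a later value replaces an earlier one for the same key."""
    merged = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    merged.update(params)
    return _with_query(url, list(merged.items()))


print(add_query_params("http://example.com/?a=1", a="2"))
print(add_query_params_no_dups("http://example.com/?a=1", a="2"))
```

Because the behaviour is chosen by the function name, every keyword
argument is free to be a query key, including one called allow_dups.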

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From nad at acm.org  Tue Apr 14 04:07:57 2009
From: nad at acm.org (Ned Deily)
Date: Mon, 13 Apr 2009 19:07:57 -0700
Subject: [Python-Dev] Shorter float repr in Python 3.1?
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
Message-ID: <nad-D10AA9.19075613042009@news.gmane.org>

In article <49E3D34E.8040705 at trueblade.com>,
 Eric Smith <eric at trueblade.com> wrote:
> Before then, if anyone could build and test the py3k-short-float-repr 
> branch on any of the following machines, that would be great:
> 
[...]
> Something bigendian, like a G4 Mac

I'll crank up some OS X installer builds and run them on G3 and G4 Macs 
vs 32-/64- Intel.  Any tests of interest beyond the default regrtest.py?

-- 
 Ned Deily,
 nad at acm.org

From martin at v.loewis.de  Tue Apr 14 04:40:10 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 14 Apr 2009 04:40:10 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <eae285400904131425t6c335753xaa85b3a5ac05c804@mail.gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>	
	<663162E3-D2EB-4417-93D0-4764BC94646C@python.org>	
	<49DEBB21.70305@gmail.com>	
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>	
	<grocad$e1c$1@ger.gmane.org> <49E00931.6050107@v.loewis.de>	
	<eae285400904130911y79d4e3c0m21e69370ac1f9445@mail.gmail.com>	
	<49E37416.5030802@v.loewis.de>	
	<eae285400904131142u31652e19jb5b4a68c5f4241ea@mail.gmail.com>	
	<49E39A49.9070507@v.loewis.de>
	<eae285400904131425t6c335753xaa85b3a5ac05c804@mail.gmail.com>
Message-ID: <49E3F78A.7000307@v.loewis.de>

> Below is a basic CGI application that assumes that json module works
> with str, not bytes.  How would you write it if the json module does not
> support returning a str?

In a CGI application, you shouldn't be using sys.stdin or print().
Instead, you should be using sys.stdin.buffer (or sys.stdin.buffer.raw),
and sys.stdout.buffer.raw. A CGI script essentially does binary IO;
if you use TextIO, there likely will be bugs (e.g. if you have
attachments of type application/octet-stream).

> print("Content-Type: application/json; charset=utf-8")
> input_object = json.loads(sys.stdin.read())
> output_object = do_some_work(input_object)
> print(json.dumps(output_object))
> print()

out = sys.stdout.buffer.raw
out.write(b"Content-Type: application/json; charset=utf-8\n\n")
input_object = json.loads(sys.stdin.buffer.raw.read())
output_object = do_some_work(input_object)
out.write(json.dumps(output_object))
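
For comparison, here is a sketch of the same binary-IO structure for a
json module whose dumps() returns str (as it does in released Python 3
versions), so the result has to be encoded before it is written. The
handle_request wrapper and the do_some_work stand-in are hypothetical
names used only to make the example self-contained; in a real CGI
script the streams would be sys.stdin.buffer and sys.stdout.buffer.

```python
import io
import json


def do_some_work(obj):
    # Stand-in for the application logic (hypothetical).
    return {"echo": obj}


def handle_request(stdin, stdout):
    """Read a JSON request from a binary stream, write a JSON response."""
    # Decode explicitly: the wire carries bytes, json.loads gets text.
    input_object = json.loads(stdin.read().decode("utf-8"))
    output_object = do_some_work(input_object)
    stdout.write(b"Content-Type: application/json; charset=utf-8\n\n")
    # json.dumps returns str here, so encode before writing to the binary stream.
    stdout.write(json.dumps(output_object).encode("utf-8"))


# In-memory streams stand in for the process's binary stdin/stdout.
request = io.BytesIO(b'{"n": 1}')
response = io.BytesIO()
handle_request(request, response)
print(response.getvalue())
```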

> What's the benefit of preventing users from getting a str out if that's
> what they want?

If they really want it, there is no benefit from preventing them.
I'm just puzzled why they want it, and what possible applications
might be where they want it. Perhaps they misunderstand something
when they think they want it.

Regards,
Martin

From stephen at xemacs.org  Tue Apr 14 09:00:59 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 14 Apr 2009 16:00:59 +0900
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <49E3CA6E.1070501@canterbury.ac.nz>
References: <loom.20090408T110540-221@post.gmane.org>
	<ca471dc20904081736j2d80d924p6b30bab66666625f@mail.gmail.com>
	<loom.20090409T043042-835@post.gmane.org>
	<86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<eae285400904090855n539cf97cx29dd25dbd1898470@mail.gmail.com>
	<07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
	<A286FA62-B1F0-4DB4-BC38-9D1E0F85A92A@fuhm.net>
	<CBA855B3-2806-469D-A4A6-8AF279607A52@python.org>
	<49E3CA6E.1070501@canterbury.ac.nz>
Message-ID: <87ocuzu3bo.fsf@xemacs.org>

Warning: Reply-To set to email-sig.

Greg Ewing writes:

 > Only for headers known to be unstructured, I think.
 > Completely unknown headers should be available only
 > as bytes.

Why do I get the feeling that you guys are feeling up an
elephant?<wink>

There are four things you might want to do with a header:

(1) Put it on the wire, which must be bytes (in fact, ASCII).
(2) Show it to a user (such as a rootin-tootin spam-fightin mail
    admin), which for consistency with well-behaved, implemented
    headers (ie, you might want to *gasp* *concatenate* your unknown
    header with a string), will sooner or later be string (ie,
    Unicode).
(3) (Try to) parse it, in which case an internal representation with
    some other structure may or may not be appropriate for storing the
    parsed data.
(4) Munge it, in which case an internal representation with some other
    structure may or may not be appropriate.

I see no particular reason for restricting these basic API classes for
any header.
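
A minimal sketch of what offering all four operations on an unknown
header could look like. UnknownHeader and its method names are
hypothetical and are not part of the email package:

```python
class UnknownHeader:
    """Hypothetical wrapper around the raw bytes of one header line."""

    def __init__(self, raw: bytes):
        self.raw = raw

    def to_wire(self) -> bytes:
        # (1) Put it on the wire: always bytes.
        return self.raw

    def to_display(self) -> str:
        # (2) Show it to a user: a str that can be concatenated with
        # other strings; undecodable bytes are replaced, never raised on.
        return self.raw.decode("ascii", errors="replace")

    def parse(self):
        # (3) Try to parse it: here just a name/value split.
        name, _, value = self.raw.partition(b":")
        return name.strip().decode("ascii", errors="replace"), value.strip()

    def munge(self, transform) -> "UnknownHeader":
        # (4) Munge it: rewrite the value part, keep the name.
        name, sep, value = self.raw.partition(b":")
        return UnknownHeader(name + sep + transform(value))


header = UnknownHeader(b"X-Spam-Score: 9.5")
print("Seen: " + header.to_display())
print(header.parse())
print(header.munge(lambda value: b" 0.0").to_wire())
```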

From eric at trueblade.com  Tue Apr 14 10:45:28 2009
From: eric at trueblade.com (Eric Smith)
Date: Tue, 14 Apr 2009 04:45:28 -0400
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <nad-D10AA9.19075613042009@news.gmane.org>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
Message-ID: <49E44D28.3010500@trueblade.com>

Ned Deily wrote:
> In article <49E3D34E.8040705 at trueblade.com>,
>  Eric Smith <eric at trueblade.com> wrote:
>> Before then, if anyone could build and test the py3k-short-float-repr 
>> branch on any of the following machines, that would be great:
>>
> [...]
>> Something bigendian, like a G4 Mac
> 
> I'll crank up some OS X installer builds and run them on G3 and G4 Macs 
> vs 32-/64- Intel.  Any tests of interest beyond the default regrtest.py?

Thanks! regrtest.py should be enough.

Eric.

From nad at acm.org  Tue Apr 14 10:45:51 2009
From: nad at acm.org (Ned Deily)
Date: Tue, 14 Apr 2009 01:45:51 -0700
Subject: [Python-Dev] Shorter float repr in Python 3.1?
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
Message-ID: <nad-DF327F.01455114042009@news.gmane.org>

In article <nad-D10AA9.19075613042009 at news.gmane.org>,
 Ned Deily <nad at acm.org> wrote:

> In article <49E3D34E.8040705 at trueblade.com>,
>  Eric Smith <eric at trueblade.com> wrote:
> > Before then, if anyone could build and test the py3k-short-float-repr 
> > branch on any of the following machines, that would be great:
> > 
> [...]
> > Something bigendian, like a G4 Mac
> 
> I'll crank up some OS X installer builds and run them on G3 and G4 Macs 
> vs 32-/64- Intel.  Any tests of interest beyond the default regrtest.py?

First attempt was a fat (32-bit i386 and ppc) build on 10.5 targeted for 
10.3 and above; this is similar to recent python.org OS X installers.  
The good news: on 10.5 i386, running the default regrtest, no significant 
differences were noted from an installer built from the current main 
py3k head.

Bad news: the same build installed on a G4 running 10.5 hung hard in 
test_pow of test_builtin; a kill was needed to terminate python.  Same 
results on a G3 running 10.4. 

nad at pbg4:/Library/Frameworks/Python.framework/Versions/3.1$ bin/python 
-S lib/python3.1/test/regrtest.py -s -v test_builtin
test_builtin
test_abs (test.test_builtin.BuiltinTest) ... ok
test_all (test.test_builtin.BuiltinTest) ... ok
test_any (test.test_builtin.BuiltinTest) ... ok
test_ascii (test.test_builtin.BuiltinTest) ... ok
test_bin (test.test_builtin.BuiltinTest) ... ok
test_callable (test.test_builtin.BuiltinTest) ... ok
test_chr (test.test_builtin.BuiltinTest) ... ok
test_cmp (test.test_builtin.BuiltinTest) ... ok
test_compile (test.test_builtin.BuiltinTest) ... ok
test_delattr (test.test_builtin.BuiltinTest) ... ok
test_dir (test.test_builtin.BuiltinTest) ... ok
test_divmod (test.test_builtin.BuiltinTest) ... ok
test_eval (test.test_builtin.BuiltinTest) ... ok
test_exec (test.test_builtin.BuiltinTest) ... ok
test_exec_redirected (test.test_builtin.BuiltinTest) ... ok
test_filter (test.test_builtin.BuiltinTest) ... ok
test_general_eval (test.test_builtin.BuiltinTest) ... ok
test_getattr (test.test_builtin.BuiltinTest) ... ok
test_hasattr (test.test_builtin.BuiltinTest) ... ok
test_hash (test.test_builtin.BuiltinTest) ... ok
test_hex (test.test_builtin.BuiltinTest) ... ok
test_id (test.test_builtin.BuiltinTest) ... ok
test_import (test.test_builtin.BuiltinTest) ... ok
test_input (test.test_builtin.BuiltinTest) ... ok
test_isinstance (test.test_builtin.BuiltinTest) ... ok
test_issubclass (test.test_builtin.BuiltinTest) ... ok
test_iter (test.test_builtin.BuiltinTest) ... ok
test_len (test.test_builtin.BuiltinTest) ... ok
test_map (test.test_builtin.BuiltinTest) ... ok
test_max (test.test_builtin.BuiltinTest) ... ok
test_min (test.test_builtin.BuiltinTest) ... ok
test_neg (test.test_builtin.BuiltinTest) ... ok
test_next (test.test_builtin.BuiltinTest) ... ok
test_oct (test.test_builtin.BuiltinTest) ... ok
test_open (test.test_builtin.BuiltinTest) ... ok
test_ord (test.test_builtin.BuiltinTest) ... ok
test_pow (test.test_builtin.BuiltinTest) ... ^CTerminated

Stepping through some of test_pow from the interactive interpreter:

Python 3.1a2+ (py3k-short-float-repr, Apr 13 2009, 20:55:35) 
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> pow(0,0)
1                    <-- OK
>>> pow(2,30)
1073741824  <-- OK
>>> pow(0.,0)
^C^CTerminated   <-- float argument => python hung in CPU loop, killed

Then I tried a couple of random floats:

Python 3.1a2+ (py3k-short-float-repr, Apr 13 2009, 20:55:35) 
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 3.1
-9.255965342383856e+61
>>> 1.
^C
Terminated  <-- kill needed

The same tests work fine on the Intel Mac.

Just out of curiosity, I'll try to do the same build on the 10.4 ppc;  
there are occasionally a few differences noted in the build results.  
That won't be available until later today.

-- 
 Ned Deily,
 nad at acm.org

From l.mastrodomenico at gmail.com  Tue Apr 14 10:54:04 2009
From: l.mastrodomenico at gmail.com (Lino Mastrodomenico)
Date: Tue, 14 Apr 2009 10:54:04 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <eae285400904131425t6c335753xaa85b3a5ac05c804@mail.gmail.com>
References: <loom.20090408T110540-221@post.gmane.org>
	<49DEBB21.70305@gmail.com>
	<20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com>
	<grocad$e1c$1@ger.gmane.org> <49E00931.6050107@v.loewis.de>
	<eae285400904130911y79d4e3c0m21e69370ac1f9445@mail.gmail.com>
	<49E37416.5030802@v.loewis.de>
	<eae285400904131142u31652e19jb5b4a68c5f4241ea@mail.gmail.com>
	<49E39A49.9070507@v.loewis.de>
	<eae285400904131425t6c335753xaa85b3a5ac05c804@mail.gmail.com>
Message-ID: <cc93256f0904140154h4c16f4dem8e3a1af45054b131@mail.gmail.com>

2009/4/13 Daniel Stutzbach <daniel at stutzbachenterprises.com>:
> print("Content-Type: application/json; charset=utf-8")

Please don't do that! According to RFC 4627 the "charset" parameter is
not allowed for the application/json media type.

Just use "Content-Type: application/json", the charset is only
misleading because even if you specify, e.g., ISO-8859-1 a
standard-compliant receiver will probably still try to interpret the
content as UTF-8/16/32.

OTOH a charset can be used if you send JSON with an
application/javascript MIME type.

-- 
Lino Mastrodomenico

From dickinsm at gmail.com  Tue Apr 14 12:31:21 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Tue, 14 Apr 2009 11:31:21 +0100
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <nad-DF327F.01455114042009@news.gmane.org>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
Message-ID: <5c6f2a5d0904140331u33b29136o55ad0286a335326e@mail.gmail.com>

On Tue, Apr 14, 2009 at 9:45 AM, Ned Deily <nad at acm.org> wrote:
>  Ned Deily <nad at acm.org> wrote:
>>  Eric Smith <eric at trueblade.com> wrote:
>> > Before then, if anyone could build and test the py3k-short-float-repr
>> > branch on any of the following machines, that would be great:
>> >
>> [...]
>> > Something bigendian, like a G4 Mac
>>
>> I'll crank up some OS X installer builds and run them on G3 and G4 Macs
>> vs 32-/64- Intel.  Any tests of interest beyond the default regrtest.py?

Ned, many thanks for doing this!

> Then I tried a couple of random floats:
>
> Python 3.1a2+ (py3k-short-float-repr, Apr 13 2009, 20:55:35)
> [GCC 4.0.1 (Apple Inc. build 5490)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>>>> 3.1
> -9.255965342383856e+61
>>>> 1.
> ^C
> Terminated  <-- kill needed

Cool!  I suspect endianness issues.  As evidence, I present:

>>> list(struct.pack('<d', 3.1))
[205, 204, 204, 204, 204, 204, 8, 64]
>>> list(struct.pack('<d', -9.255965342383856e+61))
[204, 204, 8, 64, 205, 204, 204, 204]

I'm guessing that the problem is that when you build on
Intel, the configure script detects a little-endian machine,
and Gay's code then uses the little-endian defines
throughout, both for PPC and Intel.
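
The two byte lists above differ only in the order of their 32-bit
halves. A small sketch that reproduces the garbled value by swapping
the halves by hand (this only illustrates the arithmetic; it does not
exercise the float repr code itself):

```python
import struct

# The eight bytes of 3.1 as a little-endian IEEE 754 double.
little_endian = struct.pack("<d", 3.1)

# Swap the two 32-bit words, as would happen if code indexed the
# high and low words of the double with the wrong endianness define.
swapped = little_endian[4:] + little_endian[:4]

garbled = struct.unpack("<d", swapped)[0]

print(list(little_endian))
print(list(swapped))
print(garbled)
```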

I don't know any sensible way to fix this.

But I'd expect that there are already similar issues
with a 'fat' build of py3k on OS X.  After all, there's
already a 'WORDS_BIGENDIAN' in pyconfig.h.in. I
don't know where this is used.

Mark

From eric at trueblade.com  Tue Apr 14 12:34:16 2009
From: eric at trueblade.com (Eric Smith)
Date: Tue, 14 Apr 2009 06:34:16 -0400
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <nad-DF327F.01455114042009@news.gmane.org>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>	<49E3D34E.8040705@trueblade.com>	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
Message-ID: <49E466A8.9050306@trueblade.com>

Ned Deily wrote:
>> I'll crank up some OS X installer builds and run them on G3 and G4 Macs 
>> vs 32-/64- Intel.  Any tests of interest beyond the default regrtest.py?
> 
> First attempt was a fat (32-bit i386 and ppc) build on 10.5 targeted for 
> 10.3 and above; this is similar to recent python.org OS X installers.  
> The good news: on 10.5 i386, running the default regrtest, no significant 
> differences were noted from an installer built from the current main 
> py3k head.

Okay, that's awesome. Thanks.

> Bad news: the same build installed on a G4 running 10.5 hung hard in 
> test_pow of test_builtin; a kill was needed to terminate python.  Same 
> results on a G3 running 10.4. 

Okay, that's less than awesome. But still a huge thanks.

> Then I tried a couple of random floats:
> 
> Python 3.1a2+ (py3k-short-float-repr, Apr 13 2009, 20:55:35) 
> [GCC 4.0.1 (Apple Inc. build 5490)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>>>> 3.1
> -9.255965342383856e+61
>>>> 1.
> ^C
> Terminated  <-- kill needed

I don't suppose it's possible that you could run this under gdb and get 
a stack trace when it starts looping (assuming that's what's happening)?

I think I might have a PPC Mac Mini I can get my hands on, and I'll test 
there if possible.

Eric.

From dickinsm at gmail.com  Tue Apr 14 12:37:38 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Tue, 14 Apr 2009 11:37:38 +0100
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <5c6f2a5d0904140331u33b29136o55ad0286a335326e@mail.gmail.com>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140331u33b29136o55ad0286a335326e@mail.gmail.com>
Message-ID: <5c6f2a5d0904140337m394239f2w617488b18e41a198@mail.gmail.com>

By the way, a simple native build on OS X 10.4/PPC passed all tests (that
we're already failing before).

Mark

From dickinsm at gmail.com  Tue Apr 14 12:42:09 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Tue, 14 Apr 2009 11:42:09 +0100
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <5c6f2a5d0904140337m394239f2w617488b18e41a198@mail.gmail.com>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140331u33b29136o55ad0286a335326e@mail.gmail.com>
	<5c6f2a5d0904140337m394239f2w617488b18e41a198@mail.gmail.com>
Message-ID: <5c6f2a5d0904140342s1567cdefyd9c1d9ddab089192@mail.gmail.com>

On Tue, Apr 14, 2009 at 11:37 AM, Mark Dickinson <dickinsm at gmail.com> wrote:
> By the way, a simple native build on OS X 10.4/PPC passed all tests (that
> we're already failing before).

s/we're/weren't

From solipsis at pitrou.net  Tue Apr 14 12:44:13 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 14 Apr 2009 10:44:13 +0000 (UTC)
Subject: [Python-Dev] Shorter float repr in Python 3.1?
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140331u33b29136o55ad0286a335326e@mail.gmail.com>
Message-ID: <loom.20090414T104218-16@post.gmane.org>

Mark Dickinson <dickinsm <at> gmail.com> writes:
> 
> But I'd expect that there are already similar issues
> with a 'fat' build of py3k on OS X.  After all, there's
> already a 'WORDS_BIGENDIAN' in pyconfig.h.in. I
> don't know where this is used.

It's used e.g. in unicode encoding/decoding, and in the IO lib.
If that constant can't take different values depending on the CPU arch, we have
a big problem.
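
One place where the native byte order is visible from Python is the
"utf-16" codec, which writes a byte order mark in the machine's own
order. A sketch that checks this against sys.byteorder (it shows the
observable behaviour only; how the C code arrives at it is not shown):

```python
import codecs
import sys

# The BOM that a native-order UTF-16 encoder is expected to emit.
if sys.byteorder == "little":
    native_bom = codecs.BOM_UTF16_LE
else:
    native_bom = codecs.BOM_UTF16_BE

encoded = "A".encode("utf-16")

print(sys.byteorder, encoded)
print(encoded.startswith(native_bom))
```

If a fat binary were compiled with one fixed byte order for both
architectures, this is the kind of check that would be expected to
fail on one of them.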

Regards

Antoine.

From dickinsm at gmail.com  Tue Apr 14 14:40:36 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Tue, 14 Apr 2009 13:40:36 +0100
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <nad-DF327F.01455114042009@news.gmane.org>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
Message-ID: <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>

On Tue, Apr 14, 2009 at 9:45 AM, Ned Deily <nad at acm.org> wrote:
> First attempt was a fat (32-bit i386 and ppc) build on 10.5 targeted for
> 10.3 and above; this is similar to recent python.org OS X installers.

What's the proper way to create such a build?  I've been trying:

./configure --with-universal-archs=32-bit --enable-framework
--enable-universalsdk=/ MACOSX_DEPLOYMENT_TARGET=10.5

but the configure AC_C_BIGENDIAN macro doesn't seem to pick up
on the universality:  the output from ./configure contains the line:

checking whether byte ordering is bigendian... no

I was expecting a "... universal" instead of "... no".

>From reading the autoconf manual, it seems as though AC_C_BIGENDIAN
knows some magic to make things work for universal builds; it ought to be
possible to imitate that magic somehow.

Mark

From solipsis at pitrou.net  Tue Apr 14 16:42:31 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 14 Apr 2009 14:42:31 +0000 (UTC)
Subject: [Python-Dev] UTF-8 Decoder
References: <20090413080908.GM13110@nexus.in-nomine.org>
Message-ID: <loom.20090414T143924-906@post.gmane.org>

Jeroen Ruigrok van der Werven <asmodai <at> in-nomine.org> writes:
> 
> This got posted on the Unicode list. Does it seem interesting for Python
> itself? The UTF-8 to UTF-16 transcoding might be.
> 
> http://bjoern.hoehrmann.de/utf-8/decoder/dfa/

If you have some time on your hands, you could try benchmarking it against
Python 3.1's (py3k) decoder. There are two cases to consider:
- mostly non-ASCII input, such as the "utf-8 demo" file mentioned in the page 
above
- mostly ASCII input, such as will happen very often (think HTML, XML, log
files, etc.)

The py3k utf-8 decoder is optimized for the latter.
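
A sketch of how the two cases could be timed for the built-in decoder
with timeit. The sample texts and the helper name time_decode are made
up for illustration, and the DFA decoder from the page above is not
included here; it would need its own harness.

```python
import timeit


def time_decode(data, number=50, repeat=3):
    """Best-of-`repeat` time in seconds for `number` UTF-8 decodes of data."""
    return min(timeit.repeat(lambda: data.decode("utf-8"),
                             number=number, repeat=repeat))


# Mostly ASCII input: markup and log-like text.
mostly_ascii = ("<li class='entry'>plain log line 12345</li>\n" * 2000).encode("utf-8")

# Mostly non-ASCII input: multi-byte sequences dominate.
mostly_non_ascii = ("Καλημέρα κόσμε こんにちは 世界\n" * 2000).encode("utf-8")

for label, data in [("mostly ASCII", mostly_ascii),
                    ("mostly non-ASCII", mostly_non_ascii)]:
    seconds = time_decode(data)
    print(f"{label}: {len(data)} bytes, {seconds:.6f}s for 50 decodes")
```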

Regards

Antoine.

From mal at egenix.com  Tue Apr 14 17:02:39 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 14 Apr 2009 17:02:39 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090407174355.B62983A4063@sparrow.telecommunity.com>
References: <49D4DA72.60401@v.loewis.de>
	<49D52115.6020001@egenix.com>	<49D66C6E.3090602@v.loewis.de>
	<49DB475B.8060504@egenix.com>	<20090407140317.EBD383A4063@sparrow.telecommunity.com>	<49DB6A1F.50801@egenix.com>
	<20090407174355.B62983A4063@sparrow.telecommunity.com>
Message-ID: <49E4A58F.70309@egenix.com>

On 2009-04-07 19:46, P.J. Eby wrote:
> At 04:58 PM 4/7/2009 +0200, M.-A. Lemburg wrote:
>> On 2009-04-07 16:05, P.J. Eby wrote:
>> > At 02:30 PM 4/7/2009 +0200, M.-A. Lemburg wrote:
>> >> >> Wouldn't it be better to stick with a simpler approach and look for
>> >> >> "__pkg__.py" files to detect namespace packages using that O(1)
>> >> check ?
>> >> >
>> >> > Again - this wouldn't be O(1). More importantly, it breaks system
>> >> > packages, which now again have to deal with the conflicting file
>> names
>> >> > if they want to install all portions into a single location.
>> >>
>> >> True, but since that means changing the package infrastructure, I
>> think
>> >> it's fair to ask distributors who want to use that approach to also
>> take
>> >> care of looking into the __pkg__.py files and merging them if
>> >> necessary.
>> >>
>> >> Most of the time the __pkg__.py files will be empty, so that's not
>> >> really much to ask for.
>> >
>> > This means your proposal actually doesn't add any benefit over the
>> > status quo, where you can have an __init__.py that does nothing but
>> > declare the package a namespace.  We already have that now, and it
>> > doesn't need a new filename.  Why would we expect OS vendors to start
>> > supporting it, just because we name it __pkg__.py instead of
>> __init__.py?
>>
>> I lost you there.
>>
>> Since when do we support namespace packages in core Python without
>> the need to add some form of magic support code to __init__.py ?
>>
>> My suggestion basically builds on the same idea as Martin's PEP,
>> but uses a single __pkg__.py file as opposed to some non-Python
>> file yaddayadda.pkg.
> 
> Right... which completely obliterates the primary benefit of the
> original proposal compared to the status quo.  That is, that the PEP 382
> way is more compatible with system packaging tools.
> 
> Without that benefit, there's zero gain in your proposal over having
> __init__.py files just call pkgutil.extend_path() (in the stdlib since
> 2.3, btw) or pkg_resources.declare_namespace() (similar functionality,
> but with zipfile support and some other niceties).
> 
> IOW, your proposal doesn't actually improve the status quo in any way
> that I am able to determine, except that it calls for loading all the
> __pkg__.py modules, rather than just the first one.  (And the setuptools
> implementation of namespace packages actually *does* load multiple
> __init__.py's, so that's still no change over the status quo for
> setuptools-using packages.)

The purpose of the PEP is to create a standard for namespace packages.
That's orthogonal to trying to enhance or change some existing
techniques.

I don't see the emphasis in the PEP on Linux distribution support and the
remote possibility of them wanting to combine separate packages back
into one package as a good argument for adding yet another separate hierarchy
of special files which Python scans during imports.

That said, note that most distributions actually take the other route:
they try to split up larger packages into smaller ones, so the argument
becomes even weaker.

It is much more important to standardize the approach than to try
to extend some existing trickery and make it even more opaque than
it already is by introducing yet another level of complexity.

My alternative approach builds on existing methods and fits nicely
with the __init__.py approach Python has already been using for more
than a decade now. It's transparent, easy to understand and provides
enough functionality to build upon - much like the original __init__.py
idea.

I've already laid out the arguments for and against it in my
previous reply, so won't repeat them here.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Apr 14 2009)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2009-03-19: Released mxODBC.Connect 1.0.1      http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From mal at egenix.com  Tue Apr 14 17:17:07 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 14 Apr 2009 17:17:07 +0200
Subject: [Python-Dev] Adding new features to Python 2.x
In-Reply-To: <ca471dc20904070919m1bd08dbdj259eef076a3d7319@mail.gmail.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com>
	<49DB4624.604@egenix.com>
	<ca471dc20904070919m1bd08dbdj259eef076a3d7319@mail.gmail.com>
Message-ID: <49E4A8F3.7010202@egenix.com>

On 2009-04-07 18:19, Guido van Rossum wrote:
> On Tue, Apr 7, 2009 at 5:25 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 2009-04-06 15:21, Jesse Noller wrote:
>>> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> On 2009-04-02 17:32, Martin v. Löwis wrote:
>>>>> I propose the following PEP for inclusion to Python 3.1.
>>>> Thanks for picking this up.
>>>>
>>>> I'd like to extend the proposal to Python 2.7 and later.
>>>>
>>> -1 to adding it to the 2.x series. There was much discussion around
>>> adding features to 2.x *and* 3.0, and the consensus seemed to *not*
>>> add new features to 2.x and use those new features as carrots to help
>>> lead people into 3.0.
>> I must have missed that discussion :-)
>>
>> Where's the PEP pinning this down ?
>>
>> The Python 2.x user base is huge and the number of installed
>> applications even larger.
>>
>> Cutting these users and application developers off of important new
>> features added to Python 3 is only going to work as "carrot" for
>> those developers who:
>>
>>  * have enough resources (time, money, manpower) to port their existing
>>   application to Python 3
>>
>>  * can persuade their users to switch to Python 3
>>
>>  * don't rely much on 3rd party libraries (the bread and butter
>>   of Python applications)
>>
>> Realistically, such a porting effort is not likely going to happen
>> for any decent sized application, except perhaps a few open source
>> ones.
>>
>> Such a policy would then translate to a dead end for Python 2.x
>> based applications.
> 
> Think of the advantages though! Python 2 will finally become *stable*.
> The group of users you are talking to are usually balking at the
> thought of upgrading from 2.x to 2.(x+1) just as much as they might
> balk at the thought of Py3k. We're finally giving them what they
> really want.

Python 2.x is stable - much more than 3.x is today. However, stable
does not mean zero development, which a "No new features in Python 2.x"
policy would translate to.

If there are core developers that care about 2.x, then it should be
possible for them to add the necessary patches to future 2.x releases.

> Regarding calling this a dead end, we're committed to supporting 2.x
> for at least five years. If that's not enough, well, it's open source,
> so there's no reason why some group of rogue 2.x fans can't maintain
> it indefinitely after that.

Sure, but why can't this be done within the existing Python
developer community ?

-- 
Marc-Andre Lemburg
eGenix.com


From dickinsm at gmail.com  Tue Apr 14 18:09:35 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Tue, 14 Apr 2009 17:09:35 +0100
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>
Message-ID: <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>

Okay, I think I might have fixed up the float endianness detection for
universal builds on OS X.  Ned, any chance you could give this
another try with an updated version of the py3k-short-float-repr branch?

One thing I don't understand:

Is it true that to produce a working universal/fat build of Python,
one has to first regenerate configure and pyconfig.h.in using autoconf
version >= 2.62?  If not, then I don't understand how the
AC_C_BIGENDIAN autoconf macro can be giving the right results.

Mark

From solipsis at pitrou.net  Tue Apr 14 18:14:32 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 14 Apr 2009 16:14:32 +0000 (UTC)
Subject: [Python-Dev] Shorter float repr in Python 3.1?
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>
	<5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
Message-ID: <loom.20090414T161327-708@post.gmane.org>

Mark Dickinson <dickinsm <at> gmail.com> writes:
> 
> Okay, I think I might have fixed up the float endianness detection for
> universal builds on OS X.  Ned, any chance you could give this
> another try with an updated version of the py3k-short-float-repr branch?

If this approach is sane, could it be adopted for all other instances of
endianness detection in the py3k code base?
Has anyone tested a recent py3k using universal builds? Do all tests pass?

From pje at telecommunity.com  Tue Apr 14 18:27:51 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Tue, 14 Apr 2009 12:27:51 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49E4A58F.70309@egenix.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com>
	<20090407140317.EBD383A4063@sparrow.telecommunity.com>
	<49DB6A1F.50801@egenix.com>
	<20090407174355.B62983A4063@sparrow.telecommunity.com>
	<49E4A58F.70309@egenix.com>
Message-ID: <20090414162603.70C843A4100@sparrow.telecommunity.com>

At 05:02 PM 4/14/2009 +0200, M.-A. Lemburg wrote:
>I don't see the emphasis in the PEP on Linux distribution support and the
>remote possibility of them wanting to combine separate packages back
>into one package as good argument for adding yet another separate hierarchy
>of special files which Python scans during imports.
>
>That said, note that most distributions actually take the other route:
>they try to split up larger packages into smaller ones, so the argument
>becomes even weaker.

I think you've misunderstood something about the use case.  System 
packaging tools don't like separate packages to contain the *same 
file*.  That means that they *can't* split a larger package up with 
your proposal, because every one of those packages would have to 
contain a __pkg__.py -- and thus be in conflict with each 
other.  Either that, or they would have to make a separate system 
package containing *only* the __pkg__.py, and then make all packages 
using the namespace depend on it -- which is more work and requires 
greater co-ordination among packagers.

Allowing each system package to contain its own .pkg or .nsp or 
whatever files, on the other hand, allows each system package to be 
built independently, without conflict between contents (i.e., having 
the same file), and without requiring a special pseudo-package to 
contain the additional file.

Also, executing multiple __pkg__.py files means that when multiple 
system packages are installed to site-packages, only one of them 
could possibly be executed.  (Note that, even though the system 
packages themselves are not "combined", in practice they will all be 
installed to the same directory, i.e., site-packages or the platform 
equivalent thereof.)

From dickinsm at gmail.com  Tue Apr 14 18:30:18 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Tue, 14 Apr 2009 17:30:18 +0100
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <loom.20090414T161327-708@post.gmane.org>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>
	<5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
	<loom.20090414T161327-708@post.gmane.org>
Message-ID: <5c6f2a5d0904140930y7dc7cf4fg496d50fd34f89ac9@mail.gmail.com>

On Tue, Apr 14, 2009 at 5:14 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> If this approach is sane, could it be adopted for all other instances of
> endianness detection in the py3k code base?

I think everything else is fine:  float endianness detection (for marshal,
pickle, struct) is done at runtime. Integer endianness detection goes
via AC_C_BIGENDIAN, which understands universal builds---but only
for autoconf >= 2.62.
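
As a rough illustration of what a runtime check for float endianness can
look like (a sketch only, not the code used by marshal, pickle or struct;
the helper name detect_double_format is made up for this example), one can
compare the native byte layout of a known double against its big- and
little-endian IEEE 754 encodings:

```python
import struct

# A double whose eight IEEE 754 bytes are all distinct, so the byte order
# is unambiguous.  Big-endian encoding: 43 3f ff 01 02 03 04 05.
PROBE = 9006104071832581.0


def detect_double_format():
    """Return a string describing how this platform lays out a C double."""
    native = struct.pack("d", PROBE)      # native layout, no byte swapping
    if native == struct.pack(">d", PROBE):
        return "IEEE, big-endian"
    if native == struct.pack("<d", PROBE):
        return "IEEE, little-endian"
    return "unknown"


if __name__ == "__main__":
    print(detect_double_format())
    # CPython exposes its own answer for comparison.
    print(float.__getformat__("double"))
```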

> Has anyone tested a recent py3k using universal builds? Do all tests pass?

Do you know the right way to create a universal build?  If so, I'm in a position
to test on 32-bit PPC, 32-bit Intel and 64-bit Intel.

Mark

From solipsis at pitrou.net  Tue Apr 14 18:49:19 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 14 Apr 2009 16:49:19 +0000 (UTC)
Subject: [Python-Dev] Shorter float repr in Python 3.1?
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>
	<5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
	<loom.20090414T161327-708@post.gmane.org>
	<5c6f2a5d0904140930y7dc7cf4fg496d50fd34f89ac9@mail.gmail.com>
Message-ID: <loom.20090414T164847-266@post.gmane.org>

Mark Dickinson <dickinsm <at> gmail.com> writes:
> 
> > Has anyone tested a recent py3k using universal builds? Do all tests pass?
> 
> Do you know the right way to create a universal build? 

Not at all, sorry.

Regards

Antoine.

From dickinsm at gmail.com  Tue Apr 14 18:52:23 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Tue, 14 Apr 2009 17:52:23 +0100
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <loom.20090414T164847-266@post.gmane.org>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>
	<5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
	<loom.20090414T161327-708@post.gmane.org>
	<5c6f2a5d0904140930y7dc7cf4fg496d50fd34f89ac9@mail.gmail.com>
	<loom.20090414T164847-266@post.gmane.org>
Message-ID: <5c6f2a5d0904140952sc82b8d8x88bc9a77d4dc340e@mail.gmail.com>

On Tue, Apr 14, 2009 at 5:49 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Mark Dickinson <dickinsm <at> gmail.com> writes:
>> Do you know the right way to create a universal build?
>
> Not at all, sorry.

No problem :). I might try asking on the pythonmac-sig list.

Mark

From nad at acm.org  Tue Apr 14 19:19:32 2009
From: nad at acm.org (Ned Deily)
Date: Tue, 14 Apr 2009 10:19:32 -0700
Subject: [Python-Dev] Shorter float repr in Python 3.1?
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>
	<5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
	<loom.20090414T161327-708@post.gmane.org>
Message-ID: <nad-F612ED.10193214042009@news.gmane.org>

In article <loom.20090414T161327-708 at post.gmane.org>,
 Antoine Pitrou <solipsis at pitrou.net> wrote:
> Has anyone tested a recent py3k using universal builds? Do all tests pass?

It's done all the time.  All of the current released installers (2.5, 
2.6, 3.0) are 2-way (i386, ppc) universal and we occasionally test all 
of the current lines (2.6, trunk, 3.0, 3.1) as 4-way (i386, ppc, x86_64, 
ppc64), although the ppc64 has had no testing recently.

-- 
 Ned Deily,
 nad at acm.org

From nad at acm.org  Tue Apr 14 19:22:12 2009
From: nad at acm.org (Ned Deily)
Date: Tue, 14 Apr 2009 10:22:12 -0700
Subject: [Python-Dev] Shorter float repr in Python 3.1?
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>
	<5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
Message-ID: <nad-76C03D.10221214042009@news.gmane.org>

In article 
<5c6f2a5d0904140909x417d225ejd845de9c5c7802c8 at mail.gmail.com>,
 Mark Dickinson <dickinsm at gmail.com> wrote:

> Okay, I think I might have fixed up the float endianness detection for
> universal builds on OS X.  Ned, any chance you could give this
> another try with an updated version of the py3k-short-float-repr branch?

Not looking good.   Appears to be same behavior on the G4 with 10.5 
(haven't tried the G3 yet).

-- 
 Ned Deily,
 nad at acm.org

From nad at acm.org  Tue Apr 14 19:32:32 2009
From: nad at acm.org (Ned Deily)
Date: Tue, 14 Apr 2009 10:32:32 -0700
Subject: [Python-Dev] Shorter float repr in Python 3.1?
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>
	<5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
	<loom.20090414T161327-708@post.gmane.org>
	<5c6f2a5d0904140930y7dc7cf4fg496d50fd34f89ac9@mail.gmail.com>
Message-ID: <nad-745997.10323214042009@news.gmane.org>

In article 
<5c6f2a5d0904140930y7dc7cf4fg496d50fd34f89ac9 at mail.gmail.com>,
 Mark Dickinson <dickinsm at gmail.com> wrote:
> Do you know the right way to create a universal build?  If so, I'm in a 
> position
> to test on 32-bit PPC, 32-bit Intel and 64-bit Intel.

The OSX installer script is in Mac/BuildScript/build-installer.py.

For 2-way builds, it essentially does:

export MACOSX_DEPLOYMENT_TARGET=10.3
configure -C --enable-framework
   --enable-universalsdk=/Developer/SDKs/MacOSX10.4u.sdk
   --with-universal-archs='32-bit' --with-computed-gotos OPT='-g -O3'

and for 4-way:

export MACOSX_DEPLOYMENT_TARGET=10.5
configure -C --enable-framework
   --enable-universalsdk=/Developer/SDKs/MacOSX10.5.sdk
   --with-universal-archs='all' --with-computed-gotos OPT='-g -O3'

-- 
 Ned Deily,
 nad at acm.org

From martin at v.loewis.de  Tue Apr 14 19:55:36 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 14 Apr 2009 19:55:36 +0200
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>	<49E3D34E.8040705@trueblade.com>	<nad-D10AA9.19075613042009@news.gmane.org>	<nad-DF327F.01455114042009@news.gmane.org>	<5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>
	<5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
Message-ID: <49E4CE18.8070109@v.loewis.de>

> Is it true that to produce a working universal/fat build of Python,
> one has to first regenerate configure and pyconfig.h.in using autoconf
> version >= 2.62?  If not, then I don't understand how the
> AC_C_BIGENDIAN autoconf macro can be giving the right results.

The outcome of AC_C_BIGENDIAN isn't used on OSX. Depending on the exact
version you look at, things might work differently; in trunk,
Include/pymacconfig.h should be used, which does

#if defined(__APPLE__)
# undef WORDS_BIGENDIAN
#ifdef __BIG_ENDIAN__
#define WORDS_BIGENDIAN 1
#endif /* __BIG_ENDIAN__ */
#endif

Earlier versions included that ifdef block directly in pyconfig.h.in.

In case it isn't clear how this works: GCC predefines __BIG_ENDIAN__
on PPC but not on x86; for universal binaries, two (or more) separate
preprocessor (and compiler) runs are done.
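
The effect of those separate compiler runs can be observed from inside a
built interpreter. The following is a small sketch (the helper name
running_slice is hypothetical) that reports which byte order the currently
running slice was compiled for; on a universal build, running the same
binary natively on PPC and on x86 should give different answers:

```python
import platform
import struct
import sys


def running_slice():
    """Describe the architecture slice this interpreter process is running."""
    # Pack the integer 1 in native byte order and see where the low byte lands.
    low_byte_first = struct.pack("=H", 1)[0] == 1
    return {
        "machine": platform.machine(),
        "byteorder": "little" if low_byte_first else "big",
    }


if __name__ == "__main__":
    info = running_slice()
    print(info)
    # sys.byteorder is the interpreter's own view and should agree.
    print(sys.byteorder == info["byteorder"])
```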

HTH,
Martin

From martin at v.loewis.de  Tue Apr 14 19:56:53 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 14 Apr 2009 19:56:53 +0200
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <loom.20090414T161327-708@post.gmane.org>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>	<49E3D34E.8040705@trueblade.com>	<nad-D10AA9.19075613042009@news.gmane.org>	<nad-DF327F.01455114042009@news.gmane.org>	<5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>	<5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
	<loom.20090414T161327-708@post.gmane.org>
Message-ID: <49E4CE65.5080503@v.loewis.de>

> If this approach is sane, could it be adopted for all other instances of
> endianness detection in the py3k code base?

Don't worry - the approach that we already take is already sane, so no
further changes are needed.

Regards,
Martin

From dickinsm at gmail.com  Tue Apr 14 20:27:29 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Tue, 14 Apr 2009 19:27:29 +0100
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <49E4CE18.8070109@v.loewis.de>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>
	<5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
	<49E4CE18.8070109@v.loewis.de>
Message-ID: <5c6f2a5d0904141127i2089d6b6n37dc1cadbbec23fe@mail.gmail.com>

On Tue, Apr 14, 2009 at 6:55 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> The outcome of AC_C_BIGENDIAN isn't used on OSX. Depending on the exact
> version you look at, things might work differently; in trunk,
> Include/pymacconfig.h should be used [...]

Many thanks---that was the missing piece of the puzzle.  I think I
understand how to make things work now.

Mark

From dickinsm at gmail.com  Tue Apr 14 20:30:23 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Tue, 14 Apr 2009 19:30:23 +0100
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <nad-745997.10323214042009@news.gmane.org>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>
	<5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
	<loom.20090414T161327-708@post.gmane.org>
	<5c6f2a5d0904140930y7dc7cf4fg496d50fd34f89ac9@mail.gmail.com>
	<nad-745997.10323214042009@news.gmane.org>
Message-ID: <5c6f2a5d0904141130l528bca3cr6eb2c6213d79cc9e@mail.gmail.com>

On Tue, Apr 14, 2009 at 6:32 PM, Ned Deily <nad at acm.org> wrote:
> The OSX installer script is in Mac/BuildScript/build-installer.py.
>
> For 2-way builds, it essentially does:
>
> export MACOSX_DEPLOYMENT_TARGET=10.3
> configure -C --enable-framework
>    --enable-universalsdk=/Developer/SDKs/MacOSX10.4u.sdk
>    --with-universal-archs='32-bit' --with-computed-gotos OPT='-g -O3'

Great---thank you!  And thank you for all the testing.

I'll try to sort all this out later this evening (GMT+1);  I think I
understand how to fix everything now.

Mark

From jbaker at zyasoft.com  Tue Apr 14 20:30:34 2009
From: jbaker at zyasoft.com (Jim Baker)
Date: Tue, 14 Apr 2009 12:30:34 -0600
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <49DB7412.9030404@voidspace.org.uk>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49DB7412.9030404@voidspace.org.uk>
Message-ID: <d03bb4010904141130xc01a247r379fe37966d2b291@mail.gmail.com>

I rather like supporting short float representation. Given that CPython is
adopting it, I'm sure Jython will adopt this approach too as part of a
future Jython 3.x release.
- Jim
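
The round-trip requirement from the quoted proposal below,
eval(repr(x)) == x for finite floats x, is easy to spot-check on any
implementation. A minimal sketch (the helper name check_round_trip is
invented here, and float() stands in for eval(), which is equivalent for
the repr of a finite float):

```python
import math
import random
import struct


def check_round_trip(samples=10000, seed=12345):
    """Return the floats (if any) for which float(repr(x)) != x."""
    rng = random.Random(seed)
    failures = []
    for _ in range(samples):
        # Reinterpret 64 random bits as a double to cover the whole range,
        # including subnormals and very large magnitudes.
        bits = rng.getrandbits(64)
        (x,) = struct.unpack("<d", struct.pack("<Q", bits))
        if math.isnan(x) or math.isinf(x):
            continue  # the requirement only covers finite floats
        if float(repr(x)) != x:
            failures.append(x)
    return failures


if __name__ == "__main__":
    print(check_round_trip())
```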

On Tue, Apr 7, 2009 at 9:41 AM, Michael Foord <fuzzyman at voidspace.org.uk> wrote:

> Mark Dickinson wrote:
>
>> [snip...]
>>  Discussion points
>> =================
>>
>> (1) Any objections to including this into py3k?  If there's
>> controversy, then I guess we'll need a PEP.
>>
>>
>
> Big +1
>
>> (2) Should other Python implementations (Jython,
>> IronPython, etc.) be expected to use short float repr, or should
>> it just be considered an implementation detail of CPython?
>> I propose the latter, except that all implementations should
>> be required to satisfy eval(repr(x)) == x for finite floats x.
>>
>>
> Short float repr should be an implementation detail, so long as
> eval(repr(x)) == x still holds.
>
> Michael Foord
>
> --
> http://www.ironpythoninaction.com/
> http://www.voidspace.org.uk/blog
>

-- 
Jim Baker
jbaker at zyasoft.com

From ronaldoussoren at mac.com  Tue Apr 14 20:30:16 2009
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Tue, 14 Apr 2009 20:30:16 +0200
Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com>
	<49E3D34E.8040705@trueblade.com>
	<nad-D10AA9.19075613042009@news.gmane.org>
	<nad-DF327F.01455114042009@news.gmane.org>
	<5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com>
	<5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com>
Message-ID: <60D4A7E5-5E09-4D53-8E7B-72E5D0321A61@mac.com>

On 14 Apr, 2009, at 18:09, Mark Dickinson wrote:

> Okay, I think I might have fixed up the float endianness detection for
> universal builds on OS X.  Ned, any chance you could give this
> another try with an updated version of the py3k-short-float-repr  
> branch?
>
> One thing I don't understand:
>
> Is it true that to produce a working universal/fat build of Python,
> one has to first regenerate configure and pyconfig.h.in using autoconf
> version >= 2.62?  If not, then I don't understand how the
> AC_C_BIGENDIAN autoconf macro can be giving the right results.

It cannot, the actual bigendian detection for universal build is done  
in pymacconfig.h.  I have given up on getting pyconfig.h right for  
universal builds, especially when dealing with 4-way universal builds.

Ronald

>
> Mark

From mal at egenix.com  Tue Apr 14 22:59:39 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 14 Apr 2009 22:59:39 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090414162603.70C843A4100@sparrow.telecommunity.com>
References: <49D4DA72.60401@v.loewis.de>
	<49D52115.6020001@egenix.com>	<49D66C6E.3090602@v.loewis.de>
	<49DB475B.8060504@egenix.com>	<20090407140317.EBD383A4063@sparrow.telecommunity.com>	<49DB6A1F.50801@egenix.com>	<20090407174355.B62983A4063@sparrow.telecommunity.com>	<49E4A58F.70309@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
Message-ID: <49E4F93B.6010802@egenix.com>

On 2009-04-14 18:27, P.J. Eby wrote:
> At 05:02 PM 4/14/2009 +0200, M.-A. Lemburg wrote:
>> I don't see the emphasis in the PEP on Linux distribution support and the
>> remote possibility of them wanting to combine separate packages back
>> into one package as good argument for adding yet another separate
>> hierarchy
>> of special files which Python scans during imports.
>>
>> That said, note that most distributions actually take the other route:
>> they try to split up larger packages into smaller ones, so the argument
>> becomes even weaker.
> 
> I think you've misunderstood something about the use case.  System
> packaging tools don't like separate packages to contain the *same
> file*.  That means that they *can't* split a larger package up with your
> proposal, because every one of those packages would have to contain a
> __pkg__.py -- and thus be in conflict with each other.  Either that, or
> they would have to make a separate system package containing *only* the
> __pkg__.py, and then make all packages using the namespace depend on it
> -- which is more work and requires greater co-ordination among packagers.

You are missing the point: When breaking up a large package that lives in
site-packages into smaller distribution bundles, you don't need namespace
packages at all, so the PEP doesn't apply.

The way this works is by having a base distribution bundle that includes
the needed __init__.py file and a set of extension bundles that add
other files to the same directory (without including another copy of
__init__.py). The extension bundles include a dependency on the base
package to make sure that it always gets installed first.

Debian has been using that approach for egenix-mx-base for years. Works
great:

    http://packages.debian.org/source/lenny/egenix-mx-base

eGenix has been using that approach for mx package add-ons as well -
long before "namespace" packages were given that name :-)

Please note that the PEP is about providing ways to have package parts
live on sys.path that reintegrate themselves into a single package at
import time.

As such it's targeting Python developers that want to ship add-ons to
existing packages, not Linux distributions (they usually have their
own ideas about what goes where - something that's completely out-of-
scope for the PEP).

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Apr 14 2009)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2009-03-19: Released mxODBC.Connect 1.0.1      http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From pje at telecommunity.com  Wed Apr 15 02:32:34 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Tue, 14 Apr 2009 20:32:34 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49E4F93B.6010802@egenix.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com>
	<20090407140317.EBD383A4063@sparrow.telecommunity.com>
	<49DB6A1F.50801@egenix.com>
	<20090407174355.B62983A4063@sparrow.telecommunity.com>
	<49E4A58F.70309@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
Message-ID: <20090415003026.B0A783A4114@sparrow.telecommunity.com>

At 10:59 PM 4/14/2009 +0200, M.-A. Lemburg wrote:
>You are missing the point: When breaking up a large package that lives in
>site-packages into smaller distribution bundles, you don't need namespace
>packages at all, so the PEP doesn't apply.
>
>The way this works is by having a base distribution bundle that includes
>the needed __init__.py file and a set of extension bundles that add
>other files to the same directory (without including another copy of
>__init__.py). The extension bundles include a dependency on the base
>package to make sure that it always gets installed first.

If we're going to keep that practice, there's no point to having the 
PEP: all three methods (base+extensions, pkgutil, setuptools) all 
work just fine as they are, with no changes to importing or the stdlib.
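
For reference, the pkgutil method mentioned above looks roughly like the
following sketch. The package name demo_ns, the directory names and the
helper functions are all invented for the illustration. Note that each
portion ships its own copy of __init__.py, which is exactly the kind of
shared file that system packaging tools object to:

```python
import importlib
import os
import sys
import tempfile

# Every portion of the namespace carries this identical __init__.py.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, package, module, source):
    """Create root/package/ with an __init__.py and one module."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(source)


def demo():
    """Import two modules of one package from two sys.path entries."""
    base = tempfile.mkdtemp()
    dist_a = os.path.join(base, "dist_a")
    dist_b = os.path.join(base, "dist_b")
    make_portion(dist_a, "demo_ns", "alpha", "VALUE = 'from dist_a'\n")
    make_portion(dist_b, "demo_ns", "beta", "VALUE = 'from dist_b'\n")
    sys.path[:0] = [dist_a, dist_b]
    importlib.invalidate_caches()
    alpha = importlib.import_module("demo_ns.alpha")
    beta = importlib.import_module("demo_ns.beta")
    return alpha.VALUE, beta.VALUE


if __name__ == "__main__":
    print(demo())
```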

In particular, without the feature of being able to drop that 
practice, there would be no reason for setuptools to adopt the 
PEP.  That's why I'm -1 on your proposal: it's actually inferior to 
the methods we already have today.

From dan.eloff at gmail.com  Wed Apr 15 03:01:55 2009
From: dan.eloff at gmail.com (Dan Eloff)
Date: Tue, 14 Apr 2009 20:01:55 -0500
Subject: [Python-Dev] Why does read() return bytes instead of bytearray?
Message-ID: <4817b6fc0904141801q4db6f240xe5f429763d1440d1@mail.gmail.com>

Hi,

Can someone please explain why read() should return an immutable bytes
type instead of a mutable bytearray? It's not like read() from a file
and use buffer as a key in a dict is common. Certainly read() from
file or stream, modify, write is very common. I don't understand why
the common case pays the price in performance and simplicity. It
seemed to me that the immutable bytes was described as being useful in
niche situations, but it actually seems to have been favored over
bytearray in Python 3.

Was there a good reason for this decision? Or was this just an
artifact in the change to two bytes types?

The reason I ask is I have a server application that is mostly stream
reading/writing on the hot path and in Python 2.5 the redundant copies
add up to a significant overhead, (I estimate as much as 25% from my
measurements) I was looking at Python 3 as a way to solve that
problem, but unfortunately it doesn't look like it will help.

Thanks,
-Dan

From amauryfa at gmail.com  Wed Apr 15 03:50:06 2009
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Wed, 15 Apr 2009 03:50:06 +0200
Subject: [Python-Dev] Why does read() return bytes instead of bytearray?
In-Reply-To: <4817b6fc0904141801q4db6f240xe5f429763d1440d1@mail.gmail.com>
References: <4817b6fc0904141801q4db6f240xe5f429763d1440d1@mail.gmail.com>
Message-ID: <e27efe130904141850r1bf72237q8c1ca1130e5d361c@mail.gmail.com>

Hello,

On Wed, Apr 15, 2009 at 03:01, Dan Eloff <dan.eloff at gmail.com> wrote:
> Hi,
>
> Can someone please explain why read() should return an immutable bytes
> type instead of a mutable bytearray? It's not like read() from a file
> and use buffer as a key in a dict is common. Certainly read() from
> file or stream, modify, write is very common. I don't understand why
> the common case pays the price in performance and simplicity. It
> seemed to me that the immutable bytes was described as being useful in
> niche situations, but it actually seems to have been favored over
> bytearray in Python 3.
>
> Was there a good reason for this decision? Or was this just an
> artifact in the change to two bytes types?

No, the read() method did not change from the 2.x series.
It returns a new object on each call.

> The reason I ask is I have a server application that is mostly stream
> reading/writing on the hot path and in Python 2.5 the redundant copies
> add up to a significant overhead, (I estimate as much as 25% from my
> measurements) I was looking at Python 3 as a way to solve that
> problem, but unfortunately it doesn't look like it will help.

Files opened in binary mode have a readinto() method, which fills the
given bytearray.
Is this what you are looking for?
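
A minimal sketch of that pattern, reusing one bytearray across reads
instead of allocating a new bytes object per call (io.BytesIO stands in
for a real binary file or socket file here, and the helper name and buffer
size are arbitrary):

```python
import io


def sum_bytes(stream, bufsize=4096):
    """Sum all byte values in a binary stream using one reusable buffer."""
    buf = bytearray(bufsize)
    view = memoryview(buf)          # lets us slice without copying
    total = 0
    chunks = 0
    while True:
        n = stream.readinto(buf)    # fills buf in place, returns byte count
        if not n:
            break                   # 0 (or None) means no more data for now
        total += sum(view[:n])      # only the first n bytes are valid
        chunks += 1
    return total, chunks


if __name__ == "__main__":
    data = bytes(range(10))
    print(sum_bytes(io.BytesIO(data), bufsize=4))
```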

-- 
Amaury Forgeot d'Arc

From dan.eloff at gmail.com  Wed Apr 15 05:05:43 2009
From: dan.eloff at gmail.com (Dan Eloff)
Date: Tue, 14 Apr 2009 22:05:43 -0500
Subject: [Python-Dev] Why does read() return bytes instead of bytearray?
Message-ID: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com>

>No, the read() method did not change from the 2.x series. It returns a new object on each call.

I think you misunderstand me, but the readinto() method looks like a
perfectly reasonable solution, I didn't realize it existed, as it's
not in the library reference on file objects. Thanks for enlightening
me, I feel a little stupid now :)

Python 3, look out, here I come!

-Dan

From mal at egenix.com  Wed Apr 15 09:51:30 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 15 Apr 2009 09:51:30 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090415003026.B0A783A4114@sparrow.telecommunity.com>
References: <49D4DA72.60401@v.loewis.de>
	<49D52115.6020001@egenix.com>	<49D66C6E.3090602@v.loewis.de>
	<49DB475B.8060504@egenix.com>	<20090407140317.EBD383A4063@sparrow.telecommunity.com>	<49DB6A1F.50801@egenix.com>	<20090407174355.B62983A4063@sparrow.telecommunity.com>	<49E4A58F.70309@egenix.com>	<20090414162603.70C843A4100@sparrow.telecommunity.com>	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
Message-ID: <49E59202.6050809@egenix.com>

On 2009-04-15 02:32, P.J. Eby wrote:
> At 10:59 PM 4/14/2009 +0200, M.-A. Lemburg wrote:
>> You are missing the point: When breaking up a large package that lives in
>> site-packages into smaller distribution bundles, you don't need namespace
>> packages at all, so the PEP doesn't apply.
>>
>> The way this works is by having a base distribution bundle that includes
>> the needed __init__.py file and a set of extension bundles that add
>> other files to the same directory (without including another copy of
>> __init__.py). The extension bundles include a dependency on the base
>> package to make sure that it always gets installed first.
> 
> If we're going to keep that practice, there's no point to having the
> PEP: all three methods (base+extensions, pkgutil, setuptools) all work
> just fine as they are, with no changes to importing or the stdlib.

Again: the PEP is about creating a standard for namespace
packages. It's not about making namespace packages easy to use for
Linux distribution maintainers. Instead, it's targeting *developers*
that want to enable shipping a single package in multiple, separate
pieces, giving the user the freedom to select the ones she needs.

Of course, this is possible today using various other techniques. The
point is that there is no standard for namespace packages and that's
what the PEP is trying to solve.

> In particular, without the feature of being able to drop that practice,
> there would be no reason for setuptools to adopt the PEP.  That's why
> I'm -1 on your proposal: it's actually inferior to the methods we
> already have today.

It's simpler and more in line with the Python Zen, not inferior.

You are free not to support it in setuptools - the methods
implemented in setuptools will continue to work as they are,
but continue to require support code and, over time, no longer
be compatible with other tools building upon the standard
defined in the PEP.

In the end, it's the user that decides: whether to go with a
standard or not.

-- 
Marc-Andre Lemburg
eGenix.com


From alessiogiovanni.baroni at gmail.com  Wed Apr 15 10:05:13 2009
From: alessiogiovanni.baroni at gmail.com (Alessio Giovanni Baroni)
Date: Wed, 15 Apr 2009 10:05:13 +0200
Subject: [Python-Dev] IDLE timeout.
Message-ID: <c010f2650904150105j648a2d20l7182a7f29d2061f0@mail.gmail.com>

Hi to all,
I am writing to this list because the error concerns the internals (I think).
IDLE has a strange behaviour. Sometimes, at random, IDLE restarts the
interpreter, with the following exception on the console:

----------------------------------------
Unhandled server exception!
Thread: SockThread
Client Address:  ('127.0.0.1', 8833)
Request:  <socket.socket object, fd=3, family=2, type=1, proto=0>
Traceback (most recent call last):
  File "/opt/python301/lib/python3.0/socketserver.py", line 281, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/opt/python301/lib/python3.0/socketserver.py", line 307, in process_request
    self.finish_request(request, client_address)
  File "/opt/python301/lib/python3.0/socketserver.py", line 320, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 503, in __init__
    socketserver.BaseRequestHandler.__init__(self, sock, addr, svr)
  File "/opt/python301/lib/python3.0/socketserver.py", line 614, in __init__
    self.handle()
  File "/opt/python301/lib/python3.0/idlelib/run.py", line 259, in handle
    rpc.RPCHandler.getresponse(self, myseq=None, wait=0.05)
  File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 280, in getresponse
    response = self._getresponse(myseq, wait)
  File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 300, in _getresponse
    response = self.pollresponse(myseq, wait)
  File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 424, in pollresponse
    message = self.pollmessage(wait)
  File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 376, in pollmessage
    packet = self.pollpacket(wait)
  File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 347, in pollpacket
    r, w, x = select.select([self.sock.fileno()], [], [], wait)
select.error: (4, 'Interrupted system call')

*** Unrecoverable, server exiting!
----------------------------------------

There isn't a specific trigger; IDLE restarts when I write some code, when
I press Return, or even when I do nothing.
If it is a bug, I don't know how to put together a test case, because the
error occurs at random.

Thanks to all.

From krstic at solarsail.hcs.harvard.edu  Wed Apr 15 16:12:31 2009
From: krstic at solarsail.hcs.harvard.edu (Ivan Krstić)
Date: Wed, 15 Apr 2009 16:12:31 +0200
Subject: [Python-Dev] IDLE timeout.
In-Reply-To: <c010f2650904150105j648a2d20l7182a7f29d2061f0@mail.gmail.com>
References: <c010f2650904150105j648a2d20l7182a7f29d2061f0@mail.gmail.com>
Message-ID: <8B98611B-BE8A-4D32-9B3A-296DB8BDFDC6@solarsail.hcs.harvard.edu>

On Apr 15, 2009, at 10:05 AM, Alessio Giovanni Baroni wrote:
>     r, w, x = select.select([self.sock.fileno()], [], [], wait)
> select.error: (4, 'Interrupted system call')

See here for an explanation of the same problem in another module:
<http://mail.python.org/pipermail/python-dev/2000-October/009671.html>

Sounds like you ought to file a bug against IDLE to have it grow EINTR  
handling. Cheers,

--
Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | http://radian.org
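
For readers following the EINTR suggestion, here is a minimal sketch of
the kind of retry loop such a fix might use. It is an illustration only,
not the actual IDLE patch: the helper name select_retrying_on_eintr and
the demo() function are hypothetical, and the timeout is not shortened
after an interruption. On Python 3.5 and later, PEP 475 makes
select.select() retry on EINTR by itself, so a wrapper like this matters
mainly for older versions.

```python
import errno
import select
import socket


def select_retrying_on_eintr(rlist, wlist, xlist, timeout):
    """Call select.select(), retrying when a signal interrupts the call.

    Hypothetical helper for illustration; not part of idlelib.
    """
    while True:
        try:
            return select.select(rlist, wlist, xlist, timeout)
        except (OSError, select.error) as exc:
            # Older versions raise select.error with args (errno, message);
            # newer ones raise OSError, which carries an errno attribute.
            code = getattr(exc, "errno", None)
            if code is None and exc.args:
                code = exc.args[0]
            if code != errno.EINTR:
                raise
            # Interrupted by a signal: loop and call select() again.


def demo():
    """Return True if the wrapper reports a socket with pending data."""
    a, b = socket.socketpair()
    try:
        b.send(b"x")
        readable, _, _ = select_retrying_on_eintr([a.fileno()], [], [], 1.0)
        return readable == [a.fileno()]
    finally:
        a.close()
        b.close()
```

A stricter version would track the elapsed time and pass only the
remaining timeout on each retry.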

From alessiogiovanni.baroni at gmail.com  Wed Apr 15 16:33:20 2009
From: alessiogiovanni.baroni at gmail.com (Alessio Giovanni Baroni)
Date: Wed, 15 Apr 2009 16:33:20 +0200
Subject: [Python-Dev] IDLE timeout.
In-Reply-To: <8B98611B-BE8A-4D32-9B3A-296DB8BDFDC6@solarsail.hcs.harvard.edu>
References: <c010f2650904150105j648a2d20l7182a7f29d2061f0@mail.gmail.com>
	<8B98611B-BE8A-4D32-9B3A-296DB8BDFDC6@solarsail.hcs.harvard.edu>
Message-ID: <c010f2650904150733p658df281t3fad4a5309fafb30@mail.gmail.com>

Ah, sometimes the exception raised is the following (slightly different from
the previous one):

Exception in Tkinter callback
Traceback (most recent call last):
  File "/opt/python301/lib/python3.0/tkinter/__init__.py", line 1399, in __call__
    return self.func(*args)
  File "/opt/python301/lib/python3.0/tkinter/__init__.py", line 487, in callit
    func(*args)
  File "/opt/python301/lib/python3.0/idlelib/PyShell.py", line 490, in poll_subprocess
    response = clt.pollresponse(self.active_seq, wait=0.05)
  File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 424, in pollresponse
    message = self.pollmessage(wait)
  File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 376, in pollmessage
    packet = self.pollpacket(wait)
  File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 347, in pollpacket
    r, w, x = select.select([self.sock.fileno()], [], [], wait)
select.error: (4, 'Interrupted system call')

In this case IDLE does not respond, because the Python interpreter is not
restarted. I have to close everything :-(.
Should I open an issue in the tracker against IDLE for now?

Regards.

2009/4/15 Ivan Krstić <krstic at solarsail.hcs.harvard.edu>

> On Apr 15, 2009, at 10:05 AM, Alessio Giovanni Baroni wrote:
>
>>    r, w, x = select.select([self.sock.fileno()], [], [], wait)
>> select.error: (4, 'Interrupted system call')
>>
>
>
> See here for an explanation of the same problem in another module:
> <http://mail.python.org/pipermail/python-dev/2000-October/009671.html>
>
> Sounds like you ought to file a bug against IDLE to have it grow EINTR
> handling. Cheers,
>
> --
> Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | http://radian.org
>
>

From pje at telecommunity.com  Wed Apr 15 16:44:17 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 15 Apr 2009 10:44:17 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49E59202.6050809@egenix.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com>
	<20090407140317.EBD383A4063@sparrow.telecommunity.com>
	<49DB6A1F.50801@egenix.com>
	<20090407174355.B62983A4063@sparrow.telecommunity.com>
	<49E4A58F.70309@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
Message-ID: <20090415144147.6845F3A4100@sparrow.telecommunity.com>

At 09:51 AM 4/15/2009 +0200, M.-A. Lemburg wrote:
>On 2009-04-15 02:32, P.J. Eby wrote:
> > At 10:59 PM 4/14/2009 +0200, M.-A. Lemburg wrote:
> >> You are missing the point: When breaking up a large package that lives in
> >> site-packages into smaller distribution bundles, you don't need namespace
> >> packages at all, so the PEP doesn't apply.
> >>
> >> The way this works is by having a base distribution bundle that includes
> >> the needed __init__.py file and a set of extension bundles that add
> >> other files to the same directory (without including another copy of
> >> __init__.py). The extension bundles include a dependency on the base
> >> package to make sure that it always gets installed first.
> >
> > If we're going to keep that practice, there's no point to having the
> > PEP: all three methods (base+extensions, pkgutil, setuptools) all work
> > just fine as they are, with no changes to importing or the stdlib.
>
>Again: the PEP is about creating a standard for namespace
>packages. It's not about making namespace packages easy to use for
>Linux distribution maintainers. Instead, it's targeting *developers*
>that want to enable shipping a single package in multiple, separate
>pieces, giving the user the freedom to select the ones she needs.
>
>Of course, this is possible today using various other techniques. The
>point is that there is no standard for namespace packages and that's
>what the PEP is trying to solve.
>
> > In particular, without the feature of being able to drop that practice,
> > there would be no reason for setuptools to adopt the PEP.  That's why
> > I'm -1 on your proposal: it's actually inferior to the methods we
> > already have today.
>
>It's simpler and more in line with the Python Zen, not inferior.
>
>You are free not to support it in setuptools - the methods
>implemented in setuptools will continue to work as they are,
>but continue to require support code and, over time, no longer
>be compatible with other tools building upon the standard
>defined in the PEP.
>
>In the end, it's the user that decides: whether to go with a
>standard or not.

Up until this point, I've been trying to help you understand the use 
cases, but it's clear now that you already understand them, you just 
don't care.

That wouldn't be a problem if you just stayed on the sidelines, 
instead of actively working to make those use cases more difficult 
for everyone else than they already are.

Anyway, since you clearly understand precisely what you're doing, I'm 
now going to stop trying to explain things, as my responses are 
apparently just encouraging you, and possibly convincing bystanders 
that there's some genuine controversy here as well.

From rdmurray at bitdance.com  Wed Apr 15 16:42:33 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 15 Apr 2009 10:42:33 -0400 (EDT)
Subject: [Python-Dev] Why does read() return bytes instead of bytearray?
In-Reply-To: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com>
References: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0904151036440.1740@kimball.webabinitio.net>

On Tue, 14 Apr 2009 at 22:05, Dan Eloff wrote:
>> No, the read() method did not change from the 2.x series. It returns a new object on each call.
>
> I think you misunderstand me, but the readinto() method looks like a
> perfectly reasonable solution, I didn't realize it existed, as it's
> not in the library reference on file objects. Thanks for enlightening
> me, I feel a little stupid now :)

You have to follow the link from that section to the 'io' module to find
it.

The io module is about streams and is therefore in the 'generic operating
system services' section, not the 'file and directory access' section,
which makes it a little harder to find when what you think you want to
know about is file access...I think this is a doc bug but I'm completely
unsure what would be a good fix.

--David
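
As an illustration of the readinto() approach mentioned above, here is a
small sketch that reuses one preallocated bytearray for every read. The
helper name read_chunks is hypothetical, and io.BytesIO stands in for a
real binary file opened with open(path, "rb").

```python
import io


def read_chunks(stream, buf):
    """Yield the data read from stream, reusing buf for every read.

    readinto() fills the caller's buffer and returns the number of
    bytes written, instead of allocating a new bytes object per call.
    """
    view = memoryview(buf)
    while True:
        n = stream.readinto(buf)
        if not n:  # 0 at end of file (None on a non-blocking raw stream)
            break
        # Copy only the filled part; callers that can work on the
        # buffer in place could use view[:n] directly instead.
        yield bytes(view[:n])


# Usage with an in-memory stream and an 8-byte buffer.
chunks = list(read_chunks(io.BytesIO(b"hello world"), bytearray(8)))
```

The design point is that the buffer is allocated once by the caller, so
a long-running read loop does not create a new object per call.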

From barry at python.org  Wed Apr 15 17:45:08 2009
From: barry at python.org (Barry Warsaw)
Date: Wed, 15 Apr 2009 11:45:08 -0400
Subject: [Python-Dev] RELEASED Python 2.6.2
Message-ID: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org>

On behalf of the Python community, I'm happy to announce the  
availability of Python 2.6.2.  This is the latest production-ready  
version in the Python 2.6 series.  Dozens of issues have been fixed  
since Python 2.6.1 was released back in December.  Please see the NEWS  
file for all the gory details.

     http://www.python.org/download/releases/2.6.2/NEWS.txt

For more information on Python 2.6 in general, please see

      http://docs.python.org/dev/whatsnew/2.6.html

Source tarballs, Windows installers, and (soon) Mac OS X disk images  
can be downloaded from the Python 2.6.2 page:

     http://www.python.org/download/releases/2.6.2/

Please report bugs for any Python version in the Python tracker:

     http://bugs.python.org

Enjoy,
-Barry

Barry Warsaw
barry at python.org
Python 2.6/3.0 Release Manager
(on behalf of the entire python-dev team)


From aahz at pythoncraft.com  Wed Apr 15 18:10:33 2009
From: aahz at pythoncraft.com (Aahz)
Date: Wed, 15 Apr 2009 09:10:33 -0700
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090415144147.6845F3A4100@sparrow.telecommunity.com>
References: <49DB475B.8060504@egenix.com>
	<20090407140317.EBD383A4063@sparrow.telecommunity.com>
	<49DB6A1F.50801@egenix.com>
	<20090407174355.B62983A4063@sparrow.telecommunity.com>
	<49E4A58F.70309@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
Message-ID: <20090415161033.GA5218@panix.com>

[much quote-trimming, the following is intended to just give the gist,
but the bits quoted below are not in direct response to each other]

On Wed, Apr 15, 2009, P.J. Eby wrote:
> At 09:51 AM 4/15/2009 +0200, M.-A. Lemburg wrote:
>> 
>>  [...]
>> Again: the PEP is about creating a standard for namespace
>> packages. It's not about making namespace packages easy to use for
>> Linux distribution maintainers. Instead, it's targeting *developers*
>> that want to enable shipping a single package in multiple, separate
>> pieces, giving the user the freedom to select the ones she needs.
>>  [...]
>
>  [...]
> Anyway, since you clearly understand precisely what you're doing, I'm  
> now going to stop trying to explain things, as my responses are  
> apparently just encouraging you, and possibly convincing bystanders that 
> there's some genuine controversy here as well.

For the benefit of us bystanders, could you summarize your vote at this
point?  Given the PEP's intended goals, if you do not oppose the PEP, are
there any changes you think should be made?
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

Why is this newsgroup different from all other newsgroups?

From mal at egenix.com  Wed Apr 15 18:15:46 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 15 Apr 2009 18:15:46 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090415144147.6845F3A4100@sparrow.telecommunity.com>
References: <49D4DA72.60401@v.loewis.de>
	<49D52115.6020001@egenix.com>	<49D66C6E.3090602@v.loewis.de>
	<49DB475B.8060504@egenix.com>	<20090407140317.EBD383A4063@sparrow.telecommunity.com>	<49DB6A1F.50801@egenix.com>	<20090407174355.B62983A4063@sparrow.telecommunity.com>	<49E4A58F.70309@egenix.com>	<20090414162603.70C843A4100@sparrow.telecommunity.com>	<49E4F93B.6010802@egenix.com>	<20090415003026.B0A783A4114@sparrow.telecommunity.com>	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
Message-ID: <49E60832.8030806@egenix.com>

On 2009-04-15 16:44, P.J. Eby wrote:
> At 09:51 AM 4/15/2009 +0200, M.-A. Lemburg wrote:
>> On 2009-04-15 02:32, P.J. Eby wrote:
>> > At 10:59 PM 4/14/2009 +0200, M.-A. Lemburg wrote:
>> >> You are missing the point: When breaking up a large package that lives in
>> >> site-packages into smaller distribution bundles, you don't need namespace
>> >> packages at all, so the PEP doesn't apply.
>> >>
>> >> The way this works is by having a base distribution bundle that includes
>> >> the needed __init__.py file and a set of extension bundles that add
>> >> other files to the same directory (without including another copy of
>> >> __init__.py). The extension bundles include a dependency on the base
>> >> package to make sure that it always gets installed first.
>> >
>> > If we're going to keep that practice, there's no point to having the
>> > PEP: all three methods (base+extensions, pkgutil, setuptools) all work
>> > just fine as they are, with no changes to importing or the stdlib.
>>
>> Again: the PEP is about creating a standard for namespace
>> packages. It's not about making namespace packages easy to use for
>> Linux distribution maintainers. Instead, it's targeting *developers*
>> that want to enable shipping a single package in multiple, separate
>> pieces, giving the user the freedom to select the ones she needs.
>>
>> Of course, this is possible today using various other techniques. The
>> point is that there is no standard for namespace packages and that's
>> what the PEP is trying to solve.
>>
>> > In particular, without the feature of being able to drop that practice,
>> > there would be no reason for setuptools to adopt the PEP.  That's why
>> > I'm -1 on your proposal: it's actually inferior to the methods we
>> > already have today.
>>
>> It's simpler and more in line with the Python Zen, not inferior.
>>
>> You are free not to support it in setuptools - the methods
>> implemented in setuptools will continue to work as they are,
>> but continue to require support code and, over time, no longer
>> be compatible with other tools building upon the standard
>> defined in the PEP.
>>
>> In the end, it's the user that decides: whether to go with a
>> standard or not.
> 
> Up until this point, I've been trying to help you understand the use
> cases, but it's clear now that you already understand them, you just
> don't care.
> 
> That wouldn't be a problem if you just stayed on the sidelines, instead
> of actively working to make those use cases more difficult for everyone
> else than they already are.
> 
> Anyway, since you clearly understand precisely what you're doing, I'm
> now going to stop trying to explain things, as my responses are
> apparently just encouraging you, and possibly convincing bystanders that
> there's some genuine controversy here as well.

Hopefully, bystanders will understand that the one single use case
you are always emphasizing, namely that of Linux distribution maintainers
trying to change the package installation layout, is really a rather
uncommon and rare use case.

It is true that I do understand what the namespace package idea is
all about. I've been active in Python package development since they
were first added to Python as a new built-in import feature in Python 1.5
and have been distributing packages with package add-ons for more than
a decade...

For some history, have a look at:

	http://www.python.org/doc/essays/packages.html

Also note how that essay discourages the use of .pth files:

"""
If the package really requires adding one or more directories on sys.path (e.g.
because it has not yet been structured to support dotted-name import), a "path
configuration file" named package.pth can be placed in either the site-python or
site-packages directory.
...
A typical installation should have no or very few .pth files or something is
wrong, and if you need to play with the search order, something is very wrong.
"""

Back to the PEP:

The much more common use case is that of wanting to have a base package
installation with optional add-ons that live in the same logical
package namespace.

The PEP provides a way to solve this use case by giving both developers
and users a standard at hand which they can follow without having to
rely on some non-standard helpers and across Python implementations.

My proposal tries to solve this without adding yet another .pth-file-like
mechanism - hopefully in the spirit of the original Python
package idea.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Apr 15 2009)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2009-03-19: Released mxODBC.Connect 1.0.1      http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From georg at python.org  Wed Apr 15 17:52:57 2009
From: georg at python.org (Georg Brandl)
Date: Wed, 15 Apr 2009 17:52:57 +0200
Subject: [Python-Dev] Python Bug Day on April 23
Message-ID: <49E602D9.5070603@python.org>

Hi,

I'd like to announce that there will be a Python Bug Day on April 23.
As always, this is a perfect opportunity to get involved in Python
development, or bring your own issues to attention, discuss them and
(hopefully) resolve them together with the core developers.

We will coordinate over IRC, in #python-dev on irc.freenode.net,
and the Wiki page http://wiki.python.org/moin/PythonBugDay has all
important information and a short list of steps how to get set up.

Please spread the word!

Georg

From pje at telecommunity.com  Wed Apr 15 19:49:20 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 15 Apr 2009 13:49:20 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090415161033.GA5218@panix.com>
References: <49DB475B.8060504@egenix.com>
	<20090407140317.EBD383A4063@sparrow.telecommunity.com>
	<49DB6A1F.50801@egenix.com>
	<20090407174355.B62983A4063@sparrow.telecommunity.com>
	<49E4A58F.70309@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<20090415161033.GA5218@panix.com>
Message-ID: <20090415174649.43B6B3A4100@sparrow.telecommunity.com>

At 09:10 AM 4/15/2009 -0700, Aahz wrote:
>For the benefit of us bystanders, could you summarize your vote at this
>point?  Given the PEP's intended goals, if you do not oppose the PEP, are
>there any changes you think should be made?

I'm +1 on Martin's original version of the PEP, subject to the point 
brought up by someone that .pkg should be changed to a different extension.

I'm -1 on all of MAL's proposed revisions, as IMO they are a step 
backwards: they "standardize" an approach that will create problems 
that don't need to exist, and don't exist now.  Martin's proposal is 
an improvement on the status quo, Marc's proposal is a dis-improvement.

From pje at telecommunity.com  Wed Apr 15 19:59:34 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 15 Apr 2009 13:59:34 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49E60832.8030806@egenix.com>
References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com>
	<49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com>
	<20090407140317.EBD383A4063@sparrow.telecommunity.com>
	<49DB6A1F.50801@egenix.com>
	<20090407174355.B62983A4063@sparrow.telecommunity.com>
	<49E4A58F.70309@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
Message-ID: <20090415175704.966B13A4100@sparrow.telecommunity.com>

At 06:15 PM 4/15/2009 +0200, M.-A. Lemburg wrote:
>The much more common use case is that of wanting to have a base package
>installation with optional add-ons that live in the same logical
>package namespace.

Please see the large number of Zope and PEAK distributions on PyPI as 
minimal examples that disprove this being the common use case.  I 
expect you will find a fair number of others, as well.

In these cases, there is NO "base package"...  the entire point of 
using namespace packages for these distributions is that a "base 
package" is neither necessary nor desirable.

In other words, the "base package" scenario is the exception these 
days, not the rule.  I actually know specifically of only one other 
such package besides your mx.* case, the logilab ll.* package.

From mal at egenix.com  Wed Apr 15 20:00:42 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 15 Apr 2009 20:00:42 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <9D093FD7-080B-479E-90B4-51294EBE5186@fuhm.net>
References: <49D4DA72.60401@v.loewis.de>	<49D52115.6020001@egenix.com>	<49D66C6E.3090602@v.loewis.de>	<49DB475B.8060504@egenix.com>	<20090407140317.EBD383A4063@sparrow.telecommunity.com>	<49DB6A1F.50801@egenix.com>	<20090407174355.B62983A4063@sparrow.telecommunity.com>	<49E4A58F.70309@egenix.com>	<20090414162603.70C843A4100@sparrow.telecommunity.com>	<49E4F93B.6010802@egenix.com>	<20090415003026.B0A783A4114@sparrow.telecommunity.com>	<49E59202.6050809@egenix.com>	<20090415144147.6845F3A4100@sparrow.telecommunity.com>	<49E60832.8030806@egenix.com>
	<9D093FD7-080B-479E-90B4-51294EBE5186@fuhm.net>
Message-ID: <49E620CA.70903@egenix.com>

On 2009-04-15 19:38, James Y Knight wrote:
> 
> On Apr 15, 2009, at 12:15 PM, M.-A. Lemburg wrote:
> 
>> The much more common use case is that of wanting to have a base package
>> installation with optional add-ons that live in the same logical
>> package namespace.
>>
>> The PEP provides a way to solve this use case by giving both developers
>> and users a standard at hand which they can follow without having to
>> rely on some non-standard helpers and across Python implementations.
> 
> I'm not sure I understand what advantage your proposal gives over the
> current mechanism for doing this.
> 
> That is, add to your __init__.py file:
> 
> from pkgutil import extend_path
> __path__ = extend_path(__path__, __name__)
> 
> Can you describe the intended advantages over the status-quo a bit more
> clearly?

Simple: you don't need the above lines in your __init__.py file
anymore and can rely on a Python standard for namespace packages
instead of some helper implementation.

The fact that you have a __pkg__.py file in your package dir
will signal the namespace package character to Python's importer
and this will take care of the lookup process for you.

Namespace packages will be just as easy to write, install and
maintain as regular Python packages.

-- 
Marc-Andre Lemburg
eGenix.com


From foom at fuhm.net  Wed Apr 15 19:38:19 2009
From: foom at fuhm.net (James Y Knight)
Date: Wed, 15 Apr 2009 13:38:19 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49E60832.8030806@egenix.com>
References: <49D4DA72.60401@v.loewis.de>
	<49D52115.6020001@egenix.com>	<49D66C6E.3090602@v.loewis.de>
	<49DB475B.8060504@egenix.com>	<20090407140317.EBD383A4063@sparrow.telecommunity.com>	<49DB6A1F.50801@egenix.com>	<20090407174355.B62983A4063@sparrow.telecommunity.com>	<49E4A58F.70309@egenix.com>	<20090414162603.70C843A4100@sparrow.telecommunity.com>	<49E4F93B.6010802@egenix.com>	<20090415003026.B0A783A4114@sparrow.telecommunity.com>	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
Message-ID: <9D093FD7-080B-479E-90B4-51294EBE5186@fuhm.net>

On Apr 15, 2009, at 12:15 PM, M.-A. Lemburg wrote:

> The much more common use case is that of wanting to have a base package
> installation with optional add-ons that live in the same logical
> package namespace.
>
> The PEP provides a way to solve this use case by giving both developers
> and users a standard at hand which they can follow without having to
> rely on some non-standard helpers and across Python implementations.

I'm not sure I understand what advantage your proposal gives over the  
current mechanism for doing this.

That is, add to your __init__.py file:

from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)

Can you describe the intended advantages over the status-quo a bit  
more clearly?

James
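
To make the pkgutil mechanism above concrete, here is a self-contained
sketch that builds two separate directories, each holding a portion of
the same package, and imports a module from each. The package name
demo_ns, the module names alpha and beta, and the helper functions are
made up for the illustration; only pkgutil.extend_path() is the real
stdlib call under discussion.

```python
import importlib
import os
import sys
import tempfile

# The two lines every portion's __init__.py needs with this mechanism.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, package, module, body):
    """Create root/package/ with an __init__.py and one module."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(body)


def demo():
    """Import demo_ns.alpha and demo_ns.beta from two sys.path entries."""
    with tempfile.TemporaryDirectory() as first, \
            tempfile.TemporaryDirectory() as second:
        make_portion(first, "demo_ns", "alpha", "VALUE = 'alpha'\n")
        make_portion(second, "demo_ns", "beta", "VALUE = 'beta'\n")
        sys.path[:0] = [first, second]
        importlib.invalidate_caches()
        try:
            alpha = importlib.import_module("demo_ns.alpha")
            beta = importlib.import_module("demo_ns.beta")
            return alpha.VALUE, beta.VALUE
        finally:
            sys.path.remove(first)
            sys.path.remove(second)
            # Forget the demo package so repeated runs start clean.
            for name in [n for n in sys.modules if n.startswith("demo_ns")]:
                del sys.modules[name]
```

Only the __init__.py found first on sys.path is executed; its
extend_path() call is what adds the second directory to the package's
__path__, which is why every portion has to ship the same two lines.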

From mal at egenix.com  Wed Apr 15 20:09:11 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 15 Apr 2009 20:09:11 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090415175704.966B13A4100@sparrow.telecommunity.com>
References: <49D4DA72.60401@v.loewis.de>
	<49D52115.6020001@egenix.com>	<49D66C6E.3090602@v.loewis.de>
	<49DB475B.8060504@egenix.com>	<20090407140317.EBD383A4063@sparrow.telecommunity.com>	<49DB6A1F.50801@egenix.com>	<20090407174355.B62983A4063@sparrow.telecommunity.com>	<49E4A58F.70309@egenix.com>	<20090414162603.70C843A4100@sparrow.telecommunity.com>	<49E4F93B.6010802@egenix.com>	<20090415003026.B0A783A4114@sparrow.telecommunity.com>	<49E59202.6050809@egenix.com>	<20090415144147.6845F3A4100@sparrow.telecommunity.com>	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
Message-ID: <49E622C7.4000208@egenix.com>

On 2009-04-15 19:59, P.J. Eby wrote:
> At 06:15 PM 4/15/2009 +0200, M.-A. Lemburg wrote:
>> The much more common use case is that of wanting to have a base package
>> installation with optional add-ons that live in the same logical
>> package namespace.
> 
> Please see the large number of Zope and PEAK distributions on PyPI as
> minimal examples that disprove this being the common use case.  I expect
> you will find a fair number of others, as well.
> 
> In these cases, there is NO "base package"...  the entire point of using
> namespace packages for these distributions is that a "base package" is
> neither necessary nor desirable.
> 
> In other words, the "base package" scenario is the exception these days,
> not the rule.  I actually know specifically of only one other such
> package besides your mx.* case, the logilab ll.* package.

So now you're arguing against having base packages... at least you've
dropped the strange idea of using Linux distribution maintainers
as central use case ;-)

Think of base namespace packages (the ones providing the __init__.py
file) as defining the namespace. They set up ownership and the basic
infrastructure needed by add-ons.

If you take Zope as an example, the Products/ package dir is a good
example: the __init__.py file in that directory is provided by the
Zope installation (generated during Zope instance creation), so Zope
"owns" the package.

With the proposal, Zope could declare this package dir a namespace
base package by adding a __pkg__.py file to it.

Zope add-ons could then be installed somewhere else on sys.path
and include a Products/ dir as well, only this time it doesn't have
the __init__.py file, but only a __pkg__.py file.

Python would then take care of integrating the modules and packages in
the add-on Products/ dir with the base package.
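
The lookup described above could be pictured roughly as follows. This is
only a sketch under stated assumptions, not the proposal's actual
algorithm and not anything Python implements: the marker name
__pkg__.py comes from the proposal, while collect_portions() and demo()
are hypothetical helpers, and the rule "the first portion that also has
an __init__.py is the base" is an assumption made for the example.

```python
import os
import tempfile

MARKER = "__pkg__.py"  # marker file name taken from the proposal


def collect_portions(package, search_path):
    """Return (base_dir, portions) for package across search_path.

    A directory counts as a portion if it contains the marker file.
    The base is assumed to be the first portion that also contains an
    __init__.py; add-on portions carry only the marker.
    """
    base = None
    portions = []
    for entry in search_path:
        candidate = os.path.join(entry, package)
        if not os.path.isfile(os.path.join(candidate, MARKER)):
            continue
        portions.append(candidate)
        has_init = os.path.isfile(os.path.join(candidate, "__init__.py"))
        if base is None and has_init:
            base = candidate
    return base, portions


def demo():
    """Build a base Products/ dir and one add-on, then scan for them."""
    with tempfile.TemporaryDirectory() as zope_home, \
            tempfile.TemporaryDirectory() as addon_home:
        base_dir = os.path.join(zope_home, "Products")
        addon_dir = os.path.join(addon_home, "Products")
        os.makedirs(base_dir)
        os.makedirs(addon_dir)
        for path in (os.path.join(base_dir, "__init__.py"),
                     os.path.join(base_dir, MARKER),
                     os.path.join(addon_dir, MARKER)):
            open(path, "w").close()
        result = collect_portions("Products", [zope_home, addon_home])
        return result == (base_dir, [base_dir, addon_dir])
```

In such a scheme the importer would set the package's __path__ to the
collected portions after running the base __init__.py.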

-- 
Marc-Andre Lemburg
eGenix.com


From amk at amk.ca  Wed Apr 15 20:52:21 2009
From: amk at amk.ca (A.M. Kuchling)
Date: Wed, 15 Apr 2009 14:52:21 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090415175704.966B13A4100@sparrow.telecommunity.com>
References: <49DB6A1F.50801@egenix.com>
	<20090407174355.B62983A4063@sparrow.telecommunity.com>
	<49E4A58F.70309@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
Message-ID: <20090415185221.GB13696@amk-desktop.matrixgroup.net>

On Wed, Apr 15, 2009 at 01:59:34PM -0400, P.J. Eby wrote:
> Please see the large number of Zope and PEAK distributions on PyPI as  
> minimal examples that disprove this being the common use case.  I expect 
> you will find a fair number of others, as well.
   ...
> In other words, the "base package" scenario is the exception these days, 
> not the rule.  I actually know specifically of only one other such 
> package besides your mx.* case, the logilab ll.* package.

Isn't that pretty even, then?  zope.* and PEAK are two examples of one
approach; and mx.* and ll.* are two examples that use the base package
approach.  Neither approach seems to be the more common one, and both
are pretty rare.

--amk

From georg at python.org  Wed Apr 15 21:10:07 2009
From: georg at python.org (Georg Brandl)
Date: Wed, 15 Apr 2009 21:10:07 +0200
Subject: [Python-Dev] Correction: Python Bug Day on April 25
Message-ID: <49E6310F.2020300@python.org>

Hi,

I managed to screw up the date, so here it goes again:

I'd like to announce that there will be a Python Bug Day on April 25.
As always, this is a perfect opportunity to get involved in Python
development, or bring your own issues to attention, discuss them and
(hopefully) resolve them together with the core developers.

We will coordinate over IRC, in #python-dev on irc.freenode.net,
and the Wiki page http://wiki.python.org/moin/PythonBugDay has all
important information and a short list of steps how to get set up.

Please spread the word!

Georg

From pje at telecommunity.com  Wed Apr 15 21:22:52 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 15 Apr 2009 15:22:52 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090415185221.GB13696@amk-desktop.matrixgroup.net>
References: <49DB6A1F.50801@egenix.com>
	<20090407174355.B62983A4063@sparrow.telecommunity.com>
	<49E4A58F.70309@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
	<20090415185221.GB13696@amk-desktop.matrixgroup.net>
Message-ID: <20090415192021.558E53A4119@sparrow.telecommunity.com>

At 02:52 PM 4/15/2009 -0400, A.M. Kuchling wrote:
>On Wed, Apr 15, 2009 at 01:59:34PM -0400, P.J. Eby wrote:
> > Please see the large number of Zope and PEAK distributions on PyPI as
> > minimal examples that disprove this being the common use case.  I expect
> > you will find a fair number of others, as well.
>    ...
> > In other words, the "base package" scenario is the exception these days,
> > not the rule.  I actually know specifically of only one other such
> > package besides your mx.* case, the logilab ll.* package.
>
>Isn't that pretty even, then?  zope.* and PEAK are two examples of one
>approach; and mx.* and ll.* are two examples that use the base package
>approach.  Neither approach seems to be the more common one, and both
>are pretty rare.

If you view the package listings on PyPI, you'll see that the "pure" 
namespaces currently in use include:

alchemist.*
amplecode.*
atomisator.*
bda.*
benri.*
beyondskins.*
bliptv.*
bopen.*
borg.*
bud.*
...

This is just going down to the 'b's, looking only at packages whose 
PyPI project name reflects a nested package name, and only including 
those with entries that:

1. use setuptools,
2. declare one or more namespace packages, and
3. do not depend on some sort of "base" or "core" package.

Technically, setuptools doesn't support base packages anyway, but if 
the organization appeared to be based on a "core+plugins/addons" 
model (as opposed to "collection of packages grouped in a namespace") 
I didn't include it in the list above -- i.e., I'm bending over 
backwards to be fair in the count.
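
For readers who have not met one, a "pure" namespace package (several independently installed projects sharing one top-level name, none of them acting as a base) can be sketched with the standard library alone. This is only an illustration: it assumes the pkgutil.extend_path idiom rather than setuptools' own mechanism or anything PEP 382 proposes, and the names demo_ns, alpha and beta are hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Every portion ships the same boilerplate __init__.py; none of them is a "base".
NAMESPACE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, module_name, value):
    """Create root/demo_ns/<module_name>.py, mimicking one installed project."""
    pkg_dir = os.path.join(root, "demo_ns")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(NAMESPACE_INIT)
    with open(os.path.join(pkg_dir, module_name + ".py"), "w") as f:
        f.write("NAME = %r\n" % value)


# Two separate install locations, each holding one project of the namespace.
root_a = tempfile.mkdtemp()
root_b = tempfile.mkdtemp()
make_portion(root_a, "alpha", "alpha")
make_portion(root_b, "beta", "beta")
sys.path[:0] = [root_a, root_b]
importlib.invalidate_caches()

# Both submodules import even though they live in different directories.
alpha = importlib.import_module("demo_ns.alpha")
beta = importlib.import_module("demo_ns.beta")
print(alpha.NAME, beta.NAME)
```

Neither directory has to be installed first, and neither depends on the other, which is the property criterion 3 above is testing for.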

If somebody wants to do a formal count of base vs. pure, it might 
provide interesting stats.  I initially only mentioned Zope and PEAK 
because I have direct knowledge of the developers' intent regarding 
their namespace packages.

However, now that I've actually looked at a tiny sample of PyPI, it's 
clear that the actual field use of pure namespace packages has 
positively exploded since setuptools made it practical to use them.

It's unclear, however, who is using base packages besides mx.* and 
ll.*, although I'd guess from the PyPI listings that perhaps Django 
is.  (It seems that "base" packages are more likely to use a 
'base-extension' naming pattern, vs. the 'namespace.project' pattern 
used by "pure" packages.)

Of course, I am certainly not opposed to supporting base packages, 
and Martin's version of PEP 382 is a plus for setuptools because it 
would allow setuptools to better support the "base" scenario.

But pure packages are definitely not a minority; in fact, a 
superficial observation of the full PyPI list suggests that there may 
be almost as many projects using pure-namespace packages, as there 
are non-namespaced projects!

From fijall at gmail.com  Wed Apr 15 21:36:01 2009
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Wed, 15 Apr 2009 13:36:01 -0600
Subject: [Python-Dev] Correction: Python Bug Day on April 25
In-Reply-To: <49E6310F.2020300@python.org>
References: <49E6310F.2020300@python.org>
Message-ID: <693bc9ab0904151236w5b72d56bi2a5ce936201eebe6@mail.gmail.com>

On Wed, Apr 15, 2009 at 1:10 PM, Georg Brandl <georg at python.org> wrote:
> Hi,
>
> I managed to screw up the date, so here it goes again:
>
> I'd like to announce that there will be a Python Bug Day on April 25.
> As always, this is a perfect opportunity to get involved in Python
> development, or bring your own issues to attention, discuss them and
> (hopefully) resolve them together with the core developers.
>
> We will coordinate over IRC, in #python-dev on irc.freenode.net,
> and the Wiki page http://wiki.python.org/moin/PythonBugDay has all
> important information and a short list of steps how to get set up.
>
> Please spread the word!
>
> Georg

Are you aware that this directly conflicts with TurboGears world-wide sprint?

Not sure if this is relevant, just a notice.

Cheers,
fijal

From g.brandl at gmx.net  Wed Apr 15 21:55:30 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 15 Apr 2009 21:55:30 +0200
Subject: [Python-Dev] Correction: Python Bug Day on April 25
In-Reply-To: <693bc9ab0904151236w5b72d56bi2a5ce936201eebe6@mail.gmail.com>
References: <49E6310F.2020300@python.org>
	<693bc9ab0904151236w5b72d56bi2a5ce936201eebe6@mail.gmail.com>
Message-ID: <gs5dn4$s7c$1@ger.gmane.org>

Maciej Fijalkowski wrote:
> On Wed, Apr 15, 2009 at 1:10 PM, Georg Brandl <georg at python.org> wrote:
>> Hi,
>>
>> I managed to screw up the date, so here it goes again:
>>
>> I'd like to announce that there will be a Python Bug Day on April 25.

> Are you aware that this directly conflicts with TurboGears world-wide sprint?
> 
> Not sure if this is relevant, just a notice.

I have been made aware :)

I don't think it will be much of a problem though.

Georg

From ziade.tarek at gmail.com  Wed Apr 15 22:00:01 2009
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Wed, 15 Apr 2009 22:00:01 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090415192021.558E53A4119@sparrow.telecommunity.com>
References: <49DB6A1F.50801@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
	<20090415185221.GB13696@amk-desktop.matrixgroup.net>
	<20090415192021.558E53A4119@sparrow.telecommunity.com>
Message-ID: <94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com>

On Wed, Apr 15, 2009 at 9:22 PM, P.J. Eby <pje at telecommunity.com> wrote:
> At 02:52 PM 4/15/2009 -0400, A.M. Kuchling wrote:
>>
>> On Wed, Apr 15, 2009 at 01:59:34PM -0400, P.J. Eby wrote:
>> > Please see the large number of Zope and PEAK distributions on PyPI as
>> > minimal examples that disprove this being the common use case.  I expect
>> > you will find a fair number of others, as well.
>>    ...
>> > In other words, the "base package" scenario is the exception these days,
>> > not the rule.  I actually know specifically of only one other such
>> > package besides your mx.* case, the logilab ll.* package.
>>
>> Isn't that pretty even, then?  zope.* and PEAK are two examples of one
>> approach; and mx.* and ll.* are two examples that use the base package
>> approach.  Neither approach seems to be the more common one, and both
>> are pretty rare.
>
> If you view the package listings on PyPI, you'll see that the "pure"
> namespaces currently in use include:
>
> alchemist.*
> amplecode.*
> atomisator.*
> bda.*
> benri.*
> beyondskins.*
> bliptv.*
> bopen.*
> borg.*
> bud.*
> ...
>
> This is just going down to the 'b's, looking only at packages whose PyPI
> project name reflects a nested package name, and only including those with
> entries that:
>
> 1. use setuptools,
> 2. declare one or more namespace packages, and
> 3. do not depend on some sort of "base" or "core" package.
>
> Technically, setuptools doesn't support base packages anyway, but if the
> organization appeared to be based on a "core+plugins/addons" model (as
> opposed to "collection of packages grouped in a namespace") I didn't include
> it in the list above -- i.e., I'm bending over backwards to be fair in the
> count.
>
> If somebody wants to do a formal count of base vs. pure, it might provide
> interesting stats.  I initially only mentioned Zope and PEAK because I have
> direct knowledge of the developers' intent regarding their namespace
> packages.
>
> However, now that I've actually looked at a tiny sample of PyPI, it's clear
> that the actual field use of pure namespace packages has positively exploded
> since setuptools made it practical to use them.
>
> It's unclear, however, who is using base packages besides mx.* and ll.*,
> although I'd guess from the PyPI listings that perhaps Django is.  (It seems
> that "base" packages are more likely to use a 'base-extension' naming
> pattern, vs. the 'namespace.project' pattern used by "pure" packages.)
>
> Of course, I am certainly not opposed to supporting base packages, and
> Martin's version of PEP 382 is a plus for setuptools because it would allow
> setuptools to better support the "base" scenario.
>
> But pure packages are definitely not a minority; in fact, a superficial
> observation of the full PyPI list suggests that there may be almost as many
> projects using pure-namespace packages, as there are non-namespaced
> projects!
>

In the packaging survey I ran, 34% of the people who answered are using
the setuptools namespace feature, which currently makes it impossible to
use the namespace for the base package.

Now for the "base" or "core" package, here is what people who use
setuptools do most of the time:

1 - they use zc.buildout, so they don't need a base package: they list
    in a configuration file all the packages needed to build the
    application, and one of these packages happens to have the scripts
    to launch the application.

2 - they have a "main" package that doesn't use the same namespace, but
    uses the setuptools install_requires metadata to include the
    namespaced packages. It acts like zc.buildout in some ways.

For example, you mentioned atomisator.* in your example; this app has
a main package called "Atomisator" (notice the uppercase A) that uses
strategy #2.

But frankly, the "base package" scenario is not widespread these days
simply because it's not obvious how to do it without depending on an OS
that has its own strategy for installing packages. For example, if you
are not on Debian, it's a pain to use the logilab packages because they
use this common namespace for several packages, and a plain Python
installation of the various packages won't work out of the box on other
systems like Windows. (And for pylint, I ended up creating my own
distribution for Windows...)

So:
- having namespaces natively in Python is a big win (Namespaces are
  one honking great idea -- let's do more of those!)
- being able to still write some code under the primary namespace is
  something I (and lots of people) wish we could do with setuptools,
  so it's a big win too.

Regards,
Tarek
-- 
Tarek Ziadé | http://ziade.org

From mal at egenix.com  Wed Apr 15 22:20:45 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 15 Apr 2009 22:20:45 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090415192021.558E53A4119@sparrow.telecommunity.com>
References: <49DB6A1F.50801@egenix.com>
	<20090407174355.B62983A4063@sparrow.telecommunity.com>
	<49E4A58F.70309@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
	<20090415185221.GB13696@amk-desktop.matrixgroup.net>
	<20090415192021.558E53A4119@sparrow.telecommunity.com>
Message-ID: <49E6419D.5010302@egenix.com>

On 2009-04-15 21:22, P.J. Eby wrote:
> At 02:52 PM 4/15/2009 -0400, A.M. Kuchling wrote:
>> On Wed, Apr 15, 2009 at 01:59:34PM -0400, P.J. Eby wrote:
>> > Please see the large number of Zope and PEAK distributions on PyPI as
>> > minimal examples that disprove this being the common use case.  I
>> expect
>> > you will find a fair number of others, as well.
>>    ...
>> > In other words, the "base package" scenario is the exception these
>> days,
>> > not the rule.  I actually know specifically of only one other such
>> > package besides your mx.* case, the logilab ll.* package.
>>
>> Isn't that pretty even, then?  zope.* and PEAK are two examples of one
>> approach; and mx.* and ll.* are two examples that use the base package
>> approach.  Neither approach seems to be the more common one, and both
>> are pretty rare.
> 
> If you view the package listings on PyPI, you'll see that the "pure"
> namespaces currently in use include:
> 
> alchemist.*
> amplecode.*
> atomisator.*
> bda.*
> benri.*
> beyondskins.*
> bliptv.*
> bopen.*
> borg.*
> bud.*
> ...
> 
> This is just going down to the 'b's, looking only at packages whose PyPI
> project name reflects a nested package name, and only including those
> with entries that:
> 
> 1. use setuptools,
> 2. declare one or more namespace packages, and
> 3. do not depend on some sort of "base" or "core" package.
> 
> Technically, setuptools doesn't support base packages anyway, but if the
> organization appeared to be based on a "core+plugins/addons" model (as
> opposed to "collection of packages grouped in a namespace") I didn't
> include it in the list above -- i.e., I'm bending over backwards to be
> fair in the count.

Hmm, setuptools doesn't support the notion of base packages, i.e.
packages that provide their own __init__.py module, so I fail
to see how your list or any other list of setuptools-dependent
packages can be taken as an indicator for anything related to
base packages.

Since setuptools probably introduced the idea of namespace
sharing packages to many authors in the first place, such
a list is even less appropriate to use as sample base.

That said, I don't think such statistics provide any useful
information to decide on the namespace import strategy standard
for Python which is the subject of the PEP.

They just show that one helper-based mechanism is used more than
others and that's simply a consequence of there not being a
standard built-in way of using namespace packages in Python.

Whether base packages are useful or not is really a side aspect
of the PEP and my proposal. I'm more after a method that doesn't
add more .pkg file cruft to Python's import mechanism.

Those .pth files were originally meant to help older Python "packages"
(think the early PIL or Numeric extensions) to integrate nicely into
the new scheme without having to fully support dotted package names
right from the start.

-- 
Marc-Andre Lemburg
eGenix.com


From benjamin at python.org  Wed Apr 15 22:37:47 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 15 Apr 2009 15:37:47 -0500
Subject: [Python-Dev] Why does read() return bytes instead of bytearray?
In-Reply-To: <Pine.LNX.4.64.0904151036440.1740@kimball.webabinitio.net>
References: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com>
	<Pine.LNX.4.64.0904151036440.1740@kimball.webabinitio.net>
Message-ID: <1afaf6160904151337o277c6bc3s8aaa71a756968c21@mail.gmail.com>

2009/4/15 R. David Murray <rdmurray at bitdance.com>:
> On Tue, 14 Apr 2009 at 22:05, Dan Eloff wrote:
>>>
>>> No, the read() method did not change from the 2.x series. It returns a
>>> new object on each call.
>>
>> I think you misunderstand me, but the readinto() method looks like a
>> perfectly reasonable solution, I didn't realize it existed, as it's
>> not in the library reference on file objects. Thanks for enlightening
>> me, I feel a little stupid now :)
>
> You have to follow the link from that section to the 'io' module to find
> it.
>
> The io module is about streams and is therefore in the 'generic operating
> system services' section, not the 'file and directory access section',
> which makes it a little harder to find when what you think you want to
> know about is file access...I think this is a doc bug but I'm completely
> unsure what would be a good fix.

I've added a link to the io module in the see also section of the file
and directory systems.

-- 
Regards,
Benjamin

From rowen at u.washington.edu  Wed Apr 15 22:47:08 2009
From: rowen at u.washington.edu (Russell E. Owen)
Date: Wed, 15 Apr 2009 13:47:08 -0700
Subject: [Python-Dev] RELEASED Python 2.6.2
References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org>
Message-ID: <rowen-EDC16C.13470815042009@news.gmane.org>

Thank you for 2.6.2.

I see the Mac binary installer isn't out yet (at least it is not listed 
on the downloads page). Any chance that it will be compatible with 3rd 
party Tcl/Tk?

Most recent releases have not been; the only way I know to make a 
compatible build is to build the installer on a machine that already has 
a 3rd party Tcl/Tk installed; the resulting binary is then compatible 
with both 3rd party versions of Tcl/Tk and also with Apple's ancient 
built in version.

-- Russell

From nad at acm.org  Wed Apr 15 22:58:44 2009
From: nad at acm.org (Ned Deily)
Date: Wed, 15 Apr 2009 13:58:44 -0700
Subject: [Python-Dev] RELEASED Python 2.6.2
References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org>
	<rowen-EDC16C.13470815042009@news.gmane.org>
Message-ID: <nad-076764.13584415042009@news.gmane.org>

In article <rowen-EDC16C.13470815042009 at news.gmane.org>,
 "Russell E. Owen" <rowen at u.washington.edu> wrote:
> I see the Mac binary installer isn't out yet (at least it is not listed 
> on the downloads page). Any chance that it will be compatible with 3rd 
> party Tcl/Tk?
> 
> Most recent releases have not been; the only way I know to make a 
> compatible build is to build the installer on a machine that already has 
> a 3rd party Tcl/Tk installed; the resulting binary is then compatible 
> with both 3rd party versions of Tcl/Tk and also with Apple's ancient 
> built in version.

Thanks for the reminder.  FWIW, that issue has recently been documented 
and there is a patch for the build script to ensure that the 3rd party 
Tcl/Tk is present during the installer build.  I don't think it made it 
into the 2.6.2 source tree, though.

<http://bugs.python.org/issue5651>

-- 
 Ned Deily,
 nad at acm.org

From pje at telecommunity.com  Wed Apr 15 23:01:49 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 15 Apr 2009 17:01:49 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49E6419D.5010302@egenix.com>
References: <49DB6A1F.50801@egenix.com>
	<20090407174355.B62983A4063@sparrow.telecommunity.com>
	<49E4A58F.70309@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
	<20090415185221.GB13696@amk-desktop.matrixgroup.net>
	<20090415192021.558E53A4119@sparrow.telecommunity.com>
	<49E6419D.5010302@egenix.com>
Message-ID: <20090415205918.B5B303A4100@sparrow.telecommunity.com>

At 10:20 PM 4/15/2009 +0200, M.-A. Lemburg wrote:
>Whether base packages are useful or not is really a side aspect
>of the PEP and my proposal.

It's not whether they're useful, it's whether they're required.  Your 
proposal *requires* base packages, and for people who intend to use 
pure packages, this is NOT a feature: it's a bug.

Specifically, it introduces a large number of unnecessary, 
boilerplate dependencies to their package distribution strategy.

From barry at python.org  Wed Apr 15 23:08:38 2009
From: barry at python.org (Barry Warsaw)
Date: Wed, 15 Apr 2009 17:08:38 -0400
Subject: [Python-Dev] RELEASED Python 2.6.2
In-Reply-To: <rowen-EDC16C.13470815042009@news.gmane.org>
References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org>
	<rowen-EDC16C.13470815042009@news.gmane.org>
Message-ID: <E2B82A71-20A7-46CC-A0C5-CA8452037279@python.org>

On Apr 15, 2009, at 4:47 PM, Russell E. Owen wrote:

> Thank you for 2.6.2.
>
> I see the Mac binary installer isn't out yet (at least it is not  
> listed
> on the downloads page). Any chance that it will be compatible with 3rd
> party Tcl/Tk?
>
> Most recent releases have not been; the only way I know to make a
> compatible build is to build the installer on a machine that already  
> has
> a 3rd party Tcl/Tk installed; the resulting binary is then compatible
> with both 3rd party versions of Tcl/Tk and also with Apple's ancient
> built in version.

I can't answer this, but Ronald is building the OS X image for 2.6.2,  
AFAIK.  I think it will be out soon, and maybe he can answer your Tcl/ 
Tk question.

-Barry


From pje at telecommunity.com  Wed Apr 15 23:11:32 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 15 Apr 2009 17:11:32 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com>
References: <49DB6A1F.50801@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
	<20090415185221.GB13696@amk-desktop.matrixgroup.net>
	<20090415192021.558E53A4119@sparrow.telecommunity.com>
	<94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com>
Message-ID: <20090415210902.848443A4100@sparrow.telecommunity.com>

At 10:00 PM 4/15/2009 +0200, Tarek Ziadé wrote:
>Now for the "base" or "core" package, here is what people who use
>setuptools do most of the time:
>
>1 - they use zc.buildout, so they don't need a base package: they list
>    in a configuration file all the packages needed to build the
>    application, and one of these packages happens to have the scripts
>    to launch the application.
>
>2 - they have a "main" package that doesn't use the same namespace, but
>    uses the setuptools install_requires metadata to include the
>    namespaced packages. It acts like zc.buildout in some ways.
>
>For example, you mentioned atomisator.* in your example; this app has
>a main package called "Atomisator" (notice the uppercase A) that uses
>strategy #2.

I think that there is some confusion here.  A "main" package or 
buildout that assembles a larger project from components is not the 
same thing as having a "base" package for a namespace package.

A base or core package is one that is depended upon by most or all of 
the related projects.  In other words, the dependencies are in the 
*opposite direction* from what you described above.  To have a base 
package in setuptools, you would move the target code from the 
namespace package __init__.py to another module or subpackage within 
your namespace, then make all your other projects depend on the 
project containing that module or subpackage.
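
The difference in dependency direction can be made concrete with a small sketch. The project names below are hypothetical and the dictionaries merely stand in for each project's install_requires metadata; only the direction of the edges is meant to match the description above.

```python
# "Main" package (strategy #2 in the quoted text): the aggregate project
# depends on the namespaced components.
main_style = {
    "Example": ["example.feeds", "example.parser"],
    "example.feeds": [],
    "example.parser": [],
}

# "Base" package: the namespaced components all depend on a shared core.
base_style = {
    "example.core": [],
    "example.feeds": ["example.core"],
    "example.parser": ["example.core"],
}


def dependents(graph, project):
    """Return the sorted names of projects that list `project` as a requirement."""
    return sorted(name for name, reqs in graph.items() if project in reqs)


# Nothing depends on a "main" package; most or all projects depend on a "base".
print(dependents(main_style, "Example"))
print(dependents(base_style, "example.core"))
```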

And I explicitly excluded from my survey any packages that were 
following this strategy, on the assumption that they might consider 
switching to an __init__.py or __pkg__.py strategy if some version of 
PEP 382 were supported by setuptools, since they already have a 
"base" or "core" project -- in that case, they are only changing ONE 
of their packages' distribution metadata to adopt the new strategy, 
because the dependencies already exist.

>So :
>- having namespaces natively in Python is a big win (Namespaces are
>one honking great idea -- let's do more of those!)
>- being able to still write some code under the primary namespace is
>something I (and lots of people) wish we could do
>   with setuptools, so it's a big win too.

Yes, that's why I support Martin's proposal: it would allow 
setuptools to support this case in the future, and it would also 
allow improved startup times for installations with many 
setuptools-based namespace packages installed in flat form.  (Contra 
MAL's claims of decreased performance: adopting Martin's proposal 
allows there to be *fewer* .pth files read at startup, because only 
.pkg files for an actually-imported package need to be read.)

From dalcinl at gmail.com  Thu Apr 16 01:13:13 2009
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Wed, 15 Apr 2009 20:13:13 -0300
Subject: [Python-Dev] Why does read() return bytes instead of bytearray?
In-Reply-To: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com>
References: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com>
Message-ID: <e7ba66e40904151613i5cbb2f64p38b0fc5742f9e0ff@mail.gmail.com>

On Wed, Apr 15, 2009 at 12:05 AM, Dan Eloff <dan.eloff at gmail.com> wrote:
>>No, the read() method did not change from the 2.x series. It returns a new object on each call.
>
> I think you misunderstand me, but the readinto() method looks like a
> perfectly reasonable solution, I didn't realize it existed, as it's
> not in the library reference on file objects. Thanks for enlightening
> me, I feel a little stupid now :)
>

However, your original question is still valid ... Why does a binary
read() return an immutable type?

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From solipsis at pitrou.net  Thu Apr 16 01:16:29 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 15 Apr 2009 23:16:29 +0000 (UTC)
Subject: [Python-Dev] Why does read() return bytes instead of bytearray?
References: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com>
	<e7ba66e40904151613i5cbb2f64p38b0fc5742f9e0ff@mail.gmail.com>
Message-ID: <loom.20090415T231438-86@post.gmane.org>

Lisandro Dalcin <dalcinl <at> gmail.com> writes:
> 
> However, your original question is still valid ... Why does a binary
> read() return an immutable type?

Because bytes is the standard type for holding binary data. Bytearray should
only be used when there's a real, measured performance advantage doing so
(which, IMHO, is rarer than you think). An immutable type makes daily
programming much less error-prone.

Regards

Antoine.
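
The two calls discussed in this thread can be compared in a few lines. This is a minimal sketch that uses io.BytesIO as a stand-in for a file opened in binary mode; the variable names are made up for the example.

```python
import io

# Stand-in for a binary file object; a file opened with mode "rb" offers
# the same read() and readinto() methods.
stream = io.BytesIO(b"hello world")

# read() allocates and returns a new immutable bytes object on every call.
chunk = stream.read(5)
assert isinstance(chunk, bytes)

# readinto() fills a caller-owned mutable buffer in place and returns the
# number of bytes written, so the same buffer can be reused across calls.
buf = bytearray(6)
n = stream.readinto(buf)
data = bytes(buf[:n])

print(chunk, n, data)
```

The bytes result is safe to hand around or use as a dictionary key; the bytearray route is the one to reach for only when avoiding the extra allocation has been shown to matter.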

From stephen at xemacs.org  Thu Apr 16 02:59:45 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 16 Apr 2009 09:59:45 +0900
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <49E6419D.5010302@egenix.com>
References: <49DB6A1F.50801@egenix.com>
	<20090407174355.B62983A4063@sparrow.telecommunity.com>
	<49E4A58F.70309@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
	<20090415185221.GB13696@amk-desktop.matrixgroup.net>
	<20090415192021.558E53A4119@sparrow.telecommunity.com>
	<49E6419D.5010302@egenix.com>
Message-ID: <87ab6h5s72.fsf@xemacs.org>

M.-A. Lemburg writes:

 > Hmm, setuptools doesn't support the notion of base packages, i.e.
 > packages that provide their own __init__.py module, so I fail
 > to see how your list or any other list of setuptools-dependent
 > packages can be taken as an indicator for anything related to
 > base packages.

AFAICS the only things PJE has said about base packages are that

  (a) they aren't a universal use case for namespace packages, and
  (b) he'd like to be able to support them in setuptools, but admits
      that at present they aren't supported.

Your arguments against the PEP supporting namespace packages as
currently supported by setuptools seem purely theoretical to me, while
he's defending an actual and common use case.  "Although practicality
beats purity."  I think that for this PEP it's more important to unify
the various use cases for namespace packages than it is to get rid of
the .pth files.

From pje at telecommunity.com  Thu Apr 16 04:45:29 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 15 Apr 2009 22:45:29 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <87ab6h5s72.fsf@xemacs.org>
References: <49DB6A1F.50801@egenix.com>
	<20090407174355.B62983A4063@sparrow.telecommunity.com>
	<49E4A58F.70309@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
	<20090415185221.GB13696@amk-desktop.matrixgroup.net>
	<20090415192021.558E53A4119@sparrow.telecommunity.com>
	<49E6419D.5010302@egenix.com> <87ab6h5s72.fsf@xemacs.org>
Message-ID: <20090416024300.6E2843A4100@sparrow.telecommunity.com>

At 09:59 AM 4/16/2009 +0900, Stephen J. Turnbull wrote:
>I think that for this PEP it's more important to unify
>the various use cases for namespace packages than it is to get rid of
>the .pth files.

Actually, Martin's proposal *does* get rid of the .pth files in 
site-packages, and replaces them with other files inside the 
individual packages.  (Thereby speeding startup times when many 
namespace packages are present but only a few are used.)

So Martin's proposal is a win for performance and even for decreasing 
clutter.  (The same number of special files will be present, but they 
will be moved inside the namespace package directories instead of 
being in the parent directory.)

>AFAICS the only things PJE has said about base packages are that
>
>   (a) they aren't a universal use case for namespace packages, and
>   (b) he'd like to be able to support them in setuptools, but admits
>       that at present they aren't supported.

...and that Martin's proposal would actually permit me to do so, 
whereas MAL's proposal would not.

Replacing __init__.py with a __pkg__.py wouldn't change any of the 
tradeoffs for how setuptools handles namespace packages, except to 
add an extra variable to consider (i.e., two filenames to keep track of).

From glyph at divmod.com  Thu Apr 16 05:46:02 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Thu, 16 Apr 2009 03:46:02 -0000
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090415210902.848443A4100@sparrow.telecommunity.com>
References: <49DB6A1F.50801@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
	<20090415185221.GB13696@amk-desktop.matrixgroup.net>
	<20090415192021.558E53A4119@sparrow.telecommunity.com>
	<94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com>
	<20090415210902.848443A4100@sparrow.telecommunity.com>
Message-ID: <20090416034602.12555.179034490.divmod.xquotient.8434@weber.divmod.com>

On 15 Apr, 09:11 pm, pje at telecommunity.com wrote:
>I think that there is some confusion here.  A "main" package or 
>buildout that assembles a larger project from components is not the 
>same thing as having a "base" package for a namespace package.

I'm certainly confused.

Twisted has its own system for "namespace" packages, and I'm not really 
sure where we fall in this discussion.  I haven't been able to follow 
the whole thread, but my original understanding was that the PEP 
supports "defining packages", which we now seem to be calling "base 
packages", just fine.  I don't understand the controversy over the 
counterproposal, since it seems roughly functionally equivalent to me.

I'd appreciate it if the PEP could also be extended to cover Twisted's very
similar mechanism for namespace packages,
"twisted.plugin.pluginPackagePaths".  I know this is not quite as widely
used as setuptools' namespace package support, but its existence points to
a need for standardization.

The PEP also seems a bit vague with regard to the treatment of other 
directories containing __init__.py and *.pkg files.  The concept of a 
"defining package" seems important to avoid conflicts like this one:

    http://twistedmatrix.com/trac/ticket/2339

More specifically I don't quite understand the PEP's intentions towards 
hierarchical packages.  It says that all of sys.path will be searched, 
but what about this case?

In Twisted, the suggested idiom to structure a project which wants to 
provide Twisted plugins is to have a directory structure like this:

  MyProject/
    myproject/
      __init__.py
    twisted/
      plugins/
        myproject_plugin.py

If you then put MyProject on PYTHONPATH, MyProject/twisted/plugins will 
be picked up automatically by the plugin machinery.  However, as 
"twisted" is *not* a "namespace" package in the same way, .py files in 
MyProject/twisted/ would not be picked up - this is very much 
intentional, since the "twisted" namespace is intended to be reserved 
for packages that we actually produce.  If either MyProject/twisted or 
MyProject/twisted/plugins/ had an __init__.py, then no modules in 
MyProject/twisted/plugins/ would be picked up, because it would be 
considered a conflicting package.
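
For illustration only, the rule described above can be sketched in a few lines of Python.  This is not Twisted's implementation of pluginPackagePaths; the helper name plugin_dirs and its parameters are hypothetical, and it only models the "no __init__.py at any level" condition for directories outside the defining package.

```python
import os


def plugin_dirs(search_path, package="twisted.plugins"):
    """Return directories on search_path that may contribute plugin modules.

    Hypothetical helper: a path entry qualifies only if the nested
    directory exists and no level of it contains an __init__.py.
    """
    parts = package.split(".")
    found = []
    for entry in search_path:
        candidate = entry
        conflicting = False
        for part in parts:
            candidate = os.path.join(candidate, part)
            # An __init__.py makes this a competing package, so skip the entry.
            if os.path.isfile(os.path.join(candidate, "__init__.py")):
                conflicting = True
                break
        if not conflicting and os.path.isdir(candidate):
            found.append(candidate)
    return found
```

Calling plugin_dirs(sys.path) would then list only the extra twisted/plugins/ directories; the defining package's own plugins directory is assumed to be found through its normal __path__.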

This is important so that users can choose not to load the system- 
installed Twisted's plugins when they have both a system-installed 
version of Twisted and a non-installed development version of Twisted 
found first on their PYTHONPATH, and switch between them to indicate 
which version they want to be the "base" or "defining" package for the 
twisted/plugins/ namespace.

Developers might also want to have a system-installed Twisted, but a 
non-installed development version of MyProject on PYTHONPATH.

I hope this all makes sense.  As I understand it, both setuptools and 
the proposed standard would either still have the bug described by 
ticket 2339 above, or would ignore twisted/plugins/ as a namespace 
package because its parent isn't a namespace package.  I apologize for 
not testing with current setuptools before asking, but I'm not sure my 
experiments would be valid given that my environment is set up with 
assumptions from Twisted's system.

P.S.: vendor packaging systems *ARE* a major use case for just about any 
aspect of Python's package structure.  I really liked MvL's coverage of 
"vendor packages", in the PEP, since this could equally well apply to 
MSIs, python libraries distributed in bundles on OS X, debs, or RPMs. 
If this use-case were to be ignored, as one particular fellow seems to 
be advocating, then the broken packages and user confusion that has been 
going on for the last 5 years or so is just going to get worse.

From jess.austin at gmail.com  Thu Apr 16 08:18:01 2009
From: jess.austin at gmail.com (Jess Austin)
Date: Thu, 16 Apr 2009 01:18:01 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
Message-ID: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>

hi,

I'm new to python core development, and I've been advised to write to
python-dev concerning a feature/patch I've placed at
http://bugs.python.org/issue5434, with Rietveld at
http://codereview.appspot.com/25079.

This patch adds a "monthdelta" class and a "monthmod" function to the
datetime module.  The monthdelta class is much like the existing
timedelta class, except that it represents months offset from a date,
rather than an exact period offset from a date.  This allows us to
easily say, e.g. "3 months from now" without worrying about the number
of days in the intervening months.

    >>> date(2008, 1, 30) + monthdelta(1)
    datetime.date(2008, 2, 29)
    >>> date(2008, 1, 30) + monthdelta(2)
    datetime.date(2008, 3, 30)

The monthmod function, named in (imperfect) analogy to divmod, allows
us to round-trip by returning the interim between two dates
represented as a (monthdelta, timedelta) tuple:

    >>> monthmod(date(2008, 1, 14), date(2009, 4, 2))
    (datetime.monthdelta(14), datetime.timedelta(19))

Invariant: dt + monthmod(dt, dt+td)[0] + monthmod(dt, dt+td)[1] == dt + td
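
As a rough pure-Python sketch of the semantics shown above (not the C code in the patch), month addition can clamp the day to the length of the target month, and monthmod can be built on top of it.  The function add_months is a hypothetical stand-in for "date + monthdelta(n)", and this monthmod returns a plain integer month count instead of a monthdelta object.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Shift d by whole months, clamping the day to the target month's length."""
    # Use a zero-based month index so divmod handles the year rollover.
    year, month0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))


def monthmod(start, end):
    """Return (whole months, leftover timedelta) from start to end (start <= end)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # Step back one month if the clamped shift would overshoot the end date.
    if add_months(start, months) > end:
        months -= 1
    return months, end - add_months(start, months)


months, rest = monthmod(date(2008, 1, 14), date(2009, 4, 2))
print(months, rest.days)  # 14 19
```

With these definitions, add_months(dt, months) + rest gives back the end date, which is the invariant stated above.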

These also work with datetimes!  There are more details in the
documentation included in the patch.  In addition to the C module
file, I've updated the datetime CAPI, the documentation, and tests.

I feel this would be a good addition to core python.  In my work, I've
often ended up writing annoying one-off "add-a-month" or similar
functions.  I think since months work differently than most other time
periods, a new object is justified rather than trying to shoe-horn
something like this into timedelta.  I also think that the round-trip
functionality provided by monthmod is important to ensure that
monthdeltas are "first-class" objects.

Please let me know what you think of the idea and/or its execution.

thanks,
Jess Austin

From tleeuwenburg at gmail.com  Thu Apr 16 10:06:41 2009
From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg)
Date: Thu, 16 Apr 2009 18:06:41 +1000
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
Message-ID: <43c8685c0904160106l28126ae1n8490712d794e9fe7@mail.gmail.com>

Hi Jess,

I'm sorry if I'm failing to understand the use of this function from not
looking closely at your code. I'm a bit dubious about the usefulness of this
(I'm not sure I understand the use cases), but I'm very open to being
convinced. Datetime semantics are very important in some areas -- I use them
a lot.

I'm not convinced the semantics of monthdelta are obvious.

A month doesn't have a consistent length -- it could be 28, 29, 30 or 31
days.

What happens when you ask for the date in "1 month's" time on the 31st Jan?
What date is a month after the 31st Jan?

Do you have a good spec (er, I mean PEP) for this describing what happens in
the edge cases and what is meant by a monthdelta? The bug notes say it
"deals sensibly" with these issues, but that's really not enough to
understand what the function is likely to do. At the very least, a few
well-chosen examples would help to illustrate the functionality much more
clearly.

Cheers,
-Tennessee

On Thu, Apr 16, 2009 at 4:18 PM, Jess Austin <jess.austin at gmail.com> wrote:

> hi,
>
> I'm new to python core development, and I've been advised to write to
> python-dev concerning a feature/patch I've placed at
> http://bugs.python.org/issue5434, with Rietveld at
> http://codereview.appspot.com/25079.
>
> This patch adds a "monthdelta" class and a "monthmod" function to the
> datetime module.  The monthdelta class is much like the existing
> timedelta class, except that it represents months offset from a date,
> rather than an exact period offset from a date.  This allows us to
> easily say, e.g. "3 months from now" without worrying about the number
> of days in the intervening months.
>
>    >>> date(2008, 1, 30) + monthdelta(1)
>    datetime.date(2008, 2, 29)
>    >>> date(2008, 1, 30) + monthdelta(2)
>    datetime.date(2008, 3, 30)
>
> The monthmod function, named in (imperfect) analogy to divmod, allows
> us to round-trip by returning the interim between two dates
> represented as a (monthdelta, timedelta) tuple:
>
>    >>> monthmod(date(2008, 1, 14), date(2009, 4, 2))
>    (datetime.monthdelta(14), datetime.timedelta(19))
>
> Invariant: dt + monthmod(dt, dt+td)[0] + monthmod(dt, dt+td)[1] == dt + td
>
> These also work with datetimes!  There are more details in the
> documentation included in the patch.  In addition to the C module
> file, I've updated the datetime CAPI, the documentation, and tests.
>
> I feel this would be a good addition to core python.  In my work, I've
> often ended up writing annoying one-off "add-a-month" or similar
> functions.  I think since months work differently than most other time
> periods, a new object is justified rather than trying to shoe-horn
> something like this into timedelta.  I also think that the round-trip
> functionality provided by monthmod is important to ensure that
> monthdeltas are "first-class" objects.
>
> Please let me know what you think of the idea and/or its execution.
>
> thanks,
> Jess Austin

-- 
--------------------------------------------------
Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe everything you think"

From phd at phd.pp.ru  Thu Apr 16 10:10:36 2009
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Thu, 16 Apr 2009 12:10:36 +0400
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
Message-ID: <20090416081036.GA25435@phd.pp.ru>

On Thu, Apr 16, 2009 at 01:18:01AM -0500, Jess Austin wrote:
> I'm new to python core development, and I've been advised to write to
> python-dev concerning a feature/patch I've placed at
> http://bugs.python.org/issue5434, with Rietveld at
> http://codereview.appspot.com/25079.

   I have read the python code and it looks good. I often have a need to do
month-based calculations.

> This patch adds a "monthdelta" class and a "monthmod" function to the
> datetime module.  The monthdelta class is much like the existing
> timedelta class, except that it represents months offset from a date,
> rather than an exact period offset from a date.

   I'd rather see the code merged with timedelta: timedelta(months=n).

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From skip at pobox.com  Thu Apr 16 10:45:24 2009
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 16 Apr 2009 03:45:24 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
Message-ID: <18918.61476.980951.991275@montanaro.dyndns.org>

    >>> date(2008, 1, 30) + monthdelta(1)
    datetime.date(2008, 2, 29)

What would this loop print?

    for d in range(1, 32):
        print date(2008, 1, d) + monthdelta(1)

I have this funny feeling that arithmetic using monthdelta wouldn't always
be intuitive.

Skip

From lists at jwp.name  Thu Apr 16 11:44:14 2009
From: lists at jwp.name (James Pye)
Date: Thu, 16 Apr 2009 02:44:14 -0700
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <20090416081036.GA25435@phd.pp.ru>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<20090416081036.GA25435@phd.pp.ru>
Message-ID: <D0605735-EB94-4ED6-B50B-40C5141E0FA2@jwp.name>

On Apr 16, 2009, at 1:10 AM, Oleg Broytmann wrote:
>> This patch adds a "monthdelta" class and a "monthmod" function to the
>> datetime module.  The monthdelta class is much like the existing
>> timedelta class, except that it represents months offset from a date,
>> rather than an exact period offset from a date.
>
>   I'd rather see the code merged with timedelta: timedelta(months=n).

+1

From amauryfa at gmail.com  Thu Apr 16 11:54:13 2009
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Thu, 16 Apr 2009 11:54:13 +0200
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <18918.61476.980951.991275@montanaro.dyndns.org>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
Message-ID: <e27efe130904160254j3a1cc989x629bcdeb2fb259c@mail.gmail.com>

On Thu, Apr 16, 2009 at 10:45,  <skip at pobox.com> wrote:
>    >>> date(2008, 1, 30) + monthdelta(1)
>    datetime.date(2008, 2, 29)
>
> What would this loop print?
>
>    for d in range(1, 32):
>        print date(2008, 1, d) + monthdelta(1)
>
> I have this funny feeling that arithmetic using monthdelta wouldn't always
> be intuitive.

FWIW, the Oracle database has two methods for adding months:
1- the add_months() function
    add_months(to_date('31-jan-2005'), 1)
2- the ANSI interval:
    to_date('31-jan-2005') + interval '1' month

"add_months" is calendar sensitive, "interval" is not.
"interval" raises an exception if the day is not valid for the target
month (which is the case in my example)

"add_months" is similar to the proposed monthdelta(),
except that it has a special case for the last day of the month:
"""
If date is the last day of the month or if the resulting month has
fewer days than the day
component of date, then the result is the last day of the resulting month.
Otherwise, the result has the same day component as date.
"""
indeed:
    add_months(to_date('28-feb-2005'), 1) == to_date('31-mar-2005')
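
To make the quoted rule concrete, here is a small Python sketch of it.  It follows only the documentation text above, not Oracle's actual implementation, and the function name is made up for the example.

```python
import calendar
from datetime import date


def add_months_sticky_last_day(d, months):
    """Add months; the last day of a month always maps to a last day."""
    # Use a zero-based month index so divmod handles the year rollover.
    year, month0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
    month = month0 + 1
    last_in_source = calendar.monthrange(d.year, d.month)[1]
    last_in_target = calendar.monthrange(year, month)[1]
    if d.day == last_in_source or d.day > last_in_target:
        # Last day of the source month, or a day the target month lacks.
        day = last_in_target
    else:
        day = d.day
    return date(year, month, day)


print(add_months_sticky_last_day(date(2005, 2, 28), 1))  # 2005-03-31
print(add_months_sticky_last_day(date(2005, 1, 15), 1))  # 2005-02-15
```

The only difference from plain clamping is the first branch of the condition: 28 February 2005 moves to 31 March, not 28 March.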

In my opinion:
arithmetic with months is a mess. There is no such "month interval" or
"year interval" with a precise definition.
If we adopt some kind of month manipulation, it should be a function
or a method, like you would do for features like last_day_of_month(d),
or following_weekday(d, 'monday').

    date(2008, 1, 30).add_months(1) == date(2008, 2, 29)

-- 
Amaury Forgeot d'Arc

From dirkjan at ochtman.nl  Thu Apr 16 12:16:15 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Thu, 16 Apr 2009 12:16:15 +0200
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <e27efe130904160254j3a1cc989x629bcdeb2fb259c@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
	<e27efe130904160254j3a1cc989x629bcdeb2fb259c@mail.gmail.com>
Message-ID: <ea2499da0904160316x2b5206f7n7a1097797ca883ac@mail.gmail.com>

On Thu, Apr 16, 2009 at 11:54, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote:
> In my opinion:
> arithmetic with months is a mess. There is no such "month interval" or
> "year interval" with a precise definition.
> If we adopt some kind of month manipulation, it should be a function
> or a method, like you would do for features like last_day_of_month(d),
> or following_weekday(d, 'monday').
>
>    date(2008, 1, 30).add_months(1) == date(2008, 2, 29)

I concur. Trying to shoehorn month arithmetic into timedelta is a
PITA, precisely because it's somewhat inexact. It's better to have
a separate function with well-defined behavior in edge cases.

Cheers,

Dirkjan

From jon+python-dev at unequivocal.co.uk  Thu Apr 16 12:47:26 2009
From: jon+python-dev at unequivocal.co.uk (Jon Ribbens)
Date: Thu, 16 Apr 2009 11:47:26 +0100
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <20090416081036.GA25435@phd.pp.ru>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<20090416081036.GA25435@phd.pp.ru>
Message-ID: <20090416104726.GS24050@snowy.squish.net>

On Thu, Apr 16, 2009 at 12:10:36PM +0400, Oleg Broytmann wrote:
> > This patch adds a "monthdelta" class and a "monthmod" function to the
> > datetime module.  The monthdelta class is much like the existing
> > timedelta class, except that it represents months offset from a date,
> > rather than an exact period offset from a date.
> 
>    I'd rather see the code merged with timedelta: timedelta(months=n).

Unfortunately, that's simply impossible. A timedelta is a fixed number
of seconds, and the time between one month and the next varies.

I am very much in favour of there being the ability to add months to
dates though. Obviously there is the question of what to do when you
move forward 1 month from the 31st January; in my opinion an optional
argument to specify different behaviours would be nice.
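
A sketch of what such an optional argument might look like follows.  The parameter name "overflow" and its three policy names are invented for this example and are not part of the patch.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months, overflow="clamp"):
    """Add months to d; overflow picks the policy for days the target month lacks."""
    # Use a zero-based month index so divmod handles the year rollover.
    year, month0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    if d.day <= last_day:
        return date(year, month, d.day)
    if overflow == "clamp":
        # Fall back to the last day of the target month.
        return date(year, month, last_day)
    if overflow == "spill":
        # Carry the excess days over into the following month.
        return date(year, month, last_day) + timedelta(days=d.day - last_day)
    if overflow == "raise":
        raise ValueError("day %d does not exist in %d-%02d" % (d.day, year, month))
    raise ValueError("unknown overflow policy: %r" % (overflow,))


print(add_months(date(2009, 1, 31), 1))                    # 2009-02-28
print(add_months(date(2009, 1, 31), 1, overflow="spill"))  # 2009-03-03
```

The "raise" policy corresponds to the behaviour described for the ANSI interval earlier in the thread, and "clamp" to the proposed monthdelta.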

From ronaldoussoren at mac.com  Thu Apr 16 14:35:25 2009
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Thu, 16 Apr 2009 14:35:25 +0200
Subject: [Python-Dev] RELEASED Python 2.6.2
In-Reply-To: <rowen-EDC16C.13470815042009@news.gmane.org>
References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org>
	<rowen-EDC16C.13470815042009@news.gmane.org>
Message-ID: <FE67E6CC-039C-4524-968F-BB2D1FD42AA3@mac.com>

On 15 Apr, 2009, at 22:47, Russell E. Owen wrote:

> Thank you for 2.6.2.
>
> I see the Mac binary installer isn't out yet (at least it is not  
> listed
> on the downloads page). Any chance that it will be compatible with 3rd
> party Tcl/Tk?

The Mac installer is late because I missed the pre-announcement of the  
2.6.2 tag. I sent the installer to Barry earlier today.

The installer was built using a 3rd-party installation of Tcl/Tk.

Ronald

From p.f.moore at gmail.com  Thu Apr 16 16:54:08 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 16 Apr 2009 15:54:08 +0100
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
Message-ID: <79990c6b0904160754r3761518an967fc543a76767d5@mail.gmail.com>

2009/4/16 Jess Austin <jess.austin at gmail.com>:
> I'm new to python core development, and I've been advised to write to
> python-dev concerning a feature/patch I've placed at
> http://bugs.python.org/issue5434, with Rietveld at
> http://codereview.appspot.com/25079.
>
> This patch adds a "monthdelta" class and a "monthmod" function to the
> datetime module.  The monthdelta class is much like the existing
> timedelta class, except that it represents months offset from a date,
> rather than an exact period offset from a date.  This allows us to
> easily say, e.g. "3 months from now" without worrying about the number
> of days in the intervening months.
>
>    >>> date(2008, 1, 30) + monthdelta(1)
>    datetime.date(2008, 2, 29)
>    >>> date(2008, 1, 30) + monthdelta(2)
>    datetime.date(2008, 3, 30)
>
> The monthmod function, named in (imperfect) analogy to divmod, allows
> us to round-trip by returning the interim between two dates
> represented as a (monthdelta, timedelta) tuple:
>
>    >>> monthmod(date(2008, 1, 14), date(2009, 4, 2))
>    (datetime.monthdelta(14), datetime.timedelta(19))
>
> Invariant: dt + monthmod(dt, dt+td)[0] + monthmod(dt, dt+td)[1] == dt + td

I like the idea in principle. In practice, of course, month
calculations are inherently ill-defined, so you need to be very
specific in documenting all of the edge cases, and you should have
strong use cases to ensure that the behaviour implemented matches user
requirements. (I haven't yet had time to read the patch - you may well
already have these points covered, certainly your comments above
indicate that you appreciate the subtleties involved).

> These also work with datetimes!  There are more details in the
> documentation included in the patch.  In addition to the C module
> file, I've updated the datetime CAPI, the documentation, and tests.
>
> I feel this would be a good addition to core python.  In my work, I've
> often ended up writing annoying one-off "add-a-month" or similar
> functions.  I think since months work differently than most other time
> periods, a new object is justified rather than trying to shoe-horn
> something like this into timedelta.  I also think that the round-trip
> functionality provided by monthmod is important to ensure that
> monthdeltas are "first-class" objects.

I agree that ultimately it would be useful in the core. However, I'd
suggest that you release the functionality as an independent module in
the first instance, to establish it outside of the core. Once it has
matured somewhat as a 3rd party module, it would then be ready for
integration in the core. This also has the benefit that it makes the
functionality available to users of Python 2.6 (and possibly earlier)
rather than just in 2.7/3.1 onwards.

> Please let me know what you think of the idea and/or its execution.

I hope the above comments help. Ultimately, I'd like to see this added
to the core. It's tricky enough that having a "standard"
implementation is a definite benefit in itself. But equally, I'd give
it time to iron out the corner cases on a faster development cycle
than the core offers before "freezing" it as part of the stdlib.

Paul.

From p.f.moore at gmail.com  Thu Apr 16 16:56:40 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 16 Apr 2009 15:56:40 +0100
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <18918.61476.980951.991275@montanaro.dyndns.org>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
Message-ID: <79990c6b0904160756k761cdbagf085fa4966176f70@mail.gmail.com>

2009/4/16  <skip at pobox.com>:
>    >>> date(2008, 1, 30) + monthdelta(1)
>    datetime.date(2008, 2, 29)
>
> What would this loop print?
>
>    for d in range(1, 32):
>        print date(2008, 1, d) + monthdelta(1)
>
> I have this funny feeling that arithmetic using monthdelta wouldn't always
> be intuitive.

Oh, certainly! But in the absence of "intuitive", I've found in the
past that "standardised" is often better than nothing :-) (For
example, I use Oracle's add_months function fairly often - it's not
perfect, and not always intuitive, but at least it's well-defined in
the corner cases, and fine for "normal" use).

Paul.

From solipsis at pitrou.net  Thu Apr 16 17:12:26 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 16 Apr 2009 15:12:26 +0000 (UTC)
Subject: [Python-Dev] Issue5434: datetime.monthdelta
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
	<79990c6b0904160756k761cdbagf085fa4966176f70@mail.gmail.com>
Message-ID: <loom.20090416T151006-190@post.gmane.org>

Paul Moore <p.f.moore <at> gmail.com> writes:
> 
> Oh, certainly! But in the absence of "intuitive", I've found in the
> past that "standardised" is often better than nothing :-) (For
> example, I use Oracle's add_months function fairly often - it's not
> perfect, and not always intuitive, but at least it's well-defined in
> the corner cases, and fine for "normal" use).

I think something like "date.add_months()" would be better than the proposed
monthdelta. The monthdelta proposal suggests that addition is something
well-defined and rigorous, which is not really the case here (for example, if
you add a monthdelta and then subtract it again, I'm not sure you always get
back the original date).
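
A quick way to check the round-trip concern is sketched below, assuming the clamp-to-end-of-month rule from the examples earlier in the thread.  The helper names are hypothetical and this is not the code from the patch.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift d by whole months, clamping the day to the target month's length."""
    # Use a zero-based month index so divmod handles the year rollover.
    year, month0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def round_trip_failures(year, month):
    """List the dates in a month that do not survive adding then subtracting a month."""
    failures = []
    for day in range(1, calendar.monthrange(year, month)[1] + 1):
        start = date(year, month, day)
        if add_months(add_months(start, 1), -1) != start:
            failures.append(start)
    return failures


print(round_trip_failures(2008, 1))
# [datetime.date(2008, 1, 30), datetime.date(2008, 1, 31)]
```

Under that rule the round trip fails only for start days that the following month does not have.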

Regards

Antoine.

From pje at telecommunity.com  Thu Apr 16 17:36:18 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Thu, 16 Apr 2009 11:36:18 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090416034602.12555.179034490.divmod.xquotient.8434@weber.divmod.com>
References: <49DB6A1F.50801@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
	<20090415185221.GB13696@amk-desktop.matrixgroup.net>
	<20090415192021.558E53A4119@sparrow.telecommunity.com>
	<94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com>
	<20090415210902.848443A4100@sparrow.telecommunity.com>
	<20090416034602.12555.179034490.divmod.xquotient.8434@weber.divmod.com>
Message-ID: <20090416153350.702303A4100@sparrow.telecommunity.com>

At 03:46 AM 4/16/2009 +0000, glyph at divmod.com wrote:

>On 15 Apr, 09:11 pm, pje at telecommunity.com wrote:
>>I think that there is some confusion here.  A "main" package or 
>>buildout that assembles a larger project from components is not the 
>>same thing as having a "base" package for a namespace package.
>
>I'm certainly confused.
>
>Twisted has its own system for "namespace" packages, and I'm not 
>really sure where we fall in this discussion.  I haven't been able 
>to follow the whole thread, but my original understanding was that 
>the PEP supports "defining packages", which we now seem to be 
>calling "base packages", just fine.

Yes, it does.  The discussion since the original proposal, however, 
has been dominated by MAL's counterproposal, which *requires* a 
defining package.

There is a slight distinction between "base package" and "defining 
package", although I suppose I've been using them a bit interchangeably.

Base package describes a use case: you have a base package which is 
extended in the same namespace.  In that use case, you may want to 
place your base package in the defining package.

In contrast, setuptools does not support a defining package, so if 
you have a base package, you must place it in a submodule or 
subpackage of the namespace.

Does that all make sense now?

MAL's proposal requires a defining package, which is 
counterproductive if you have a pure package with no base, since it 
now requires you to create an additional project on PyPI just to hold 
your defining package.

>I'd appreciate it if the PEP could also be extended cover Twisted's 
>very similar mechanism for namespace packages, 
>"twisted.plugin.pluginPackagePaths".  I know this is not quite as 
>widely used as setuptools' namespace package support, but its 
>existence belies a need for standardization.
>
>The PEP also seems a bit vague with regard to the treatment of other 
>directories containing __init__.py and *.pkg files.

Do you have a clarification to suggest?  My understanding (probably a 
projection) is that to be a nested namespace package, you have to 
have a parent namespace package.

>   The concept of a "defining package" seems important to avoid 
> conflicts like this one:
>
>    http://twistedmatrix.com/trac/ticket/2339
>
>More specifically I don't quite understand the PEP's intentions 
>towards hierarchical packages.  It says that all of sys.path will be 
>searched, but what about this case?
>
>In Twisted, the suggested idiom to structure a project which wants 
>to provide Twisted plugins is to have a directory structure like this:
>
>  MyProject/
>    myproject/
>      __init__.py
>    twisted/
>      plugins/
>        myproject_plugin.py
>
>If you then put MyProject on PYTHONPATH, MyProject/twisted/plugins 
>will be picked up automatically by the plugin machinery.

Namespaces are not plugins and vice versa.  The purpose of a 
namespace package is to allow projects managed by the same entity to 
share a namespace (ala Java "package" names) and avoid naming 
conflicts with other authors.

A plugin system, by contrast, is explicitly intended for use by 
multiple authors, so the use case is rather different...  and using 
namespace packages for plugins actually *increases* the possibility 
of naming conflicts, unless you add back in another level of 
hierarchy.  (As apparently you are recommending via "myproject_plugin".)

>   However, as "twisted" is *not* a "namespace" package in the same 
> way, .py files in MyProject/twisted/ would not be picked up - this 
> is very much intentional, since the "twisted" namespace is intended 
> to be reserved for packages that we actually produce.  If either 
> MyProject/twisted or MyProject/twisted/plugins/ had an __init__.py, 
> then no modules in MyProject/twisted/plugins/ would be picked up, 
> because it would be considered a conflicting package.

Precisely.  Note, however, that neither is twisted.plugins a 
namespace package, and it should not contain any .pkg files.  I don't 
think it's reasonable to abuse PEP 382 namespace packages as a plugin 
system.  In setuptools' case, a different mechanism is provided for 
locating plugin code, and of course Twisted already has its own 
system for the same thing.  It would be nice to have a standardized 
way of locating plugins in the stdlib, but that will need to be a 
different PEP.

>I hope this all makes sense.  As I understand it, both setuptools 
>and the proposed standard would either still have the bug described 
>by ticket 2339 above, or would ignore twisted/plugins/ as a 
>namespace package because its parent isn't a namespace package.

If twisted/ lacked an __init__.py, then setuptools would ignore 
it.  Under PEP 382, the same, unless it had .pkg files.  (Again, 
setuptools explicitly does not support using namespace packages as a 
plugin mechanism.)

>P.S.: vendor packaging systems *ARE* a major use case for just about 
>any aspect of Python's package structure.  I really liked MvL's 
>coverage of "vendor packages", in the PEP, since this could equally 
>well apply to MSIs, python libraries distributed in bundles on OS X, 
>debs, or RPMs. If this use-case were to be ignored, as one 
>particular fellow seems to be advocating, then the broken packages 
>and user confusion that has been going on for the last 5 years or so 
>is just going to get worse.

Indeed. 

From nad at acm.org  Thu Apr 16 18:47:37 2009
From: nad at acm.org (Ned Deily)
Date: Thu, 16 Apr 2009 09:47:37 -0700
Subject: [Python-Dev] Issue5434: datetime.monthdelta
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
Message-ID: <nad-31D99A.09473716042009@news.gmane.org>

In article 
<b8ad139e0904152318p5473cbe5yb5f55a19894cc834 at mail.gmail.com>,
 Jess Austin <jess.austin at gmail.com> wrote:
> I'm new to python core development, and I've been advised to write to
> python-dev concerning a feature/patch I've placed at
> http://bugs.python.org/issue5434, with Rietveld at
> http://codereview.appspot.com/25079.

Without having looked at the code, I wonder whether you've looked at 
python-dateutil.   I believe its relativedelta type does what you 
propose, plus much more, and it has the advantage of being widely used 
and tested.

<http://labix.org/python-dateutil>

-- 
 Ned Deily,
 nad at acm.org

From jess.austin at gmail.com  Thu Apr 16 20:31:04 2009
From: jess.austin at gmail.com (Jess Austin)
Date: Thu, 16 Apr 2009 13:31:04 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <18918.61476.980951.991275@montanaro.dyndns.org>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
Message-ID: <b8ad139e0904161131g2f2b84fbpd67952697952afa9@mail.gmail.com>

On Thu, Apr 16, 2009 at 3:45 AM,  <skip at pobox.com> wrote:
>    >>> date(2008, 1, 30) + monthdelta(1)
>    datetime.date(2008, 2, 29)
>
> What would this loop print?
>
>    for d in range(1, 32):
>        print date(2008, 1, d) + monthdelta(1)

>>> for d in range(1, 32):
...     print(date(2008, 1, d) + monthdelta(1))
...
2008-02-01
2008-02-02
2008-02-03
2008-02-04
2008-02-05
2008-02-06
2008-02-07
2008-02-08
2008-02-09
2008-02-10
2008-02-11
2008-02-12
2008-02-13
2008-02-14
2008-02-15
2008-02-16
2008-02-17
2008-02-18
2008-02-19
2008-02-20
2008-02-21
2008-02-22
2008-02-23
2008-02-24
2008-02-25
2008-02-26
2008-02-27
2008-02-28
2008-02-29
2008-02-29
2008-02-29

> I have this funny feeling that arithmetic using monthdelta wouldn't always
> be intuitive.

I think that's true, especially since these calculations are not
necessarily invertible:

>>> date(2008, 1, 30) + monthdelta(1)
datetime.date(2008, 2, 29)
>>> date(2008, 2, 29) - monthdelta(1)
datetime.date(2008, 1, 29)

It could be that non-intuitivity is inherent in the problem of dealing
with dates and months.  I've aimed for a good compromise between the
needs of the problem and the pythonic example of timedelta.  I would
submit that timedelta itself isn't intuitive at first blush,
especially if one was weaned on the arcana of RDBMS date functions,
but after one uses timedelta for just a bit it makes total sense.  I
hope the same may be said of monthdelta.
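
For anyone who wants to try the clamping rule without applying the
patch, here is a rough stdlib-only sketch.  add_months is a
hypothetical helper standing in for "date + monthdelta(n)"; it is not
code from the patch, only an approximation of the behaviour described
above.

```python
import calendar
from datetime import date

def add_months(d, n):
    """Hypothetical stand-in for d + monthdelta(n).

    Shifts d by n months and clamps the day to the last valid day of
    the target month.
    """
    # Count months from year 0 so that year rollover falls out of divmod.
    y, m = divmod(d.year * 12 + (d.month - 1) + n, 12)
    last_day = calendar.monthrange(y, m + 1)[1]
    return d.replace(year=y, month=m + 1, day=min(d.day, last_day))

start = date(2008, 1, 30)
forward = add_months(start, 1)    # clamped to 2008-02-29
back = add_months(forward, -1)    # 2008-01-29, not the original 30th
print(forward, back)              # 2008-02-29 2008-01-29
```

The round trip loses a day only because the forward step was clamped;
for days 1 through 28 the operation is invertible.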

cheers,
Jess

From p.f.moore at gmail.com  Thu Apr 16 20:31:36 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 16 Apr 2009 19:31:36 +0100
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <loom.20090416T151006-190@post.gmane.org>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
	<79990c6b0904160756k761cdbagf085fa4966176f70@mail.gmail.com>
	<loom.20090416T151006-190@post.gmane.org>
Message-ID: <79990c6b0904161131o5b85555k9ba488c1de87f06b@mail.gmail.com>

2009/4/16 Antoine Pitrou <solipsis at pitrou.net>:
> Paul Moore <p.f.moore <at> gmail.com> writes:
>>
>> Oh, certainly! But in the absence of "intuitive", I've found in the
>> past that "standardised" is often better than nothing (For
>> example, I use Oracle's add_months function fairly often - it's not
>> perfect, and not always intuitive, but at least it's well-defined in
>> the corner cases, and fine for "normal" use).
>
> I think something like "date.add_months()" would be better than the proposed
> monthdelta. The monthdelta proposal suggests that addition is something
> well-defined and rigorous, which is not really the case here (for example, if
> you add a monthdelta and then subtract it again, I'm not sure you always get
> back the original date).

I didn't particularly get that impression, but I understand what
you're saying. Personally, I don't think it matters much one way or
the other.

But as well as monthdelta, the proposal included monthmod. I'm not
entirely happy with the name, but I like the idea - and particularly
the invariant dt + monthmod(dt, dt+td)[0] + monthmod(dt, dt+td)[1] ==
dt + td. For me, that makes it a lot easier to reason about month
increments.
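
To make the invariant concrete, here is a minimal sketch of how a
monthmod-like function could be written with the stdlib alone.  It is
an assumption-laden approximation, not the patch's implementation:
add_months is a hypothetical stand-in for adding a monthdelta, the
month count is returned as a plain int rather than a monthdelta, and
only dates (not datetimes) are handled.

```python
import calendar
from datetime import date, timedelta

def add_months(d, n):
    """Hypothetical stand-in for d + monthdelta(n), clamping the day."""
    y, m = divmod(d.year * 12 + (d.month - 1) + n, 12)
    return d.replace(year=y, month=m + 1,
                     day=min(d.day, calendar.monthrange(y, m + 1)[1]))

def monthmod(start, end):
    """Return (months, remainder) such that
    add_months(start, months) + remainder == end.

    months is an int here; remainder is a non-negative timedelta.
    """
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if add_months(start, months) > end:
        months -= 1    # overshot within end's month; step back one month
    return months, end - add_months(start, months)

months, rest = monthmod(date(2008, 1, 31), date(2008, 3, 1))
print(months, rest.days)    # 1 1
```

Here 2008-01-31 plus one month clamps to 2008-02-29, and one more day
reaches 2008-03-01, so the two parts add back up to the end date.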

One thing I have certainly needed in the past is a robust way of
converting a difference between two dates into "natural language" - 3
years, 2 months, 1 week and 5 days (or whatever). For that type of
application, monthmod would have been invaluable.

In my view, monthdelta seems a lot more natural alongside monthmod,
than an add_months method would. And as monthmod is a function of two
dates, it can't really be a method (OK, I know, something horrid like
date1.monthdiff(date2) is possible, but honestly, I don't see that as
reasonable).

But this type of API design discussion does emphasise why I think the
module should be a 3rd party package for a while before going into the
stdlib.

Paul.

From p.f.moore at gmail.com  Thu Apr 16 20:42:00 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 16 Apr 2009 19:42:00 +0100
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <nad-31D99A.09473716042009@news.gmane.org>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<nad-31D99A.09473716042009@news.gmane.org>
Message-ID: <79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com>

2009/4/16 Ned Deily <nad at acm.org>:
> In article
> <b8ad139e0904152318p5473cbe5yb5f55a19894cc834 at mail.gmail.com>,
>  Jess Austin <jess.austin at gmail.com> wrote:
>> I'm new to python core development, and I've been advised to write to
>> python-dev concerning a feature/patch I've placed at
>> http://bugs.python.org/issue5434, with Rietveld at
>> http://codereview.appspot.com/25079.
>
> Without having looked at the code, I wonder whether you've looked at
> python-dateutil.   I believe its relativedelta type does what you
> propose, plus much more, and it has the advantage of being widely used
> and tested.

The key thing missing (I believe) from dateutil is any equivalent of monthmod.

Hmm, it might be possible via relativedelta(d1,d2), but it's not clear
to me from the documentation precisely what attributes/methods of a
relativedelta object are valid for getting data *out* of it.

I do agree, though, that any proposal to extend the Python datetime
module should at least be informed by the design of the dateutil
module.

Paul.

From jess.austin at gmail.com  Thu Apr 16 20:47:42 2009
From: jess.austin at gmail.com (Jess Austin)
Date: Thu, 16 Apr 2009 13:47:42 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <e27efe130904160254j3a1cc989x629bcdeb2fb259c@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
	<e27efe130904160254j3a1cc989x629bcdeb2fb259c@mail.gmail.com>
Message-ID: <b8ad139e0904161147k67a687dcnfee1d8e87e80bba1@mail.gmail.com>

On Thu, Apr 16, 2009 at 4:54 AM, Amaury Forgeot d'Arc
<amauryfa at gmail.com> wrote:
> FWIW, the Oracle database has two methods for adding months:
> 1- the add_months() function
>    add_months(to_date('31-jan-2005'), 1)
> 2- the ANSI interval:
>    to_date('31-jan-2005') + interval '1' month
>
> "add_months" is calendar sensitive, "interval" is not.
> "interval" raises an exception if the day is not valid for the target
> month (which is the case in my example)
>
> "add_months" is similar to the proposed monthdelta(),
> except that it has a special case for the last day of the month:
> """
> If date is the last day of the month or if the resulting month has
> fewer days than the day
> component of date, then the result is the last day of the resulting month.
> Otherwise, the result has the same day component as date.
> """
> indeed:
>    add_months(to_date('28-feb-2005'), 1) == to_date('31-mar-2005')

My proposal has the "calendar sensitive" semantics you describe.  It
will not raise an exception in this case.

> In my opinion:
> arithmetic with months is a mess. There is no such "month interval" or
> "year interval" with a precise definition.
> If we adopt some kind of month manipulation, it should be a function
> or a method, like you would do for features like last_day_of_month(d),
> or following_weekday(d, 'monday').
>
>    date(2008, 1, 30).add_months(1) == date(2008, 2, 29)

I disagree with this point, in that I really like the pythonic date
calculations we have with timedelta.  It is easier to reason about
adding and subtracting objects than it is to reason about method
invocations.  Also, you can store a monthdelta in a variable, which is
sometimes convenient, and which is difficult to emulate with function
calls.

Except in certain particular cases, I'm not fond of last_day_of_month,
following_weekday, etc. functions.  Much in the way that timezone
considerations have been factored out of the core through the use of
tzinfo, I think these problems are more effectively addressed at the
level of detail one finds at the application level.  On the other
hand, it seems like effective month calculations could be useful in
the core.

cheers,
Jess

From jess.austin at gmail.com  Thu Apr 16 20:50:29 2009
From: jess.austin at gmail.com (Jess Austin)
Date: Thu, 16 Apr 2009 13:50:29 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <ea2499da0904160316x2b5206f7n7a1097797ca883ac@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
	<e27efe130904160254j3a1cc989x629bcdeb2fb259c@mail.gmail.com>
	<ea2499da0904160316x2b5206f7n7a1097797ca883ac@mail.gmail.com>
Message-ID: <b8ad139e0904161150q737a98e8ja24f3af76a425af1@mail.gmail.com>

On Thu, Apr 16, 2009 at 5:16 AM, Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:
> On Thu, Apr 16, 2009 at 11:54, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote:
>> In my opinion:
>> arithmetic with months is a mess. There is no such "month interval" or
>> "year interval" with a precise definition.
>> If we adopt some kind of month manipulation, it should be a function
>> or a method, like you would do for features like last_day_of_month(d),
>> or following_weekday(d, 'monday').
>>
>>    date(2008, 1, 30).add_months(1) == date(2008, 2, 29)
>
> I concur. Trying to shoehorn month arithmetic into timedelta is a
> PITA, precisely because it's somewhat inexact. It's better to have
> some separate behavior that has well-defined behavior in edge cases.

This is my experience also, and including a distinct and well-defined
behavior in the core is exactly my intention with this patch.

From jess.austin at gmail.com  Thu Apr 16 21:28:07 2009
From: jess.austin at gmail.com (Jess Austin)
Date: Thu, 16 Apr 2009 14:28:07 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <79990c6b0904160754r3761518an967fc543a76767d5@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<79990c6b0904160754r3761518an967fc543a76767d5@mail.gmail.com>
Message-ID: <b8ad139e0904161228oa9a884fk52409a603e1cf380@mail.gmail.com>

Thanks for everyone's comments!

On Thu, Apr 16, 2009 at 9:54 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> I like the idea in principle. In practice, of course, month
> calculations are inherently ill-defined, so you need to be very
> specific in documenting all of the edge cases, and you should have
> strong use cases to ensure that the behaviour implemented matches user
> requirements. (I haven't yet had time to read the patch - you may well
> already have these points covered, certainly your comments above
> indicate that you appreciate the subtleties involved).
>
> I agree that ultimately it would be useful in the core. However, I'd
> suggest that you release the functionality as an independent module in
> the first instance, to establish it outside of the core. Once it has
> matured somewhat as a 3rd party module, it would then be ready for
> integration in the core. This also has the benefit that it makes the
> functionality available to users of Python 2.6 (and possibly earlier)
> rather than just in 2.7/3.1 onwards.

I have uploaded a python-coded version of this functionality to the
bug page.  I should backport it through 2.3 and post that to pypi, but
I haven't done that yet.  The current effort was focused on the C
module since that's how the rest of datetime is implemented, and also
I wanted to learn a bit about CPython internals.  To the latter point,
I would _really_ appreciate it if someone could leave a few comments
on Rietveld.

>> Please let me know what you think of the idea and/or its execution.
>
> I hope the above comments help. Ultimately, I'd like to see this added
> to the core. It's tricky enough that having a "standard"
> implementation is a definite benefit in itself. But equally, I'd give
> it time to iron out the corner cases on a faster development cycle
> than the core offers before "freezing" it as part of the stdlib.

I understand these concerns.  I think I was too brief in my initial
message.  Here are the docstrings:

>>> print(monthdelta.__doc__)
Months offset from a date or datetime.

monthdeltas allow date calculation without regard to the different lengths
of different months. A monthdelta value added to a date produces another
date that has the same day-of-the-month, regardless of the lengths of the
intervening months. If the resulting date is in too short a month, the
last day in that month will result:

date(2008,1,30) + monthdelta(1) -> date(2008,2,29)

monthdeltas may be added, subtracted, multiplied, and floor-divided
similarly to timedeltas. They may not be added to timedeltas directly, as
both classes are intended to be used directly with dates and datetimes.
Only ints may be passed to the constructor, the default argument of which
is 1 (one). monthdeltas are immutable.

NOTE: in calculations involving the 29th, 30th, and 31st days of the
month, monthdeltas are not necessarily invertible [i.e., the result above
would NOT imply that date(2008,2,29) - monthdelta(1) -> date(2008,1,30)].

>>> print(monthmod.__doc__)
monthmod(start, end) -> (monthdelta, timedelta)

Distribute the interim between start and end dates into monthdelta and
timedelta portions. If and only if start is after end, returned monthdelta
will be negative. Returned timedelta is never negative, and is always
smaller than the month in which end occurs.

Invariant: dt + monthmod(dt, dt+td)[0] + monthmod(dt, dt+td)[1] = dt + td

There is better-looking documentation in html/library/datetime.html
and html/c-api/datetime.html in the patch.  By all means, if you're
curious, download the patch and try it out yourself!

cheers,
Jess

From rowen at u.washington.edu  Thu Apr 16 20:58:27 2009
From: rowen at u.washington.edu (Russell Owen)
Date: Thu, 16 Apr 2009 11:58:27 -0700
Subject: [Python-Dev] RELEASED Python 2.6.2
In-Reply-To: <FE67E6CC-039C-4524-968F-BB2D1FD42AA3@mac.com>
References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org>
	<rowen-EDC16C.13470815042009@news.gmane.org>
	<FE67E6CC-039C-4524-968F-BB2D1FD42AA3@mac.com>
Message-ID: <DD982BD4-02AB-4395-AFEE-CD3D0EEB7926@u.washington.edu>

I installed the Mac binary on my Intel 10.5.6 system and it works,  
except it still uses Apple's system Tcl/Tk 8.4.7 instead of my  
ActiveState 8.4.19 (which is in /Library/Frameworks where one would  
expect).

I just built python from source and that version does use ActiveState  
8.4.19.

I wish I knew what's going on. Not being able to use the binary  
distros is a bit of a pain.

Just out of curiosity: which 3rd party Tcl/Tk did you have installed  
when you made the installer? Perhaps if it was 8.5 that would explain  
it. If so I may try updating my Tcl/Tk -- I've been wanting some of  
the bug fixes in 8.5 anyway.

-- Russell

On Apr 16, 2009, at 5:35 AM, Ronald Oussoren wrote:

>
> On 15 Apr, 2009, at 22:47, Russell E. Owen wrote:
>
>> Thank you for 2.6.2.
>>
>> I see the Mac binary installer isn't out yet (at least it is not  
>> listed
>> on the downloads page). Any chance that it will be compatible with  
>> 3rd
>> party Tcl/Tk?
>
> The Mac installer is late because I missed the pre-announcement of  
> the 2.6.2 tag. I sent the installer to Barry earlier today.
>
> The installer was built using a 3rd-party installation of Tcl/Tk.
>
> Ronald

From jared.grubb at gmail.com  Thu Apr 16 22:08:07 2009
From: jared.grubb at gmail.com (Jared Grubb)
Date: Thu, 16 Apr 2009 13:08:07 -0700
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<nad-31D99A.09473716042009@news.gmane.org>
	<79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com>
Message-ID: <D90E8955-974F-4FB6-BDF0-88536DE780FC@gmail.com>

On 16 Apr 2009, at 11:42, Paul Moore wrote:
> The key thing missing (I believe) from dateutil is any equivalent of  
> monthmod.

I agree with that. It's well-defined and it makes a lot of sense. +1

But, I don't think monthdelta can be made to work... what should the  
following be?

print(date(2008,1,30) + monthdelta(1))
print(date(2008,1,30) + monthdelta(2))
print(date(2008,1,30) + monthdelta(1) + monthdelta(1))

Jared

From robert.kern at gmail.com  Thu Apr 16 22:10:15 2009
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 16 Apr 2009 15:10:15 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>	<nad-31D99A.09473716042009@news.gmane.org>
	<79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com>
Message-ID: <gs83b7$9ts$1@ger.gmane.org>

On 2009-04-16 13:42, Paul Moore wrote:
> 2009/4/16 Ned Deily<nad at acm.org>:
>> In article
>> <b8ad139e0904152318p5473cbe5yb5f55a19894cc834 at mail.gmail.com>,
>>   Jess Austin<jess.austin at gmail.com>  wrote:
>>> I'm new to python core development, and I've been advised to write to
>>> python-dev concerning a feature/patch I've placed at
>>> http://bugs.python.org/issue5434, with Rietveld at
>>> http://codereview.appspot.com/25079.
>> Without having looked at the code, I wonder whether you've looked at
>> python-dateutil.   I believe its relativedelta type does what you
>> propose, plus much more, and it has the advantage of being widely used
>> and tested.
>
> The key thing missing (I believe) from dateutil is any equivalent of monthmod.
>
> Hmm, it might be possible via relativedelta(d1,d2), but it's not clear
> to me from the documentation precisely what attributes/methods of a
> relativedelta object are valid for getting data *out* of it.

I thought the examples were quite clear. relativedelta() has an alternate 
constructor precisely suited to these calculations but is general and handles 
more than just months.

 >>> from dateutil.relativedelta import *
 >>> dt = relativedelta(months=1)
 >>> dt
relativedelta(months=+1)
 >>> from datetime import datetime
 >>> datetime(2009, 1, 15) + dt
datetime.datetime(2009, 2, 15, 0, 0)
 >>> datetime(2009, 1, 31) + dt
datetime.datetime(2009, 2, 28, 0, 0)
 >>> dt.months
1
 >>> datetime(2009, 1, 31) + relativedelta(years=-1)
datetime.datetime(2008, 1, 31, 0, 0)

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

From jess.austin at gmail.com  Thu Apr 16 23:30:13 2009
From: jess.austin at gmail.com (Jess Austin)
Date: Thu, 16 Apr 2009 16:30:13 -0500
Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143
In-Reply-To: <mailman.14603.1239912495.11745.python-dev@python.org>
References: <mailman.14603.1239912495.11745.python-dev@python.org>
Message-ID: <b8ad139e0904161430s1310bba5nba7a8c09e68916b8@mail.gmail.com>

Jared Grubb <jared.grubb at gmail.com> wrote:
> On 16 Apr 2009, at 11:42, Paul Moore wrote:
>> The key thing missing (I believe) from dateutil is any equivalent of
>> monthmod.
>
>
> I agree with that. It's well-defined and it makes a lot of sense. +1
>
> But, I don't think monthdelta can be made to work... what should the
> following be?

>>> print(date(2008,1,30) + monthdelta(1))
2008-02-29
>>> print(date(2008,1,30) + monthdelta(2))
2008-03-30
>>> print(date(2008,1,30) + monthdelta(1) + monthdelta(1))
2008-03-29

This is a perceptive observation: in the absence of parentheses to
dictate a different order of operations, the third quantity will
differ from the second.  Furthermore, this won't _always_ be true,
just for dates near the end of the month, which is nonintuitive.
(Incidentally, this is another reason why this functionality should
not just be lumped into timedelta; guarantees that have long existed
for operations with timedelta would no longer hold if it tried to deal
with months.)

I find that date calculations involving months involve a certain
amount of inherent confusion.  I've tried to reduce this by
introducing well-specified functionality that will allow accurate
reasoning, as part of the core's included batteries.  I think that one
who uses these objects will develop an intuition and write accurate
code quickly.  It is nonintuitive that order of operation matters for
addition of months, just as it matters for subtraction and division of
all objects, but with the right tools we can deal with this.  An
interesting consequence is that if I want to determine if date b is
more than a month after date a, sometimes I should use:

    b - monthdelta(1) > a

rather than

    a + monthdelta(1) < b

[Consider a list of run dates for a process that should run the last
day of every month: "a" might be date(2008, 2, 29) while "b" is
date(2008, 3, 31). In this case the two expressions would have
different values.]
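
Both effects can be reproduced with a small stdlib-only sketch.
add_months below is a hypothetical helper that approximates adding a
monthdelta (clamping to the last valid day); it is not the patch
itself.

```python
import calendar
from datetime import date

def add_months(d, n):
    """Hypothetical stand-in for d + monthdelta(n), clamping the day."""
    y, m = divmod(d.year * 12 + (d.month - 1) + n, 12)
    return d.replace(year=y, month=m + 1,
                     day=min(d.day, calendar.monthrange(y, m + 1)[1]))

# Order of operations: two one-month steps versus one two-month step.
d = date(2008, 1, 30)
two_steps = add_months(add_months(d, 1), 1)    # passes through 2008-02-29
one_step = add_months(d, 2)
print(two_steps, one_step)                     # 2008-03-29 2008-03-30

# "Is b more than a month after a?" for end-of-month run dates.
a, b = date(2008, 2, 29), date(2008, 3, 31)
print(add_months(b, -1) > a)    # False: b is exactly one month after a
print(add_months(a, 1) < b)     # True: would wrongly report a gap
```

Stepping back from the later date keeps the last-day-of-month run
dates aligned, which is why the first comparison is the one to use in
that scenario.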

cheers,
Jess

From jess.austin at gmail.com  Thu Apr 16 23:41:19 2009
From: jess.austin at gmail.com (Jess Austin)
Date: Thu, 16 Apr 2009 16:41:19 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
Message-ID: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>

Jon Ribbens <jon+python-dev at unequivocal.co.uk> wrote:
> On Thu, Apr 16, 2009 at 12:10:36PM +0400, Oleg Broytmann wrote:
>> > This patch adds a "monthdelta" class and a "monthmod" function to the
>> > datetime module.  The monthdelta class is much like the existing
>> > timedelta class, except that it represents months offset from a date,
>> > rather than an exact period offset from a date.
>>
>>    I'd rather see the code merged with timedelta: timedelta(months=n).
>
> Unfortunately, that's simply impossible. A timedelta is a fixed number
> of seconds, and the time between one month and the next varies.

I agree.

> I am very much in favour of there being the ability to add months to
> dates though. Obviously there is the question of what to do when you
> move forward 1 month from the 31st January; in my opinion an optional
> argument to specify different behaviours would be nice.

Others have suggested raising an exception when a month calculation
lands on an invalid date.  Python already has that; it's spelled like
this:

>>> dt = date(2008, 1, 31)
>>> dt.replace(month=dt.month + 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: day is out of range for month

What other behavior options besides "last-valid-day-of-the-month"
would you like to see?
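
One caveat with the replace() spelling above: it also fails for every
December date, because month 13 is rejected regardless of the day.  A
"strict" option that rolls the year over but still raises on an
invalid day might look roughly like this; add_months_strict is a
hypothetical name used for illustration, not something in the patch.

```python
from datetime import date

def add_months_strict(d, n):
    """Shift d by n months; raise ValueError if the day does not exist
    in the target month (no clamping)."""
    y, m = divmod(d.year * 12 + (d.month - 1) + n, 12)
    return d.replace(year=y, month=m + 1)   # replace() validates the day

print(add_months_strict(date(2008, 12, 15), 1))   # 2009-01-15

try:
    add_months_strict(date(2008, 1, 31), 1)       # there is no 2008-02-31
except ValueError as exc:
    print("rejected:", exc)
```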

cheers,
Jess

From solipsis at pitrou.net  Thu Apr 16 23:47:14 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 16 Apr 2009 21:47:14 +0000 (UTC)
Subject: [Python-Dev] Issue5434: datetime.monthdelta
References: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
Message-ID: <loom.20090416T214556-65@post.gmane.org>

Jess Austin <jess.austin <at> gmail.com> writes:
> 
> What other behavior options besides "last-valid-day-of-the-month"
> would you like to see?

IMHO, the question is rather what the use case is for the behaviour you are
proposing. In which kind of situation is it acceptable to turn 31/2 silently
into 29/2?

From eric at trueblade.com  Thu Apr 16 23:50:27 2009
From: eric at trueblade.com (Eric Smith)
Date: Thu, 16 Apr 2009 17:50:27 -0400
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
References: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
Message-ID: <49E7A823.3060702@trueblade.com>

Jess Austin wrote:
> What other behavior options besides "last-valid-day-of-the-month"
> would you like to see?

- Add 30 days to the source date.
I'm sure there are others.

Followups to python-ideas.

From p.f.moore at gmail.com  Fri Apr 17 00:17:07 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 16 Apr 2009 23:17:07 +0100
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <gs83b7$9ts$1@ger.gmane.org>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<nad-31D99A.09473716042009@news.gmane.org>
	<79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com>
	<gs83b7$9ts$1@ger.gmane.org>
Message-ID: <79990c6b0904161517g5c83673g3e0673072ff8abee@mail.gmail.com>

2009/4/16 Robert Kern <robert.kern at gmail.com>:
> On 2009-04-16 13:42, Paul Moore wrote:
>>
>> 2009/4/16 Ned Deily<nad at acm.org>:
>>>
>>> In article
>>> <b8ad139e0904152318p5473cbe5yb5f55a19894cc834 at mail.gmail.com>,
>>>  Jess Austin<jess.austin at gmail.com>  wrote:
>>>>
>>>> I'm new to python core development, and I've been advised to write to
>>>> python-dev concerning a feature/patch I've placed at
>>>> http://bugs.python.org/issue5434, with Rietveld at
>>>> http://codereview.appspot.com/25079.
>>>
>>> Without having looked at the code, I wonder whether you've looked at
>>> python-dateutil.   I believe its relativedelta type does what you
>>> propose, plus much more, and it has the advantage of being widely used
>>> and tested.
>>
>> The key thing missing (I believe) from dateutil is any equivalent of
>> monthmod.
>>
>> Hmm, it might be possible via relativedelta(d1,d2), but it's not clear
>> to me from the documentation precisely what attributes/methods of a
>> relativedelta object are valid for getting data *out* of it.
>
> I thought the examples were quite clear. relativedelta() has an alternate
> constructor precisely suited to these calculations but is general and
> handles more than just months.
>
>>>> from dateutil.relativedelta import *
>>>> dt = relativedelta(months=1)
>>>> dt
> relativedelta(months=+1)
>>>> from datetime import datetime
>>>> datetime(2009, 1, 15) + dt
> datetime.datetime(2009, 2, 15, 0, 0)
>>>> datetime(2009, 1, 31) + dt
> datetime.datetime(2009, 2, 28, 0, 0)
>>>> dt.months
> 1
>>>> datetime(2009, 1, 31) + relativedelta(years=-1)
> datetime.datetime(2008, 1, 31, 0, 0)

Yes, but given

r = relativedelta(d1, d2)

how do I determine the number of months between d1 and d2, and the
"remainder" - what monthmod gives me. From the code, r.months looks
like it works, but it's not documented, and I'm not 100% sure if it's
always computed.

The use case I'm thinking of is converting the difference between 2
dates into "3 years, 2 months, 5 days" or whatever. I've got an
application which needs to get this right for one of the dates being
29th Feb, so I *really* get to exercise the corner cases :-)
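
To show the kind of thing I mean, here is a rough stdlib-only sketch.
It assumes d1 <= d2 and uses the "clamp to last valid day" rule for
adding months; add_months and describe are made-up names for
illustration, and a different clamping rule would give different
answers near month ends.

```python
import calendar
from datetime import date

def add_months(d, n):
    """Hypothetical month addition that clamps to the last valid day."""
    y, m = divmod(d.year * 12 + (d.month - 1) + n, 12)
    return d.replace(year=y, month=m + 1,
                     day=min(d.day, calendar.monthrange(y, m + 1)[1]))

def describe(d1, d2):
    """Describe d2 - d1 as years, months and days; assumes d1 <= d2."""
    months = (d2.year - d1.year) * 12 + (d2.month - d1.month)
    if add_months(d1, months) > d2:
        months -= 1                      # whole months only
    days = (d2 - add_months(d1, months)).days
    years, months = divmod(months, 12)
    return "%d years, %d months, %d days" % (years, months, days)

print(describe(date(2008, 2, 29), date(2011, 5, 4)))
# 3 years, 2 months, 5 days
```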

Paul

From robert.kern at gmail.com  Fri Apr 17 00:29:24 2009
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 16 Apr 2009 17:29:24 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <79990c6b0904161517g5c83673g3e0673072ff8abee@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>	<nad-31D99A.09473716042009@news.gmane.org>	<79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com>	<gs83b7$9ts$1@ger.gmane.org>
	<79990c6b0904161517g5c83673g3e0673072ff8abee@mail.gmail.com>
Message-ID: <gs8bg4$40v$1@ger.gmane.org>

On 2009-04-16 17:17, Paul Moore wrote:
> 2009/4/16 Robert Kern<robert.kern at gmail.com>:

>>>>> from dateutil.relativedelta import *
>>>>> dt = relativedelta(months=1)
>>>>> dt
>> relativedelta(months=+1)
>>>>> from datetime import datetime
>>>>> datetime(2009, 1, 15) + dt
>> datetime.datetime(2009, 2, 15, 0, 0)
>>>>> datetime(2009, 1, 31) + dt
>> datetime.datetime(2009, 2, 28, 0, 0)
>>>>> dt.months
>> 1
>>>>> datetime(2009, 1, 31) + relativedelta(years=-1)
>> datetime.datetime(2008, 1, 31, 0, 0)
>
> Yes, but given
>
> r = relativedelta(d1, d2)
>
> how do I determine the number of months between d1 and d2, and the
> "remainder" - what monthmod gives me.

Oops! Sorry, I read too quickly and misread "monthmod" as "monthdelta".

> From the code, r.months looks
> like it works, but it's not documented, and I'm not 100% sure if it's
> always computed.

The result of relativedelta(d1, d2) is the same thing as if it were explicitly 
constructed from the years=, months=, etc. keyword arguments. From this example, 
I think this is something that can be relied upon:

"""
It works with dates too.

 >>> relativedelta(TODAY, johnbirthday)
relativedelta(years=+25, months=+5, days=+11, hours=+12)
"""

> The use case I'm thinking of is converting the difference between 2
> dates into "3 years, 2 months, 5 days" or whatever. I've got an
> application which needs to get this right for one of the dates being
> 29th Feb, so I *really* get to exercise the corner cases :-)

I believe relativedelta() is intended for this use case although it may resolve 
ambiguities in a different way than you were hoping.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

From jess.austin at gmail.com  Fri Apr 17 01:02:11 2009
From: jess.austin at gmail.com (Jess Austin)
Date: Thu, 16 Apr 2009 18:02:11 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
Message-ID: <b8ad139e0904161602x78d93ea7y16d41081b6dfd347@mail.gmail.com>

Antoine Pitrou <solipsis at pitrou.net> wrote:
> Jess Austin <jess.austin <at> gmail.com> writes:
>>
>> What other behavior options besides "last-valid-day-of-the-month"
>> would you like to see?
>
> IMHO, the question is rather what the use case is for the behaviour you are
> proposing. In which kind of situation is it acceptable to turn 31/2 silently
> into 29/2?

I have worked in utility/telecom billing, and needed to examine large
numbers of invoice dates, fulfillment dates, disconnection dates,
payment dates, collection event dates, etc.  There would often be
particular rules for the relationships among these dates, and since
many companies generate invoices every day of the month, you couldn't
rely on rules like "this always happens on the 5th".  Here is an
example (modified) from the doc page.  We want to find missing
invoices:

>>> invoices = {123: [date(2008, 1, 31),
...                   date(2008, 2, 29),
...                   date(2008, 3, 31),
...                   date(2008, 4, 30),
...                   date(2008, 5, 31),
...                   date(2008, 6, 30),
...                   date(2008, 7, 31),
...                   date(2008, 12, 31)],
...             456: [date(2008, 1, 1),
...                   date(2008, 5, 1),
...                   date(2008, 6, 1),
...                   date(2008, 7, 1),
...                   date(2008, 8, 1),
...                   date(2008, 11, 1),
...                   date(2008, 12, 1)]}
>>> for account, dates in invoices.items():
...     a = dates[0]
...     for b in dates[1:]:
...         if b - monthdelta(1) > a:
...             print('account', account, 'missing between', a, 'and', b)
...         a = b
...
account 456 missing between 2008-01-01 and 2008-05-01
account 456 missing between 2008-08-01 and 2008-11-01
account 123 missing between 2008-07-31 and 2008-12-31

In general, sometimes we care more about the number of months that
separate dates than we do about the exact dates themselves.  This is
perhaps not the most common situation for date calculations, but it
does come up for some of us.  I tired of writing one-off solutions
that would fail in unexpected corner cases, so I created this patch.
Paul Moore has also described his favorite use-case for this
functionality.

cheers,
Jess

From foom at fuhm.net  Fri Apr 17 01:11:51 2009
From: foom at fuhm.net (James Y Knight)
Date: Thu, 16 Apr 2009 19:11:51 -0400
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <loom.20090416T214556-65@post.gmane.org>
References: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
	<loom.20090416T214556-65@post.gmane.org>
Message-ID: <52C09164-4AE5-42F2-AA4B-52ABFEFCD93D@fuhm.net>

On Apr 16, 2009, at 5:47 PM, Antoine Pitrou wrote:
> IMHO, the question is rather what the use case is for the behaviour  
> you are
> proposing. In which kind of situation is it acceptable to turn 31/2  
> silently
> into 29/2?

Essentially any situation in which you'd actually want a "next month"  
operation it's acceptable to do that.

It's a human-interface operation, and as such, everyone (ahem) "knows  
what it means" to say "2 months from now", but the details don't  
usually have to be thought about too much. Of course when you have a  
computer program, you actually need to tell it what you really mean.

I do a fair amount of date calculating, and use two different kinds of  
"add-month":

Option 1)
Add n to the month number, truncate day number to fit the month you  
end up in.

Option 2)
As above, but with the additional caveat that if the original date is  
the last day of its month, the new day should also be the last day of  
the new month. That is:
April 30th + 1 month = May 31st, instead of May 30th.

They're both useful behaviors, in different circumstances.
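
For illustration, here is a minimal sketch of the two rules above. The function names add_months_truncate and add_months_sticky_eom are made up for this example; neither comes from the patch under discussion or from the datetime module.

```python
import calendar
from datetime import date


def days_in_month(year, month):
    """Number of days in the given month."""
    return calendar.monthrange(year, month)[1]


def add_months_truncate(d, n):
    """Option 1: add n to the month number, truncate the day to fit."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + n, 12)
    month = month0 + 1
    return date(year, month, min(d.day, days_in_month(year, month)))


def add_months_sticky_eom(d, n):
    """Option 2: as option 1, but a month-end date stays a month-end date."""
    shifted = add_months_truncate(d, n)
    if d.day == days_in_month(d.year, d.month):
        return shifted.replace(day=days_in_month(shifted.year, shifted.month))
    return shifted


print(add_months_truncate(date(2009, 4, 30), 1))    # 2009-05-30
print(add_months_sticky_eom(date(2009, 4, 30), 1))  # 2009-05-31
print(add_months_truncate(date(2009, 1, 31), 1))    # 2009-02-28
```

The two functions only disagree when the starting date is the last day of a month shorter than the target month.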

James

From skip at pobox.com  Fri Apr 17 02:18:35 2009
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 16 Apr 2009 19:18:35 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <b8ad139e0904161131g2f2b84fbpd67952697952afa9@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
	<b8ad139e0904161131g2f2b84fbpd67952697952afa9@mail.gmail.com>
Message-ID: <18919.51931.874515.848841@montanaro.dyndns.org>

    >> I have this funny feeling that arithmetic using monthdelta wouldn't
    >> always be intuitive.

    Jess> I think that's true, especially since these calculations are not
    Jess> necessarily invertible:

    >>> date(2008, 1, 30) + monthdelta(1)
    datetime.date(2008, 2, 29)
    >>> date(2008, 2, 29) - monthdelta(1)
    datetime.date(2008, 1, 29)

    Jess> It could be that non-intuitivity is inherent in the problem of
    Jess> dealing with dates and months.

To which I would respond:

    >>> import this
    The Zen of Python, by Tim Peters

    ...
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    ...

From the discussion I've seen so far, it's not clear that there is one
obvious way to do it, and the ambiguity of the problem forces people to
guess.  

My recommendations after letting it roll around in the back of my brain for
the day:

    * I think it would be best to leave the definition of monthdelta up to
      individual users.  That is, add nothing to the datetime module and let
      them write a function which does what they want it to do.

    * The idea/implementation probably needs to bake on the python-ideas
      list and perhaps comp.lang.python for a bit to see if some consensus
      can be reached on reasonable functionality.

(I'm a bit behind on this thread.  Hopefully someone else has already
suggested these two things.)

Skip

From tleeuwenburg at gmail.com  Fri Apr 17 02:52:55 2009
From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg)
Date: Fri, 17 Apr 2009 10:52:55 +1000
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <18919.51931.874515.848841@montanaro.dyndns.org>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
	<b8ad139e0904161131g2f2b84fbpd67952697952afa9@mail.gmail.com>
	<18919.51931.874515.848841@montanaro.dyndns.org>
Message-ID: <43c8685c0904161752i6a7f4a23o3ece8f5b71ec6dd8@mail.gmail.com>

My thoughts on balance regarding monthdeltas:
  -- Month operations are useful, people will want to do them
  -- Having a monthdelta object rather than a method makes sense to me
  -- I think the documentation is severely underdone. The functionality is
     not intuitive, and therefore the docs need a lot more detail than usual
  -- Can you specify "1 month plus 10 days", i.e. add a monthdelta to a
     timedelta?
  -- What about other cyclical periods (fortnights, 28 days, lunar cycles,
     high tides)?

Cheers,
-T

From jess.austin at gmail.com  Fri Apr 17 03:01:22 2009
From: jess.austin at gmail.com (Jess Austin)
Date: Thu, 16 Apr 2009 20:01:22 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <18919.51931.874515.848841@montanaro.dyndns.org>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
	<b8ad139e0904161131g2f2b84fbpd67952697952afa9@mail.gmail.com>
	<18919.51931.874515.848841@montanaro.dyndns.org>
Message-ID: <b8ad139e0904161801t56a9be3o999d7f4b96a0cb5f@mail.gmail.com>

On Thu, Apr 16, 2009 at 7:18 PM,  <skip at pobox.com> wrote:
>
>     >> I have this funny feeling that arithmetic using monthdelta wouldn't
>     >> always be intuitive.
>
>     Jess> I think that's true, especially since these calculations are not
>     Jess> necessarily invertible:
>
>     >>> date(2008, 1, 30) + monthdelta(1)
>     datetime.date(2008, 2, 29)
>     >>> date(2008, 2, 29) - monthdelta(1)
>     datetime.date(2008, 1, 29)
>
>     Jess> It could be that non-intuitivity is inherent in the problem of
>     Jess> dealing with dates and months.
>
> To which I would respond:
>
>     >>> import this
>     The Zen of Python, by Tim Peters
>
>     ...
>     In the face of ambiguity, refuse the temptation to guess.
>     There should be one-- and preferably only one --obvious way to do it.
>     Although that way may not be obvious at first unless you're Dutch.
>     ...
>
> From the discussion I've seen so far, it's not clear that there is one
> obvious way to do it, and the ambiguity of the problem forces people to
> guess.
>
> My recommendations after letting it roll around in the back of my brain for
> the day:
>
>     * I think it would be best to leave the definition of monthdelta up to
>       individual users.  That is, add nothing to the datetime module and let
>       them write a function which does what they want it to do.
>
>     * The idea/implementation probably needs to bake on the python-ideas
>       list and perhaps comp.lang.python for a bit to see if some consensus
>       can be reached on reasonable functionality.

So far, all the other solutions to the problem that have been
mentioned are easily supported in current python.

Raise an exception when a calculation results in an invalid date:

>>> dt = date(2008, 1, 31)
>>> dt.replace(month=dt.month + 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: day is out of range for month

Add exactly 30 days to a date:

>>> dt + timedelta(30)
datetime.date(2008, 3, 1)

These operations are useful in particular contexts.  What I've
submitted is also useful, and currently isn't easy in core,
batteries-included python.  While I would consider the foregoing
interpretation of the Zen to be backwards (this doesn't add another
way to do something that's already possible, it makes possible
something that currently encourages one to pull her hair out), I
suppose it doesn't matter.  If adding a class and a function to a
module will require extended advocacy on -ideas and c.l.p, I'm
probably not the person for the job.

If, on the other hand, one of the committers wants to toss this in at
some point, whether now or 3 versions down the road, the patch is up
at bugs.python.org (and I'm happy to make any suggested
modifications).  I'm glad to have written this; I learned a bit about
CPython internals and scraped a layer of rust off my C skills.  I will
go ahead and backport the python-coded version to 2.3.  I'll continue
this conversation with whomever for however long, but I suspect this
topic will soon have worn out its welcome on python-dev.

cheers,
Jess

From greg.ewing at canterbury.ac.nz  Fri Apr 17 03:55:58 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 17 Apr 2009 13:55:58 +1200
Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143
In-Reply-To: <b8ad139e0904161430s1310bba5nba7a8c09e68916b8@mail.gmail.com>
References: <mailman.14603.1239912495.11745.python-dev@python.org>
	<b8ad139e0904161430s1310bba5nba7a8c09e68916b8@mail.gmail.com>
Message-ID: <49E7E1AE.5030809@canterbury.ac.nz>

Jess Austin wrote:

> This is a perceptive observation: in the absence of parentheses to
> dictate a different order of operations, the third quantity will
> differ from the second.

Another aspect of this is the use case mentioned right
at the beginning of this discussion concerning a recurring
event on a particular day of the month.

If you do this the naive way by just repeatedly adding one
of these monthdeltas to the previous date, and the date is
near the end of the month, it will eventually end up
drifting to the 28th of every month.

-- 
Greg

From tleeuwenburg at gmail.com  Fri Apr 17 04:10:59 2009
From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg)
Date: Fri, 17 Apr 2009 12:10:59 +1000
Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143
In-Reply-To: <49E7E1AE.5030809@canterbury.ac.nz>
References: <mailman.14603.1239912495.11745.python-dev@python.org>
	<b8ad139e0904161430s1310bba5nba7a8c09e68916b8@mail.gmail.com>
	<49E7E1AE.5030809@canterbury.ac.nz>
Message-ID: <43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com>

Actually, that's a point.

If it's the 31st of Jan, then +1 monthdelta will be 28 Feb and another +1
will be 28 March whereas 31st Jan +2 monthdeltas will be 31 March.

That's the kind of thing which really needs to be documented, or I think
people really will make mistakes.

For example, should a monthdelta include a goal-day so that the example
above would go 31 Jan / 28 Feb / 31 March?

-T

On Fri, Apr 17, 2009 at 11:55 AM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

> Jess Austin wrote:
>
>> This is a perceptive observation: in the absence of parentheses to
>> dictate a different order of operations, the third quantity will
>> differ from the second.
>
>
> Another aspect of this is the use case mentioned right
> at the beginning of this discussion concerning a recurring
> event on a particular day of the month.
>
> If you do this the naive way by just repeatedly adding one
> of these monthdeltas to the previous date, and the date is
> near the end of the month, it will eventually end up
> drifting to the 28th of every month.
>
> --
> Greg

-- 
--------------------------------------------------
Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe everything you think"

From steve at pearwood.info  Fri Apr 17 04:29:11 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 17 Apr 2009 12:29:11 +1000
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
References: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
Message-ID: <200904171229.11531.steve@pearwood.info>

On Fri, 17 Apr 2009 07:41:19 am Jess Austin wrote:

> Others have suggested raising an exception when a month calculation
> lands on an invalid date.  Python already has that; it's spelled like
>
> this:
> >>> dt = date(2008, 1, 31)
> >>> dt.replace(month=dt.month + 1)
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: day is out of range for month
>
> What other behavior options besides "last-valid-day-of-the-month"
> would you like to see?

Adding one month to 31st January could mean:

1: raise an exception
2: return 28th February (last day of February)
3: return 3rd April (1 month = 31 days)
4: return 2nd April (1 month = 30 days)
5: return 28th February (1 month = 4 weeks = 28 days)
6: next business day after any of the above dates

I don't really expect Python to support scenario 6, as that would 
require knowledge of local public holidays and conventions for week 
ends and working days.

Open Office spreadsheet includes the following relevant functions:

EDATE(start date; months)
returns the serial number of the date that is a specified number of 
months before or after the start date.

EOMONTH(start date; months)
returns the serial number of the last day of the month that comes a 
certain number of months before or after the start date.

MONTHS(start date; end date; type)
calculate the difference in months between start and end date, possible 
values for type include 0 (interval) and 1 (in calendar months).

Rather than a series of almost-identical functions catering for people 
who want 28 day months and 31 day months, I propose a keyword argument 
days_in_month which specifies the number of days in a month. Any 
positive integer should be accepted, but of course only 28, 30 and 31 
will be meaningful for the common English meaning of the word "month". 
0 or None (the default) should trigger "last day of the month" 
behaviour (scenario 2 above).

That will (I think) simplify both documentation and implementation. 
Adding 1 month to a day will be defined as adding days_in_month days 
(if given), and if not given, adding 31 days but truncating the result 
to the last day of the next month.
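
A rough sketch of how that keyword argument might behave, following the wording above literally. The function name add_month is hypothetical, and only the single-month case is shown.

```python
import calendar
from datetime import date, timedelta


def add_month(d, days_in_month=None):
    """Add one month to d (hypothetical helper, not part of any patch).

    A positive days_in_month means "one month is exactly that many days".
    0 or None means: add 31 days, but never go past the last day of the
    month that follows d.
    """
    if days_in_month:
        return d + timedelta(days=days_in_month)
    # Work out the last day of the month after d's month.
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    last_of_next = date(year, month, calendar.monthrange(year, month)[1])
    return min(d + timedelta(days=31), last_of_next)


start = date(2009, 1, 31)
print(add_month(start))                    # 2009-02-28
print(add_month(start, days_in_month=28))  # 2009-02-28
print(add_month(start, days_in_month=30))  # 2009-03-02
print(add_month(start, days_in_month=31))  # 2009-03-03
```

Read literally, the default rule turns 15 April into 16 May, because April has only 30 days. Whether that is the intended result is not clear from the description.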

Thoughts?

-- 
Steven D'Aprano

From steve at pearwood.info  Fri Apr 17 04:34:20 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 17 Apr 2009 12:34:20 +1000
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <200904171229.11531.steve@pearwood.info>
References: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
	<200904171229.11531.steve@pearwood.info>
Message-ID: <200904171234.21121.steve@pearwood.info>

On Fri, 17 Apr 2009 12:29:11 pm Steven D'Aprano wrote:

> Adding one month to 31st January could mean:
>
> 1: raise an exception
> 2: return 28th February (last day of February)
> 3: return 3rd April (1 month = 31 days)
> 4: return 2nd April (1 month = 30 days)
> 5: return 28th February (1 month = 4 weeks = 28 days)

Obviously I meant March, not April. Oops.

-- 
Steven D'Aprano

From glyph at divmod.com  Fri Apr 17 04:53:32 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 17 Apr 2009 02:53:32 -0000
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <52C09164-4AE5-42F2-AA4B-52ABFEFCD93D@fuhm.net>
References: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
	<loom.20090416T214556-65@post.gmane.org>
	<52C09164-4AE5-42F2-AA4B-52ABFEFCD93D@fuhm.net>
Message-ID: <20090417025332.12555.1359868255.divmod.xquotient.8476@weber.divmod.com>

On 16 Apr, 11:11 pm, foom at fuhm.net wrote:
>On Apr 16, 2009, at 5:47 PM, Antoine Pitrou wrote:

>It's a human-interface operation, and as such, everyone (ahem) "knows 
>what it means" to say "2 months from now", but the details don't 
>usually have to be thought about too much. Of course when you have a 
>computer program, you actually need to tell it what you really mean.
>
>I do a fair amount of date calculating, and use two different kinds of 
>"add-month":
>
>Option 1)
>Add n to the month number, truncate day number to fit the month you 
>end up in.
>
>Option 2)
>As above, but with the additional caveat that if the original date is 
>the last day of its month, the new day should also be the last day of 
>the new month. That is:
>April 30th + 1 month = May 31st, instead of May 30th.
>
>They're both useful behaviors, in different circumstances.

I don't have a third option, but something that would be useful to 
mention in the documentation for "monthdelta": frequently users will 
want a recurring "monthly" event.  It's important to note that you need 
to keep your original date around if you want these rules to be 
consistently applied.  For example, if you have a monthly billing cycle 
that starts on May 31, you need to keep the original May 31 around to 
add monthdelta(X) if you want it to be May 31 when it rolls around next 
year; otherwise every time February rolls around all of your
end-of-month dates get clamped to the 28th of every month.  (Unless you're 
following James's option 2, of course, in which case things which are 
normally on the 28th will get clamped to the 31st of following months.)
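
To make the pitfall concrete, here is a small sketch comparing a chained schedule with one computed from the original date. The add_months helper is an assumption for this example (month shift with the day clamped to the month's length); it is not the monthdelta implementation from the patch.

```python
import calendar
from datetime import date


def add_months(d, n):
    """Shift d by n months, clamping the day to the target month's length."""
    year, month0 = divmod(d.year * 12 + d.month - 1 + n, 12)
    last = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last))


start = date(2008, 5, 31)

# Chained: each billing date is derived from the previous one.
chained = [start]
for _ in range(12):
    chained.append(add_months(chained[-1], 1))

# Anchored: each billing date is derived from the original start date.
anchored = [add_months(start, k) for k in range(13)]

print(chained[-1])   # 2009-05-28
print(anchored[-1])  # 2009-05-31
```

The chained list loses a day as soon as it passes a shorter month and never gets it back; the anchored list only clamps in the months that need it.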

My experience with month-calculating software suggests that this is 
something very easy to screw up.

From steve at pearwood.info  Fri Apr 17 04:42:45 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 17 Apr 2009 12:42:45 +1000
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <loom.20090416T214556-65@post.gmane.org>
References: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
	<loom.20090416T214556-65@post.gmane.org>
Message-ID: <200904171242.46009.steve@pearwood.info>

On Fri, 17 Apr 2009 07:47:14 am Antoine Pitrou wrote:
> Jess Austin <jess.austin <at> gmail.com> writes:
> > What other behavior options besides "last-valid-day-of-the-month"
> > would you like to see?
>
> IMHO, the question is rather what the use case is for the behaviour
> you are proposing. In which kind of situation is it acceptable to
> turn 31/2 silently into 29/2?

Any time the user expects "one month from the last day of January" to 
mean "the last day of February". I dare say that if you did a poll of 
non-programmers, that would be a very common expectation, possibly the 
most common.

I just asked the missus, who is a non-programmer, what date is one month 
after 31st January and her answer was "2rd of March on leap years, 
otherwise 3rd of March".

-- 
Steven D'Aprano

From skip at pobox.com  Fri Apr 17 04:55:02 2009
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 16 Apr 2009 21:55:02 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <b8ad139e0904161801t56a9be3o999d7f4b96a0cb5f@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
	<b8ad139e0904161131g2f2b84fbpd67952697952afa9@mail.gmail.com>
	<18919.51931.874515.848841@montanaro.dyndns.org>
	<b8ad139e0904161801t56a9be3o999d7f4b96a0cb5f@mail.gmail.com>
Message-ID: <18919.61318.106749.848833@montanaro.dyndns.org>

    Jess> If, on the other hand, one of the committers wants to toss this in
    Jess> at some point, whether now or 3 versions down the road, the patch
    Jess> is up at bugs.python.org (and I'm happy to make any suggested
    Jess> modifications).

Again, I think it needs to bake a bit.  I understand the desire and need for
doing date arithmetic with months.  Python is mature enough though that I
don't think you can just "toss this in".  It should be available as a module
outside of Python so people can beat on it, flush out bugs, make suggestions
for enhancements, whatever.  I believe you mentioned putting it up on PyPI.
I think that's an excellent idea.

I've used parts of Gustavo Niemeyer's dateutil package for a couple years
and love it.  It's widely used.  Adding it to dateutil seems like another
possibility.  That would guarantee an instant user base.  From there, if it
is found to be useful it could make the leap to be part of the datetime
module.

Skip

From skip at pobox.com  Fri Apr 17 05:14:33 2009
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 16 Apr 2009 22:14:33 -0500
Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143
In-Reply-To: <43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com>
References: <mailman.14603.1239912495.11745.python-dev@python.org>
	<b8ad139e0904161430s1310bba5nba7a8c09e68916b8@mail.gmail.com>
	<49E7E1AE.5030809@canterbury.ac.nz>
	<43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com>
Message-ID: <18919.62489.500216.889134@montanaro.dyndns.org>

    Tennessee> If it's the 31st of Jan, then +1 monthdelta will be 28 Feb
    Tennessee> and another +1 will be 28 March whereas 31st Jan +2
    Tennessee> monthdeltas will be 31 March.

Other possible arithmetics:

    * 31 Jan 2008 + monthdelta(2) might be
        31 Jan 2008 + 31 days (# days in Jan) + 29 days (# days in Feb)

    * 31 Jan 2008 + monthdelta(2) might be
        31 Jan 2008 + 29 days (# days in Feb) + 31 days (# days in Mar)

    * Treat the day of the month of the base datetime as an offset from the
      end of the month.  29 Jan 2007 would thus have an EOM offset of -2.
      Adding monthdelta(2) would advance you into March with the resulting
      day being two from the end of the month, or 29 Mar 2007.  OTOH, adding
      monthdelta(1) you'd wind up on 26 Feb 2007.

    * Consider the day of the month in the base datetime as an offset from
      the start of the month if it is closer to the start or as an offset
      from the end of the month if it is closer to the end.  12 Mar 2009 -
      monthdelta(2) would land you at 12 Jan 2009 whereas 17 Mar 2009 -
      monthdelta(1) would land you at 12 Feb 2009.
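
As one example, the end-of-month offset rule (third item above) might be sketched like this. The function name is made up, and clamping to the 1st when the target month is too short is an added assumption, not something stated above.

```python
import calendar
from datetime import date


def add_months_eom_offset(d, n):
    """Keep d's distance from the end of its month after shifting n months."""
    # Offset is 0 on the last day of the month, negative otherwise.
    offset = d.day - calendar.monthrange(d.year, d.month)[1]
    year, month0 = divmod(d.year * 12 + d.month - 1 + n, 12)
    month = month0 + 1
    last = calendar.monthrange(year, month)[1]
    # Assumption: never go earlier than the 1st of the target month.
    return date(year, month, max(1, last + offset))


print(add_months_eom_offset(date(2007, 1, 29), 2))  # 2007-03-29
print(add_months_eom_offset(date(2007, 1, 29), 1))  # 2007-02-26
```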

My mind spins at all the possibilities.  I suspect we've seen at least ten
different monthdelta rules just in this thread.  I don't know how much sense
they all make, but we can probably keep dreaming up new ones until the cows
come home.  Except for completely wacko sets of rules you can probably find
uses for most of them.  Bake, baby, bake.

pillsbury-doughboy-ly, y'rs,

Skip

From steve at pearwood.info  Fri Apr 17 05:20:12 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 17 Apr 2009 13:20:12 +1000
Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143
In-Reply-To: <43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com>
References: <mailman.14603.1239912495.11745.python-dev@python.org>
	<49E7E1AE.5030809@canterbury.ac.nz>
	<43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com>
Message-ID: <200904171320.13025.steve@pearwood.info>

On Fri, 17 Apr 2009 12:10:59 pm Tennessee Leeuwenburg wrote:
> Actually, that's a point.
>
> If it's the 31st of Jan, then +1 monthdelta will be 28 Feb and
> another +1 will be 28 March whereas 31st Jan +2 monthdeltas will be
> 31 March.
>
> That's the kind of thing which really needs to be documented, or I
> think people really will make mistakes.

It might be worth noting as an aside, but it should be obvious in the 
same way that string concatenation is different from numerical 
addition:

1 + 2 = 2 + 1
'1' + '2' != '2' + '1'

> For example, should a monthdelta include a goal-day so that the
> example above would go 31 Jan / 28 Feb / 31 March?

No, that just adds complexity.

Consider floating point addition. If you want to step through a loop 
while doing addition, you need to be aware of round-off error:

>>> x = 0.0
>>> step = 0.001
>>> for i in range(1000):
...     x += step
...
>>> x
1.0000000000000007

The solution isn't to add a "goal" to the plus operator, but to fix your 
code to use a better algorithm:

>>> y = 0.0
>>> for i in range(1, 1001):
...     y = i*step
...
>>> y
1.0

Same with monthdelta.

-- 
Steven D'Aprano

From nad at acm.org  Fri Apr 17 05:28:45 2009
From: nad at acm.org (Ned Deily)
Date: Thu, 16 Apr 2009 20:28:45 -0700
Subject: [Python-Dev] RELEASED Python 2.6.2
References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org>
	<rowen-EDC16C.13470815042009@news.gmane.org>
	<FE67E6CC-039C-4524-968F-BB2D1FD42AA3@mac.com>
	<DD982BD4-02AB-4395-AFEE-CD3D0EEB7926@u.washington.edu>
Message-ID: <nad-304E10.20284516042009@news.gmane.org>

In article <DD982BD4-02AB-4395-AFEE-CD3D0EEB7926 at u.washington.edu>,
 Russell Owen <rowen at u.washington.edu> wrote:
> I installed the Mac binary on my Intel 10.5.6 system and it works,  
> except it still uses Apple's system Tcl/Tk 8.4.7 instead of my  
> ActiveState 8.4.19 (which is in /Library/Frameworks where one would  
> expect).
> 
> I just built python from source and that version does use ActiveState  
> 8.4.19.
> 
> I wish I knew what's going on. Not being able to use the binary  
> distros is a bit of a pain.

You're right, the tkinter included with the 2.6.2 installer is not 
linked properly:

Is:
$ cd /Library/Frameworks/Python.framework/Versions/2.6
$ cd lib/python2.6/lib-dynload
$ otool -L _tkinter.so
_tkinter.so:
   /System/Library/Frameworks/Tcl.framework/Versions/8.4/Tcl (compatibility version 8.4.0, current version 8.4.0)
   /System/Library/Frameworks/Tk.framework/Versions/8.4/Tk (compatibility version 8.4.0, current version 8.4.0)
   /usr/lib/libSystem.B.dylib [...]

should be:
_tkinter.so:
   /Library/Frameworks/Tcl.framework/Versions/8.4/Tcl (compatibility version 8.4.0, current version 8.4.19)
   /Library/Frameworks/Tk.framework/Versions/8.4/Tk (compatibility version 8.4.0, current version 8.4.19)
   /usr/lib/libSystem.B.dylib [...]

-- 
 Ned Deily,
 nad at acm.org

From glyph at divmod.com  Fri Apr 17 05:58:22 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 17 Apr 2009 03:58:22 -0000
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090416153350.702303A4100@sparrow.telecommunity.com>
References: <49DB6A1F.50801@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
	<20090415185221.GB13696@amk-desktop.matrixgroup.net>
	<20090415192021.558E53A4119@sparrow.telecommunity.com>
	<94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com>
	<20090415210902.848443A4100@sparrow.telecommunity.com>
	<20090416034602.12555.179034490.divmod.xquotient.8434@weber.divmod.com>
	<20090416153350.702303A4100@sparrow.telecommunity.com>
Message-ID: <20090417035822.12555.1669891463.divmod.xquotient.8566@weber.divmod.com>

On 16 Apr, 03:36 pm, pje at telecommunity.com wrote:
>At 03:46 AM 4/16/2009 +0000, glyph at divmod.com wrote:
>>On 15 Apr, 09:11 pm, pje at telecommunity.com wrote:

>>Twisted has its own system for "namespace" packages, and I'm not 
>>really sure where we fall in this discussion.  I haven't been able to 
>>follow the whole thread, but my original understanding was that the 
>>PEP supports "defining packages", which we now seem to be calling 
>>"base packages", just fine.
>
>Yes, it does.  The discussion since the original proposal, however, has 
>been dominated by MAL's counterproposal, which *requires* a defining 
>package.

[snip clarifications]

>Does that all make sense now?

Yes.  Thank you very much for the detailed explanation.  It was more 
than I was due :-).

>MAL's proposal requires a defining package, which is counterproductive 
>if you have a pure package with no base, since it now requires you to 
>create an additional project on PyPI just to hold your defining 
>package.

Just as a use-case: would the Java "com.*" namespace be an example of a 
"pure package with no base"?  i.e. lots of projects are in it, but no 
project owns it?

>>I'd appreciate it if the PEP could also be extended to cover Twisted's 
>>very similar mechanism for namespace packages, 
>>"twisted.plugin.pluginPackagePaths".  I know this is not quite as 
>>widely used as setuptools' namespace package support, but its 
>>existence belies a need for standardization.
>>
>>The PEP also seems a bit vague with regard to the treatment of other 
>>directories containing __init__.py and *.pkg files.
>
>Do you have a clarification to suggest?  My understanding (probably a 
>projection) is that to be a nested namespace package, you have to have 
>a parent namespace package.

Just to clarify things on my end: "namespace package" to *me* means 
"package with modules provided from multiple distributions (the 
distutils term)".  The definition provided by the PEP, that a package is 
spread over multiple directories on disk, seems like an implementation 
detail.
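
For readers who want to see the "spread over multiple directories" case in action, here is a self-contained sketch using the existing pkgutil.extend_path mechanism from the standard library. This is not the scheme PEP 382 proposes; the names demo_ns, dist_a and dist_b are invented for the demonstration, and the two temporary directories stand in for two separately installed distributions.

```python
import importlib
import os
import sys
import tempfile

# Each portion ships this __init__.py, which asks pkgutil to collect every
# directory named after the package that can be found along sys.path.
INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, package, module, body):
    """Create root/package/ containing __init__.py and one module."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(body)


base = tempfile.mkdtemp()
dist_a = os.path.join(base, "dist_a")  # stands in for one distribution
dist_b = os.path.join(base, "dist_b")  # stands in for another
make_portion(dist_a, "demo_ns", "core", "NAME = 'core'\n")
make_portion(dist_b, "demo_ns", "web", "NAME = 'web'\n")

sys.path[:0] = [dist_a, dist_b]
importlib.invalidate_caches()

import demo_ns.core
import demo_ns.web

print(demo_ns.core.NAME, demo_ns.web.NAME)  # core web
print(len(demo_ns.__path__))                # 2
```

Both submodules import even though they live in different directories, because demo_ns.__path__ ends up listing both portions.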

Entries on __path__ slow down import, so my understanding of the 
platonic ideal of a system python installation is one which has a single 
directory where all packages reside, and a set of metadata off to the 
side explaining which files belong to which distributions so they can be 
uninstalled by a package manager.

Of course, for a development installation, easy uninstallation and quick 
swapping between different versions of relevant dependencies is more 
important than good import performance.  So in that case, you would want 
to optimize differently by having all of your distributions installed 
into separate directories, with a long PYTHONPATH or lots of .pth files 
to point at them.

And of course you may want a hybrid of the two.

So another clarification I'd like in the PEP is an explanation of 
motivation.  For example, it comes as a complete surprise to me that the 
expectation of namespace packages was to provide only single-source 
namespaces like zope.*, peak.*, twisted.*.  As I mentioned above, I 
implicitly thought this was more for com.*, twisted.plugins.*.

Right now it just says that it's a package which resides in multiple 
directories, and it's not made clear why that's a desirable feature.

>>   The concept of a "defining package" seems important to avoid 
>>conflicts like this one:
>>
>>    http://twistedmatrix.com/trac/ticket/2339

[snip some stuff about plugins and package layout]

>Namespaces are not plugins and vice versa.  The purpose of a namespace 
>package is to allow projects managed by the same entity to share a 
>namespace (ala Java "package" names) and avoid naming conflicts with 
>other authors.

I think this is missing a key word: *separate* projects managed by the 
same entity.

Hmm.  I thought I could illustrate that the same problem actually occurs 
without using a plugin system, but I can actually only come up with an 
example if an application implements multi-library-version compatibility 
by doing

    try:
        from bad_old_name import bad_old_feature as feature
    except ImportError:
        from good_new_name import good_new_feature as feature

rather than the other way around; and that's a terrible idea for other 
reasons.  Other than that, you'd have to use 
pkg_resources.resource_listdir or somesuch, at which point you pretty 
much are implementing a plugin system.

So I started this reply disagreeing but I think I've convinced myself 
that you're right.

>Precisely.  Note, however, that neither is twisted.plugins a namespace 
>package, and it should not contain any .pkg files.  I don't think it's 
>reasonable to abuse PEP 382 namespace packages as a plugin system.  In 
>setuptools' case, a different mechanism is provided for locating plugin 
>code, and of course Twisted already has its own system for the same 
>thing. It would be nice to have a standardized way of locating plugins 
>in the stdlib, but that will need to be a different PEP.

Okay.  So what I'm hearing is that Twisted should happily continue using 
our own wacky __path__-calculation logic for twisted.plugins, but that 
*twisted* should be a namespace package so that our separate 
distributions (TwistedCore, TwistedWeb, TwistedConch, et al.) can be 
installed into separate directories.

From greg.ewing at canterbury.ac.nz  Fri Apr 17 06:08:46 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 17 Apr 2009 16:08:46 +1200
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <200904171242.46009.steve@pearwood.info>
References: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
	<loom.20090416T214556-65@post.gmane.org>
	<200904171242.46009.steve@pearwood.info>
Message-ID: <49E800CE.6000707@canterbury.ac.nz>

Steven D'Aprano wrote:

> "2rd of March on leap years,
    ^^^

The turd of March?

-- 
Greg

From greg.ewing at canterbury.ac.nz  Fri Apr 17 06:14:06 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 17 Apr 2009 16:14:06 +1200
Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143
In-Reply-To: <200904171320.13025.steve@pearwood.info>
References: <mailman.14603.1239912495.11745.python-dev@python.org>
	<49E7E1AE.5030809@canterbury.ac.nz>
	<43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com>
	<200904171320.13025.steve@pearwood.info>
Message-ID: <49E8020E.3010109@canterbury.ac.nz>

Steven D'Aprano wrote:
> it should be obvious in the 
> same way that string concatenation is different from numerical 
> addition:
> 
> 1 + 2 = 2 + 1
> '1' + '2' != '2' + '1'

However, the proposed arithmetic isn't just non-
commutative, it's non-associative, which is a
much rarer and more surprising thing. We do
at least have

   ('1' + '2') + '3' == '1' + ('2' + '3')
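
To make the non-associativity concrete, here is a small sketch.  It
assumes that adding months clamps the day to the last day of the target
month; add_months is a hypothetical stand-in written for this example,
not the code from the patch.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Hypothetical helper: shift by whole months, clamping the day."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


start = date(2009, 1, 31)

# (start + 1 month) + 1 month: the day is clamped to 28 in February
# and stays at 28 afterwards.
stepwise = add_months(add_months(start, 1), 1)

# start + (1 month + 1 month): no clamping is needed for March.
at_once = add_months(start, 2)

print(stepwise, at_once, stepwise == at_once)
```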

-- 
Greg

From pje at telecommunity.com  Fri Apr 17 06:56:39 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 17 Apr 2009 00:56:39 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090417035822.12555.1669891463.divmod.xquotient.8566@webe
	r.divmod.com>
References: <49DB6A1F.50801@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
	<20090415185221.GB13696@amk-desktop.matrixgroup.net>
	<20090415192021.558E53A4119@sparrow.telecommunity.com>
	<94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com>
	<20090415210902.848443A4100@sparrow.telecommunity.com>
	<20090416034602.12555.179034490.divmod.xquotient.8434@weber.divmod.com>
	<20090416153350.702303A4100@sparrow.telecommunity.com>
	<20090417035822.12555.1669891463.divmod.xquotient.8566@weber.divmod.com>
Message-ID: <20090417045411.67E4B3A4100@sparrow.telecommunity.com>

At 03:58 AM 4/17/2009 +0000, glyph at divmod.com wrote:
>Just as a use-case: would the Java "com.*" namespace be an example 
>of a "pure package with no base"?  i.e. lots of projects are in it, 
>but no project owns it?

Er, I suppose.  I was thinking more of the various 'com.foo' and 
'org.bar' packages as being the pure namespaces in question.  For 
Python, a "flat is better than nested" approach seems fine at the moment.

>Just to clarify things on my end: "namespace package" to *me* means 
>"package with modules provided from multiple distributions (the 
>distutils term)".  The definition provided by the PEP, that a 
>package is spread over multiple directories on disk, seems like an 
>implementation detail.

Agreed.

>Entries on __path__ slow down import, so my understanding of the 
>platonic ideal of a system python installation is one which has a 
>single directory where all packages reside, and a set of metadata 
>off to the side explaining which files belong to which distributions 
>so they can be uninstalled by a package manager.

True... except that part of the function of the PEP is to ensure that 
if you install those separately-distributed modules to the same 
directory, it still needs to work as a package and not have any 
inter-package file conflicts.

>Of course, for a development installation, easy uninstallation and 
>quick swapping between different versions of relevant dependencies 
>is more important than good import performance.  So in that case, 
>you would want to optimize differently by having all of your 
>distributions installed into separate directories, with a long 
>PYTHONPATH or lots of .pth files to point at them.
>
>And of course you may want a hybrid of the two.

Yep.

>So another clarification I'd like in the PEP is an explanation of 
>motivation.  For example, it comes as a complete surprise to me that 
>the expectation of namespace packages was to provide only 
>single-source namespaces like zope.*, peak.*, twisted.*.  As I 
>mentioned above, I implicitly thought this was more for com.*, 
>twisted.plugins.*.

Well, aside from twisted.plugins, I wasn't aware of anybody in Python 
doing that...  and as I described, I never really interpreted that 
through the lens of "namespace package" vs. "plugin finding".

>Right now it just says that it's a package which resides in multiple 
>directories, and it's not made clear why that's a desirable feature.

Good point; perhaps you can suggest some wording on these matters to Martin?

>Okay.  So what I'm hearing is that Twisted should happily continue 
>using our own wacky __path__-calculation logic for twisted.plugins, 
>but that *twisted* should be a namespace package so that our 
>separate distributions (TwistedCore, TwistedWeb, TwistedConch, et. 
>al.) can be installed into separate directories.

Yes.

Thanks for taking the time to participate in this and add another 
viewpoint to the mix, not to mention clarifying some areas where the 
PEP could be clearer.

From ronaldoussoren at mac.com  Fri Apr 17 08:17:50 2009
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 17 Apr 2009 08:17:50 +0200
Subject: [Python-Dev] RELEASED Python 2.6.2
In-Reply-To: <DD982BD4-02AB-4395-AFEE-CD3D0EEB7926@u.washington.edu>
References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org>
	<rowen-EDC16C.13470815042009@news.gmane.org>
	<FE67E6CC-039C-4524-968F-BB2D1FD42AA3@mac.com>
	<DD982BD4-02AB-4395-AFEE-CD3D0EEB7926@u.washington.edu>
Message-ID: <92CB905D-99F8-4727-A1AE-1772EA3ED79C@mac.com>

On 16 Apr, 2009, at 20:58, Russell Owen wrote:

> I installed the Mac binary on my Intel 10.5.6 system and it works,  
> except it still uses Apple's system Tcl/Tk 8.4.7 instead of my  
> ActiveState 8.4.19 (which is in /Library/Frameworks where one would  
> expect).

That's very strange. I had ActiveState 8.4 installed (whatever was  
current about a month ago).

>
> Just out of curiosity: which 3rd party Tcl/Tk did you have installed  
> when you made the installer? Perhaps if it was 8.5 that would  
> explain it. If so I may try updating my Tcl/Tk -- I've been wanting  
> some of the bug fixes in 8.5 anyway.

Tcl 8.5 won't happen in 2.6, and might not happen in 2.7 either.   
Tkinter needs to work with the system version of Tcl, which is some  
version of 8.4, and Tkinter will not work when the major release of  
Tcl is different from the one used during the compile. That makes it  
rather hard to support both 8.4 and 8.5 in the same installer.

Ronald

>
> -- Russell
>
> On Apr 16, 2009, at 5:35 AM, Ronald Oussoren wrote:
>
>>
>> On 15 Apr, 2009, at 22:47, Russell E. Owen wrote:
>>
>>> Thank you for 2.6.2.
>>>
>>> I see the Mac binary installer isn't out yet (at least it is not  
>>> listed
>>> on the downloads page). Any chance that it will be compatible with  
>>> 3rd
>>> party Tcl/Tk?
>>
>> The Mac installer is late because I missed the pre-announcement of  
>> the 2.6.2 tag. I sent the installer to Barry earlier today.
>>
>> The installer was built using a 3rd-party installation of Tcl/Tk.
>>
>> Ronald
>

From glyph at divmod.com  Fri Apr 17 09:02:31 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 17 Apr 2009 07:02:31 -0000
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <20090417045411.67E4B3A4100@sparrow.telecommunity.com>
References: <49DB6A1F.50801@egenix.com>
	<20090414162603.70C843A4100@sparrow.telecommunity.com>
	<49E4F93B.6010802@egenix.com>
	<20090415003026.B0A783A4114@sparrow.telecommunity.com>
	<49E59202.6050809@egenix.com>
	<20090415144147.6845F3A4100@sparrow.telecommunity.com>
	<49E60832.8030806@egenix.com>
	<20090415175704.966B13A4100@sparrow.telecommunity.com>
	<20090415185221.GB13696@amk-desktop.matrixgroup.net>
	<20090415192021.558E53A4119@sparrow.telecommunity.com>
	<94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com>
	<20090415210902.848443A4100@sparrow.telecommunity.com>
	<20090416034602.12555.179034490.divmod.xquotient.8434@weber.divmod.com>
	<20090416153350.702303A4100@sparrow.telecommunity.com>
	<20090417035822.12555.1669891463.divmod.xquotient.8566@weber.divmod.com>
	<20090417045411.67E4B3A4100@sparrow.telecommunity.com>
Message-ID: <20090417070231.12555.1552701942.divmod.xquotient.8602@weber.divmod.com>

On 04:56 am, pje at telecommunity.com wrote:
>At 03:58 AM 4/17/2009 +0000, glyph at divmod.com wrote:
>>Just as a use-case: would the Java "com.*" namespace be an example of 
>>a "pure package with no base"?  i.e. lots of projects are in it, but 
>>no project owns it?
>
>Er, I suppose.  I was thinking more of the various 'com.foo' and 
>'org.bar' packages as being the pure namespaces in question.  For 
>Python, a "flat is better than nested" approach seems fine at the 
>moment.

Sure.  I wasn't saying we should go down the domain-names-are-package- 
names road for Python itself, just that "com.*" is a very broad example 
of a multi-"vendor" namespace :).
>>Entries on __path__ slow down import, so my understanding of the 
>>platonic ideal of a system python installation is one which has a 
>>single directory where all packages reside, and a set of metadata off 
>>to the side explaining which files belong to which distributions so 
>>they can be uninstalled by a package manager.

>True... except that part of the function of the PEP is to ensure that 
>if you install those separately-distributed modules to the same 
>directory, it still needs to work as a package and not have any inter- 
>package file conflicts.

Are you just referring to anything other than the problem of multiple 
packages overwriting __init__.py here?  It's phrased in a very general 
way that makes me think maybe there's something else going on.
>>So another clarification I'd like in the PEP is an explanation of 
>>motivation.  For example, it comes as a complete surprise to me that 
>>the expectation of namespace packages was to provide only single- 
>>source namespaces like zope.*, peak.*, twisted.*.  As I mentioned 
>>above, I implicitly thought this was more for com.*, 
>>twisted.plugins.*.
>
>Well, aside from twisted.plugins, I wasn't aware of anybody in Python 
>doing that...  and as I described, I never really interpreted that 
>through the lens of "namespace package" vs. "plugin finding".

There is some overlap.  In particular, in the "vendor distribution" 
case, I would like there to be one nice, declarative Python way to say 
"please put these modules into the same package".  Debian in 
particular has produced some badly broken Twisted packages in the 
past because there was no standard Python way to say "I have some 
modules here that go into an existing package".  Since every 
distribution has its own funny ideas about what the filesystem should 
look like, this has come up for us in a variety of ways.

I'd like it if we could use the "official" way of declaring a namespace 
package for that.
>>Right now it just says that it's a package which resides in multiple 
>>directories, and it's not made clear why that's a desirable feature.
>
>Good point; perhaps you can suggest some wording on these matters to 
>Martin?

I think the thing I said in my previous message about "multiple 
distributions" is a good start.  That might not be everything, but I 
think it's clearly the biggest motivation.
>>Okay.  So what I'm hearing is that Twisted should happily continue 
>>using our own wacky __path__-calculation logic for twisted.plugins, 
>>but that *twisted* should be a namespace package so that our separate 
>>distributions (TwistedCore, TwistedWeb, TwistedConch, et. al.) can be 
>>installed into separate directories.
>
>Yes.

I'm fairly happy with that, except the aforementioned communication- 
channel-with-packagers feature of namespace packages; they unambiguously 
say "multiple OS packages may contribute modules to this Python 
package".
>Thanks for taking the time to participate in this and add another 
>viewpoint to the mix, not to mention clarifying some areas where the 
>PEP could be clearer.

My pleasure.

From robert.kern at gmail.com  Fri Apr 17 09:46:57 2009
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 17 Apr 2009 02:46:57 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <18919.61318.106749.848833@montanaro.dyndns.org>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>	<18918.61476.980951.991275@montanaro.dyndns.org>	<b8ad139e0904161131g2f2b84fbpd67952697952afa9@mail.gmail.com>	<18919.51931.874515.848841@montanaro.dyndns.org>	<b8ad139e0904161801t56a9be3o999d7f4b96a0cb5f@mail.gmail.com>
	<18919.61318.106749.848833@montanaro.dyndns.org>
Message-ID: <gs9c5h$74r$1@ger.gmane.org>

On 2009-04-16 21:55, skip at pobox.com wrote:
>      Jess>  If, on the other hand, one of the committers wants to toss this in
>      Jess>  at some point, whether now or 3 versions down the road, the patch
>      Jess>  is up at bugs.python.org (and I'm happy to make any suggested
>      Jess>  modifications).
>
> Again, I think it needs to bake a bit.  I understand the desire and need for
> doing date arithmetic with months.  Python is mature enough though that I
> don't think you can just "toss this in".  It should be available as a module
> outside of Python so people can beat on it, flush out bugs, make suggestions
> for enhancements, whatever.  I believe you mentioned putting it up on PyPI.
> I think that's an excellent idea.
>
> I've used parts of Gustavo Niemeyer's dateutil package for a couple years
> and love it.  It's widely used.  Adding it to dateutil seems like another
> possibility.  That would guarantee an instant user base.  From there, if it
> is found to be useful it could make the leap to be part of the datetime
> module.

dateutil.relativedelta appears to do everything monthdelta does and more in a 
general way. Adding monthdelta to dateutil doesn't seem to make much sense.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

From asmodai at in-nomine.org  Fri Apr 17 10:50:16 2009
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Fri, 17 Apr 2009 10:50:16 +0200
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <18919.61318.106749.848833@montanaro.dyndns.org>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
	<b8ad139e0904161131g2f2b84fbpd67952697952afa9@mail.gmail.com>
	<18919.51931.874515.848841@montanaro.dyndns.org>
	<b8ad139e0904161801t56a9be3o999d7f4b96a0cb5f@mail.gmail.com>
	<18919.61318.106749.848833@montanaro.dyndns.org>
Message-ID: <20090417085016.GD24948@nexus.in-nomine.org>

-On [20090417 04:55], skip at pobox.com (skip at pobox.com) wrote:
>Again, I think it needs to bake a bit.  I understand the desire and need for
>doing date arithmetic with months.  Python is mature enough though that I
>don't think you can just "toss this in".  It should be available as a module
>outside of Python so people can beat on it, flush out bugs, make suggestions
>for enhancements, whatever. 

I think people should look at mx.DateTime a bit, including its
documentation.

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
To do injustice is more disgraceful than to suffer it...

From solipsis at pitrou.net  Fri Apr 17 11:34:21 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 17 Apr 2009 09:34:21 +0000 (UTC)
Subject: [Python-Dev] Issue5434: datetime.monthdelta
References: <b8ad139e0904161602x78d93ea7y16d41081b6dfd347@mail.gmail.com>
Message-ID: <loom.20090417T093310-707@post.gmane.org>

Jess Austin <jess.austin <at> gmail.com> writes:
> 
> I have worked in utility/telecom billing, and needed to examine large
> numbers of invoice dates, fulfillment dates, disconnection dates,
> payment dates, collection event dates, etc.  There would often be
> particular rules for the relationships among these dates, and since
> many companies generate invoices every day of the month, you couldn't
> rely on rules like "this always happens on the 5th".

But, as you say, these are /particular rules/. Why do you think they would be
the same in another industry, or even another telecom company? Why should they
be integrated in Python's standard distribution?

From solipsis at pitrou.net  Fri Apr 17 11:37:13 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 17 Apr 2009 09:37:13 +0000 (UTC)
Subject: [Python-Dev] Issue5434: datetime.monthdelta
References: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
	<loom.20090416T214556-65@post.gmane.org>
	<52C09164-4AE5-42F2-AA4B-52ABFEFCD93D@fuhm.net>
Message-ID: <loom.20090417T093442-350@post.gmane.org>

James Y Knight <foom <at> fuhm.net> writes:
> 
> It's a human-interface operation, and as such, everyone (ahem) "knows  
> what it means" to say "2 months from now", but the details don't  
> usually have to be thought about too much.

I don't think that's true. When you say "2 months from now", some people will
think "9 weeks from now" (or "10 weeks from now"), others "60 days from now",
and yet others will think of the meaning this proposal gives it.

That's why, when scheduling a meeting, you don't say "2 months from now". You
give a precise date instead, because you know otherwise people wouldn't show up
on the same day.
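
A quick sketch of how far apart those readings land, using only the
stdlib.  The starting date is just the date of this message, and
same_day_in_n_months is a hypothetical helper that assumes the "same day
of the month, clamped to the month's last day" reading of the proposal.

```python
import calendar
from datetime import date, timedelta


def same_day_in_n_months(d, months):
    """Hypothetical helper: same day of the month, clamped if needed."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


today = date(2009, 4, 17)

# Four plausible readings of "2 months from now".
readings = {
    "9 weeks": today + timedelta(weeks=9),
    "10 weeks": today + timedelta(weeks=10),
    "60 days": today + timedelta(days=60),
    "calendar months": same_day_in_n_months(today, 2),
}

for label, when in sorted(readings.items(), key=lambda item: item[1]):
    print(label, when.isoformat())
```

For this starting date the four readings give four different days.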

Regards

Antoine.

From piet at cs.uu.nl  Fri Apr 17 11:42:59 2009
From: piet at cs.uu.nl (Piet van Oostrum)
Date: Fri, 17 Apr 2009 11:42:59 +0200
Subject: [Python-Dev] RELEASED Python 2.6.2
In-Reply-To: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> (Barry Warsaw's
	message of "Wed\, 15 Apr 2009 11\:45\:08 -0400")
References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org>
Message-ID: <m2ab6fvcnw.fsf@cs.uu.nl>

>>>>> Barry Warsaw <barry at python.org> (BW) wrote:

>BW> On behalf of the Python community, I'm happy to announce the  availability
>BW> of Python 2.6.2.  This is the latest production-ready  version in the
>BW> Python 2.6 series.  Dozens of issues have been fixed  since Python 2.6.1
>BW> was released back in December.  Please see the NEWS  file for all the gory
>BW> details.

>BW>     http://www.python.org/download/releases/2.6.2/NEWS.txt

>BW> For more information on Python 2.6 in general, please see

>BW>      http://docs.python.org/dev/whatsnew/2.6.html

>BW> Source tarballs, Windows installers, and (soon) Mac OS X disk images  can
>BW> be downloaded from the Python 2.6.2 page:

>BW>     http://www.python.org/download/releases/2.6.2/

Maybe a link to the MacOSX image can also be added to
http://www.python.org/download 
-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org

From barry at python.org  Fri Apr 17 14:57:49 2009
From: barry at python.org (Barry Warsaw)
Date: Fri, 17 Apr 2009 08:57:49 -0400
Subject: [Python-Dev] RELEASED Python 2.6.2
In-Reply-To: <m2ab6fvcnw.fsf@cs.uu.nl>
References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org>
	<m2ab6fvcnw.fsf@cs.uu.nl>
Message-ID: <7E3DD051-76DC-44A4-AB9F-DF462D764BF4@python.org>

On Apr 17, 2009, at 5:42 AM, Piet van Oostrum wrote:

> Maybe a link to the MacOSX image can also be added to
> http://www.python.org/download

Done.
-Barry

From skip at pobox.com  Fri Apr 17 15:45:05 2009
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 17 Apr 2009 08:45:05 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <49E800CE.6000707@canterbury.ac.nz>
References: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
	<loom.20090416T214556-65@post.gmane.org>
	<200904171242.46009.steve@pearwood.info>
	<49E800CE.6000707@canterbury.ac.nz>
Message-ID: <18920.34785.182858.323654@montanaro.dyndns.org>

    >> "2rd of March on leap years,
    >   ^^^

    > The turd of March?

Yeah, it's from a little known Shakespearean play about a benevolent
dictator, Guidius van Rossumus.  The name of the play escapes me at the
moment, but there's this critical scene where the BDFL is in mortal danger
because of ongoing schemes by the members of the PSU.  His one true friend
and eventual replacement, Barius Warsawvius, known as the FLUFL, tries to
warn him surreptitiously about the dangers lurking all about.  Barius utters
this immortal quote, "Beware the Turd of March."  Unfortunately, the drama
of that scene tends to be lost on modern audiences.  Upon hearing that
famous utterance they tend to break out in laughter, especially if the
audience is made up mostly of boys under the age of twelve.

-- 
Skip Montanaro - skip at pobox.com - http://www.smontanaro.net/
        "XML sucks, dictionaries rock" - Dave Beazley

From Scott.Daniels at Acm.Org  Fri Apr 17 16:58:55 2009
From: Scott.Daniels at Acm.Org (Scott David Daniels)
Date: Fri, 17 Apr 2009 07:58:55 -0700
Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143
In-Reply-To: <49E8020E.3010109@canterbury.ac.nz>
References: <mailman.14603.1239912495.11745.python-dev@python.org>	<49E7E1AE.5030809@canterbury.ac.nz>	<43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com>	<200904171320.13025.steve@pearwood.info>
	<49E8020E.3010109@canterbury.ac.nz>
Message-ID: <gsa57a$upr$1@ger.gmane.org>

Greg Ewing wrote:
> Steven D'Aprano wrote:
>> it should be obvious in the same way that string concatenation is 
>> different from numerical addition:
>>
>> 1 + 2 = 2 + 1
>> '1' + '2' != '2' + '1'
> 
> However, the proposed arithmetic isn't just non-
> commutative, it's non-associative, which is a
> much rarer and more surprising thing. We do
> at least have
> 
>   ('1' + '2') + '3' == '1' + ('2' + '3')
> 
But we don't have:
     (1e40 + -1e40) + 1 == 1e40 + (-1e40 + 1)

Non-associativity is what makes for floating point headaches.
To my knowledge, floating point is at least commutative.
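
Spelled out as a runnable check with ordinary Python floats (the variable
names are only for this example):

```python
big = 1e40

# Grouping matters: 1 is far below the precision available at 1e40,
# so it is lost when added to -1e40 first.
left = (big + -big) + 1
right = big + (-big + 1)
print(left, right, left == right)

# Swapping the operands of a single addition makes no difference here.
print(big + 1 == 1 + big)
```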

--Scott David Daniels
Scott.Daniels at Acm.Org

From dickinsm at gmail.com  Fri Apr 17 17:42:10 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Fri, 17 Apr 2009 16:42:10 +0100
Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143
In-Reply-To: <gsa57a$upr$1@ger.gmane.org>
References: <mailman.14603.1239912495.11745.python-dev@python.org>
	<49E7E1AE.5030809@canterbury.ac.nz>
	<43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com>
	<200904171320.13025.steve@pearwood.info>
	<49E8020E.3010109@canterbury.ac.nz> <gsa57a$upr$1@ger.gmane.org>
Message-ID: <5c6f2a5d0904170842h5667188dud795e6855245f52a@mail.gmail.com>

On Fri, Apr 17, 2009 at 3:58 PM, Scott David Daniels
<Scott.Daniels at acm.org> wrote:
> Non-associativity is what makes for floating point headaches.
> To my knowledge, floating point is at least commutative.

Well, mostly. :-)

>>> from decimal import Decimal
>>> x, y = Decimal('NaN123'), Decimal('-NaN456')
>>> x + y
Decimal('NaN123')
>>> y + x
Decimal('-NaN456')

Similar effects can happen with regular IEEE 754 binary doubles,
but Python doesn't expose NaN payloads or signs, so we don't
see those effects within Python.

Mark

From status at bugs.python.org  Fri Apr 17 18:06:56 2009
From: status at bugs.python.org (Python tracker)
Date: Fri, 17 Apr 2009 18:06:56 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20090417160656.09609781DE@psf.upfronthosting.co.za>

ACTIVITY SUMMARY (04/10/09 - 04/17/09)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.

 2222 open (+37) / 15383 closed (+12) / 17605 total (+49)

Open issues with patches:   852

Average duration of open issues: 642 days.
Median duration of open issues: 393 days.

Open Issues Breakdown
   open  2168 (+37)
pending    54 ( +0)

Issues Created Or Reopened (50)
_______________________________

ignore py3_test_grammar.py syntax error                          04/11/09
CLOSED http://bugs.python.org/issue5733    reopened benjamin.peterson             

BufferedRWPair broken                                            04/11/09
       http://bugs.python.org/issue5734    created  bquinlan                      
       patch                                                                   

Segfault when loading not recompiled module                      04/11/09
       http://bugs.python.org/issue5735    created  chin                          
       patch, needs review                                                     

Add the iterator protocol to dbm modules                         04/11/09
       http://bugs.python.org/issue5736    created  akitada                       
       patch                                                                   

add Solaris errnos                                               04/11/09
       http://bugs.python.org/issue5737    created  mahrens                       
       easy                                                                    

multiprocessing example wrong                                    04/11/09
       http://bugs.python.org/issue5738    created  yaneurabeya                   

Language reference is ambiguous regarding next() method lookup   04/12/09
       http://bugs.python.org/issue5739    created  ncoghlan                      

multiprocessing.connection.Client API documentation incorrect    04/12/09
       http://bugs.python.org/issue5740    created  yaneurabeya                   

SafeConfigParser incorrectly detects lone percent signs          04/12/09
CLOSED http://bugs.python.org/issue5741    created  marcio                        

inspect.findsource() should look only for sources                04/12/09
       http://bugs.python.org/issue5742    created  hdima                         
       patch                                                                   

multiprocessing.managers not accessible even though docs say so  04/12/09
CLOSED http://bugs.python.org/issue5743    created  yaneurabeya                   

multiprocessing.managers.BaseManager.connect example typos       04/12/09
CLOSED http://bugs.python.org/issue5744    reopened quiver                        

email document update (more links)                               04/13/09
CLOSED http://bugs.python.org/issue5745    created  ocean-city                    
       patch                                                                   

socketserver problem upon disconnection (undefined member)       04/13/09
CLOSED http://bugs.python.org/issue5746    created  eblond                        

knowing the parent command                                       04/13/09
       http://bugs.python.org/issue5747    created  tarek                         

Objects/bytesobject.c should include stringdefs.h, instead of de 04/13/09
       http://bugs.python.org/issue5748    created  eric.smith                    
       easy                                                                    

Allow bin() to have an optional "Total Bits" argument.           04/14/09
CLOSED http://bugs.python.org/issue5749    created  MechPaul                      

weird seg fault                                                  04/14/09
CLOSED http://bugs.python.org/issue5750    created  utilitarian                   

Typo in documentation of print function parameters               04/14/09
       http://bugs.python.org/issue5751    created  nicolasg                      

xml.dom.minidom does not handle newline characters in attribute  04/14/09
       http://bugs.python.org/issue5752    created  Tomalak                       

CVE-2008-5983 python: untrusted python modules search path       04/14/09
       http://bugs.python.org/issue5753    created  iankko                        
       patch                                                                   

Shelve module writeback parameter does not act as advertised     04/14/09
       http://bugs.python.org/issue5754    created  jherskovic                    

"-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++"   04/14/09
       http://bugs.python.org/issue5755    created  zooko                         

idle pydoc et al removed from 3.1 without versioned replacements 04/14/09
       http://bugs.python.org/issue5756    created  nad                           

Documentation error for Condition.notify()                       04/14/09
       http://bugs.python.org/issue5757    created  pietvo                        

fileinput.hook_compressed returning bytes from gz file           04/14/09
       http://bugs.python.org/issue5758    created  mnewman                       

__float__ not called by 'float' on classes derived from str      04/15/09
CLOSED http://bugs.python.org/issue5759    created  shura_zam                     
       patch                                                                   

__getitem__ error message hard to understand                     04/15/09
       http://bugs.python.org/issue5760    created  cvrebert                      

add file name to py3k IO objects repr()                          04/15/09
       http://bugs.python.org/issue5761    created  pitrou                        

AttributeError: 'NoneType' object has no attribute 'replace'     04/15/09
       http://bugs.python.org/issue5762    created  hda                           

scope resolving error                                            04/15/09
CLOSED http://bugs.python.org/issue5763    created  vpodpecan                     

2.6.2 Python Manuals CHM file seems broken                       04/15/09
       http://bugs.python.org/issue5764    created  dx617                         

stack overflow evaluating eval("()" * 30000)                     04/15/09
       http://bugs.python.org/issue5765    created  gagenellina                   

Mac/scripts/BuildApplet.py reset of sys.executable during instal 04/16/09
       http://bugs.python.org/issue5766    created  blb                           

xmlrpclib loads invalid documents                                04/16/09
       http://bugs.python.org/issue5767    created  exarkun                       

logging don't encode Unicode message correctly.                  04/16/09
CLOSED http://bugs.python.org/issue5768    created  naoki                         
       patch                                                                   

OS X Installer: new make of documentation installs at wrong loca 04/16/09
       http://bugs.python.org/issue5769    created  nad                           

SA bugs with unittest.py                                         04/16/09
CLOSED http://bugs.python.org/issue5770    created  yaneurabeya                   
       patch                                                                   

SA bugs with unittest.py at r71263                                  04/16/09
       http://bugs.python.org/issue5771    created  yaneurabeya                   
       patch                                                                   

For float.__format__, don't add a trailing ".0" if we're using n 04/16/09
       http://bugs.python.org/issue5772    created  eric.smith                    
       easy                                                                    

Crash on shutdown after os.fdopen(2) in debug builds             04/16/09
       http://bugs.python.org/issue5773    created  amaury.forgeotdarc            

_winreg.OpenKey() is documented with keyword arguments, but does 04/16/09
       http://bugs.python.org/issue5774    created  stutzbach                     

marshal.c needs to be checked for out of memory errors           04/16/09
       http://bugs.python.org/issue5775    created  eric.smith                    

RPM build error with python-2.6.spec                             04/17/09
       http://bugs.python.org/issue5776    created  yasusii                       
       patch                                                                   

unable to search in python V3 documentation                      04/17/09
       http://bugs.python.org/issue5777    created  aotto1968                     

sys.version format differs between MSC and GCC                   04/17/09
       http://bugs.python.org/issue5778    created  t-kamiya                      

_elementtree import can fail silently                            04/17/09
CLOSED http://bugs.python.org/issue5779    created  naufraghi                     

test_float fails for 'legacy' float repr style                   04/17/09
       http://bugs.python.org/issue5780    created  marketdickinson               
       patch                                                                   

Legacy float repr is used unnecessarily on some platforms        04/17/09
       http://bugs.python.org/issue5781    created  marketdickinson               
       easy                                                                    

',' formatting with empty format type '' (PEP 378)               04/17/09
       http://bugs.python.org/issue5782    created  eric.smith                    
       easy                                                                    

Issues Now Closed (37)
______________________

Use shorter float repr when possible                              493 days
       http://bugs.python.org/issue1580    marketdickinson               
       patch                                                                   

"make altinstall" installs pydoc, idle, smtpd.py                  490 days
       http://bugs.python.org/issue1590    nad                           
       patch                                                                   

PyString_FromStringAndSize() to be considered unsafe              369 days
       http://bugs.python.org/issue2587    psss                          

PyOS_vsnprintf() underflow leads to memory corruption             371 days
       http://bugs.python.org/issue2588    psss                          
       patch                                                                   

Handle ASDLSyntaxErrors gracefully                                347 days
       http://bugs.python.org/issue2725    georg.brandl                  
       patch                                                                   

Starting any program as a subprocess fails when subprocess.Popen  265 days
       http://bugs.python.org/issue3440    r.david.murray                
       patch                                                                   

Allow Division of datetime.timedelta Objects                      156 days
       http://bugs.python.org/issue4291    Jeremy Banks                  

asyncore's urgent data management and connection closed events    131 days
       http://bugs.python.org/issue4501    r.david.murray                
       patch                                                                   

handling inf/nan in '%f'                                          101 days
       http://bugs.python.org/issue4799    eric.smith                    
       patch                                                                   

native build of python win32 using msys under wine.                90 days
       http://bugs.python.org/issue4954    lritter                       

test_maxint64 fails on 32-bit systems due to assumption that 64-   81 days
       http://bugs.python.org/issue4977    pitrou                        

io-c: TextIOWrapper is faster than BufferedReader but not protec   25 days
       http://bugs.python.org/issue5502    pitrou                        
       patch                                                                   

Lib/distutils/test/test_util: test_get_platform bogus for OSX      14 days
       http://bugs.python.org/issue5607    tarek                         

Add more pickling tests                                            14 days
       http://bugs.python.org/issue5665    collinwinter                  
       patch, needs review                                                     

string module requires bytes type for maketrans, but calling met   10 days
       http://bugs.python.org/issue5675    georg.brandl                  

pydoc -w doesn't produce proper HTML                                6 days
       http://bugs.python.org/issue5698    georg.brandl                  
       patch, patch, needs review                                              

inside *currentmodule* some links is disabled                       7 days
       http://bugs.python.org/issue5703    georg.brandl                  

Command line option '-3' should imply '-t'                          7 days
       http://bugs.python.org/issue5704    georg.brandl                  

setuptools doesn't honor standard compiler variables                7 days
       http://bugs.python.org/issue5706    tarek                         

Tiny code polishing to unicode_repeat                               6 days
       http://bugs.python.org/issue5708    georg.brandl                  
       patch                                                                   

optparse: please provide a usage example in the module docstring    5 days
       http://bugs.python.org/issue5719    georg.brandl                  

ignore py3_test_grammar.py syntax error                             0 days
       http://bugs.python.org/issue5733    benjamin.peterson             

SafeConfigParser incorrectly detects lone percent signs             1 days
       http://bugs.python.org/issue5741    georg.brandl                  

multiprocessing.managers not accessible even though docs say so     0 days
       http://bugs.python.org/issue5743    yaneurabeya                   

multiprocessing.managers.BaseManager.connect example typos          0 days
       http://bugs.python.org/issue5744    yaneurabeya                   

email document update (more links)                                  0 days
       http://bugs.python.org/issue5745    georg.brandl                  
       patch                                                                   

socketserver problem upon disconnection (undefined member)          1 days
       http://bugs.python.org/issue5746    benjamin.peterson             

Allow bin() to have an optional "Total Bits" argument.              0 days
       http://bugs.python.org/issue5749    rhettinger                    

weird seg fault                                                     0 days
       http://bugs.python.org/issue5750    amaury.forgeotdarc            

__float__ not called by 'float' on classes derived from str         1 days
       http://bugs.python.org/issue5759    benjamin.peterson             
       patch                                                                   

scope resolving error                                               0 days
       http://bugs.python.org/issue5763    marketdickinson               

logging don't encode Unicode message correctly.                     1 days
       http://bugs.python.org/issue5768    vsajip                        
       patch                                                                   

SA bugs with unittest.py                                            0 days
       http://bugs.python.org/issue5770    benjamin.peterson             
       patch                                                                   

_elementtree import can fail silently                               0 days
       http://bugs.python.org/issue5779    naufraghi                     

DistributionMetaData error ?                                     2217 days
       http://bugs.python.org/issue708320  varash                        

file.seek() influences write() when opened with a+ mode          1008 days
       http://bugs.python.org/issue1521491 amaury.forgeotdarc            

Popen pipe file descriptor leak on OSError in init                641 days
       http://bugs.python.org/issue1751245 benjamin.peterson             

Top Issues Most Discussed (10)
______________________________

 10 io.FileIO calls flush() after file closed                         12 days
open    http://bugs.python.org/issue5700   

  9 Add the iterator protocol to dbm modules                           6 days
open    http://bugs.python.org/issue5736   

  7 2.6.2c1 fails to pass test_cmath on Solaris10                      9 days
open    http://bugs.python.org/issue5724   

  6 test_float fails for 'legacy' float repr style                     0 days
open    http://bugs.python.org/issue5780   

  6 Add test.support.import_python_only                               53 days
open    http://bugs.python.org/issue5354   

  6 10e667.__format__('+') should return 'inf'                       137 days
open    http://bugs.python.org/issue4482   

  5 ignore py3_test_grammar.py syntax error                            0 days
closed  http://bugs.python.org/issue5733   

  5 setdefault speedup                                                 8 days
open    http://bugs.python.org/issue5730   

  5 Support telling TestResult objects a test run has finished         8 days
open    http://bugs.python.org/issue5728   

  5 IDLE shell gives different len() of unicode strings compared to  973 days
open    http://bugs.python.org/issue1542677

From rowen at u.washington.edu  Fri Apr 17 18:23:46 2009
From: rowen at u.washington.edu (Russell Owen)
Date: Fri, 17 Apr 2009 09:23:46 -0700
Subject: [Python-Dev] RELEASED Python 2.6.2
In-Reply-To: <92CB905D-99F8-4727-A1AE-1772EA3ED79C@mac.com>
References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org>
	<rowen-EDC16C.13470815042009@news.gmane.org>
	<FE67E6CC-039C-4524-968F-BB2D1FD42AA3@mac.com>
	<DD982BD4-02AB-4395-AFEE-CD3D0EEB7926@u.washington.edu>
	<92CB905D-99F8-4727-A1AE-1772EA3ED79C@mac.com>
Message-ID: <583B711E-5FD3-4C5A-9DF2-B1725E0F53A7@u.washington.edu>

On Apr 16, 2009, at 11:17 PM, Ronald Oussoren wrote:

> On 16 Apr, 2009, at 20:58, Russell Owen wrote:
>
>> I installed the Mac binary on my Intel 10.5.6 system and it works,  
>> except it still uses Apple's system Tcl/Tk 8.4.7 instead of my  
>> ActiveState 8.4.19 (which is in /Library/Frameworks where one would  
>> expect).
>
> That's very strange. I had ActiveState 8.4 installed (whatever was  
> current about a month ago).

I agree. (For what it's worth, you probably have Tcl/Tk 8.4.19 -- a  
version I've found to be very robust. 8.4.19 was released a while ago  
and is probably the last version of 8.4 we will see, since all  
development is happening on 8.5 now).

Could you try a simple experiment (assuming you still have ActiveState  
Tcl/Tk installed): run python from the command line and enter these  
commands:
import Tkinter
root = Tkinter.Tk()

Then go to the application that comes up and select About Tcl/Tk...  
(in the Python menu) and see what version it reports. When I run with  
the Mac binary of 2.6.2 it reports 8.4.7 (Apple's built-in Tcl/Tk).  
When I build python 2.6.2 from source it reports 8.4.19 (my  
ActiveState Tcl/Tk).

>> Just out of curiosity: which 3rd party Tcl/Tk did you have  
>> installed when you made the installer? Perhaps if it was 8.5 that  
>> would explain it. If so I may try updating my Tcl/Tk -- I've been  
>> wanting some of the bug fixes in 8.5 anyway.
>
> Tcl 8.5 won't happen in 2.6, and might not happen in 2.7 either.   
> Tkinter needs to work with the system version of Tcl, which is some  
> version of 8.4,  Tkinter will not work when the major release of Tcl  
> is different than during the compile. That makes it rather hard to  
> support both 8.4 and 8.5 in the same installer.

Perfect. I agree.

-- Russell

From ajaksu at gmail.com  Fri Apr 17 20:04:45 2009
From: ajaksu at gmail.com (Daniel (ajax) Diniz)
Date: Fri, 17 Apr 2009 15:04:45 -0300
Subject: [Python-Dev] Experimental and Test Tracker instances live
Message-ID: <2d75d7660904171104k3d13427au8e91489132cd2e23@mail.gmail.com>

Hi,
As discussed before, I have put two mock Python Tracker instances online.

The Test[1] instance follows bugs.python.org code, so we can test
bugfixes and procedures without breaking the real tracker. The
Experimental[2] one, aka the cool instance, is where new features are
showcased.

Currently no emails are being sent and the dbs can be reset at any
time. If you'd like to play as a registered user, please email me and
I'll create a user (or activate the one you've started to register).

So far, the new features[3] include:
   * Issue tags [4],[5]
   * Quiet properties [6]
   * Restore removed messages and files [7]
   * Claim ('assign to self') and add/remove self as nosy buttons [8]
   * Don't close issues with open dependencies [9]
   * Auto-add nosy users based on Components [10]
   * "Email me" buttons for messages and issues, "Reply by email" [11]
   * RSS feeds (per issue and global) [12]
   * Display selected issues in the index view [13]

You can subscribe to an RSS feed[14] about the new features.

Thanks to everyone who filed RFEs, there's still time to submit yours :)

Regards,
Daniel

[1] http://bot.bio.br/python-dev/
[2] http://bot.bio.br/python-dev-exp/
[3] http://bot.bio.br/python-dev-exp/issue5
[4] http://mail.python.org/pipermail/tracker-discuss/2009-April/002099.html
[5] http://codereview.appspot.com/40100/show
[6] http://psf.upfronthosting.co.za/roundup/meta/issue249
[7] http://psf.upfronthosting.co.za/roundup/meta/issue267
[8] http://psf.upfronthosting.co.za/roundup/meta/issue258
[9] http://psf.upfronthosting.co.za/roundup/meta/issue266
[10] http://psf.upfronthosting.co.za/roundup/meta/issue258
[11] http://psf.upfronthosting.co.za/roundup/meta/issue245
[12] http://psf.upfronthosting.co.za/roundup/meta/issue155
[13] http://psf.upfronthosting.co.za/roundup/meta/issue246
[14] http://bot.bio.br/python-dev-exp/issue5?@template=feed

From bjourne at gmail.com  Fri Apr 17 22:41:49 2009
From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=)
Date: Fri, 17 Apr 2009 22:41:49 +0200
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <loom.20090417T093442-350@post.gmane.org>
References: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
	<loom.20090416T214556-65@post.gmane.org>
	<52C09164-4AE5-42F2-AA4B-52ABFEFCD93D@fuhm.net>
	<loom.20090417T093442-350@post.gmane.org>
Message-ID: <740c3aec0904171341q78c88633ie4179791a123f251@mail.gmail.com>

It's not only about what people find intuitive. Why care about them?
Most persons aren't programmers. It is about what application
developers find useful too. I have often needed to calculate month
deltas according to the proposal. I suspect many other programmers
have too. Writing a month add function isn't entirely trivial and
would be a good candidate for stdlib imho.

2009/4/17, Antoine Pitrou <solipsis at pitrou.net>:
> James Y Knight <foom <at> fuhm.net> writes:
>>
>> It's a human-interface operation, and as such, everyone (ahem) "knows
>> what it means" to say "2 months from now", but the details don't
>> usually have to be thought about too much.
>
> I don't think it's true. When you say "2 months from now", some people will
> think "9 weeks from now" (or "10 weeks from now"), others "60 days from
> now", and yet others will think of the meaning this proposal gives it.
>
> That's why, when scheduling a meeting, you don't say "2 months from now".
> You give a precise date instead, because you know otherwise people wouldn't
> show up on the same day.
>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/bjourne%40gmail.com
>

-- 
mvh Björn

From aahz at pythoncraft.com  Fri Apr 17 22:49:28 2009
From: aahz at pythoncraft.com (Aahz)
Date: Fri, 17 Apr 2009 13:49:28 -0700
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <740c3aec0904171341q78c88633ie4179791a123f251@mail.gmail.com>
References: <b8ad139e0904161441g9782b75r1bd5811157eca079@mail.gmail.com>
	<loom.20090416T214556-65@post.gmane.org>
	<52C09164-4AE5-42F2-AA4B-52ABFEFCD93D@fuhm.net>
	<loom.20090417T093442-350@post.gmane.org>
	<740c3aec0904171341q78c88633ie4179791a123f251@mail.gmail.com>
Message-ID: <20090417204928.GA15121@panix.com>

On Fri, Apr 17, 2009, BJörn Lindqvist wrote:
>
> It's not only about what people find intuitive. Why care about them?
> Most persons aren't programmers. It is about what application
> developers find useful too. I have often needed to calculate month
> deltas according to the proposal. I suspect many other programmers
> have too. Writing a month add function isn't entirely trivial and
> would be a good candidate for stdlib imho.

At this point, further discussion really needs to move to python-ideas;
for acceptance in stdlib, there needs to be either well-accepted code out
in the community or a PEP for Guido to pronounce on (or probably both, in
the end).

I've set followups to python-ideas for convenience.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

From rowen at u.washington.edu  Sat Apr 18 00:47:11 2009
From: rowen at u.washington.edu (Russell E. Owen)
Date: Fri, 17 Apr 2009 15:47:11 -0700
Subject: [Python-Dev] RELEASED Python 2.6.2
References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org>
	<rowen-EDC16C.13470815042009@news.gmane.org>
	<FE67E6CC-039C-4524-968F-BB2D1FD42AA3@mac.com>
	<DD982BD4-02AB-4395-AFEE-CD3D0EEB7926@u.washington.edu>
	<nad-304E10.20284516042009@news.gmane.org>
Message-ID: <rowen-E2F734.15471117042009@news.gmane.org>

In article <nad-304E10.20284516042009 at news.gmane.org>,
 Ned Deily <nad at acm.org> wrote:

> In article <DD982BD4-02AB-4395-AFEE-CD3D0EEB7926 at u.washington.edu>,
>  Russell Owen <rowen at u.washington.edu> wrote:
> > I installed the Mac binary on my Intel 10.5.6 system and it works,  
> > except it still uses Apple's system Tcl/Tk 8.4.7 instead of my  
> > ActiveState 8.4.19 (which is in /Library/Frameworks where one would  
> > expect).
> > 
> > I just built python from source and that version does use ActiveState  
> > 8.4.19.
> > 
> > I wish I knew what's going on. Not being able to use the binary  
> > distros is a bit of a pain.
> 
> You're right, the tkinter included with the 2.6.2 installer is not 
> linked properly:
> 
> Is:
> $ cd /Library/Frameworks/Python.framework/Versions/2.6
> $ cd lib/python2.6/lib-dynload
> $ otool -L _tkinter.so 
> _tkinter.so:
>    /System/Library/Frameworks/Tcl.framework/Versions/8.4/Tcl 
> (compatibility version 8.4.0, current version 8.4.0)
>    /System/Library/Frameworks/Tk.framework/Versions/8.4/Tk 
> (compatibility version 8.4.0, current version 8.4.0)
>    /usr/lib/libSystem.B.dylib [...]
> 
> should be:
> _tkinter.so:
>    /Library/Frameworks/Tcl.framework/Versions/8.4/Tcl (compatibility 
> version 8.4.0, current version 8.4.19)
>    /Library/Frameworks/Tk.framework/Versions/8.4/Tk (compatibility 
> version 8.4.0, current version 8.4.19)
>    /usr/lib/libSystem.B.dylib [...]

Just for the record, when I built Python 2.6 from source I got the 
latter output (the desired result).

If someone can point me to instructions I'm willing to try to make a 
binary installer and make it available (though I'd much prefer to debug 
the standard installer).

-- Russell

From nad at acm.org  Sat Apr 18 01:13:34 2009
From: nad at acm.org (Ned Deily)
Date: Fri, 17 Apr 2009 16:13:34 -0700
Subject: [Python-Dev] RELEASED Python 2.6.2
References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org>
	<rowen-EDC16C.13470815042009@news.gmane.org>
	<FE67E6CC-039C-4524-968F-BB2D1FD42AA3@mac.com>
	<DD982BD4-02AB-4395-AFEE-CD3D0EEB7926@u.washington.edu>
	<nad-304E10.20284516042009@news.gmane.org>
	<rowen-E2F734.15471117042009@news.gmane.org>
Message-ID: <nad-121542.16133417042009@news.gmane.org>

In article <rowen-E2F734.15471117042009 at news.gmane.org>,
 "Russell E. Owen" <rowen at u.washington.edu> wrote:
> In article <nad-304E10.20284516042009 at news.gmane.org>,
>  Ned Deily <nad at acm.org> wrote:
> > In article <DD982BD4-02AB-4395-AFEE-CD3D0EEB7926 at u.washington.edu>,
> >  Russell Owen <rowen at u.washington.edu> wrote:
> > > I installed the Mac binary on my Intel 10.5.6 system and it works,  
> > > except it still uses Apple's system Tcl/Tk 8.4.7 instead of my  
> > > ActiveState 8.4.19 (which is in /Library/Frameworks where one would  
> > > expect).
> > > 
> > > I just built python from source and that version does use ActiveState  
> > > 8.4.19.
> > > 
> > > I wish I knew what's going on. Not being able to use the binary  
> > > distros is a bit of a pain.
> > 
> > You're right, the tkinter included with the 2.6.2 installer is not 
> > linked properly:
> > 
> > Is:
> > $ cd /Library/Frameworks/Python.framework/Versions/2.6
> > $ cd lib/python2.6/lib-dynload
> > $ otool -L _tkinter.so 
> > _tkinter.so:
> >    /System/Library/Frameworks/Tcl.framework/Versions/8.4/Tcl 
> > (compatibility version 8.4.0, current version 8.4.0)
> >    /System/Library/Frameworks/Tk.framework/Versions/8.4/Tk 
> > (compatibility version 8.4.0, current version 8.4.0)
> >    /usr/lib/libSystem.B.dylib [...]
> > 
> > should be:
> > _tkinter.so:
> >    /Library/Frameworks/Tcl.framework/Versions/8.4/Tcl (compatibility 
> > version 8.4.0, current version 8.4.19)
> >    /Library/Frameworks/Tk.framework/Versions/8.4/Tk (compatibility 
> > version 8.4.0, current version 8.4.19)
> >    /usr/lib/libSystem.B.dylib [...]
> 
> Just for the record, when I built Python 2.6 from source I got the 
> latter output (the desired result).
> 
> If someone can point me to instructions I'm willing to try to make a 
> binary installer and make it available (though I'd much prefer to debug 
> the standard installer).

I suspect Ronald will be fixing this in the standard installer soon.

-- 
 Ned Deily,
 nad at acm.org

From ncoghlan at gmail.com  Sat Apr 18 14:41:14 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 18 Apr 2009 22:41:14 +1000
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse
 in	3.1 (and urlparse in 2.7)
In-Reply-To: <d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>	<49D09ECF.5090407@trueblade.com>	<ad1f81530903300355g2e112cadwcf5250761d4e1f87@mail.gmail.com>	<49D0ACD5.5090209@gmail.com>	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>	<loom.20090412T215625-611@post.gmane.org>	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
Message-ID: <49E9CA6A.6060004@gmail.com>

Steven Bethard wrote:
> On Mon, Apr 13, 2009 at 1:14 PM, Mart Sõmermaa <mrts.pydev at gmail.com> wrote:
>> A default behaviour should be found that works according to most
>> user's expectations so that they don't need to use the positional
>> arguments generally.
> 
> I believe the usual Python approach here is to have two variants of
> the function, add_query_params and add_query_params_no_dups (or
> whatever you want to name them). That way the flag parameter is
> "named" right in the function name.

Yep - Guido has pointed out in a few different API design discussions
that a boolean flag that is almost always set to a literal True or False
is a good sign that there are two functions involved rather than just
one. There are exceptions to that guideline (e.g. the reverse argument
for sorted and list.sort), but they aren't common, and even when they do
crop up, making them keyword-only arguments is strongly recommended.
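
As a rough illustration of that guideline (a sketch only; the function
names below are hypothetical and are not the proposed urllib.parse API),
the flag can either be folded into two function names or, where one
function really is wanted, be kept as a keyword-only argument:

```python
def add_params(pairs, new_pairs):
    """Append new (key, value) pairs, keeping any duplicates."""
    return list(pairs) + list(new_pairs)


def add_params_no_dups(pairs, new_pairs):
    """Append only the pairs that are not already present."""
    result = list(pairs)
    for pair in new_pairs:
        if pair not in result:
            result.append(pair)
    return result


def add_params_flag(pairs, new_pairs, *, allow_dups=True):
    """Single-function variant: the bare * makes allow_dups keyword-only."""
    if allow_dups:
        return add_params(pairs, new_pairs)
    return add_params_no_dups(pairs, new_pairs)
```

With the keyword-only form, a call such as add_params_flag(a, b, False)
raises TypeError, so the flag always has to be spelled out at the call
site. Note that the bare * syntax is only available in Python 3.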

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From regebro at gmail.com  Sat Apr 18 18:03:52 2009
From: regebro at gmail.com (Lennart Regebro)
Date: Sat, 18 Apr 2009 18:03:52 +0200
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
Message-ID: <319e029f0904180903t39941b55o34478b868f09a876@mail.gmail.com>

On Thu, Apr 16, 2009 at 08:18, Jess Austin <jess.austin at gmail.com> wrote:
> hi,
>
> I'm new to python core development, and I've been advised to write to
> python-dev concerning a feature/patch I've placed at
> http://bugs.python.org/issue5434, with Rietveld at
> http://codereview.appspot.com/25079.
>
> This patch adds a "monthdelta" class and a "monthmod" function to the
> datetime module. The monthdelta class is much like the existing
> timedelta class, except that it represents months offset from a date,
> rather than an exact period offset from a date. This allows us to
> easily say, e.g. "3 months from now" without worrying about the number
> of days in the intervening months.
>
>    >>> date(2008, 1, 30) + monthdelta(1)
>    datetime.date(2008, 2, 29)
>    >>> date(2008, 1, 30) + monthdelta(2)
>    datetime.date(2008, 3, 30)
>
> The monthmod function, named in (imperfect) analogy to divmod, allows
> us to round-trip by returning the interim between two dates
> represented as a (monthdelta, timedelta) tuple:
>
>    >>> monthmod(date(2008, 1, 14), date(2009, 4, 2))
>    (datetime.monthdelta(14), datetime.timedelta(19))
>
> Invariant: dt + monthmod(dt, dt+td)[0] + monthmod(dt, dt+td)[1] == dt + td
>
> These also work with datetimes! There are more details in the
> documentation included in the patch. In addition to the C module
> file, I've updated the datetime CAPI, the documentation, and tests.
>
> I feel this would be a good addition to core python. In my work, I've
> often ended up writing annoying one-off "add-a-month" or similar
> functions. I think since months work differently than most other time
> periods, a new object is justified rather than trying to shoe-horn
> something like this into timedelta. I also think that the round-trip
> functionality provided by monthmod is important to ensure that
> monthdeltas are "first-class" objects.
>
> Please let me know what you think of the idea and/or its execution.

There are so many meanings of "one month from now" that I'd rather see a
bunch of methods for monthly manipulations than a monthdelta class.

Obvious:
Tuesday February 3rd 2009 + 1 month = Tuesday March 3rd 2009

Not obvious:
Tuesday March 3rd 2009 + 1 month = Tuesday April 7th 2009 (5 weeks)
Tuesday April 7th 2009 + 1 month = Tuesday May 5th 2009 (4 weeks)

Problematic:
Tuesday March 31st 2009 + 1 month = what? Thursday April 30th 2009? Error?

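Just supporting the obvious case is not enough to be worth the work. Doing
  import calendar
  # year, month and day hold the date being shifted
  month = month + 1
  if month > 12:
      month = 1
      year = year + 1
  # clamp the day to the length of the target month
  lastday = calendar.monthrange(year, month)[1]
  if day > lastday:
      day = lastday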

isn't really enough work to warrant its own class IMO, even though
it's a method I also end up doing all the time in every bloody
calendar implementation I've done. :)
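
For reference, the clamping logic above can be wrapped into plain
functions. This is only a sketch: add_months and months_between are
hypothetical names, not the monthdelta/monthmod API from the patch,
although the results below match the examples quoted from the proposal.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Return d shifted by a number of calendar months (may be negative).

    The day is clamped to the last day of the target month.
    """
    # Work with a zero-based month index so divmod handles year rollover.
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    lastday = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, lastday))


def months_between(start, end):
    """Split end - start into (whole months, leftover timedelta)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if add_months(start, months) > end:
        months -= 1  # the last month is not complete yet
    return months, end - add_months(start, months)


print(add_months(date(2008, 1, 30), 1))   # 2008-02-29
print(add_months(date(2009, 3, 31), 1))   # 2009-04-30
print(months_between(date(2008, 1, 14), date(2009, 4, 2)))
# (14, datetime.timedelta(days=19))
```

This picks one answer for the "problematic" case (clamp to the end of
the month); raising an error instead would be an equally defensible
choice.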

And then comes the same question when talking about years.
One year after the 20th of March 2011 may be the 20th of March 2012.
But it could also be 19th of March, as 2012 is a leap year. And a year
later still would then be the 20th of March 2013 again... Code that
doesn't support ALL the weird-ass variants is really not worth putting
into the standard library, IMO.
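
The two readings of "one year later" are easy to show side by side. The
add_years helper below is a hypothetical sketch, and mapping Feb 29 to
Feb 28 is just one of the possible policies:

```python
from datetime import date, timedelta


def add_years(d, years):
    """Shift d by calendar years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 in a non-leap target year; an out-of-range year
        # simply raises again on the next line.
        return d.replace(year=d.year + years, day=28)


start = date(2011, 3, 20)
print(start + timedelta(days=365))      # 2012-03-19 (2012 has a Feb 29)
print(add_years(start, 1))              # 2012-03-20
print(add_years(date(2012, 2, 29), 1))  # 2013-02-28
```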

I'd recommend looking at the dateutil.rrule code; perhaps there is
something there that can be used straight off. Or at least maybe it can
be extracted to its own extended timedelta library that supports more
advanced timedeltas, including "second to last Wednesday" and "first
Sunday after Easter".

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64

From MLMLists at Comcast.net  Sat Apr 18 22:08:55 2009
From: MLMLists at Comcast.net (Mitchell L Model)
Date: Sat, 18 Apr 2009 16:08:55 -0400
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
Message-ID: <p06240800c60fe2a6a405@[10.0.1.221]>

Some library files, such as pdb.py, begin with
	#!/usr/bin/env python
In various discussions regarding some issues I submitted I was told 
that the decision had been made to call Python 3.x release 
executables python3. (One of the conflicts I ran into when I made 
'python' a link to python3.1 was that some tools used in making the 
HTML documentation haven't been upgraded to run with 3.)

Shouldn't all library files that begin with the above line be changed 
so that they read 'python3' instead of python? Perhaps I should have 
just filed this as an issue, but I'm not confident of the state of 
the plan to move to python3 as the official executable name.
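
For what it's worth, the change itself is mechanical. A minimal sketch
of the rewrite, assuming only the exact "#!/usr/bin/env python" form
needs handling (fix_shebang is a hypothetical helper, not an existing
tool in the source tree):

```python
import re

# Match "python" only when it is the whole interpreter name, so that
# lines already saying python3 (or python2.6) are left alone.
_SHEBANG = re.compile(r"^#!/usr/bin/env python(?=\s|$)")


def fix_shebang(first_line):
    """Return the first line of a script with python replaced by python3."""
    return _SHEBANG.sub("#!/usr/bin/env python3", first_line, count=1)


print(repr(fix_shebang("#!/usr/bin/env python\n")))
# '#!/usr/bin/env python3\n'
```

Applying it to a file means rewriting only the first line and leaving
the rest of the content untouched.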

From mrts.pydev at gmail.com  Sat Apr 18 22:42:40 2009
From: mrts.pydev at gmail.com (=?ISO-8859-1?Q?Mart_S=F5mermaa?=)
Date: Sat, 18 Apr 2009 23:42:40 +0300
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <49E9CA6A.6060004@gmail.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<49D0ACD5.5090209@gmail.com>
	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
Message-ID: <ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>

On Sat, Apr 18, 2009 at 3:41 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Yep - Guido has pointed out in a few different API design discussions
> that a boolean flag that is almost always set to a literal True or False
> is a good sign that there are two functions involved rather than just
> one. There are exceptions to that guideline (e.g. the reverse argument
> for sorted and list.sort), but they aren't common, and even when they do
> crop up, making them keyword-only arguments is strongly recommended.

As you yourself previously noted -- "it is often
better to use *args for the two positional arguments - it avoids
accidental name conflicts between the positional arguments and arbitrary
keyword arguments" -- kwargs may cause name conflicts.

But I also agree that the current proliferation of positional args is ugly.

add_query_params_no_dups() would be suboptimal though, as there are
currently three different ways to handle the duplicates:
* allow duplicates everywhere (True),
* remove duplicate *values* for the same key (False),
* behave like dict.update -- remove duplicate *keys*, unless
explicitly passed a list (None).

(See the documentation at
http://github.com/mrts/qparams/blob/bf1b29ad46f9d848d5609de6de0bfac1200da310/qparams.py
).
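
To make the three modes concrete, here is a simplified sketch. It is
not the qparams implementation linked above: the signature and the
allow_dups name are assumptions made for illustration, and "passing a
list" is modelled by simply repeating a key in the new pairs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_params(url, params, allow_dups=None):
    """Add (key, value) pairs to the query string of url.

    allow_dups=True  -> keep every pair, duplicates included
    allow_dups=False -> drop pairs that repeat an existing key AND value
    allow_dups=None  -> dict.update-like: new keys replace existing keys
    """
    parts = urlsplit(url)
    existing = parse_qsl(parts.query, keep_blank_values=True)
    new = list(params)
    if allow_dups is True:
        merged = existing + new
    elif allow_dups is False:
        merged = []
        for pair in existing + new:
            if pair not in merged:
                merged.append(pair)
    else:
        new_keys = {key for key, _ in new}
        merged = [(k, v) for k, v in existing if k not in new_keys] + new
    return urlunsplit(parts._replace(query=urlencode(merged)))


url = "http://example.com/?a=1&b=2"
extra = [("a", "1"), ("a", "3")]
print(add_query_params(url, extra, True))   # http://example.com/?a=1&b=2&a=1&a=3
print(add_query_params(url, extra, False))  # http://example.com/?a=1&b=2&a=3
print(add_query_params(url, extra, None))   # http://example.com/?b=2&a=1&a=3
```

A tri-state True/False/None argument is exactly the kind of flag the
two-functions guideline argues against, which is part of why the design
question is not trivial.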

Additionally, as proposed by Antoine Pitrou, removing keys could be implemented.

It feels awkward to start a PEP for such a marginal feature, but
clearly a couple of enlightened design decisions are required.

From benjamin at python.org  Sat Apr 18 22:48:08 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Sat, 18 Apr 2009 15:48:08 -0500
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <p06240800c60fe2a6a405@10.0.1.221>
References: <p06240800c60fe2a6a405@10.0.1.221>
Message-ID: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>

2009/4/18 Mitchell L Model <MLMLists at comcast.net>:
> Some library files, such as pdb.py, begin with
>        #!/usr/bin/env python
> In various discussions regarding some issues I submitted I was told that the
> decision had been made to call Python 3.x release executables python3. (One
> of the conflicts I ran into when I made 'python' a link to python3.1 was
> that some tools used in making the HTML documentation haven't been upgraded
> to run with 3.)
>
> Shouldn't all library files that begin with the above line be changed so
> that they read 'python3' instead of python? Perhaps I should have just filed
> this as an issue, but I'm not confident of the state of the plan to move to
> python3 as the official executable name.

That sounds correct. Please file a bug report.

-- 
Regards,
Benjamin

From kevin at bud.ca  Sun Apr 19 01:01:02 2009
From: kevin at bud.ca (Kevin Teague)
Date: Sat, 18 Apr 2009 16:01:02 -0700
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <p06240800c60fe2a6a405@[10.0.1.221]>
References: <p06240800c60fe2a6a405@[10.0.1.221]>
Message-ID: <4838697B-885E-433E-A50E-1EDC5C96A5EC@bud.ca>

On Apr 18, 2009, at 1:08 PM, Mitchell L Model wrote:

> Some library files, such as pdb.py, begin with
> 	#!/usr/bin/env python
> In various discussions regarding some issues I submitted I was told  
> that the decision had been made to call Python 3.x release  
> executables python3. (One of the conflicts I ran into when I made  
> 'python' a link to python3.1 was that some tools used in making the  
> HTML documentation haven't been upgraded to run with 3.)
>
> Shouldn't all library files that begin with the above line be  
> changed so that they read 'python3' instead of python? Perhaps I  
> should have just filed this as an issue, but I'm not confident of  
> the state of the plan to move to python3 as the official executable  
> name.

Hrmm ...

On installing from source, one either gets:

./bin/python3.0

Or, if using 'make fullinstall':

./bin/python

So the default and the tutorial
(http://docs.python.org/3.0/tutorial/interpreter.html) refer to
'python3.0'. But I've done all my Python installs with 'make fullinstall'
and then just manage my environment such that 'python' points to a 2.x
or 3.x release depending upon what the source code I'm working on
requires. If using something such as the Mac OS X Installer you'll get
both a 'python' and 'python3.0'.

Are there some Python installers that provide './bin/python3'?

But if there is sometimes just 'python', 'python3.0' or 'python3' then
it's not possible for the shebang to work with all known install
methods ...

One could argue that executable files that are part of the Python
standard library should have their interpreter hard-coded to the Python
interpreter they are installed with, e.g.:

#!/Users/kteague/shared/python-3.0.1/bin/python

Of course, this would remove the ability for a Python installation to
be relocated ... if you wanted to move the install, you'd need to
reinstall it in order to maintain the proper shebangs. But it would mean
that these scripts would also use the correct interpreter regardless
of a user's current environment.

Or, if the standard library was packaged such that all of its scripts
were advertised as console_scripts in the entry_points, it'd be easier  
for different install approaches to decide how to write out the  
shebang or to instead provide wrapper scripts for accessing those  
entry points (since it might be nice to have a ./bin/pdb). But that's  
a bit pie-in-the-sky since entry_points isn't even yet a part of the  
Distutils Metadata.

From ncoghlan at gmail.com  Sun Apr 19 01:06:41 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 19 Apr 2009 09:06:41 +1000
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse
 in	3.1 (and urlparse in 2.7)
In-Reply-To: <ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>	<49D0ACD5.5090209@gmail.com>	<ad1f81530903300522m51fd1099s90c05983ae748fa3@mail.gmail.com>	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>	<loom.20090412T215625-611@post.gmane.org>	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
Message-ID: <49EA5D01.6040208@gmail.com>

Mart Sõmermaa wrote:
> On Sat, Apr 18, 2009 at 3:41 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Yep - Guido has pointed out in a few different API design discussions
>> that a boolean flag that is almost always set to a literal True or False
>> is a good sign that there are two functions involved rather than just
>> one. There are exceptions to that guideline (e.g. the reverse argument
>> for sorted and list.sort), but they aren't common, and even when they do
>> crop up, making them keyword-only arguments is strongly recommended.
> 
> As you yourself previously noted -- "it is often
> better to use *args for the two positional arguments - it avoids
> accidental name conflicts between the positional arguments and arbitrary
> keyword arguments" -- kwargs may cause name conflicts.

Despite what I said earlier, it is probably OK to use named parameters
on the function in this case, especially since you have 3 optional
arguments that someone may want to specify independently of each other.
If someone really wants to add a query parameter to their URL that
conflicts with one of the function parameter names then they can pass
them in the same way they would pass in parameters that don't meet the
rules for a Python identifier (i.e. using the explicit params dictionary).

Something that can be done to even further reduce the chance of
conflicts is to prefix the function parameter names with underscores:

  def add_query_params(_url, _dups, _params, _sep, **kwargs)

That said, I'm starting to wonder if an even better option may be to
just drop the kwargs support from the function and require people to
always supply a parameters dictionary. That would simplify the signature
to the quite straightforward:

  def add_query_params(url, params, allow_dups=True, sep='&')

The "keyword arguments as query parameters" style would still be
supported via dict's own constructor:

>>> add_query_params('foo', dict(bar='baz'))
'foo?bar=baz'

>>> add_query_params('http://example.com/a/b/c?a=b', dict(b='d'))
'http://example.com/a/b/c?a=b&b=d'

>>> add_query_params('http://example.com/a/b/c?a=b&c=q',
... dict(a='b', b='d', c='q'))
'http://example.com/a/b/c?a=b&c=q&a=b&c=q&b=d'

>>> add_query_params('http://example.com/a/b/c?a=b', dict(a='c', b='d'))
'http://example.com/a/b/c?a=b&a=c&b=d'

This also makes the transition to a different container type (such as
OrderedDict) cleaner, since you will already be constructing a separate
object to hold the new parameters.
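
A minimal sketch of that simplified signature, written against the
current urllib.parse API. This is an illustration only, not the code from
the qparams module: the body is an assumption, allow_dups is treated as a
plain boolean (False skips key/value pairs already present), and sep is
only used when joining the result. The doctest output above reflects the
dict ordering of the time; a current dict keeps insertion order, so new
parameters come out in the order they were written.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_params(url, params, allow_dups=True, sep='&'):
    """Append the items of the params mapping to the query string of url."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Existing query parameters as a list of (key, value) pairs.
    pairs = parse_qsl(query, keep_blank_values=True)
    for key, value in params.items():
        pair = (key, str(value))
        # With allow_dups=False, skip pairs that are already present.
        if allow_dups or pair not in pairs:
            pairs.append(pair)
    # urlencode() always joins with '&', so encode pair by pair and
    # join with the requested separator instead.
    new_query = sep.join(urlencode([pair]) for pair in pairs)
    return urlunsplit((scheme, netloc, path, new_query, fragment))
```

Passing an OrderedDict (or any other mapping) works unchanged, since the
function only relies on params.items().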

> But I also agree that the current proliferation of positional args is ugly.
> 
> add_query_params_no_dups() would be suboptimal though, as there are
> currently three different ways to handle the duplicates:
> * allow duplicates everywhere (True),
> * remove duplicate *values* for the same key (False),
> * behave like dict.update -- remove duplicate *keys*, unless
> explicitly passed a list (None).

So if we went the multiple functions route, we would have at least:

add_query_params_allow_duplicates()
add_query_params_ignore_duplicate_items()
add_query_params_ignore_duplicate_keys()

I agree that isn't a good option, but mapping True/False/None to those
specific behaviours also seems rather arbitrary (specifically, it is
difficult to remember which of "allow_dups=False" and "allow_dups=None"
means to ignore any duplicate keys and which means to ignore only
duplicate items). It also doesn't provide a clear mechanism for
extension (e.g. what if someone wanted duplicate params to trigger an
exception?)

Perhaps the extra argument should just be a key/value pair filtering
function, and we provide functions for the three existing behaviours
(i.e. allow_duplicates(), ignore_duplicate_keys(),
ignore_duplicate_items()) in the urllib.parse module.
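
A rough sketch of what such filter functions might look like. The calling
convention is an assumption made for illustration: each filter takes the
existing and the new lists of (key, value) pairs and returns the merged
list. The dict.update-like behaviour of ignore_duplicate_keys() (new
values replace old ones for the same key) follows the description quoted
above, and reject_duplicates() is a hypothetical fourth filter showing
how the exception case could be added without touching the signature.

```python
def allow_duplicates(existing, new):
    """Keep everything, including repeated keys and repeated pairs."""
    return list(existing) + list(new)


def ignore_duplicate_items(existing, new):
    """Skip new (key, value) pairs that are already present."""
    merged = list(existing)
    for pair in new:
        if pair not in merged:
            merged.append(pair)
    return merged


def ignore_duplicate_keys(existing, new):
    """Behave like dict.update: new values replace old ones per key."""
    new_keys = {key for key, _ in new}
    kept = [(key, value) for key, value in existing if key not in new_keys]
    return kept + list(new)


def reject_duplicates(existing, new):
    """Hypothetical extension: raise if a new key is already present."""
    seen = {key for key, _ in existing}
    for key, _ in new:
        if key in seen:
            raise ValueError('duplicate query parameter: %r' % (key,))
    return list(existing) + list(new)
```

With this shape the function name documents the behaviour at the call
site, which avoids having to remember what allow_dups=None means.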

> (See the documentation at
> http://github.com/mrts/qparams/blob/bf1b29ad46f9d848d5609de6de0bfac1200da310/qparams.py
> ).

Note that your implementation and docstring currently conflict with each
other - the docstring says "pass them via a dictionary in second
argument:" but the dictionary is currently the third argument (the
docstring also later refers to passing OrderedDictionary as the second
argument).

Phrases like "second optional argument" and "fourth optional argument"
are also ambiguous - do they refer to "the second argument, which
happens to be optional" or to "the second of the optional arguments".
The fact that changing the function signature to disallow keyword
arguments would make the optional parameters easier to refer to is a big
win in my book.

> Additionally, as proposed by Antoine Pitrou, removing keys could be implemented.
> 
> It feels awkward to start a PEP for such a marginal feature, but
> clearly a couple of enlightened design decisions are required.

Probably not a PEP - just a couple of documented design decisions on a
tracker item pointing to discussion on this list for the rationale.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Sun Apr 19 01:19:00 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 19 Apr 2009 09:19:00 +1000
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
Message-ID: <49EA5FE4.9040102@gmail.com>

Benjamin Peterson wrote:
> 2009/4/18 Mitchell L Model <MLMLists at comcast.net>:
>> Some library files, such as pdb.py, begin with
>>        #!/usr/bin/env python
>> In various discussions regarding some issues I submitted I was told that the
>> decision had been made to call Python 3.x release executables python3. (One
>> of the conflicts I ran into when I made 'python' a link to python3.1 was
>> that some tools used in making the HTML documentation haven't been upgraded
>> to run with 3.)
>>
>> Shouldn't all library files that begin with the above line be changed so
>> that they read 'python3' instead of python? Perhaps I should have just filed
>> this as an issue, but I'm not confident of the state of the plan to move to
>> python3 as the official executable name.
> 
> That sounds correct. Please file a bug report.

As Kevin pointed out, while this is a problem, changing the affected
scripts to say "python3" instead isn't the right answer.

All that happened with the Python 3 installers is that they do
'altinstall' rather than 'fullinstall' by default, thus leaving the
'python' alias alone. There is no "python3" alias unless a user creates
it for themselves (or a distro packager does it for them).

I see a few options:
1. Abandon the "python" name for the 3.x series and commit to calling it
"python3" now and forever (i.e. actually make the decision that Mitchell
refers to).
2. Remove the offending shebang lines from the affected files and tell
people to use "python -m <module>" instead.
3. Change the shebang lines in Python standard library scripts to be
version specific and update release.py to fix them all when bumping the
version number in the source tree.
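
For option 3, the rewrite itself is small. The following is a sketch
only: pin_shebang() is a hypothetical helper, not something that exists
in release.py, and it assumes the scripts use the "#!/usr/bin/env python"
form with no interpreter arguments.

```python
import re

# Matches "#!/usr/bin/env python", optionally with a version suffix
# such as "python3" or "python3.0" already attached.
_SHEBANG = re.compile(r'^#!\s*/usr/bin/env\s+python[0-9.]*[ \t]*$')


def pin_shebang(source, version):
    """Return source with its env-python shebang pinned to version."""
    first, newline, rest = source.partition('\n')
    if _SHEBANG.match(first):
        first = '#!/usr/bin/env python' + version
    # Files without a matching first line are returned unchanged.
    return first + newline + rest
```

Calling pin_shebang(text, '3.1') on the contents of pdb.py would turn
its first line into "#!/usr/bin/env python3.1" and leave the rest alone.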

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From benjamin at python.org  Sun Apr 19 05:14:17 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Sat, 18 Apr 2009 22:14:17 -0500
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <49EA5FE4.9040102@gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
	<49EA5FE4.9040102@gmail.com>
Message-ID: <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>

2009/4/18 Nick Coghlan <ncoghlan at gmail.com>:
> Benjamin Peterson wrote:
>> 2009/4/18 Mitchell L Model <MLMLists at comcast.net>:
>>> Some library files, such as pdb.py, begin with
>>>        #!/usr/bin/env python
>>> In various discussions regarding some issues I submitted I was told that the
>>> decision had been made to call Python 3.x release executables python3. (One
>>> of the conflicts I ran into when I made 'python' a link to python3.1 was
>>> that some tools used in making the HTML documentation haven't been upgraded
>>> to run with 3.)
>>>
>>> Shouldn't all library files that begin with the above line be changed so
>>> that they read 'python3' instead of python? Perhaps I should have just filed
>>> this as an issue, but I'm not confident of the state of the plan to move to
>>> python3 as the official executable name.
>>
>> That sounds correct. Please file a bug report.
>
> As Kevin pointed out, while this is a problem, changing the affected
> scripts to say "python3" instead isn't the right answer.
>
> All that happened with the Python 3 installers is that they do
> 'altinstall' rather than 'fullinstall' by default, thus leaving the
> 'python' alias alone. There is no "python3" alias unless a user creates
> it for themselves (or a distro packager does it for them).

I've actually implemented a python3 alias for 3.1.

>
> I see a few options:
> 1. Abandon the "python" name for the 3.x series and commit to calling it
> "python3" now and forever (i.e. actually make the decision that Mitchell
> refers to).

I believe this was decided on sometime (the sprints?).

-- 
Regards,
Benjamin

From ncoghlan at gmail.com  Sun Apr 19 05:22:55 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 19 Apr 2009 13:22:55 +1000
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>	
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>	
	<49EA5FE4.9040102@gmail.com>
	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
Message-ID: <49EA990F.6060301@gmail.com>

Benjamin Peterson wrote:
> 2009/4/18 Nick Coghlan <ncoghlan at gmail.com>:
>> I see a few options:
>> 1. Abandon the "python" name for the 3.x series and commit to calling it
>> "python3" now and forever (i.e. actually make the decision that Mitchell
>> refers to).
> 
> I believe this was decided on sometime (the sprints?).

If that decision has already been made, then sure, changing the shebang
lines to use the new name is the right thing to do.

It certainly wouldn't be the first time something was discussed at Pycon
or the sprints and those involved forgot to mention the outcome on the
list :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From steven.bethard at gmail.com  Sun Apr 19 05:51:46 2009
From: steven.bethard at gmail.com (Steven Bethard)
Date: Sat, 18 Apr 2009 20:51:46 -0700
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
	<49EA5FE4.9040102@gmail.com>
	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
Message-ID: <d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>

On Sat, Apr 18, 2009 at 8:14 PM, Benjamin Peterson <benjamin at python.org> wrote:
> 2009/4/18 Nick Coghlan <ncoghlan at gmail.com>:
>> I see a few options:
>> 1. Abandon the "python" name for the 3.x series and commit to calling it
>> "python3" now and forever (i.e. actually make the decision that Mitchell
>> refers to).
>
> I believe this was decided on sometime (the sprints?).

That's an unfortunate decision. When the 2.X line stops being
maintained (after 2.7 maybe?) we're going to be stuck with the "3"
suffix forever for the "real" Python.

Why doesn't it make more sense to just use "python3" only for
"altinstall" and "python" for "fullinstall"?

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From tonynelson at georgeanelson.com  Sun Apr 19 06:29:08 2009
From: tonynelson at georgeanelson.com (Tony Nelson)
Date: Sun, 19 Apr 2009 00:29:08 -0400
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
	<49EA5FE4.9040102@gmail.com>
	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
	<d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
Message-ID: <p04330100c610573fa601@[192.168.123.162]>

At 20:51 -0700 04/18/2009, Steven Bethard wrote:
>On Sat, Apr 18, 2009 at 8:14 PM, Benjamin Peterson <benjamin at python.org>
>wrote:
>> 2009/4/18 Nick Coghlan <ncoghlan at gmail.com>:
>>> I see a few options:
>>> 1. Abandon the "python" name for the 3.x series and commit to calling it
>>> "python3" now and forever (i.e. actually make the decision that Mitchell
>>> refers to).
>>
>> I believe this was decided on sometime (the sprints?).
>
>That's an unfortunate decision. When the 2.X line stops being
>maintained (after 2.7 maybe?) we're going to be stuck with the "3"
>suffix forever for the "real" Python.
>
>Why doesn't it make more sense to just use "python3" only for
>"altinstall" and "python" for "fullinstall"?

Just use python3 in the shebang lines all the time (where applicable ;), as
it is made by both altinstall and fullinstall.  fullinstall also makes plain
"python", but that is not important.
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>

From ncoghlan at gmail.com  Sun Apr 19 06:37:54 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 19 Apr 2009 14:37:54 +1000
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>	
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>	
	<49EA5FE4.9040102@gmail.com>	
	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
	<d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
Message-ID: <49EAAAA2.4040800@gmail.com>

Steven Bethard wrote:
> On Sat, Apr 18, 2009 at 8:14 PM, Benjamin Peterson <benjamin at python.org> wrote:
>> 2009/4/18 Nick Coghlan <ncoghlan at gmail.com>:
>>> I see a few options:
>>> 1. Abandon the "python" name for the 3.x series and commit to calling it
>>> "python3" now and forever (i.e. actually make the decision that Mitchell
>>> refers to).
>> I believe this was decided on sometime (the sprints?).
> 
> That's an unfortunate decision. When the 2.X line stops being
> maintained (after 2.7 maybe?) we're going to be stuck with the "3"
> suffix forever for the "real" Python.
> 
> Why doesn't it make more sense to just use "python3" only for
> "altinstall" and "python" for "fullinstall"?

Note that such an approach would then require an altaltinstall command
in order to be able to install a specific version of python 3.x without
changing the python3 alias (e.g. installing 3.2 without overriding 3.1).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From steven.bethard at gmail.com  Sun Apr 19 06:45:14 2009
From: steven.bethard at gmail.com (Steven Bethard)
Date: Sat, 18 Apr 2009 21:45:14 -0700
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <49EAAAA2.4040800@gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
	<49EA5FE4.9040102@gmail.com>
	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
	<d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
	<49EAAAA2.4040800@gmail.com>
Message-ID: <d11dcfba0904182145y38d823e4yfe3b02718130678a@mail.gmail.com>

On Sat, Apr 18, 2009 at 9:37 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Steven Bethard wrote:
>> On Sat, Apr 18, 2009 at 8:14 PM, Benjamin Peterson <benjamin at python.org> wrote:
>>> 2009/4/18 Nick Coghlan <ncoghlan at gmail.com>:
>>>> I see a few options:
>>>> 1. Abandon the "python" name for the 3.x series and commit to calling it
>>>> "python3" now and forever (i.e. actually make the decision that Mitchell
>>>> refers to).
>>> I believe this was decided on sometime (the sprints?).
>>
>> That's an unfortunate decision. When the 2.X line stops being
>> maintained (after 2.7 maybe?) we're going to be stuck with the "3"
>> suffix forever for the "real" Python.
>>
>> Why doesn't it make more sense to just use "python3" only for
>> "altinstall" and "python" for "fullinstall"?
>
> Note that such an approach would then require an altaltinstall command
> in order to be able to install a specific version of python 3.x without
> changing the python3 alias (e.g. installing 3.2 without overriding 3.1).

I wasn't suggesting that there shouldn't be a "python3.1",
"python3.2", etc. I'm more concerned about "fullinstall" creating
"python3" instead of regular "python".

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From ncoghlan at gmail.com  Sun Apr 19 07:04:02 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 19 Apr 2009 15:04:02 +1000
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <d11dcfba0904182145y38d823e4yfe3b02718130678a@mail.gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>	
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>	
	<49EA5FE4.9040102@gmail.com>	
	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>	
	<d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>	
	<49EAAAA2.4040800@gmail.com>
	<d11dcfba0904182145y38d823e4yfe3b02718130678a@mail.gmail.com>
Message-ID: <49EAB0C2.8040506@gmail.com>

Steven Bethard wrote:
> On Sat, Apr 18, 2009 at 9:37 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Note that such an approach would then require an altaltinstall command
>> in order to be able to install a specific version of python 3.x without
>> changing the python3 alias (e.g. installing 3.2 without overriding 3.1).
> 
> I wasn't suggesting that there shouldn't be a "python3.1",
> "python3.2", etc. I'm more concerned about "fullinstall" creating
> "python3" instead of regular "python".

If I understand Tony's summary correctly, the situation after Benjamin's
latest checkin is as follows:

2.x altinstall:
  - installs python2.x executable

2.x fullinstall (default for "make install"):
  - installs python2.x executable
  - adjusts (or creates) python symlink to new executable

3.x altinstall (default for "make install"):
  - installs python3.x executable
  - adjusts (or creates) python3 symlink to new executable

3.x fullinstall:
  - installs python3.x executable
  - adjusts (or creates) python3 symlink to new executable
  - adjusts (or creates) python symlink to new executable

With that setup, I'm sure we're going to get people complaining that
'altinstall' of 3.2 broke their python3 symlink from 3.1. If there are
going to be 3 levels of executable naming (python3.x, python3, python),
there needs to be 3 levels of installation rather than the traditional 2.

For example, add a new target "py3install" and make that the default for
3.1:

3.x altinstall:
  - installs python3.x executable

3.x py3install (default for "make install"):
  - installs python3.x executable
  - adjusts (or creates) python3 symlink to new executable

3.x fullinstall:
  - installs python3.x executable
  - adjusts (or creates) python3 symlink to new executable
  - adjusts (or creates) python symlink to new executable
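
The three proposed levels can be modelled in a few lines. This is just an
illustration of the naming scheme above, not Makefile code;
installed_names() is a made-up helper and "py3install" is only the target
name suggested in this message.

```python
def installed_names(target, version):
    """List the executable names a 3.x install target would provide."""
    if target not in ('altinstall', 'py3install', 'fullinstall'):
        raise ValueError('unknown install target: %r' % (target,))
    # Every target installs the fully versioned executable.
    names = ['python' + version]
    if target in ('py3install', 'fullinstall'):
        # Major-version symlink, e.g. python3.
        names.append('python' + version.split('.')[0])
    if target == 'fullinstall':
        # Unversioned symlink.
        names.append('python')
    return names
```

Under this scheme installed_names('altinstall', '3.2') gives only
['python3.2'], so an existing python3 symlink from 3.1 is left alone.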

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From steven.bethard at gmail.com  Sun Apr 19 07:14:32 2009
From: steven.bethard at gmail.com (Steven Bethard)
Date: Sat, 18 Apr 2009 22:14:32 -0700
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <49EAB0C2.8040506@gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
	<49EA5FE4.9040102@gmail.com>
	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
	<d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
	<49EAAAA2.4040800@gmail.com>
	<d11dcfba0904182145y38d823e4yfe3b02718130678a@mail.gmail.com>
	<49EAB0C2.8040506@gmail.com>
Message-ID: <d11dcfba0904182214n46e29ad8sdd9994be8841f9da@mail.gmail.com>

On Sat, Apr 18, 2009 at 10:04 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Steven Bethard wrote:
>> On Sat, Apr 18, 2009 at 9:37 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> Note that such an approach would then require an altaltinstall command
>>> in order to be able to install a specific version of python 3.x without
>>> changing the python3 alias (e.g. installing 3.2 without overriding 3.1).
>>
>> I wasn't suggesting that there shouldn't be a "python3.1",
>> "python3.2", etc. I'm more concerned about "fullinstall" creating
>> "python3" instead of regular "python".
>
> If I understand Tony's summary correctly, the situation after Benjamin's
> latest checkin is as follows:
>
> 2.x altinstall:
>   - installs python2.x executable
>
> 2.x fullinstall (default for "make install"):
>   - installs python2.x executable
>   - adjusts (or creates) python symlink to new executable
>
> 3.x altinstall (default for "make install"):
>   - installs python3.x executable
>   - adjusts (or creates) python3 symlink to new executable
>
> 3.x fullinstall:
>   - installs python3.x executable
>   - adjusts (or creates) python3 symlink to new executable
>   - adjusts (or creates) python symlink to new executable

Thanks for the clear explanation. The fact that "python" still appears
with "fullinstall" covers my concern.

> With that setup, I'm sure we're going to get people complaining that
> 'altinstall' of 3.2 broke their python3 symlink from 3.1. If there are
> going to be 3 levels of executable naming (python3.x, python3, python),
> there needs to be 3 levels of installation rather than the traditional 2.
>
> For example, add a new target "py3install" and make that the default for
> 3.1:
>
> 3.x altinstall:
>   - installs python3.x executable
>
> 3.x py3install (default for "make install"):
>   - installs python3.x executable
>   - adjusts (or creates) python3 symlink to new executable
>
> 3.x fullinstall:
>   - installs python3.x executable
>   - adjusts (or creates) python3 symlink to new executable
>   - adjusts (or creates) python symlink to new executable

Yep, I agree this is what needs to be done to sensibly support a "python3".

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From allan at archlinux.org  Sun Apr 19 07:23:13 2009
From: allan at archlinux.org (Allan McRae)
Date: Sun, 19 Apr 2009 15:23:13 +1000
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <49EAB0C2.8040506@gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>		<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>		<49EA5FE4.9040102@gmail.com>		<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>		<d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>		<49EAAAA2.4040800@gmail.com>	<d11dcfba0904182145y38d823e4yfe3b02718130678a@mail.gmail.com>
	<49EAB0C2.8040506@gmail.com>
Message-ID: <BAY102-DAV1315D64859AC41DFD16F2B86790@phx.gbl>

Nick Coghlan wrote:
> Steven Bethard wrote:
>   
>> On Sat, Apr 18, 2009 at 9:37 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>     
>>> Note that such an approach would then require an altaltinstall command
>>> in order to be able to install a specific version of python 3.x without
>>> changing the python3 alias (e.g. installing 3.2 without overriding 3.1).
>>>       
>> I wasn't suggesting that there shouldn't be a "python3.1",
>> "python3.2", etc. I'm more concerned about "fullinstall" creating
>> "python3" instead of regular "python".
>>     
>
> If I understand Tony's summary correctly, the situation after Benjamin's
> latest checkin is as follows:
>
> 2.x altinstall:
>   - installs python2.x executable
>
> 2.x fullinstall (default for "make install"):
>   - installs python2.x executable
>   - adjusts (or creates) python symlink to new executable
>
> 3.x altinstall (default for "make install"):
>   - installs python3.x executable
>   - adjusts (or creates) python3 symlink to new executable
>
> 3.x fullinstall:
>   - installs python3.x executable
>   - adjusts (or creates) python3 symlink to new executable
>   - adjusts (or creates) python symlink to new executable
>
> With that setup, I'm sure we're going to get people complaining that
> 'altinstall' of 3.2 broke their python3 symlink from 3.1. If there are
> going to be 3 levels of executable naming (python3.x, python3, python),
> there needs to be 3 levels of installation rather than the traditional 2.
>
> For example, add a new target "py3install" and make that the default for
> 3.1:
>
> 3.x altinstall:
>   - installs python3.x executable
>
> 3.x py3install (default for "make install"):
>   - installs python3.x executable
>   - adjusts (or creates) python3 symlink to new executable
>
> 3.x fullinstall:
>   - installs python3.x executable
>   - adjusts (or creates) python3 symlink to new executable
>   - adjusts (or creates) python symlink to new executable
>   

Adjusting the python2 installs to do something similar with symlinks to 
python2 would also be useful when python3 becomes the standard python 
and python2 is used for legacy.

I.e.

2.x altinstall:
 - installs python2.x executable

2.x py2install (default for "make install"):
 - installs python2.x executable
 - adjusts (or creates) python2 symlink to new executable

2.x fullinstall:
 - installs python2.x executable
 - adjusts (or creates) python2 symlink to new executable
 - adjusts (or creates) python symlink to new executable

Allan

From allan at archlinux.org  Sun Apr 19 07:31:27 2009
From: allan at archlinux.org (Allan McRae)
Date: Sun, 19 Apr 2009 15:31:27 +1000
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <BAY102-DAV1315D64859AC41DFD16F2B86790@phx.gbl>
References: <p06240800c60fe2a6a405@10.0.1.221>		<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>		<49EA5FE4.9040102@gmail.com>		<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>		<d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>		<49EAAAA2.4040800@gmail.com>	<d11dcfba0904182145y38d823e4yfe3b02718130678a@mail.gmail.com>	<49EAB0C2.8040506@gmail.com>
	<BAY102-DAV1315D64859AC41DFD16F2B86790@phx.gbl>
Message-ID: <BAY102-DAV430DC0BEBA4BC204E3E2986790@phx.gbl>

Allan McRae wrote:
> Nick Coghlan wrote:
>> Steven Bethard wrote:
>>> On Sat, Apr 18, 2009 at 9:37 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>>> Note that such an approach would then require an altaltinstall command
>>>> in order to be able to install a specific version of python 3.x without
>>>> changing the python3 alias (e.g. installing 3.2 without overriding 3.1).
>>>
>>> I wasn't suggesting that there shouldn't be a "python3.1",
>>> "python3.2", etc. I'm more concerned about "fullinstall" creating
>>> "python3" instead of regular "python".
>>
>> If I understand Tony's summary correctly, the situation after Benjamin's
>> latest checkin is as follows:
>>
>> 2.x altinstall:
>>   - installs python2.x executable
>>
>> 2.x fullinstall (default for "make install"):
>>   - installs python2.x executable
>>   - adjusts (or creates) python symlink to new executable
>>
>> 3.x altinstall (default for "make install"):
>>   - installs python3.x executable
>>   - adjusts (or creates) python3 symlink to new executable
>>
>> 3.x fullinstall:
>>   - installs python3.x executable
>>   - adjusts (or creates) python3 symlink to new executable
>>   - adjusts (or creates) python symlink to new executable
>>
>> With that setup, I'm sure we're going to get people complaining that
>> 'altinstall' of 3.2 broke their python3 symlink from 3.1. If there are
>> going to be 3 levels of executable naming (python3.x, python3, python),
>> there needs to be 3 levels of installation rather than the traditional 2.
>>
>> For example, add a new target "py3install" and make that the default for
>> 3.1:
>>
>> 3.x altinstall:
>>   - installs python3.x executable
>>
>> 3.x py3install (default for "make install"):
>>   - installs python3.x executable
>>   - adjusts (or creates) python3 symlink to new executable
>>
>> 3.x fullinstall:
>>   - installs python3.x executable
>>   - adjusts (or creates) python3 symlink to new executable
>>   - adjusts (or creates) python symlink to new executable
>>   
>
>
> Adjusting the python2 installs to do something similar with symlinks 
> to python2 would also be useful when python3 becomes the standard 
> python and python2 is used for legacy.
>
> I.e.
>
> 2.x altinstall:
> - installs python2.x executable
>
> 2.x py2install (default for "make install"):

And of course that was supposed to say "future default"...

> - installs python2.x executable
> - adjusts (or creates) python2 symlink to new executable
>
>
> 2.x fullinstall (default for "make install"):
> - installs python2.x executable
> - adjusts (or creates) python2 symlink to new executable
> - adjusts (or creates) python symlink to new executable

From regebro at gmail.com  Sun Apr 19 08:16:57 2009
From: regebro at gmail.com (Lennart Regebro)
Date: Sun, 19 Apr 2009 08:16:57 +0200
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
	<49EA5FE4.9040102@gmail.com>
	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
	<d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
Message-ID: <319e029f0904182316j11d9a198u205d4fe31b8fff1c@mail.gmail.com>

On Sun, Apr 19, 2009 at 05:51, Steven Bethard <steven.bethard at gmail.com> wrote:
> That's an unfortunate decision. When the 2.X line stops being
> maintained (after 2.7 maybe?) we're going to be stuck with the "3"
> suffix forever for the "real" Python.

Yes, but that's the only decision that really works.

> Why doesn't it make more sense to just use "python3" only for
> "altinstall" and "python" for "fullinstall"?

Because you will then get Python 3 trying to run all shebangs that
should be run with python 2. Making Python 3 default doesn't make it
compatible. ;-) And yes, that means we are stuck with it forever, and
I don't like that either, but nobody could come up with an
alternative.

The recommendation to use python3 could change back to use python once
2.7 falls out of support, which is gonna be many years still. And
until then we kinda need different shebang lines. Not much you can do
to get around that.

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64

From greg.ewing at canterbury.ac.nz  Sun Apr 19 08:52:28 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 19 Apr 2009 18:52:28 +1200
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
	<49EA5FE4.9040102@gmail.com>
	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
	<d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
Message-ID: <49EACA2C.6060606@canterbury.ac.nz>

Steven Bethard wrote:

> That's an unfortunate decision. When the 2.X line stops being
> maintained (after 2.7 maybe?) we're going to be stuck with the "3"
> suffix forever for the "real" Python.

I don't see why we have to be stuck with it forever.
When 2.x has faded into the sunset, we can start
aliasing 'python' to 'python3' if we want, can't we?

-- 
Greg

From greg.ewing at canterbury.ac.nz  Sun Apr 19 08:54:37 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 19 Apr 2009 18:54:37 +1200
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <49EAAAA2.4040800@gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
	<49EA5FE4.9040102@gmail.com>
	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
	<d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
	<49EAAAA2.4040800@gmail.com>
Message-ID: <49EACAAD.4030401@canterbury.ac.nz>

Nick Coghlan wrote:

> Note that such an approach would then require an altaltinstall command
> in order to be able to install a specific version of python 3.x without
> changing the python3 alias (e.g. installing 3.2 without overriding 3.1).

Seems like what we need is something in between altinstall
and fullinstall that aliases 'python3' but not 'python',
and make that the default. Maybe call it 'install3'.

-- 
Greg

From nad at acm.org  Sun Apr 19 09:26:13 2009
From: nad at acm.org (Ned Deily)
Date: Sun, 19 Apr 2009 00:26:13 -0700
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
References: <p06240800c60fe2a6a405@10.0.1.221>
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
	<49EA5FE4.9040102@gmail.com>
	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
	<d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
	<49EAAAA2.4040800@gmail.com>
	<d11dcfba0904182145y38d823e4yfe3b02718130678a@mail.gmail.com>
	<49EAB0C2.8040506@gmail.com>
Message-ID: <nad-7E8D6C.00261219042009@ger.gmane.org>

In article <49EAB0C2.8040506 at gmail.com>,
 Nick Coghlan <ncoghlan at gmail.com> wrote:
> Steven Bethard wrote:
> > On Sat, Apr 18, 2009 at 9:37 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> >> Note that such an approach would then require an altaltinstall command
> >> in order to be able to install a specific version of python 3.x without
> >> changing the python3 alias (e.g. installing 3.2 without overriding 3.1).
> > 
> > I wasn't suggesting that there shouldn't be a "python3.1",
> > "python3.2", etc. I'm more concerned about "fullinstall" creating
> > "python3" instead of regular "python".
> 
> If I understand Tony's summary correctly, the situation after Benjamin's
> latest checkin is as follows:
> 
> 2.x altinstall:
>   - installs python2.x executable
> 
> 2.x fullinstall (default for "make install"):
>   - installs python2.x executable
>   - adjusts (or creates) python symlink to new executable
> 
> 3.x altinstall (default for "make install"):
>   - installs python3.x executable
>   - adjusts (or creates) python3 symlink to new executable
> 
> 3.x fullinstall:
>   - installs python3.x executable
>   - adjusts (or creates) python3 symlink to new executable
>   - adjusts (or creates) python symlink to new executable

Note that versioning is also an unresolved issue for the scripts 
installed by setup.py: pydoc, idle, 2to3, and smtpd.py.  See:

http://bugs.python.org/issue5756

Whatever is implemented for python itself should likely apply to them as 
well.

-- 
 Ned Deily,
 nad at acm.org

From mrts.pydev at gmail.com  Sun Apr 19 10:38:05 2009
From: mrts.pydev at gmail.com (Mart Sõmermaa)
Date: Sun, 19 Apr 2009 11:38:05 +0300
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <49EA5D01.6040208@gmail.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
	<49EA5D01.6040208@gmail.com>
Message-ID: <ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>

On Sun, Apr 19, 2009 at 2:06 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> That said, I'm starting to wonder if an even better option may be to
> just drop the kwargs support from the function and require people to
> always supply a parameters dictionary. That would simplify the signature
> to the quite straightforward:
>
>  def add_query_params(url, params, allow_dups=True, sep='&')

That's the most straightforward and I like this more than the one below.

> I agree that isn't a good option, but mapping True/False/None to those
> specific behaviours also seems rather arbitrary (specifically, it is
> difficult to remember which of "allow_dups=False" and "allow_dups=None"
> means to ignore any duplicate keys and which means to ignore only
> duplicate items).

I'd say it's less of a problem when using named arguments, i.e. you read it as:

allow_dups=True : yes
allow_dups=False : effeminately no :),
allow_dups=None : strictly no

which more or less corresponds to the behaviour.

> It also doesn't provide a clear mechanism for
> extension (e.g. what if someone wanted duplicate params to trigger an
> exception?)
>
> Perhaps the extra argument should just be a key/value pair filtering
> function, and we provide functions for the three existing behaviours
> (i.e. allow_duplicates(), ignore_duplicate_keys(),
> ignore_duplicate_items()) in the urllib.parse module.

This would be the most flexible and conceptually right (ye olde
strategy pattern), but would clutter the API.

> Note that your implementation and docstring currently conflict with each
> other - the docstring says "pass them via a dictionary in second
> argument:" but the dictionary is currently the third argument (the
> docstring also later refers to passing OrderedDictionary as the second
> argument).

It's a mistake that exemplifies once again that positional args are awkward :).

---

So, gentlemen, either

def add_query_params(url, params, allow_dups=True, sep='&')

or

def allow_duplicates(...)

def remove_duplicate_values(...)

...

def add_query_params(url, params, strategy=allow_duplicates, sep='&')
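
A minimal sketch of the second (strategy) variant, built on urllib.parse. The semantics given to the two strategy functions are assumptions for illustration, since the thread only names them, and the sep argument is left out to keep the example short.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def allow_duplicates(existing, new):
    """Keep every pair, even if a key or a whole pair repeats."""
    return existing + new

def ignore_duplicate_keys(existing, new):
    """Drop new pairs whose key is already present in the URL (assumed meaning)."""
    seen = {key for key, _ in existing}
    return existing + [(k, v) for k, v in new if k not in seen]

def add_query_params(url, params, strategy=allow_duplicates):
    """Append params (a dict) to url's query, merged by the given strategy."""
    parts = urlsplit(url)
    existing = parse_qsl(parts.query, keep_blank_values=True)
    merged = strategy(existing, list(params.items()))
    return urlunsplit(parts._replace(query=urlencode(merged)))

url = "http://example.com/?a=1"
print(add_query_params(url, {"a": "2", "b": "3"}))
print(add_query_params(url, {"a": "2", "b": "3"}, strategy=ignore_duplicate_keys))
```

A caller who wants duplicate parameters to raise an exception can pass a function of their own, which is the extension point the allow_dups=True/False/None flag lacks.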

From stephen at xemacs.org  Sun Apr 19 11:17:20 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 19 Apr 2009 18:17:20 +0900
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <49EA5FE4.9040102@gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
	<49EA5FE4.9040102@gmail.com>
Message-ID: <87ocut2ean.fsf@xemacs.org>

Nick Coghlan writes:

 > 3. Change the shebang lines in Python standard library scripts to be
 > version specific and update release.py to fix them all when bumping the
 > version number in the source tree.

+1

I think that it's probably best to leave "python", "python2", and
"python3" for the use of downstream distributors.  ISTR that was what
Guido concluded, in the discussion that led to Python 3 defaulting to
altinstall---it wasn't just convenient because Python 3 is a major
change, but that experience has shown that deciding which Python is
going to be "The python" on somebody's system just isn't a decision
that Python should make.

Sure, the difference between Python 2 and Python 3 is big enough to be
a hairy nuisance 95% of the time, while the difference between Python
2.5 and Python 2.6 is so only 5% of the time.  But the fact is that
incompatibilities arise with a minor version bump, too, and all the
major distros that I know about have some way to select the default
Python version that will be "python".  That's not because they want to
distinguish between Python 2 and Python 3, nor between Python 2 and
Python 1.
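
Nick's option 3, quoted at the top of this message, could look roughly like the sketch below. It is not the actual release.py code; fix_shebang is a hypothetical helper, and it only handles the "#!/usr/bin/env python" form mentioned in this thread.

```python
import re

# Matches "#!/usr/bin/env python" optionally followed by a version
# (python, python3, python3.1, ...). It is applied to the first line only.
_SHEBANG = re.compile(r"^#!\s*/usr/bin/env\s+python[0-9.]*[ \t]*$")

def fix_shebang(source, version):
    """Return source with an env-python shebang pinned to python<version>."""
    first, sep, rest = source.partition("\n")
    if _SHEBANG.match(first):
        first = "#!/usr/bin/env python" + version
    return first + sep + rest

print(fix_shebang("#!/usr/bin/env python\nimport pdb\n", "3.1"))
```

Files without a matching first line are returned unchanged, so a script like this could be run over the whole source tree each time the version number is bumped.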

From martin at v.loewis.de  Sun Apr 19 12:18:13 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 19 Apr 2009 12:18:13 +0200
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <87ocut2ean.fsf@xemacs.org>
References: <p06240800c60fe2a6a405@10.0.1.221>	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>	<49EA5FE4.9040102@gmail.com>
	<87ocut2ean.fsf@xemacs.org>
Message-ID: <49EAFA65.2090009@v.loewis.de>

> I think that it's probably best to leave "python", "python2", and
> "python3" for the use of downstream distributors.  ISTR that was what
> Guido concluded, in the discussion that led to Python 3 defaulting to
> altinstall---it wasn't just convenient because Python 3 is a major
> change, but that experience has shown that deciding which Python is
> going to be "The python" on somebody's system just isn't a decision
> that Python should make.

Yes. However, at the language summit in Chicago, it was agreed that
the installation should also provide a python3 symlink.

I don't recall the agreement wrt. the names of executables on
Windows.

Regards,
Martin

From p.f.moore at gmail.com  Sun Apr 19 15:04:27 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 19 Apr 2009 14:04:27 +0100
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
	<49EA5FE4.9040102@gmail.com>
	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
	<d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
Message-ID: <79990c6b0904190604s7ee2b6e1j7af35010b28ebb67@mail.gmail.com>

2009/4/19 Steven Bethard <steven.bethard at gmail.com>:
> On Sat, Apr 18, 2009 at 8:14 PM, Benjamin Peterson <benjamin at python.org> wrote:
>> 2009/4/18 Nick Coghlan <ncoghlan at gmail.com>:
>>> I see a few options:
>>> 1. Abandon the "python" name for the 3.x series and commit to calling it
>>> "python3" now and forever (i.e. actually make the decision that Mitchell
>>> refers to).
>>
>> I believe this was decided on sometime (the sprints?).
>
> That's an unfortunate decision. When the 2.X line stops being
> maintained (after 2.7 maybe?) we're going to be stuck with the "3"
> suffix forever for the "real" Python.
>
> Why doesn't it make more sense to just use "python3" only for
> "altinstall" and "python" for "fullinstall"?

Agreed. Personally, I'm -0 on this decision. I'd be -1 if I was a
Linux user, or if I thought that it would be applied to Windows as
well. As it is, my -0 is based on "it doesn't affect me, but it seems
wrong to have the official name be different things depending on
platform".

Paul.

From ncoghlan at gmail.com  Sun Apr 19 15:52:39 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 19 Apr 2009 23:52:39 +1000
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <49EAFA65.2090009@v.loewis.de>
References: <p06240800c60fe2a6a405@10.0.1.221>	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>	<49EA5FE4.9040102@gmail.com>
	<87ocut2ean.fsf@xemacs.org> <49EAFA65.2090009@v.loewis.de>
Message-ID: <49EB2CA7.7000803@gmail.com>

Martin v. Löwis wrote:
>> I think that it's probably best to leave "python", "python2", and
>> "python3" for the use of downstream distributors.  ISTR that was what
>> Guido concluded, in the discussion that led to Python 3 defaulting to
>> altinstall---it wasn't just convenient because Python 3 is a major
>> change, but that experience has shown that deciding which Python is
>> going to be "The python" on somebody's system just isn't a decision
>> that Python should make.
> 
> Yes. However, at the language summit in Chicago, it was agreed that
> the installation should also provide a python3 symlink.
> 
> I don't recall the agreement wrt. the names of executables on
> Windows.

The installer still leaves PATH alone by default, doesn't it? That means
the Windows version selection is done by naming the directory.

Although I guess choosing a file association for .py files becomes
rather more interesting...

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From tseaver at palladion.com  Sun Apr 19 16:41:59 2009
From: tseaver at palladion.com (Tres Seaver)
Date: Sun, 19 Apr 2009 10:41:59 -0400
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>	<49EA5FE4.9040102@gmail.com>
	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>
Message-ID: <gsfd8j$2kq$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Benjamin Peterson wrote:
> 2009/4/18 Nick Coghlan <ncoghlan at gmail.com>:
>> Benjamin Peterson wrote:
>>> 2009/4/18 Mitchell L Model <MLMLists at comcast.net>:
>>>> Some library files, such as pdb.py, begin with
>>>>        #!/usr/bin/env python
>>>> In various discussions regarding some issues I submitted I was told that the
>>>> decision had been made to call Python 3.x release executables python3. (One
>>>> of the conflicts I ran into when I made 'python' a link to python3.1 was
>>>> that some tools used in making the HTML documentation haven't been upgraded
>>>> to run with 3.)
>>>>
>>>> Shouldn't all library files that begin with the above line be changed so
>>>> that they read 'python3' instead of python? Perhaps I should have just filed
>>>> this as an issue, but I'm not confident of the state of the plan to move to
>>>> python3 as the official executable name.
>>> That sounds correct. Please file a bug report.
>> As Kevin pointed out, while this is a problem, changing the affected
>> scripts to say "python3" instead isn't the right answer.
>>
>> All that happened with the Python 3 installers is that they do
>> 'altinstall' rather than 'fullinstall' by default, thus leaving the
>> 'python' alias alone. There is no "python3" alias unless a user creates
>> it for themselves (or a distro packager does it for them).
> 
> I've actually implemented a python3 alias for 3.1.
> 
>> I see a few options:
>> 1. Abandon the "python" name for the 3.x series and commit to calling it
>> "python3" now and forever (i.e. actually make the decision that Mitchell
>> refers to).
> 
> I believe this was decided on sometime (the sprints?).

It was at the Language Summit.

Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJ6zg3+gerLs4ltQ4RAt2ZAKDRGXMXBRs5FiHLnC0MQt56janafwCdGytm
/nrHCiifI/KibI+ljppr3aA=
=uYha
-----END PGP SIGNATURE-----

From steven.bethard at gmail.com  Sun Apr 19 17:54:12 2009
From: steven.bethard at gmail.com (Steven Bethard)
Date: Sun, 19 Apr 2009 08:54:12 -0700
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
	<49EA5D01.6040208@gmail.com>
	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
Message-ID: <d11dcfba0904190854l11c4c167g60f5ef2a70af27b@mail.gmail.com>

On Sun, Apr 19, 2009 at 1:38 AM, Mart Sõmermaa <mrts.pydev at gmail.com> wrote:
> On Sun, Apr 19, 2009 at 2:06 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> That said, I'm starting to wonder if an even better option may be to
>> just drop the kwargs support from the function and require people to
>> always supply a parameters dictionary. That would simplify the signature
>> to the quite straightforward:
>>
>>  def add_query_params(url, params, allow_dups=True, sep='&')
>
> That's the most straightforward and I like this more than the one below.
>
>> I agree that isn't a good option, but mapping True/False/None to those
>> specific behaviours also seems rather arbitrary (specifically, it is
>> difficult to remember which of "allow_dups=False" and "allow_dups=None"
>> means to ignore any duplicate keys and which means to ignore only
>> duplicate items).
>
> I'd say it's less of a problem when using named arguments, i.e. you read it as:
>
> allow_dups=True : yes
> allow_dups=False : effeminately no :),
> allow_dups=None : strictly no
>
> which more or less corresponds to the behaviour.
>
>> It also doesn't provide a clear mechanism for
>> extension (e.g. what if someone wanted duplicate params to trigger an
>> exception?)
>>
>> Perhaps the extra argument should just be a key/value pair filtering
>> function, and we provide functions for the three existing behaviours
>> (i.e. allow_duplicates(), ignore_duplicate_keys(),
>> ignore_duplicate_items()) in the urllib.parse module.
>
> This would be the most flexible and conceptually right (ye olde
> strategy pattern), but would clutter the API.
>
>> Note that your implementation and docstring currently conflict with each
>> other - the docstring says "pass them via a dictionary in second
>> argument:" but the dictionary is currently the third argument (the
>> docstring also later refers to passing OrderedDictionary as the second
>> argument).
>
> It's a mistake that exemplifies once again that positional args are awkward :).
>
> ---
>
> So, gentlemen, either
>
> def add_query_params(url, params, allow_dups=True, sep='&')
>
> or
>
> def allow_duplicates(...)
>
> def remove_duplicate_values(...)
>
> ...
>
> def add_query_params(url, params, strategy=allow_duplicates, sep='&')

+1 for the strategy approach.

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From martin at v.loewis.de  Sun Apr 19 20:51:47 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 19 Apr 2009 20:51:47 +0200
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <49EB2CA7.7000803@gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>	<49EA5FE4.9040102@gmail.com>
	<87ocut2ean.fsf@xemacs.org> <49EAFA65.2090009@v.loewis.de>
	<49EB2CA7.7000803@gmail.com>
Message-ID: <49EB72C3.5080307@v.loewis.de>

> The installer still leaves PATH alone by default, doesn't it? 

Correct. However, people frequently set the path "by hand", so
they would probably appreciate a python3 binary (and pythonw3?
python3w?). Of course, those people could also manually
copy/rename the executable.

> Although I guess choosing a file association for .py files becomes
> rather more interesting...

Indeed. We could register a py3 extension (and py3w? pyw3?),
but then .py might remain associated with python3, even though
people want it associated with python 2.

Regards,
Martin

From janssen at parc.com  Sun Apr 19 21:26:59 2009
From: janssen at parc.com (Bill Janssen)
Date: Sun, 19 Apr 2009 12:26:59 PDT
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
	<49EA5D01.6040208@gmail.com>
	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
Message-ID: <92117.1240169219@parc.com>

Mart Sõmermaa <mrts.pydev at gmail.com> wrote:

> On Sun, Apr 19, 2009 at 2:06 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > That said, I'm starting to wonder if an even better option may be to
> > just drop the kwargs support from the function and require people to
> > always supply a parameters dictionary. That would simplify the signature
> > to the quite straightforward:
> >
> >  def add_query_params(url, params, allow_dups=True, sep='&')

Or even better, stop trying to use a mapping, and just make the "params"
value a list of (name, value) pairs.  That way you can stop fiddling
around with "allow_dups" and just get rid of it.
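
As a rough illustration of the pairs-based idea: the existing urllib.parse.urlencode already accepts a sequence of (name, value) pairs and keeps duplicates and ordering as given. The parameter names below are made up for the example.

```python
from urllib.parse import urlencode

# Repeated names are simply repeated pairs; no allow_dups flag is needed.
params = [("tag", "python"), ("tag", "urllib"), ("page", "2")]
query = urlencode(params)
print(query)

# Data held in a mapping can still be passed via its items.
from_mapping = urlencode(list({"page": "2"}.items()))
print(from_mapping)
```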

Bill

From fuzzyman at voidspace.org.uk  Sun Apr 19 21:30:05 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sun, 19 Apr 2009 20:30:05 +0100
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse
 in	3.1 (and urlparse in 2.7)
In-Reply-To: <92117.1240169219@parc.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>	<loom.20090412T215625-611@post.gmane.org>	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>	<49E9CA6A.6060004@gmail.com>	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>	<49EA5D01.6040208@gmail.com>	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
	<92117.1240169219@parc.com>
Message-ID: <49EB7BBD.8010806@voidspace.org.uk>

Bill Janssen wrote:
> Mart Sõmermaa <mrts.pydev at gmail.com> wrote:
>
>> On Sun, Apr 19, 2009 at 2:06 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> That said, I'm starting to wonder if an even better option may be to
>>> just drop the kwargs support from the function and require people to
>>> always supply a parameters dictionary. That would simplify the signature
>>> to the quite straightforward:
>>>
>>>  def add_query_params(url, params, allow_dups=True, sep='&')
>
> Or even better, stop trying to use a mapping, and just make the "params"
> value a list of (name, value) pairs.  That way you can stop fiddling
> around with "allow_dups" and just get rid of it.

Reluctant +1, it seems the best solution. You can always use {}.items() 
if you still want to store the params in a mapping.

Michael
> Bill

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From solipsis at pitrou.net  Sun Apr 19 21:35:56 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 19 Apr 2009 19:35:56 +0000 (UTC)
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
	<49EA5D01.6040208@gmail.com>
	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
	<92117.1240169219@parc.com>
Message-ID: <loom.20090419T193428-106@post.gmane.org>

Bill Janssen <janssen <at> parc.com> writes:
> 
> Or even better, stop trying to use a mapping, and just make the "params"
> value a list of (name, value) pairs.

You can even accept both a list of (name, value) pairs /and/ some **kwargs, like
the dict constructor does. It would be a pity to drop the user-friendliness of
kwargs just to satisfy some rare and obscure requirement.
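
A small sketch of what accepting both forms might look like, mirroring how dict() takes an iterable of pairs plus keyword arguments. build_query is a hypothetical name, not a proposed API.

```python
from urllib.parse import urlencode

def build_query(pairs=(), **kwargs):
    """Hypothetical helper: accept (name, value) pairs and/or keyword args."""
    # Like dict(), take either a mapping or an iterable of pairs.
    items = list(pairs.items()) if hasattr(pairs, "items") else list(pairs)
    # Keyword arguments are appended after the positional pairs.
    items.extend(kwargs.items())
    return urlencode(items)

print(build_query([("a", "1"), ("a", "2")], b="3"))
print(build_query(page="2"))
```

Repeated names remain possible through the pairs argument, while the common case of unique names keeps the keyword form.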

Regards

Antoine.

From janssen at parc.com  Sun Apr 19 22:21:05 2009
From: janssen at parc.com (Bill Janssen)
Date: Sun, 19 Apr 2009 13:21:05 PDT
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <loom.20090419T193428-106@post.gmane.org>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
	<49EA5D01.6040208@gmail.com>
	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
	<92117.1240169219@parc.com>
	<loom.20090419T193428-106@post.gmane.org>
Message-ID: <93230.1240172465@parc.com>

Antoine Pitrou <solipsis at pitrou.net> wrote:

> Bill Janssen <janssen <at> parc.com> writes:
> > 
> > Or even better, stop trying to use a mapping, and just make the "params"
> > value a list of (name, value) pairs.
> 
> You can even accept both a list of (name, value) pairs /and/ some **kwargs, like
> the dict constructor does. It would be a pity to drop the user-friendliness of
> kwargs just to satisfy some rare and obscure requirement.

This whole discussion seems a bit "rare and obscure" to me.  I've built
URLs for years without this method, and never felt the lack.  What bugs me
is the lack of a way to build multipart-formdata payloads, the only standard
way to send non-Latin1 strings as part of a request.

I'd like to suggest we move this off python-dev, and to either the
Web-SIG or stdlib-sig mailing lists, which are probably more interested
in all of this.

Bill

From solipsis at pitrou.net  Sun Apr 19 22:24:44 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 19 Apr 2009 20:24:44 +0000 (UTC)
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
	<49EA5D01.6040208@gmail.com>
	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
	<92117.1240169219@parc.com>
	<loom.20090419T193428-106@post.gmane.org>
	<93230.1240172465@parc.com>
Message-ID: <loom.20090419T202311-575@post.gmane.org>

Bill Janssen <janssen <at> parc.com> writes:
> 
> This whole discussion seems a bit "rare and obscure" to me.  I've built
> URLs for years without this method, and never felt the lack.  What bugs me
> is the lack of a way to build multipart-formdata payloads, the only standard
> way to send non-Latin1 strings as part of a request.

?? What's the problem with sending non-Latin1 data without multipart-formdata?

From janssen at parc.com  Sun Apr 19 22:59:44 2009
From: janssen at parc.com (Bill Janssen)
Date: Sun, 19 Apr 2009 13:59:44 PDT
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <loom.20090419T202311-575@post.gmane.org>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
	<49EA5D01.6040208@gmail.com>
	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
	<92117.1240169219@parc.com>
	<loom.20090419T193428-106@post.gmane.org>
	<93230.1240172465@parc.com>
	<loom.20090419T202311-575@post.gmane.org>
Message-ID: <93939.1240174784@parc.com>

Antoine Pitrou <solipsis at pitrou.net> wrote:

> Bill Janssen <janssen <at> parc.com> writes:
> > 
> > This whole discussion seems a bit "rare and obscure" to me.  I've built
> > URLs for years without this method, and never felt the lack.  What bugs me
> > is the lack of a way to build multipart-formdata payloads, the only standard
> > way to send non-Latin1 strings as part of a request.
> 
> ?? What's the problem with sending non-Latin1 data without multipart-formdata?

I should have said, as values for a FORM submission.  There are two ways
to encode form values for a FORM submission,
application/x-www-form-urlencoded, and multipart/form-data.  As per
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4:

``The content type "application/x-www-form-urlencoded" is inefficient
for sending large quantities of binary data or text containing non-ASCII
characters. The content type "multipart/form-data" should be used for
submitting forms that contain files, non-ASCII data, and binary data.''

And we don't support this in the http client-side standard library code.
(Do we?  Haven't looked lately.)

The same section also says:

``Space characters are replaced by `+', and then reserved characters are
escaped as described in [RFC1738], section 2.2: Non-alphanumeric
characters are replaced by `%HH', a percent sign and two hexadecimal
digits representing the ASCII code of the character. Line breaks are
represented as "CR LF" pairs (i.e., `%0D%0A').''

That "the ASCII code of the character" seemingly restricts it to ASCII...

But this is complicated by the fact that most browsers try to use the
character set the server will understand, and the widely used technique
to accomplish this is to use the same charset the page the FORM occurs
in uses.  Unless this is set explicitly, it defaults to Latin-1.

I prefer to avoid all this uncertainty, and use a well-defined format
when submitting a form, so I tend to use multipart/form-data, which
allows explicit control over this.
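
A minimal sketch of what such a payload builder could look like, assuming
Python 3 and text-only fields. The name encode_multipart and its signature
are hypothetical (this is not an existing standard library API), and the
sketch omits file uploads and escaping of quotes in field names:

```python
import uuid


def encode_multipart(fields, charset="utf-8"):
    """Hypothetical helper: build a multipart/form-data body.

    fields is a list of (name, value) text pairs.
    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex  # random token unlikely to occur in the data
    lines = []
    for name, value in fields:
        lines.append(("--" + boundary).encode("ascii"))
        lines.append(
            ('Content-Disposition: form-data; name="%s"' % name).encode(charset)
        )
        # Each part states its own charset, so the server does not have to guess.
        lines.append(("Content-Type: text/plain; charset=%s" % charset).encode("ascii"))
        lines.append(b"")  # blank line separates part headers from part body
        lines.append(value.encode(charset))
    lines.append(("--" + boundary + "--").encode("ascii"))  # closing delimiter
    lines.append(b"")
    body = b"\r\n".join(lines)
    content_type = "multipart/form-data; boundary=%s" % boundary
    return content_type, body
```

The returned content type would go into the request's Content-Type header
and the bytes into the request body.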

Bill

From solipsis at pitrou.net  Sun Apr 19 23:45:04 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 19 Apr 2009 21:45:04 +0000 (UTC)
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
	<49EA5D01.6040208@gmail.com>
	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
	<92117.1240169219@parc.com>
	<loom.20090419T193428-106@post.gmane.org>
	<93230.1240172465@parc.com>
	<loom.20090419T202311-575@post.gmane.org>
	<93939.1240174784@parc.com>
Message-ID: <loom.20090419T213922-775@post.gmane.org>

Bill Janssen <janssen <at> parc.com> writes:
> 
> ``The content type "application/x-www-form-urlencoded" is inefficient
> for sending large quantities of binary data or text containing non-ASCII
> characters.

The fact that it's "inefficient" (i.e. takes more bytes than an optimal encoding
scheme would) doesn't mean that it doesn't work.

There are millions of Web sites out there which allow you to submit non-ASCII
data without resorting to "multipart/form-data" encoding. The situations where
the submitted text is huge enough that encoding efficiency matters are probably
insanely rare.

> But this is complicated by the fact that most browsers try to use the
> character set the server will understand, and the widely used technique
> to accomplish this is to use the same charset the page the FORM occurs
> in uses.  Unless this is set explicitly, it defaults to Latin-1.

Look out there, many Web pages specify a different character set than
Latin-1... UTF8 is quite a common choice in the modern world.

Also, browsers will encode those characters that cannot be encoded in the
character set using HTML escapes ("&#1234;"). This means you can enter any
unicode text into any form, regardless of the encoding of the source page. It's
up to the Web application to decode the text, sure, but any decent Web framework
or toolkit should do it for you.
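
The escaping behaviour can be illustrated with Python's own codecs. This is
only a sketch of the idea, not browser code: it assumes Python 3, and the
function name encode_for_charset is made up for the example. The
"xmlcharrefreplace" error handler replaces characters the target charset
cannot represent with numeric character references:

```python
def encode_for_charset(text, charset):
    """Encode text, replacing unencodable characters with &#NNNN; references."""
    return text.encode(charset, errors="xmlcharrefreplace")


# The euro sign (U+20AC) does not exist in Latin-1, so it is escaped;
# e-acute (U+00E9) does exist and is encoded as a single byte.
print(encode_for_charset("10\u20ac", "latin-1"))    # b'10&#8364;'
print(encode_for_charset("caf\u00e9", "latin-1"))   # b'caf\xe9'
```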

Regards

Antoine.

From steve at pearwood.info  Mon Apr 20 01:03:28 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 20 Apr 2009 09:03:28 +1000
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <92117.1240169219@parc.com>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
	<92117.1240169219@parc.com>
Message-ID: <200904200903.29083.steve@pearwood.info>

On Mon, 20 Apr 2009 05:26:59 am Bill Janssen wrote:
> Mart Sõmermaa <mrts.pydev at gmail.com> wrote:
> > On Sun, Apr 19, 2009 at 2:06 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > > That said, I'm starting to wonder if an even better option may be
> > > to just drop the kwargs support from the function and require
> > > people to always supply a parameters dictionary. That would
> > > simplify the signature to the quite straightforward:
> > >
> > >   def add_query_params(url, params, allow_dups=True, sep='&')
>
> Or even better, stop trying to use a mapping, and just make the
> "params" value a list of (name, value) pairs.  That way you can stop
> fiddling around with "allow_dups" and just get rid of it.

Surely it should support any mapping? That's what I do in my own code. 
People will use regular dicts for convenience when they don't care 
about order or duplicates, and (name,value) pairs, or an OrderedDict, 
when they do.

I suppose you could force people to write params.items() if params is a 
dict, but it seems wrong to force an order on input data when it 
doesn't require one.
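
A minimal sketch of a function that accepts either form, assuming Python 3's
urllib.parse. The name add_query_params comes from this thread, but the
simplified signature below (no allow_dups, no sep) is an assumption made for
illustration, not the proposed API:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_params(url, params):
    """Append params to the query string of url.

    params may be any mapping or an iterable of (name, value) pairs.
    """
    # Mappings contribute their items; anything else is taken as pairs,
    # which preserves caller-supplied order and duplicate names.
    pairs = list(params.items()) if hasattr(params, "items") else list(params)
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True) + pairs
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_query_params("http://example.com/?a=1", [("b", "2"), ("b", "3")]))
print(add_query_params("http://example.com/path", {"q": "x y"}))
```

Callers with a dict pass it directly; callers who care about order or
duplicates pass a list of pairs.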

-- 
Steven D'Aprano

From janssen at parc.com  Mon Apr 20 05:41:23 2009
From: janssen at parc.com (Bill Janssen)
Date: Sun, 19 Apr 2009 20:41:23 PDT
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <loom.20090419T213922-775@post.gmane.org>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
	<49EA5D01.6040208@gmail.com>
	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
	<92117.1240169219@parc.com>
	<loom.20090419T193428-106@post.gmane.org>
	<93230.1240172465@parc.com>
	<loom.20090419T202311-575@post.gmane.org>
	<93939.1240174784@parc.com>
	<loom.20090419T213922-775@post.gmane.org>
Message-ID: <98461.1240198883@parc.com>

Antoine Pitrou <solipsis at pitrou.net> wrote:

> Bill Janssen <janssen <at> parc.com> writes:
> > 
> > ``The content type "application/x-www-form-urlencoded" is inefficient
> > for sending large quantities of binary data or text containing non-ASCII
> > characters.
> 
> The fact that it's "inefficient" (i.e. takes more bytes than an optimal encoding
> scheme would) doesn't mean that it doesn't work.

Absolutely.  I'm just quoting the spec to you.  In any case, being able to send
multipart/form-data would be a nice thing to have, if only for file uploads.

> Look out there, many Web pages specify a different character set than
> Latin-1... UTF8 is quite a common choice in the modern world.

Sure.  But nowhere does a spec say that this page charset should be used
in sending the values of a FORM using application/x-www-form-urlencoded
in a new HTTP request.  It's just a convention some browsers use.

> Also, browsers will encode those characters that cannot be encoded in the
> character set using HTML escapes ("&#1234;"). This means you can enter any

Sure, some browsers will.  Others will apparently replace them with
question marks.  It's undefined.

> unicode text into any form, regardless of the encoding of the source page. It's
> up to the Web application to decode the text, sure, but any decent Web framework
> or toolkit should do it for you.

Bill

From christian.doll at basf.com  Mon Apr 20 08:54:15 2009
From: christian.doll at basf.com (christian.doll at basf.com)
Date: Mon, 20 Apr 2009 08:54:15 +0200
Subject: [Python-Dev] Something like PEP-0304 - suppress *.pyc generation
Message-ID: <OF5E1FBD3D.2C690AD2-ONC125759E.00255F39-C125759E.0025EED7@basf-c-s.be>

Hello,

I'm looking for something like PEP 304
(http://www.python.org/dev/peps/pep-0304/).

I need a way to suppress the generation of *.pyc files, because I have
a great many different machines that run the same Python program at the
same time.

The Python program crashes at different places and on different machines.
I think the problem is the *.pyc files, which the different machines
generate at the same time.

Is PEP 304 implemented in a newer Python version (we use 2.4.4), is
there a workaround, or can someone implement PEP 304?

Thank you for your help!

Best regards
Christian Doll

From solipsis at pitrou.net  Mon Apr 20 11:44:54 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 20 Apr 2009 09:44:54 +0000 (UTC)
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
	<49EA5D01.6040208@gmail.com>
	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
	<92117.1240169219@parc.com>
	<loom.20090419T193428-106@post.gmane.org>
	<93230.1240172465@parc.com>
	<loom.20090419T202311-575@post.gmane.org>
	<93939.1240174784@parc.com>
	<loom.20090419T213922-775@post.gmane.org>
	<98461.1240198883@parc.com>
Message-ID: <loom.20090420T094356-200@post.gmane.org>

Bill Janssen <janssen <at> parc.com> writes:
> 
> Sure.  But nowhere does a spec say that this page charset should be used
> in sending the values of a FORM using application/x-www-form-urlencoded
> in a new HTTP request.  It's just a convention some browsers use.

Let's call it a de facto standard then. A behaviour doesn't have to be engraved
in an RFC to be considered standard.

Regards

Antoine.

From steve at pearwood.info  Mon Apr 20 11:25:24 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 20 Apr 2009 19:25:24 +1000
Subject: [Python-Dev] Something like PEP-0304 - suppress *.pyc generation
In-Reply-To: <OF5E1FBD3D.2C690AD2-ONC125759E.00255F39-C125759E.0025EED7@basf-c-s.be>
References: <OF5E1FBD3D.2C690AD2-ONC125759E.00255F39-C125759E.0025EED7@basf-c-s.be>
Message-ID: <200904201925.24851.steve@pearwood.info>

On Mon, 20 Apr 2009 04:54:15 pm christian.doll at basf.com wrote:

> I need a way to suppress the generation of *.pyc files, because I have
> a great many different machines that run the same Python program at the
> same time.

This list is for development *of* Python, not development *with* 
Python. You would probably be better off on comp.lang.python or 
python-list at python.org.

However, I believe that the normal way to prevent the generation 
of .pyc files is to remove write access to the directory where 
the .py files are.

-- 
Steven D'Aprano 

From ismail at namtrac.org  Mon Apr 20 11:54:15 2009
From: ismail at namtrac.org (İsmail Dönmez)
Date: Mon, 20 Apr 2009 12:54:15 +0300
Subject: [Python-Dev] Something like PEP-0304 - suppress *.pyc generation
In-Reply-To: <200904201925.24851.steve@pearwood.info>
References: <OF5E1FBD3D.2C690AD2-ONC125759E.00255F39-C125759E.0025EED7@basf-c-s.be>
	<200904201925.24851.steve@pearwood.info>
Message-ID: <19e566510904200254j9a4f3acx695ec152e65af7cc@mail.gmail.com>

On Mon, Apr 20, 2009 at 12:25 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Mon, 20 Apr 2009 04:54:15 pm christian.doll at basf.com wrote:
>
>> I need a way to suppress the generation of *.pyc files, because I have
>> a great many different machines that run the same Python program at the
>> same time.
>
> This list is for development *of* Python, not development *with*
> Python. You would probably be better off on comp.lang.python or
> python-list at python.org.
>
> However, I believe that the normal way to prevent the generation
> of .pyc files is to remove write access to the directory where
> the .py files are.

Check out http://docs.python.org/using/cmdline.html#envvar-PYTHONDONTWRITEBYTECODE
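
A short sketch of the two ways this setting can be applied. Note that
PYTHONDONTWRITEBYTECODE, the -B command-line option and
sys.dont_write_bytecode only exist from Python 2.6 onwards, so none of this
is available on 2.4.4. The helper name child_skips_bytecode is made up for
the example:

```python
import os
import subprocess
import sys

# In-process: must be set before the modules in question are imported.
sys.dont_write_bytecode = True


def child_skips_bytecode():
    """Start a child interpreter with the environment variable set and
    report whether it picked up the setting."""
    env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1")
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=env,
    )
    return out.decode().strip() == "True"


print(child_skips_bytecode())
```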

Regards.

-- 
İsmail DÖNMEZ

From a.badger at gmail.com  Mon Apr 20 16:46:06 2009
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Mon, 20 Apr 2009 07:46:06 -0700
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <49EACA2C.6060606@canterbury.ac.nz>
References: <p06240800c60fe2a6a405@10.0.1.221>	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>	<49EA5FE4.9040102@gmail.com>	<1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com>	<d11dcfba0904182051y1e0cfdafiee234659739ca04d@mail.gmail.com>
	<49EACA2C.6060606@canterbury.ac.nz>
Message-ID: <49EC8AAE.2050506@gmail.com>

Greg Ewing wrote:
> Steven Bethard wrote:
> 
>> That's an unfortunate decision. When the 2.X line stops being
>> maintained (after 2.7 maybe?) we're going to be stuck with the "3"
>> suffix forever for the "real" Python.
> 
> I don't see why we have to be stuck with it forever.
> When 2.x has faded into the sunset, we can start
> aliasing 'python' to 'python3' if we want, can't we?
> 
You could, but it's not my favorite idea.  Gets people used to the idea
of python == python2 and python3 == python3 as something they can count
on.  Then says, "Oops, that was just an implementation detail, we're
changing that now".  Much better to either make a clean break and call
the new language dialect python3 from now and forever or force people to
come up with solutions to whether /usr/bin/python == python2 or python3
right now while it's fresh and relevant in their minds.

-Toshio


From janssen at parc.com  Mon Apr 20 17:32:52 2009
From: janssen at parc.com (Bill Janssen)
Date: Mon, 20 Apr 2009 08:32:52 PDT
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <loom.20090420T094356-200@post.gmane.org>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
	<49EA5D01.6040208@gmail.com>
	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
	<92117.1240169219@parc.com>
	<loom.20090419T193428-106@post.gmane.org>
	<93230.1240172465@parc.com>
	<loom.20090419T202311-575@post.gmane.org>
	<93939.1240174784@parc.com>
	<loom.20090419T213922-775@post.gmane.org>
	<98461.1240198883@parc.com>
	<loom.20090420T094356-200@post.gmane.org>
Message-ID: <5339.1240241572@parc.com>

Antoine Pitrou <solipsis at pitrou.net> wrote:

> Bill Janssen <janssen <at> parc.com> writes:
> > 
> > Sure.  But nowhere does a spec say that this page charset should be used
> > in sending the values of a FORM using application/x-www-form-urlencoded
> > in a new HTTP request.  It's just a convention some browsers use.
> 
> Let's call it a de facto standard then. A behaviour doesn't have to be engraved
> in an RFC to be considered standard.

Sure.  And if HTTP was all about browsers keying off pages, that would
be fine with me.  But it's not.  HTTP is used in lots of places where
there are no browsers; in fact, the idea we're busy bike-shedding is all
about a client-side library making calls on a server.  It's used in
places where there are no "pages", too, just servers on which clients
are making REST-style calls.  So in the real world, the only way in
which you can reliably post non-ASCII values to a server using HTTP is
with multipart/form-data, which allows you to explicitly say what
character set you are using.  I've debugged this problem too many times
with REST servers of various kinds to think otherwise.

Bill

From solipsis at pitrou.net  Mon Apr 20 17:42:28 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 20 Apr 2009 15:42:28 +0000 (UTC)
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
	<49EA5D01.6040208@gmail.com>
	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
	<92117.1240169219@parc.com>
	<loom.20090419T193428-106@post.gmane.org>
	<93230.1240172465@parc.com>
	<loom.20090419T202311-575@post.gmane.org>
	<93939.1240174784@parc.com>
	<loom.20090419T213922-775@post.gmane.org>
	<98461.1240198883@parc.com>
	<loom.20090420T094356-200@post.gmane.org>
	<5339.1240241572@parc.com>
Message-ID: <loom.20090420T153733-605@post.gmane.org>

Bill Janssen <janssen <at> parc.com> writes:
> 
> Sure.  And if HTTP was all about browsers keying off pages, that would
> be fine with me.  But it's not.  HTTP is used in lots of places where
> there are no browsers;

I'm sorry, I don't follow you. The fact that something else than a browser makes
the request shouldn't change the behaviour on the /server/ side.

> It's used in
> places where there are no "pages", too, just servers on which clients
> are making REST-style calls.

So what? The designer of the REST API must mandate an encoding (most probably
UTF-8 rather than Latin-1 as you bizarrely seemed to imply) and the problem is
solved.
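
As a rough illustration of an API that fixes the encoding on both ends,
assuming Python 3.2 or later (where urlencode accepts an encoding argument).
The helper names encode_form and decode_form are hypothetical:

```python
from urllib.parse import parse_qs, urlencode


def encode_form(fields, charset="utf-8"):
    """Client side: percent-encode the values using the agreed charset."""
    return urlencode(fields, encoding=charset).encode("ascii")


def decode_form(body, charset="utf-8"):
    """Server side: decode the percent-escapes with the same charset."""
    return parse_qs(body.decode("ascii"), encoding=charset)


body = encode_form([("city", "Z\u00fcrich")])
print(body)               # b'city=Z%C3%BCrich'
print(decode_form(body))  # {'city': ['Zürich']}
```

If the two sides disagree on the charset, the same bytes decode to
different text, which is why the API has to state it.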

Complaining that the RFC doesn't specify all this sounds like an excuse for
programmer laziness.

Antoine.

From janssen at parc.com  Mon Apr 20 18:33:31 2009
From: janssen at parc.com (Bill Janssen)
Date: Mon, 20 Apr 2009 09:33:31 PDT
Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in
	3.1 (and urlparse in 2.7)
In-Reply-To: <loom.20090420T153733-605@post.gmane.org>
References: <ad1f81530903260849k7867f8e7k45a558f3cb608dd3@mail.gmail.com>
	<ad1f81530904120340n675c03f3u51c573cd3a6df404@mail.gmail.com>
	<loom.20090412T215625-611@post.gmane.org>
	<ad1f81530904130229x634a6be9vd805a0bb64261ebf@mail.gmail.com>
	<d11dcfba0904131023j5d4a9d86u909180b638886668@mail.gmail.com>
	<ad1f81530904131314y7348b4c5hcc7a9df7d38a2e7f@mail.gmail.com>
	<d11dcfba0904131819k654943b1m5a53a1de018d2740@mail.gmail.com>
	<49E9CA6A.6060004@gmail.com>
	<ad1f81530904181342t53138140y4462d62ac1affd9e@mail.gmail.com>
	<49EA5D01.6040208@gmail.com>
	<ad1f81530904190138k58215745l966fc8817ff82a54@mail.gmail.com>
	<92117.1240169219@parc.com>
	<loom.20090419T193428-106@post.gmane.org>
	<93230.1240172465@parc.com>
	<loom.20090419T202311-575@post.gmane.org>
	<93939.1240174784@parc.com>
	<loom.20090419T213922-775@post.gmane.org>
	<98461.1240198883@parc.com>
	<loom.20090420T094356-200@post.gmane.org>
	<5339.1240241572@parc.com>
	<loom.20090420T153733-605@post.gmane.org>
Message-ID: <6787.1240245211@parc.com>

Antoine Pitrou <solipsis at pitrou.net> wrote:

> Bill Janssen <janssen <at> parc.com> writes:
> > 
> > Sure.  And if HTTP was all about browsers keying off pages, that would
> > be fine with me.  But it's not.  HTTP is used in lots of places where
> > there are no browsers;
> 
> I'm sorry, I don't follow you. The fact that something else than a browser makes
> the request shouldn't change the behaviour on the /server/ side.

I'm talking about the client side, though.

> > It's used in
> > places where there are no "pages", too, just servers on which clients
> > are making REST-style calls.
> 
> So what? The designer of the REST API must mandate an encoding (most probably
> UTF-8 rather than Latin-1 as you bizarrely seemed to imply) and the problem is
> solved.

Sure, if they understand that they have to do it.

> Complaining that the RFC doesn't specify all this sounds like an excuse for
> programmer laziness.

Or incompetence, which I'm afraid is a more likely issue.  Lots of folks
write their own HTTP servers, and don't really understand just *what*
they need to specify.  As a client-side user of one of those servers,
I'm left in the dark.

I think we've beaten this to death for python-dev.  Feel free to continue
it on Web-SIG, though, if you wish.

Bill

From jared.grubb at gmail.com  Mon Apr 20 20:22:37 2009
From: jared.grubb at gmail.com (Jared Grubb)
Date: Mon, 20 Apr 2009 11:22:37 -0700
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <87ocut2ean.fsf@xemacs.org>
References: <p06240800c60fe2a6a405@10.0.1.221>
	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>
	<49EA5FE4.9040102@gmail.com> <87ocut2ean.fsf@xemacs.org>
Message-ID: <B30B76BE-7632-4037-94B5-39E6812DD83D@gmail.com>

On 19 Apr 2009, at 02:17, Stephen J. Turnbull wrote:
> Nick Coghlan writes:
>> 3. Change the shebang lines in Python standard library scripts to be
>> version specific and update release.py to fix them all when bumping
>> the version number in the source tree.
>
> +1
>
> I think that it's probably best to leave "python", "python2", and
> "python3" for the use of downstream distributors.  ISTR that was what
> Guido concluded, in the discussion that led to Python 3 defaulting to
> altinstall---it wasn't just convenient because Python 3 is a major
> change, but that experience has shown that deciding which Python is
> going to be "The python" on somebody's system just isn't a decision
> that Python should make.

Ok, so if I understand, the situation is:
* python points to 2.x version
* python3 points to 3.x version
* need to be able to run certain 3k scripts from cmdline (since we're  
talking about shebangs) using Python3k even though "python" points to  
2.x

So, if I got the situation right, then do these same scripts  
understand that PYTHONPATH and PYTHONHOME and all the others are also  
probably pointing to 2.x code?

Jared

From fuzzyman at voidspace.org.uk  Mon Apr 20 20:24:30 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Mon, 20 Apr 2009 19:24:30 +0100
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
In-Reply-To: <B30B76BE-7632-4037-94B5-39E6812DD83D@gmail.com>
References: <p06240800c60fe2a6a405@10.0.1.221>	<1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com>	<49EA5FE4.9040102@gmail.com>
	<87ocut2ean.fsf@xemacs.org>
	<B30B76BE-7632-4037-94B5-39E6812DD83D@gmail.com>
Message-ID: <49ECBDDE.4040002@voidspace.org.uk>

Jared Grubb wrote:
>
> On 19 Apr 2009, at 02:17, Stephen J. Turnbull wrote:
>> Nick Coghlan writes:
>>> 3. Change the shebang lines in Python standard library scripts to be
>>> version specific and update release.py to fix them all when bumping the
>>> version number in the source tree.
>>
>> +1
>>
>> I think that it's probably best to leave "python", "python2", and
>> "python3" for the use of downstream distributors.  ISTR that was what
>> Guido concluded, in the discussion that led to Python 3 defaulting to
>> altinstall---it wasn't just convenient because Python 3 is a major
>> change, but that experience has shown that deciding which Python is
>> going to be "The python" on somebody's system just isn't a decision
>> that Python should make.
>
> Ok, so if I understand, the situation is:
> * python points to 2.x version
> * python3 points to 3.x version
> * need to be able to run certain 3k scripts from cmdline (since we're 
> talking about shebangs) using Python3k even though "python" points to 2.x
>
> So, if I got the situation right, then do these same scripts 
> understand that PYTHONPATH and PYTHONHOME and all the others are also 
> probably pointing to 2.x code?
IIRC the proposal was to also create PYTHON3PATH and PYTHON3HOME.

Michael

>
> Jared

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From benjamin at python.org  Tue Apr 21 00:06:20 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 20 Apr 2009 17:06:20 -0500
Subject: [Python-Dev] 3.1 beta blockers
Message-ID: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com>

The first (and only) beta of 3.1 is scheduled for less than 2 weeks
away, May 2nd, and is creeping onto the horizon. There are currently 6
blockers:

#5692: test_zipfile fails under Windows - This looks like a fairly easy fix.

#5775: marshal.c needs to be checked for out of memory errors - Looks
like Eric has this one.

#5410: msvcrt bytes cleanup - It would be nice to have a Windows
expert examine the patch on this issue for correctness.

#5786: [This isn't applicable to 3.1]

#5783: IDLE cannot find windows chm file - Awaiting a fix to the IDLE
or the doc build system.

-- 
Thanks for your work,
Benjamin

From benjamin at python.org  Tue Apr 21 00:09:40 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 20 Apr 2009 17:09:40 -0500
Subject: [Python-Dev] 3.1 beta blockers
In-Reply-To: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com>
References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com>
Message-ID: <1afaf6160904201509g2f5e784ah34c728732ca9b160@mail.gmail.com>

I forgot one:

#4136 - Porting the json changes to py3k - This issue exposed the
brokenness of the json module in py3k. Was any consensus reached about
what the API of json should be? If the beta time rolls around and
nothing has changed on this issue, I think Antoine's patch, which
makes json input and output unicode should be applied.

2009/4/20 Benjamin Peterson <benjamin at python.org>:
> The first (and only) beta of 3.1 is scheduled for less than 2 weeks
> away, May 2nd, and is creeping onto the horizon. There are currently 6
> blockers:

-- 
Regards,
Benjamin

From nad at acm.org  Tue Apr 21 00:37:24 2009
From: nad at acm.org (Ned Deily)
Date: Mon, 20 Apr 2009 15:37:24 -0700
Subject: [Python-Dev] 3.1 beta blockers
References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com>
	<1afaf6160904201509g2f5e784ah34c728732ca9b160@mail.gmail.com>
Message-ID: <nad-099CE0.15372420042009@news.gmane.org>

In article 
<1afaf6160904201509g2f5e784ah34c728732ca9b160 at mail.gmail.com>,
 Benjamin Peterson <benjamin at python.org> wrote:
> I forgot one: [...]

What about #5756 - idle, pydoc, et al removed from 3.1?

-- 
 Ned Deily,
 nad at acm.org

From barry at python.org  Tue Apr 21 00:44:00 2009
From: barry at python.org (Barry Warsaw)
Date: Mon, 20 Apr 2009 18:44:00 -0400
Subject: [Python-Dev] 3.1 beta blockers
In-Reply-To: <nad-099CE0.15372420042009@news.gmane.org>
References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com>
	<1afaf6160904201509g2f5e784ah34c728732ca9b160@mail.gmail.com>
	<nad-099CE0.15372420042009@news.gmane.org>
Message-ID: <40D62762-ABAB-4DE1-9BE2-798E40AE23DD@python.org>

On Apr 20, 2009, at 6:37 PM, Ned Deily wrote:

> In article
> <1afaf6160904201509g2f5e784ah34c728732ca9b160 at mail.gmail.com>,
> Benjamin Peterson <benjamin at python.org> wrote:
>> I forgot one: [...]
>
> What about #5756 - idle, pydoc, et al removed from 3.1?

Were we going to remove this from 2.7 also?  I'm working on splitting  
two of my Tools (pynche and world) off into separate projects and  
can't remember what we decided about that.

-Barry


From nad at acm.org  Tue Apr 21 00:55:30 2009
From: nad at acm.org (Ned Deily)
Date: Mon, 20 Apr 2009 15:55:30 -0700
Subject: [Python-Dev] 3.1 beta blockers
References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com>
	<1afaf6160904201509g2f5e784ah34c728732ca9b160@mail.gmail.com>
	<nad-099CE0.15372420042009@news.gmane.org>
	<40D62762-ABAB-4DE1-9BE2-798E40AE23DD@python.org>
Message-ID: <nad-C1B6BB.15553020042009@news.gmane.org>

In article <40D62762-ABAB-4DE1-9BE2-798E40AE23DD at python.org>,
 Barry Warsaw <barry at python.org> wrote:
> On Apr 20, 2009, at 6:37 PM, Ned Deily wrote:
> 
> > In article
> > <1afaf6160904201509g2f5e784ah34c728732ca9b160 at mail.gmail.com>,
> > Benjamin Peterson <benjamin at python.org> wrote:
> >> I forgot one: [...]
> >
> > What about #5756 - idle, pydoc, et al removed from 3.1?
> 
> Were we going to remove this from 2.7 also?  I'm working on splitting  
> two of my Tools (pynche and world) off into separate projects and  
> can't remember what we decided about that.

I'm confused.  The point of #5756 was that 3.x builds are broken because 
the installation of idle, pydoc, 2to3, and smtpd.py have been commented 
out in setup.py and thus these scripts are no longer being installed.  
Unless I'm missing something, that's the only way they were being 
installed in any form.  If nothing else, the change breaks the OSX 
installer build.

If they were removed deliberately (and are intended to be removed from 
2.7??), there needs to be some replacement and/or doc changes, no?

-- 
 Ned Deily,
 nad at acm.org

From benjamin at python.org  Tue Apr 21 01:44:26 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 20 Apr 2009 18:44:26 -0500
Subject: [Python-Dev] 3.1 beta blockers
In-Reply-To: <40D62762-ABAB-4DE1-9BE2-798E40AE23DD@python.org>
References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com>
	<1afaf6160904201509g2f5e784ah34c728732ca9b160@mail.gmail.com>
	<nad-099CE0.15372420042009@news.gmane.org>
	<40D62762-ABAB-4DE1-9BE2-798E40AE23DD@python.org>
Message-ID: <1afaf6160904201644s684a0044j837176f0a5e71d93@mail.gmail.com>

2009/4/20 Barry Warsaw <barry at python.org>:
> On Apr 20, 2009, at 6:37 PM, Ned Deily wrote:
>
>> In article
>> <1afaf6160904201509g2f5e784ah34c728732ca9b160 at mail.gmail.com>,
>> Benjamin Peterson <benjamin at python.org> wrote:
>>>
>>> I forgot one: [...]
>>
>> What about #5756 - idle, pydoc, et al removed from 3.1?
>
> Were we going to remove this from 2.7 also?  I'm working on splitting two of
> my Tools (pynche and world) off into separate projects and can't remember
> what we decided about that.

Those aren't installed as scripts like idle and pydoc, so I believe they can go.

-- 
Regards,
Benjamin

From benjamin at python.org  Tue Apr 21 01:47:25 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 20 Apr 2009 18:47:25 -0500
Subject: [Python-Dev] 3.1 beta blockers
In-Reply-To: <nad-099CE0.15372420042009@news.gmane.org>
References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com>
	<1afaf6160904201509g2f5e784ah34c728732ca9b160@mail.gmail.com>
	<nad-099CE0.15372420042009@news.gmane.org>
Message-ID: <1afaf6160904201647p445e22fcs872edd7d712f794@mail.gmail.com>

2009/4/20 Ned Deily <nad at acm.org>:
> In article
> <1afaf6160904201509g2f5e784ah34c728732ca9b160 at mail.gmail.com>,
>  Benjamin Peterson <benjamin at python.org> wrote:
>> I forgot one: [...]
>
> What about #5756 - idle, pydoc, et al removed from 3.1?

I just bumped priority and left a comment.

-- 
Regards,
Benjamin

From alessiogiovanni.baroni at gmail.com  Tue Apr 21 11:13:31 2009
From: alessiogiovanni.baroni at gmail.com (Alessio Giovanni Baroni)
Date: Tue, 21 Apr 2009 11:13:31 +0200
Subject: [Python-Dev] 3.1 beta blockers
In-Reply-To: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com>
References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com>
Message-ID: <c010f2650904210213y50e63aacrb318f5e47de6c158@mail.gmail.com>

Are there known cases of OutOfMemory? On my machine the float->string
conversion works fine, and 'make test' passes too.

2009/4/21 Benjamin Peterson <benjamin at python.org>

> The first (and only) beta of 3.1 is scheduled for less than 2 weeks
> away, May 2nd, and is creeping onto the horizon. There are currently 6
> blockers:
>
> #5692: test_zipfile fails under Windows - This looks like a fairly easy
> fix.
>
> #5775: marshal.c needs to be checked for out of memory errors - Looks
> like Eric has this one.
>
> #5410: msvcrt bytes cleanup - It would be nice to have a Windows
> expert examine the patch on this issue for correctness.
>
> #5786: [This isn't applicable to 3.1]
>
> #5783: IDLE cannot find windows chm file - Awaiting a fix to the IDLE
> or the doc build system.
>
>
> --
> Thanks for your work,
> Benjamin
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/alessiogiovanni.baroni%40gmail.com
>

From eric at trueblade.com  Tue Apr 21 13:13:34 2009
From: eric at trueblade.com (Eric Smith)
Date: Tue, 21 Apr 2009 07:13:34 -0400
Subject: [Python-Dev] 3.1 beta blockers
In-Reply-To: <c010f2650904210213y50e63aacrb318f5e47de6c158@mail.gmail.com>
References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com>
	<c010f2650904210213y50e63aacrb318f5e47de6c158@mail.gmail.com>
Message-ID: <49EDAA5E.7040908@trueblade.com>

Alessio Giovanni Baroni wrote:
> Are there known cases of OutOfMemory? On my machine the float->string
> conversion works fine, and 'make test' passes too.

I assume you're talking about issue 5775. I think it's all explained in 
the bug report. Basically, the float->string conversion can now return 
an out of memory error, which it could not before. marshal.c's w_object 
doesn't check for those error conditions. I doubt they'll ever occur in 
any test, but they need to be handled nonetheless.

It's on my list of things to do in the next week. But if there's anyone 
who understands the code and would like to take a look, feel free.

Eric.

> 
> 2009/4/21 Benjamin Peterson <benjamin at python.org 
> <mailto:benjamin at python.org>>
> 
>     The first (and only) beta of 3.1 is scheduled for less than 2 weeks
>     away, May 2nd, and is creeping onto the horizon. There are currently 6
>     blockers:
> 
>     #5692: test_zipfile fails under Windows - This looks like a fairly
>     easy fix.
> 
>     #5775: marshal.c needs to be checked for out of memory errors - Looks
>     like Eric has this one.
> 
>     #5410: msvcrt bytes cleanup - It would be nice to have a Windows
>     expert examine the patch on this issue for correctness.
> 
>     #5786: [This isn't applicable to 3.1]
> 
>     #5783: IDLE cannot find windows chm file - Awaiting a fix to the IDLE
>     or the doc build system.
> 
> 
>     --
>     Thanks for your work,
>     Benjamin

From eric at trueblade.com  Tue Apr 21 14:01:26 2009
From: eric at trueblade.com (Eric Smith)
Date: Tue, 21 Apr 2009 08:01:26 -0400
Subject: [Python-Dev] 3.1 beta blockers
In-Reply-To: <49EDAA5E.7040908@trueblade.com>
References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com>	<c010f2650904210213y50e63aacrb318f5e47de6c158@mail.gmail.com>
	<49EDAA5E.7040908@trueblade.com>
Message-ID: <49EDB596.4070505@trueblade.com>

Eric Smith wrote:
> Alessio Giovanni Baroni wrote:
>> Are there known cases of OutOfMemory? On my machine the float->string
>> conversion works fine, and 'make test' passes too.
> 
> I assume you're talking about issue 5775. I think it's all explained in 
> the bug report. Basically, the float->string conversion can now return 
> an out of memory error, which it could not before. marshal.c's w_object 
> doesn't check for those error conditions. I doubt they'll ever occur in 
> any test, but they need to be handled nonetheless.
> 
> It's on my list of things to do in the next week. But if there's anyone 
> who understands the code and would like to take a look, feel free.

I just fixed it in r71783, so it should be off the list of release blockers.

From martin at v.loewis.de  Wed Apr 22 08:50:22 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 22 Apr 2009 08:50:22 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
Message-ID: <49EEBE2E.3090601@v.loewis.de>

I'm proposing the following PEP for inclusion into Python 3.1.
Please comment.

Regards,
Martin

PEP: 383
Title: Non-decodable Bytes in System Character Interfaces
Version: $Revision: 71793 $
Last-Modified: $Date: 2009-04-22 08:42:06 +0200 (Mi, 22. Apr 2009) $
Author: Martin v. Löwis <martin at v.loewis.de>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Apr-2009
Python-Version: 3.1
Post-History:

Abstract
========

File names, environment variables, and command line arguments are
defined as being character data in POSIX; the C APIs however allow
passing arbitrary bytes - whether these conform to a certain encoding
or not. This PEP proposes a means of dealing with such irregularities
by embedding the bytes in character strings in such a way that allows
recreation of the original byte string.

Rationale
=========

The C char type is a data type that is commonly used to represent both
character data and bytes. Certain POSIX interfaces are specified and
widely understood as operating on character data, however, the system
call interfaces make no assumption on the encoding of these data, and
pass them on as-is. With Python 3, character strings use a
Unicode-based internal representation, making it difficult to ignore
the encoding of byte strings in the same way that the C interfaces can
ignore the encoding.

On the other hand, Microsoft Windows NT has correct the original
design limitation of Unix, and made it explicit in its system
interfaces that these data (file names, environment variables, command
line arguments) are indeed character data, by providing a
Unicode-based API (keeping a C-char-based one for backwards
compatibility).

For Python 3, one proposed solution is to provide two sets of APIs: a
byte-oriented one, and a character-oriented one, where the
character-oriented one would be limited to not being able to represent
all data accurately. Unfortunately, for Windows, the situation would
be exactly the opposite: the byte-oriented interface cannot represent
all data; only the character-oriented API can. As a consequence,
libraries and applications that want to support all user data in a
cross-platform manner have to accept a mish-mash of bytes and characters
exactly in the way that caused endless trouble for Python 2.x.

With this PEP, a uniform treatment of these data as characters becomes
possible. The uniformity is achieved by using specific encoding
algorithms, meaning that the data can be converted back to bytes on
POSIX systems only if the same encoding is used.

Specification
=============

On Windows, Python uses the wide character APIs to access
character-oriented APIs, allowing direct conversion of the
environmental data to Python str objects.

On POSIX systems, Python currently applies the locale's encoding to
convert the byte data to Unicode. If the locale's encoding is UTF-8,
it can represent the full set of Unicode characters, otherwise, only a
subset is representable. In the latter case, using private-use
characters to represent these bytes would be an option. For UTF-8,
doing so would create an ambiguity, as the private-use characters may
regularly occur in the input also.

To convert non-decodable bytes, a new error handler "python-escape" is
introduced, which decodes non-decodable bytes into a private-use
character U+F01xx, which is believed to not conflict with private-use
characters that currently exist in Python codecs.

The error handler interface is extended to allow the encode error
handler to return byte strings immediately, in addition to returning
Unicode strings which then get encoded again.

If the locale's encoding is UTF-8, the file system encoding is set to
a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
(which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
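
As a rough illustration of the half-surrogate mapping described above
(not part of the draft text): each byte that cannot be decoded as UTF-8
becomes the code point U+DC00 + byte value, and encoding reverses the
mapping. The sketch assumes that the "surrogateescape" error handler
found in released Python 3 versions can stand in for the "utf-8b" codec
proposed here; the file name is made up for the example.

```python
raw = b"caf\xe9.txt"   # Latin-1 file name; 0xE9 is not valid UTF-8 here

# Assumption: "surrogateescape" stands in for the draft's utf-8b codec.
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))

# The escaped byte maps to U+DC00 + 0xE9, inside U+DC80..U+DCFF.
assert ord(text[3]) == 0xDC00 + 0xE9

# Encoding with the same handler restores the original bytes.
assert text.encode("utf-8", "surrogateescape") == raw

# The default strict handler refuses the lone surrogate.
try:
    text.encode("utf-8")
except UnicodeEncodeError:
    strict_failed = True
else:
    strict_failed = False
```

The last step matches the limitation the draft itself notes: the
representation only round-trips when the same handler is used on the
way back.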

Discussion
==========

While providing a uniform API to non-decodable bytes, this interface
has the limitation that the chosen representation only "works" if the data
get converted back to bytes with the python-escape error handler
also. Encoding the data with the locale's encoding and the (default)
strict error handler will raise an exception, encoding them with UTF-8
will produce non-sensical data.

For most applications, we assume that they eventually pass data
received from a system interface back into the same system
interfaces. For example, and application invoking os.listdir() will
likely pass the result strings back into APIs like os.stat() or
open(), which then encodes them back into their original byte
representation. Applications that need to process the original byte
strings can obtain them by encoding the character strings with the
file system encoding, passing "python-escape" as the error handler
name.

Copyright
=========

This document has been placed in the public domain.

From ncoghlan at gmail.com  Wed Apr 22 12:56:51 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 22 Apr 2009 20:56:51 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49EEBE2E.3090601@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
Message-ID: <49EEF7F3.8060406@gmail.com>

Martin v. Löwis wrote:
> I'm proposing the following PEP for inclusion into Python 3.1.
> Please comment.

That seems like a much nicer solution than having parallel bytes/Unicode
APIs everywhere.

When the locale encoding is UTF-8, would UTF-8b also be used for the
command line decoding and environment variable encoding/decoding? (the
PEP currently only states that the encoding switch will be done for the
file system encoding - it is silent regarding the other two system
interfaces).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From glyph at divmod.com  Wed Apr 22 14:20:24 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Wed, 22 Apr 2009 12:20:24 -0000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
In-Reply-To: <49EEBE2E.3090601@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
Message-ID: <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com>

On 06:50 am, martin at v.loewis.de wrote:
>I'm proposing the following PEP for inclusion into Python 3.1.
>Please comment.

>To convert non-decodable bytes, a new error handler "python-escape" is
>introduced, which decodes non-decodable bytes into a private-use
>character U+F01xx, which is believed to not conflict with private-use
>characters that currently exist in Python codecs.

-1.  On UNIX, character data is not sufficient to represent paths.  We 
must, must, must continue to have a simple bytes interface to these 
APIs.  Covering it up in layers of obscure encoding hacks will not make 
the problem go away, it will just make it harder to understand.

To make matters worse, Linux and GNOME use the PUA for some printable 
characters.  If you open up charmap on an Ubuntu system and select "view 
by unicode character block", then click on "private use area", you'll 
see many of these.  I know that Apple uses at least a few PUA codepoints 
for the Apple logo and the propeller/option icons as well.

I am still -1 on any turn-non-decodable-bytes-into-text, because it 
makes life harder for those of us trying to keep bytes and text 
straight, but if you absolutely must represent POSIX filenames as 
mojibake rather than bytes, the only workable solution is to use NUL as 
your escape character.  That's the only code point which _actually_ 
can't show up in a filename somehow.  As we discussed last time, this is 
what Mono does with System.IO.Path.  As a bonus, it's _much_ easier to 
detect a NUL from random application code than to try to figure out if a 
string has any half-surrogates or magic PUA characters which shouldn't 
be interpreted according to platform PUA rules.

From walter at livinglogic.de  Wed Apr 22 13:48:04 2009
From: walter at livinglogic.de (Walter Dörwald)
Date: Wed, 22 Apr 2009 13:48:04 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49EEBE2E.3090601@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
Message-ID: <49EF03F4.3010407@livinglogic.de>

Martin v. Löwis wrote:

> I'm proposing the following PEP for inclusion into Python 3.1.
> Please comment.
> 
> Regards,
> Martin
> 
> PEP: 383
> Title: Non-decodable Bytes in System Character Interfaces
> Version: $Revision: 71793 $
> Last-Modified: $Date: 2009-04-22 08:42:06 +0200 (Mi, 22. Apr 2009) $
> Author: Martin v. Löwis <martin at v.loewis.de>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 22-Apr-2009
> Python-Version: 3.1
> Post-History:
> 
> Abstract
> ========
> 
> File names, environment variables, and command line arguments are
> defined as being character data in POSIX; the C APIs however allow
> passing arbitrary bytes - whether these conform to a certain encoding
> or not. This PEP proposes a means of dealing with such irregularities
> by embedding the bytes in character strings in such a way that allows
> recreation of the original byte string.
> 
> Rationale
> =========
> 
> The C char type is a data type that is commonly used to represent both
> character data and bytes. Certain POSIX interfaces are specified and
> widely understood as operating on character data, however, the system
> call interfaces make no assumption on the encoding of these data, and
> pass them on as-is. With Python 3, character strings use a
> Unicode-based internal representation, making it difficult to ignore
> the encoding of byte strings in the same way that the C interfaces can
> ignore the encoding.
> 
> On the other hand, Microsoft Windows NT has correct the original

"correct" -> "corrected"

> design limitation of Unix, and made it explicit in its system
> interfaces that these data (file names, environment variables, command
> line arguments) are indeed character data, by providing a
> Unicode-based API (keeping a C-char-based one for backwards
> compatibility).
> 
> [...]
> 
> Specification
> =============
> 
> On Windows, Python uses the wide character APIs to access
> character-oriented APIs, allowing direct conversion of the
> environmental data to Python str objects.
> 
> On POSIX systems, Python currently applies the locale's encoding to
> convert the byte data to Unicode. If the locale's encoding is UTF-8,
> it can represent the full set of Unicode characters, otherwise, only a
> subset is representable. In the latter case, using private-use
> characters to represent these bytes would be an option. For UTF-8,
> doing so would create an ambiguity, as the private-use characters may
> regularly occur in the input also.
> 
> To convert non-decodable bytes, a new error handler "python-escape" is
> introduced, which decodes non-decodable bytes into a private-use
> character U+F01xx, which is believed to not conflict with private-use
> characters that currently exist in Python codecs.

Would this mean that real private use characters in the file name would
raise an exception? How? The UTF-8 decoder doesn't pass those bytes to
any error handler.

> The error handler interface is extended to allow the encode error
> handler to return byte strings immediately, in addition to returning
> Unicode strings which then get encoded again.

Then the error callback for encoding would become specific to the target
encoding. Would this mean that the handler checks which encoding is used
and behaves like "strict" if it doesn't recognize the encoding?

> If the locale's encoding is UTF-8, the file system encoding is set to
> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.

Is this done by the codec, or the error handler? If it's done by the
codec I don't see a reason for the "python-escape" error handler.

> Discussion
> ==========
> 
> While providing a uniform API to non-decodable bytes, this interface
> has the limitation that the chosen representation only "works" if the data
> get converted back to bytes with the python-escape error handler
> also.

I thought the error handler would be used for decoding.

> Encoding the data with the locale's encoding and the (default)
> strict error handler will raise an exception, encoding them with UTF-8
> will produce non-sensical data.
> 
> For most applications, we assume that they eventually pass data
> received from a system interface back into the same system
> interfaces. For example, and application invoking os.listdir() will

"and" -> "an"

> likely pass the result strings back into APIs like os.stat() or
> open(), which then encodes them back into their original byte
> representation. Applications that need to process the original byte
> strings can obtain them by encoding the character strings with the
> file system encoding, passing "python-escape" as the error handler
> name.

Servus,
   Walter

From google at mrabarnett.plus.com  Wed Apr 22 14:17:31 2009
From: google at mrabarnett.plus.com (MRAB)
Date: Wed, 22 Apr 2009 13:17:31 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49EEBE2E.3090601@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
Message-ID: <49EF0ADB.2090107@mrabarnett.plus.com>

Martin v. Löwis wrote:
[snip]
> To convert non-decodable bytes, a new error handler "python-escape" is
> introduced, which decodes non-decodable bytes into a private-use
> character U+F01xx, which is believed to not conflict with private-use
> characters that currently exist in Python codecs.
> 
> The error handler interface is extended to allow the encode error
> handler to return byte strings immediately, in addition to returning
> Unicode strings which then get encoded again.
> 
> If the locale's encoding is UTF-8, the file system encoding is set to
> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
> 
If the byte stream happens to include a sequence which decodes to
U+F01xx, shouldn't that raise an exception?

From dirkjan at ochtman.nl  Wed Apr 22 14:31:09 2009
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Wed, 22 Apr 2009 14:31:09 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System	Character
 Interfaces
In-Reply-To: <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com>
Message-ID: <49EF0E0D.1060805@ochtman.nl>

On 22/04/2009 14:20, glyph at divmod.com wrote:
> -1. On UNIX, character data is not sufficient to represent paths. We
> must, must, must continue to have a simple bytes interface to these
> APIs. Covering it up in layers of obscure encoding hacks will not make
> the problem go away, it will just make it harder to understand.

As a hg developer, I have to concur. Keeping bytes-based APIs intact 
would make porting hg to py3k much, much easier. You may be able to 
imagine that dealing with paths correctly cross-platform on a VCS is a 
major PITA, and py3k is currently not helping the situation.

Cheers,

Dirkjan

From benjamin at python.org  Wed Apr 22 20:29:22 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 22 Apr 2009 13:29:22 -0500
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49EF0E0D.1060805@ochtman.nl>
References: <49EEBE2E.3090601@v.loewis.de>
	<20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com>
	<49EF0E0D.1060805@ochtman.nl>
Message-ID: <1afaf6160904221129y2a81837bm133772bc0f4ea107@mail.gmail.com>

2009/4/22 Dirkjan Ochtman <dirkjan at ochtman.nl>:
> On 22/04/2009 14:20, glyph at divmod.com wrote:
>>
>> -1. On UNIX, character data is not sufficient to represent paths. We
>> must, must, must continue to have a simple bytes interface to these
>> APIs. Covering it up in layers of obscure encoding hacks will not make
>> the problem go away, it will just make it harder to understand.
>
> As a hg developer, I have to concur. Keeping bytes-based APIs intact would
> make porting hg to py3k much, much easier. You may be able to imagine that
> dealing with paths correctly cross-platform on a VCS is a major PITA, and
> py3k is currently not helping the situation.

Your concerns are valid, but I don't see anything in the PEP about
removing the bytes APIs.

-- 
Regards,
Benjamin

From solipsis at pitrou.net  Wed Apr 22 20:44:40 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 22 Apr 2009 18:44:40 +0000 (UTC)
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
References: <49EEBE2E.3090601@v.loewis.de>
	<20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com>
	<49EF0E0D.1060805@ochtman.nl>
Message-ID: <loom.20090422T184031-474@post.gmane.org>

Dirkjan Ochtman <dirkjan <at> ochtman.nl> writes:
> 
> As a hg developer, I have to concur. Keeping bytes-based APIs intact 
> would make porting hg to py3k much, much easier. You may be able to 
> imagine that dealing with paths correctly cross-platform on a VCS is a 
> major PITA, and py3k is currently not helping the situation.

bytes-based APIs are certainly more bullet-proof under Unix, but it's the
reverse under Windows. Martin's proposal aims to bridge the gap and propose
something that makes text-based APIs as bullet-proof under Unix as they already
are under Windows.

Regards

Antoine.

From martin at v.loewis.de  Wed Apr 22 21:07:47 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 22 Apr 2009 21:07:47 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49EF03F4.3010407@livinglogic.de>
References: <49EEBE2E.3090601@v.loewis.de> <49EF03F4.3010407@livinglogic.de>
Message-ID: <49EF6B03.3010701@v.loewis.de>

> "correct" -> "corrected"

Thanks, fixed.

>> To convert non-decodable bytes, a new error handler "python-escape" is
>> introduced, which decodes non-decodable bytes into a private-use
>> character U+F01xx, which is believed to not conflict with private-use
>> characters that currently exist in Python codecs.
> 
> Would this mean that real private use characters in the file name would
> raise an exception? How? The UTF-8 decoder doesn't pass those bytes to
> any error handler.

The python-escape error handler is only used/meaningful if the env encoding
is not UTF-8. For any other encoding, it is assumed that no character
actually maps to the private-use characters.

>> The error handler interface is extended to allow the encode error
>> handler to return byte strings immediately, in addition to returning
>> Unicode strings which then get encoded again.
> 
> Then the error callback for encoding would become specific to the target
> encoding.

Why would it become specific? It can work the same way for any encoding:
take U+F01xx, and generate the byte xx.

>> If the locale's encoding is UTF-8, the file system encoding is set to
>> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
>> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
> 
> Is this done by the codec, or the error handler? If it's done by the
> codec I don't see a reason for the "python-escape" error handler.

utf-8b is a new codec. However, the utf-8b codec is only used if the
env encoding would otherwise be utf-8. For utf-8b, the error handler
is indeed unnecessary.

>> While providing a uniform API to non-decodable bytes, this interface
>> has the limitation that the chosen representation only "works" if the data
>> get converted back to bytes with the python-escape error handler
>> also.
> 
> I thought the error handler would be used for decoding.

It's used in both directions: for decoding, it converts \xXX to
U+F01XX. For encoding, U+F01XX will trigger an error, which is then
handled by the handler to produce \xXX.
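
To make the two directions concrete, here is a minimal sketch of such a
handler. It is only an illustration under assumptions: the handler name
"pua-escape-sketch" and the function name are made up, ASCII stands in
for a non-UTF-8 locale encoding, and the real python-escape handler may
differ in detail. It does rely on encode error handlers being allowed to
return bytes, which is the extension the PEP asks for.

```python
import codecs

PUA_BASE = 0xF0100  # U+F01xx, the private-use range named in the PEP


def pua_escape(exc):
    """Hypothetical stand-in for the proposed python-escape handler."""
    if isinstance(exc, UnicodeDecodeError):
        # Decoding: map each undecodable byte \xXX to U+F01XX.
        bad = exc.object[exc.start:exc.end]
        return "".join(chr(PUA_BASE + b) for b in bad), exc.end
    if isinstance(exc, UnicodeEncodeError):
        # Encoding: map U+F01XX back to the single byte \xXX.
        out = bytearray()
        for ch in exc.object[exc.start:exc.end]:
            cp = ord(ch)
            if not PUA_BASE <= cp <= PUA_BASE + 0xFF:
                raise exc  # a genuinely unencodable character
            out.append(cp - PUA_BASE)
        return bytes(out), exc.end
    raise exc


codecs.register_error("pua-escape-sketch", pua_escape)

raw = b"caf\xe9.txt"                            # 0xE9 is not ASCII
text = raw.decode("ascii", "pua-escape-sketch")
back = text.encode("ascii", "pua-escape-sketch")
print(ascii(text), back)
```

Nothing in the handler depends on the target encoding; it only looks at
the code points it is handed, which is the point made above about it
working the same way for any encoding.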

> "and" -> "an"

Thanks, fixed.

Regards,
Martin

From rdmurray at bitdance.com  Wed Apr 22 20:58:20 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 22 Apr 2009 14:58:20 -0400 (EDT)
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <1afaf6160904221129y2a81837bm133772bc0f4ea107@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com>
	<49EF0E0D.1060805@ochtman.nl>
	<1afaf6160904221129y2a81837bm133772bc0f4ea107@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0904221456540.1740@kimball.webabinitio.net>

On Wed, 22 Apr 2009 at 13:29, Benjamin Peterson wrote:
> 2009/4/22 Dirkjan Ochtman <dirkjan at ochtman.nl>:
>> On 22/04/2009 14:20, glyph at divmod.com wrote:
>>>
>>> -1. On UNIX, character data is not sufficient to represent paths. We
>>> must, must, must continue to have a simple bytes interface to these
>>> APIs. Covering it up in layers of obscure encoding hacks will not make
>>> the problem go away, it will just make it harder to understand.
>>
>> As a hg developer, I have to concur. Keeping bytes-based APIs intact would
>> make porting hg to py3k much, much easier. You may be able to imagine that
>> dealing with paths correctly cross-platform on a VCS is a major PITA, and
>> py3k is currently not helping the situation.
>
> Your concerns are valid, but I don't see anything in the PEP about
> removing the bytes APIs.

Yeah, but IIRC a complete set of bytes APIs doesn't exist yet in py3k.

--David

From martin at v.loewis.de  Wed Apr 22 21:17:56 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 22 Apr 2009 21:17:56 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System	Character
 Interfaces
In-Reply-To: <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com>
Message-ID: <49EF6D64.4030302@v.loewis.de>

> -1.  On UNIX, character data is not sufficient to represent paths.  We
> must, must, must continue to have a simple bytes interface to these
> APIs.

I'd like to respond to this concern in three ways:

1. The PEP doesn't remove any of the existing interfaces. So if the
   interfaces for byte-oriented file names in 3.0 work fine for you,
   feel free to continue to use them.

2. Even if they were taken away (which the PEP does not propose to do),
   it would be easy to emulate them for applications that want them.
   For example, listdir could be wrapped as

   import os, sys

   def listdir_b(bytestring):
       # Decode the bytes path, list it, and hand the names back as bytes.
       fse = sys.getfilesystemencoding()
       string = bytestring.decode(fse, "python-escape")
       for fn in os.listdir(string):
           yield fn.encode(fse, "python-escape")

3. I still disagree that we must, must, must continue to provide these
   interfaces. I don't understand from the rest of your message what
   would *actually* break if people would use the proposed interfaces.

Regards,
Martin

From martin at v.loewis.de  Wed Apr 22 21:19:31 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 22 Apr 2009 21:19:31 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System	Character
 Interfaces
In-Reply-To: <49EF0E0D.1060805@ochtman.nl>
References: <49EEBE2E.3090601@v.loewis.de>	<20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com>
	<49EF0E0D.1060805@ochtman.nl>
Message-ID: <49EF6DC3.2040406@v.loewis.de>

Dirkjan Ochtman wrote:
> On 22/04/2009 14:20, glyph at divmod.com wrote:
>> -1. On UNIX, character data is not sufficient to represent paths. We
>> must, must, must continue to have a simple bytes interface to these
>> APIs. Covering it up in layers of obscure encoding hacks will not make
>> the problem go away, it will just make it harder to understand.
> 
> As a hg developer, I have to concur. Keeping bytes-based APIs intact
> would make porting hg to py3k much, much easier. You may be able to
> imagine that dealing with paths correctly cross-platform on a VCS is a
> major PITA, and py3k is currently not helping the situation.

I find these statements contradicting:
py3k *is* keeping the byte-based APIs for file names intact, so
why is it not helping the situation, when this is what is needed
to make porting much, much easier?

Regards,
Martin

From martin at v.loewis.de  Wed Apr 22 21:21:17 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 22 Apr 2009 21:21:17 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <Pine.LNX.4.64.0904221456540.1740@kimball.webabinitio.net>
References: <49EEBE2E.3090601@v.loewis.de>	<20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com>	<49EF0E0D.1060805@ochtman.nl>	<1afaf6160904221129y2a81837bm133772bc0f4ea107@mail.gmail.com>
	<Pine.LNX.4.64.0904221456540.1740@kimball.webabinitio.net>
Message-ID: <49EF6E2D.9070504@v.loewis.de>

> Yeah, but IIRC a complete set of bytes APIs doesn't exist yet in py3k.

Define complete. I'm not aware of any interfaces wrt. file IO that are
lacking, so which ones were you thinking of?

Python doesn't currently provide a way to access environment variables
and command line arguments as bytes. With the PEP, such a way would
actually become available for applications that desire it.

Regards,
Martin

From martin at v.loewis.de  Wed Apr 22 21:24:54 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 22 Apr 2009 21:24:54 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49EF0ADB.2090107@mrabarnett.plus.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<49EF0ADB.2090107@mrabarnett.plus.com>
Message-ID: <49EF6F06.9060008@v.loewis.de>

MRAB wrote:
> Martin v. Löwis wrote:
> [snip]
>> To convert non-decodable bytes, a new error handler "python-escape" is
>> introduced, which decodes non-decodable bytes into a private-use
>> character U+F01xx, which is believed to not conflict with private-use
>> characters that currently exist in Python codecs.
>>
>> The error handler interface is extended to allow the encode error
>> handler to return byte strings immediately, in addition to returning
>> Unicode strings which then get encoded again.
>>
>> If the locale's encoding is UTF-8, the file system encoding is set to
>> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
>> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
>>
> If the byte stream happens to include a sequence which decodes to
> U+F01xx, shouldn't that raise an exception?

I apparently have not expressed it clearly, so please help me improve
the text. What I mean is this:

- if the environment encoding (for lack of better name) is UTF-8,
  Python stops using the utf-8 codec under this PEP, and switches
  to the utf-8b codec.
- otherwise (env encoding is not utf-8), undecodable bytes get decoded
  with the error handler. In this case, U+F01xx will not occur
  in the byte stream, since no other codec ever produces this PUA
  character (this is not fully true - UTF-16 may also produce PUA
  characters, but they can't appear as env encodings).
So the case you are referring to should not happen.
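
The round trip described above can be sketched roughly as follows. This
is only an illustration: it uses the "surrogateescape" error handler,
which is the name the mechanism later shipped under in Python 3.1+,
rather than the "utf-8b" codec named in this draft, and the sample file
name is made up.

```python
# Sketch only: "surrogateescape" is the handler that later Python 3
# releases provide; this draft calls the mechanism "utf-8b".
raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8 (made-up name)

name = raw.decode("utf-8", "surrogateescape")
# The undecodable byte 0xE9 becomes the lone low surrogate U+DCE9.
assert name == "caf\udce9.txt"

# Encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw
```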

Regards,
Martin

From rdmurray at bitdance.com  Wed Apr 22 21:28:32 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 22 Apr 2009 15:28:32 -0400 (EDT)
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49EF6E2D.9070504@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
	<20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com>
	<49EF0E0D.1060805@ochtman.nl>
	<1afaf6160904221129y2a81837bm133772bc0f4ea107@mail.gmail.com>
	<Pine.LNX.4.64.0904221456540.1740@kimball.webabinitio.net>
	<49EF6E2D.9070504@v.loewis.de>
Message-ID: <Pine.LNX.4.64.0904221524030.1740@kimball.webabinitio.net>

On Wed, 22 Apr 2009 at 21:21, "Martin v. Löwis" wrote:
>> Yeah, but IIRC a complete set of bytes APIs doesn't exist yet in py3k.
>
> Define complete. I'm not aware of any interfaces wrt. file IO that are
> lacking, so which ones were you thinking of?
>
> Python doesn't currently provide a way to access environment variables
> and command line arguments as bytes. With the PEP, such a way would
> actually become available for applications that desire it.

Those are the two that I'm thinking of.

I think I understand your proposal better now after your example of
implementing listdir(bytes).  Putting it in the PEP would probably
be a good idea.  I personally don't have enough practice in actually
working with various encodings (or any understanding of unicode escapes)
to comment further.

--David

From walter at livinglogic.de  Wed Apr 22 22:06:49 2009
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Wed, 22 Apr 2009 22:06:49 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49EF6B03.3010701@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de> <49EF03F4.3010407@livinglogic.de>
	<49EF6B03.3010701@v.loewis.de>
Message-ID: <49EF78D9.7020709@livinglogic.de>

Martin v. Löwis wrote:
>> "correct" -> "corrected"
> 
> Thanks, fixed.
> 
>>> To convert non-decodable bytes, a new error handler "python-escape" is
>>> introduced, which decodes non-decodable bytes into a private-use
>>> character U+F01xx, which is believed to not conflict with private-use
>>> characters that currently exist in Python codecs.
>> Would this mean that real private use characters in the file name would
>> raise an exception? How? The UTF-8 decoder doesn't pass those bytes to
>> any error handler.
> 
> The python-escape codec is only used/meaningful if the env encoding
> is not UTF-8. For any other encoding, it is assumed that no character
> actually maps to the private-use characters.

Which should be true for any encoding from the pre-unicode era, but not
for UTF-16/32 and variants.

>>> The error handler interface is extended to allow the encode error
>>> handler to return byte strings immediately, in addition to returning
>>> Unicode strings which then get encoded again.
>> Then the error callback for encoding would become specific to the target
>> encoding.
> 
> Why would it become specific? It can work the same way for any encoding:
> take U+F01xx, and generate the byte xx.

If any error callback emits bytes these byte sequences must be legal in
the target encoding, which depends on the target encoding itself.

However for the normal use of this error handler this might be
irrelevant, because those filenames that get encoded were constructed in
such a way that reencoding them regenerates the original byte sequence.

>>> If the locale's encoding is UTF-8, the file system encoding is set to
>>> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
>>> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
>> Is this done by the codec, or the error handler? If it's done by the
>> codec I don't see a reason for the "python-escape" error handler.
> 
> utf-8b is a new codec. However, the utf-8b codec is only used if the
> env encoding would otherwise be utf-8. For utf-8b, the error handler
> is indeed unnecessary.

Wouldn't it make more sense to be consistent how non-decodable bytes get
decoded? I.e. should the utf-8b codec decode those bytes to PUA
characters too (and refuse to encode them, so the error handler outputs
them)?

>>> While providing a uniform API to non-decodable bytes, this interface
>>> has the limitation that chosen representation only "works" if the data
>>> get converted back to bytes with the python-escape error handler
>>> also.
>> I thought the error handler would be used for decoding.
> 
> It's used in both directions: for decoding, it converts \xXX to
> U+F01XX. For encoding, U+F01XX will trigger an error, which is then
> handled by the handler to produce \xXX.

But only for non-UTF8 encodings?

Servus,
   Walter

From mal at egenix.com  Wed Apr 22 22:43:34 2009
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 22 Apr 2009 22:43:34 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49EF78D9.7020709@livinglogic.de>
References: <49EEBE2E.3090601@v.loewis.de>
	<49EF03F4.3010407@livinglogic.de>	<49EF6B03.3010701@v.loewis.de>
	<49EF78D9.7020709@livinglogic.de>
Message-ID: <49EF8176.1060704@egenix.com>

On 2009-04-22 22:06, Walter Dörwald wrote:
> Martin v. Löwis wrote:
>>> "correct" -> "corrected"
>> Thanks, fixed.
>>
>>>> To convert non-decodable bytes, a new error handler "python-escape" is
>>>> introduced, which decodes non-decodable bytes into a private-use
>>>> character U+F01xx, which is believed to not conflict with private-use
>>>> characters that currently exist in Python codecs.
>>> Would this mean that real private use characters in the file name would
>>> raise an exception? How? The UTF-8 decoder doesn't pass those bytes to
>>> any error handler.
>> The python-escape codec is only used/meaningful if the env encoding
>> is not UTF-8. For any other encoding, it is assumed that no character
>> actually maps to the private-use characters.
> 
> Which should be true for any encoding from the pre-unicode era, but not
> for UTF-16/32 and variants.

Actually it's not even true for the pre-Unicode codecs. It was and is common
for Asian companies to use company specific symbols in private areas
or extended versions of CJK character sets.

Microsoft even published an editor for Asian users to create their
own glyphs as needed:

    http://msdn.microsoft.com/en-us/library/cc194861.aspx

Here's an overview for some US companies using such extensions:

    http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&item_id=VendorUseOfPUA
(it's no surprise that most of these actually defined their own charsets)

SIL even started a registry for the private use areas (PUAs):

    http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&cat_id=UnicodePUA

This is their current list of assignments:

http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&item_id=SILPUAassignments

and here's how to register:

http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&cat_id=UnicodePUA#404a261e

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Apr 22 2009)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From jess.austin at gmail.com  Wed Apr 22 22:57:07 2009
From: jess.austin at gmail.com (Jess Austin)
Date: Wed, 22 Apr 2009 15:57:07 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <b8ad139e0904161801t56a9be3o999d7f4b96a0cb5f@mail.gmail.com>
References: <b8ad139e0904152318p5473cbe5yb5f55a19894cc834@mail.gmail.com>
	<18918.61476.980951.991275@montanaro.dyndns.org>
	<b8ad139e0904161131g2f2b84fbpd67952697952afa9@mail.gmail.com>
	<18919.51931.874515.848841@montanaro.dyndns.org>
	<b8ad139e0904161801t56a9be3o999d7f4b96a0cb5f@mail.gmail.com>
Message-ID: <b8ad139e0904221357k5985662fmbb28dd5f785bc955@mail.gmail.com>

On Thu, Apr 16, 2009 at 8:01 PM, Jess Austin <jess.austin at gmail.com> wrote:
> These operations are useful in particular contexts.  What I've
> submitted is also useful, and currently isn't easy in core,
> batteries-included python.  While I would consider the foregoing
> interpretation of the Zen to be backwards (this doesn't add another
> way to do something that's already possible, it makes possible
> something that currently encourages one to pull her hair out), I
> suppose it doesn't matter.  If adding a class and a function to a
> module will require extended advocacy on -ideas and c.l.p, I'm
> probably not the person for the job.
>
> If, on the other hand, one of the committers wants to toss this in at
> some point, whether now or 3 versions down the road, the patch is up
> at bugs.python.org (and I'm happy to make any suggested
> modifications).  I'm glad to have written this; I learned a bit about
> CPython internals and scraped a layer of rust off my C skills.  I will
> go ahead and backport the python-coded version to 2.3.  I'll continue
> this conversation with whomever for however long, but I suspect this
> topic will soon have worn out its welcome on python-dev.

I've uploaded the backported python version source distribution to
PyPI, http://pypi.python.org/pypi?name=MonthDelta&:action=display with
better-formatted documentation at
http://packages.python.org/MonthDelta/

"easy_install MonthDelta" works too.

cheers,
Jess

From martin at v.loewis.de  Wed Apr 22 23:00:51 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 22 Apr 2009 23:00:51 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49EF78D9.7020709@livinglogic.de>
References: <49EEBE2E.3090601@v.loewis.de>
	<49EF03F4.3010407@livinglogic.de>	<49EF6B03.3010701@v.loewis.de>
	<49EF78D9.7020709@livinglogic.de>
Message-ID: <49EF8583.1010602@v.loewis.de>

>> The python-escape codec is only used/meaningful if the env encoding
>> is not UTF-8. For any other encoding, it is assumed that no character
>> actually maps to the private-use characters.
> 
> Which should be true for any encoding from the pre-unicode era, but not
> for UTF-16/32 and variants.

Right. However, these can't appear as environment/file system encodings,
because they use null bytes.

>> Why would it become specific? It can work the same way for any encoding:
>> take U+F01xx, and generate the byte xx.
> 
> If any error callback emits bytes these byte sequences must be legal in
> the target encoding, which depends on the target encoding itself.

No. The whole process started with data having an *invalid* encoding
in the source encoding (which, after the roundtrip, is now the
target encoding). So the python-escape error handler deliberately
produces byte sequences that are invalid in the environment encoding
(hence the additional permission of having it produce bytes instead
of characters).
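
A minimal sketch of such a two-way handler, assuming a Python 3 release
in which an encode error handler may return bytes. The handler name
"python-escape-sketch" and the constant PUA_BASE are illustrative only
and are not taken from the PEP's implementation.

```python
import codecs

PUA_BASE = 0xF0100  # assumed mapping: byte 0xXX <-> U+F01XX

def python_escape(exc):
    """Map undecodable bytes to U+F01XX and back again (sketch)."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        # Each offending byte becomes one private-use character.
        return "".join(chr(PUA_BASE + b) for b in bad), exc.end
    if isinstance(exc, UnicodeEncodeError):
        out = bytearray()
        for ch in exc.object[exc.start:exc.end]:
            code = ord(ch) - PUA_BASE
            if not 0 <= code <= 0xFF:
                raise exc  # not one of our escapes: a genuine error
            out.append(code)
        # Returning bytes bypasses the codec, so the output may be
        # invalid in the target encoding, which is the whole point.
        return bytes(out), exc.end
    raise exc

codecs.register_error("python-escape-sketch", python_escape)

text = b"abc\xff".decode("ascii", "python-escape-sketch")
restored = text.encode("ascii", "python-escape-sketch")
```

Here the ASCII codec stands in for a non-UTF-8 environment encoding;
the byte 0xFF is invalid ASCII, yet it survives the trip through str.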

> However for the normal use of this error handler this might be
> irrelevant, because those filenames that get encoded were constructed in
> such a way that reencoding them regenerates the original byte sequence.

Exactly so. The error handler is not of much use outside this specific
scenario.

>> utf-8b is a new codec. However, the utf-8b codec is only used if the
>> env encoding would otherwise be utf-8. For utf-8b, the error handler
>> is indeed unnecessary.
> 
> Wouldn't it make more sense to be consistent how non-decodable bytes get
> decoded? I.e. should the utf-8b codec decode those bytes to PUA
> characters too (and refuse to encode them, so the error handler outputs
> them)?

Unfortunately, that won't work. If the original encoding is UTF-8, and
uses PUA characters, then, on re-encoding, it's not possible to tell
whether to encode as a PUA character, or as an invalid byte.

This was my original proposal a year ago, and people immediately
suggested that it is not at all acceptable if there is the slightest
chance of information loss. Hence the current PEP.

>>> I thought the error handler would be used for decoding.
>> It's used in both directions: for decoding, it converts \xXX to
>> U+F01XX. For encoding, U+F01XX will trigger an error, which is then
>> handled by the handler to produce \xXX.
> 
> But only for non-UTF8 encodings?

Right. For ease of use, the implementation will specify the error
handler regardless, and the recommended use for applications will
be to use the error handler regardless. For utf-8b, the error
handler will never be invoked, since all input can be converted
always.

Regards,
Martin

From glyph at divmod.com  Thu Apr 23 00:49:30 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Wed, 22 Apr 2009 22:49:30 -0000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
In-Reply-To: <49EF6D64.4030302@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
	<20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com>
	<49EF6D64.4030302@v.loewis.de>
Message-ID: <20090422224930.12555.316330142.divmod.xquotient.9043@weber.divmod.com>

On 07:17 pm, martin at v.loewis.de wrote:
>>-1.  On UNIX, character data is not sufficient to represent paths.  We
>>must, must, must continue to have a simple bytes interface to these
>>APIs.

>I'd like to respond to this concern in three ways:
>
>1. The PEP doesn't remove any of the existing interfaces. So if the
>   interfaces for byte-oriented file names in 3.0 work fine for you,
>   feel free to continue to use them.

It's good to know this.  It would be good if the PEP made it clear that 
it is proposing an additional way to work with undecodable bytes, not 
replacing the existing one.

For me, this PEP isn't an acceptable substitute for direct bytes-based 
access to command-line arguments and environment variables on UNIX.  To 
my knowledge *those* APIs still don't exist yet.  I would like it if 
this PEP were not used as an excuse to avoid adding them.
>2. Even if they were taken away (which the PEP does not propose to do),
>   it would be easy to emulate them for applications that want them.

I think this is a pretty clear abstraction inversion.  Luckily nobody is 
proposing it :).
>3. I still disagree that we must, must, must continue to provide these
>   interfaces.

You do have a point; if there is a clean, defined mapping between str 
and bytes in terms of all path/argv/environ APIs, then we don't *need* 
those APIs, since we can just implement them in terms of characters. 
But I still think that's a bad idea, since mixing the returned strings 
with *other* APIs remains problematic.  However, I still think the 
mapping you propose is problematic...
>   I don't understand from the rest of your message what
>   would *actually* break if people would use the proposed interfaces.

As far as more concrete problems: the utf-8 codec currently in python 
2.5 and 2.6, and 3.0 will happily encode half-surrogates, at least in 
the builds I have.

    >>> '\udc81'.encode('utf-8').decode('utf-8')
    '\udc81'

So there's an ambiguity when passing U+DC81 to this codec: do you mean 
\xed\xb2\x81 or do you just mean \x81?  Of course it would be possible 
to make UTF-8B consistent in this regard, but it is still going to 
interact with code that thinks in terms of actual UTF-8, and the failure 
mode here is very difficult to inspect.
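
For comparison, here is a rough sketch of how the two readings of
U+DC81 can be told apart in current Python 3, which behaves differently
from the 2.5/2.6/3.0 builds mentioned above. It relies on the
"surrogatepass" and "surrogateescape" error handlers of later releases,
neither of which is part of the draft under discussion.

```python
lone = "\udc81"

# Strict UTF-8 in current Python 3 refuses lone surrogates outright.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

as_surrogate = lone.encode("utf-8", "surrogatepass")   # three-byte form
as_raw_byte = lone.encode("utf-8", "surrogateescape")  # the escaped byte

# Bytes that look like an encoded surrogate are themselves undecodable,
# so they are escaped byte by byte and still round-trip unchanged.
back = as_surrogate.decode("utf-8", "surrogateescape")
```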

A major problem here is that it's very difficult to puzzle out whether 
anything *will* actually break.  I might be wrong about the above for 
some subtlety of unicode that I don't quite understand, but I don't want 
to spend all day experimenting with every possible set of build options, 
python versions, and unicode specifications.  Neither, I wager, do most 
people who want to call listdir().

Another specific problem: looking at the Character Map application on my 
desktop, U+F0126 and U+F0127 are considered printable characters.  I'm 
not sure what they're supposed to be, exactly, but there are glyphs 
there.  This is running Ubuntu 8.04; there may be more of these in use 
in more recent versions of GNOME.

There is nothing "private" about the "private use" area; Python can 
never use any of these characters for *anything*, except possibly 
internally in ways which are never exposed to application code, because 
the operating system (or window system, or libraries) might use them. 
If I pass a string with those printable PUA/A characters in it to 
listdir(), what happens?  Do they get turned into bytes, do they only 
get turned into bytes if my filesystem encoding happens to be something 
other than UTF-8...?

The PEP seems a bit ambiguous to me as far as how the PUA hack and the 
half-surrogate hack interact.  I could be wrong, but it seems to me to 
be an either-or proposition, in which case there would be *four* bytes 
types in python 3.1: bytes, bytearray, str-with-PUA/A-junk, str-with- 
half-surrogate-junk.  Detecting the difference would be an expensive and 
subtle affair; the simplest solution I could think of would be to use an 
error-prone regex.  If the encoding hack used were simply NULL, then the 
detection would be straightforward: "if '\u0000' in thingy:".

Ultimately I think I'm only -0 on all of this now, as long as we get 
bytes versions of environ and argv.  Even if these corner-case issues 
aren't fixed, those of us who want to have correct handling of 
undecodable filenames can do so.

From larry at hastings.org  Thu Apr 23 10:42:02 2009
From: larry at hastings.org (Larry Hastings)
Date: Thu, 23 Apr 2009 01:42:02 -0700
Subject: [Python-Dev] Proposed: add an environment variable, PYTHONPREFIXES
Message-ID: <49F029DA.6080107@hastings.org>

I've submitted a patch to implement a new environment variable, 
PYTHONPREFIXES.  The patch is here:

    http://bugs.python.org/issue5819

PYTHONPREFIXES is similar to PYTHONUSERBASE: it lets you add "prefix 
directories" to be culled for site packages. It differs from 
PYTHONUSERBASE in three ways:

* PYTHONPREFIXES has an empty default value. PYTHONUSERBASE has a
default, e.g. ~/.local on UNIX-like systems.

* PYTHONPREFIXES supports multiple directories, separated by the
site-specific directory separator character (os.pathsep). Earlier
directories take precedence. PYTHONUSERBASE supports specifying
at most one directory.

* PYTHONPREFIXES adds its directories to site.PREFIXES, so it reuses
the existing mechanisms for site package directories, exactly
simulating a real prefix directory. PYTHONUSERBASE only adds a
single directory, using its own custom code path.
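
As a rough illustration of the multiple-directory behaviour, the sketch
below splits such a value on os.pathsep and keeps earlier entries ahead
of later ones. The helper name split_prefixes is hypothetical, and
dropping empty and duplicate entries is an assumption; this is not the
code from the patch.

```python
import os

def split_prefixes(value, sep=os.pathsep):
    """Split a PYTHONPREFIXES-style string into an ordered list.

    Hypothetical helper: empty entries are skipped and a repeated
    directory keeps its first (highest-precedence) position.
    """
    prefixes = []
    for entry in value.split(sep):
        if entry and entry not in prefixes:
            prefixes.append(entry)
    return prefixes

# Explicit ":" separator so the example reads the same on any platform.
example = split_prefixes("/opt/a:/opt/b::/opt/a", sep=":")
```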

This last point bears further discussion. PYTHONUSERBASE's custom code
to inspect only a single directory has resulted in at least one bug, if
not more, as follows:

* The bona-fide known bug: the Debian package maintainer for Python
decided to change "site-packages" to "dist-packages" in 2.6,
for reasons I still don't quite understand. He made this change in
site.addsitepackages and distutils.sysconfig.get_python_lib, and
similarly in setuptools, but he missed changing it in
site.addusersitepackages. This meant that if you used setup.py to
install a package to a private prefix directory, PYTHONUSERBASE had
no hope of ever finding the package. (Happily this bug is fixed.)

* I suspect there's a similar bug with PYTHONUSERBASE on the "os2emx"
and "riscos" platforms. site.addsitepackages on those platforms
looks in "{prefix}/Lib/site-packages", but
site.addusersitepackages looks in
"{prefix}/lib/python{version}/site-packages" as it does
on any non-Windows platform. Presumably setup.py on those two
platforms installs site packages to the directory site.addsitepackages
inspects, which means that PYTHONUSERBASE doesn't work on those
two platforms.

PYTHONUSERBASE's custom code path to add site package directories seems 
unnecessary to me. I cannot fathom why its implementors chose this 
approach; in any case I think reusing site.addsitepackages is a clear 
win. I fear it's too late to change PYTHONUSERBASE so it simply called 
site.addsitepackages, as that would change its established semantics.  
Though if that idea found support I'd be happy to contribute a patch.

A few more notes on PYTHONPREFIXES:

* PYTHONPREFIXES is gated by the exact same mechanisms that shut off
PYTHONUSERBASE.
* Specifying "-s" on the Python command line shuts it off.
* Setting the environment variable PYTHONNOUSERSITE to a non-empty
string shuts it off.
* If the effective uid / gid doesn't match the actual uid / gid it
automatically shuts off.

* I'm not enormously happy with the name. Until about an hour or two
ago I was calling it "PYTHONUSERBASES". I'm open to other
suggestions.

* I'm not sure that PYTHONPREFIXES should literally modify site.PREFIXES.
If that's a bad idea I'd be happy to amend the patch so it didn't
touch site.PREFIXES.

* Reaction in python-ideas has been reasonably positive, though I gather
Nick Coghlan and Scott David Daniels think it's unnecessary. (To
read the discussion, search for the old name: "PYTHONUSERBASES".)

* Ben Finney prefers a counter-proposal he made in the python-ideas
discussion: change the existing PYTHONUSERBASE to support multiple
directories. I don't like this approach, because:
a) it means you have to explicitly add the local default if you
want to use it, and
b) PYTHONUSERBASE only inspects one directory, whereas PYTHONPREFIXES
inspects all the directories Python might use for site packages.
I do admit this approach would be preferable to no change at all.

The patch is thrillingly simple and works fine. However it's not ready 
to be merged because I haven't touched the documentation. I figured I'd 
hold off until I see which way the wind blows.

I'd also be happy to convert this into a PEP if that's what's called for.

/larry/

From georg at python.org  Thu Apr 23 11:07:23 2009
From: georg at python.org (Georg Brandl)
Date: Thu, 23 Apr 2009 11:07:23 +0200
Subject: [Python-Dev] Reminder: Python Bug Day on Saturday
Message-ID: <49F02FCB.3020402@python.org>

Hi,

I'd like to remind everyone that there will be a Python Bug Day on
April 25.  As always, this is a perfect opportunity to get involved
in Python development, or bring your own issues to attention, discuss
them and (hopefully) resolve them together with the core developers.

We will coordinate over IRC, in #python-dev on irc.freenode.net,
and the Wiki page http://wiki.python.org/moin/PythonBugDay has all
important information and a short list of steps on how to get set up.

Hope to see you there!

Georg

From ben+python at benfinney.id.au  Thu Apr 23 11:13:13 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Thu, 23 Apr 2009 19:13:13 +1000
Subject: [Python-Dev] Location of OS-installed versus Python-installed
	libraries (was: Proposed: add an environment variable,
	PYTHONPREFIXES)
References: <49F029DA.6080107@hastings.org>
Message-ID: <87hc0f4tsm.fsf@benfinney.id.au>

Larry Hastings <larry at hastings.org> writes:

> the Debian package maintainer for Python decided to change
> "site-packages" to "dist-packages" in 2.6, for reasons I still don't
> quite understand.

For reference, Larry is referring to changes announced by Matthias Klose
on 2009-02-16 in Message-ID: <18841.49052.405847.359567 at gargle.gargle.HOWL>
<URL:http://lists.debian.org/debian-devel/2009/02/msg00431.html>:

> Local installation path
> -----------------------
> 
> When installing Python modules using distutils, the resulting files
> end up in the same location whether they are installed by a Debian
> package, or by a local user or administrator, unless the installation
> path is overwritten on the command line.  Compare this with most
> software based on autoconf, where an explicit prefix has to be
> provided for the packaging, while the default install installs into
> /usr/local.  For new Python versions packaged in Debian this will
> change so that an installation into /usr (not /usr/local) requires an
> extra option to distutils install command (--install-layout=deb).  To
> avoid breaking the packaging of existing code the distutils install
> command for 2.4 and 2.5 will just accept this option and ignore it.
> For the majority of packages we won't see changes in the packaging,
> provided that the python packaging helpers can find the files in the
> right location and move it to the expected target path.
> 
> A second issue raised by developers was the clash of modules and
> extensions installed by a local python installation (with default
> prefix /usr/local) with the modules provided by Debian packages
> (/usr/local/lib/pythonX.Y/site-packages shared by the patched "system"
> python and the locally installed python).  To avoid this clash the
> directory `site-packages' should be renamed to `dist-packages' in
> both locations:
> 
>  - /usr/lib/pythonX.Y/dist-packages (installation location for code
>    packaged for Debian)
>  - /usr/local/lib/pythonX.Y/dist-packages (installation location
>    for locally installed code using distutils install without
>    options).
> 
> The path /usr/lib/pythonX.Y/site-packages is not found on sys.path
> anymore.
> 
> About the name: Discussed this with Barry Warsaw and Martin v. Loewis,
> and we came to the conclusion that using the same directory name for
> both locations would be the most consistent way.

-- 
 \         "In any great organization it is far, far safer to be wrong |
  `\        with the majority than to be right alone." -- John Kenneth |
_o__)                                            Galbraith, 1989-07-28 |
Ben Finney

From ben at redfrontdoor.org  Thu Apr 23 14:54:59 2009
From: ben at redfrontdoor.org (Ben North)
Date: Thu, 23 Apr 2009 13:54:59 +0100
Subject: [Python-Dev] Suggested doc patch for tarfile
Message-ID: <5169ff10904230554i495b0cb3v34f9a761698a2c50@mail.gmail.com>

Hi,

The current documentation for tarfile.TarFile.extractfile() does not
mention that the returned 'file-like object' supports close() and also
iteration.  The attached patch (against svn trunk) fixes this.

(Background: I was wondering whether I could write

   def process_and_close_file(f_in):
       with closing(f_in) as f:
           # Do stuff with f.

and have it work whether f_in was a true file or the return value of
extractfile(), and thought from the documentation that I couldn't.  Of
course, I could have just tried it, but I think fixing the documentation
wouldn't hurt.)
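
A small sketch of that experiment, assuming current Python 3 and an
in-memory archive so that nothing touches the disk. The member name
"greeting.txt" and the helper first_line are made up for illustration.

```python
import io
import tarfile
from contextlib import closing

def first_line(f_in):
    """Return the first line of a file-like object, then close it."""
    with closing(f_in) as f:
        for line in f:  # relies on the object supporting iteration
            return line
    return None

# Build a one-member tar archive in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"hello\nworld\n"
    info = tarfile.TarInfo("greeting.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Read it back and pass the extractfile() result through closing().
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    line = first_line(tar.extractfile("greeting.txt"))
```

The same helper also accepts an ordinary file object, which is the
interchangeability the question above is about.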

Alternative: enhance the tarfile.ExFileObject class to support use as a
context manager?

Thanks,

Ben.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tarfile.rst.patch
Type: application/octet-stream
Size: 612 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20090423/7a16d793/attachment.obj>

From aahz at pythoncraft.com  Thu Apr 23 14:57:37 2009
From: aahz at pythoncraft.com (Aahz)
Date: Thu, 23 Apr 2009 05:57:37 -0700
Subject: [Python-Dev] Suggested doc patch for tarfile
In-Reply-To: <5169ff10904230554i495b0cb3v34f9a761698a2c50@mail.gmail.com>
References: <5169ff10904230554i495b0cb3v34f9a761698a2c50@mail.gmail.com>
Message-ID: <20090423125737.GB59@panix.com>

On Thu, Apr 23, 2009, Ben North wrote:
> 
> The current documentation for tarfile.TarFile.extractfile() does not
> mention that the returned 'file-like object' supports close() and also
> iteration.  The attached patch (against svn trunk) fixes this.

Please post the patch to bugs.python.org
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

From ben at redfrontdoor.org  Thu Apr 23 15:04:44 2009
From: ben at redfrontdoor.org (Ben North)
Date: Thu, 23 Apr 2009 14:04:44 +0100
Subject: [Python-Dev] Suggested doc patch for tarfile
In-Reply-To: <20090423125737.GB59@panix.com>
References: <5169ff10904230554i495b0cb3v34f9a761698a2c50@mail.gmail.com>
	<20090423125737.GB59@panix.com>
Message-ID: <5169ff10904230604u6dcce35ar16b928fb920ce398@mail.gmail.com>

>> The current documentation for tarfile.TarFile.extractfile() does not
>> mention that the returned 'file-like object' supports close() and also
>> iteration.  The attached patch (against svn trunk) fixes this.
>
> Please post the patch to bugs.python.org

Done:

   http://bugs.python.org/issue5821

Thanks,

Ben.

From cy6ergn0m at gmail.com  Thu Apr 23 20:21:37 2009
From: cy6ergn0m at gmail.com (cyberGn0m)
Date: Thu, 23 Apr 2009 22:21:37 +0400
Subject: [Python-Dev] Python3 and arm-linux
Message-ID: <645d12c20904231121y342247b8ib75f9eea00fea8ee@mail.gmail.com>

Does anybody know whether Python 3 works on arm-linux? Is it possible to
build it? Where can I find related discussions? Are there perhaps some
special patches already available? Should I get the sources from svn, or
use a known version snapshot?

-----------------------------------------------------------------

                         <y6erGn0m.

From aleaxit at gmail.com  Thu Apr 23 20:55:48 2009
From: aleaxit at gmail.com (Alex Martelli)
Date: Thu, 23 Apr 2009 11:55:48 -0700
Subject: [Python-Dev] Python3 and arm-linux
In-Reply-To: <645d12c20904231121y342247b8ib75f9eea00fea8ee@mail.gmail.com>
References: <645d12c20904231121y342247b8ib75f9eea00fea8ee@mail.gmail.com>
Message-ID: <e8a0972d0904231155r19df94f3g93781fa941111894@mail.gmail.com>

On Thu, Apr 23, 2009 at 11:21 AM, cyberGn0m <cy6ergn0m at gmail.com> wrote:

> Does anybody know whether Python 3 works on arm-linux? Is it possible to
> build it? Where can I find related discussions? Are there perhaps some
> special patches already available? Should I get the sources from svn, or
> use a known version snapshot?
>

I haven't tried, but there's an interesting distro at
http://www.vanille-media.de/site/index.php/projects/python-for-arm-linux/ --
I don't know if other such distros have better-updated Python versions (eg.
current 2.6.* vs that one's 2.4.*) but that one includes a lot of very
useful add-ons.

Alex

From cy6ergn0m at gmail.com  Thu Apr 23 20:59:01 2009
From: cy6ergn0m at gmail.com (cyberGn0m)
Date: Thu, 23 Apr 2009 22:59:01 +0400
Subject: [Python-Dev] Python3 and arm-linux
In-Reply-To: <e8a0972d0904231155r19df94f3g93781fa941111894@mail.gmail.com>
References: <645d12c20904231121y342247b8ib75f9eea00fea8ee@mail.gmail.com>
	<e8a0972d0904231155r19df94f3g93781fa941111894@mail.gmail.com>
Message-ID: <645d12c20904231159s3eb1df23oc16b9ae0ff74dff4@mail.gmail.com>

Yes, I visited it, but it looks like it has not been updated for a few
years. Maybe it was integrated, and python-for-arm-linux is now part of the
OpenEmbedded project?

2009/4/23 Alex Martelli <aleaxit at gmail.com>

> On Thu, Apr 23, 2009 at 11:21 AM, cyberGn0m <cy6ergn0m at gmail.com> wrote:
>
>> Does anybody know whether Python 3 works on arm-linux? Is it possible to
>> build it? Where can I find related discussions? Are there perhaps some
>> special patches already available? Should I get the sources from svn, or
>> use a known version snapshot?
>>
>
> I haven't tried, but there's an interesting distro at
> http://www.vanille-media.de/site/index.php/projects/python-for-arm-linux/ --
> I don't know if other such distros have better-updated Python versions (eg.
> current 2.6.* vs that one's 2.4.*) but that one includes a lot of very
> useful add-ons.
>
>
> Alex
>
>

-- 
-----------------------------------------------------------------

                         <y6erGn0m.

From cs at zip.com.au  Fri Apr 24 01:27:12 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Fri, 24 Apr 2009 09:27:12 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49EEBE2E.3090601@v.loewis.de>
Message-ID: <20090423232712.GA31693@cskk.homeip.net>

On 22Apr2009 08:50, Martin v. Löwis <martin at v.loewis.de> wrote:
| File names, environment variables, and command line arguments are
| defined as being character data in POSIX;

Specific citation please? I'd like to check the specifics of this.

| the C APIs however allow
| passing arbitrary bytes - whether these conform to a certain encoding
| or not.

Indeed.

| This PEP proposes a means of dealing with such irregularities
| by embedding the bytes in character strings in such a way that allows
| recreation of the original byte string.
[...]

So you're proposing that all POSIX OS interfaces (which use byte strings)
interpret those byte strings into Python3 str objects, with a codec
that will accept arbitrary byte sequences losslessly and is totally
reversible, yes?

And, I hope, that the os.* interfaces silently use it by default.

| For most applications, we assume that they eventually pass data
| received from a system interface back into the same system
| interfaces. For example, an application invoking os.listdir() will
| likely pass the result strings back into APIs like os.stat() or
| open(), which then encodes them back into their original byte
| representation. Applications that need to process the original byte
| strings can obtain them by encoding the character strings with the
| file system encoding, passing "python-escape" as the error handler
| name.
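
To make the quoted paragraph concrete, here is a hedged sketch of recovering
the original bytes from a decoded name. It assumes the handler name
"surrogateescape", which is what the "python-escape" handler of this draft
was called when it shipped in Python 3.1 and later; the helper name
to_original_bytes is hypothetical.

```python
import sys

def to_original_bytes(name, encoding=None):
    """Recover the byte string behind a str file name."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    # Escaped code points are turned back into the bytes they stand for.
    return name.encode(encoding, "surrogateescape")

# A name that is not valid UTF-8, as a system call might return it.
raw = b"report-\xff.txt"
name = raw.decode("utf-8", "surrogateescape")
recovered = to_original_bytes(name, "utf-8")
```

The encoding is passed explicitly in the usage lines so the result does not
depend on the locale of whoever runs the sketch.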

-1

This last sentence kills the idea for me, unless I'm missing something.
Which I may be, of course.

POSIX filesystems _do_not_ have a file system encoding.

The user's environment suggests a preferred encoding via the locale
stuff, and apps honouring that will make nice looking byte strings as
filenames for that user. (Some platforms, like MacOSX' HFS filesystems,
_do_ enforce an encoding, and a quite specific variety of UTF-8 it is;
I would say they're not a full UNIX filesystem _precisely_ because they
reject certain byte strings that are valid on other UNIX filesystems.
What will your proposal do here? I can imagine it might cope with
existing names, but what happens when the user creates a new name?)

Further, different users can use different locales and encodings.
If they do it in different work areas they'll be perfectly happy;
if they do it in a shared area doubtless confusion will reign,
but only in the users' minds, not in the filesystem.

If I'm writing a general purpose UNIX tool like chmod or find, I expect
it to work reliably on _any_ UNIX pathname. It must be totally encoding
blind. If I speak to the os.* interface to open a file, I expect to hand
it bytes and have it behave. As an explicit example, I would be just fine
with python's open(filename, "w") to take a string and encode it for use,
but _not_ ok for os.open() to require me to supply a string and cross
my fingers and hope something sane happens when it is turned into bytes
for the UNIX system call.

I'm very much in favour of being able to work in strings for most
purposes, but if I use the os.* interfaces on a UNIX system it is
necessary to be _able_ to work in bytes, because UNIX file pathnames
are bytes.
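
As a point of reference, a small sketch of working purely in bytes with the
os.* calls, assuming a Python 3 where os.listdir() and os.open() accept
bytes paths and os.listdir() returns bytes when given bytes. The helper name
list_raw_names and the file name data.bin are illustrative only.

```python
import os
import tempfile

def list_raw_names(directory):
    """Return directory entries as bytes, with no decoding applied."""
    return sorted(os.listdir(directory))

with tempfile.TemporaryDirectory() as tmp:
    tmp_b = os.fsencode(tmp)                 # bytes form of the directory path
    path = os.path.join(tmp_b, b"data.bin")  # joining bytes yields bytes
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    os.close(fd)
    names = list_raw_names(tmp_b)
```

An ASCII name is used so the sketch runs on any platform; on a POSIX system
the same calls are expected to work for names that are not valid in any
particular encoding.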

If there isn't a byte-safe os.* facility in Python3, it will simply be
unsuitable for writing low level UNIX tools. And I very much like using
Python2 for that.

Finally, I have a small python program whose whole purpose in life
is to transcode UNIX filenames before transfer to a MacOSX HFS
directory, because of HFS's enforced particular encoding. What approach
should a Python app take to transcode UNIX pathnames under your scheme?

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

The nice thing about standards is that you have so many to choose from;
furthermore, if you do not like any of them, you can just wait for next
year's model.   - Andrew S. Tanenbaum

From cs at zip.com.au  Fri Apr 24 01:32:45 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Fri, 24 Apr 2009 09:32:45 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <20090423232712.GA31693@cskk.homeip.net>
Message-ID: <20090423233245.GA6401@cskk.homeip.net>

On 24Apr2009 09:27, I wrote:
| If I'm writing a general purpose UNIX tool like chmod or find, I expect
| it to work reliably on _any_ UNIX pathname. It must be totally encoding
| blind. If I speak to the os.* interface to open a file, I expect to hand
| it bytes and have it behave. As an explicit example, I would be just fine
| with python's open(filename, "w") to take a string and encode it for use,
| but _not_ ok for os.open() to require me to supply a string and cross
| my fingers and hope something sane happens when it is turned into bytes
| for the UNIX system call.
| 
| I'm very much in favour of being able to work in strings for most
| purposes, but if I use the os.* interfaces on a UNIX system it is
| necessary to be _able_ to work in bytes, because UNIX file pathnames
| are bytes.

Just to follow up to my own words here, I would be ok for all the
pure-byte stuff to be off in the "posix" module if os.* goes pure
character instead of bytes or bytes+strings.
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

... that, in a few years, all great physical constants will have been
approximately estimated, and that the only occupation which will be
left to men of science will be to carry these measurements to another
place of decimals.      - James Clerk Maxwell (1831-1879)
                          Scientific Papers 2, 244, October 1871

From cs at zip.com.au  Fri Apr 24 01:47:24 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Fri, 24 Apr 2009 09:47:24 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49EF6D64.4030302@v.loewis.de>
Message-ID: <20090423234724.GA8077@cskk.homeip.net>

On 22Apr2009 21:17, Martin v. Löwis <martin at v.loewis.de> wrote:
| > -1.  On UNIX, character data is not sufficient to represent paths.  We
| > must, must, must continue to have a simple bytes interface to these
| > APIs.
| 
| I'd like to respond to this concern in three ways:
| 
| 1. The PEP doesn't remove any of the existing interfaces. So if the
|    interfaces for byte-oriented file names in 3.0 work fine for you,
|    feel free to continue to use them.

Ok. I think I had read things as supplanting byte-oriented interfaces
with this exciting new strings-can-do-it-all approach.

| 2. Even if they were taken away (which the PEP does not propose to do),
|    it would be easy to emulate them for applications that want them.
|    For example, listdir could be wrapped as
| 
|    def listdir_b(bytestring):
|        fse = sys.getfilesystemencoding()

Alas, no, because there is no sys.getfilesystemencoding() at the POSIX
level. It's only the user's current locale stuff on a UNIX system, and
has _nothing_ to do with the filesystem because UNIX filesystems don't
have encodings.

In particular, because the "best" (or to my mind "misleading") you
can do for this is report what the current user thinks:
  http://docs.python.org/library/sys.html#sys.getfilesystemencoding
then there's no guarantee that what is chosen has any relationship to
what was in use when the files being consulted were made.

Now, if I were writing listdir_b() I'd want to be able to do something
along these lines:
  - set LC_ALL=C (or some equivalent mechanism)
  - have os.listdir() read bytes as numeric values and transcode their values
    _directly_ into the corresponding Unicode code points.
  - yield bytes( ord(c) for c in os_listdir_string )
  - have os.open() et al transcode unicode code points back into bytes.
i.e. a straight one-to-one mapping, using only codepoints in the range
1..255.

Then I'd have some confidence that I had got hold of the bytes as they
had come from the underlying UNIX system call, and a way to get those
bytes _back_ to a UNIX system call intact.
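
The one-to-one mapping described above can be sketched with the Latin-1
codec, which maps each byte value to the code point of the same number.
This is only an illustration of the mapping itself, not of what os.listdir()
does; the function names are made up for the example.

```python
def bytes_to_codepoints(raw):
    """Map every byte to the Unicode code point with the same number."""
    return raw.decode("latin-1")

def codepoints_to_bytes(text):
    """Inverse mapping; raises UnicodeEncodeError above U+00FF."""
    return text.encode("latin-1")

raw = b"na\xefve-\xff\x80.txt"
as_text = bytes_to_codepoints(raw)
# The generator form used in the list above gives the same bytes back.
round_trip = bytes(ord(c) for c in as_text)
```

The resulting strings are safe to pass around and convert back, but they are
not meaningful for display unless the names really were Latin-1.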

|        string = bytestring.decode(fse, "python-escape")
|        for fn in os.listdir(string):
|            yield fn.encode(fse, "python-escape")
| 
| 3. I still disagree that we must, must, must continue to provide these
|    interfaces. I don't understand from the rest of your message what
|    would *actually* break if people would use the proposed interfaces.

My other longer message describes what would break, if I understand your
proposal.
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

From foom at fuhm.net  Fri Apr 24 02:52:03 2009
From: foom at fuhm.net (James Y Knight)
Date: Thu, 23 Apr 2009 20:52:03 -0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49EEBE2E.3090601@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
Message-ID: <BF7BB64F-3B34-4BB7-94A2-D27E6D583DA3@fuhm.net>

On Apr 22, 2009, at 2:50 AM, Martin v. Löwis wrote:

> I'm proposing the following PEP for inclusion into Python 3.1.
> Please comment.

+1. Even if some people still want a low-level bytes API, it's  
important that the easy case be easy. That is: the majority of Python  
applications should *just work, damnit* even with not-properly-encoded- 
in-current-LC_CTYPE filenames. It looks like this proposal  
accomplishes that, and does so in a relatively nice fashion.

James

From google at mrabarnett.plus.com  Fri Apr 24 03:38:49 2009
From: google at mrabarnett.plus.com (MRAB)
Date: Fri, 24 Apr 2009 02:38:49 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49EF6F06.9060008@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
	<49EF0ADB.2090107@mrabarnett.plus.com>
	<49EF6F06.9060008@v.loewis.de>
Message-ID: <49F11829.9070504@mrabarnett.plus.com>

Martin v. Löwis wrote:
> MRAB wrote:
>> Martin v. Löwis wrote:
>> [snip]
>>> To convert non-decodable bytes, a new error handler "python-escape" is
>>> introduced, which decodes non-decodable bytes into a private-use
>>> character U+F01xx, which is believed to not conflict with private-use
>>> characters that currently exist in Python codecs.
>>>
>>> The error handler interface is extended to allow the encode error
>>> handler to return byte strings immediately, in addition to returning
>>> Unicode strings which then get encoded again.
>>>
>>> If the locale's encoding is UTF-8, the file system encoding is set to
>>> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
>>> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
>>>
>> If the byte stream happens to include a sequence which decodes to
>> U+F01xx, shouldn't that raise an exception?
> 
> I apparently have not expressed it clearly, so please help me improve
> the text. What I mean is this:
> 
> - if the environment encoding (for lack of better name) is UTF-8,
>   Python stops using the utf-8 codec under this PEP, and switches
>   to the utf-8b codec.
> - otherwise (env encoding is not utf-8), undecodable bytes get decoded
>   with the error handler. In this case, U+F01xx will not occur
>   in the byte stream, since no other codec ever produces this PUA
>   character (this is not fully true - UTF-16 may also produce PUA
>   characters, but they can't appear as env encodings).
> So the case you are referring to should not happen.
> 
I think what's confusing me is that you talk about mapping non-decodable
bytes to U+F01xx, but you also talk about decoding to half surrogate
codes U+DC80..U+DCFF.

If the bytes are mapped to single half surrogate codes instead of the
normal pairs (low+high), then I can see that decoding could never be
ambiguous and encoding could produce the original bytes.
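
A short sketch of the half-surrogate idea, assuming the "surrogateescape"
error handler of Python 3.1 and later (the released form of what this thread
calls utf-8b / "python-escape"). The sample bytes are arbitrary.

```python
# Valid UTF-8 for "café", then a hyphen, then a stray byte 0xE9.
data = b"caf\xc3\xa9-\xe9"

# The stray byte becomes the lone surrogate U+DC00 + 0xE9 = U+DCE9.
text = data.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
back = text.encode("utf-8", "surrogateescape")

# A strict encode refuses lone surrogates, so escaped names cannot be
# silently written out as if they were ordinary text.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

Because strict UTF-8 decoding never produces a lone surrogate, a code point
in U+DC80..U+DCFF in the decoded string can only have come from an escaped
byte.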

From larry.bugbee at boeing.com  Fri Apr 24 05:55:04 2009
From: larry.bugbee at boeing.com (Bugbee, Larry)
Date: Thu, 23 Apr 2009 20:55:04 -0700
Subject: [Python-Dev] Python3 and arm-linux
In-Reply-To: <mailman.15804.1240529243.11745.python-dev@python.org>
References: <mailman.15804.1240529243.11745.python-dev@python.org>
Message-ID: <9418DB6C0B9D434190E54A78E931C3D109465F84@XCH-NW-7V1.nw.nos.boeing.com>

> > Does anybody know whether Python 3 works on arm-linux? Is it possible to
> > build it? Where can I find related discussions? Are there perhaps some
> > special patches already available? Should I get the sources from svn, or
> > use a known version snapshot?
> >
> 
> I haven't tried, but there's an interesting distro at
> http://www.vanille-media.de/site/index.php/projects/python-for-arm-linux/ --
> I don't know if other such distros have better-updated Python versions (eg.
> current 2.6.* vs that one's 2.4.*) but that one includes a lot of very
> useful add-ons.

You may want to look at the Ångström distro.
  http://www.angstrom-distribution.org/  
  http://www.angstrom-distribution.org/repo/?pkgname=libpython2.6-1.0
That's where I'll be heading in a couple of weeks.  (I have a new BeagleBoard with an ARM Cortex-A8.)

Larry

From hodgestar+pythondev at gmail.com  Fri Apr 24 09:59:03 2009
From: hodgestar+pythondev at gmail.com (Simon Cross)
Date: Fri, 24 Apr 2009 09:59:03 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49EEBE2E.3090601@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
Message-ID: <fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>

On Wed, Apr 22, 2009 at 8:50 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> For Python 3, one proposed solution is to provide two sets of APIs: a
> byte-oriented one, and a character-oriented one, where the
> character-oriented one would be limited to not being able to represent
> all data accurately. Unfortunately, for Windows, the situation would
> be exactly the opposite: the byte-oriented interface cannot represent
> all data; only the character-oriented API can. As a consequence,
> libraries and applications that want to support all user data in a
> cross-platform manner have to accept mish-mash of bytes and characters
> exactly in the way that caused endless troubles for Python 2.x.

Is the second part of this actually true? My understanding may be
flawed, but surely all Unicode data can be converted to and from bytes
using UTF-8? Obviously not all byte sequences are valid UTF-8, but
this doesn't prevent one from creating an arbitrary Unicode string
using "utf-8 bytes".decode("utf-8").  Given this, can't people who
must have access to all files / environment data just use the bytes
interface?

Disclosure: My gut reaction is that the solution described in the PEP
is a hack, but I'm hardly a character encoding expert.  My feeling is
that the correct solution is to either standardise on the bytes
interface as the lowest common denominator, or to add a Path type (and
I guess an EnvironmentalData type) and use the new type to attempt to
hide the differences.

Schiavo
Simon

From v+python at g.nevcal.com  Fri Apr 24 11:22:14 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Fri, 24 Apr 2009 02:22:14 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
Message-ID: <49F184C6.8000905@g.nevcal.com>

On approximately 4/24/2009 12:59 AM, came the following characters from 
the keyboard of Simon Cross:
> On Wed, Apr 22, 2009 at 8:50 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> For Python 3, one proposed solution is to provide two sets of APIs: a
>> byte-oriented one, and a character-oriented one, where the
>> character-oriented one would be limited to not being able to represent
>> all data accurately. Unfortunately, for Windows, the situation would
>> be exactly the opposite: the byte-oriented interface cannot represent
>> all data; only the character-oriented API can. As a consequence,
>> libraries and applications that want to support all user data in a
>> cross-platform manner have to accept mish-mash of bytes and characters
>> exactly in the way that caused endless troubles for Python 2.x.
> 
> Is the second part of this actually true? My understanding may be
> flawed, but surely all Unicode data can be converted to and from bytes
> using UTF-8? Obviously not all byte sequences are valid UTF-8, but
> this doesn't prevent one from creating an arbitrary Unicode string
> using "utf-8 bytes".decode("utf-8").  Given this, can't people who
> must have access to all files / environment data just use the bytes
> interface?
> 
> Disclosure: My gut reaction is that the solution described in the PEP
> is a hack, but I'm hardly a character encoding expert.  My feeling is
> that the correct solution is to either standardise on the bytes
> interface as the lowest common denominator, or to add a Path type (and
> I guess an EnvironmentalData type) and use the new type to attempt to
> hide the differences.

Oh clearly it is a hack.  The right solution of a Path type (and 
friends) was discarded in earlier discussion, because it would impact 
too much existing code.  The use of bytes would be annoying in the 
context of py3, where things that you want to display are in str 
(Unicode).  So there is no solution that allows the use of str, and the 
robustness of bytes, and is 100% compatible with existing practice. 
Hence the desire is to find a hack that is "good enough".  At least, 
that is my understanding and synopsis.

I never saw MvL's original message with the PEP delivered to my mailbox, 
but some of the replies came there, so I found and extensively replied 
to it using the Google group / usenet.  My reply never showed up here 
and no one has commented on it either... Should I repost via the mailing 
list?  I think so... I'll just paste it in here, with one tweak I 
noticed after I sent it fixed... (Sorry Simon, but it is still the same 
thread, anyway.) (Sorry to others, if my original reply was seen, and 
just wasn't worth replying to.)

On Apr 21, 11:50 pm, "Martin v. Löwis" <mar... at v.loewis.de> wrote:

 > I'm proposing the following PEP for inclusion into Python 3.1.
 > Please comment.

Basically the scheme doesn't work.  Aside from that, it is very close.

There are tons of encoding schemes that could work... they don't have
to include half-surrogates or bytes.  What they have to do, is make
sure that they are uniformly applied to all appropriate strings.

The problem with this, and other preceding schemes that have been
discussed here, is that there is no means of ascertaining whether a
particular file name str was obtained from a str API, or was funny-
decoded from a bytes API... and thus, there is no means of reliably
ascertaining whether a particular filename str should be passed to a
str API, or funny-encoded back to bytes.

The assumption in the 2nd Discussion paragraph may hold for a large
percentage of cases, maybe even including some number of 9s, but it is
not guaranteed, and cannot be enforced, therefore there are cases that
could fail.  Whether those failure cases are a concern or not is an
open question.  Picking a character (I don't find U+F01xx in the
Unicode standard, so I don't know what it is) that is obscure, and
unlikely to be used in "real" file names, might help the heuristic
nature of the encoding and decoding avoid most conflicts, but provides
no guarantee that data puns will not occur in practice.  Today's
obscure character is tomorrow's commonly used character, perhaps.
Someone not on this list may be happily using that character for their
own nefarious, incompatible purpose.

As I realized in the email-sig, in talking about decoding corrupted
headers, there is only one way to guarantee this... to encode _all_
character sequences, from _all_ interfaces.  Basically it requires
reserving an escape character (I'll use ? in these examples -- yes, an
ASCII question mark -- happens to be illegal in Windows filenames so
all the better on that platform, but the specific character doesn't
matter... avoiding / \ and . is probably good, though).

So the rules would be, when obtaining a file name from the bytes OS
interface, that doesn't properly decode according to UTF-8, decode it
by placing a ? at the beginning, then for each decodable UTF-8
sequence, add a Unicode character -- unless the character is ?, in
which case you add two ??, and for each non-decodable byte sequence,
place a ? and two hex digits, or a ? and a half surrogate code, or a ?
and whatever gibberish you like.  Two hex digits are fine by me, and
will serve for this discussion.

ALSO, when obtaining a file name from the str OS interfaces, encode it
too... if it contains any ?, then place a ? at the front, and then any 
other ? in the name must be doubled.

Then you have a string that can/must be encoded to be used on either
str or bytes OS interfaces... or any other interfaces that want str or
bytes... but whichever they want, you can do a decode, or determine
that you can't, into that form.  The encode and decode functions
should be available for coders to use, that code to external
interfaces, either OS or 3rd party packages, that do not use this
encoding scheme.  This encoding scheme would be used throughout all
Python APIs (most of which would need very little change to
accommodate it).  However, programs would have to keep track of
whether they were dealing with encoded or unencoded strings, if they
use both types in their program (an example, is hard-coded file names
or file name parts).

The initial ? is not strictly necessary for this scheme to work, but I
think it would be a good flag to the user that this name has been
altered.

This scheme does not depend on assumptions about the use of file
names.

This scheme would be enhanced if the file name APIs returned a subtype
of str for the encoded names, but that should be considered only a
hint, not a requirement.

When encoding file name strings to pass to bytes APIs, the ? followed
by two hex digits would be converted to a byte.  Leading ? would be
dropped, and ?? would convert to ?.  I don't believe failures are
possible when encoding to bytes.

When encoding file name strings to pass to str APIs, the discovery
of ? followed by two hex digits would raise an exception, the file
name is not acceptable to a str API.  However, leading ? would be
dropped, and ?? would convert to ?, and if no ? followed by two hex
digits were found, the file name would be successfully converted for
use on the str API.
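
The bytes side of the rules above can be sketched as follows. This is one
possible reading of the scheme, not a reference implementation: the function
names are hypothetical, two uppercase hex digits are used for undecodable
bytes, and the "surrogateescape" handler of Python 3.1+ is borrowed purely
as a convenient way to find which bytes fail to decode as UTF-8. The str-API
side of the scheme is left out.

```python
ESC = "?"  # escape character used in the examples above

def escape_from_bytes(raw):
    """Turn a bytes file name into an escaped str."""
    # Undecodable bytes show up as lone surrogates U+DC80..U+DCFF.
    text = raw.decode("utf-8", "surrogateescape")
    has_bad = any("\udc80" <= c <= "\udcff" for c in text)
    if ESC not in text and not has_bad:
        return text                      # clean names are left unaltered
    parts = [ESC]                        # leading flag: name was altered
    for ch in text:
        if ch == ESC:
            parts.append(ESC + ESC)      # a literal ? is doubled
        elif "\udc80" <= ch <= "\udcff":
            parts.append("%s%02X" % (ESC, ord(ch) - 0xDC00))
        else:
            parts.append(ch)
    return "".join(parts)

def unescape_to_bytes(name):
    """Inverse of escape_from_bytes(), scanning left to right."""
    if not name.startswith(ESC):
        return name.encode("utf-8")
    out = bytearray()
    i = 1                                # skip the leading flag
    while i < len(name):
        if name[i] != ESC:
            out += name[i].encode("utf-8")
            i += 1
        elif name[i + 1:i + 2] == ESC:   # ?? stands for a literal ?
            out += ESC.encode("utf-8")
            i += 2
        else:                            # ? plus two hex digits is one byte
            out.append(int(name[i + 1:i + 3], 16))
            i += 3
    return bytes(out)

samples = [b"plain.txt", b"a?b", b"\xff", b"?FF", b"x\xe9?\x80"]
round_trips = [unescape_to_bytes(escape_from_bytes(s)) for s in samples]
```

Scanning strictly left to right is what keeps a doubled ? followed by hex
digits distinct from an escaped byte.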

Note that not even on Unix/Posix is it particularly easy nor useful to
place a ? into file names from command lines due to shell escapes,
etc.  The use of ? in file names also interferes with easy ability to
specifically match them in globs, etc.

Anything short of such an encoding of both types of interfaces, such
that it is known that all python-manipulated filenames will be
encoded, will have data puns that provide a potential for failure in
edge cases.

Note that in this scheme, no file names that are fully Unicode and do
not contain ? characters are altered by the decoding or the encoding
process.  That will probably reach quite a few 9s of likelihood that
the scheme will go unnoticed by most people and programs and
filenames.  But the scheme will work reliably if implemented correctly
and completely, and will have no edge cases of failure due to not
having data puns.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From hodgestar+pythondev at gmail.com  Fri Apr 24 11:37:02 2009
From: hodgestar+pythondev at gmail.com (Simon Cross)
Date: Fri, 24 Apr 2009 11:37:02 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F184C6.8000905@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com>
Message-ID: <fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>

On Fri, Apr 24, 2009 at 11:22 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:
> Oh clearly it is a hack.  The right solution of a Path type (and friends)
> was discarded in earlier discussion, because it would impact too much
> existing code.  The use of bytes would be annoying in the context of py3,
> where things that you want to display are in str (Unicode).  So there is no
> solution that allows the use of str, and the robustness of bytes, and is
> 100% compatible with existing practice. Hence the desire is to find a hack
> that is "good enough".  At least, that is my understanding and synopsis.

What about keeping the bytes interface (utf-8 encoded Unicode on
Windows) and adding a Path type (and friends) interface that mirrors
it?

> (Sorry Simon, but it is still the same thread, anyway.)

Python discussions do seem to womble through a rather large set of
mailing lists and news groups. :)

Schiavo
Simon

From hodgestar+pythondev at gmail.com  Fri Apr 24 12:39:15 2009
From: hodgestar+pythondev at gmail.com (Simon Cross)
Date: Fri, 24 Apr 2009 12:39:15 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
Message-ID: <fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>

On Fri, Apr 24, 2009 at 12:04 PM, Glenn Linderman <glenn at nevcal.com> wrote:
> The goal of Unicode users everywhere is to use Unicode for everything, no?
> After all, all "real" files should have Unicode based names, and the only
> proper byte sequences that should exist are UTF-8 encoding Unicode bytes.
> (Cheek to tongue: Get out of here!)

Humour aside :), the expectation that filenames are Unicode data
simply doesn't agree with the reality of POSIX file systems.  I think
an approach similar to that adopted by glib [1] could work -- i.e. use
the bytes API and provide some tools to assist application developers
in converting them to and from Unicode strings (these tools are then
where all the guess work about what encoding to use can live).

[1] http://library.gnome.org/devel/glib/stable/glib-Character-Set-Conversion.html
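
As a rough illustration of such a conversion tool, here is a sketch of a
helper that turns a bytes file name into text for display only. It is
loosely in the spirit of the glib approach rather than a port of it; the
function name and the order of the guesses are assumptions.

```python
import locale

def filename_display_name(raw):
    """Best-effort text for showing a bytes file name to a user (lossy)."""
    # Guess UTF-8 first, then whatever the user's locale prefers.
    for encoding in ("utf-8", locale.getpreferredencoding(False)):
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: substitute U+FFFD for anything that does not decode.
    return raw.decode("utf-8", "replace")

shown = filename_display_name(b"caf\xc3\xa9.txt")
```

The result is meant for display only; the original bytes stay the value that
is passed back to the operating system.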

Schiavo
Simon

From p.f.moore at gmail.com  Fri Apr 24 14:00:40 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 24 Apr 2009 13:00:40 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
Message-ID: <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>

2009/4/24 Simon Cross <hodgestar+pythondev at gmail.com>:
> On Fri, Apr 24, 2009 at 12:04 PM, Glenn Linderman <glenn at nevcal.com> wrote:
>> The goal of Unicode users everywhere is to use Unicode for everything, no?
>> After all, all "real" file should have Unicode based names, and the only
>> proper byte sequences that should exist are UTF-8 encoding Unicode bytes.
>> (Cheek to tongue: Get out of here!)
>
> Humour aside :), the expectation that filenames are Unicode data
> simply doesn't agree with the reality of POSIX file systems.

However, it *does* agree with the reality of Windows file systems. The
fundamental problem here is that there is a strong OS disparity - for
Windows, the OS uses Unicode, for POSIX, the OS uses bytes.
Traditionally, Python has been happy to expose OS differences, and let
application code address platform portability issues. But this is such
a fundamental area that doing so is problematic - it could easily
result in *more* code being OS-specific (in subtle,
only-affects-non-Latin-alphabet-using-users manners) rather than less.

That is why it makes sense to have *some* means of normalising things
in a way that does the best it can. The raw bytes interfaces should be
available for POSIX users writing low-level code that *must* handle
all possible nightmare scenarios[1], but Martin's proposal is designed
to handle "the majority of cases" in a platform-independent way. To
that end, a string-based interface makes sense, as frankly that's how
"normal" users think of filenames. The rest of Martin's proposal seems
to follow the same sort of practical approach.
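
For illustration, a small sketch of what such a string-based interface
looks like in a Python 3 that implements PEP 383. It assumes the
"surrogateescape" error handler that the PEP defines and a UTF-8 file
system encoding; the sample file name is made up.

```python
raw = b"report-\xff.txt"  # made-up name; 0xFF is never valid in UTF-8

# Decode as text: each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
name = raw.decode("utf-8", errors="surrogateescape")

# Ordinary string handling works on the result.
stem, dot, ext = name.rpartition(".")

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", errors="surrogateescape")
```

The application sees a str throughout, yet nothing is lost on the way
back to the operating system.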

Paul.

[1] Maybe there's a need for a Unicode interface on Windows that
doesn't do *any* encoding, even in the face of garbled Unicode - I
don't know low-level details well enough to be sure here. But the same
principle applies, that "get the raw data, regardless" is a low-level
OS-specific operation, and should not be the one used in day-to-day
programming.

From yesim4 at yahoo.com  Fri Apr 24 15:34:29 2009
From: yesim4 at yahoo.com (Yuma Scott)
Date: Fri, 24 Apr 2009 06:34:29 -0700 (PDT)
Subject: [Python-Dev] version for blender Vista
Message-ID: <140172.96060.qm@web30808.mail.mud.yahoo.com>

Can you tell me which installer of Python I need to work with
Blender and Windows Vista Home Premium?

Thanks!
Yuma Scott

From orsenthil at gmail.com  Fri Apr 24 15:59:08 2009
From: orsenthil at gmail.com (Senthil Kumaran)
Date: Fri, 24 Apr 2009 19:29:08 +0530
Subject: [Python-Dev] version for blender Vista
In-Reply-To: <140172.96060.qm@web30808.mail.mud.yahoo.com>
References: <140172.96060.qm@web30808.mail.mud.yahoo.com>
Message-ID: <7c42eba10904240659o515ed8fcqa685e068af6fe3f4@mail.gmail.com>

From:

http://mail.python.org/mailman/listinfo/python-dev

About Python-Dev   	

***Do not post general Python questions to this list. For help with
Python please see the Python help page.***

On this list the key Python developers discuss the future of the
language and its implementation. Topics include Python design issues,
release mechanics, and maintenance of existing releases.

On Fri, Apr 24, 2009 at 7:04 PM, Yuma Scott <yesim4 at yahoo.com> wrote:
>
> Can you tell me which installer of Python I need to work with
> Blender and Windows Vista Home Premium?
> Thanks!
> Yuma Scott

-- 
Senthil

From foom at fuhm.net  Fri Apr 24 16:54:07 2009
From: foom at fuhm.net (James Y Knight)
Date: Fri, 24 Apr 2009 10:54:07 -0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
Message-ID: <0313C0B7-27A3-455E-962E-A178B41CE049@fuhm.net>

On Apr 24, 2009, at 8:00 AM, Paul Moore wrote:
> However, it *does* agree with the reality of Windows file systems. The
> fundamental problem here is that there is a strong OS disparity - for
> Windows, the OS uses Unicode, for POSIX, the OS uses bytes.

It's unfortunately the case that this isn't *precisely* true. Windows  
uses arbitrary 16-bit sequences, just as unix uses arbitrary 8-bit  
sequences. Neither one is required by the operating system to be a  
proper unicode encoding. The main difference is that there is already  
a widely accepted way to decode an improperly-encoded 16-bit sequence  
with the utf-16 codec: simply leave the lone surrogates in place.
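
To make the 16-bit case concrete, here is a hedged sketch using a recent
Python 3. The "surrogatepass" error handler is an assumption of this
example (support for it in the utf-16 codecs postdates this thread), and
the code unit sequence is made up.

```python
# A made-up Windows-style name: a lone high surrogate (0xD800), then "a".
units = b"\x00\xd8" + "a".encode("utf-16-le")

# Strict UTF-16 decoding rejects the unpaired surrogate.
try:
    units.decode("utf-16-le")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Leaving the lone surrogate in place keeps the data intact.
text = units.decode("utf-16-le", errors="surrogatepass")
round_trip = text.encode("utf-16-le", errors="surrogatepass")
```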

James

From aahz at pythoncraft.com  Fri Apr 24 17:27:46 2009
From: aahz at pythoncraft.com (Aahz)
Date: Fri, 24 Apr 2009 08:27:46 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
In-Reply-To: <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
Message-ID: <20090424152746.GA9543@panix.com>

On Fri, Apr 24, 2009, Paul Moore wrote:
> 2009/4/24 Simon Cross <hodgestar+pythondev at gmail.com>:
>> 
>> Humour aside :), the expectation that filenames are Unicode data
>> simply doesn't agree with the reality of POSIX file systems.
> 
> However, it *does* agree with the reality of Windows file systems. The
> fundamental problem here is that there is a strong OS disparity - for
> Windows, the OS uses Unicode, for POSIX, the OS uses bytes.
> Traditionally, Python has been happy to expose OS differences, and let
> application code address platform portability issues. But this is such
> a fundamental area, that doing so is problematic - it could easily
> result in *more* code being OS-specific (in subtle,
> only-affects-non-Latin-alphabet-using-users manners) rather than less.

The part that I haven't seen clearly addressed so far is what happens
when disks get mounted across OSes (e.g. NFS).

While I agree that there should be a layer on top that can handle "most"
situations, it also seems clear that the raw layer needs to be readily
accessible.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

From solipsis at pitrou.net  Fri Apr 24 17:33:05 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 24 Apr 2009 15:33:05 +0000 (UTC)
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
Message-ID: <loom.20090424T153112-15@post.gmane.org>

Aahz <aahz <at> pythoncraft.com> writes:
> 
> The part that I haven't seen clearly addressed so far is what happens
> when disks get mounted across OSes (e.g. NFS).

Unless there's some kind of native NFS API for file access, it is hopelessly out
of scope for Python. We use whatever the C library exports to us, and don't have
any control over filesystem details.

From stephen at xemacs.org  Fri Apr 24 17:53:53 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 25 Apr 2009 00:53:53 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
In-Reply-To: <0313C0B7-27A3-455E-962E-A178B41CE049@fuhm.net>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<0313C0B7-27A3-455E-962E-A178B41CE049@fuhm.net>
Message-ID: <87y6tqkpym.fsf@uwakimon.sk.tsukuba.ac.jp>

James Y Knight writes:

 > It's unfortunately the case that this isn't *precisely* true. Windows  
 > uses arbitrary 16-bit sequences, just as unix uses arbitrary 8-bit  
 > sequences.

Including U+FFFE and U+FFFF "not a character nowhere nohow"?  Just
when I was thinking Microsoft would actually nail one....

From p.f.moore at gmail.com  Fri Apr 24 17:59:59 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 24 Apr 2009 16:59:59 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <loom.20090424T153112-15@post.gmane.org>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
Message-ID: <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>

2009/4/24 Antoine Pitrou <solipsis at pitrou.net>:
> Aahz <aahz <at> pythoncraft.com> writes:
>>
>> The part that I haven't seen clearly addressed so far is what happens
>> when disks get mounted across OSes (e.g. NFS).
>
> Unless there's some kind of native NFS API for file access, it is hopelessly out
> of scope for Python. We use whatever the C library exports to us, and don't have
> any control over filesystem details.

For "raw" level stuff (bytes on Unix, Unicode-nearly (:-)) on Windows)
that's right. Resist the temptation to guess and all that.

For the level Martin is (as far as I can tell) aiming at [1], we need
some defined rules on how to behave (relatively) sanely. Windows is
fairly easy - "nearly-Unicode" to Unicode isn't too bad. But on Unix,
you're dealing with bytes-to-Unicode in the absence of a clearly
stated encoding - which is a known can of worms...
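
The can of worms is easy to reproduce. In this sketch the byte string
and the candidate encodings are arbitrary examples, not anything taken
from the PEP:

```python
raw = b"caf\xe9"  # arbitrary example bytes, as a POSIX call might return them

# The same bytes give different text depending on the encoding we guess.
as_latin1 = raw.decode("latin-1")   # 0xE9 is e-acute in Latin-1
as_koi8r = raw.decode("koi8-r")     # 0xE9 is a Cyrillic letter in KOI8-R

# And a UTF-8 guess fails outright, because 0xE9 starts a multi-byte
# sequence that never completes.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

Nothing in the bytes themselves says which of these readings the
file's creator intended.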

In my view:

The pros for Martin's proposal are a uniform cross-platform interface,
and a user-friendly API for the common case.
The cons are subtle and complex corner cases, and lack of agreement on
the validity of the proposed encoding in those cases.

The fact that the bytes APIs won't go away probably mitigates the cons
to a large extent (again, in my view...)

Paul.

[1] Actually, all the PEP says is "With this PEP, a uniform treatment
of these data as characters becomes possible." An argument as to why
this is a good thing would be a useful addition to the PEP. At the
moment it's more or less treated as self-evident - which I agree with,
but which clearly the Unix people here are not as certain of.

From status at bugs.python.org  Fri Apr 24 18:07:29 2009
From: status at bugs.python.org (Python tracker)
Date: Fri, 24 Apr 2009 18:07:29 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20090424160729.B33E5780C9@psf.upfronthosting.co.za>

ACTIVITY SUMMARY (04/17/09 - 04/24/09)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.

 2227 open (+32) / 15427 closed (+17) / 17654 total (+49)

Open issues with patches:   865

Average duration of open issues: 641 days.
Median duration of open issues: 395 days.

Open Issues Breakdown
   open  2175 (+31)
pending    52 ( +1)

Issues Created Or Reopened (51)
_______________________________

Builtin round function is sometimes inaccurate for floats        04/18/09
CLOSED http://bugs.python.org/issue1869    reopened marketdickinson               
       patch                                                                   

logging to file + encoding                                       04/20/09
CLOSED http://bugs.python.org/issue5170    reopened shamilbi                      

IDLE cannot find windows chm file                                04/17/09
       http://bugs.python.org/issue5783    created  rhettinger                    
       patch                                                                   

raw deflate format and zlib module                               04/17/09
       http://bugs.python.org/issue5784    created  phr                           

Condition.wait() does not respect its timeout                    04/18/09
CLOSED http://bugs.python.org/issue5785    created  Kjir                          

len(reversed([1,2,3])) does not work anymore in 2.6.2            04/19/09
       http://bugs.python.org/issue5786    reopened rhettinger                    

object.__getattribute__(super, '__bases__') crashes the interpre 04/19/09
CLOSED http://bugs.python.org/issue5787    reopened alexer                        

datetime.timedelta is inconvenient to use...                     04/18/09
       http://bugs.python.org/issue5788    created  bquinlan                      
       patch                                                                   

powerset recipe listed twice in itertools docs                   04/19/09
CLOSED http://bugs.python.org/issue5789    created  stevenjd                      
       easy                                                                    

itertools.izip python code has a typo                            04/19/09
CLOSED http://bugs.python.org/issue5790    created  stevenjd                      

title information of unicodedata is wrong in some cases          04/19/09
CLOSED http://bugs.python.org/issue5791    created  cfbolz                        

Enable short float repr() on Solaris/x86                         04/19/09
       http://bugs.python.org/issue5792    created  marketdickinson               
       easy                                                                    

Rationalize isdigit / isalpha / tolower / ... uses throughout Py 04/19/09
       http://bugs.python.org/issue5793    created  marketdickinson               
       easy                                                                    

pickle/cPickle of recursive tuples create pickles that cPickle c 04/19/09
       http://bugs.python.org/issue5794    created  cwitty                        

test_distutils failure on the ppc Debian buildbot                04/19/09
CLOSED http://bugs.python.org/issue5795    created  pitrou                        

test_posix, test_pty crash under Windows                         04/19/09
CLOSED http://bugs.python.org/issue5796    created  pitrou                        
       patch                                                                   

there is en exception om Create User page                        04/20/09
       http://bugs.python.org/issue5797    created  nabeel                        

test_asynchat fails on Mac OSX                                   04/20/09
       http://bugs.python.org/issue5798    created  cartman                       

Change ntpath functions to implicitly support UNC paths          04/20/09
       http://bugs.python.org/issue5799    created  larry                         
       patch                                                                   

make wsgiref.headers.Headers accept empty constructor            04/20/09
       http://bugs.python.org/issue5800    created  tarek                         
       easy                                                                    

spurious empty lines in wsgiref code                             04/20/09
       http://bugs.python.org/issue5801    created  tarek                         

The security descriptors of python binaries in Windows are not s 04/20/09
       http://bugs.python.org/issue5802    created  kindloaf                      

email/quoprimime: encode and decode are very slow on large messa 04/20/09
       http://bugs.python.org/issue5803    created  dmbaggett                     

Add a "tail" argument to zlib.decompress                         04/21/09
       http://bugs.python.org/issue5804    created  krisvale                      
       patch                                                                   

Distutils (or py2exe) error with DistributionMetaData            04/21/09
CLOSED http://bugs.python.org/issue5805    created  varash                        

MySQL crash on machine startup....                               04/21/09
CLOSED http://bugs.python.org/issue5806    created  plattecoducks                 

ConfigParser.RawConfigParser it's an "old-style" class           04/21/09
CLOSED http://bugs.python.org/issue5807    created  ZeD                           

Subprocess.getstatusoutput Fails Executing 'dir' Command on Wind 04/21/09
CLOSED http://bugs.python.org/issue5808    created  mrwizard82d1                  

"No such file or directory" with framework build under MacOS 10. 04/21/09
       http://bugs.python.org/issue5809    created  creachadair                   

test_distutils fails - sysconfig._config_vars is None            04/22/09
       http://bugs.python.org/issue5810    created  srid                          

io.BufferedReader.peek(): Documentation differs from Implementat 04/22/09
       http://bugs.python.org/issue5811    created  trott                         

Fraction('1e6') should be valid.                                 04/22/09
CLOSED http://bugs.python.org/issue5812    created  marketdickinson               
       patch                                                                   

Pointer into language reference from __future__ module documenta 04/22/09
CLOSED http://bugs.python.org/issue5813    created  ncoghlan                      

SocketServer: TypeError: waitpid() takes no keyword arguments    04/22/09
       http://bugs.python.org/issue5814    created  arekm                         

locale.getdefaultlocale() missing corner case                    04/22/09
       http://bugs.python.org/issue5815    created  rg3                           
       patch                                                                   

Simplify parsing of complex numbers and make complex('inf') vali 04/22/09
CLOSED http://bugs.python.org/issue5816    created  marketdickinson               
       patch                                                                   

Right-click behavior from Windows Explorer                       04/23/09
       http://bugs.python.org/issue5817    created  Mkop                          

Fix five small bugs in the bininstall and altbininstall pseudota 04/23/09
       http://bugs.python.org/issue5818    created  larry                         
       patch                                                                   

Add PYTHONPREFIXES environment variable                          04/23/09
       http://bugs.python.org/issue5819    created  larry                         
       patch                                                                   

Very small bug in documentation of json.load()                   04/23/09
CLOSED http://bugs.python.org/issue5820    created  pcshyamshankar                
       patch                                                                   

Documentation: mention 'close' and iteration for tarfile.TarFile 04/23/09
       http://bugs.python.org/issue5821    created  bennorth                      
       patch                                                                   

inconsistent behavior of range when used in combination with rem 04/23/09
CLOSED http://bugs.python.org/issue5822    created  zero79                        

feature request: a conditional "for" statement                   04/23/09
CLOSED http://bugs.python.org/issue5823    created  zero79                        

SocketServer.DatagramRequestHandler Broken under Linux           04/23/09
       http://bugs.python.org/issue5824    created  jimd                          

Patch to add "remove" method to tempfile.NamedTemporaryFile      04/24/09
       http://bugs.python.org/issue5825    created  tebeka                        
       patch                                                                   

new unittest function listed as assertIsNotNot() instead of asse 04/24/09
       http://bugs.python.org/issue5826    created  mrooney                       

os.path.normpath doesn't preserve unicode                        04/24/09
       http://bugs.python.org/issue5827    created  mgiuca                        
       patch                                                                   

Invalid behavior of unicode.lower                                04/24/09
       http://bugs.python.org/issue5828    created  jarek                         

float('1e500') -> inf, complex('1e500') -> ValueError            04/24/09
       http://bugs.python.org/issue5829    created  marketdickinson               
       easy                                                                    

heapq item comparison problematic with sched's events            04/24/09
       http://bugs.python.org/issue5830    created  kfj                           

Doc mistake : threading.Timer is *not* a class                   04/24/09
       http://bugs.python.org/issue5831    created  maxenced                      
       easy                                                                    

Issues Now Closed (50)
______________________

Use shorter float repr when possible                              495 days
       http://bugs.python.org/issue1580    marketdickinson               
       patch                                                                   

Builtin round function is sometimes inaccurate for floats           0 days
       http://bugs.python.org/issue1869    marketdickinson               
       patch                                                                   

Idle, some Linuxes, cannot position Cursor by mouseclick          329 days
       http://bugs.python.org/issue2995    gpolo                         

Make conversions from long to float correctly rounded.            303 days
       http://bugs.python.org/issue3166    marketdickinson               
       patch                                                                   

Exception for test_urllib2_localnet                               246 days
       http://bugs.python.org/issue3584    r.david.murray                

float.fromhex discrepancy under Solaris                           240 days
       http://bugs.python.org/issue3633    marketdickinson               
       patch, needs review                                                     

patch for review: OS/2 EMX port fixes for 2.6                     221 days
       http://bugs.python.org/issue3868    aimacintyre                   
       patch, patch                                                            

logging to file + encoding                                          2 days
       http://bugs.python.org/issue5170    vsajip                        

Allow auto-numbered replacement fields in str.format() strings     68 days
       http://bugs.python.org/issue5237    eric.smith                    
       patch                                                                   

Add test.support.import_python_only                                58 days
       http://bugs.python.org/issue5354    ncoghlan                      

'n' formatting for int and float handles leading zero padding po   35 days
       http://bugs.python.org/issue5515    eric.smith                    

bad repr of itertools.count object with negative value on OS X 1   18 days
       http://bugs.python.org/issue5657    ronaldoussoren                

Fix BufferedRWPair                                                  8 days
       http://bugs.python.org/issue5734    pitrou                        
       patch                                                                   

Typo in documentation of print function parameters                  7 days
       http://bugs.python.org/issue5751    georg.brandl                  

Documentation error for Condition.notify()                          7 days
       http://bugs.python.org/issue5757    georg.brandl                  

__getitem__ error message hard to understand                        3 days
       http://bugs.python.org/issue5760    georg.brandl                  

SA bugs with unittest.py at r71263                                     2 days
       http://bugs.python.org/issue5771    benjamin.peterson             
       patch                                                                   

For float.__format__, don't add a trailing ".0" if we're using n    6 days
       http://bugs.python.org/issue5772    eric.smith                    
       easy                                                                    

marshal.c needs to be checked for out of memory errors              5 days
       http://bugs.python.org/issue5775    eric.smith                    

unable to search in python V3 documentation                         1 days
       http://bugs.python.org/issue5777    georg.brandl                  

_elementtree import can fail silently                               1 days
       http://bugs.python.org/issue5779    benjamin.peterson             

test_float fails for 'legacy' float repr style                      1 days
       http://bugs.python.org/issue5780    marketdickinson               
       patch                                                                   

Legacy float repr is used unnecessarily on some platforms           1 days
       http://bugs.python.org/issue5781    marketdickinson               
       easy                                                                    

',' formatting with empty format type '' (PEP 378)                  5 days
       http://bugs.python.org/issue5782    eric.smith                    
       easy                                                                    

Condition.wait() does not respect its timeout                       1 days
       http://bugs.python.org/issue5785    benjamin.peterson             

object.__getattribute__(super, '__bases__') crashes the interpre    0 days
       http://bugs.python.org/issue5787    benjamin.peterson             

powerset recipe listed twice in itertools docs                      3 days
       http://bugs.python.org/issue5789    rhettinger                    
       easy                                                                    

itertools.izip python code has a typo                               2 days
       http://bugs.python.org/issue5790    rhettinger                    

title information of unicodedata is wrong in some cases             0 days
       http://bugs.python.org/issue5791    loewis                        

test_distutils failure on the ppc Debian buildbot                   1 days
       http://bugs.python.org/issue5795    tarek                         

test_posix, test_pty crash under Windows                            2 days
       http://bugs.python.org/issue5796    r.david.murray                
       patch                                                                   

Distutils (or py2exe) error with DistributionMetaData               0 days
       http://bugs.python.org/issue5805    loewis                        

MySQL crash on machine startup....                                  0 days
       http://bugs.python.org/issue5806    plattecoducks                 

ConfigParser.RawConfigParser it's an "old-style" class              0 days
       http://bugs.python.org/issue5807    benjamin.peterson             

Subprocess.getstatusoutput Fails Executing 'dir' Command on Wind    0 days
       http://bugs.python.org/issue5808    benjamin.peterson             

Fraction('1e6') should be valid.                                    2 days
       http://bugs.python.org/issue5812    marketdickinson               
       patch                                                                   

Pointer into language reference from __future__ module documenta    1 days
       http://bugs.python.org/issue5813    georg.brandl                  

Simplify parsing of complex numbers and make complex('inf') vali    2 days
       http://bugs.python.org/issue5816    marketdickinson               
       patch                                                                   

Very small bug in documentation of json.load()                      0 days
       http://bugs.python.org/issue5820    georg.brandl                  
       patch                                                                   

inconsistent behavior of range when used in combination with rem    0 days
       http://bugs.python.org/issue5822    zero79                        

feature request: a conditional "for" statement                      0 days
       http://bugs.python.org/issue5823    zero79                        

No sleep or busy wait                                            2093 days
       http://bugs.python.org/issue780602  gpolo                         

ConfigParser non-string defaults broken with .getboolean()       1771 days
       http://bugs.python.org/issue974019  draghuram                     
       patch, easy                                                             

ctrl-left/-right works incorectly with diacritics                1707 days
       http://bugs.python.org/issue1012435 gpolo                         

not enough information in SGMLParseError                         1625 days
       http://bugs.python.org/issue1063229 ajaksu2                       

Frame does not receive configure event on move                   1562 days
       http://bugs.python.org/issue1100366 gpolo                         

Let shift operators take any integer value                       1436 days
       http://bugs.python.org/issue1205239 marketdickinson               

Allow thread(ing) tests to pass without setting stack size        994 days
       http://bugs.python.org/issue1533520 aimacintyre                   
       patch                                                                   

import deadlocks when using PyObjC threads                        898 days
       http://bugs.python.org/issue1590864 abaron                        

object.__init__ shouldn't allow args/kwds                         763 days
       http://bugs.python.org/issue1683368 KayEss                        

Top Issues Most Discussed (10)
______________________________

 18 Add DTrace probes                                                194 days
open    http://bugs.python.org/issue4111   

 15 IDLE cannot find windows chm file                                  7 days
pending http://bugs.python.org/issue5783   

  9 len(reversed([1,2,3])) does not work anymore in 2.6.2              6 days
open    http://bugs.python.org/issue5786   

  8 Use shorter float repr when possible                             495 days
closed  http://bugs.python.org/issue1580   

  7 Fraction('1e6') should be valid.                                   2 days
closed  http://bugs.python.org/issue5812   

  7 datetime.timedelta is inconvenient to use...                       6 days
open    http://bugs.python.org/issue5788   

  7 logging to file + encoding                                         2 days
closed  http://bugs.python.org/issue5170   

  7 Builtin round function is sometimes inaccurate for floats          0 days
closed  http://bugs.python.org/issue1869   

  6 Simplify parsing of complex numbers and make complex('inf') val    2 days
closed  http://bugs.python.org/issue5816   

  6 locale.getdefaultlocale() missing corner case                      2 days
open    http://bugs.python.org/issue5815   

From google at mrabarnett.plus.com  Fri Apr 24 18:29:29 2009
From: google at mrabarnett.plus.com (MRAB)
Date: Fri, 24 Apr 2009 17:29:29 +0100
Subject: [Python-Dev] Dates in python-dev
Message-ID: <49F1E8E9.60903@mrabarnett.plus.com>

Hi,

I've recently subscribed to this list and received my first "Summary of
Python tracker Issues". What I find annoying are the dates, for example:

     ACTIVITY SUMMARY (04/17/09 - 04/24/09)

3 x double-digits (have we learned nothing from Y2K? :-)) with the
_middle_ ones changing fastest!

I know it's the US standard, but Python is global. Could we have an
'international' style instead, say, year-month-day:

     ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)

Thank you for your attention, etc.
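
For illustration, a minimal sketch of the two styles using the standard
datetime module. The variable names are made up for this example, and the
real summary script may build its header quite differently.

```python
from datetime import date

# The reporting period from the example header above.
start, end = date(2009, 4, 17), date(2009, 4, 24)

# Current style: month/day/two-digit year.
us_style = "ACTIVITY SUMMARY ({:%m/%d/%y} - {:%m/%d/%y})".format(start, end)

# Proposed style: ISO 8601 calendar dates (year-month-day).
iso_style = "ACTIVITY SUMMARY ({} - {})".format(start.isoformat(), end.isoformat())

print(us_style)
print(iso_style)
```

A side benefit of the year-month-day form is that the date strings sort
chronologically when compared as plain text.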

From arfrever.fta at gmail.com  Fri Apr 24 18:37:15 2009
From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis)
Date: Fri, 24 Apr 2009 18:37:15 +0200
Subject: [Python-Dev] Dates in python-dev
In-Reply-To: <49F1E8E9.60903@mrabarnett.plus.com>
References: <49F1E8E9.60903@mrabarnett.plus.com>
Message-ID: <200904241837.21090.Arfrever.FTA@gmail.com>

2009-04-24 18:29:29 MRAB wrote:
> Hi,
> 
> I've recently subscribed to this list and received my first "Summary of
> Python tracker Issues". What I find annoying are the dates, for example:
> 
>      ACTIVITY SUMMARY (04/17/09 - 04/24/09)
> 
> 3 x double-digits (have we learned nothing from Y2K? :-)) with the
> _middle_ ones changing fastest!
> 
> I know it's the US standard, but Python is global. Could we have an
> 'international' style instead, say, year-month-day:
> 
>      ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)

+1.
ISO 8601 should be mandatory.

-- 
Arfrever Frehtes Taifersar Arahesis

From phd at phd.pp.ru  Fri Apr 24 19:06:57 2009
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Fri, 24 Apr 2009 21:06:57 +0400
Subject: [Python-Dev] Dates in python-dev
In-Reply-To: <49F1E8E9.60903@mrabarnett.plus.com>
References: <49F1E8E9.60903@mrabarnett.plus.com>
Message-ID: <20090424170657.GA13056@phd.pp.ru>

On Fri, Apr 24, 2009 at 05:29:29PM +0100, MRAB wrote:
> I've recently subscribed to this list and received my first "Summary of
> Python tracker Issues". What I find annoying are the dates, for example:
>
>     ACTIVITY SUMMARY (04/17/09 - 04/24/09)
>
> 3 x double-digits (have we learned nothing from Y2K? :-)) with the
> _middle_ ones changing fastest!
>
> I know it's the US standard, but Python is global. Could we have an
> 'international' style instead, say, year-month-day:
>
>     ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)

   +1000 from me!

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From stephen at xemacs.org  Fri Apr 24 19:25:03 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 25 Apr 2009 02:25:03 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
In-Reply-To: <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
Message-ID: <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>

Paul Moore writes:

 > The pros for Martin's proposal are a uniform cross-platform interface,
 > and a user-friendly API for the common case.

A more accurate phrasing would be "... a user-friendly API for those
who feel very lucky today."  Which is the common case, of course, but
spins a little differently.

 > [1] Actually, all the PEP says is "With this PEP, a uniform
 > treatment of these data as characters becomes possible." An
 > argument as to why this is a good thing would be a useful addition
 > to the PEP. At the moment it's more or less treated as self-evident
 > - which I agree with, but which clearly the Unix people here are
 > not as certain of.

Well, the problem is that both parts are false.  If you didn't start
with a valid string in a known encoding, you shouldn't treat it as
characters because it's not.  Hand it to a careful API, and you'll get
an Exception raised in your face.  And that's precisely why it's not
obviously a good thing.  Careful clients will have to treat it as
"transcoded bytes", and so the people who develop those clients get no
benefit.  OTOH, at least some of those who feel lucky and use it
naively are going to turn out to be wrong.

That said, I'm +0 on the PEP as is.  It's a little bit better than the
current situation in that developers who would otherwise just punt on
dealing with the other world (ie, Windows for Unix hackers, and Unix
for Windows coders) will have a unified interface so it'll maybe work
automagically (when you're lucky :-) in that other world, too.  And if
somebody comes up with an idea of true genius for handling the
underlying problem, or even just a slight practical improvement, then
everybody who uses this API can benefit simply by upgrading Python.

From stephen at xemacs.org  Fri Apr 24 19:39:51 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 25 Apr 2009 02:39:51 +0900
Subject: [Python-Dev] Dates in python-dev
In-Reply-To: <200904241837.21090.Arfrever.FTA@gmail.com>
References: <49F1E8E9.60903@mrabarnett.plus.com>
	<200904241837.21090.Arfrever.FTA@gmail.com>
Message-ID: <87skjykl20.fsf@uwakimon.sk.tsukuba.ac.jp>

Followups directed to Tracker-Discuss, where the people who can do
something about it are hanging out.  (They're here too, but I'm pretty
sure they'd rather discuss this issue on that list.)

Arfrever Frehtes Taifersar Arahesis writes:
 > 2009-04-24 18:29:29 MRAB wrote:
 > > Hi,
 > > 
 > > I've recently subscribed to this list and received my first "Summary of
 > > Python tracker Issues". What I find annoying are the dates, for example:
 > > 
 > >      ACTIVITY SUMMARY (04/17/09 - 04/24/09)
 > > 
 > > 3 x double-digits (have we learned nothing from Y2K? :-)) with the
 > > _middle_ ones changing fastest!
 > > 
 > > I know it's the US standard, but Python is global. Could we have an
 > > 'international' style instead, say, year-month-day:
 > > 
 > >      ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)
 > 
 > +1.
 > ISO 8601 should be mandatory.

From solipsis at pitrou.net  Fri Apr 24 19:39:15 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 24 Apr 2009 17:39:15 +0000 (UTC)
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <loom.20090424T173704-587@post.gmane.org>

Stephen J. Turnbull <stephen <at> xemacs.org> writes:
> 
> Well, the problem is that both parts are false.  If you didn't start
> with a valid string in a known encoding, you shouldn't treat it as
> characters because it's not.  Hand it to a careful API, and you'll get
> an Exception raised in your face.

Which "careful API" are you talking about?

> OTOH, at least some of those who feel lucky and use it
> naively are going to turn out to be wrong.

Why will they turn out to be wrong?

From tlesher at gmail.com  Fri Apr 24 19:44:13 2009
From: tlesher at gmail.com (Tim Lesher)
Date: Fri, 24 Apr 2009 13:44:13 -0400
Subject: [Python-Dev] PyEval_Call* convenience functions
Message-ID: <9613db600904241044i7b7a9e46x1110d809a72235e1@mail.gmail.com>

Is there a reason that the PyEval_CallFunction() and
PyEval_CallMethod() convenience functions remain undocumented? (i.e.,
would a doc-and-test patch to correct this be rejected?)

I didn't see any mention of this coming up in python-dev before.

Also, despite its name, PyEval_CallMethod() is quite useful for
calling module-level functions or classes (given that it's just a
PyObject_GetAttrString plus the implementation of
PyEval_CallFunction).  Is there any reason (beyond its undocumented
status) to believe this use case would ever be deprecated?

Thanks.

-- 
Tim Lesher <tlesher at gmail.com>

From ajaksu at gmail.com  Fri Apr 24 19:50:47 2009
From: ajaksu at gmail.com (Daniel Diniz)
Date: Fri, 24 Apr 2009 14:50:47 -0300
Subject: [Python-Dev] [Tracker-discuss]  Dates in python-dev
In-Reply-To: <87skjykl20.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <49F1E8E9.60903@mrabarnett.plus.com>
	<200904241837.21090.Arfrever.FTA@gmail.com> 
	<87skjykl20.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <2d75d7660904241050l27665a8ege0aa52f6822375bd@mail.gmail.com>

http://psf.upfronthosting.co.za/roundup/meta/issue274

From python at rcn.com  Fri Apr 24 19:52:50 2009
From: python at rcn.com (Raymond Hettinger)
Date: Fri, 24 Apr 2009 10:52:50 -0700
Subject: [Python-Dev] Tuples and underorderable types
Message-ID: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1>

Does anyone have any ideas about what to do with issue 5830 and handling the problem in a general way (not just for sched)?

The basic problem is that decorate/compare/undecorate patterns no longer work when the primary sort keys are equal and the secondary 
keys are unorderable (which is now the case for many callables).

    >>> tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]
    >>> tasks.sort()
    Traceback (most recent call last):
    ...
    TypeError: unorderable types: function() < function()

Would it make sense to provide a default ordering whenever the types are the same?

    def object.__lt__(self, other):
            if type(self) == type(other):
                 return id(self) < id(other)
            raise TypeError
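
A runnable sketch of what that rule would do. Since object itself cannot be
patched from Python code, this uses a hypothetical mixin (IdOrdered) and a
hypothetical Task class as stand-ins. It is an illustration of the idea, not
a proposed patch.

```python
class IdOrdered:
    """Hypothetical stand-in for the proposed default object.__lt__."""

    def __lt__(self, other):
        # Same-type objects get an arbitrary but consistent order by id().
        if type(self) == type(other):
            return id(self) < id(other)
        raise TypeError("unorderable types")


class Task(IdOrdered):
    def __init__(self, label):
        self.label = label


tasks = [(10, Task("a")), (20, Task("b")), (10, Task("c"))]
tasks.sort()  # no TypeError: ties on priority 10 fall back to id() order
priorities = [priority for priority, _ in tasks]
```

Note that the relative order of the two priority-10 entries then depends on
memory addresses rather than on insertion order.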

Raymond

From solipsis at pitrou.net  Fri Apr 24 20:02:43 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 24 Apr 2009 18:02:43 +0000 (UTC)
Subject: [Python-Dev] Tuples and underorderable types
References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1>
Message-ID: <loom.20090424T175642-920@post.gmane.org>

Raymond Hettinger <python <at> rcn.com> writes:
> 
> Would it make sense to provide a default ordering whenever the types are
> the same?

This doesn't work when they are not the same :-)

Instead, you could make the decorating a bit more sophisticated:

  decorated = [(key, id(value), value) for key, value in blah(values)]

or even:

  decorated = [(key, n, value) for n, (key, value) in enumerate(blah(values))]
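
Spelled out as a complete decorate/sort/undecorate round, this might look
like the sketch below. The helper name sort_by_key and its keyfunc parameter
are made up for the example. The enumerate index is unique, so the
comparison never reaches the unorderable value.

```python
def sort_by_key(values, keyfunc):
    # Decorate: (key, position, value). The position breaks ties, so two
    # values are never compared with each other.
    decorated = [(keyfunc(v), n, v) for n, v in enumerate(values)]
    decorated.sort()
    # Undecorate: keep only the original values.
    return [v for _, _, v in decorated]


tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]
ordered = sort_by_key(tasks, keyfunc=lambda t: t[0])
results = [func() for _, func in ordered]
```

Using the position rather than id(value) also keeps the sort stable: equal
keys come out in their original order.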

From python at rcn.com  Fri Apr 24 20:19:39 2009
From: python at rcn.com (Raymond Hettinger)
Date: Fri, 24 Apr 2009 11:19:39 -0700
Subject: [Python-Dev] Tuples and underorderable types
References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1>
	<loom.20090424T175642-920@post.gmane.org>
Message-ID: <3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1>

>> Would it make sense to provide a default ordering whenever the types are
>> the same?
> 
> This doesn't work when they are not the same :-)

 _ ~
 @ @
 \_/

> Instead, you could make the decorating a bit more sophisticated:
> 
>  decorated = [(key, id(value), value) for key, value in blah(values)]
> 
> or even:
> 
>  decorated = [(key, n, value) for n, (key, value) in enumerate(blah(values))]

I already do something along those lines in heapq.nsmallest() 
and nlargest() to preserve sort stability. 
The real issue isn't how to fix one particular module.
The problem is that a basic python pattern is now broken
in a way that may not readily surface during testing.

I'm wondering if there is something we can do to mitigate
the issue in a general way.  It bites that the venerable technique
of tuple sorting has lost some of its mojo.  This may be
an unintended consequence of eliminating default comparisons.

Raymond

From scott+python-dev at scottdial.com  Fri Apr 24 20:25:04 2009
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Fri, 24 Apr 2009 14:25:04 -0400
Subject: [Python-Dev] Tuples and underorderable types
In-Reply-To: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1>
References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1>
Message-ID: <49F20400.3030400@scottdial.com>

Raymond Hettinger wrote:
> Would it make sense to provide a default ordering whenever the types are
> the same?
> 
>    def object.__lt__(self, other):
>            if type(self) == type(other):
>                 return id(self) < id(other)
>            raise TypeError

No. This only makes it more difficult for someone wanting to behave
smartly with incomparable types. I can easily imagine someone wanting
incomparable objects to be treated as equal wrt. sorting. I am thinking
especially with respect to keeping the sort stable. I think many
developers would be surprised to find,

 >>> tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]
 >>> tasks.sort()
 >>> assert tasks[0][1]() == 0

, is not guaranteed.

Moreover, I fail to see this as a bug in general if you accept that
not all objects can be totally ordered. We shouldn't be patching the
object base class because of legacy code that relied on sorting
tuples; this code should be updated to use a key function.

-Scott

-- 
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu

From stephen at xemacs.org  Fri Apr 24 20:40:12 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 25 Apr 2009 03:40:12 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
In-Reply-To: <loom.20090424T173704-587@post.gmane.org>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090424T173704-587@post.gmane.org>
Message-ID: <87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp>

Antoine Pitrou writes:
 > Stephen J. Turnbull <stephen <at> xemacs.org> writes:
 > > 
 > > Well, the problem is that both parts are false.  If you didn't start
 > > with a valid string in a known encoding, you shouldn't treat it as
 > > characters because it's not.  Hand it to a careful API, and you'll get
 > > an Exception raised in your face.
 > 
 > Which "careful API" are you talking about?
 >
 > > OTOH, at least some of those who feel lucky and use it
 > > naively are going to turn out to be wrong.
 > 
 > Why will they turn out to be wrong?

To quote the PEP:

"""
While providing a uniform API to non-decodable bytes, this interface
has the limitation that chosen representation only "works" if the data
get converted back to bytes with the python-escape error handler
also. Encoding the data with the locale's encoding and the (default)
strict error handler will raise an exception, encoding them with UTF-8
will produce non-sensical data.

For most applications, we assume that they eventually pass data
received from a system interface back into the same system
interfaces.
"""

But you can't know that.  These are now "just strings", which could
end up in pickles and other persistent objects, be passed across
network interfaces (remote copy, for example), etc, etc, and there is
no way to guarantee that the recipient will understand the rules,
unless the application encapsulates them in some kind of
representation that says "I look like a Unicode but I'm really just
encoded bytes."  But the whole point is to turn them into plain old
strings so people *don't have to bother* keeping track.

As I already said, this is no worse than the current situation, but it
gives the impression that Python has a standard "solution".  (Yes, I
know Martin doesn't claim it's a solution to any of those problems.
The point is user perception.)

I have to wonder whether having a standard way of not solving any
problems is better than having no standard way of not solving any
problems.  It may be, and it probably can't hurt, which is why I'm +0.
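
To make the limitation concrete, here is a small sketch. It assumes the
error handler under the name it carries in released Python 3 versions,
"surrogateescape", rather than the draft name "python-escape" used in the
PEP text quoted above. The file name is invented for the example.

```python
raw = b"caf\xe9.txt"  # Latin-1 bytes; not valid UTF-8

# Decoding smuggles the undecodable byte into the string as a lone surrogate.
name = raw.decode("utf-8", "surrogateescape")

# The round trip only works if the same handler is used on the way back.
round_trip = name.encode("utf-8", "surrogateescape")

# A recipient that does not know the rules uses the strict handler and fails.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The string looks like ordinary text to anything that receives it, which is
exactly why the recipient has no way of knowing it needs special treatment.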

From martin at v.loewis.de  Fri Apr 24 20:31:54 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 24 Apr 2009 20:31:54 +0200
Subject: [Python-Dev] Tuples and underorderable types
In-Reply-To: <3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1>
References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1>	<loom.20090424T175642-920@post.gmane.org>
	<3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1>
Message-ID: <49F2059A.9090708@v.loewis.de>

> I'm wondering if there is something we can do to mitigate
> the issue in a general way.  It bites that the venerable technique
> of tuple sorting has lost some of its mojo.  This may be
> an unintended consequence of eliminating default comparisons.

I would discourage use of the decorate/sort/undecorate pattern,
and encourage use of the key= argument. Or, if you really need
to decorate into a tuple, still pass a key= argument.
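
A minimal sketch of that advice, applied to the example from earlier in the
thread. With key=, only the priorities are compared, so the callables never
need to be orderable.

```python
from operator import itemgetter

tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]

# Compare on the first tuple element only; list.sort() is stable, so the
# two priority-10 entries keep their original relative order.
tasks.sort(key=itemgetter(0))

results = [func() for _, func in tasks]
```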

Regards,
Martin

From ajaksu at gmail.com  Fri Apr 24 20:53:41 2009
From: ajaksu at gmail.com (Daniel Diniz)
Date: Fri, 24 Apr 2009 15:53:41 -0300
Subject: [Python-Dev] Tuples and underorderable types
In-Reply-To: <3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1>
References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1> 
	<loom.20090424T175642-920@post.gmane.org>
	<3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1>
Message-ID: <2d75d7660904241153ub47507ai1ec23545129985ed@mail.gmail.com>

Raymond Hettinger wrote:
> The problem is that a basic python pattern is now broken
> in a way that may not readily surface during testing.
>
> I'm wondering if there is something we can do to mitigate
> the issue in a general way.  It bites that the venerable technique
> of tuple sorting has lost some of its mojo.  This may be
> an unintended consequence of eliminating default comparisons.

There could be a high performance, non-lame version of the mapping
pattern below available in the stdlib (or at least in the docs):

keymap = {type(lambda: 1) : id}

def decorate_helper(tup):
    return tuple(keymap[type(i)](i) if type(i) in keymap else i for i in tup)

tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]
tasks.sort(key=decorate_helper)

This works when comparing different types too, but then some care must
be taken to avoid bad surprises:

keymap[type(1j)] = abs
imaginary_tasks = [(10j, lambda: 0), (20, lambda: 1), (10+1j, lambda: 2)]
imaginary_tasks.sort(key=decorate_helper) # not so bad if intended

mixed_tasks = [(lambda: 0,), (0.0,), (2**32,)]
mixed_tasks.sort(key=decorate_helper) # oops, not the same order as in 2.x

Regards,
Daniel

From g.brandl at gmx.net  Fri Apr 24 20:59:01 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 24 Apr 2009 18:59:01 +0000
Subject: [Python-Dev] PyEval_Call* convenience functions
In-Reply-To: <9613db600904241044i7b7a9e46x1110d809a72235e1@mail.gmail.com>
References: <9613db600904241044i7b7a9e46x1110d809a72235e1@mail.gmail.com>
Message-ID: <gst26e$t4t$1@ger.gmane.org>

Tim Lesher schrieb:
> Is there a reason that the PyEval_CallFunction() and
> PyEval_CallMethod() convenience functions remain undocumented? (i.e.,
> would a doc-and-test patch to correct this be rejected?)
> 
> I didn't see any mention of this coming up in python-dev before.
> 
> Also, despite its name, PyEval_CallMethod() is quite useful for
> calling module-level functions or classes (given that it's just a
> PyObject_GetAttrString plus the implementation of
> PyEval_CallFunction).  Is there any reason (beyond its undocumented
> status) to believe this use case would ever be deprecated?

FWIW, there's also PyObject_CallMethod(); all PyObject_Call* variants are
documented, but none of the PyEval_Call* functions are.  I actually don't
know why we have two sets of these, with partially conflicting definitions;
perhaps someone else can shed some light?

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From aahz at pythoncraft.com  Fri Apr 24 21:10:33 2009
From: aahz at pythoncraft.com (Aahz)
Date: Fri, 24 Apr 2009 12:10:33 -0700
Subject: [Python-Dev] Tuples and underorderable types
In-Reply-To: <3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1>
References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1>
	<loom.20090424T175642-920@post.gmane.org>
	<3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1>
Message-ID: <20090424191033.GA14924@panix.com>

On Fri, Apr 24, 2009, Raymond Hettinger wrote:
>
> I'm wondering if there is something we can do to mitigate the issue in
> a general way.  It bites that the venerable technique of tuple sorting
> has lost some of its mojo.  This may be an unintended consequence of
> eliminating default comparisons.

My understanding was that this was entirely an *intended* consequence of
eliminating default comparisons.  Not so much in the sense that it was
desired by itself, but that the whole discussion of whether to keep
moving forward in stripping out default comparisons explicitly revolved
around whether this kind of difficulty warranted the overall
simplification we now have (I don't remember off-hand whether this
specific case was discussed, though).

I think that anyone who wants to suggest reverting to some kind of
default comparison behavior needs to write up a PEP and clearly summarize
all previous discussion prior to 3.0 release, then go through the usual
grind of starting with python-ideas before coming back to python-dev.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

From l.mastrodomenico at gmail.com  Fri Apr 24 21:41:21 2009
From: l.mastrodomenico at gmail.com (Lino Mastrodomenico)
Date: Fri, 24 Apr 2009 21:41:21 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49EEBE2E.3090601@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
Message-ID: <cc93256f0904241241o602096dj76a1df607dfaafde@mail.gmail.com>

2009/4/22 "Martin v. Löwis" <martin at v.loewis.de>:
> To convert non-decodable bytes, a new error handler "python-escape" is
> introduced, which decodes non-decodable bytes into a private-use
> character U+F01xx, which is believed to not conflict with private-use
> characters that currently exist in Python codecs.

Why not use U+DCxx for non-UTF-8 encodings too?

Overall I like the PEP: I think it's the best proposal so far that
doesn't put a heavy burden on applications that only want to do
simple things with the API.

-- 
Lino Mastrodomenico

From v+python at g.nevcal.com  Fri Apr 24 21:41:25 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Fri, 24 Apr 2009 12:41:25 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
In-Reply-To: <87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>	<49F18E90.9070801@nevcal.com>	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>	<20090424152746.GA9543@panix.com>	<loom.20090424T153112-15@post.gmane.org>	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	<loom.20090424T173704-587@post.gmane.org>
	<87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <49F215E5.4050205@g.nevcal.com>

On approximately 4/24/2009 11:40 AM, came the following characters from 
the keyboard of Stephen J. Turnbull:
> Antoine Pitrou writes:
>  > Stephen J. Turnbull <stephen <at> xemacs.org> writes:
>  > > 
>  > > Well, the problem is that both parts are false.  If you didn't start
>  > > with a valid string in a known encoding, you shouldn't treat it as
>  > > characters because it's not.  Hand it to a careful API, and you'll get
>  > > an Exception raised in your face.
>  > 
>  > Which "careful API" are you talking about?
>  >
>  > > OTOH, at least some of those who feel lucky and use it
>  > > naively are going to turn out to be wrong.
>  > 
>  > Why will they turn out to be wrong?

Because the encoding is not reliably reversible.  That is why I proposed 
one that is.

> To quote the PEP:
> 
> """
> While providing a uniform API to non-decodable bytes, this interface
> has the limitation that chosen representation only "works" if the data
> get converted back to bytes with the python-escape error handler
> also. Encoding the data with the locale's encoding and the (default)
> strict error handler will raise an exception, encoding them with UTF-8
> will produce non-sensical data.
> 
> For most applications, we assume that they eventually pass data
> received from a system interface back into the same system
> interfaces.
> """

And so my encoding (1) doesn't alter the data stream for any valid 
Windows file name, which is where the naivest of users reside; (2) doesn't 
alter the data stream for any Posix file name that was encoded as UTF-8 
sequences and doesn't contain ? characters in the file name [I perceive 
the use of ? in file names to be rare on Posix, because of experience, 
and because of the other problems caused by such use]; and (3) doesn't 
introduce data puns within applications that are correctly coded to know 
that the encoding occurs.  The encoding technique in the PEP not only can 
produce data puns, and thus is not reversible, it also provides no reliable 
mechanism to know that this has occurred.

> But you can't know that.  These are now "just strings", which could
> end up in pickles and other persistent objects, be passed across
> network interfaces (remote copy, for example), etc, etc, and there is
> no way to guarantee that the recipient will understand the rules,
> unless the application encapsulates them in some kind of
> representation that says "I look like a Unicode but I'm really just
> encoded bytes."  

This could happen.  Well-formed programs need to use the encoding at the 
boundaries.  Python could encapsulate its interfaces to the file system, 
but cannot encapsulate other interfaces.  Fortunately, something that is 
pickled, would probably be unpickled by Python, and therefore all would 
be well.  But any interface that expects a file name, and is not 
encapsulated by Python, must be encapsulated by the application.

> But the whole point is to turn them into plain old
> strings so people *don't have to bother* keeping track.

And if that is the point, it isn't worth doing.  If the point is that it 
can minimize the amount of existing, file name manipulation code that 
uses string manipulations, that must be reworked to be functional during 
a 2to3 migration, then it can be worth doing.  But I think it should be 
done with an encoding that doesn't introduce undetectable data puns, 
whether mine or some different encoding with that characteristic, but 
not the one presently in the PEP, because it does introduce undetectable 
data puns.

> As I already said, this is no worse than the current situation, but it
> gives the impression that Python has a standard "solution".  (Yes, I
> know Martin doesn't claim it's a solution to any of those problems.
> The point is user perception.)
> 
> I have to wonder whether having a standard way of not solving any
> problems is better than having no standard way of not solving any
> problems.  It may be, and it probably can't hurt, which is why I'm +0.

Interesting phraseology there, Stephen!

I'm +1 on the concept, -1 on the PEP, due solely to the lack of a 
reversible encoding.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From v+python at g.nevcal.com  Fri Apr 24 21:44:11 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Fri, 24 Apr 2009 12:44:11 -0700
Subject: [Python-Dev] Dates in python-dev
In-Reply-To: <20090424170657.GA13056@phd.pp.ru>
References: <49F1E8E9.60903@mrabarnett.plus.com>
	<20090424170657.GA13056@phd.pp.ru>
Message-ID: <49F2168B.2050306@g.nevcal.com>

On approximately 4/24/2009 10:06 AM, came the following characters from 
the keyboard of Oleg Broytmann:
> On Fri, Apr 24, 2009 at 05:29:29PM +0100, MRAB wrote:
>> I've recently subscribed to this list and received my first "Summary of
>> Python tracker Issues". What I find annoying are the dates, for example:
>>
>>     ACTIVITY SUMMARY (04/17/09 - 04/24/09)
>>
>> 3 x double-digits (have we learned nothing from Y2K? :-)) with the
>> _middle_ ones changing fastest!
>>
>> I know it's the US standard, but Python is global. Could we have an
>> 'international' style instead, say, year-month-day:
>>
>>     ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)
> 
>    +1000 from me!
> 
> Oleg.

You missed a prime opportunity, Oleg...

+2000 from me!

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From martin at v.loewis.de  Fri Apr 24 22:25:25 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 24 Apr 2009 22:25:25 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <cc93256f0904241241o602096dj76a1df607dfaafde@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<cc93256f0904241241o602096dj76a1df607dfaafde@mail.gmail.com>
Message-ID: <49F22035.9030405@v.loewis.de>

> Why not use U+DCxx for non-UTF-8 encodings too?

I thought of that, and was tricked into believing that only U+DC8x
is a half surrogate. Now I see that you are right, and have fixed
the PEP accordingly.
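
For illustration, a minimal sketch of the round trip this scheme is meant
to give. It assumes a later CPython (3.1 or newer), where the error
handler was released under the name "surrogateescape"; the drafts
discussed in this thread call it "python-escape". The byte string is
made up for the example.

```python
# Hypothetical file name: ASCII text plus one byte (0xE9) that is not
# valid UTF-8 in this position.
raw = b"caf\xe9.txt"

# Undecodable bytes 0x80..0xFF are mapped to lone surrogates U+DC80..U+DCFF.
name = raw.decode("utf-8", errors="surrogateescape")
escaped = [hex(ord(ch)) for ch in name if 0xDC80 <= ord(ch) <= 0xDCFF]

# Encoding with the same handler restores the original bytes.
restored = name.encode("utf-8", errors="surrogateescape")

# The same handler also works with a non-UTF-8 codec such as ASCII.
ascii_name = raw.decode("ascii", errors="surrogateescape")
```

Here the one offending byte becomes a single escaped code point, and the
rest of the name is decoded normally.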

Regards,
Martin

From tjreedy at udel.edu  Fri Apr 24 22:25:42 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 24 Apr 2009 16:25:42 -0400
Subject: [Python-Dev] Summary of Python tracker Issues
In-Reply-To: <20090424160729.B33E5780C9@psf.upfronthosting.co.za>
References: <20090424160729.B33E5780C9@psf.upfronthosting.co.za>
Message-ID: <gst785$d7b$1@ger.gmane.org>

Python tracker wrote:
[snip]

In going through this, I notice a lot of effort by Mark Dickinson and 
others to get some details of numbers computation and display right in 
time for 3.1.  As a certain-to-be beneficiary, I want to thank all who 
contributed.

Terry Jan Reedy

From dickinsm at gmail.com  Fri Apr 24 22:37:55 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Fri, 24 Apr 2009 21:37:55 +0100
Subject: [Python-Dev] Summary of Python tracker Issues
In-Reply-To: <gst785$d7b$1@ger.gmane.org>
References: <20090424160729.B33E5780C9@psf.upfronthosting.co.za>
	<gst785$d7b$1@ger.gmane.org>
Message-ID: <5c6f2a5d0904241337g275450f5uee119a15b831b659@mail.gmail.com>

On Fri, Apr 24, 2009 at 9:25 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> In going through this, I notice a lot of effort by Mark Dickinson and others

Many others, but Eric Smith's name needs to be in big lights here.
There's no way the short float repr would have been ready for 3.1 if
Eric hadn't shown an interest in this at PyCon, and then taken on
the major internal replumbing job this entailed for all of Python's
string formatting.

> 3.1.  As a certain-to-be beneficiary, I want to thank all who contributed.

Glad you like it!

Mark

From eric at trueblade.com  Fri Apr 24 23:08:51 2009
From: eric at trueblade.com (Eric Smith)
Date: Fri, 24 Apr 2009 17:08:51 -0400
Subject: [Python-Dev] Summary of Python tracker Issues
In-Reply-To: <5c6f2a5d0904241337g275450f5uee119a15b831b659@mail.gmail.com>
References: <20090424160729.B33E5780C9@psf.upfronthosting.co.za>	<gst785$d7b$1@ger.gmane.org>
	<5c6f2a5d0904241337g275450f5uee119a15b831b659@mail.gmail.com>
Message-ID: <49F22A63.60808@trueblade.com>

Mark Dickinson wrote:
> On Fri, Apr 24, 2009 at 9:25 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>> In going through this, I notice a lot of effort by Mark Dickinson and others
> 
> Many others, but Eric Smith's name needs to be in big lights here.
> There's no way the short float repr would have been ready for 3.1 if
> Eric hadn't shown an interest in this at PyCon, and then taken on
> the major internal replumbing job this entailed for all of Python's
> string formatting.

Not to get too much into a mutual admiration mode, but Mark did the 
parts involving hard thinking.

>> 3.1.  As a certain-to-be beneficiary, I want to thank all who contributed.
> 
> Glad you like it!

Me, too. I think it's going to be great once we get it all straightened 
out. And I think we're close!

Eric.

From eric at trueblade.com  Fri Apr 24 23:15:13 2009
From: eric at trueblade.com (Eric Smith)
Date: Fri, 24 Apr 2009 17:15:13 -0400
Subject: [Python-Dev] Deprecating PyOS_ascii_formatd
In-Reply-To: <49DD2E41.80401@trueblade.com>
References: <49DD2E41.80401@trueblade.com>
Message-ID: <49F22BE1.7020603@trueblade.com>

Eric Smith wrote:
> Assuming that Mark's and my changes in the py3k-short-float-repr branch 
> get checked in shortly, I'd like to deprecate PyOS_ascii_formatd. Its 
> functionality is largely being replaced by PyOS_double_to_string, which 
> we're introducing on our branch.

We've checked the changes in, and everything looks good as far as I can 
tell.

> My proposal is to deprecate PyOS_ascii_formatd in 3.1 and remove it in 3.2.

Having heard no dissent, I'd like to go ahead and deprecate this API. 
What are the mechanics of deprecating this? Just documentation, or is 
there something I should do in the code to generate a warning? Any 
pointers to examples would be great.

> The 2.7 situation is trickier, because we're not planning on backporting 
> the short-float-repr work back to 2.7. In 2.7 I guess we'll leave 
> PyOS_ascii_formatd around, unfortunately.

I backported the new API to 2.7, so I'll also deprecate 
PyOS_ascii_formatd there.

Eric.

From benjamin at python.org  Fri Apr 24 23:17:19 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Fri, 24 Apr 2009 16:17:19 -0500
Subject: [Python-Dev] Deprecating PyOS_ascii_formatd
In-Reply-To: <49F22BE1.7020603@trueblade.com>
References: <49DD2E41.80401@trueblade.com> <49F22BE1.7020603@trueblade.com>
Message-ID: <1afaf6160904241417i64fc6640x680f7a54789b322c@mail.gmail.com>

2009/4/24 Eric Smith <eric at trueblade.com>:
>> My proposal is to deprecate PyOS_ascii_formatd in 3.1 and remove it in
>> 3.2.
>
> Having heard no dissent, I'd like to go ahead and deprecate this API. What
> are the mechanics of deprecating this? Just documentation, or is there
> something I should do in the code to generate a warning? Any pointers to
> examples would be great.

You can use PyErr_WarnEx().
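
PyErr_WarnEx() is the C-level counterpart of warnings.warn(). As a rough
sketch of the same deprecation pattern at the Python level (the names
old_format and new_format are made up for illustration and are not part
of any real API):

```python
import warnings

def new_format(value):
    """Hypothetical replacement function."""
    return repr(value)

def old_format(value):
    """Hypothetical deprecated function that forwards to its replacement."""
    warnings.warn(
        "old_format() is deprecated; use new_format() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to old_format
    )
    return new_format(value)

# DeprecationWarning is often hidden by default, so record it explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_format(1.5)
```

The old function keeps working during the deprecation period; callers
just see the warning if they have warnings enabled.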

-- 
Regards,
Benjamin

From a.badger at gmail.com  Fri Apr 24 23:26:12 2009
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 24 Apr 2009 14:26:12 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in	System		Character
 Interfaces
In-Reply-To: <49F215E5.4050205@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>	<49F18E90.9070801@nevcal.com>	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>	<20090424152746.GA9543@panix.com>	<loom.20090424T153112-15@post.gmane.org>	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	<loom.20090424T173704-587@post.gmane.org>	<87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp>
	<49F215E5.4050205@g.nevcal.com>
Message-ID: <49F22E74.4070108@gmail.com>

Glenn Linderman wrote:
> On approximately 4/24/2009 11:40 AM, came the following characters from
> And so my encoding (1) doesn't alter the data stream for any valid
> Windows file name, and where the naivest of users reside (2) doesn't
> alter the data stream for any Posix file name that was encoded as UTF-8
> sequences and doesn't contain ? characters in the file name [I perceive
> the use of ? in file names to be rare on Posix, because of experience,
> and because of the other problems caused by such use] (3) doesn't
> introduce data puns within applications that are correctly coded to know
> the encoding occurs.  The encoding technique in the PEP not only can
> produce data puns, thus not being reversible, it provides no reliable
> mechanism to know that this has occurred.
> 
Uhm....  Not arguing with your goals but '?' is unfortunately reasonably
easy to get into a filename.  For instance, I've had to download a lot
of scratch built packages from our buildsystem recently.  Scratch builds
have URLs with query strings in them so::

wget
'http://koji.fedoraproject.org/koji/getfile?taskID=1318059&name=monodevelop-debugger-gdb-2.0-1.1.i586.rpm'

Which results in the filename:
  getfile?taskID=1318059&name=monodevelop-debugger-gdb-2.0-1.1.i586.rpm

-Toshio

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: OpenPGP digital signature
URL: <http://mail.python.org/pipermail/python-dev/attachments/20090424/232171fd/attachment-0001.pgp>

From tjreedy at udel.edu  Fri Apr 24 23:36:54 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 24 Apr 2009 17:36:54 -0400
Subject: [Python-Dev] Tuples and underorderable types
In-Reply-To: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1>
References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1>
Message-ID: <gstbdm$pb1$1@ger.gmane.org>

Raymond Hettinger wrote:
> Does anyone have any ideas about what to do with issue 5830 and handling 
> the problem in a general way (not just for sched)?
> 
> The basic problem is that decorate/compare/undecorate patterns no longer 
> work when the primary sort keys are equal and the secondary keys are 
> unorderable (which is now the case for many callables).
> 
>    >>> tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]
>    >>> tasks.sort()
>    Traceback (most recent call last):
>    ...
>    TypeError: unorderable types: function() < function()
> 
> Would it make sense to provide a default ordering whenever the types are 
> the same?
> 
>    def object.__lt__(self, other):
>            if type(self) == type(other):
>                 return id(self) < id(other)
>            raise TypeError

The immediate problem with this is that 'same type', or not, is 
sometimes a somewhat arbitrary implementation detail.  In 2.x, 
4000000000 could be int or long, depending on the build.  In 3.0, that 
difference disappeared.  User-defined and builtin functions are 
different classes for implementation, not conceptual reasons.  (This 
could potentially bite what I understand to be your r71844/5 fix.) 
Unbound methods used to be the same class as bound methods (as I 
remember).  In 3.0, the wrapping disappeared and they are the same thing 
as the underlying function.  In 2.x, ascii text and binary data might 
both be str.  Now they might be str and bytes.

Universal ordering and default ordering by id was broken (and doomed) 
when Guido decided that complex numbers should not be comparable either 
lexicographically or by id.  Your proposed object.__lt__ would reverse 
that decision, unless, of course, complex was special-cased (again) to 
over-ride it, but then we would be back to the 2.x situation of mixed 
rules and exceptions.

Terry Jan Reedy

From p.f.moore at gmail.com  Sat Apr 25 00:05:04 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 24 Apr 2009 23:05:04 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>

2009/4/24 Stephen J. Turnbull <stephen at xemacs.org>:
> Paul Moore writes:
>
>  > The pros for Martin's proposal are a uniform cross-platform interface,
>  > and a user-friendly API for the common case.
>
> A more accurate phrasing would be "... a user-friendly API for those
> who feel very lucky today."  Which is the common case, of course, but
> spins a little differently.

Sorry, but I think you're misrepresenting things. I'd have probably
let you off if you'd missed out the "very" - but I do think that it's
the common case. Consider:

- Windows systems where broken Unicode (lone surrogates or whatever)
isn't involved
- Unix systems where the user's stated filesystem encoding is correct

Can you honestly say that this isn't the vast majority of real-world
environments? (IIRC, you are based in Japan, so it may well be true
that the likelihood of problems is a lot higher where you are than
where I am - the UK - but I suspect that averaging out, things are
generally as above).

>  > [1] Actually, all the PEP says is "With this PEP, a uniform
>  > treatment of these data as characters becomes possible." An
>  > argument as to why this is a good thing would be a useful addition
>  > to the PEP. At the moment it's more or less treated as self-evident
>  > - which I agree with, but which clearly the Unix people here are
>  > not as certain of.
>
> Well, the problem is that both parts are false.

I can't work out which "parts" you are referring to here.

> If you didn't start
> with a valid string in a known encoding, you shouldn't treat it as
> characters because it's not.

Again, that's the purist argument. If you have a string (of bytes, I
guess) and a 99% certain guess as to the correct encoding, then I'd
argue that, as long as (a) it's not mission-critical (lives or backups
depend on it) and (b) you have a means of failing relatively
gracefully, you have every reason to make the assumption about
encoding.

After all, what's the alternative? Ultimately, you have a byte string
and no encoding. You make some assumption, or you can do hardly
anything. What use is "Processing file \x66\x6f\x6f" as a progress
indicator for a program that scans a directory? (That was "foo" for
people who can't read latin-1 written in hex :-))

> Hand it to a careful API, and you'll get
> an Exception raised in your face.  And that's precisely why it's not
> obviously a good thing.  Careful clients will have to treat it as
> "transcoded bytes", and so the people who develop those clients get no
> benefit.  OTOH, at least some of those who feel lucky and use it
> naively are going to turn out to be wrong.

But 99% of the time, "it" is a perfectly acceptable string.
(Percentage invented out of thin air, admitted :-)) Remember, only
when the system encounters an undecodable byte sequence, would a
technically invalid string be generated - and as far as I can tell,
the main case when that would happen is on Unix, if the user specifies
UTF-8 as the encoding, and the actual filesystem uses something else,
*and* there's a file with a name whose byte sequence is invalid UTF-8.
I'm *really* struggling to see that as a common scenario.

Admittedly, there are other, possibly more common, cases where the
string translation is valid, but semantically not what the user
expects - user says CP1252, but filesystem is CP850, say. As a UK
Windows user, I'm used to seeing CP850 vs CP1252 confusions like this
- "?" replaced with ? is the common case. It happens occasionally, and
occasionally causes code to behave unexpectedly. But it doesn't
reformat my hard drive and the alternative (having to be extra-careful
to tell every program precisely which encoding I'm using in every
situation) would make programs effectively unusable.

> That said, I'm +0 on the PEP as is.

So I'm largely preaching to the converted here. After all, lukewarm
acceptance from someone with experience of Asian encoding issues is
pretty much the equivalent of resounding support from someone who only
ever works in English! :-)

Paul.

From python at rcn.com  Sat Apr 25 00:23:52 2009
From: python at rcn.com (Raymond Hettinger)
Date: Fri, 24 Apr 2009 15:23:52 -0700
Subject: [Python-Dev] Tuples and underorderable types
References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1>	<loom.20090424T175642-920@post.gmane.org>
	<3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1>
	<49F2059A.9090708@v.loewis.de>
Message-ID: <2DDEC3B33EDF49CABD54E3C1D1CAAF0B@RaymondLaptop1>

> I would discourage use of the decorate/sort/undecorate pattern,
> and encourage use of the key= argument. Or, if you really need
> to decorate into a tuple, still pass a key= argument.

The bug report was actually about the sched module which used
heapq to prioritize tuples consisting of times, priorities, and actions.
I fixed and closed the original bug a few hours ago but had a
thought that the pattern itself may be ubiquitous (especially with heapq).
ISTM that other bugs like this are lurking about.  But all of you 
guys seem to think the status quo is fine, so that's the end of it.
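
For reference, a minimal sketch of the failure and of the key= approach
quoted above. The task list is the one from the original report; sorting
on the numeric field alone is an assumption about the ordering wanted.

```python
tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)]

# Plain tuple comparison falls through to the functions when the first
# fields tie, which raises TypeError on Python 3.
try:
    sorted(tasks)
    plain_sort_failed = False
except TypeError:
    plain_sort_failed = True

# Sorting on the first field alone never compares the callables; the sort
# is stable, so the two priority-10 tasks keep their original order.
ordered = sorted(tasks, key=lambda task: task[0])
results = [action() for _, action in ordered]
```

This covers list sorting only; heapq has no key= argument, so the tuples
pushed onto a heap still need some orderable tie-breaker.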

Cheers,

Raymond

From Leo.Barendse at nokia.com  Fri Apr 24 23:57:36 2009
From: Leo.Barendse at nokia.com (Leo.Barendse at nokia.com)
Date: Fri, 24 Apr 2009 23:57:36 +0200
Subject: [Python-Dev] "Length of str "  changes after passed in Python 2.5
Message-ID: <89F9BF23C080784180D44D7A8FEBA42909E037321B@NOK-EUMSG-03.mgdnok.nokia.com>

-----------------------------------------------------------

I have the following code:

#  len(all_svs) = 10

# then I call a function  with 2 list parameters

def proc_line(line,all_svs) :

# inside the function the length of the list "all_svs" is 1 more -> 11
# I had to workaround it

for i in range(len(all_svs)  - 1 ) :    # some how the length of all_svs  is incremented !!!!!!!!!!!!!!!!!!!!!!!!!!!

--------------------------------------------------------------

Is this a compiler bug ??

Or is it because of my first try of Python

Thanks

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20090424/173e46da/attachment.htm>

From aahz at pythoncraft.com  Sat Apr 25 00:34:18 2009
From: aahz at pythoncraft.com (Aahz)
Date: Fri, 24 Apr 2009 15:34:18 -0700
Subject: [Python-Dev] "Length of str " changes after passed in Python	2.5
In-Reply-To: <89F9BF23C080784180D44D7A8FEBA42909E037321B@NOK-EUMSG-03.mgdnok.nokia.com>
References: <89F9BF23C080784180D44D7A8FEBA42909E037321B@NOK-EUMSG-03.mgdnok.nokia.com>
Message-ID: <20090424223418.GA23575@panix.com>

On Fri, Apr 24, 2009, Leo.Barendse at nokia.com wrote:
>
> I have the following code:
> #  len(all_svs) = 10
> 
> # then I call a function  with 2 list parameters
> def proc_line(line,all_svs) :
> 
> # inside the function the length of the list "all_svs" is 1 more -> 11
> # I had to workaround it

This sounds like a usage question.  Please use comp.lang.python (or
possibly the tutor mailing list).
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

From foom at fuhm.net  Sat Apr 25 01:06:29 2009
From: foom at fuhm.net (James Y Knight)
Date: Fri, 24 Apr 2009 19:06:29 -0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
Message-ID: <F8D91625-9704-4760-9205-79A11C15CDE9@fuhm.net>

On Apr 24, 2009, at 6:05 PM, Paul Moore wrote:
> - Windows systems where broken Unicode (lone surrogates or whatever)
> isn't involved
> - Unix systems where the user's stated filesystem encoding is correct
>
> Can you honestly say that this isn't the vast majority of real-world
> environments? (IIRC, you are based in Japan, so it may well be true
> that the likelihood of problems is a lot higher where you are than
> where I am - the UK - but I suspect that averaging out, things are
> generally as above).

In my experience, it is normal on most unix systems that some programs  
(mostly daemons) are running in default "POSIX" locale, others (most  
user programs) are running in the "en_US.utf-8" locale, and some  
luddite users have set themselves to "en_US.8859-1". All running on  
the same system.

James

From tjreedy at udel.edu  Sat Apr 25 03:08:16 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 24 Apr 2009 21:08:16 -0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F22E74.4070108@gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>	<49F18E90.9070801@nevcal.com>	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>	<20090424152746.GA9543@panix.com>	<loom.20090424T153112-15@post.gmane.org>	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	<loom.20090424T173704-587@post.gmane.org>	<87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp>	<49F215E5.4050205@g.nevcal.com>
	<49F22E74.4070108@gmail.com>
Message-ID: <gstnpv$l4b$1@ger.gmane.org>

Toshio Kuratomi wrote:
> Glenn Linderman wrote:
>> On approximately 4/24/2009 11:40 AM, came the following characters from
>> And so my encoding (1) doesn't alter the data stream for any valid
>> Windows file name, and where the naivest of users reside (2) doesn't
>> alter the data stream for any Posix file name that was encoded as UTF-8
>> sequences and doesn't contain ? characters in the file name [I perceive
>> the use of ? in file names to be rare on Posix, because of experience,
>> and because of the other problems caused by such use] (3) doesn't
>> introduce data puns within applications that are correctly coded to know
>> the encoding occurs.  The encoding technique in the PEP not only can
>> produce data puns, thus not being reversible, it provides no reliable
>> mechanism to know that this has occurred.
>>
> Uhm....  Not arguing with your goals but '?' is unfortunately reasonably
> easy to get into a filename.  For instance, I've had to download a lot
> of scratch built packages from our buildsystem recently.  Scratch builds
> have url's with query strings in them so::

Is NUL \0 allowed in POSIX file names?  If not, could that be used as an 
escape char?  If it is not legal, then custom translated strings that 
escape in the wild would raise a red flag as soon as something else 
tried to use them.

From tjreedy at udel.edu  Sat Apr 25 03:16:24 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 24 Apr 2009 21:16:24 -0400
Subject: [Python-Dev] Tuples and underorderable types
In-Reply-To: <2DDEC3B33EDF49CABD54E3C1D1CAAF0B@RaymondLaptop1>
References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1>	<loom.20090424T175642-920@post.gmane.org>	<3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1>	<49F2059A.9090708@v.loewis.de>
	<2DDEC3B33EDF49CABD54E3C1D1CAAF0B@RaymondLaptop1>
Message-ID: <gsto98$mbl$1@ger.gmane.org>

Raymond Hettinger wrote:
> 
>> I would discourage use of the decorate/sort/undecorate pattern,
>> and encourage use of the key= argument. Or, if you really need
>> to decorate into a tuple, still pass a key= argument.
> 
> The bug report was actually about the sched module which used
> heapq to prioritize tuples consisting of times, priorities, and actions.
> I fixed and closed the original bug a few hours ago but had a
> thought that the pattern itself may be ubiquitous (especially with heapq).
> ISTM that other bugs like this are lurking about.  But all of you guys 
> seem to think the status quo is fine, so that's the end of it.

If you define the bug as the sched module not being updated to the 3.0 
order, then there are possibly more.

I notice that most of the heapq functions do not take a key function 
argument.  Has or will this change in the future?  Or is making 
key-decorated tuples the responsibility of the user?  (I can see that a 
key func would work better with PriQueue class where the key func is 
passed just once.)
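
On the last point, one common way to keep decorated tuples safe on a heap
is to put a unique, orderable tie-breaker between the priority and the
payload, so that the payload is never compared. A minimal sketch,
assuming an increasing counter from itertools is an acceptable
tie-breaker (the helper names push and pop are made up for illustration):

```python
import heapq
import itertools

heap = []
counter = itertools.count()  # unique sequence numbers, in insertion order

def push(priority, action):
    # The counter value differs for every entry, so comparison stops
    # before it ever reaches the (possibly unorderable) action.
    heapq.heappush(heap, (priority, next(counter), action))

def pop():
    priority, _, action = heapq.heappop(heap)
    return priority, action

push(10, lambda: "a")
push(20, lambda: "b")
push(10, lambda: "c")

order = [pop()[1]() for _ in range(3)]
```

Entries with equal priority come back in the order they were pushed,
which is usually what a scheduler wants anyway.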

tjr

From a.badger at gmail.com  Sat Apr 25 03:20:56 2009
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 24 Apr 2009 18:20:56 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <gstnpv$l4b$1@ger.gmane.org>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>	<49F18E90.9070801@nevcal.com>	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>	<20090424152746.GA9543@panix.com>	<loom.20090424T153112-15@post.gmane.org>	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	<loom.20090424T173704-587@post.gmane.org>	<87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp>	<49F215E5.4050205@g.nevcal.com>	<49F22E74.4070108@gmail.com>
	<gstnpv$l4b$1@ger.gmane.org>
Message-ID: <49F26578.60603@gmail.com>

Terry Reedy wrote:

> Is NUL \0 allowed in POSIX file names?  If not, could that be used as an
> escape char.  If it is not legal, then custom translated strings that
> escape in the wild would raise a red flag as soon as something else
> tried to use them.
> 
AFAIK NUL should be okay but I haven't read a specification to reach
that conclusion.  Is that a proposal?  Should I go find someone who has
read the relevant standards to find out?

-Toshio

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: OpenPGP digital signature
URL: <http://mail.python.org/pipermail/python-dev/attachments/20090424/f2abc3c7/attachment.pgp>

From cs at zip.com.au  Sat Apr 25 06:22:47 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Sat, 25 Apr 2009 14:22:47 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F26578.60603@gmail.com>
Message-ID: <20090425042247.GA26029@cskk.homeip.net>

On 24Apr2009 18:20, Toshio Kuratomi <a.badger at gmail.com> wrote:
| Terry Reedy wrote:
| > Is NUL \0 allowed in POSIX file names?  If not, could that be used as an
| > escape char.  If it is not legal, then custom translated strings that
| > escape in the wild would raise a red flag as soon as something else
| > tried to use them.
| > 
| AFAIK NUL should be okay but I haven't read a specification to reach
| that conclusion.  Is that a proposal?  Should I go find someone who has
| read the relevant standards to find out?

NUL cannot occur in a POSIX file path, if for no other reason than that
the API uses C strings, which are NUL terminated.

So, yes, you could use NUL as an escape character if you're sure you're
never dealing with _non_POSIX pathnames:-)
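
As a quick illustration of that restriction from the Python side: CPython
refuses to hand a path containing NUL to the operating system at all.
This is a sketch assuming a reasonably recent Python 3; the exact
exception type has varied between versions, so both candidates are
caught, and the path is made up.

```python
import os

path_with_nul = "foo\0bar"  # hypothetical name with an embedded NUL

try:
    os.stat(path_with_nul)
    rejected = False
except (ValueError, TypeError):
    # The call is rejected before it reaches the C-level API, because a
    # NUL-terminated C string cannot carry the rest of the name.
    rejected = True
```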

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

| I'm the female partner of a climber (I don't climb) and until now, I was
| under the impression that climbers are cool people, but alas, you had to
| ruin it for me.
*REAL* climbers are crude, impolite, solitary, abrupt, arrogant.  Sport
climbers are cool.
        - Rene Tio <tor at bnr.ca> in rec.climbing

From p.f.moore at gmail.com  Sat Apr 25 11:00:24 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 25 Apr 2009 10:00:24 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <F8D91625-9704-4760-9205-79A11C15CDE9@fuhm.net>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<F8D91625-9704-4760-9205-79A11C15CDE9@fuhm.net>
Message-ID: <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com>

2009/4/25 James Y Knight <foom at fuhm.net>:
> On Apr 24, 2009, at 6:05 PM, Paul Moore wrote:
>>
>> - Windows systems where broken Unicode (lone surrogates or whatever)
>> isn't involved
>> - Unix systems where the user's stated filesystem encoding is correct
>>
>> Can you honestly say that this isn't the vast majority of real-world
>> environments? (IIRC, you are based in Japan, so it may well be true
>> that the likelihood of problems is a lot higher where you are than
>> where I am - the UK - but I suspect that averaging out, things are
>> generally as above).
>
> In my experience, it is normal on most unix systems that some programs
> (mostly daemons) are running in default "POSIX" locale, others (most user
> programs) are running in the "en_US.utf-8" locale, and some luddite users
> have set themselves to "en_US.8859-1". All running on the same system.

OK, thanks for the data point.

Following on from that, would this (under Martin's proposal) result in
programs receiving encoded strings, or just semantically-incorrect
ones?

Specifically, the 8859-1 case cannot result in encoded strings, as
8859-1 can represent all byte strings (possibly garbled, but at least
validly). The utf8 case can hit unrepresentable bytes, but only if
there are characters greater than 0x7F in filenames. Is the "POSIX"
case ASCII? If so, then the same logic applies (>=0x80 is unrepresentable).
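
These three cases can be checked directly. A minimal sketch, assuming
Python's built-in codecs behave like the corresponding locale encodings
("latin-1" standing in for 8859-1, and "ascii" for the POSIX locale,
which is itself an assumption):

```python
all_bytes = [bytes([b]) for b in range(256)]

def undecodable(encoding):
    """Return the byte values that `encoding` cannot decode on their own."""
    bad = []
    for raw in all_bytes:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            bad.append(raw[0])
    return bad

latin1_bad = undecodable("latin-1")  # every byte maps to some character
ascii_bad = undecodable("ascii")     # everything from 0x80 up fails
utf8_bad = undecodable("utf-8")      # a lone high byte is never valid UTF-8

# High bytes are fine in UTF-8 when they form a well-formed sequence.
valid_utf8 = b"\xc3\xa9".decode("utf-8")
```

So for UTF-8 the question is not only whether high-bit bytes occur in
file names, but whether they occur in sequences that are not well-formed.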

So, the next question is - do people on such systems frequently use
high-bit characters in filenames?

Paul.

PS Unfortunately, I suspect that the biggest group of people likely to
be hit badly by this is people using non-latin scripts. And arguing
probabilities without real data is optimistic at best. But those
people are also the *least* likely people to contribute on an
English-speaking list, I guess :-( (Sincere apologies if everyone but
me on this list happens to actually be fluent English-speaking
Russians :-))

From eric at trueblade.com  Sat Apr 25 13:03:39 2009
From: eric at trueblade.com (Eric Smith)
Date: Sat, 25 Apr 2009 07:03:39 -0400
Subject: [Python-Dev] Deprecating PyOS_ascii_formatd
In-Reply-To: <1afaf6160904241417i64fc6640x680f7a54789b322c@mail.gmail.com>
References: <49DD2E41.80401@trueblade.com> <49F22BE1.7020603@trueblade.com>
	<1afaf6160904241417i64fc6640x680f7a54789b322c@mail.gmail.com>
Message-ID: <49F2EE0B.1090006@trueblade.com>

Benjamin Peterson wrote:
> 2009/4/24 Eric Smith <eric at trueblade.com>:
>>> My proposal is to deprecate PyOS_ascii_formatd in 3.1 and remove it in
>>> 3.2.
>> Having heard no dissent, I'd like to go ahead and deprecate this API. What
>> are the mechanics of deprecating this? Just documentation, or is there
>> something I should do in the code to generate a warning? Any pointers to
>> examples would be great.
> 
> You can use PyErr_WarnEx().

Thanks. I created issue 5835 to track this. I marked it as a release 
blocker, but I should have no problem finishing it up this weekend.

From martin at v.loewis.de  Sat Apr 25 14:07:44 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 25 Apr 2009 14:07:44 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <20090423232712.GA31693@cskk.homeip.net>
References: <20090423232712.GA31693@cskk.homeip.net>
Message-ID: <49F2FD10.9080302@v.loewis.de>

Cameron Simpson wrote:
> On 22Apr2009 08:50, Martin v. Löwis <martin at v.loewis.de> wrote:
> | File names, environment variables, and command line arguments are
> | defined as being character data in POSIX;
> 
> Specific citation please? I'd like to check the specifics of this.

For example, on environment variables:

http://opengroup.org/onlinepubs/007908799/xbd/envvar.html

# For values to be portable across XSI-conformant systems, the value
# must be composed of characters from the portable character set (except
# NUL and as indicated below).

# Environment variable names used by the utilities in the XCU
# specification consist solely of upper-case letters, digits and the "_"
# (underscore) from the characters defined in Portable Character Set .
# Other characters may be permitted by an implementation;

Or, on command line arguments:

http://opengroup.org/onlinepubs/007908799/xsh/execve.html

# The arguments represented by arg0, ... are pointers to null-terminated
# character strings

where a character string is "A contiguous sequence of characters
terminated by and including the first null byte.", and a character
is

# A sequence of one or more bytes representing a single graphic symbol
# or control code. This term corresponds to the ISO C standard term
# multibyte character (multi-byte character), where a single-byte
# character is a special case of a multi-byte character. Unlike the
# usage in the ISO C standard, character here has no necessary
# relationship with storage space, and byte is used when storage space
# is discussed.

> So you're proposing that all POSIX OS interfaces (which use byte strings)
> interpret those byte strings into Python3 str objects, with a codec
> that will accept arbitrary byte sequences losslessly and is totally
> reversible, yes?

Correct.

> And, I hope, that the os.* interfaces silently use it by default.

Correct.

> | Applications that need to process the original byte
> | strings can obtain them by encoding the character strings with the
> | file system encoding, passing "python-escape" as the error handler
> | name.
> 
> -1
> 
> This last sentence kills the idea for me, unless I'm missing something.
> Which I may be, of course.
> 
> POSIX filesystems _do_not_ have a file system encoding.

Why is that a problem for the PEP?

> If I'm writing a general purpose UNIX tool like chmod or find, I expect
> it to work reliably on _any_ UNIX pathname. It must be totally encoding
> blind. If I speak to the os.* interface to open a file, I expect to hand
> it bytes and have it behave.

See the other messages. If you want to do that, you can continue to.

> I'm very much in favour of being able to work in strings for most
> purposes, but if I use the os.* interfaces on a UNIX system it is
> necessary to be _able_ to work in bytes, because UNIX file pathnames
> are bytes.

Please re-read the PEP. It provides a way of being able to access any
POSIX file name correctly, and still pass strings.
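
As a rough illustration of that round trip: the handler the PEP calls
"python-escape" was eventually released as "surrogateescape" (Python 3.1
and later), and that is the name this sketch assumes. The file name and
the choice of UTF-8 are made up for the example.

```python
# Assumption: Python 3.1+, where the PEP's handler is named "surrogateescape".
raw_name = b"report-\xff\xfe.txt"      # hypothetical name; not valid UTF-8

# Decoding never fails: each undecodable byte becomes a lone surrogate.
text_name = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same codec and handler restores the original bytes.
round_tripped = text_name.encode("utf-8", "surrogateescape")

print(ascii(text_name))
print(round_tripped == raw_name)
```

The string can be passed around like any other str; only the final
encode step needs to name the handler.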

> If there isn't a byte-safe os.* facility in Python3, it will simply be
> unsuitable for writing low level UNIX tools.

Why is that? The mechanism in the PEP is precisely defined to allow
writing low level UNIX tools.

> Finally, I have a small python program whose whole purpose in life
> is to transcode UNIX filenames before transfer to a MacOSX HFS
> directory, because of HFS's enforced particular encoding. What approach
> should a Python app take to transcode UNIX pathnames under your scheme?

Compute the corresponding character strings, and use them.

Regards,
Martin

From martin at v.loewis.de  Sat Apr 25 14:12:25 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 25 Apr 2009 14:12:25 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <20090423234724.GA8077@cskk.homeip.net>
References: <20090423234724.GA8077@cskk.homeip.net>
Message-ID: <49F2FE29.6080608@v.loewis.de>

> | 2. Even if they were taken away (which the PEP does not propose to do),
> |    it would be easy to emulate them for applications that want them.
> |    For example, listdir could be wrapped as
> | 
> |    def listdir_b(bytestring):
> |        fse = sys.getfilesystemencoding()
> 
> Alas, no

No, what? No, that algorithm would be incorrect?

> because there is no sys.getfilesystemencoding() at the POSIX
> level. It's only the user's current locale stuff on a UNIX system, and
> has _nothing_ to do with the filesystem because UNIX filesystems don't
> have encodings.

So can you produce a specific example where my proposed listdir_b
function would fail to work correctly?
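
For reference, the quoted wrapper is cut off after its second line. A
complete version might look roughly like the sketch below. It assumes
the handler name "surrogateescape" (the name under which "python-escape"
was later released); the body past the second line is a reconstruction
for illustration, not the text of the original message.

```python
import os
import sys

def listdir_b(bytestring):
    """Emulate a bytes-returning listdir on top of the str interface."""
    fse = sys.getfilesystemencoding()
    # Assumption: "surrogateescape" is the released name of "python-escape".
    dirname = bytestring.decode(fse, "surrogateescape")
    # os.listdir() is called with a str and returns str names; encoding
    # each name with the same codec and handler recovers the raw bytes.
    return [name.encode(fse, "surrogateescape") for name in os.listdir(dirname)]
```

Later Python versions ship os.fsencode() and os.fsdecode(), which wrap
the same idea.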

For it to work, it is not necessary that POSIX has no notion of
character sets on the file system level (which is actually not true -
POSIX very well recognizes the notion of character sets for file
names, and recommends that you restrict yourself to the portable
character set).

> In particular, because the "best" (or to my mind "misleading") you
> can do for this is report what the current user thinks:
>   http://docs.python.org/library/sys.html#sys.getfilesystemencoding
> then there's no guarantee that what is chosen has any relationship to
> what was in use when the files being consulted were made.

For this PEP, it's irrelevant. It will work even if the chosen encoding
is a bad choice.

> Now, if I were writing listdir_b() I'd want to be able to do something
> along these lines:
>   - set LC_ALL=C (or some equivalent mechanism)
>   - have os.listdir() read bytes as numeric values and transcode their values
>     _directly_ into the corresponding Unicode code points.
>   - yield bytes( ord(c) for c in os_listdir_string )
>   - have os.open() et al transcode unicode code points back into bytes.
> i.e. a straight one-to-one mapping, using only codepoints in the range
> 1..255.

That would be an alternative approach to the same problem (and one that
I think will fail more badly than the one I'm proposing).
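
One possible reading of the difference between the two approaches,
sketched with a hypothetical file name stored as UTF-8: both mappings
are lossless, but the straight one-to-one mapping (equivalent to
decoding as latin-1) garbles even names that are valid in the locale's
encoding, while the escaping handler (assumed here to be
"surrogateescape") leaves them readable.

```python
utf8_name = "na\u00efve.txt".encode("utf-8")   # hypothetical name, valid UTF-8

# One-to-one mapping: every byte becomes the code point with the same value.
as_latin1 = utf8_name.decode("latin-1")
# Escaping: valid sequences decode normally, only bad bytes are escaped.
as_escaped = utf8_name.decode("utf-8", "surrogateescape")

print(as_latin1)    # naÃ¯ve.txt
print(as_escaped)   # naïve.txt

# Both are reversible; they differ in what the user gets to see.
print(as_latin1.encode("latin-1") == utf8_name)
print(as_escaped.encode("utf-8", "surrogateescape") == utf8_name)
```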

Regards,
Martin

From martin at v.loewis.de  Sat Apr 25 14:17:14 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 25 Apr 2009 14:17:14 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
Message-ID: <49F2FF4A.8010507@v.loewis.de>

Simon Cross wrote:
>> Unfortunately, for Windows, the situation would
>> be exactly the opposite: the byte-oriented interface cannot represent
>> all data; only the character-oriented API can.
> 
> Is the second part of this actually true? My understanding may be
> flawed, but surely all Unicode data can be converted to and from bytes
> using UTF-8?

[I hope, by "second part", you refer to the part that I left]

It's true that UTF-8 could represent all Windows file names. However,
the byte-oriented APIs of Windows do not use UTF-8, but instead, they
use the Windows ANSI code page (which varies with the installation).

> Given this, can't people who
> must have access to all files / environment data just use the bytes
> interface?

No, because the Windows API would interpret the bytes differently,
and not find the right file.
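
A small sketch of why the byte-oriented route loses information there.
It uses Python's cp1252 codec as a stand-in for one possible ANSI code
page, and a made-up Cyrillic file name; Python's codec raises an
exception, whereas the Windows API may instead substitute characters,
so this only illustrates the representability problem.

```python
name = "\u043f\u0440\u0438\u0432\u0435\u0442.txt"   # Cyrillic, hypothetical

# UTF-8 can represent the name...
utf8_form = name.encode("utf-8")

# ...but a Western European ANSI code page cannot.
try:
    name.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False

print(representable)   # False
```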

Regards,
Martin

From martin at v.loewis.de  Sat Apr 25 14:22:27 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 25 Apr 2009 14:22:27 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F184C6.8000905@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com>
Message-ID: <49F30083.5050506@v.loewis.de>

> The problem with this, and other preceding schemes that have been
> discussed here, is that there is no means of ascertaining whether a
> particular file name str was obtained from a str API, or was funny-
> decoded from a bytes API... and thus, there is no means of reliably
> ascertaining whether a particular filename str should be passed to a
> str API, or funny-encoded back to bytes.

Why is it necessary that you are able to make this distinction?

> Picking a character (I don't find U+F01xx in the
> Unicode standard, so I don't know what it is)

It's a private use area. It will never carry an official character
assignment.

> As I realized in the email-sig, in talking about decoding corrupted
> headers, there is only one way to guarantee this... to encode _all_
> character sequences, from _all_ interfaces.  Basically it requires
> reserving an escape character (I'll use ? in these examples -- yes, an
> ASCII question mark -- happens to be illegal in Windows filenames so
> all the better on that platform, but the specific character doesn't
> matter... avoiding / \ and . is probably good, though).

I think you'll have to write an alternative PEP if you want to see
something like this implemented throughout Python.

Regards,
Martin

From martin at v.loewis.de  Sat Apr 25 14:24:50 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 25 Apr 2009 14:24:50 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>	<49F18E90.9070801@nevcal.com>	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
Message-ID: <49F30112.80402@v.loewis.de>

> Humour aside :), the expectation that filenames are Unicode data
> simply doesn't agree with the reality of POSIX file systems.  I think
> an approach similar to that adopted by glib [1] could work

Are you saying that the approach presented in the PEP will not work?
I believe it would work no matter whether that expectation agrees
with reality or not. The amount of mojibake that you get is larger
when the disagreement is larger, but it will continue to *work*.

Regards,
Martin

From martin at v.loewis.de  Sat Apr 25 14:28:11 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 25 Apr 2009 14:28:11 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System	Character
 Interfaces
In-Reply-To: <20090424152746.GA9543@panix.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>	<49F18E90.9070801@nevcal.com>	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
Message-ID: <49F301DB.1090704@v.loewis.de>

> The part that I haven't seen clearly addressed so far is what happens
> when disks get mounted across OSes (e.g. NFS).
> 
> While I agree that there should be a layer on top that can handle "most"
> situations, it also seems clear that the raw layer needs to be readily
> accessible.

Indeed, with the PEP, the raw layer does remain readily available. If
you know that it was originally bytes, you can get the very same bytes
back if you want to.

However, for disks mounted across OSes, you won't have to, normally.
If you think there is a problem with these, can you please describe a
specific scenario? What application, what file names, what encodings,
what problems?

Regards,
Martin

From martin at v.loewis.de  Sat Apr 25 14:31:53 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 25 Apr 2009 14:31:53 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>	<49F18E90.9070801@nevcal.com>	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>	<20090424152746.GA9543@panix.com>	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
Message-ID: <49F302B9.6020907@v.loewis.de>

> [1] Actually, all the PEP says is "With this PEP, a uniform treatment
> of these data as characters becomes
> possible." An argument as to why this is a good thing would be a
> useful addition to the PEP. At the moment it's more or less treated as
> self-evident - which I agree with, but which clearly the Unix people
> here are not as certain of.

Ok, I have added another paragraph. Not sure whether it helps to clarify
though.

Regards,
Martin

From martin at v.loewis.de  Sat Apr 25 14:35:28 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 25 Apr 2009 14:35:28 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in	System		Character
 Interfaces
In-Reply-To: <49F215E5.4050205@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>	<49F18E90.9070801@nevcal.com>	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>	<20090424152746.GA9543@panix.com>	<loom.20090424T153112-15@post.gmane.org>	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	<loom.20090424T173704-587@post.gmane.org>	<87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp>
	<49F215E5.4050205@g.nevcal.com>
Message-ID: <49F30390.2040808@v.loewis.de>

> Because the encoding is not reliably reversible.

Why do you say that? The encoding is completely reversible
(unless we disagree on what "reversible" means).

> I'm +1 on the concept, -1 on the PEP, due solely to the lack of a
> reversible encoding.

Then please provide an example for a setup where it is not reversible.
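
One way to look for such an example is a brute-force check. The sketch
below tries every two-byte sequence against a handful of ASCII-compatible
encodings, assuming the handler is available as "surrogateescape"; the
list of encodings is an arbitrary choice, and the check says nothing
about encodings that are not ASCII-compatible.

```python
import itertools

def round_trips(raw, encoding):
    """True if decode-then-encode with the escaping handler restores raw."""
    text = raw.decode(encoding, "surrogateescape")
    return text.encode(encoding, "surrogateescape") == raw

encodings = ["ascii", "utf-8", "latin-1", "koi8-r"]   # arbitrary sample
failures = [
    (bytes(pair), enc)
    for pair in itertools.product(range(256), repeat=2)
    for enc in encodings
    if not round_trips(bytes(pair), enc)
]
print(len(failures))
```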

Regards,
Martin

From martin at v.loewis.de  Sat Apr 25 14:42:37 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 25 Apr 2009 14:42:37 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>	<20090424152746.GA9543@panix.com>	<loom.20090424T153112-15@post.gmane.org>	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>	<F8D91625-9704-4760-9205-79A11C15CDE9@fuhm.net>
	<79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com>
Message-ID: <49F3053D.9090100@v.loewis.de>

> Following on from that, would this (under Martin's proposal) result in
> programs receiving encoded strings, or just semantically-incorrect
> ones?

Not sure I understand the question - what is an "encoded string"?

As you analyse below, sometimes, the current (2.x) file system encoding
will do the right thing; sometimes, it will decode successfully, but
still not give the intended string, and sometimes, it will fail. With
the PEP, it won't fail, but give a string back that likely wasn't
intended by the user. This might be confusing if you try to render it to
a user interface; if the application merely passes it back to file
system APIs, it will work fine.
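
The three outcomes can be reproduced with a made-up name, again assuming
the handler is spelled "surrogateescape":

```python
latin1_bytes = "caf\u00e9".encode("latin-1")   # b'caf\xe9'
utf8_bytes = "caf\u00e9".encode("utf-8")       # b'caf\xc3\xa9'

# 1. The encoding guess matches: the intended string comes back.
right = utf8_bytes.decode("utf-8")

# 2. The guess is wrong but decoding succeeds: mojibake.
wrong = utf8_bytes.decode("latin-1")

# 3. The guess is wrong and strict decoding fails...
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# ...whereas the escaping handler yields a string with a lone surrogate,
# which still encodes back to the original bytes.
escaped = latin1_bytes.decode("utf-8", "surrogateescape")

print(ascii(right), ascii(wrong), strict_failed, ascii(escaped))
```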

> So, the next question is - do people on such systems frequently use
> high-bit characters in filenames?

They typically do until they run into problems. For example, if they
set the locale to something, and then create files in their
home directory, it will work just fine, and nobody else will ever see
the files (except for the backup software).

When they find that the files they created are inaccessible to others,
they will often stop using funny characters.

Regards,
Martin

From martin at v.loewis.de  Sat Apr 25 14:44:49 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 25 Apr 2009 14:44:49 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F11829.9070504@mrabarnett.plus.com>
References: <49EEBE2E.3090601@v.loewis.de>	<49EF0ADB.2090107@mrabarnett.plus.com>	<49EF6F06.9060008@v.loewis.de>
	<49F11829.9070504@mrabarnett.plus.com>
Message-ID: <49F305C1.5070309@v.loewis.de>

> If the bytes are mapped to single half surrogate codes instead of the
> normal pairs (low+high), then I can see that decoding could never be
> ambiguous and encoding could produce the original bytes.

I was confused by Markus Kuhn's original UTF-8b specification. I have
now changed the PEP to avoid using PUA characters at all.

Regards,
Martin

From google at mrabarnett.plus.com  Sat Apr 25 16:21:13 2009
From: google at mrabarnett.plus.com (MRAB)
Date: Sat, 25 Apr 2009 15:21:13 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F305C1.5070309@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>	<49EF0ADB.2090107@mrabarnett.plus.com>	<49EF6F06.9060008@v.loewis.de>	<49F11829.9070504@mrabarnett.plus.com>
	<49F305C1.5070309@v.loewis.de>
Message-ID: <49F31C59.1040309@mrabarnett.plus.com>

Martin v. Löwis wrote:
>> If the bytes are mapped to single half surrogate codes instead of the
>> normal pairs (low+high), then I can see that decoding could never be
>> ambiguous and encoding could produce the original bytes.
> 
> I was confused by Markus Kuhn's original UTF-8b specification. I have
> now changed the PEP to avoid using PUA characters at all.
> 
I find the PEP easier to understand now.

In detail I'd say that if a sequence of bytes >=0x80 is found which is
not valid UTF-8, then the first byte is mapped to a half surrogate and
then decoding is continued from the next byte.

The only drawback I can see is if the UTF-8 bytes actually decode to a
half surrogate. However, half surrogates should really only occur in
UTF-16 (as I understand it), so they shouldn't be encoded in UTF-8
anyway!

As for handling this case, you could either:

1. Raise an exception (which is what you're trying to avoid)

or:

2. Treat it as invalid UTF-8 and map the bytes to half surrogates
(encoding would produce the original bytes).

I'd prefer option 2.

Anyway, +1 from me.

From p.f.moore at gmail.com  Sat Apr 25 16:38:03 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 25 Apr 2009 15:38:03 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F3053D.9090100@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<F8D91625-9704-4760-9205-79A11C15CDE9@fuhm.net>
	<79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com>
	<49F3053D.9090100@v.loewis.de>
Message-ID: <79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com>

2009/4/25 "Martin v. Löwis" <martin at v.loewis.de>:
>> Following on from that, would this (under Martin's proposal) result in
>> programs receiving encoded strings, or just semantically-incorrect
>> ones?
>
> Not sure I understand the question - what is an "encoded string"?

Sorry. I was struggling to come up with terminology for the various
concepts I was trying to express, as I went along.

I was meaning a string which has been created from a non-decodable
byte sequence using the encoding process you specify in the PEP (with
the current version of the PEP, this would be a string with lone half
surrogate codes).

I was distinguishing these because some people seemed to be implying
that such strings were the ones which would result in exceptions. (I
think that was Stephen, when he referred to a "careful API").

> As you analyse below, sometimes, the current (2.x) file system encoding
> will do the right thing; sometimes, it will decode successfully, but
> still not give the intended string, and sometimes, it will fail. With
> the PEP, it won't fail, but give a string back that likely wasn't
> intended by the user. This might be confusing if you try to render it to
> a user interface; if the application merely passes it back to file
> system APIs, it will work fine.

OK, looks like my analysis matches yours, except that I wasn't sure if
the third case (a string that "likely wasn't intended") could result
in exceptions. From what you're saying, it sounds like it would
actually be similar to the second case - I'm not clear on how
surrogates work, though.

>> So, the next question is - do people on such systems frequently use
>> high-bit characters in filenames?
>
> They typically do until they run into problems. For example, if they
> set the locale to something, and then create files in their
> home directory, it will work just fine, and nobody else will ever see
> the files (except for the backup software).
>
> When they find that the files they created are inaccessible to others,
> they will often stop using funny characters.

Which sounds fairly practical - and the irony of someone with a "funny
character" in his surname telling me this hasn't escaped me :-)

Paul.

From martin at v.loewis.de  Sat Apr 25 17:00:17 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 25 Apr 2009 17:00:17 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>	
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>	
	<20090424152746.GA9543@panix.com>	
	<loom.20090424T153112-15@post.gmane.org>	
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>	
	<F8D91625-9704-4760-9205-79A11C15CDE9@fuhm.net>	
	<79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com>	
	<49F3053D.9090100@v.loewis.de>
	<79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com>
Message-ID: <49F32581.1050004@v.loewis.de>

> OK, looks like my analysis matches yours, except that I wasn't sure if
> the third case (a string that "likely wasn't intended") could result
> in exceptions. From what you're saying, it sounds like it would
> actually be similar to the second case - I'm not clear on how
> surrogates work, though.

On decoding, there is a guarantee that it decodes successfully. There is
also a guarantee that the result will re-encode successfully, and yield
the same byte string.

If you pass a different string into encoding, you still may get
exceptions. For example, if the filesystem encoding is latin-1,
passing u"\u20ac" will continue to raise exceptions, even under the
python-escape error handler - that error handler will only handle
surrogates.
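
A short sketch of that asymmetry, assuming the handler name
"surrogateescape" and a latin-1 file system encoding:

```python
# A lone surrogate produced by escaping a byte encodes back to that byte.
escaped_ok = "\udce9".encode("latin-1", "surrogateescape")

# A real character that latin-1 lacks is not the handler's business,
# so the usual exception is raised.
try:
    "\u20ac".encode("latin-1", "surrogateescape")
    euro_failed = False
except UnicodeEncodeError:
    euro_failed = True

print(escaped_ok, euro_failed)
```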

There isn't really that much trickery to surrogates. They *have*
to come in pairs to be meaningful, with the first one in the range
D800..DBFF (high surrogate), and the second in the range DC00..DFFF
(low surrogate). Having a lone low surrogate is not meaningful; this
is how the escaping works.

Proper surrogate pairs encode characters outside the BMP, for use with
UTF-16: each code contributes 10 bits (just count how many codes there
are in D800..DBFF), together, a pair encodes 20 bits, allowing for
2**20 characters, starting at U+10000.
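
The arithmetic can be written out as follows. The function names are
made up for this sketch; the ranges used are D800..DBFF for the high
surrogate and DC00..DFFF for the low one, 1024 codes each.

```python
def to_surrogate_pair(code_point):
    """Split a code point at or above U+10000 into (high, low) surrogates."""
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair back into a code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))                 # 0xd83d 0xde00
print(hex(from_surrogate_pair(high, low))) # 0x1f600
```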

>> When they find that the files they created are inaccessible to others,
>> they will often stop using funny characters.
> 
> Which sounds fairly practical - and the irony of someone with a "funny
> character" in his surname telling me this hasn't escaped me :-)

Sure: my Unix account name was always "loewis", and even on Windows,
our admins didn't dare to put the umlaut into the account name - it
would be difficult to log in with a US keyboard, for example. People
who use non-ASCII characters in filenames around here are primarily
non-IT people who aren't aware that these characters are different
from the rest.

I recognize that for other languages (without trivial transliterations)
the problem is more severe, and people are more likely to create
files with Cyrillic, or Japanese, names (say) if the system accepts
them at all.

Regards,
Martin

From martin at v.loewis.de  Sat Apr 25 17:05:23 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 25 Apr 2009 17:05:23 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F31C59.1040309@mrabarnett.plus.com>
References: <49EEBE2E.3090601@v.loewis.de>	<49EF0ADB.2090107@mrabarnett.plus.com>	<49EF6F06.9060008@v.loewis.de>	<49F11829.9070504@mrabarnett.plus.com>	<49F305C1.5070309@v.loewis.de>
	<49F31C59.1040309@mrabarnett.plus.com>
Message-ID: <49F326B3.60908@v.loewis.de>

> The only drawback I can see is if the UTF-8 bytes actually decode to a
> half surrogate. However, half surrogates should really only occur in
> UTF-16 (as I understand it), so they shouldn't be encoded in UTF-8
> anyway!

Right: that's the rationale for UTF-8b. Encoding half surrogates
violates parts of the Unicode spec, so UTF-8b is "safe".

> As for handling this case, you could either:
> 
> 1. Raise an exception (which is what you're trying to avoid)
> 
> or:
> 
> 2. Treat it as invalid UTF-8 and map the bytes to half surrogates
> (encoding would produce the original bytes).
> 
> I'd prefer option 2.

I hadn't thought of this case, but you are right - they *are*
illegal bytes, after all. Raising an exception would be useless
since the whole point of this codec is to never raise unicode
errors.
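
For what it is worth, option 2 can be observed with the released form of
the handler (assumed here to be "surrogateescape" on Python 3.1+): the
three bytes that a lenient encoder would emit for U+DC00 are rejected by
the UTF-8 decoder and escaped one by one.

```python
raw = b"\xed\xb0\x80"   # what a lenient encoder would emit for U+DC00

# The bytes are treated as invalid UTF-8, so each one is escaped.
text = raw.decode("utf-8", "surrogateescape")

print(ascii(text))
print(text.encode("utf-8", "surrogateescape") == raw)
```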

Regards,
Martin

From zooko at zooko.com  Sat Apr 25 17:29:54 2009
From: zooko at zooko.com (Zooko O'Whielacronx)
Date: Sat, 25 Apr 2009 09:29:54 -0600
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49EEBE2E.3090601@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
Message-ID: <9E5E533C-E323-4937-A296-52F47F68AC3F@zooko.com>

Thanks for writing this PEP 383, MvL.  I recently ran into this  
problem in Python 2.x in the Tahoe project [1].  The Tahoe project  
should be considered a good use case showing what some people need.   
For example, the assumption that a file will later be written back  
into the same local filesystem (and thus luckily use the same  
encoding) from which it originally came doesn't hold for us, because  
Tahoe is used for file-sharing as well as for backup-and-restore.

One of my first conclusions in pursuing this issue is that we can  
never use the Python 2.x unicode APIs on Linux, just as we can never  
use the Python 2.x str APIs on Windows [2].  (You mentioned this  
ugliness in your PEP.)  My next conclusion was that the Linux way of  
doing encoding of filenames really sucks compared to, for example,  
the Mac OS X way.  I'm heartened to see that David Wheeler is trying  
to persuade the maintainers of Linux filesystems to improve some of  
this: [3].

My final conclusion was that we needed to have two kinds of  
workaround for the Linux suckage: first, if decoding using the  
suggested filesystem encoding fails, then we fall back to mojibake  
[4] by decoding with iso-8859-1 (or else with windows-1252 -- I'm not  
sure if it matters and I haven't yet understood if utf-8b offers  
another alternative for this case).  Second, if decoding succeeds  
using the suggested filesystem encoding on Linux, then write down the  
encoding that we used and include that with the filename.  This  
expands the size of our filenames significantly, but it is the only  
way to allow some future programmer to undo the damage of a falsely- 
successful decoding.  Here's our whole plan: [5].
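
A minimal sketch of the two workarounds described above, not Tahoe's
actual code: the function name is hypothetical, and returning the
encoding in the fallback case as well is a choice made here for
simplicity.

```python
def decode_filename(raw, suggested_encoding):
    """Return (text, encoding_used) for a raw file name (hypothetical helper)."""
    try:
        # Decoding succeeded: record which encoding was used, so that a
        # falsely successful decoding can be undone later.
        return raw.decode(suggested_encoding), suggested_encoding
    except UnicodeDecodeError:
        # Decoding failed: fall back to mojibake via iso-8859-1, which
        # accepts every byte and is reversible.
        return raw.decode("iso-8859-1"), "iso-8859-1"

text, used = decode_filename(b"caf\xe9", "utf-8")
print(ascii(text), used)
print(text.encode(used) == b"caf\xe9")
```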

Regards,

Zooko

[1] http://allmydata.org
[2] http://allmydata.org/pipermail/tahoe-dev/2009-March/001379.html #  
see the footnote of this message
[3] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
[4] http://en.wikipedia.org/wiki/Mojibake
[5] http://allmydata.org/trac/tahoe/ticket/534#comment:47

From phd at phd.pp.ru  Sat Apr 25 17:51:57 2009
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Sat, 25 Apr 2009 19:51:57 +0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
In-Reply-To: <49F32581.1050004@v.loewis.de>
References: <20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<F8D91625-9704-4760-9205-79A11C15CDE9@fuhm.net>
	<79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com>
	<49F3053D.9090100@v.loewis.de>
	<79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com>
	<49F32581.1050004@v.loewis.de>
Message-ID: <20090425155157.GA10071@phd.pp.ru>

On Sat, Apr 25, 2009 at 05:00:17PM +0200, "Martin v. Löwis" wrote:
> I recognize that for other languages (without trivial transliterations)
> the problem is more severe, and people are more likely to create
> files with Cyrillic, or Japanese, names (say) if the system accepts
> them at all.

   In different encodings on the same filesystem...

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From murman at gmail.com  Sat Apr 25 18:18:20 2009
From: murman at gmail.com (Michael Urman)
Date: Sat, 25 Apr 2009 11:18:20 -0500
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F32581.1050004@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<F8D91625-9704-4760-9205-79A11C15CDE9@fuhm.net>
	<79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com>
	<49F3053D.9090100@v.loewis.de>
	<79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com>
	<49F32581.1050004@v.loewis.de>
Message-ID: <dcbbbb410904250918s3ee3b931n28c877641a27d38c@mail.gmail.com>

On Sat, Apr 25, 2009 at 10:00, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> On decoding, there is a guarantee that it decodes successfully. There is
> also a guarantee that the result will re-encode successfully, and yield
> the same byte string.
>
> If you pass a different string into encoding, you still may get
> exceptions. For example, if the filesystem encoding is latin-1,
> passing u"\u20ac" will continue to raise exceptions, even under the
> python-escape error handler - that error handler will only handle
> surrogates.

One angle I've not seen discussed yet is a set of use cases. While the
PEP addresses the need for the python developer to not have to write
insane conditional code that maps between bytes and str depending on
the platform, it doesn't talk about what this allows an application to
provide to a user, and at what risks.

I see two main user-oriented use cases for the resulting Unicode
strings this PEP will produce on all systems: displaying a list of
filenames for the user to select from (an open file dialog), and
allowing a user to edit or supply a filename (a save dialog or a
rename control).

It's clear what this PEP provides for the former. On well-behaved
systems where a simpler filesystemencoding approach would work, the
results are identical; the user can select filenames that are what he
expects to see on both Unix and Windows. On less well-behaved systems,
some characters may appear as junk in the middle of the name (or would
they be invisible?), but should be recognizable enough to choose, or
at least to open sequentially and remember what the last one was. On
particularly poorly behaved systems, the results will be extremely
difficult to read, but no approach is likely to fix this.

What I don't find clear is what the risks are for the latter. On the
less well behaved system, a user may well attempt to use this python
application to fix filenames. Can we estimate a likelihood that edits
to the names would result in a Unicode string that can no longer be
encoded with the python-escape? Will a new name fully provided by a
user on his keyboard (ignoring copy and paste) almost always safely
encode?

-- 
Michael Urman

From martin at v.loewis.de  Sat Apr 25 18:33:17 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 25 Apr 2009 18:33:17 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <dcbbbb410904250918s3ee3b931n28c877641a27d38c@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>	<loom.20090424T153112-15@post.gmane.org>	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>	<F8D91625-9704-4760-9205-79A11C15CDE9@fuhm.net>	<79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com>	<49F3053D.9090100@v.loewis.de>	<79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com>	<49F32581.1050004@v.loewis.de>
	<dcbbbb410904250918s3ee3b931n28c877641a27d38c@mail.gmail.com>
Message-ID: <49F33B4D.3070707@v.loewis.de>

> I see two main user-oriented use cases for the resulting Unicode
> strings this PEP will produce on all systems: displaying a list of
> filenames for the user to select from (an open file dialog), and
> allowing a user to edit or supply a filename (a save dialog or a
> rename control).

There are more, in particular the case "user passes a file name
on the command line", and "web server passes URL in environment
variable".

> It's clear what this PEP provides for the former. On well-behaved
> systems where a simpler filesystemencoding approach would work, the
> results are identical; the user can select filenames that are what he
> expects to see on both Unix and Windows. On less well-behaved systems,
> some characters may appear as junk in the middle of the name (or would
> they be invisible?)

Depends on the rendering. Try "print u'\udc00'" in your terminal to see
what happens; for me, it renders the glyph for "replacement character".
In GUI applications, you often see white boxes (rectangles).

> What I don't find clear is what the risks are for the latter. On the
> less well behaved system, a user may well attempt to use this python
> application to fix filenames. Can we estimate a likelihood that edits
> to the names would result in a Unicode string that can no longer be
> encoded with the python-escape? Will a new name fully provided by a
> user on his keyboard (ignoring copy and paste) almost always safely
> encode?

That very much depends on the system setup, and your impression is
right that the PEP doesn't address it - it only deals with cases
where you get random unsupported bytes; getting random unsupported
characters from the user is not considered.

If the user has the locale set up in a way that matches his keyboard,
it should work all fine - and will already, even without the PEP.
If the user enters a character that doesn't directly map to a
good file name, you get an exception, and have to tell the user
to pick a different filename.
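
To make the distinction between unsupported bytes and unsupported
characters concrete, here is a minimal sketch. It assumes a Python 3
interpreter in which the handler the PEP calls "python-escape" is
registered as "surrogateescape" (the name it shipped under in 3.1), and
it uses latin-1 purely as a stand-in for the file system encoding. The
helper name encodes_as_filename is made up for illustration.

```python
# Assumption: "surrogateescape" is the shipped name of the PEP's
# "python-escape" handler; latin-1 stands in for the file system encoding.
FS_ENCODING = "latin-1"


def encodes_as_filename(name, encoding=FS_ENCODING):
    """Return True if `name` can be converted to bytes for the OS."""
    try:
        name.encode(encoding, "surrogateescape")
    except UnicodeEncodeError:
        # The handler only rescues escaped bytes (U+DC80..U+DCFF);
        # an ordinary character missing from the encoding still fails.
        return False
    return True


print(encodes_as_filename("r\xe9sum\xe9.txt"))  # characters latin-1 has
print(encodes_as_filename("\u20ac.txt"))        # euro sign, not in latin-1
```

An application would ask the user for a different name whenever the
check fails.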

Notice that it may fail at several layers:
- it may be that characters entered are not supported in what
  Python chooses as the file system encoding.
- it may be that the characters are not supported by the file
  system, e.g. leading spaces in Win32.
- it may be that the file cannot be renamed because the target
  name already exists.
In all these cases, the application has to ask the user to
reconsider; for at least the last case, it should be prepared
to do that, anyway (there is also the case where renaming fails
because of lack of permissions; in that case, picking a different
file name won't help).
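
The layers above could be handled along these lines. This is only a
sketch: try_rename is a hypothetical helper, the messages are
placeholders, and the existence check is not atomic, so a real
application would still have to cope with a name appearing between the
check and the rename.

```python
import os


def try_rename(old, new):
    """Rename old to new; return None on success, else a message for the user."""
    try:
        # Layer 3: refuse to clobber an existing target (POSIX rename()
        # would otherwise replace it silently).
        if os.path.lexists(new):
            return "a file with that name already exists; pick another name"
        os.rename(old, new)
    except UnicodeEncodeError:
        # Layer 1: the name cannot be encoded in the file system encoding.
        return "the name contains unsupported characters; pick another name"
    except PermissionError:
        # A different name will not help here.
        return "permission denied"
    except OSError as exc:
        # Layer 2: the file system itself rejected the name.
        return "the file system rejected the name (%s); pick another name" % exc
    return None
```

A caller would loop, showing the returned message and asking again,
until try_rename() returns None or the user gives up.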

Regards,
Martin

From solipsis at pitrou.net  Sat Apr 25 18:34:13 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 25 Apr 2009 16:34:13 +0000 (UTC)
Subject: [Python-Dev]
	=?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System_C?=
	=?utf-8?q?haracter=09Interfaces?=
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<F8D91625-9704-4760-9205-79A11C15CDE9@fuhm.net>
	<79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com>
Message-ID: <loom.20090425T163157-847@post.gmane.org>

Paul Moore <p.f.moore <at> gmail.com> writes:
> But those
> people are also the *least* likely people to contribute on an
> English-speaking list, I guess :-( (Sincere apologies if everyone but
> me on this list happens to actually be fluent English-speaking
> Russians :-))

Actually, we're all Finnish.

Regards,

?ntoine.

From murman at gmail.com  Sat Apr 25 18:48:21 2009
From: murman at gmail.com (Michael Urman)
Date: Sat, 25 Apr 2009 11:48:21 -0500
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F33B4D.3070707@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<F8D91625-9704-4760-9205-79A11C15CDE9@fuhm.net>
	<79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com>
	<49F3053D.9090100@v.loewis.de>
	<79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com>
	<49F32581.1050004@v.loewis.de>
	<dcbbbb410904250918s3ee3b931n28c877641a27d38c@mail.gmail.com>
	<49F33B4D.3070707@v.loewis.de>
Message-ID: <dcbbbb410904250948g1dba64aif1435417191e507f@mail.gmail.com>

On Sat, Apr 25, 2009 at 11:33, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> If the user has the locale set up in a way that matches his keyboard,
> it should work all fine - and will already, even without the PEP.
> If the user enters a character that doesn't directly map to a
> good file name, you get an exception, and have to tell the user
> to pick a different filename.

This sounds good so far - the 90% (or higher) case is still clean.

> Notice that it may fail at several layers:
> - it may be that characters entered are not supported in what
>   Python chooses as the file system encoding.
> - it may be that the characters are not supported by the file
>   system, e.g. leading spaces in Win32.
> - it may be that the file cannot be renamed because the target
>   name already exists.
> In all these cases, the application has to ask the user to
> reconsider; for at least the last case, it should be prepared
> to do that, anyway (there is also the case where renaming fails
> because of lack of permissions; in that case, picking a different
> file name won't help).

This argument sounds good to me too. How will we communicate to
developers what new exception might occur where? It would be a shame
to have a solid application developed under Windows start raising
encoding exceptions on linux. Would the encoding error get mapped to
an IOError for all file APIs that do this encoding?

-- 
Michael Urman

From google at mrabarnett.plus.com  Sat Apr 25 19:27:47 2009
From: google at mrabarnett.plus.com (MRAB)
Date: Sat, 25 Apr 2009 18:27:47 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F33B4D.3070707@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>	<loom.20090424T153112-15@post.gmane.org>	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>	<F8D91625-9704-4760-9205-79A11C15CDE9@fuhm.net>	<79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com>	<49F3053D.9090100@v.loewis.de>	<79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com>	<49F32581.1050004@v.loewis.de>	<dcbbbb410904250918s3ee3b931n28c877641a27d38c@mail.gmail.com>
	<49F33B4D.3070707@v.loewis.de>
Message-ID: <49F34813.5030206@mrabarnett.plus.com>

Martin v. Löwis wrote:
>> I see two main user-oriented use cases for the resulting Unicode
>> strings this PEP will produce on all systems: displaying a list of
>> filenames for the user to select from (an open file dialog), and
>> allowing a user to edit or supply a filename (a save dialog or a
>> rename control).
> 
> There are more, in particular the case "user passes a file name
> on the command line", and "web server passes URL in environment
> variable".
> 
>> It's clear what this PEP provides for the former. On well-behaved
>> systems where a simpler filesystemencoding approach would work, the
>> results are identical; the user can select filenames that are what he
>> expects to see on both Unix and Windows. On less well-behaved systems,
>> some characters may appear as junk in the middle of the name (or would
>> they be invisible?)
> 
> Depends on the rendering. Try "print u'\udc00'" in your terminal to see
> what happens; for me, it renders the glyph for "replacement character".
> In GUI applications, you often see white boxes (rectangles).
> 
>> What I don't find clear is what the risks are for the latter. On the
>> less well behaved system, a user may well attempt to use this python
>> application to fix filenames. Can we estimate a likelihood that edits
>> to the names would result in a Unicode string that can no longer be
>> encoded with the python-escape? Will a new name fully provided by a
>> user on his keyboard (ignoring copy and paste) almost always safely
>> encode?
> 
> That very much depends on the system setup, and your impression is
> right that the PEP doesn't address it - it only deals with cases
> where you get random unsupported bytes; getting random unsupported
> characters from the user is not considered.
> 
> If the user has the locale set up in a way that matches his keyboard,
> it should work all fine - and will already, even without the PEP.
> If the user enters a character that doesn't directly map to a
> good file name, you get an exception, and have to tell the user
> to pick a different filename.
> 
> Notice that it may fail at several layers:
> - it may be that characters entered are not supported in what
>   Python chooses as the file system encoding.
> - it may be that the characters are not supported by the file
>   system, e.g. leading spaces in Win32.
> - it may be that the file cannot be renamed because the target
>   name already exists.
> In all these cases, the application has to ask the user to
> reconsider; for at least the last case, it should be prepared
> to do that, anyway (there is also the case where renaming fails
> because of lack of permissions; in that case, picking a different
> file name won't help).
> 
This has made me think about what happens going the other way, ie when a 
user-supplied Unicode string needs to be converted to UTF-8b. That 
should also be reversible.

Therefore:

When encoding using UTF-8b, codepoints in the range U+DC80..U+DCFF
should map to bytes 0x80..0xFF; all other codepoints, including the
remaining half surrogates, should be encoded normally.

When decoding using UTF-8b, undecodable bytes in the range 0x80..0xFF
should map to U+DC80..U+DCFF; all other bytes, including the encodings
for the remaining half surrogates, should be decoded normally.

This will ensure that even when the user has provided a string
containing half surrogates it can be encoded to bytes and then decoded
back to the original string.
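
The byte-to-string half of this mapping can be sketched as follows. The
sketch assumes a Python 3 in which the PEP's handler is available as
"surrogateescape" (the name it shipped under) and pairs it with the
plain utf-8 codec in place of a dedicated UTF-8b codec.

```python
# Assumption: "surrogateescape" stands in for the PEP's "python-escape".
raw = b"abc\xff.txt"  # 0xFF can never appear in valid UTF-8

# Decoding: the undecodable byte 0xFF becomes the half surrogate U+DCFF.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))

# Encoding: U+DCFF maps back to the byte 0xFF, restoring the input.
roundtrip = name.encode("utf-8", "surrogateescape")
print(roundtrip == raw)


def encodes(text):
    """Return True if text survives encoding with the escape handler."""
    try:
        text.encode("utf-8", "surrogateescape")
    except UnicodeEncodeError:
        return False
    return True


# A half surrogate outside U+DC80..U+DCFF, e.g. typed in by a user.
print(encodes("\ud800"))
```

With that handler the remaining half surrogates are rejected rather than
encoded normally, so the second half of the proposal above is not
covered by this sketch.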

From asmodai at in-nomine.org  Sat Apr 25 21:31:40 2009
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Sat, 25 Apr 2009 21:31:40 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
In-Reply-To: <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com>
References: <fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<F8D91625-9704-4760-9205-79A11C15CDE9@fuhm.net>
	<79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com>
Message-ID: <20090425193140.GV10900@nexus.in-nomine.org>

-On [20090425 11:01], Paul Moore (p.f.moore at gmail.com) wrote:
>PS Unfortunately, I suspect that the biggest group of people likely to
>be hit badly by this is people using non-latin scripts. And arguing
>probabilities without real data is optimistic at best. But those
>people are also the *least* likely people to contribute on an
>English-speaking list, I guess :-( (Sincere apologies if everyone but
>me on this list happens to actually be fluent English-speaking
>Russians :-))

Even though I am Dutch I have to deal with a variety of scripts for my i18n
and L10n efforts, which includes contributions to Unicode. Aside from that I
also have the fair share of audio files which have the names/descriptions in
the respective script (Thai, Korean, Chinese, Taiwanese, Japanese, and so
on).

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Necessity relieves us of the ordeal of choice...

From eric at trueblade.com  Sun Apr 26 03:28:17 2009
From: eric at trueblade.com (Eric Smith)
Date: Sat, 25 Apr 2009 21:28:17 -0400
Subject: [Python-Dev] [Python-checkins] r71946 - peps/trunk/pep-0315.txt
In-Reply-To: <20090426003437.05F081E4022@bag.python.org>
References: <20090426003437.05F081E4022@bag.python.org>
Message-ID: <49F3B8B1.6090707@trueblade.com>

You might want to note in the PEP that the problem that's being solved 
is known as the "loop and a half" problem.

http://www.cs.duke.edu/~ola/patterns/plopd/loops.html#loop-and-a-half

raymond.hettinger wrote:
> Author: raymond.hettinger
> Date: Sun Apr 26 02:34:36 2009
> New Revision: 71946
> 
> Log:
> Revive PEP 315.
> 
> Modified:
>    peps/trunk/pep-0315.txt
> 
> Modified: peps/trunk/pep-0315.txt
> ==============================================================================
> --- peps/trunk/pep-0315.txt	(original)
> +++ peps/trunk/pep-0315.txt	Sun Apr 26 02:34:36 2009
> @@ -2,9 +2,9 @@
>  Title: Enhanced While Loop
>  Version: $Revision$
>  Last-Modified: $Date$
> -Author: W Isaac Carroll <icarroll at pobox.com>
> -        Raymond Hettinger <python at rcn.com>
> -Status: Deferred
> +Author: Raymond Hettinger <python at rcn.com>
> +        W Isaac Carroll <icarroll at pobox.com>
> +Status: Draft
>  Type: Standards Track
>  Content-Type: text/plain
>  Created: 25-Apr-2003
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://mail.python.org/mailman/listinfo/python-checkins
> 

From cs at zip.com.au  Sun Apr 26 03:51:13 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Sun, 26 Apr 2009 11:51:13 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F2FD10.9080302@v.loewis.de>
Message-ID: <20090426015113.GA17300@cskk.homeip.net>

On 25Apr2009 14:07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
| Cameron Simpson wrote:
| > On 22Apr2009 08:50, Martin v. Löwis <martin at v.loewis.de> wrote:
| > | File names, environment variables, and command line arguments are
| > | defined as being character data in POSIX;
| > 
| > Specific citation please? I'd like to check the specifics of this.
| For example, on environment variables:
| http://opengroup.org/onlinepubs/007908799/xbd/envvar.html
[...]
| http://opengroup.org/onlinepubs/007908799/xsh/execve.html
[...]

Thanks.

| > So you're proposing that all POSIX OS interfaces (which use byte strings)
| > interpret those byte strings into Python3 str objects, with a codec
| > that will accept arbitrary byte sequences losslessly and is totally
| > reversible, yes?
| 
| Correct.
| 
| > And, I hope, that the os.* interfaces silently use it by default.
| 
| Correct.

Ok, then I'm probably good with the PEP. Though I have a quite strong
desire to be able to work in bytes at need without doing multiple
encode/decode steps.

| > | Applications that need to process the original byte
| > | strings can obtain them by encoding the character strings with the
| > | file system encoding, passing "python-escape" as the error handler
| > | name.
| > 
| > -1
| > This last sentence kills the idea for me, unless I'm missing something.
| > Which I may be, of course.
| > POSIX filesystems _do_not_ have a file system encoding.
| 
| Why is that a problem for the PEP?

Because you said above "by encoding the character strings with the file
system encoding", which is a fiction.

| > If I'm writing a general purpose UNIX tool like chmod or find, I expect
| > it to work reliably on _any_ UNIX pathname. It must be totally encoding
| > blind. If I speak to the os.* interface to open a file, I expect to hand
| > it bytes and have it behave.
| 
| See the other messages. If you want to do that, you can continue to.
| 
| > I'm very much in favour of being able to work in strings for most
| > purposes, but if I use the os.* interfaces on a UNIX system it is
| > necessary to be _able_ to work in bytes, because UNIX file pathnames
| > are bytes.
| 
| Please re-read the PEP. It provides a way of being able to access any
| POSIX file name correctly, and still pass strings.
| 
| > If there isn't a byte-safe os.* facility in Python3, it will simply be
| > unsuitable for writing low level UNIX tools.
| 
| Why is that? The mechanism in the PEP is precisely defined to allow
| writing low level UNIX tools.

Then implicitly it's byte safe. Clearly I'm being unclear; I mean
original OS-level byte strings must be obtainable undamaged, and it must
be possible to create/work on OS objects starting with a byte string as
the pathname.

| > Finally, I have a small python program whose whole purpose in life
| > is to transcode UNIX filenames before transfer to a MacOSX HFS
| > directory, because of HFS's enforced particular encoding. What approach
| > should a Python app take to transcode UNIX pathnames under your scheme?
| 
| Compute the corresponding character strings, and use them.

In Python2 I've been going (ignoring checks for unchanged names):

  - Obtain the old name and interpret it into a str() "correctly".
    I mean here that I go:
      unicode_name = unicode(name, srcencoding)
    in old Python2 speak. name is a bytes string obtained from listdir()
    and srcencoding is the encoding known to have been used when the old name
    was constructed. Eg iso8859-1.
  - Compute the new name in the desired encoding. For MacOSX HFS,
    that's:
      utf8_name = unicodedata.normalize('NFD',unicode_name).encode('utf8')
    Still in Python2 speak, that's a byte string.
  - os.rename(name, utf8_name)

Under your scheme I imagine this is amended. I would change your
listdir_b() function as follows:

  def listdir_b(bytestring, fse=None):
       if fse is None:
           fse = sys.getfilesystemencoding()
       string = bytestring.decode(fse, "python-escape")
       for fn in os.listdir(string):
           yield fn.encode(fse, "python-escape")

So, internally, os.listdir() takes a string and encodes it to an
_unspecified_ encoding in bytes, and opens the directory with that
byte string using POSIX opendir(3).

How does listdir() ensure that the byte string it passes to the underlying
opendir(3) is identical to 'bytestring' as passed to listdir_b()?

It seems from the PEP that "On POSIX systems, Python currently applies the
locale's encoding to convert the byte data to Unicode". Your extension
is to augment that by expressing the non-decodable byte sequences in a
non-conflicting way for reversal later, yes?

That seems to double the complexity of my example application, since
it wants to interpret the original bytes in a caller-specified fashion,
not using the locale defaults.

So I must go:

  def macify(dirname, srcencoding):
    # I need this to reverse your encoding scheme
    fse = sys.getfilesystemencoding()
    # I'll pretend dirname is ready for use
    # it possibly has had to undergo the inverse of what happens inside
    # the loop below
    for fn in os.listdir(dirname):
      # listdir reads POSIX-bytes from readdir(3)
      # then decodes using the locale encoding, with your escape addition
      # recover the original bytes
      bytename = fn.encode(fse, "python-escape")
      # interpret them in the caller-specified source encoding
      oldname = bytename.decode(srcencoding)
      newbytename = unicodedata.normalize('NFD', oldname).encode('utf8')
      newname = newbytename.decode(fse, "python-escape")
      if fn != newname:
        os.rename(fn, newname)

And I'm sure there's some os.path.join() complexity I have omitted.

Is that correct? You'll note I need to recode the oldname unicode string
because I don't know that fse is the same as the required target MacOSX
UTF8 NFD encoding.

So if my changes above are correct WRT the PEP, I grant that this
is still doable in your scheme. But it would be far far easier with a
bytes API. And let us not consider threads or other effects from locale
changes during the loop run.
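
For what it's worth, the per-name transcoding step could be sketched as
below. This assumes a Python 3 on a POSIX system that provides
os.fsencode() and os.fsdecode() as the two directions of the PEP's
escaping scheme; macify_name is a hypothetical helper, and the
os.path.join() and os.rename() handling is left out.

```python
import os
import unicodedata


def macify_name(fn, srcencoding):
    """Map a listdir()-style str name to its UTF-8 NFD equivalent.

    Assumes os.fsencode/os.fsdecode implement the escaping round trip.
    """
    raw = os.fsencode(fn)            # the original bytes, undamaged
    text = raw.decode(srcencoding)   # interpret them as the caller says
    nfd = unicodedata.normalize("NFD", text)
    # Hand back a str that the os.* interfaces will turn into the
    # desired UTF-8 bytes.
    return os.fsdecode(nfd.encode("utf-8"))


old = os.fsdecode(b"caf\xe9")        # a latin-1 name as listdir() might give it
new = macify_name(old, "iso8859-1")
print(os.fsencode(new))
```

The caller never needs to know the file system encoding, but every name
still goes through two encode/decode steps.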

I forget what was decided with the pure-bytes interfaces (out of scope
for your PEP). Would there be a posix module with a bytes API?

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

The old day of Perl's try-it-before-you-use-it are long as gone.  Nowadays
you can write as many as 20..100 lines of Perl without hitting a bug in the
perl implementation.    - Ilya Zakharevich <ilya at math.ohio-state.edu>,
                          in the perl-porters list, 22sep1998

From dickinsm at gmail.com  Sun Apr 26 12:06:56 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sun, 26 Apr 2009 11:06:56 +0100
Subject: [Python-Dev] Two proposed changes to float formatting
Message-ID: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>

I'd like to propose two minor changes to float and complex
formatting, for 3.1.  I don't think either change should prove
particularly disruptive.

(1) Currently, '%f' formatting automatically changes to '%g' formatting for
numbers larger than 1e50.  For example:

>>> '%f' % 2**166.
'93536104789177786765035829293842113257979682750464.000000'
>>> '%f' % 2**167.
'1.87072e+50'

I propose removing this feature for 3.1

More details: The current behaviour is documented (standard
library->builtin types).  (Until very recently, it was actually
misdocumented as changing at 1e25, not 1e50.)

"""For safety reasons, floating point precisions are clipped to 50; %f
conversions for numbers whose absolute value is over 1e50 are
replaced by %g conversions. [5] All other errors raise exceptions."""

There's even a footnote:

"""[5]	These numbers are fairly arbitrary. They are intended to
avoid printing endless strings of meaningless digits without
hampering correct use and without having to know the exact
precision of floating point values on a particular machine."""

I don't find this particularly convincing, though---I just don't see
a really good reason not to give the user exactly what she/he
asks for here.  I have a suspicion that at least part of the
motivation for the '%f' -> '%g' switch is that it means the
implementation can use a fixed-size buffer.  But Eric has
fixed this (in 3.1, at least) and the buffer is now dynamically
allocated, so this isn't a concern any more.

Other reasons not to switch from '%f' to '%g' in this way:

 - the change isn't gentle:  as you go over the 1e50 boundary,
   the number of significant digits produced suddenly changes
   from 56 to 6;  it would make more sense to me if it
   stayed fixed at 56 sig digits for numbers larger than 1e50.
 - now that we're using David Gay's 'perfect rounding'
   code, we can be sure that the digits aren't entirely
   meaningless, or at least that they're the 'right' meaningless
   digits.  This wasn't true before.
 - C doesn't do this, and the %f, %g, %e formats really
   owe their heritage to C.
 - float formatting is already quite complicated enough; no
   need to add to the mental complexity
 - removal simplifies the implementation :-)
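
As a sketch of what proposal (1) amounts to, assuming an interpreter in
which the '%f' -> '%g' switch has been removed, the second example above
would print every digit instead of falling back to an exponent:

```python
# Assumption: run on an interpreter without the '%f' -> '%g' switch.
x = 2.0 ** 167           # exactly representable, just above 1e50

text = "%f" % x
print(text)

# The digits before the point are exactly those of the integer 2**167,
# so nothing printed is arbitrary.
print(text == str(2 ** 167) + ".000000")
```

Since 2.0**167 is a power of two, the float holds the integer exactly,
which makes the comparison with str(2**167) a fair check.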

On to the second proposed change:

(2) complex str and repr don't behave like float str and repr, in that
the float version always adds a trailing '.0' (unless there's an
exponent), but the complex version doesn't:

>>> 4., 10.
(4.0, 10.0)
>>> 4. + 10.j
(4+10j)

I propose changing the complex str and repr to behave like the
float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
rather than "(4+10j)".

Mostly this is just about consistency, ease of implementation,
and aesthetics.  As far as I can tell, the extra '.0' in the float
repr serves two closely-related purposes:  it makes it clear to
the human reader that the number is a float rather than an
integer, and it makes sure that e.g., eval(repr(x)) recovers a
float rather than an int.  The latter point isn't a concern for
the current complex repr, but the former is:  4+10j looks to
me more like a Gaussian integer than a complex number.
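
A rough sketch of the proposed output, built from the float repr of each
part. float_style_complex_repr is a hypothetical helper for
illustration only, not the implementation, and it omits the spaces
around the sign.

```python
def float_style_complex_repr(z):
    """Format a complex number using float repr for both parts."""
    real = repr(z.real)
    imag = repr(z.imag)
    # repr() already supplies a leading '-' for negative imaginary parts.
    sign = "" if imag.startswith("-") else "+"
    return "(%s%s%sj)" % (real, sign, imag)


print(repr(4. + 10.j))                      # current behaviour
print(float_style_complex_repr(4. + 10.j))  # float-style output
print(float_style_complex_repr(1.5 - 2.j))
```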

Any comments?

Mark

From steve at pearwood.info  Sun Apr 26 13:17:57 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 26 Apr 2009 21:17:57 +1000
Subject: [Python-Dev] Two proposed changes to float formatting
In-Reply-To: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
Message-ID: <200904262117.58087.steve@pearwood.info>

On Sun, 26 Apr 2009 08:06:56 pm Mark Dickinson wrote:
> I'd like to propose two minor changes to float and complex
> formatting, for 3.1.  I don't think either change should prove
> particularly disruptive.
>
> (1) Currently, '%f' formatting automatically changes to '%g'
> formatting for numbers larger than 1e50.
...
> I propose removing this feature for 3.1

No objections from me. +1

> I propose changing the complex str and repr to behave like the
> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".

No objections here either. +0

-- 
Steven D'Aprano

From fuzzyman at voidspace.org.uk  Sun Apr 26 15:10:50 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sun, 26 Apr 2009 14:10:50 +0100
Subject: [Python-Dev] Two proposed changes to float formatting
In-Reply-To: <200904262117.58087.steve@pearwood.info>
References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
	<200904262117.58087.steve@pearwood.info>
Message-ID: <49F45D5A.1020401@voidspace.org.uk>

Steven D'Aprano wrote:
> On Sun, 26 Apr 2009 08:06:56 pm Mark Dickinson wrote:
>   
>> I'd like to propose two minor changes to float and complex
>> formatting, for 3.1.  I don't think either change should prove
>> particularly disruptive.
>>
>> (1) Currently, '%f' formatting automatically changes to '%g'
>> formatting for numbers larger than 1e50.
>>     
> ...
>   
>> I propose removing this feature for 3.1
>>     
>
> No objections from me. +1
>
>   
>> I propose changing the complex str and repr to behave like the
>> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
>> rather than "(4+10j)".
>>     
>
> No objections here either. +0
>
>
>
>   
Doing it sooner rather than later means that it is less likely to 
disrupt anyone relying on the representation (i.e. doctests).

Michael

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From stephen at xemacs.org  Sun Apr 26 15:47:44 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 26 Apr 2009 22:47:44 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
In-Reply-To: <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
Message-ID: <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>

Paul Moore writes:
 > 2009/4/24 Stephen J. Turnbull <stephen at xemacs.org>:
 > > Paul Moore writes:
 > >
 > >  > The pros for Martin's proposal are a uniform cross-platform interface,
 > >  > and a user-friendly API for the common case.
 > >
 > > A more accurate phrasing would be "... a user-friendly API for those
 > > who feel very lucky today."  Which is the common case, of course, but
 > > spins a little differently.
 > 
 > Sorry, but I think you're misrepresenting things. I'd have probably
 > let you off if you'd missed out the "very" - but I do think that it's
 > the common case. Consider:

If you need reliability, then you can't get it this way.  The reason
"very" is (somewhat) justified is that this kind of issue is a little
like unemployment.  You hardly ever meet someone who's 7.2%
unemployed, but you probably know several who are 100% unemployed.  If
you see a broken encoding once, you're likely to see it a million times
(spammers have the most broken software) or maybe have it raise an
unhandled Exception a dozen times (in rate of using busted software,
the spammers are closely followed by bosses---which would be very bad,
eh, if 2/3 of the mail from your boss ends up in an undeliverables
queue due to encoding errors that are unhandled by some filter in
your mail pipeline).

 > - Windows systems where broken Unicode (lone surrogates or whatever)
 > isn't involved
 > - Unix systems where the user's stated filesystem encoding is correct

 > Can you honestly say that this isn't the vast majority of real-world
 > environments?

Again, that's not the point.  The point is that six-sigma reliability
world-wide is not going to be very comforting to the poor souls who
happen to have broken software in their environment sending broken
encodings regularly, because they're going to be dealing with one or
two sigmas, and that's just not good enough in a production
environment.

 > > If you didn't start with a valid string in a known encoding, you
 > > shouldn't treat it as characters because it's not.
 > 
 > Again, that's the purist argument. If you have a string (of bytes, I
 > guess) and a 99% certain guess as to the correct encoding, then I'd
 > argue that, as long as (a) it's not mission-critical (lives or backups
 > depend on it)

Assurance that you can even determine (a) is not provided by the PEP.
There is no way to contain a problem if it should occur, because it's
"just a string" and could go anywhere, and get converted back or
otherwise manipulated in a context that doesn't know how to handle it
(which might not even be Python if a C-level extension is involved).
Given that Python has no internal mechanism for saying "in this area
only valid Unicode will be accepted", it seems likely that mission
critical software *will* interact with this feature, if only
indirectly (or perhaps only in software originally intended for use in
the U.S. only, but then it gets exported, etc).
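
A minimal sketch of how such a string can travel into code that does not
expect it.  It assumes Python 3.1 or later, where the accepted form of the
PEP spells the handler "surrogateescape" (the draft under discussion uses
other names); the file name bytes are made up for illustration.

```python
# Hypothetical file name: Latin-1 bytes met on a system whose locale says UTF-8.
raw = b"caf\xe9.txt"

# Decoding with the escape handler never fails; the bad byte becomes U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

# The result is an ordinary str, so nothing marks it as carrying an escape.
assert isinstance(name, str)

# Encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw

# Code that is unaware of the escape and encodes strictly fails later,
# possibly far from where the string was created.
try:
    name.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

The failure shows up at whatever point the string is next encoded strictly,
not at the point where the undecodable byte entered the program.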

 > and (b) you have a means of failing relatively
 > gracefully, you have every reason to make the assumption about
 > encoding.

(b) is not provided in the PEP, either.  We have no idea what the
failure mode will be.

 > After all, what's the alternative?

The alternative is to refuse to provide a simple standard way to
decode unreliably, and in that way make the user responsible for an
explicit choice about what level and kinds of unreliability they will
accept.
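
One way to read that alternative, as a sketch only: a decoding helper with
no default error policy, so every caller has to say what they will
tolerate.  The name decode_explicit and its parameters are hypothetical;
nothing like it is in the PEP or the standard library.

```python
def decode_explicit(raw, encoding, *, errors):
    """Decode raw bytes; the caller must name an error policy.

    'errors' is keyword-only and has no default, so leaving it out raises
    TypeError instead of silently picking a level of unreliability.
    """
    return raw.decode(encoding, errors)


raw = b"caf\xe9"  # made-up bytes that are not valid UTF-8

# This caller accepts data loss: the bad byte becomes U+FFFD.
lossy = decode_explicit(raw, "utf-8", errors="replace")

# This caller accepts failure and has to handle it.
try:
    decode_explicit(raw, "utf-8", errors="strict")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```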

I realize that's unpalatable to most people who use Python to develop
software, and so I'm unwilling to go even -0 on the PEP.  However, to
give one example, I've been following Mailman development for about 10
years, and it is a dismal story despite a group of developers very
sympathetic to encoding and multicultural issues.  As recently as
Mailman 2.10 (IIRC) there were *still* bugs in encoding handling that
could stop the show (ie, not only did the buggy post not get
processed, but the exception propagated high enough to cause
everything behind it in the queue to fail, too).  I think it would be
sad if ten years from now there was software using this technique and
failing occasionally.

From dickinsm at gmail.com  Sun Apr 26 15:45:47 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sun, 26 Apr 2009 14:45:47 +0100
Subject: [Python-Dev] Bug tracker down?
Message-ID: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com>

The bugs.python.org site seems to be down.  ping gives me
the following (from Ireland):

Macintosh-4:py3k dickinsm$ ping bugs.python.org
PING bugs.python.org (88.198.142.26): 56 data bytes
36 bytes from et.2.16.rs3k6.rz5.hetzner.de (213.239.244.101):
Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 77e1   0 0000  3a  01 603d 192.168.1.2  88.198.142.26

Various others on #python-dev have confirmed that it's not working for them.
Does anyone know what the problem is?

Mark

From aahz at pythoncraft.com  Sun Apr 26 17:19:46 2009
From: aahz at pythoncraft.com (Aahz)
Date: Sun, 26 Apr 2009 08:19:46 -0700
Subject: [Python-Dev] Bug tracker down?
In-Reply-To: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com>
References: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com>
Message-ID: <20090426151946.GB17459@panix.com>

On Sun, Apr 26, 2009, Mark Dickinson wrote:
>
> The bugs.python.org site seems to be down.  

Dunno -- forwarded to the people who can do something about it.  (There's
a migration to a new mailserver going on, but I don't think this is
related.)
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

From dickinsm at gmail.com  Sun Apr 26 18:35:30 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sun, 26 Apr 2009 17:35:30 +0100
Subject: [Python-Dev] Bug tracker down?
In-Reply-To: <20090426151946.GB17459@panix.com>
References: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com>
	<20090426151946.GB17459@panix.com>
Message-ID: <5c6f2a5d0904260935v6d08ebb8y49896f145889f49c@mail.gmail.com>

On Sun, Apr 26, 2009 at 4:19 PM, Aahz <aahz at pythoncraft.com> wrote:
> On Sun, Apr 26, 2009, Mark Dickinson wrote:
>>
>> The bugs.python.org site seems to be down.
>
> Dunno -- forwarded to the people who can do something about it.  (There's
> a migration to a new mailserver going on, but I don't think this is
> related.)

Thanks.  Who should I contact next time, to avoid spamming python-dev?

Mark

From aahz at pythoncraft.com  Sun Apr 26 18:36:48 2009
From: aahz at pythoncraft.com (Aahz)
Date: Sun, 26 Apr 2009 09:36:48 -0700
Subject: [Python-Dev] Bug tracker down?
In-Reply-To: <5c6f2a5d0904260935v6d08ebb8y49896f145889f49c@mail.gmail.com>
References: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com>
	<20090426151946.GB17459@panix.com>
	<5c6f2a5d0904260935v6d08ebb8y49896f145889f49c@mail.gmail.com>
Message-ID: <20090426163648.GA6892@panix.com>

On Sun, Apr 26, 2009, Mark Dickinson wrote:
> On Sun, Apr 26, 2009 at 4:19 PM, Aahz <aahz at pythoncraft.com> wrote:
>> On Sun, Apr 26, 2009, Mark Dickinson wrote:
>>>
>>> The bugs.python.org site seems to be down.
>>
>> Dunno -- forwarded to the people who can do something about it.  (There's
>> a migration to a new mailserver going on, but I don't think this is
>> related.)
> 
> Thanks.  Who should I contact next time, to avoid spamming python-dev?

python-dev isn't a bad place (because it alerts the core developers), but
you can also send a message to pydotorg at python.org
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

From eric at trueblade.com  Sun Apr 26 18:59:14 2009
From: eric at trueblade.com (Eric Smith)
Date: Sun, 26 Apr 2009 12:59:14 -0400
Subject: [Python-Dev] Two proposed changes to float formatting
In-Reply-To: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
Message-ID: <49F492E2.7010702@trueblade.com>

Mark Dickinson wrote:
> I'd like to propose two minor changes to float and complex
> formatting, for 3.1.  I don't think either change should prove
> particularly disruptive.
> 
> (1) Currently, '%f' formatting automatically changes to '%g' formatting for
> numbers larger than 1e50.  For example:
...
> I propose removing this feature for 3.1

I'm +1 on this.

> I have a suspicion that at least part of the
> motivation for the '%f' -> '%g' switch is that it means the
> implementation can use a fixed-size buffer.  But Eric has
> fixed this (in 3.1, at least) and the buffer is now dynamically
> allocated, so this isn't a concern any more.

I agree that this is a big part of the reason it was done. There's still 
some work to be done in the fallback code which we use if we can't use 
Gay's implementation of _Py_dg_dtoa. But it's reasonably easy to 
calculate the maximum buffer size needed given the precision, for 
passing on to PyOS_snprintf. (At least I think that sentence is true, 
I'll verify with Mark offline).

> Other reasons not to switch from '%f' to '%g' in this way:
> 
>  - the change isn't gentle:  as you go over the 1e50 boundary,
>    the number of significant digits produced suddenly changes
>    from 56 to 6;  it would make more sense to me if it
>    stayed fixed at 56 sig digits for numbers larger than 1e50.

This is the big reason for me.

>  - float formatting is already quite complicated enough; no
>    need to add to the mental complexity

And this, too.

> (2) complex str and repr don't behave like float str and repr, in that
> the float version always adds a trailing '.0' (unless there's an
> exponent), but the complex version doesn't:
...
> I propose changing the complex str and repr to behave like the
> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".

I'm +0.5 on this. I'd probably be +1 if I were a big complex user. Also, 
I'm not sure about the spaces around the sign. If we do want the spaces 
there, we can get rid of Py_DTSF_SIGN, since that's the only place it's 
used and we won't be able to use it for complex going forward.

Eric.

From dickinsm at gmail.com  Sun Apr 26 19:40:44 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sun, 26 Apr 2009 18:40:44 +0100
Subject: [Python-Dev] Two proposed changes to float formatting
In-Reply-To: <49F492E2.7010702@trueblade.com>
References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
	<49F492E2.7010702@trueblade.com>
Message-ID: <5c6f2a5d0904261040m4fbdcc14rd0f81c37ce4bf85b@mail.gmail.com>

On Sun, Apr 26, 2009 at 5:59 PM, Eric Smith <eric at trueblade.com> wrote:
> Mark Dickinson wrote:
>> I propose changing the complex str and repr to behave like the
>> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
>> rather than "(4+10j)".
>
> I'm +0.5 on this. I'd probably be +1 if I were a big complex user. Also, I'm
> not sure about the spaces around the sign. If we do want the spaces there,

Whoops.  The spaces were a mistake:  I'm not proposing to add those.
I meant "(4.0+10.0j)" rather than "(4.0 + 10.0j)".
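
For reference, a sketch of the form being proposed next to what CPython
prints.  proposed_complex_repr is a hypothetical helper written only for
this illustration; it is not an existing function and makes no attempt to
treat special values carefully.

```python
def proposed_complex_repr(z):
    """Hypothetical: build the complex repr from the float reprs of its parts."""
    real = repr(z.real)
    imag = repr(z.imag)
    # repr of a negative float already carries its own sign.
    sign = '' if imag.startswith('-') else '+'
    return '(%s%s%sj)' % (real, sign, imag)


z = 4. + 10.j

float_style = repr(4.)                     # floats keep the trailing '.0'
current = repr(z)                          # CPython prints '(4+10j)'
proposed = proposed_complex_repr(z)        # the proposal, without spaces
```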

Mark

From tseaver at palladion.com  Sun Apr 26 20:03:12 2009
From: tseaver at palladion.com (Tres Seaver)
Date: Sun, 26 Apr 2009 14:03:12 -0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <gstnpv$l4b$1@ger.gmane.org>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>	<49F18E90.9070801@nevcal.com>	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>	<20090424152746.GA9543@panix.com>	<loom.20090424T153112-15@post.gmane.org>	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	<loom.20090424T173704-587@post.gmane.org>	<87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp>	<49F215E5.4050205@g.nevcal.com>	<49F22E74.4070108@gmail.com>
	<gstnpv$l4b$1@ger.gmane.org>
Message-ID: <gt27l6$uib$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Terry Reedy wrote:

> Is NUL \0 allowed in POSIX file names?  If not, could that be used as an 
> escape char?  If it is not legal, then custom translated strings that
> escape in the wild would raise a red flag as soon as something else 
> tried to use them.

Per David Wheeler's excellent "Fixing Linux/Unix/POSIX Filenames"[1]:

 Traditionally, Unix/Linux/POSIX filenames can be almost any sequence
 of bytes, and their meaning is unassigned. The only real rules are that
 '/' is always the directory separator, and that filenames can't contain
 byte 0 (because this is the terminator).

[1] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
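
The NUL rule can be seen from the Python side as well.  A quick sketch,
assuming a recent CPython 3; the path is made up and no such file needs to
exist.

```python
import os

# A made-up path with an embedded NUL byte.
bad_path = "bad\0name"

try:
    os.stat(bad_path)
    rejected = False
except (ValueError, TypeError):
    # Recent CPython raises ValueError before any system call is made.
    # Catching TypeError as well is only a precaution for older releases.
    rejected = True
```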

Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJ9KHg+gerLs4ltQ4RAs0HAKCiAOxmB8oBJRIoOIK+OK2LryUN6ACgp64k
fzGUNScJwcdzzod3N+5JhOE=
=Cw4m
-----END PGP SIGNATURE-----

From martin at v.loewis.de  Sun Apr 26 21:03:00 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 26 Apr 2009 21:03:00 +0200
Subject: [Python-Dev] Bug tracker down?
In-Reply-To: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com>
References: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com>
Message-ID: <49F4AFE4.6040709@v.loewis.de>

> Does anyone know what the problem is?

The hardware running it apparently has serious problems.
Upfronthosting, the company providing the hardware, is
working on a solution. Unfortunately, it is difficult to
get support from the datacenter on weekends.

Regards,
Martin

From Scott.Daniels at Acm.Org  Sun Apr 26 21:11:36 2009
From: Scott.Daniels at Acm.Org (Scott David Daniels)
Date: Sun, 26 Apr 2009 12:11:36 -0700
Subject: [Python-Dev] Two proposed changes to float formatting
In-Reply-To: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
Message-ID: <gt2bce$9k3$1@ger.gmane.org>

Mark Dickinson wrote:
> ... """[5]	These numbers are fairly arbitrary. They are intended to
>    avoid printing endless strings of meaningless digits without
>    hampering correct use and without having to know the exact
>    precision of floating point values on a particular machine."""
> I don't find this particularly convincing, though---I just don't see
> a really good reason not to give the user exactly what she/he
> asks for here.
As a user of Idle, I would not like to see the change you seek of
having %f stay full-precision.  When a number gets too long to print
on a single line, the wrap depends on the current window width, and
is calculated dynamically.  One section of the display with an
8000-digit (100-line) text makes Idle slow to scroll around in.  It is
too easy for numbers to go massively positive in a bug.

>  - the change isn't gentle:  as you go over the 1e50 boundary,
>    the number of significant digits produced suddenly changes
>    from 56 to 6;  it would make more sense to me if it
>    stayed fixed at 56 sig digits for numbers larger than 1e50.
 >  - now that we're using David Gay's 'perfect rounding'
 >    code, we can be sure that the digits aren't entirely
 >    meaningless, or at least that they're the 'right' meaningless
 >    digits.  This wasn't true before.

However, this is, I agree, a problem.  Since all of these numbers
should end in a massive number of zeroes, how about we replace
only the trailing zeroes with the e, so we wind up with:
      1157920892373161954235709850086879078532699846656405640e+23
   or 115792089237316195423570985008687907853269984665640564.0e+24
or some such, rather than
      1.157920892373162e+77
   or 1.15792089237316195423570985008687907853269984665640564e+77

--Scott David Daniels
Scott.Daniels at Acm.Org

From dickinsm at gmail.com  Sun Apr 26 21:19:20 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sun, 26 Apr 2009 20:19:20 +0100
Subject: [Python-Dev] Two proposed changes to float formatting
In-Reply-To: <gt2bce$9k3$1@ger.gmane.org>
References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
	<gt2bce$9k3$1@ger.gmane.org>
Message-ID: <5c6f2a5d0904261219n1164e451j877a268f1e4a7c28@mail.gmail.com>

On Sun, Apr 26, 2009 at 8:11 PM, Scott David Daniels
<Scott.Daniels at acm.org> wrote:
> As a user of Idle, I would not like to see the change you seek of
> having %f stay full-precision.  When a number gets too long to print
> on a single line, the wrap depends on the current window width, and
> is calculated dynamically.  One section of the display with an
> 8000-digit (100-line) text makes Idle slow to scroll around in.  It is
> too easy for numbers to go massively positive in a bug.

I see your point.  Since we're talking about floats, though, there
should never be more than 316 characters in a '%f' % x: the
largest float is around 1.8e308, giving 308 digits before the
point, 6 after, a decimal point, and possibly a minus sign.
(Assuming that your platform uses IEEE 754 doubles.)
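
The bound is easy to measure rather than estimate.  A sketch, assuming
IEEE 754 doubles and Python 3.1 or later (where '%f' no longer switches to
'%g' for large values):

```python
import sys

# Largest finite double, about 1.8e308 on IEEE 754 platforms.
largest = sys.float_info.max

positive = '%f' % largest
negative = '%f' % -largest

# Digits before the point, then '.', then the six default fractional digits.
integer_digits = len(positive.split('.')[0])
print(integer_digits, len(positive), len(negative))
```

On such a platform this prints 309 316 317, so the longest '%f' output is
within a character or two of the estimate and stays a few hundred
characters at most.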

> However, this is, I agree, a problem.  Since all of these numbers
> should end in a massive number of zeroes

But they typically don't end in zeros (except the six zeros following
the point),
because they're stored in binary rather than decimal.  For example:

>>> int(1e308)
100000000000000001097906362944045541740492309677311846336810682903157585404911491537163328978494688899061249669721172515611590283743140088328307009198146046031271664502933027185697489699588559043338384466165001178426897626212945177628091195786707458122783970171784415105291802893207873272974885715430223118336

Mark

From allison at shasta.stanford.edu  Sun Apr 26 21:29:38 2009
From: allison at shasta.stanford.edu (Dennis Allison)
Date: Sun, 26 Apr 2009 12:29:38 -0700
Subject: [Python-Dev] float formatting
Message-ID: <200904261929.n3QJTcJY030239@shasta.stanford.edu>

Floating point printing is tricky, as I am sure you know.  You might
want to refresh your understanding by consulting the literature--I
know I would.  For example, you might want to look at 

http://portal.acm.org/citation.cfm?id=93559

Guy Steele's paper:

Guy L. Steele, Jon L. White, How to print floating-point numbers accurately, ACM SIGPLAN Notices, v.39 n.4, April 2004

is a classic and worthy of a read.

From tjreedy at udel.edu  Sun Apr 26 23:02:19 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 26 Apr 2009 17:02:19 -0400
Subject: [Python-Dev] Two proposed changes to float formatting
In-Reply-To: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
Message-ID: <gt2i4p$ovj$1@ger.gmane.org>

Mark Dickinson wrote:
> I'd like to propose two minor changes to float and complex
> formatting, for 3.1.  I don't think either change should prove
> particularly disruptive.
> 
> (1) Currently, '%f' formatting automatically changes to '%g' formatting for
> numbers larger than 1e50.  For example:
> 
>>>> '%f' % 2**166.
> '93536104789177786765035829293842113257979682750464.000000'
>>>> '%f' % 2**167.
> '1.87072e+50'
> 
> I propose removing this feature for 3.1
> 
> More details: The current behaviour is documented (standard
> library->builtin types).  (Until very recently, it was actually
> misdocumented as changing at 1e25, not 1e50.)
> 
> """For safety reasons, floating point precisions are clipped to 50; %f
> conversions for numbers whose absolute value is over 1e50 are
> replaced by %g conversions. [5] All other errors raise exceptions."""
> 
> There's even a footnote:
> 
> """[5]	These numbers are fairly arbitrary. They are intended to
> avoid printing endless strings of meaningless digits without
> hampering correct use and without having to know the exact
> precision of floating point values on a particular machine."""
> 
> I don't find this particularly convincing, though---I just don't see
> a really good reason not to give the user exactly what she/he
> asks for here.  I have a suspicion that at least part of the
> motivation for the '%f' -> '%g' switch is that it means the
> implementation can use a fixed-size buffer.  But Eric has
> fixed this (in 3.1, at least) and the buffer is now dynamically
> allocated, so this isn't a concern any more.
> 
> Other reasons not to switch from '%f' to '%g' in this way:
> 
>  - the change isn't gentle:  as you go over the 1e50 boundary,
>    the number of significant digits produced suddenly changes
>    from 56 to 6; 

Looking at your example, that jumped out at me as somewhat startling...

> it would make more sense to me if it
>    stayed fixed at 56 sig digits for numbers larger than 1e50.

So I agree with this, even if the default # of sig digits were less.
+1

>  - now that we're using David Gay's 'perfect rounding'
>    code, we can be sure that the digits aren't entirely
>    meaningless, or at least that they're the 'right' meaningless
>    digits.  This wasn't true before.
>  - C doesn't do this, and the %f, %g, %e formats really
>    owe their heritage to C.
>  - float formatting is already quite complicated enough; no
>    need to add to the mental complexity
>  - removal simplifies the implementation :-)
> 
> 
> On to the second proposed change:
> 
> (2) complex str and repr don't behave like float str and repr, in that
> the float version always adds a trailing '.0' (unless there's an
> exponent), but the complex version doesn't:
> 
>>>> 4., 10.
> (4.0, 10.0)
>>>> 4. + 10.j
> (4+10j)
> 
> I propose changing the complex str and repr to behave like the
> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".
> 
> Mostly this is just about consistency, ease of implementation,
> and aesthetics.  As far as I can tell, the extra '.0' in the float
> repr serves two closely-related purposes:  it makes it clear to
> the human reader that the number is a float rather than an
> integer, and it makes sure that e.g., eval(repr(x)) recovers a
> float rather than an int.  The latter point isn't a concern for
> the current complex repr, but the former is:  4+10j looks to
> me more like a Gaussian integer than a complex number.

I agree.  A complex is alternately an ordered pair of floats.  A 
different, number-theory oriented implementation of Python might even 
want to read 4+10j as a G. i.

tjr

From Scott.Daniels at Acm.Org  Sun Apr 26 23:42:20 2009
From: Scott.Daniels at Acm.Org (Scott David Daniels)
Date: Sun, 26 Apr 2009 14:42:20 -0700
Subject: [Python-Dev] Two proposed changes to float formatting
In-Reply-To: <5c6f2a5d0904261219n1164e451j877a268f1e4a7c28@mail.gmail.com>
References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>	
	<gt2bce$9k3$1@ger.gmane.org>
	<5c6f2a5d0904261219n1164e451j877a268f1e4a7c28@mail.gmail.com>
Message-ID: <49F4D53C.8070304@Acm.Org>

Mark Dickinson wrote:
 > On Sun, Apr 26, 2009 at 8:11 PM, Scott David Daniels
 > <Scott.Daniels at acm.org> wrote:
 >> As a user of Idle, I would not like to see the change you seek of
 >> having %f stay full-precision.  When a number gets too long to print
 >> on a single line, the wrap depends on the current window width, and
 >> is calculated dynamically.  One section of the display with an
 >> 8000-digit (100-line) text makes Idle slow to scroll around in.  It is
 >> too easy for numbers to go massively positive in a bug.
 >
I had also said (without explaining):
 > > only the trailing zeroes with the e, so we wind up with:
 > >      1157920892373161954235709850086879078532699846656405640e+23
 > >  or 115792089237316195423570985008687907853269984665640564.0e+24
 > >  or some such, rather than
 > >      1.157920892373162e+77
 > >  or 1.15792089237316195423570985008687907853269984665640564e+77
These are all possible representations for 2 ** 256.

 > I see your point.  Since we're talking about floats, thought, there
 > should never be more than 316 characters in a '%f' % x: the
 > largest float is around 1.8e308, giving 308 digits before the
 > point, 6 after, a decimal point, and possibly a minus sign.
 > (Assuming that your platform uses IEEE 754 doubles.)
You are correct that I had not thought long and hard about that.
308 is livable, if not desirable.  I was remembering accidentally
displaying the result of a factorial call.

 >> However, this is, I agree, a problem.  Since all of these numbers
 >> should end in a massive number of zeroes
 >
 > But they typically don't end in zeros (except the six zeros following
 > the point),
 > because they're stored in binary rather than decimal....
_but_ the printed decimal number I am proposing is within one ULP of
the value of the binary number.  That is, the majority of the digits
in int(1e308) are a fiction -- they could just as well be the digits of
int(1e308) + int(1e100) because 1e308 + 1e100 == 1e308
That is the sense in which I say those digits in decimal are zeroes.
My proposal was to have the integer part of the expansion be a
representation of the accuracy of the number in a visible form.
I chose the value I chose since a zero lies at the very end, and
tried to indicate I did not really care where trailing actual accuracy
zeros get taken off the representation.  The reason I don't care is
that the code from getting a floating point value is tricky, and I
suspect the printing code might not easily be able to distinguish
between a significant trailing zero and fictitious bits.
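
The 1e308 + 1e100 claim can be checked directly.  A sketch, assuming
IEEE 754 doubles and Python 3.9 or later for math.ulp:

```python
import math

x = 1e308

# Adding 1e100 changes nothing: it is far below the spacing between
# neighbouring doubles at this magnitude.
unchanged = (x + 1e100 == x)

# math.ulp gives that spacing; for 1e308 it is 2**971, roughly 2e292.
spacing = math.ulp(x)
print(unchanged, spacing)
```

Anything smaller than about half of that spacing is lost when added to
1e308, which is the sense in which the low-order decimal digits carry no
information about the computation that produced the float.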

--Scott David Daniels
Scott.Daniels at Acm.Org

From dickinsm at gmail.com  Mon Apr 27 00:35:00 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sun, 26 Apr 2009 23:35:00 +0100
Subject: [Python-Dev] Two proposed changes to float formatting
In-Reply-To: <49F4D53C.8070304@Acm.Org>
References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
	<gt2bce$9k3$1@ger.gmane.org>
	<5c6f2a5d0904261219n1164e451j877a268f1e4a7c28@mail.gmail.com>
	<49F4D53C.8070304@Acm.Org>
Message-ID: <5c6f2a5d0904261535m50781b00vfba17efb5aaf631f@mail.gmail.com>

On Sun, Apr 26, 2009 at 10:42 PM, Scott David Daniels
<Scott.Daniels at acm.org> wrote:

> I had also said (without explaining):
>> > only the trailing zeroes with the e, so we wind up with:
>> >      1157920892373161954235709850086879078532699846656405640e+23
>> >  or 115792089237316195423570985008687907853269984665640564.0e+24
>> >  or some such, rather than
>> >      1.157920892373162e+77
>> >  or 1.15792089237316195423570985008687907853269984665640564e+77
> These are all possible representations for 2 ** 256.

Understood.

> _but_ the printed decimal number I am proposing is within one ULP of
> the value of the binary number.

But there are plenty of ways to get this if this is what you want: if
you want a displayed result that's within 1 ulp (or 0.5 ulps, which
would be better) of the true value then repr should serve your needs.
If you want more control over the number of significant digits then
'%g' formatting gives that, together with a nice-looking output for
small numbers.

It's only '%f' formatting that I'm proposing changing: I see a
'%.2f' formatting request as a very specific, precise one: give me
exactly 2 digits after the point---no more, no less, and it seems
wrong and arbitrary that this request should be ignored for
numbers larger than 1e50 in absolute value.

That is, for general float formatting needs, use %g, str and repr.
%e and %f are for when you want fine control.

> That is, the majority of the digits
> in int(1e308) are a fiction

Not really: the float that Python stores has a very specific value,
and the '%f' formatting is showing exactly that value.  (Yes, I
know that some people advocate viewing a float as a range
of values rather than a specific value;  but I'm pretty sure that
that's not the way that the creators of IEEE 754 were thinking.)
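
A small sketch of what "a very specific value" means here, assuming
IEEE 754 doubles and Python 3.1 or later:

```python
from fractions import Fraction

x = 1e308

# Fraction(x) converts the stored binary value exactly, with no rounding.
exact = Fraction(x)

# The stored value is a whole number, and it is not 10**308.
is_whole = (exact.denominator == 1)
differs_from_power_of_ten = (exact != 10 ** 308)

# '%.0f' prints every digit of that stored value.
digits = '%.0f' % x
matches_exact_value = (digits == str(exact.numerator))
```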

> zeros get taken off the representation.  The reason I don't care is
> that the code from getting a floating point value is tricky, and I
> suspect the printing code might not easily be able to distinguish
> between a significant trailing zero and fictitious bits.

As of 3.1, the printing code should be fine:  it's using David
Gay's 'perfect rounding' code, so what's displayed should
be correctly rounded to the requested precision.

Mark

From python at rcn.com  Mon Apr 27 01:35:43 2009
From: python at rcn.com (Raymond Hettinger)
Date: Sun, 26 Apr 2009 16:35:43 -0700
Subject: [Python-Dev] Two proposed changes to float formatting
References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
	<gt2i4p$ovj$1@ger.gmane.org>
Message-ID: <200F596842DC4E0CA0305794F05B8905@RaymondLaptop1>

>> it would make more sense to me if it
>>    stayed fixed at 56 sig digits for numbers larger than 1e50.
> 
> So I agree with this, even if the default # of sig digits were less.

Several reasons to accept Mark's proposal:

* It matches what C does and many languages tend to copy the
   C standards with respect to format codes.  Matching other
   languages helps in porting code, copying algorithms, and mentally
   switching back and forth when working in multiple languages.

* When a programmer has chosen %f, that means that they have
   consciously rejected choosing %e or %g.  It is generally best to
   have the code do what the programmer asked for ;-)

* Code that tested well with 1e47, 1e48, 1e49, and 1e50
   suddenly shifts behavior with 1e51.  Behavior shifts like that
   are bug bait.

* The 56 significant digits may be rooted in the longest
   decimal expansion of a 53 bit float.  For example,
   len(str(Decimal.from_float(.1))) is 57 including the leading
   zero.   But not all machines (now, in the past, or in the future)
   use 53 bits for the significand.

* Use of exponents is common but not universal.  Some converters
   for SQL specs like Decimal(10,80) may not recognize the
   e-notation.  The xmlrpc spec only accepts decimal expansions
   not %e notation.

* The programmer needs to have some way to spell-out a
   decimal expansion when needed.   Currently, %f is the only way.
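
The Decimal.from_float figure in the list above can be reproduced as
follows; a sketch that assumes the usual 53-bit significand:

```python
import sys
from decimal import Decimal

# Exact decimal expansion of the double nearest to 0.1.
expansion = str(Decimal.from_float(0.1))

# '0.' followed by 55 digits when the significand has 53 bits.
print(len(expansion), sys.float_info.mant_dig)
```

On a platform with a different significand width both numbers would
change, which is the portability concern raised in that bullet.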

Raymond

From eric at trueblade.com  Mon Apr 27 01:42:51 2009
From: eric at trueblade.com (Eric Smith)
Date: Sun, 26 Apr 2009 19:42:51 -0400
Subject: [Python-Dev] Two proposed changes to float formatting
In-Reply-To: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>
Message-ID: <49F4F17B.9090807@trueblade.com>

Mark Dickinson wrote:
> (1) Currently, '%f' formatting automatically changes to '%g' formatting for
> numbers larger than 1e50.  For example:
> 
>>>> '%f' % 2**166.
> '93536104789177786765035829293842113257979682750464.000000'
>>>> '%f' % 2**167.
> '1.87072e+50'
> 
> I propose removing this feature for 3.1

I don't think we've stated it on this discussion, but I know from 
private email with Mark that his proposal is for both %-formatting and 
for float.__format__ to have this change. I just want to get it on the 
record here.
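
A sketch of the two spellings side by side, assuming Python 3.1 or later,
where neither one switches to '%g' for large values:

```python
x = 2.0 ** 167   # just above 1e50, the old switch-over point

percent_style = '%f' % x
format_style = format(x, 'f')      # goes through float.__format__
method_style = '{:f}'.format(x)    # str.format also uses float.__format__

# 2.0**167 is an exact power of two, so the full expansion is known.
expected = str(2 ** 167) + '.000000'
```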

Eric.

From agbauer at gmail.com  Mon Apr 27 02:59:54 2009
From: agbauer at gmail.com (Adrian)
Date: Mon, 27 Apr 2009 00:59:54 +0000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49EEBE2E.3090601@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
Message-ID: <5f1bf48e0904261759i54c730fdvbdc47e0f80aa0667@mail.gmail.com>

How about another str-like type, a sequence of char-or-bytes? Could be
called strbytes or stringwithinvalidcharacters. It would support
whatever subset of str functionality makes sense / is easy to
implement plus a to_escaped_str() method (that does the escaping the
PEP talks about) for people who want to use regexes or other str-only
stuff.

Here is a description by example:
os.listdir('.') -> [strbytes('normal_file'), strbytes('bad', 128, 'file')]
strbytes('a')[0] -> strbytes('a')
strbytes('bad', 128, 'file')[3] -> strbytes(128)
strbytes('bad', 128, 'file').to_escaped_str() -> 'bad?128file'

Having a separate type is cleaner than a "str that isn't exactly what
it represents". And making the escaping an explicit (but
rarely-needed) step would be less surprising for users. Anyway, I
don't know a whole lot about this issue so there may be an obvious reason
this is a bad idea.
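
The description by example above could be sketched roughly like this.
Everything here is hypothetical: the internal layout, the supported
operations, and in particular the escape format (a '?' marker followed by
the byte's decimal value, read literally from the example).

```python
class strbytes:
    """Sketch of the proposed type: a sequence of characters or raw bytes."""

    def __init__(self, *parts):
        items = []
        for part in parts:
            if isinstance(part, str):
                items.extend(part)          # one entry per character
            elif isinstance(part, int) and 0 <= part <= 255:
                items.append(part)          # one entry per undecodable byte
            else:
                raise TypeError("parts must be str or int in range(256)")
        self._items = tuple(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        picked = self._items[index]
        if isinstance(index, slice):
            return strbytes(*picked)
        return strbytes(picked)

    def __eq__(self, other):
        if not isinstance(other, strbytes):
            return NotImplemented
        return self._items == other._items

    def __hash__(self):
        return hash(self._items)

    def to_escaped_str(self, marker="?"):
        # Assumption: an escaped byte is the marker plus its decimal value.
        return "".join(
            item if isinstance(item, str) else "%s%d" % (marker, item)
            for item in self._items
        )


name = strbytes('bad', 128, 'file')
escaped = name.to_escaped_str()
```

Indexing returns another strbytes of length one, as in the example, so a
raw byte never turns into a str unless to_escaped_str() is called.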

On Wed, Apr 22, 2009 at 6:50 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I'm proposing the following PEP for inclusion into Python 3.1.
> Please comment.
>
> Regards,
> Martin
>
> PEP: 383
> Title: Non-decodable Bytes in System Character Interfaces
> Version: $Revision: 71793 $
> Last-Modified: $Date: 2009-04-22 08:42:06 +0200 (Mi, 22. Apr 2009) $
> Author: Martin v. Löwis <martin at v.loewis.de>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 22-Apr-2009
> Python-Version: 3.1
> Post-History:
>
> Abstract
> ========
>
> File names, environment variables, and command line arguments are
> defined as being character data in POSIX; the C APIs however allow
> passing arbitrary bytes - whether these conform to a certain encoding
> or not. This PEP proposes a means of dealing with such irregularities
> by embedding the bytes in character strings in such a way that allows
> recreation of the original byte string.
>
> Rationale
> =========
>
> The C char type is a data type that is commonly used to represent both
> character data and bytes. Certain POSIX interfaces are specified and
> widely understood as operating on character data, however, the system
> call interfaces make no assumption on the encoding of these data, and
> pass them on as-is. With Python 3, character strings use a
> Unicode-based internal representation, making it difficult to ignore
> the encoding of byte strings in the same way that the C interfaces can
> ignore the encoding.
>
> On the other hand, Microsoft Windows NT has corrected the original
> design limitation of Unix, and made it explicit in its system
> interfaces that these data (file names, environment variables, command
> line arguments) are indeed character data, by providing a
> Unicode-based API (keeping a C-char-based one for backwards
> compatibility).
>
> For Python 3, one proposed solution is to provide two sets of APIs: a
> byte-oriented one, and a character-oriented one, where the
> character-oriented one would be limited to not being able to represent
> all data accurately. Unfortunately, for Windows, the situation would
> be exactly the opposite: the byte-oriented interface cannot represent
> all data; only the character-oriented API can. As a consequence,
> libraries and applications that want to support all user data in a
> cross-platform manner have to accept a mish-mash of bytes and characters
> exactly in the way that caused endless troubles for Python 2.x.
>
> With this PEP, a uniform treatment of these data as characters becomes
> possible. The uniformity is achieved by using specific encoding
> algorithms, meaning that the data can be converted back to bytes on
> POSIX systems only if the same encoding is used.
>
> Specification
> =============
>
> On Windows, Python uses the wide character APIs to access
> character-oriented APIs, allowing direct conversion of the
> environmental data to Python str objects.
>
> On POSIX systems, Python currently applies the locale's encoding to
> convert the byte data to Unicode. If the locale's encoding is UTF-8,
> it can represent the full set of Unicode characters, otherwise, only a
> subset is representable. In the latter case, using private-use
> characters to represent these bytes would be an option. For UTF-8,
> doing so would create an ambiguity, as the private-use characters may
> regularly occur in the input also.
>
> To convert non-decodable bytes, a new error handler "python-escape" is
> introduced, which decodes non-decodable bytes into a private-use
> character U+F01xx, which is believed to not conflict with private-use
> characters that currently exist in Python codecs.
>
> The error handler interface is extended to allow the encode error
> handler to return byte strings immediately, in addition to returning
> Unicode strings which then get encoded again.
>
> If the locale's encoding is UTF-8, the file system encoding is set to
> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
>
> Discussion
> ==========
>
> While providing a uniform API to non-decodable bytes, this interface
> has the limitation that the chosen representation only "works" if the data
> get converted back to bytes with the python-escape error handler
> also. Encoding the data with the locale's encoding and the (default)
> strict error handler will raise an exception, encoding them with UTF-8
> will produce non-sensical data.
>
> For most applications, we assume that they eventually pass data
> received from a system interface back into the same system
> interfaces. For example, an application invoking os.listdir() will
> likely pass the result strings back into APIs like os.stat() or
> open(), which then encodes them back into their original byte
> representation. Applications that need to process the original byte
> strings can obtain them by encoding the character strings with the
> file system encoding, passing "python-escape" as the error handler
> name.
>
> Copyright
> =========
>
> This document has been placed in the public domain.

From Scott.Daniels at Acm.Org  Mon Apr 27 04:56:43 2009
From: Scott.Daniels at Acm.Org (Scott David Daniels)
Date: Sun, 26 Apr 2009 19:56:43 -0700
Subject: [Python-Dev] Two proposed changes to float formatting
In-Reply-To: <5c6f2a5d0904261535m50781b00vfba17efb5aaf631f@mail.gmail.com>
References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com>	
	<gt2bce$9k3$1@ger.gmane.org>	
	<5c6f2a5d0904261219n1164e451j877a268f1e4a7c28@mail.gmail.com>	
	<49F4D53C.8070304@Acm.Org>
	<5c6f2a5d0904261535m50781b00vfba17efb5aaf631f@mail.gmail.com>
Message-ID: <49F51EEB.7000508@Acm.Org>

Mark Dickinson wrote:
 > On Sun, Apr 26, 2009 at 10:42 PM, Scott David Daniels wrote:
 >...
 >> I had also said (without explaining):
 >>>> only the trailing zeroes with the e, so we wind up with:
 >>>>      1157920892373161954235709850086879078532699846656405640e+23
 >>>>  or 115792089237316195423570985008687907853269984665640564.0e+24
 >>>>  or some such, rather than
 >>>>      1.157920892373162e+77
 >>>>  or 1.15792089237316195423570985008687907853269984665640564e+77
 >> These are all possible representations for 2 ** 256.
 >
 > Understood.
 >
 >> _but_ the printed decimal number I am proposing is within one ULP of
 >> the value of the binary number.
 >
 > But there are plenty of ways to get this if this is what you want: if
 > you want a displayed result that's within 1 ulp (or 0.5 ulps, which
 > would be better) of the true value then repr should serve your needs.

The representation I am suggesting here is a half-way measure between
your proposal and the existing behavior.  This representation addresses
the abrupt transition that you point out (number of significant digits
drops precipitously) without particularly changing the goal of the
transition (displaying faux accuracy), without, in my (possibly naive)
view, seriously complicating either the print-generating code or the
issues for the reader of the output.

To wit, the proposal is (A) for numbers where the printed digits exceed
the accuracy presented, represent the result as an integer with an e+N,
rather than a number between 1 and 2-epsilon with an exponent that makes
you have to count digits to compare the two values, and (B) that the full
precision available in the value be shown in the representation.
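
A minimal sketch of what (A) could look like, assuming Python 3.1 or
later.  The helper name integer_mantissa and the default of 17
significant digits are illustrative assumptions, not part of the
proposal or of any existing API; the helper also does not handle
infinities or NaNs.

```python
def integer_mantissa(x, sig=17):
    """Format x as <integer digits>e<exponent> with `sig` significant digits.

    Hypothetical helper for illustration only; not an existing Python API.
    """
    # '%.16e' yields something like '1.1579208923731620e+77'.
    mantissa, exponent = ('%.*e' % (sig - 1, x)).split('e')
    digits = mantissa.replace('.', '')      # drop the decimal point
    # Removing the point shifts the exponent down by the digits after it.
    return '%se%+d' % (digits, int(exponent) - (sig - 1))


x = 2.0 ** 256
print(repr(x))              # normalized mantissa, shortest round-tripping form
print('%.0f' % x)           # every decimal digit of the exact binary value
print(integer_mantissa(x))  # integer mantissa followed by e+N
```

With 17 significant digits the integer-mantissa form still converts
back to the same float, so nothing is lost relative to repr.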

Given that everyone understands that is what I am proposing, I am OK
with the decision going where it will.  I am comforted that we are only
talking about four wrapped lines if we go to the full integer,
which I had not realized.  Further, I agree with you that there is an
abrupt transition in represented accuracy as we cross from %f to %g,
that should be somehow addressed.  You want to address it by continuing
to show digits, and I want to limit the digits shown to a value that
reflects the known accuracy.  I also want text that compares "smoothly"
with numbers near the transition (so that greater-than and less-than
relationships are obvious without thinking), hence the representation
that avoids the "normalized" mantissa.

Having said all this, I think my compromise position should be clear.
I did not mean to argue with you, but rather intended to propose a
possible middle way that some might find appealing.

--Scott David Daniels
Scott.Daniels at Acm.Org

From martin at v.loewis.de  Mon Apr 27 07:34:03 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 27 Apr 2009 07:34:03 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <5f1bf48e0904261759i54c730fdvbdc47e0f80aa0667@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<5f1bf48e0904261759i54c730fdvbdc47e0f80aa0667@mail.gmail.com>
Message-ID: <49F543CB.7000707@v.loewis.de>

> How about another str-like type, a sequence of char-or-bytes?

That would be a different PEP. I personally like my own proposal
more, but feel free to propose something different.

Regards,
Martin

From v+python at g.nevcal.com  Mon Apr 27 08:39:41 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Sun, 26 Apr 2009 23:39:41 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in	System		Character
 Interfaces
In-Reply-To: <49F30390.2040808@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>	<49F18E90.9070801@nevcal.com>	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>	<20090424152746.GA9543@panix.com>	<loom.20090424T153112-15@post.gmane.org>	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	<loom.20090424T173704-587@post.gmane.org>	<87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp>
	<49F215E5.4050205@g.nevcal.com> <49F30390.2040808@v.loewis.de>
Message-ID: <49F5532D.2090700@g.nevcal.com>

On approximately 4/25/2009 5:35 AM, came the following characters from 
the keyboard of Martin v. Löwis:
>> Because the encoding is not reliably reversible.
> 
> Why do you say that? The encoding is completely reversible
> (unless we disagree on what "reversible" means).
> 
>> I'm +1 on the concept, -1 on the PEP, due solely to the lack of a
>> reversible encoding.
> 
> Then please provide an example for a setup where it is not reversible.
> 
> Regards,
> Martin

It is reversible if you know that it is decoded, and apply the encoding.
But if you don't know that it has been encoded, then applying the reverse
transform can convert an undecoded str that matches the decoded str to
the form that it could have taken, but never did.

The problem is that there is no guarantee that the str interface 
provides only strictly conforming Unicode, so decoding bytes to 
non-strictly conforming Unicode, can result in a data pun between 
non-strictly conforming Unicode coming from the str interface vs bytes 
being decoded to non-strictly conforming Unicode coming from the bytes 
interface.

Any particular program that consistently uses one or the other
(bytes vs str) APIs under the covers might never be affected by such a 
data pun, but programs that may use both types of interface could 
potentially see a data pun.

If your PEP depends on consistent use of one or the other type of 
interface, you should say so, and if the platform only provides that 
type of interface, maybe all is well.  Both types of interfaces are
available on Windows; perhaps POSIX only provides native bytes
interfaces, and if the PEP is the only way to provide str interfaces,
then perhaps consistent use is required.

There are still issues regarding how Windows and POSIX programs that are 
sharing cross-mounted file systems might communicate file names between 
each other, which is not at all clear from the PEP.  If this is an 
insoluble or un-addressed issue, it should be stated.  (It is probably 
insoluble, due to there being multiple ways that the cross-mounted file 
systems might translate names; but if there are, can we learn something 
from the rules the mounting systems use, to be compatible with (one of) 
them, or not?)

Your change to avoid using PUA characters, together with the rule
suggested by MRAB in another branch of this thread of treating
half-surrogates as invalid byte sequences, may avoid the data puns I'm
concerned about.
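
A minimal sketch of why that rule removes the pun, assuming the
behaviour that later shipped in Python 3 under the error handler name
'surrogateescape' (the draft under discussion calls it "python-escape"
/ "utf-8b").  The byte values are made up for illustration.

```python
undecodable = b'\xe9'          # a lone Latin-1 byte, not valid UTF-8
lookalike = b'\xed\xb3\xa9'    # what a naive UTF-8 encoder emits for U+DCE9

a = undecodable.decode('utf-8', 'surrogateescape')
b = lookalike.decode('utf-8', 'surrogateescape')

# The decoder refuses to read the three bytes as one half-surrogate, so
# each of them is escaped on its own and the two inputs stay distinct.
print(ascii(a))
print(ascii(b))
print(a == b)

# Both strings encode back to exactly the bytes they came from.
print(a.encode('utf-8', 'surrogateescape') == undecodable)
print(b.encode('utf-8', 'surrogateescape') == lookalike)
```

This only covers strings that entered through the bytes-decoding
path; a str built by hand with lone surrogates is a separate question.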

It is not clear how half-surrogate characters would be displayed, when 
the user prints or displays such a file name string.  It would seem that 
programs that display file names to users might still have issues with 
such; an escaping mechanism that uses displayable characters would have 
an advantage there.
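
A minimal sketch of the display problem, again assuming the
'surrogateescape' and 'backslashreplace' error handlers as found in
released Python 3; the file name is made up.

```python
name = b'caf\xe9.txt'.decode('utf-8', 'surrogateescape')

try:
    # A strict encode is what writing to a UTF-8 terminal or log requires.
    name.encode('utf-8')
except UnicodeEncodeError as exc:
    print('cannot display as-is:', exc.reason)

# One way to show the name anyway: replace what cannot be encoded with
# a visible backslash escape.  The result is for display only.
shown = name.encode('utf-8', 'backslashreplace').decode('utf-8')
print(shown)    # caf\udce9.txt
```

The displayed form is not meant to be passed back to the file system;
the original str has to be kept for that.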

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From v+python at g.nevcal.com  Mon Apr 27 09:07:16 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 27 Apr 2009 00:07:16 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F30083.5050506@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de>
Message-ID: <49F559A4.8050400@g.nevcal.com>

On approximately 4/25/2009 5:22 AM, came the following characters from 
the keyboard of Martin v. Löwis:
>> The problem with this, and other preceding schemes that have been
>> discussed here, is that there is no means of ascertaining whether a
>> particular file name str was obtained from a str API, or was funny-
>> decoded from a bytes API... and thus, there is no means of reliably
>> ascertaining whether a particular filename str should be passed to a
>> str API, or funny-encoded back to bytes.
> 
> Why is it necessary that you are able to make this distinction?

It is necessary that programs (not me) can make the distinction, so that
they know whether or not to do the funny-encoding.  If a name is
funny-decoded when the name is accessed by a directory listing, it needs 
to be funny-encoded in order to open the file.
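
A minimal sketch of that round trip, using the codec directly rather
than a real directory.  It assumes the error handler name
'surrogateescape' under which the mechanism was eventually released;
the file name is made up.

```python
raw = b'caf\xe9.txt'      # Latin-1 bytes on disk, not valid UTF-8

# What a str-returning API such as os.listdir() would hand back.
name = raw.decode('utf-8', 'surrogateescape')
print(ascii(name))        # 'caf\udce9.txt'

# What a str-accepting API such as open() would have to do before
# calling into the operating system.
print(name.encode('utf-8', 'surrogateescape') == raw)
```

The program itself never has to tell the two kinds of str apart, as
long as every str-accepting system interface applies the same encode
step.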

>> Picking a character (I don't find U+F01xx in the
>> Unicode standard, so I don't know what it is)
> 
> It's a private use area. It will never carry an official character
> assignment.

I know that U+F0000 - U+FFFFF is a private use area.  I don't find a 
definition of U+F01xx to know what the notation means.  Are you picking 
a particular character within the private use area, or a particular 
range, or what?

>> As I realized in the email-sig, in talking about decoding corrupted
>> headers, there is only one way to guarantee this... to encode _all_
>> character sequences, from _all_ interfaces.  Basically it requires
>> reserving an escape character (I'll use ? in these examples -- yes, an
>> ASCII question mark -- happens to be illegal in Windows filenames so
>> all the better on that platform, but the specific character doesn't
>> matter... avoiding / \ and . is probably good, though).
> 
> I think you'll have to write an alternative PEP if you want to see
> something like this implemented throughout Python.

I'm certainly not experienced enough in Python development processes or 
internals to attempt such, as yet.  But somewhere in 25 years of 
programming, I picked up the knowledge that if you want to have a 1-to-1 
reversible mapping, you have to avoid data puns, mappings of two 
different data values into a single data value.  Your PEP, as first 
written, didn't seem to do that... since there are two interfaces from 
which to obtain data values, one performing a mapping from bytes to 
"funny invalid" Unicode, and the other performing no mapping, but 
accepting any sort of Unicode, possibly including "funny invalid" 
Unicode, the possibility of data puns seems to exist.  I may be 
misunderstanding something about the use cases that prevent these two 
sources of "funny invalid" Unicode from ever coexisting, but if so, 
perhaps you could point it out, or clarify the PEP.  I'll try to reread 
it again... could you post a URL to the most up-to-date version of the 
PEP, since I haven't seen such appear here, and the version I found via 
a Google search seems to be the original?

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From cs at zip.com.au  Mon Apr 27 09:55:49 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Mon, 27 Apr 2009 17:55:49 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F5532D.2090700@g.nevcal.com>
Message-ID: <20090427075549.GA4418@cskk.homeip.net>

On 26Apr2009 23:39, Glenn Linderman <v+python at g.nevcal.com> wrote:
[...snip...]
> There are still issues regarding how Windows and POSIX programs that are  
> sharing cross-mounted file systems might communicate file names between  
> each other, which is not at all clear from the PEP.  If this is an  
> insoluble or un-addressed issue, it should be stated.  (It is probably  
> insoluble, due to there being multiple ways that the cross-mounted file  
> systems might translate names; but if there are, can we learn something  
> from the rules the mounting systems use, to be compatible with (one of)  
> them, or not.

I'd say that's out of scope. A windows filesystem mounted on a UNIX host
should probably be mounted with a mapping to translate the Windows
Unicode names into whatever the sysadmin deems the locally most apt
byte encoding. But sys.getfilesystemencoding() is based on the current user's
locale settings, which need not be the same.
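
A quick way to see which settings are in play on a given machine;
sys.getfilesystemencodeerrors() assumes Python 3.6 or later, and the
values printed depend entirely on the platform and environment.

```python
import locale
import sys

# Encoding Python uses to convert file names between str and bytes.
print(sys.getfilesystemencoding())

# Error handler paired with that encoding (Python 3.6+).
print(sys.getfilesystemencodeerrors())

# Encoding derived from the current user's locale settings.
print(locale.getpreferredencoding(False))
```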

> Together with your change to avoid using PUA characters, and the rule  
> suggested by MRAB in another branch of this thread, of treating  
> half-surrogates as invalid byte sequences may avoid the data puns I'm  
> concerned about.
>
> It is not clear how half-surrogate characters would be displayed, when  
> the user prints or displays such a file name string.  It would seem that  
> programs that display file names to users might still have issues with  
> such; an escaping mechanism that uses displayable characters would have  
> an advantage there.

Wouldn't any escaping mechanism that uses displayable characters
require visually mangling occurrences of those characters that
legitimately occur in the original?
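
A minimal sketch of such a scheme, to make the mangling concrete.  It
assumes '?' as the escape character, '?XX' for an undecodable byte
and '??' for a literal question mark; the function names and the
exact format are hypothetical, not something proposed in the thread.

```python
def visible_escape(raw):
    """Decode bytes to str, writing undecodable bytes as '?XX'."""
    out = []
    for ch in raw.decode('utf-8', 'surrogateescape'):
        if ch == '?':
            out.append('??')                    # a legitimate '?' gets doubled
        elif '\udc80' <= ch <= '\udcff':
            out.append('?%02X' % (ord(ch) - 0xDC00))
        else:
            out.append(ch)
    return ''.join(out)


def visible_unescape(text):
    """Reverse visible_escape(), assuming well-formed escaped input."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != '?':
            out += text[i].encode('utf-8')
            i += 1
        elif text[i + 1] == '?':
            out += b'?'
            i += 2
        else:
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
    return bytes(out)


print(visible_escape(b'what?\xe9'))    # what???E9
```

The mapping is reversible, but a name that really contains '?' no
longer looks the way it does on disk.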
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

From v+python at g.nevcal.com  Mon Apr 27 10:40:43 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 27 Apr 2009 01:40:43 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <20090427075549.GA4418@cskk.homeip.net>
References: <20090427075549.GA4418@cskk.homeip.net>
Message-ID: <49F56F8B.7030108@g.nevcal.com>

On approximately 4/27/2009 12:55 AM, came the following characters from 
the keyboard of Cameron Simpson:
> On 26Apr2009 23:39, Glenn Linderman <v+python at g.nevcal.com> wrote:
> [...snip...]
>   
>> There are still issues regarding how Windows and POSIX programs that are  
>> sharing cross-mounted file systems might communicate file names between  
>> each other, which is not at all clear from the PEP.  If this is an  
>> insoluble or un-addressed issue, it should be stated.  (It is probably  
>> insoluble, due to there being multiple ways that the cross-mounted file  
>> systems might translate names; but if there are, can we learn something  
>> from the rules the mounting systems use, to be compatible with (one of)  
>> them, or not.
>>     
>
> I'd say that's out of scope. A windows filesystem mounted on a UNIX host
> should probably be mounted with a mapping to translate the Windows
> Unicode names into whatever the sysadmin deems the locally most apt
> byte encoding. But sys.getfilesystemencoding() is based on the current user's
> locale settings, which need not be the same.
>   

And if it were, what would it do with files that can't be encoded with 
the locally most apt byte encoding?  That's where we might learn 
something about what behaviors are deemed acceptable.  Would such files 
be inaccessible?  Accessible with mangled names?  or what?

And for a Unix filesystem mounted on a Windows host?  Or accessed via 
some network connection?

>> Together with your change to avoid using PUA characters, and the rule  
>> suggested by MRAB in another branch of this thread, of treating  
>> half-surrogates as invalid byte sequences may avoid the data puns I'm  
>> concerned about.
>>
>> It is not clear how half-surrogate characters would be displayed, when  
>> the user prints or displays such a file name string.  It would seem that  
>> programs that display file names to users might still have issues with  
>> such; an escaping mechanism that uses displayable characters would have  
>> an advantage there.
>>     
>
> Wouldn't any escaping mechanism that uses displayable characters
> require visually mangling occurrences of those characters that
> legitimately occur in the original?
>   

Yes.  My suggested use of ? is a visible character that is illegal in 
Windows file names, thus causing no valid Windows file names to be 
visually mangled.  It is also a character that should be avoided in 
POSIX names because:

1) it is known to be illegal on Windows, and thus non-portable
2) it is hard to write globs that match ? without allowing matches of 
other characters as well
3) it must be quoted to specify it on a command line

That said, someone provided a case where it is "easy" to get ? in POSIX 
file names.  The remaining question is whether that is a reasonable use 
case, a frequent use case, or a stupid use case; and whether the 
resulting visible mangling is more or less understandable and disruptive 
than using half-surrogates which are:

1) invalid Unicode
2) non-displayable
3) indistinguishable using normal non-displayable character substitution 
rules

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From rdmurray at bitdance.com  Mon Apr 27 11:32:42 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Mon, 27 Apr 2009 05:32:42 -0400 (EDT)
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F56F8B.7030108@g.nevcal.com>
References: <20090427075549.GA4418@cskk.homeip.net>
	<49F56F8B.7030108@g.nevcal.com>
Message-ID: <Pine.LNX.4.64.0904270528260.1740@kimball.webabinitio.net>

On Mon, 27 Apr 2009 at 01:40, Glenn Linderman wrote:
> Yes.  My suggested use of ? is a visible character that is illegal in Windows 
> file names, thus causing no valid Windows file names to be visually mangled. 
> It is also a character that should be avoided in POSIX names because:
>
> 1) it is known to be illegal on Windows, and thus non-portable
> 2) it is hard to write globs that match ? without allowing matches of other 
> characters as well
> 3) it must be quoted to specify it on a command line
>
> That said, someone provided a case where it is "easy" to get ? in POSIX file 
> names.  The remaining question is whether that is a reasonable use case, a 
> frequent use case, or a stupid use case; and whether the resulting visible

Reasonable I don't know, but frequent (FSDO frequent) and out of
our control yes.  It happens often when downloading files with wget,
for example.

--David

From solipsis at pitrou.net  Mon Apr 27 13:29:14 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 27 Apr 2009 11:29:14 +0000 (UTC)
Subject: [Python-Dev]
	=?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System?=
	=?utf-8?q?=09Character=09Interfaces?=
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <loom.20090427T111118-510@post.gmane.org>

Stephen J. Turnbull <stephen <at> xemacs.org> writes:
> 
> If
> you see a broken encoding once, you're likely to see it a million times
> (spammers have the most broken software) or maybe have it raise an
> unhandled Exception a dozen times (in rate of using busted software,
> the spammers are closely followed by bosses---which would be very bad,
> eh, if 2/3 of the mail from your boss ends up in an undeliverables
> queue due to encoding errors that are unhandled by some filter in
> your mail pipeline).

I'm not sure how mail being stuck in a pipeline has anything to do with Martin's
proposal (which deals with file paths, not with SMTP...).
Besides, I don't care about spammers and their broken software.

> Again, that's not the point.  The point is that six-sigma reliability
> world-wide is not going to be very comforting to the poor souls who
> happen to have broken software in their environment sending broken
> encodings regularly, because they're going to be dealing with one or
> two sigmas, and that's just not good enough in a production
> environment.

So you're arguing that any solution which isn't 100% perfect but only
99.999% perfect shouldn't be implemented at all, leaving the status quo at
98%? This sounds disturbing to me.

(especially given you probably sent this mail using TCP/IP...)

Regards

Antoine.

From dd at crosstwine.com  Mon Apr 27 16:25:47 2009
From: dd at crosstwine.com (Damien Diederen)
Date: Mon, 27 Apr 2009 16:25:47 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <loom.20090408T110540-221@post.gmane.org> (Antoine Pitrou's
	message of "Wed, 8 Apr 2009 11:10:21 +0000 (UTC)")
References: <loom.20090408T110540-221@post.gmane.org>
Message-ID: <87k556dvh0.fsf@keem.bcc>

Hello,

Antoine Pitrou <solipsis at pitrou.net> writes:
> Hello,
>
> We're in the process of forward-porting the recent (massive) json
> updates to 3.1, and we are also thinking of dropping remnants of
> support of the bytes type in the json library (in 3.1, again). This
> bytes support almost didn't work at all, but there was a lot of C and
> Python code for it nevertheless. We're also thinking of dropping the
> "encoding" argument in the various APIs, since it is useless.

I had a quick look into the module on both branches, and at Antoine's
latest patch (json_py3k-3).  The current situation on trunk is indeed
not very pretty in terms of code duplication, and I agree it would be
nice not to carry that forward.

I couldn't figure out a way to get rid of it short of multi-#including
"templates" and playing with the C preprocessor, however, and have the
nagging feeling the latter would be frowned upon by the maintainers.

There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm
wrong about that.  Should I give it a try, and see how "clean" the
result can be made?

> Under the new situation, json would only ever allow str as input, and
> output str as well. By posting here, I want to know whether anybody
> would oppose this (knowing, once again, that bytes support is already
> broken in the current py3k trunk).

Provided one of the alternatives is dropped, wouldn't it be better to do
the opposite, i.e., have the decoder take bytes as input, and the
encoder produce bytes -- and layer the str functionality on top of that?  I
guess the answer depends on how the (most common) lower layers are
structured, but it would be nice to allow a straight bytes path to/from
the underlying transport.

(I'm willing to have a go at the conversion in case somebody is
interested.)

Bob, would you have an idea of which lower layers are most commonly used
with the json module, and whether people are more likely to expect strs
or bytes in Python 3.x?  Maybe that data could be inferred from some bug
tracking system?

> The bug entry is: http://bugs.python.org/issue4136
>
> Regards
> Antoine.

Regards,
Damien

-- 
http://crosstwine.com

"Strong Opinions, Weakly Held"
                 -- Bob Johansen

From eric at trueblade.com  Mon Apr 27 17:03:21 2009
From: eric at trueblade.com (Eric Smith)
Date: Mon, 27 Apr 2009 11:03:21 -0400
Subject: [Python-Dev] Windows buildbots failing test_types in trunk
Message-ID: <49F5C939.6020802@trueblade.com>

Mark Dickinson pointed out to me that the trunk buildbots are failing 
under Windows.

After some analysis, I think this is because of a change I made to use 
_toupper in integer formatting. The correct solution to this is to 
implement issue 5793 to come up with a working, cross-platform, 
locale-unaware set of functions and/or macros for isdigit / isupper / 
toupper, etc.

I'll work on this tonight or tomorrow, at which point the Windows 
buildbots should turn green.

I don't think this affects py3k, although I'll port it there before the 
beta release.

Eric.

From eric at trueblade.com  Mon Apr 27 17:05:04 2009
From: eric at trueblade.com (Eric Smith)
Date: Mon, 27 Apr 2009 11:05:04 -0400 (EDT)
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <87k556dvh0.fsf@keem.bcc>
References: <loom.20090408T110540-221@post.gmane.org> <87k556dvh0.fsf@keem.bcc>
Message-ID: <26274.63.251.87.214.1240844704.squirrel@mail.trueblade.com>

> I couldn't figure out a way to get rid of it short of multi-#including
> "templates" and playing with the C preprocessor, however, and have the
> nagging feeling the latter would be frowned upon by the maintainers.

Not sure if this is exactly what you mean, but look at Objects/stringlib.
str.format() and unicode.format() share the same implementation, using
stringdefs.h and unicodedefs.h.

Eric.

From bob at redivi.com  Mon Apr 27 17:07:04 2009
From: bob at redivi.com (Bob Ippolito)
Date: Mon, 27 Apr 2009 08:07:04 -0700
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <87k556dvh0.fsf@keem.bcc>
References: <loom.20090408T110540-221@post.gmane.org> <87k556dvh0.fsf@keem.bcc>
Message-ID: <6a36e7290904270807kbe9ac4y90c7078393e1a393@mail.gmail.com>

On Mon, Apr 27, 2009 at 7:25 AM, Damien Diederen <dd at crosstwine.com> wrote:
>
> Antoine Pitrou <solipsis at pitrou.net> writes:
>> Hello,
>>
>> We're in the process of forward-porting the recent (massive) json
>> updates to 3.1, and we are also thinking of dropping remnants of
>> support of the bytes type in the json library (in 3.1, again). This
>> bytes support almost didn't work at all, but there was a lot of C and
>> Python code for it nevertheless. We're also thinking of dropping the
>> "encoding" argument in the various APIs, since it is useless.
>
> I had a quick look into the module on both branches, and at Antoine's
> latest patch (json_py3k-3).  The current situation on trunk is indeed
> not very pretty in terms of code duplication, and I agree it would be
> nice not to carry that forward.
>
> I couldn't figure out a way to get rid of it short of multi-#including
> "templates" and playing with the C preprocessor, however, and have the
> nagging feeling the latter would be frowned upon by the maintainers.
>
> There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm
> wrong about that.  Should I give it a try, and see how "clean" the
> result can be made?
>
>> Under the new situation, json would only ever allow str as input, and
>> output str as well. By posting here, I want to know whether anybody
>> would oppose this (knowing, once again, that bytes support is already
>> broken in the current py3k trunk).
>
> Provided one of the alternatives is dropped, wouldn't it be better to do
> the opposite, i.e., have the decoder take bytes as input, and the
> encoder produce bytes -- and layer the str functionality on top of that?  I
> guess the answer depends on how the (most common) lower layers are
> structured, but it would be nice to allow a straight bytes path to/from
> the underlying transport.
>
> (I'm willing to have a go at the conversion in case somebody is
> interested.)
>
> Bob, would you have an idea of which lower layers are most commonly used
> with the json module, and whether people are more likely to expect strs
> or bytes in Python 3.x?  Maybe that data could be inferred from some bug
> tracking system?

I don't know what Python 3.x users expect. As far as I know, none of
the lower layers of the json package are used directly. They're
certainly not supposed to be or documented as such.

My use case for dumps is typically bytes output because we push it
straight to and from IO. Some people embed JSON in other documents
(e.g. HTML) where you would want it to be text. I'm pretty sure that
the IO case is more common.
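
A minimal sketch of the IO case under the str-only API being
discussed, assuming UTF-8 on the wire; the payload is made up.

```python
import json

payload = {'name': 'caf\u00e9', 'sizes': [1, 2, 3]}

# dumps() produces text; the caller encodes it before writing to a
# socket or a binary file.
text = json.dumps(payload)
wire = text.encode('utf-8')

# On the way back, the caller decodes the bytes before parsing.
received = json.loads(wire.decode('utf-8'))
print(received == payload)
```

Embedding in HTML or another text document would use the str result
directly and skip the encode step.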

-bob

From dd at crosstwine.com  Mon Apr 27 17:22:32 2009
From: dd at crosstwine.com (Damien Diederen)
Date: Mon, 27 Apr 2009 17:22:32 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <26274.63.251.87.214.1240844704.squirrel@mail.trueblade.com>
	(Eric Smith's message of "Mon, 27 Apr 2009 11:05:04 -0400 (EDT)")
References: <loom.20090408T110540-221@post.gmane.org> <87k556dvh0.fsf@keem.bcc>
	<26274.63.251.87.214.1240844704.squirrel@mail.trueblade.com>
Message-ID: <87y6tmce9z.fsf@keem.bcc>

Hi Eric,

"Eric Smith" <eric at trueblade.com> writes:
>> I couldn't figure out a way to get rid of it short of multi-#including
>> "templates" and playing with the C preprocessor, however, and have the
>> nagging feeling the latter would be frowned upon by the maintainers.
>
> Not sure if this is exactly what you mean, but look at Objects/stringlib.
> str.format() and unicode.format() share the same implementation, using
> stringdefs.h and unicodedefs.h.

That's indeed a much better example!  I'm more comfortable applying the
same technique to the json module now that I see it used in the core.

(Provided Bob and Antoine are not turned away by the relative ugliness,
that is.)

> Eric.

Cheers,
Damien

--
http://crosstwine.com

"Strong Opinions, Weakly Held"
		 -- Bob Johansen

From solipsis at pitrou.net  Mon Apr 27 17:24:29 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 27 Apr 2009 15:24:29 +0000 (UTC)
Subject: [Python-Dev] Dropping bytes "support" in json
References: <loom.20090408T110540-221@post.gmane.org> <87k556dvh0.fsf@keem.bcc>
Message-ID: <loom.20090427T151353-398@post.gmane.org>

Damien Diederen <dd <at> crosstwine.com> writes:
> 
> I couldn't figure out a way to get rid of it short of multi-#including
> "templates" and playing with the C preprocessor, however, and have the
> nagging feeling the latter would be frowned upon by the maintainers.
> 
> There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm
> wrong about that.  Should I give it a try, and see how "clean" the
> result can be made?

Keep in mind that json is externally maintained by Bob. The more we rework his
code, the less easy it will be to backport other changes from the simplejson
library.

I think we should either keep the code duplication (if we want to keep fast
paths for both bytes and str objects), or only keep one of the two versions as
my patch does.

> Provided one of the alternatives is dropped, wouldn't it be better to do
> the opposite, i.e., have the decoder take bytes as input, and the
> encoder produce bytes, and layer the str functionality on top of that?  I
> guess the answer depends on how the (most common) lower layers are
> structured, but it would be nice to allow a straight bytes path to/from
> the underlying transport.

The straightest path is actually to/from unicode, since JSON data can contain
unicode strings but no byte strings. Also, the json library /has/ to output
unicode when `ensure_ascii` is False. In 2.x:

>>> json.dumps([u"éléphant"], ensure_ascii=False)
u'["\xe9l\xe9phant"]'

In any case, I don't think it will matter much in terms of speed whether we take
one route or the other. UTF-8 encoding/decoding is probably much faster (in
characters per second) than JSON encoding/decoding is.
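
A minimal sketch of the same point on Python 3, where the result is always
str: with the default `ensure_ascii=True` the text is pure ASCII with
`\uXXXX` escapes, and with `ensure_ascii=False` it carries the non-ASCII
characters directly. The final UTF-8 encode step is an assumption about what
a caller pushing to a byte transport would do, not something the json module
does itself.

```python
import json

data = ["\u00e9l\u00e9phant"]  # same one-element list as in the 2.x session

escaped = json.dumps(data)                      # default: ensure_ascii=True
literal = json.dumps(data, ensure_ascii=False)  # keeps the accented letters

# ascii() keeps the output printable whatever the console encoding is.
print(type(escaped).__name__, ascii(escaped))
print(type(literal).__name__, ascii(literal))

# Assumed caller-side step: encode explicitly before writing to a socket/file.
wire = literal.encode("utf-8")
print(wire)
```

Either way the encoder hands back text, and the choice of byte encoding stays
with the caller.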

Regards

Antoine.

From stephen at xemacs.org  Mon Apr 27 17:47:05 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 28 Apr 2009 00:47:05 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <loom.20090427T111118-510@post.gmane.org>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
Message-ID: <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>

Antoine Pitrou writes:

 > I'm not sure how mail being stuck in a pipeline has anything to do
 > with Martin's proposal (which deals with file paths, not with
 > SMTP...).

I hate to break it to you, but most stages of mail processing have
very little to do with SMTP.  In particular, processing MIME
attachments often requires dealing with file names.  Would practical
problems arise?  I expect they would.  Can I tell you what they are?
No; if I could I'd write a better PEP.  I'm just saying that my
experience is that Murphy's Law applies more to encoding processing
than any other area of software I've worked in (admittedly, I don't do
threads ;-).

 > Besides, I don't care about spammers and their broken software.

That's precisely my point.  The PEP's "solution" will be very
appealing to people who just don't care as long as it works for them,
in the subset of corner cases they happen to encounter.  A lot of
software, including low-level components, will be written using these
APIs, and they will result in escapes of uninterpreted bytes (encoded
as Unicode) into the textual world.

 > So you're arguing that whatever solution which isn't 100% perfect
 > but only 99.999% perfect shouldn't be implemented at all, and leave
 > the status quo at 98%?

No, I'm not talking about "whatever solution".  I'm only arguing about
PEP 383.  The point is that Martin's proposal is not just a solution
to the problem he posed.  It's also going to be the one obvious way to
make the usual mistakes, i.e., the return values will escape into code
paths they're not intended for.  And the APIs won't be killable until
Python 4000.  If we find a better way (which I think Python 3's move
to "text is Unicode" is likely to inspire!), we'll have to wait 10-15
years or more before it becomes the OOWTDI.  The only real hope about
that is that Unicode will become universal before that, and only
archaeologists will ever encounter malformed text.

I believe there are solutions that don't have that problem.
Specifically, if the return values were bytes, or (better for 2.x,
where bytes are strings as far as most programmers are concerned) as a
new data type, to indicate that they're not text until the client
acknowledges them as such.  EIBTI.
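
To make the idea concrete, here is a rough sketch of what such a type could
look like. The class name `UndecodedName` and its methods are purely
hypothetical and appear in no PEP or library; the only point is that the
caller has to name an encoding before getting text back.

```python
class UndecodedName:
    """Hypothetical wrapper: raw bytes that are not text until decoded."""

    def __init__(self, raw):
        self._raw = bytes(raw)

    def as_bytes(self):
        # Always available, lossless.
        return self._raw

    def as_text(self, encoding, errors="strict"):
        # The application, not the library, picks the encoding and the
        # error policy.
        return self._raw.decode(encoding, errors)

    def __repr__(self):
        return "UndecodedName(%r)" % (self._raw,)


name = UndecodedName(b"caf\xe9.txt")       # Latin-1 bytes, invalid as UTF-8
print(repr(name))
print(ascii(name.as_text("latin-1")))      # explicit, succeeds
try:
    name.as_text("utf-8")                  # explicit, fails loudly
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)
```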

Unfortunately, Martin clearly doesn't intend to make such a change to
the PEP.  I don't have the time or the Python expertise to generate an
alternative PEP. :-(  I do have long experience with the pain of
dealing with encoding issues caused by APIs that are intended to DTRT,
conveniently.  Martin's is better than most, but I just don't think
convenience and robustness can be combined in this area.

 > This sounds disturbing to me.

BTW, I'm on record as +0 on the PEP.  I don't think the better
proposals have a chance, because most people *want* the non-solution
that they can just use as a habit, allowing Python to make decisions
that should be made by the application, and not have to do
"unnecessary" conversions and the like.  It's not obvious to me that
it should not be given to them, but I don't much like it.

From p.f.moore at gmail.com  Mon Apr 27 17:58:46 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 27 Apr 2009 16:58:46 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <49EEBE2E.3090601@v.loewis.de>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <79990c6b0904270858h72760fe6m2248f0e8bb99c3d7@mail.gmail.com>

2009/4/27 Stephen J. Turnbull <stephen at xemacs.org>:
> I believe there are solutions that don't have that problem.
> Specifically, if the return values were bytes, or (better for 2.x,
> where bytes are strings as far as most programmers are concerned) as a
> new data type, to indicate that they're not text until the client
> acknowledges them as such.  EIBTI.

I think you're ignoring the fact that under Windows, it's the *bytes*
APIs that are lossy.

Can I at least assume that you aren't recommending that only the bytes
API exists on Unix, and only the Unicode API on Windows?

So what's your suggestion?

> Unfortunately, Martin clearly doesn't intend to make such a change to
> the PEP.  I don't have the time or the Python expertise to generate an
> alternative PEP. :-(  I do have long experience with the pain of
> dealing with encoding issues caused by APIs that are intended to DTRT,
> conveniently.  Martin's is better than most, but I just don't think
> convenience and robustness can be combined in this area.

The *only* "robust" solution is to completely separate the 2
platforms. Which helps no-one, and is at least as bad as the 2.x
situation. (Probably worse).

> BTW, I'm on record as +0 on the PEP.  I don't think the better
> proposals have a chance, because most people *want* the non-solution
> that they can just use as a habit, allowing Python to make decisions
> that should be made by the application, and not have to do
> "unnecessary" conversions and the like.  It's not obvious to me that
> it should not be given to them, but I don't much like it.

People *want* a solution that doesn't require every application
developer to sweat blood to write working code, simply to cover corner
cases that they don't believe will happen. Not every application is a
24x7 server, and all that. Similarly, not every application is a
backup program. Such applications have unique issues, which the
developers should (but don't always, admittedly!) understand. The rest
of us don't want to be made to care.

It's not sloppiness. It's a realistic appreciation of the requirements
of the application. (And an acceptance that not every bug must be
fixed before release).

Paul.

From aahz at pythoncraft.com  Mon Apr 27 17:59:13 2009
From: aahz at pythoncraft.com (Aahz)
Date: Mon, 27 Apr 2009 08:59:13 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes
	in	System?Character?Interfaces
In-Reply-To: <loom.20090427T111118-510@post.gmane.org>
References: <fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
Message-ID: <20090427155912.GA524@panix.com>

On Mon, Apr 27, 2009, Antoine Pitrou wrote:
> Stephen J. Turnbull <stephen <at> xemacs.org> writes:
>> 
>> If
>> you see a broken encoding once, you're likely to see it a million times
>> (spammers have the most broken software) or maybe have it raise an
>> unhandled Exception a dozen times (in rate of using busted software,
>> the spammers are closely followed by bosses---which would be very bad,
>> eh, if 2/3 of the mail from your boss ends up in an undeliverables
>> queue due to encoding errors that are unhandled by some filter in
>> your mail pipeline).
> 
> Besides, I don't care about spammers and their broken software.

Maybe you don't, but anyone who has to process random messages does; you
have to assume that messages will be broken.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

From solipsis at pitrou.net  Mon Apr 27 18:09:07 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 27 Apr 2009 16:09:07 +0000 (UTC)
Subject: [Python-Dev]
	=?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System_C?=
	=?utf-8?q?haracter=09Interfaces?=
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <loom.20090427T154536-926@post.gmane.org>

Stephen J. Turnbull <stephen <at> xemacs.org> writes:
> 
> I hate to break it to you, but most stages of mail processing have
> very little to do with SMTP.  In particular, processing MIME
> attachments often requires dealing with file names.

AFAIK, the file name is only there as an indication for the user when he wants
to save the file. If it's garbled a bit, no big deal.

> The point is that Martin's proposal is not just a solution
> to the problem he posed.

But you haven't concretely demonstrated it with actual use cases. The problems
that the PEP tries to solve, conversely, /have/ been experienced.

> And the APIs won't be killable until
> Python 4000.

Which APIs? The PEP doesn't propose any new API, it just enhances the
implementation of current APIs so that they work out of the box in all cases.

> Specifically, if the return values were bytes,

... it would make Windows support worse.

> or (better for 2.x,
> where bytes are strings as far as most programmers are concerned) as a
> new data type,

I'm -1 on any new string-like type (for file paths or whatever else) with custom
encoding/decoding semantics. It's the best way to ruin the clean str/bytes
separation that 3.x introduced.

Besides, the goal is also to make things easier for the programmer. Otherwise,
we'll have the same situation as in 2.x where many English-centric programmers
produced code that was incapable of dealing with non-ASCII input, because they
didn't care about the distinction between str and unicode.

Regards

Antoine.

From dd at crosstwine.com  Mon Apr 27 18:21:15 2009
From: dd at crosstwine.com (Damien Diederen)
Date: Mon, 27 Apr 2009 18:21:15 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
References: <loom.20090408T110540-221@post.gmane.org>
	<87k556dvh0.fsf@keem.bcc> <loom.20090427T151353-398@post.gmane.org>
Message-ID: <87ab62awzo.fsf@keem.bcc>

Hi Antoine,

Antoine Pitrou <solipsis at pitrou.net> writes:
> Damien Diederen <dd <at> crosstwine.com> writes:
>> I couldn't figure out a way to get rid of it short of multi-#including
>> "templates" and playing with the C preprocessor, however, and have the
>> nagging feeling the latter would be frowned upon by the maintainers.
>> 
>> There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm
>> wrong about that.  Should I give it a try, and see how "clean" the
>> result can be made?
>
> Keep in mind that json is externally maintained by Bob. The more we rework his
> code, the less easy it will be to backport other changes from the simplejson
> library.
>
> I think we should either keep the code duplication (if we want to keep fast
> paths for both bytes and str objects), or only keep one of the two versions as
> my patch does.

Yes, I was (slowly) reaching the same conclusion.

>> Provided one of the alternatives is dropped, wouldn't it be better to do
>> the opposite, i.e., have the decoder take bytes as input, and the
>> encoder produce bytes, and layer the str functionality on top of that?  I
>> guess the answer depends on how the (most common) lower layers are
>> structured, but it would be nice to allow a straight bytes path to/from
>> the underlying transport.
>
> The straightest path is actually to/from unicode, since JSON data can contain
> unicode strings but no byte strings. Also, the json library /has/ to output
> unicode when `ensure_ascii` is False. In 2.x:
>
>>>> json.dumps([u"éléphant"], ensure_ascii=False)
> u'["\xe9l\xe9phant"]'
>
> In any case, I don't think it will matter much in terms of speed
> whether we take one route or the other. UTF-8 encoding/decoding is
> probably much faster (in characters per second) than JSON
> encoding/decoding is.

You're undoubtedly right.  I was more concerned about the interaction
with other modules, and avoiding unnecessary copies/conversions
especially when they don't make sense from the user's perspective.

I will whip up a patch adding a {loadb,dumpb} API as you suggested in
another email, with the most trivial implementation, and then we'll see
where to go from there.
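
For illustration only, the "most trivial implementation" could look roughly
like the sketch below: the str-based encoder and decoder do the real work and
the bytes layer is a thin encode/decode wrapper. The names `dumpb` and `loadb`
come from this thread; the `encoding` parameter and its UTF-8 default are
assumptions, and nothing here is part of the json module.

```python
import json


def dumpb(obj, encoding="utf-8", **kwargs):
    """Sketch: serialize to str, then encode for a byte-oriented transport."""
    return json.dumps(obj, **kwargs).encode(encoding)


def loadb(data, encoding="utf-8", **kwargs):
    """Sketch: decode the incoming bytes, then parse the resulting str."""
    return json.loads(data.decode(encoding), **kwargs)


payload = dumpb({"name": "\u00e9l\u00e9phant"}, ensure_ascii=False)
print(payload)
print(ascii(loadb(payload)))
```

This costs one extra copy per call compared with a native bytes path, which is
the overhead discussed above.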

It can still be dropped if there is a concern of perpetuating a "bad
idea," or I can follow up with a port of Bob's "bytes" implementation
from 2.x if there is any interest.

> Regards
> Antoine.

Cheers,
Damien

-- 
http://crosstwine.com

"Strong Opinions, Weakly Held"
                 -- Bob Johansen

From jek-gmane1 at kleckner.net  Mon Apr 27 19:10:31 2009
From: jek-gmane1 at kleckner.net (Jim Kleckner)
Date: Mon, 27 Apr 2009 10:10:31 -0700
Subject: [Python-Dev] 2.6.2 Vista installer failure on upgrade from 2.6.1
Message-ID: <gt4ou9$jse$1@ger.gmane.org>

I went to upgrade a Vista machine from 2.6.1 to 2.6.2 and got error 2755 
with the message "system cannot open the device or file".

I uninstalled 2.6.1, removing all residual files also, and got the error 
message again.

When I ran msiexec as follows to get a log, it magically worked:
  msiexec /i python-2.6.2.msi  /l*v install.log

Should I attempt to explore this further or just be happy?

From stephen at xemacs.org  Mon Apr 27 19:45:15 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 28 Apr 2009 02:45:15 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
In-Reply-To: <79990c6b0904270858h72760fe6m2248f0e8bb99c3d7@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904270858h72760fe6m2248f0e8bb99c3d7@mail.gmail.com>
Message-ID: <87y6tmj8ic.fsf@uwakimon.sk.tsukuba.ac.jp>

Paul Moore writes:
 > 2009/4/27 Stephen J. Turnbull <stephen at xemacs.org>:
 > > I believe there are solutions that don't have that problem.
 > > Specifically, if the return values were bytes, or (better for 2.x,
 > > where bytes are strings as far as most programmers are concerned) as a
 > > new data type, to indicate that they're not text until the client
 > > acknowledges them as such.  EIBTI.
 > 
 > I think you're ignoring the fact that under Windows, it's the *bytes*
 > APIs that are lossy.

The *Windows* bytes APIs may be lossy.  Python's bytes on the other
hand can represent anything that UTF-16 can.  Just represented as
UTF-8.  The point is that in Python 3 "bytes" means it's *your*
responsibility, not Python's, to decode that data.  The advantage of a
new data type is that Python can provide ways to do it and hide the
internal representation (in theory, it could even be different for the
different platforms).

 > Can I at least assume that you aren't recommending that only the bytes
 > API exists on Unix, and only the Unicode API on Windows?

I'm agnostic about the underlying APIs used to talk to the OS; people
who actually use that OS should decide that.  I'm just recommending
that the return values of the getters not be of a "character string"
type until converted explicitly by the application.

 > The *only* "robust" solution is to completely separate the 2
 > platforms.

I'm not so pessimistic, unless you're referring to Microsoft's
penchant for forking any solution they don't own.

 > People *want* a solution that doesn't require every application
 > developer to sweat blood to write working code, simply to cover
 > corner cases that they don't believe will happen.  The rest of us
 > don't want to be made to care.

Well, yes, I wrote pretty much the same thing in the post you're
replying to.  But do you really think PEP 383 as written is the unique
solution to those requirements?

From stephen at xemacs.org  Mon Apr 27 20:04:44 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 28 Apr 2009 03:04:44 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	C	haracter	Interfaces
In-Reply-To: <loom.20090427T154536-926@post.gmane.org>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T154536-926@post.gmane.org>
Message-ID: <87ws96j7lv.fsf@uwakimon.sk.tsukuba.ac.jp>

Antoine Pitrou writes:

 > > or (better for 2.x, where bytes are strings as far as most
 > > programmers are concerned) as a new data type,
 > 
 > I'm -1 on any new string-like type (for file paths or whatever
 > else) with custom encoding/decoding semantics. It's the best way to
 > ruin the clean str/bytes separation that 3.x introduced.

Excuse me, but I can't see a scheme that encodes bytes as Unicodes but
only sometimes as a "clean separation".  It's a dirty hack that makes
life a lot easier for Windows programmers and a little easier for many
Unix programmers.  Practicality beats purity, true, but at the cost of
the purity.

 > Besides, the goal is also to make things easier for the
 > programmer. Otherwise, we'll have the same situation as in 2.x
 > where many English-centric programmers produced code that was
 > incapable of dealing with non-ASCII input, because they didn't care
 > about the distinction between str and unicode.

So what you'll get here, AFAICS, is a new situation where many
Windows-centric programmers will produce code that's incapable of
dealing with non-Unicode input because they don't have to care about
the distinction between Unicode and bytes.

That's an improvement, but we can do still better and not at huge
expense to programmers.

From tonynelson at georgeanelson.com  Mon Apr 27 20:08:51 2009
From: tonynelson at georgeanelson.com (Tony Nelson)
Date: Mon, 27 Apr 2009 14:08:51 -0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in	System	 Character
 Interfaces
In-Reply-To: <49F5532D.2090700@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090424T173704-587@post.gmane.org>
	<87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp>
	<49F215E5.4050205@g.nevcal.com> <49F30390.2040808@v.loewis.de>
	<49F5532D.2090700@g.nevcal.com>
Message-ID: <p04330105c61b9a2a8015@[192.168.123.162]>

At 23:39 -0700 04/26/2009, Glenn Linderman wrote:
>On approximately 4/25/2009 5:35 AM, came the following characters from
>the keyboard of Martin v. Löwis:
>>> Because the encoding is not reliably reversible.
>>
>> Why do you say that? The encoding is completely reversible
>> (unless we disagree on what "reversible" means).
>>
>>> I'm +1 on the concept, -1 on the PEP, due solely to the lack of a
>>> reversible encoding.
>>
>> Then please provide an example for a setup where it is not reversible.
>>
>> Regards,
>> Martin
>
>It is reversible if you know that it is decoded, and apply the encoding.
>  But if you don't know that it has been encoded, then applying the reverse
>transform can convert an undecoded str that matches the decoded str to
>the form that it could have, but never did take.
>
>The problem is that there is no guarantee that the str interface
>provides only strictly conforming Unicode, so decoding bytes to
>non-strictly conforming Unicode, can result in a data pun between
>non-strictly conforming Unicode coming from the str interface vs bytes
>being decoded to non-strictly conforming Unicode coming from the bytes
>interface.
 ...

Maybe this is a dumb idea, but some people might be reassured if the
half-surrogates had some particular pattern that is unlikely to occur even
in unreasonable text (as half-surrogates are an error in Unicode).  The
pattern could be some sequence of half-surrogate encoded bytes, framing the
intended data, as is done for RFC 2047 internationalized header fields in
email.  It would take up a few more bytes in the string, but no matter.  It
would also make it easier to diagnose when decoding was not properly done.
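
For readers who want to try the half-surrogate scheme itself: it is available
in Python 3.1 and later as the `surrogateescape` error handler. The sketch
below shows the round trip, and also the "data pun" described in the quoted
text, where a str that already contained a lone surrogate encodes to the same
bytes. The file name is made up for the example.

```python
raw = b"caf\xe9.txt"  # Latin-1 bytes; 0xE9 is not valid UTF-8 here

# Undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))

# Encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw

# A strict encode refuses the lone surrogate, so the escape cannot leak
# into UTF-8 output unnoticed.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)

# The pun: a str built elsewhere with the same lone surrogate is
# indistinguishable from the decoded one and yields the same bytes.
pun = "caf\udce9.txt"
print(pun == name, pun.encode("utf-8", "surrogateescape") == raw)
```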

FWIW, I like the idea in the PEP, now that I think I understand it.

(BTW, gotta love what the email package is doing to the Subject: header
field. ;-')
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>

From tonynelson at georgeanelson.com  Mon Apr 27 20:07:45 2009
From: tonynelson at georgeanelson.com (Tony Nelson)
Date: Mon, 27 Apr 2009 14:07:45 -0400
Subject: [Python-Dev]
 =?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System_C?=
 =?utf-8?q?haracter=09Interfaces?=
In-Reply-To: <loom.20090427T154536-926@post.gmane.org>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T154536-926@post.gmane.org>
Message-ID: <p04330106c61b9f2aad0a@[192.168.123.162]>

At 16:09 +0000 04/27/2009, Antoine Pitrou wrote:
>Stephen J. Turnbull <stephen <at> xemacs.org> writes:
>>
>> I hate to break it to you, but most stages of mail processing have
>> very little to do with SMTP.  In particular, processing MIME
>> attachments often requires dealing with file names.
>
>AFAIK, the file name is only there as an indication for the user when he wants
>to save the file. If it's garbled a bit, no big deal.
 ...

Yep.  In fact, it should be cleaned carefully.  RFC 2183, 2.3:

"It is important that the receiving MUA not blindly use the suggested
filename.  The suggested filename SHOULD be checked (and possibly
changed) to see that it conforms to local filesystem conventions,
does not overwrite an existing file, and does not present a security
problem (see Security Considerations below).

The receiving MUA SHOULD NOT respect any directory path information
that may seem to be present in the filename parameter.  The filename
should be treated as a terminal component only.  Portable
specification of directory paths might possibly be done in the future
via a separate Content Disposition parmeter, but no provision is
made for it in this draft."
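
A minimal sketch of that advice, assuming a deliberately conservative
whitelist of characters. The helper name `safe_attachment_name` and the
fallback value are hypothetical, and the sketch does not cover the "does not
overwrite an existing file" check, which depends on the target directory.

```python
import re


def safe_attachment_name(suggested, fallback="attachment"):
    """Reduce a suggested filename to a harmless terminal component."""
    # Ignore any directory information, whichever separator was used.
    leaf = re.split(r"[\\/]", suggested)[-1]
    # Replace everything outside a small whitelist (assumed policy).
    leaf = re.sub(r"[^A-Za-z0-9._-]", "_", leaf)
    # No hidden files, and no "." or ".." as a result.
    leaf = leaf.lstrip(".")
    return leaf or fallback


for candidate in ("../../etc/passwd", "C:\\Windows\\evil.exe", "..", "report.pdf"):
    print(repr(candidate), "->", repr(safe_attachment_name(candidate)))
```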

-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>

From solipsis at pitrou.net  Mon Apr 27 20:13:47 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 27 Apr 2009 18:13:47 +0000 (UTC)
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T154536-926@post.gmane.org>
	<87ws96j7lv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <loom.20090427T180957-232@post.gmane.org>

Stephen J. Turnbull <stephen <at> xemacs.org> writes:
> 
> Excuse me, but I can't see a scheme that encodes bytes as Unicodes but
> only sometimes as a "clean separation".

Yet it is. Filenames are all unicode, without exception, and there's no implicit
conversion to bytes. That's a clean separation.

> So what you'll get here, AFAICS, is a new situation where many
> Windows-centric programmers will produce code that's incapable of
> dealing with non-Unicode input because they don't have to care about
> the distinction between Unicode and bytes.

I don't understand what you're saying. py3k filenames are all unicode, even on
POSIX systems, so where is the problem with/for Windows programmers?

From asmodai at in-nomine.org  Mon Apr 27 20:28:40 2009
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Mon, 27 Apr 2009 20:28:40 +0200
Subject: [Python-Dev] UTF-8 Decoder
In-Reply-To: <loom.20090414T143924-906@post.gmane.org>
References: <20090413080908.GM13110@nexus.in-nomine.org>
	<loom.20090414T143924-906@post.gmane.org>
Message-ID: <20090427182840.GA64563@nexus.in-nomine.org>

-On [20090414 16:43], Antoine Pitrou (solipsis at pitrou.net) wrote:
>If you have some time on your hands, you could try benchmarking it against
>Python 3.1's (py3k) decoder. There are two cases to consider:

Bjoern actually did it himself already:

http://bjoern.hoehrmann.de/utf-8/decoder/dfa/#performance

Results (both built with Visual C++ 7.1 -Ox -Ot -G7):

Decoder                                   Large    Medium   Tiny
PyUnicode_DecodeUTF8Stateful (3.1a2)      4523ms   5686ms   3138ms
Manually inlined transcoder (see above)   4277ms   4998ms   4640ms

So on the medium and large datasets Bjoern's decoder comes out ahead, but the
tiny case (just Bjoern's name) is noticeably slower. The other two cases seem
more typical of what the average use in Python would be.

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Nobilitas sola est atque unica virtus...

From solipsis at pitrou.net  Mon Apr 27 20:48:38 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 27 Apr 2009 18:48:38 +0000 (UTC)
Subject: [Python-Dev] UTF-8 Decoder
References: <20090413080908.GM13110@nexus.in-nomine.org>
	<loom.20090414T143924-906@post.gmane.org>
	<20090427182840.GA64563@nexus.in-nomine.org>
Message-ID: <loom.20090427T184250-607@post.gmane.org>

Jeroen Ruigrok van der Werven <asmodai <at> in-nomine.org> writes:
> 
> So on medium and large datasets the decoder of Bjoern is very interesting,
> but the tiny case (just Bjoern's name) is quite a bit slower. The other
> cases seem more typical of what the average use in Python would be.

Keep in mind what the datasets are:

"The large buffer is a April 2009 Hindi Wikipedia article XML dump, the medium
buffer Markus Kuhn's UTF-8-demo.txt, and the tiny buffer my name"

It would be interesting to test with mostly ASCII data to see what that gives.
Now the good thing is that, even with wildly non-ASCII data, our current decoder
is very efficient.

Regards

Antoine.

From martin at v.loewis.de  Mon Apr 27 21:42:02 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 27 Apr 2009 21:42:02 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F559A4.8050400@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de>
	<49F559A4.8050400@g.nevcal.com>
Message-ID: <49F60A8A.8090603@v.loewis.de>

>> It's a private use area. It will never carry an official character
>> assignment.
> 
> 
> I know that U+F0000 - U+FFFFF is a private use area.  I don't find a
> definition of U+F01xx to know what the notation means.  Are you picking
> a particular character within the private use area, or a particular
> range, or what?

It's a range. The lower-case 'x' denotes a variable half-byte, ranging
from 0 to F. So this is the range U+F0100..U+F01FF, giving 256 code
points.
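
A quick check of that arithmetic, as a minimal sketch: two free hex digits
after the fixed prefix F01 span U+F0100 through U+F01FF. The variable names
are only illustrative.

```python
# U+F01xx: each of the two trailing hex digits ranges over 0..F.
first = 0xF0100
last = 0xF01FF
code_points = range(first, last + 1)

# Number of code points in the range, then its two end points.
print(len(code_points))
print(hex(code_points[0]), hex(code_points[-1]))
```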

Regards,
Martin

From martin at v.loewis.de  Mon Apr 27 21:48:27 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 27 Apr 2009 21:48:27 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F56F8B.7030108@g.nevcal.com>
References: <20090427075549.GA4418@cskk.homeip.net>
	<49F56F8B.7030108@g.nevcal.com>
Message-ID: <49F60C0B.9000905@v.loewis.de>

>>> There are still issues regarding how Windows and POSIX programs that
>>> are  sharing cross-mounted file systems might communicate file names
>>> between  each other, which is not at all clear from the PEP.  If this
>>> is an  insoluble or un-addressed issue, it should be stated.  (It is
>>> probably  insoluble, due to there being multiple ways that the
>>> cross-mounted file  systems might translate names; but if there are,
>>> can we learn something  from the rules the mounting systems use, to
>>> be compatible with (one of)  them, or not.
>>>     
>>
>> I'd say that's out of scope. A windows filesystem mounted on a UNIX host
>> should probably be mounted with a mapping to translate the Windows
>> Unicode names into whatever the sysadmin deems the locally most apt
>> byte encoding. But sys.getfilesystemencoding() is based on the current
>> user's locale settings, which need not be the same.
>>   
> 
> And if it were, what would it do with files that can't be encoded with
> the locally most apt byte encoding? 

As Cameron says: it's out of the scope of the PEP. It really depends how
the operating system deals with them. Most likely, the files are not
accessible - not only not from Python, but also not accessible from
any other Unix program. Details depend on the specific operating system
software being used, and the specific parameters passed to it.

> That's where we might learn
> something about what behaviors are deemed acceptable.  Would such files
> be inaccessible?  Accessible with mangled names?  or what?

Difficult to tell. What operating system did you use, and what mount
options did you pass?

> And for a Unix filesystem mounted on a Windows host?  Or accessed via
> some network connection?

Same issue really: what specific mounting software did you use? Windows
cannot mount Unix file systems on its own, or through some network
connection.

Regards,
Martin

From martin at v.loewis.de  Mon Apr 27 22:27:27 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 27 Apr 2009 22:27:27 +0200
Subject: [Python-Dev] 2.6.2 Vista installer failure on upgrade from 2.6.1
In-Reply-To: <gt4ou9$jse$1@ger.gmane.org>
References: <gt4ou9$jse$1@ger.gmane.org>
Message-ID: <49F6152F.9080707@v.loewis.de>

Jim Kleckner wrote:
> I went to upgrade a Vista machine from 2.6.1 to 2.6.2 and got error 2755
> with the message "system cannot open the device or file".
> 
> I uninstalled 2.6.1, removing all residual files also, and got the error
> message again.
> 
> When I ran msiexec as follows to get a log, it magically worked:
>  msiexec /i python-2.6.2.msi  /l*v install.log
> 
> Should I attempt to explore this further or just be happy?

Were you by any chance using a SUBSTed drive? If so, just be happy:
this is a known limitation (of Windows installer).

Otherwise, if you can contribute a useful bug report (or even a patch),
please go ahead. I would try to turn logging on through the registry and
see whether that gives any insight.

Regards,
Martin

From cs at zip.com.au  Mon Apr 27 23:14:47 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Tue, 28 Apr 2009 07:14:47 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F559A4.8050400@g.nevcal.com>
Message-ID: <20090427211447.GA4291@cskk.homeip.net>

On 27Apr2009 00:07, Glenn Linderman <v+python at g.nevcal.com> wrote:
> On approximately 4/25/2009 5:22 AM, came the following characters from  
> the keyboard of Martin v. Löwis:
>>> The problem with this, and other preceding schemes that have been
>>> discussed here, is that there is no means of ascertaining whether a
>>> particular file name str was obtained from a str API, or was funny-
>>> decoded from a bytes API... and thus, there is no means of reliably
>>> ascertaining whether a particular filename str should be passed to a
>>> str API, or funny-encoded back to bytes.
>>
>> Why is it necessary that you are able to make this distinction?
>
>
> It is necessary that programs (not me) can make the distinction, so that  
> they know whether or not to do the funny-encoding.

I would say this isn't so. It's important that programs know if they're
dealing with strings-for-filenames, but not that they be able to figure
that out "a priori" if handed a bare string (especially since they
can't:-)

> If a name is  
> funny-decoded when the name is accessed by a directory listing, it needs  
> to be funny-encoded in order to open the file.

Hmm. I had thought that legitimate unicode strings already get transcoded
to bytes via the mapping specified by sys.getfilesystemencoding()
(the user's locale). That already happens I believe, and Martin's
scheme doesn't change this. He's just funny-encoding non-decodable byte
sequences, not the decoded stuff that surrounds them.

So it is already the case that strings get decoded to bytes by
calls like open(). Martin isn't changing that.

I suppose if your program carefully constructs a unicode string riddled
with half-surrogates etc and imagines something specific should happen
to them on the way to being POSIX bytes then you might have a problem...

I think the advantage to Martin's choice of encoding-for-undecodable-bytes
is that it _doesn't_ use normal characters for the special bits. This
means that _all_ normal characters are left unmangled in both "bare"
and "funny-encoded" strings.

Because of that, I now think I'm -1 on your "use printable characters
for the encoding". I think presentation of the special characters
_should_ look bogus in an app (eg little rectangles or whatever in a
GUI); it's a fine flashing red light to the user.

Also, by avoiding reuse of legitimate characters in the encoding we can
avoid your issue with losing track of where a string came from;
legitimate characters are currently untouched by Martin's scheme, except
for the normal "bytes<->string via the user's locale" translation that
must already happen, and there you're aided by bytes and strings being
different types.

> I'm certainly not experienced enough in Python development processes or  
> internals to attempt such, as yet.  But somewhere in 25 years of  
> programming, I picked up the knowledge that if you want to have a 1-to-1  
> reversible mapping, you have to avoid data puns, mappings of two  
> different data values into a single data value.  Your PEP, as first  
> written, didn't seem to do that... since there are two interfaces from  
> which to obtain data values, one performing a mapping from bytes to  
> "funny invalid" Unicode, and the other performing no mapping, but  
> accepting any sort of Unicode, possibly including "funny invalid"  
> Unicode, the possibility of data puns seems to exist.  I may be  
> misunderstanding something about the use cases that prevent these two  
> sources of "funny invalid" Unicode from ever coexisting, but if so,  
> perhaps you could point it out, or clarify the PEP.

Please elucidate the "second source" of strings. I'm presuming you mean
strings generated from scratch rather than obtained by something like
listdir().

Given such a string with "funny invalid" stuff in it, and _absent_
Martin's scheme, what do you expect the source of the strings to _expect_
to happen to them if passed to open()? They still have to be converted
to bytes at the POSIX layer anyway.

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Heaven could change from chocolate to vanilla without violating perfection.
        - arromdee at jyusenkyou.cs.jhu.edu (Ken Arromdee)

From hodgestar+pythondev at gmail.com  Mon Apr 27 23:27:23 2009
From: hodgestar+pythondev at gmail.com (Simon Cross)
Date: Mon, 27 Apr 2009 23:27:23 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F60C0B.9000905@v.loewis.de>
References: <20090427075549.GA4418@cskk.homeip.net>
	<49F56F8B.7030108@g.nevcal.com> <49F60C0B.9000905@v.loewis.de>
Message-ID: <fb73205e0904271427u20c082cdi93d30f6e3befd1f3@mail.gmail.com>

On Mon, Apr 27, 2009 at 9:48 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> As Cameron says: it's out of the scope of the PEP. It really depends how
> the operating system deals with them. Most likely, the files are not
> accessible - not only not from Python, but also not accessible from
> any other Unix program. Details depend on the specific operating system
> software being used, and the specific parameters passed to it.

$ touch $'\xFF\xAA\xFF'
$ vi $'\xFF\xAA\xFF'
$ egrep foo $'\xFF\xAA\xFF'

All worked fine from my Bash shell with locale encoding set to UTF-8.
I can also open the created file from the GNOME editor file dialog (it
even tells me the filename is not valid in my locale's encoding). The
Nedit editor also worked. So far I haven't found anything that failed.

From martin at v.loewis.de  Mon Apr 27 23:33:56 2009
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 27 Apr 2009 23:33:56 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <fb73205e0904271427u20c082cdi93d30f6e3befd1f3@mail.gmail.com>
References: <20090427075549.GA4418@cskk.homeip.net>	<49F56F8B.7030108@g.nevcal.com>
	<49F60C0B.9000905@v.loewis.de>
	<fb73205e0904271427u20c082cdi93d30f6e3befd1f3@mail.gmail.com>
Message-ID: <49F624C4.10006@v.loewis.de>

> $ touch $'\xFF\xAA\xFF'
> $ vi $'\xFF\xAA\xFF'
> $ egrep foo $'\xFF\xAA\xFF'
> 
> All worked fine from my Bash shell with locale encoding set to UTF-8.
> I can also open the created file from the GNOME editor file dialog (it
> even tells me the filename is not valid in my locale's encoding). The
> Nedit editor also worked. So far I haven't found anything that failed.

So what SMB server did you mount here, using what software, and what
mount options?

I think you might be referring to an entirely different use case.

Regards,
Martin

From solipsis at pitrou.net  Mon Apr 27 23:55:41 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 27 Apr 2009 21:55:41 +0000 (UTC)
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
References: <20090427075549.GA4418@cskk.homeip.net>
	<49F56F8B.7030108@g.nevcal.com> <49F60C0B.9000905@v.loewis.de>
	<fb73205e0904271427u20c082cdi93d30f6e3befd1f3@mail.gmail.com>
Message-ID: <loom.20090427T215429-816@post.gmane.org>

Simon Cross <hodgestar+pythondev <at> gmail.com> writes:
> 
> $ touch $'\xFF\xAA\xFF'
> $ vi $'\xFF\xAA\xFF'
> $ egrep foo $'\xFF\xAA\xFF'
> 
> All worked fine from my Bash shell with locale encoding set to UTF-8.

The PEP is precisely about making py3k able to better handle these files (right
now os.listdir() doesn't return the offending file in its list of results).

Regards

Antoine.

From fuzzyman at voidspace.org.uk  Tue Apr 28 00:56:05 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Mon, 27 Apr 2009 23:56:05 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <87ws96j7lv.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>	<49F18E90.9070801@nevcal.com>	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>	<20090424152746.GA9543@panix.com>	<loom.20090424T153112-15@post.gmane.org>	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>	<loom.20090427T111118-510@post.gmane.org>	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>	<loom.20090427T154536-926@post.gmane.org>
	<87ws96j7lv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <49F63805.6000208@voidspace.org.uk>

Stephen J. Turnbull wrote:
> Antoine Pitrou writes:
>
>  > > or (better for 2.x, where bytes are strings as far as most
>  > > programmers are concerned) as a new data type,
>  > 
>  > I'm -1 on any new string-like type (for file paths or whatever
>  > else) with custom encoding/decoding semantics. It's the best way to
>  > ruin the clean str/bytes separation that 3.x introduced.
>
> Excuse me, but I can't see a scheme that encodes bytes as Unicodes but
> only sometimes as a "clean separation".  It's a dirty hack that makes
> life a lot easier for Windows programmers and a little easier for many
> Unix programmers.  Practicality beats purity, true, but at the cost of
> the purity.
>
>   

The problem you don't address, which is still the reality for most
programmers (especially on Mac OS X, where the filesystem encoding is
UTF-8), is that programmers *are* going to treat filenames as strings.

The proposed PEP allows that to work for them - whatever platform their 
program runs on.

Michael

>  > Besides, the goal is also to make things easier for the
>  > programmer. Otherwise, we'll have the same situation as in 2.x
>  > where many English-centric programmers produced code that was
>  > incapable of dealing with non-ASCII input, because they didn't care
>  > about the distinction between str and unicode.
>
> So what you'll get here, AFAICS, is a new situation where many
> Windows-centric programmers will produce code that's incapable of
> dealing with non-Unicode input because they don't have to care about
> the distinction between Unicode and bytes.
>
> That's an improvement, but we can do still better and not at huge
> expense to programmers.

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From v+python at g.nevcal.com  Tue Apr 28 01:09:13 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 27 Apr 2009 16:09:13 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F60A8A.8090603@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de>
	<49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de>
Message-ID: <49F63B19.7010306@g.nevcal.com>

On approximately 4/27/2009 12:42 PM, came the following characters from 
the keyboard of Martin v. Löwis:
>>> It's a private use area. It will never carry an official character
>>> assignment.
>>
>> I know that U+F0000 - U+FFFFF is a private use area.  I don't find a
>> definition of U+F01xx to know what the notation means.  Are you picking
>> a particular character within the private use area, or a particular
>> range, or what?
> 
> It's a range. The lower-case 'x' denotes a variable half-byte, ranging
> from 0 to F. So this is the range U+F0100..U+F01FF, giving 256 code
> points.

So you only need 128 code points, so there is something else unclear.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From v+python at g.nevcal.com  Tue Apr 28 01:46:06 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 27 Apr 2009 16:46:06 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F60C0B.9000905@v.loewis.de>
References: <20090427075549.GA4418@cskk.homeip.net>
	<49F56F8B.7030108@g.nevcal.com> <49F60C0B.9000905@v.loewis.de>
Message-ID: <49F643BE.4050605@g.nevcal.com>

On approximately 4/27/2009 12:48 PM, came the following characters from 
the keyboard of Martin v. Löwis:
>>>> There are still issues regarding how Windows and POSIX programs that
>>>> are  sharing cross-mounted file systems might communicate file names
>>>> between  each other, which is not at all clear from the PEP.  If this
>>>> is an  insoluble or un-addressed issue, it should be stated.  (It is
>>>> probably  insoluble, due to there being multiple ways that the
>>>> cross-mounted file  systems might translate names; but if there are,
>>>> can we learn something  from the rules the mounting systems use, to
>>>> be compatible with (one of)  them, or not.
>>>>     
>>>>         
>>> I'd say that's out of scope. A windows filesystem mounted on a UNIX host
>>> should probably be mounted with a mapping to translate the Windows
>>> Unicode names into whatever the sysadmin deems the locally most apt
>>> byte encoding. But sys.getfilesystemencoding() is based on the current
>>> user's locale settings, which need not be the same.
>>>   
>>>       
>> And if it were, what would it do with files that can't be encoded with
>> the locally most apt byte encoding? 
>>     
>
> As Cameron says: it's out of the scope of the PEP. It really depends how
> the operating system deals with them. Most likely, the files are not
> accessible - not only not from Python, but also not accessible from
> any other Unix program. Details depend on the specific operating system
> software being used, and the specific parameters passed to it.
>   

I'm not suggesting the PEP should solve the problem of mounting foreign 
file systems, although if it doesn't it should probably point that out.  
I'm just suggesting that if the people that write software to solve the 
problem of mounting foreign file systems have already solved the naming 
problem, then it might be a source of a good solution.  On the other 
hand, it might be the source of a mediocre or bad solution.  However, if 
those mounting systems have good solutions, it would be good to be 
compatible with them, rather than have yet another solution.  It was in 
that sense, of thinking about possibly existing practice, and leveraging 
an existing solution, that caused me to bring up the topic.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From steve at pearwood.info  Tue Apr 28 02:27:17 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 28 Apr 2009 10:27:17 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <loom.20090427T180957-232@post.gmane.org>
References: <49EEBE2E.3090601@v.loewis.de>
	<87ws96j7lv.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T180957-232@post.gmane.org>
Message-ID: <200904281027.20431.steve@pearwood.info>

On Tue, 28 Apr 2009 04:13:47 am Antoine Pitrou wrote:
> Stephen J. Turnbull <stephen <at> xemacs.org> writes:
...
> > So what you'll get here, AFAICS, is a new situation where many
> > Windows-centric programmers will produce code that's incapable of
> > dealing with non-Unicode input because they don't have to care
> > about the distinction between Unicode and bytes.
>
> I don't understand what you're saying. py3k filenames are all
> unicode, even on POSIX systems, 

How is that possible on POSIX systems where the underlying file system 
uses bytes for filenames?

If I write a piece of Python code:

    filename = 'some path/some name'

I might call it a filename, I might think of it as a filename, but it 
*isn't*, it's a string in a Python program. It isn't a filename until 
it hits the file system, and in POSIX systems that makes it bytes.

-- 
Steven D'Aprano

From cs at zip.com.au  Tue Apr 28 02:42:32 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Tue, 28 Apr 2009 10:42:32 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F60C0B.9000905@v.loewis.de>
Message-ID: <20090428004232.GA12325@cskk.homeip.net>

On 27Apr2009 21:48, Martin v. Löwis <martin at v.loewis.de> wrote:
| >>> There are still issues regarding how Windows and POSIX programs that
| >>> are  sharing cross-mounted file systems might communicate file names
| >>> between  each other, which is not at all clear from the PEP.  If this
| >>> is an  insoluble or un-addressed issue, it should be stated.  (It is
| >>> probably  insoluble, due to there being multiple ways that the
| >>> cross-mounted file  systems might translate names; but if there are,
| >>> can we learn something  from the rules the mounting systems use, to
| >>> be compatible with (one of)  them, or not.
| >>
| >> I'd say that's out of scope. A windows filesystem mounted on a UNIX host
| >> should probably be mounted with a mapping to translate the Windows
| >> Unicode names into whatever the sysadmin deems the locally most apt
| >> byte encoding. But sys.getfilesystemencoding() is based on the current
| >> user's locale settings, which need not be the same.
| >>   
| > 
| > And if it were, what would it do with files that can't be encoded with
| > the locally most apt byte encoding? 
| 
| As Cameron says: it's out of the scope of the PEP. It really depends how
| the operating system deals with them. Most likely, the files are not
| accessible - not only not from Python, but also not accessible from
| any other Unix program.

Well... If the files exist and the encoding of the mount software
permits, there will be a sequence of bytes for the filename, and it
will be accessible to a pure UNIX byte-speaking program. It will also
be accessible from Python, because the os.* calls convert both ways:
bytes->string and string->bytes as required. Martin's PEP just makes that
lossless, which currently it is not.
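
To illustrate what "lossless" means here, a minimal sketch using the
`surrogateescape` error handler, which is the form PEP 383 takes in later
Python 3 releases. It escapes each undecodable byte as a lone surrogate
rather than as one of the private-use code points discussed elsewhere in
this thread. The sample file name is made up, and UTF-8 is assumed as the
encoding.

```python
raw_name = b"caf\xff.txt"   # hypothetical name; 0xFF is never valid in UTF-8

# Strict decoding refuses the name outright.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape the bad byte becomes a lone surrogate (U+DCFF) ...
name = raw_name.decode("utf-8", "surrogateescape")

# ... and encoding with the same handler restores the original bytes.
round_tripped = name.encode("utf-8", "surrogateescape")
print(strict_ok, ascii(name), round_tripped == raw_name)
```

The decodable characters around the bad byte pass through the ordinary
UTF-8 mapping unchanged; only the undecodable byte is escaped.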

Conversely, if the mount software refuses to map the filename to a POSIX
byte string, the file won't exist, or will refuse to be created. For a
concrete example we have but to observe my macify program I was trying
to counter the PEP with (I'm now a convert, btw). It is to run on a real
UNIX system and recode filenames into UTF-8 NFD, _prior_ to rsyncing
to a Mac. Why? Because the MacOSX HFS filesystem refuses to accept byte
strings not parsable by that encoding, and my music rsyncs were exploding,
refusing to create files on the target Mac.

And there's probably some grey area where a dodgy mount software will present
names that can't be used.

There's a supposed counter example in another followup post which I'll
address there, since it seemed a little bogus to me.

I think that, almost independent of this PEP, there should be an
os.fsencode() function that takes a byte string (as a POSIX OS call
will take) and performs the _same_ byte->string encoding that listdir()
and friends are doing under the hood. And a partner os.fsdecode() for
string->bytes. That will save a lot of wheel respoking and probably make
it easier for people to think about this.

Aside: thinking on that, perhaps those functions should be in posix.*,
or alternatively would a Windows system offer them in os.* to produce
native UTF-16 byte strings; useless for the Windows API which cleanly
takes unicode (I gather) but perhaps handy for people hacking filesystems
directly or something like that.  (Except I gather from a former existence
that there is a multitude of on-disk filename encodings under Windows
depending how old your filesystems are and if they're FAT or NTFS, etc).

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Your eyes are weary from staring at the CRT.  You feel sleepy.  Notice how
restful it is to watch the cursor blink.  Close your eyes.  The opinions
stated above are yours.  You cannot imagine why you ever felt otherwise.
        - gabrielh at tplrd.tpl.oz.au

From cs at zip.com.au  Tue Apr 28 02:48:09 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Tue, 28 Apr 2009 10:48:09 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <fb73205e0904271427u20c082cdi93d30f6e3befd1f3@mail.gmail.com>
Message-ID: <20090428004809.GA17780@cskk.homeip.net>

On 27Apr2009 23:27, Simon Cross <hodgestar+pythondev at gmail.com> wrote:
| On Mon, Apr 27, 2009 at 9:48 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
| > As Cameron says: it's out of the scope of the PEP. It really depends how
| > the operating system deals with them. Most likely, the files are not
| > accessible - not only not from Python, but also not accessible from
| > any other Unix program. Details depend on the specific operating system
| > software being used, and the specific parameters passed to it.
| 
| $ touch $'\xFF\xAA\xFF'
| $ vi $'\xFF\xAA\xFF'
| $ egrep foo $'\xFF\xAA\xFF'
| 
| All worked fine from my Bash shell with locale encoding set to UTF-8.
| I can also open the created file from the GNOME editor file dialog (it
| even tells me the filename is not valid in my locale's encoding). The
| Nedit editor also worked. So far I haven't found anything that failed.

Yes, they would. Are you doing that on a real UNIX filesystem
(ext2/3/4, XFS etc)?

I'm not sure whether you're arguing for or against the proposal here,
btw.

This would make a file with a presumably UTF-8-invalid name. Martin's
proposal would cheerfully map that losslessly to a string. Is there a
problem here?
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Stepwise Refinement n.  A sequence of kludges K, neither distinct or finite,
applied to a program P aimed at transforming it into the target program Q.

From benjamin at python.org  Tue Apr 28 03:09:10 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 27 Apr 2009 20:09:10 -0500
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <20090428004232.GA12325@cskk.homeip.net>
References: <49F60C0B.9000905@v.loewis.de>
	<20090428004232.GA12325@cskk.homeip.net>
Message-ID: <1afaf6160904271809g773641aag3975ff67178ab69@mail.gmail.com>

2009/4/27 Cameron Simpson <cs at zip.com.au>:
> I think that, almost independent of this PEP, there should be an
> os.fsencode() function that takes a byte string (as a POSIX OS call
> will take) and performs the _same_ byte->string encoding that listdir()
> and friends are doing under the hood. And a partner os.fsdecode() for
> string->bytes. That will save a lot of wheel respoking and probably make
> it easier for people to think about this.

some_path.encode(sys.getfilesystemencoding())
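
A sketch of how that one-liner behaves on a name containing an escaped
undecodable byte, assuming a POSIX system and a later Python 3 release in
which `os.fsencode()` and `os.fsdecode()` exist. In those functions
`fsencode` goes str->bytes and `fsdecode` goes bytes->str, the reverse of
the naming suggested in the quoted text. The file name is hypothetical.

```python
import os
import sys

raw_name = b"caf\xff.txt"          # hypothetical undecodable name
name = os.fsdecode(raw_name)       # bytes -> str using the filesystem encoding

# A plain encode has no special handling for escaped bytes, so it may fail.
try:
    plain = name.encode(sys.getfilesystemencoding())
except UnicodeEncodeError:
    plain = None

# os.fsencode() applies the filesystem error handler and reverses fsdecode().
restored = os.fsencode(name)
print(plain, restored)
```

Under a UTF-8 locale the plain encode raises and `plain` stays None, while
`restored` equals the original bytes.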

-- 
Regards,
Benjamin

From v+python at g.nevcal.com  Tue Apr 28 03:15:17 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 27 Apr 2009 18:15:17 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <20090427211447.GA4291@cskk.homeip.net>
References: <20090427211447.GA4291@cskk.homeip.net>
Message-ID: <49F658A5.7080807@g.nevcal.com>

On approximately 4/27/2009 2:14 PM, came the following characters from 
the keyboard of Cameron Simpson:
> On 27Apr2009 00:07, Glenn Linderman <v+python at g.nevcal.com> wrote:
>   
>> On approximately 4/25/2009 5:22 AM, came the following characters from  
>> the keyboard of Martin v. Löwis:
>>     
>>>> The problem with this, and other preceding schemes that have been
>>>> discussed here, is that there is no means of ascertaining whether a
>>>> particular file name str was obtained from a str API, or was funny-
>>>> decoded from a bytes API... and thus, there is no means of reliably
>>>> ascertaining whether a particular filename str should be passed to a
>>>> str API, or funny-encoded back to bytes.
>>>>         
>>> Why is it necessary that you are able to make this distinction?
>>>       
>> It is necessary that programs (not me) can make the distinction, so that  
>> they know whether or not to do the funny-encoding.
>>     
>
> I would say this isn't so. It's important that programs know if they're
> dealing with strings-for-filenames, but not that they be able to figure
> that out "a priori" if handed a bare string (especially since they
> can't:-)
>   

So you agree they can't... that there are data puns.   (OK, you may not 
have thought that through)

>> If a name is  
>> funny-decoded when the name is accessed by a directory listing, it needs  
>> to be funny-encoded in order to open the file.
>>     
>
> Hmm. I had thought that legitimate unicode strings already get transcoded
> to bytes via the mapping specified by sys.getfilesystemencoding()
> (the user's locale). That already happens I believe, and Martin's
> scheme doesn't change this. He's just funny-encoding non-decodable byte
> sequences, not the decoded stuff that surrounds them.
>   

So assume a non-decodable sequence in a name.  That puts us into 
Martin's funny-decode scheme.  His funny-decode scheme produces a bare 
string, indistinguishable from a bare string that would be produced by a 
str API that happens to contain that same sequence.  Data puns.

So when open is handed the string, should it open the file with the name 
that matches the string, or the file with the name that funny-decodes to 
the same string?  It can't know, unless it knows whether the string is a 
funny-decoded string or not.
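
The ambiguity described above can be sketched as follows. This is a
hypothetical illustration, not the PEP's implementation: it assumes UTF-8 as
the file system encoding and assumes each undecodable byte b is mapped to the
private-use code point U+F0100 + b. The names `funny_decode`, `PUA_BASE` and
the handler name are invented for the example.

```python
import codecs

PUA_BASE = 0xF0100  # assumed start of the private-use range U+F01xx

def _pua_escape(exc):
    # Map each undecodable byte to a private-use code point (assumed scheme).
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return "".join(chr(PUA_BASE + b) for b in bad), exc.end
    raise exc

codecs.register_error("pua_escape_sketch", _pua_escape)

def funny_decode(raw):
    """Decode a POSIX file name, escaping undecodable bytes into the PUA."""
    return raw.decode("utf-8", "pua_escape_sketch")

undecodable = b"\xff"                      # not valid UTF-8
valid_utf8 = "\U000f01ff".encode("utf-8")  # valid UTF-8 for U+F01FF itself

# Two different byte strings ...
print(undecodable != valid_utf8)
# ... decode to the same str, so the mapping is not reversible.
print(funny_decode(undecodable) == funny_decode(valid_utf8))
```

Both byte strings are legal POSIX file names, yet they produce the same str,
which is the pun at issue. Whether the draft PEP closes this gap some other
way is not shown by this sketch.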

> So it is already the case that strings get decoded to bytes by
> calls like open(). Martin isn't changing that.
>   

I thought the process of converting strings to bytes is called 
encoding.  You seem to be calling it decoding?

> I suppose if your program carefully constructs a unicode string riddled
> with half-surrogates etc and imagines something specific should happen
> to them on the way to being POSIX bytes then you might have a problem...
>   

Right.  Or someone else's program does that.  I only want to use Unicode 
file names.  But if those other file names exist, I want to be able to 
access them, and not accidentally get a different file.

> I think the advantage to Martin's choice of encoding-for-undecodable-bytes
> is that it _doesn't_ use normal characters for the special bits. This
> means that _all_ normal characters are left unmangled un both "bare"
> and "funny-encoded" strings.
>   

Whether the characters used for funny decoding are normal or abnormal, 
unless they are prevented from also appearing in filenames when they are 
obtained from or passed to other APIs, there is the possibility that a 
file whose real name is identical to the funny-decoded name also exists 
in the filesystem... a data pun on the name.

Whether the characters used for funny decoding are normal or abnormal, 
if they are not prevented from also appearing in filenames when they are 
obtained from or passed to other APIs, then in order to prevent data 
puns, *all* names must be passed through the decoder, and the decoder 
must perform a 1-to-1 reversible mapping.  Martin's funny-decode process 
does not perform a 1-to-1 reversible mapping (unless he's changed it 
from the version of the PEP I found to read).

This is why some people have suggested using the null character for the 
decoding, because it and / can't appear in POSIX file names, but 
everything else can.  But that makes it really hard to display the 
funny-decoded characters.

> Because of that, I now think I'm -1 on your "use printable characters
> for the encoding". I think presentation of the special characters
> _should_ look bogus in an app (eg little rectangles or whatever in a
> GUI); it's a fine flashing red light to the user.
>   

The reason I picked an ASCII printable character is just to make it 
easier for humans to see the encoding.  The scheme would also work with 
a non-ASCII non-printable character... but I fail to see how that would 
help a human compare the strings on a display of file names.  Having a 
bunch of abnormal characters in a row, displayed using a single 
replacement glyph, just makes an annoying mess in the file open dialog.

> Also, by avoiding reuse of legitimate characters in the encoding we can
> avoid your issue with losing track of where a string came from;
> legitimate characters are currently untouched by Martin's scheme, except
> for the normal "bytes<->string via the user's locale" translation that
> must already happen, and there you're aided by bytes and strings being
> different types.
>   

There are abnormal characters, but there are no illegal characters.  
NTFS permits any 16-bit "character" code, including abnormal ones, 
including half-surrogates, and including full surrogate sequences that 
decode to PUA characters.  POSIX permits all byte sequences, including 
things that look like UTF-8, things that don't look like UTF-8, things 
that look like half-surrogates, and things that look like full surrogate 
sequences that decode to PUA characters.

So whether the decoding/encoding scheme uses common characters, or 
uncommon characters, you still have the issue of data puns, unless you 
use a 1-to-1 transformation, that is reversible.  With ASCII strings, I 
think no one questions that you need to escape the escape characters.  C 
uses \ as an escape character... Everyone understands that if you want 
to use a \ in a C string, you have to use \\ instead... and that scheme 
has escaped the boundaries of C to other use cases.  But it seems that 
you think that if we could just find one more character that no one else 
uses, that we wouldn't have to escape it.... and that could be true, but 
there aren't any characters that no one else uses.  So whatever 
character (and a range makes it worse) you pick, someone else uses it.  
So in order for the scheme to work, you have to escape the escape 
character(s), even in names that wouldn't otherwise need to be 
funny-decoded.

>> I'm certainly not experienced enough in Python development processes or  
>> internals to attempt such, as yet.  But somewhere in 25 years of  
>> programming, I picked up the knowledge that if you want to have a 1-to-1  
>> reversible mapping, you have to avoid data puns, mappings of two  
>> different data values into a single data value.  Your PEP, as first  
>> written, didn't seem to do that... since there are two interfaces from  
>> which to obtain data values, one performing a mapping from bytes to  
>> "funny invalid" Unicode, and the other performing no mapping, but  
>> accepting any sort of Unicode, possibly including "funny invalid"  
>> Unicode, the possibility of data puns seems to exist.  I may be  
>> misunderstanding something about the use cases that prevent these two  
>> sources of "funny invalid" Unicode from ever coexisting, but if so,  
>> perhaps you could point it out, or clarify the PEP.
>>     
>
> Please elucidate the "second source" of strings. I'm presuming you mean
> strings generated from scratch rather than obtained by something like
> listdir().
>   

POSIX has byte APIs for strings, that's one source, that is most under 
discussion.  Windows has both bytes and 16-bit APIs for strings... the 
16-bit APIs are generally mapped directly to UTF-16, but are not checked 
for UTF-16 validity, so all of Martin's funny-decoded files could be 
used for Windows file names on the 16-bit APIs.  And yes, strings can be 
generated from scratch.

> Given such a string with "funny invalid" stuff in it, and _absent_
> Martin's scheme, what do you expect the source of the strings to _expect_
> to happen to them if passed to open()? They still have to be converted
> to bytes at the POSIX layer anyway.

There is a fine encoding scheme that can take any str and encode to 
bytes: UTF-8.

The problem is that UTF-8 doesn't work to take any byte sequence and 
decode to str, and that means that special handling has to happen when 
such byte sequences are encountered.  But there is no str that can be 
generated that can't be generated in other ways, which would be properly 
encoded to a different byte sequence.  Hence there are data puns, no 
1-to-1 mapping.  Hence it seems obvious to me that the only complete 
solution is to have an escape character, and ensure that all strings are 
decoded and encoded.  As soon as you have an escape character, then you 
can decode anything into displayable, standard, Unicode, and you can 
create the reverse encoding unambiguously.

Without an escape character, you just have a heuristic that will work 
sometimes, and break sometimes.  If you believe non-UTF-8-decodable byte 
sequences are rare, you can ignore them.  That's what we do now, but 
people squawk.  If you believe that you can invent an encoding that has 
data puns, and that because the character or characters involved are 
rare, the problems that result can be ignored, fine... but people 
will squawk when they hit the problem... I'm just trying to squawk now, 
to point out that this is complexity for complexity's sake: it adds 
complexity to trade one problem for a different problem, under the 
belief that the other problem is somehow rarer than the first.  And 
maybe it is, today.  I'd much rather have a solution that actually 
solves the problem.

If you don't like ? as the escape character, then pick U+10F01, and 
anytime a U+10F01 is encountered in a file name, double it.  And anytime 
there is an undecodable byte sequence, emit U+10F01, and then U+80 
through U+FF as a subsequent character for the first byte in the 
undecodable sequence, and restart the decoder with the next byte.  
That'll work too.  But use of rare, abnormal characters to take the 
place of undecodable bytes can never work, because of data puns, and 
valid use of the rare, abnormal characters.
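
The escape-character scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the PEP's method: the names `funny_decode` and `funny_encode` are hypothetical, `?` stands in for the escape character, and UTF-8 is assumed, so every undecodable byte is at least 0x80 and can never be confused with the escape character itself.

```python
ESC = "?"  # illustrative escape character; any character works if it is doubled


def funny_decode(raw: bytes, encoding: str = "utf-8") -> str:
    """bytes -> str, escaping ESC itself and every undecodable byte."""
    out = []
    pos = 0
    while pos < len(raw):
        bad = None
        try:
            good = raw[pos:].decode(encoding)
            pos = len(raw)
        except UnicodeDecodeError as err:
            good = raw[pos:pos + err.start].decode(encoding)
            bad = raw[pos + err.start]      # first byte of the bad sequence
            pos += err.start + 1            # restart the decoder after it
        out.append(good.replace(ESC, ESC + ESC))
        if bad is not None:
            out.append(ESC + chr(bad))      # ESC followed by U+0080..U+00FF
    return "".join(out)


def funny_encode(name: str, encoding: str = "utf-8") -> bytes:
    """str -> bytes, the exact inverse of funny_decode."""
    out = bytearray()
    chars = iter(name)
    for ch in chars:
        if ch != ESC:
            out += ch.encode(encoding)
            continue
        nxt = next(chars, None)
        if nxt is None:
            raise ValueError("dangling escape character")
        out += ESC.encode(encoding) if nxt == ESC else bytes([ord(nxt)])
    return bytes(out)


raw = b"caf\xe9?.txt"                  # Latin-1 bytes plus a literal "?"
text = funny_decode(raw)
print(repr(text))
print(funny_encode(text) == raw)
```

Because every name passes through the decoder and a literal `?` is always doubled, two different byte sequences never decode to the same string, which is the 1-to-1 property argued for above.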

Someone suggested treating the byte sequences of the rare, abnormal 
characters as undecodable bytes, and decoding them using the same 
substitution rules.  That would work too, if applied consistently, 
because then the rare, abnormal characters would each be escaped.  But 
having 128 escape characters seems more complex than necessary, also.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From v+python at g.nevcal.com  Tue Apr 28 03:24:24 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 27 Apr 2009 18:24:24 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <20090428004232.GA12325@cskk.homeip.net>
References: <20090428004232.GA12325@cskk.homeip.net>
Message-ID: <49F65AC8.5010206@g.nevcal.com>

On approximately 4/27/2009 5:42 PM, came the following characters from 
the keyboard of Cameron Simpson:
> I think that, almost independent of this PEP, there should be an
> os.fsencode() function that takes a byte string (as a POSIX OS call
> will take) and performs the _same_ byte->string encoding that listdir()
> and friends are doing under the hood. And a partner os.fsdecode() for
> string->bytes. That will save a lot of wheel respoking and probably make
> it easier for people to think about this.
>   

If a generally useful encoding scheme is invented for transforming file 
names within Python, it should definitely be made available for those 
cases where the application must transform between an encoded Python 
name and either a str or bytes interface presented by 3rd party software.

It should be available on all platforms, so that portable code can be 
written.  Of course, if there are variations in the 3rd party software 
on the various platforms, there still may be a need for 
platform-specific code.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From cs at zip.com.au  Tue Apr 28 04:11:17 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Tue, 28 Apr 2009 12:11:17 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F658A5.7080807@g.nevcal.com>
Message-ID: <20090428021117.GA25536@cskk.homeip.net>

On 27Apr2009 18:15, Glenn Linderman <v+python at g.nevcal.com> wrote:
>>>>> The problem with this, and other preceding schemes that have been
>>>>> discussed here, is that there is no means of ascertaining whether a
>>>>> particular file name str was obtained from a str API, or was funny-
>>>>> decoded from a bytes API... and thus, there is no means of reliably
>>>>> ascertaining whether a particular filename str should be passed to a
>>>>> str API, or funny-encoded back to bytes.
>>>>>         
>>>> Why is it necessary that you are able to make this distinction?
>>>>       
>>> It is necessary that programs (not me) can make the distinction, so 
>>> that  it knows whether or not to do the funny-encoding or not.
>>>     
>>
>> I would say this isn't so. It's important that programs know if they're
>> dealing with strings-for-filenames, but not that they be able to figure
>> that out "a priori" if handed a bare string (especially since they
>> can't:-)
>
> So you agree they can't... that there are data puns.   (OK, you may not  
> have thought that through)

I agree you can't examine a string and know if it came from the os.* munging
or from someone else's munging.

I totally disagree that this is a problem.

There may be puns. So what? Use the right strings for the right purpose
and all will be well.

I think what is missing here, and missing from Martin's PEP, is some
utility functions for the os.* namespace.

PROPOSAL: add to the PEP the following functions:

  os.fsdecode(bytes) -> funny-encoded Unicode
    This is what os.listdir() does to produce the strings it hands out.
  os.fsencode(funny-string) -> bytes
    This is what open(filename,..) does to turn the filename into bytes
    for the POSIX open.
  os.pathencode(your-string) -> funny-encoded-Unicode
    This is what you must do to a de novo string to turn it into a
    string suitable for use by open.
    Importantly, for most strings not hand crafted to have weird
    sequences in them, it is a no-op. But it will recode your puns
    for survival.
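
Functions named `os.fsdecode()` and `os.fsencode()` were later added to the standard library (Python 3.2), with the directions given in the list above. A minimal sketch of the round trip follows, assuming a POSIX system where the filesystem error handler is `surrogateescape`; `os.pathencode()` is not part of the standard library and is not shown.

```python
import os

raw = b"report-\xff.txt"      # not valid UTF-8
name = os.fsdecode(raw)       # bytes -> str, as os.listdir() does
back = os.fsencode(name)      # str -> bytes, as open() does

print(repr(name))
print(back == raw)
```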

and for me, I would like to see:

  os.setfilesystemencoding(coding)

Currently sys.getfilesystemencoding() returns you the encoding based on
the current locale, and (I trust) the os.* stuff encodes on that basis.
setfilesystemencoding() would override that, unless coding==None in which
case it reverts to the former "use the user's current locale" behaviour.
(We have locale "C" for what one might otherwise expect None to mean:-)

The idea here is to let the program control the codec used for filenames
for special purposes, without working indirectly through the locale.

>>> If a name is  funny-decoded when the name is accessed by a directory 
>>> listing, it needs  to be funny-encoded in order to open the file.
>>
>> Hmm. I had thought that legitimate unicode strings already get transcoded
>> to bytes via the mapping specified by sys.getfilesystemencoding()
>> (the user's locale). That already happens I believe, and Martin's
>> scheme doesn't change this. He's just funny-encoding non-decodable byte
>> sequences, not the decoded stuff that surrounds them.
>
> So assume a non-decodable sequence in a name.  That puts us into  
> Martin's funny-decode scheme.  His funny-decode scheme produces a bare  
> string, indistinguishable from a bare string that would be produced by a  
> str API that happens to contain that same sequence.  Data puns.

See my proposal above. Does it address your concerns? A program still
must know the provenance of the string, and _if_ you're working with
non-decodable sequences in names then you should transmute them into
the funny encoding using the os.pathencode() function described above.

In this way the punning issue can be avoided.

_Lacking_ such a function, your punning concern is valid.

> So when open is handed the string, should it open the file with the name  
> that matches the string, or the file with the name that funny-decodes to  
> the same string?  It can't know, unless it knows that the string is a  
> funny-decoded string or not.

True. open() should always expect a funny-encoded name.

>> So it is already the case that strings get decoded to bytes by
>> calls like open(). Martin isn't changing that.
>
> I thought the process of converting strings to bytes is called encoding.  
> You seem to be calling it decoding?

My head must be standing in the wrong place. Yes, I probably mean
encoding here. I'm trying to accompany these terms with little pictures
like "string->bytes" to avoid confusion.

>> I suppose if your program carefully constructs a unicode string riddled
>> with half-surrogates etc and imagines something specific should happen
>> to them on the way to being POSIX bytes then you might have a problem...
>
> Right.  Or someone else's program does that.  I only want to use Unicode  
> file names.  But if those other file names exist, I want to be able to  
> access them, and not accidentally get a different file.

Point taken. And I think addressed by the utility function proposed
above.

[...snip normal versus odd chars for the funny-encoding ...]
>> Also, by avoiding reuse of legitimate characters in the encoding we can
>> avoid your issue with losing track of where a string came from;
>> legitimate characters are currently untouched by Martin's scheme, except
>> for the normal "bytes<->string via the user's locale" translation that
>> must already happen, and there you're aided by bytes and strings being
>> different types.
>
> There are abnormal characters, but there are no illegal characters.   

I thought half-surrogates were illegal in well-formed Unicode. I confess
to being weak in this area. By "legitimate" above I meant everything other
than things like half-surrogates which, like quarks, should not occur alone.

> NTFS permits any 16-bit "character" code, including abnormal ones,  
> including half-surrogates, and including full surrogate sequences that  
> decode to PUA characters.  POSIX permits all byte sequences, including  
> things that look like UTF-8, things that don't look like UTF-8, things  
> that look like half-surrogates, and things that look like full surrogate  
> sequences that decode to PUA characters.

Sure. I'm not really talking about what the filesystem will accept at
the native layer, I was talking in the python funny-encoded space.

[..."escaping is necessary"... I agree...]
>>> I'm certainly not experienced enough in Python development processes 
>>> or  internals to attempt such, as yet.  But somewhere in 25 years of  
>>> programming, I picked up the knowledge that if you want to have a 
>>> 1-to-1  reversible mapping, you have to avoid data puns, mappings of 
>>> two  different data values into a single data value.  Your PEP, as 
>>> first  written, didn't seem to do that... since there are two 
>>> interfaces from  which to obtain data values, one performing a 
>>> mapping from bytes to  "funny invalid" Unicode, and the other 
>>> performing no mapping, but  accepting any sort of Unicode, possibly 
>>> including "funny invalid"  Unicode, the possibility of data puns 
>>> seems to exist.  I may be  misunderstanding something about the use 
>>> cases that prevent these two  sources of "funny invalid" Unicode from 
>>> ever coexisting, but if so,  perhaps you could point it out, or 
>>> clarify the PEP.
>>
>> Please elucidate the "second source" of strings. I'm presuming you mean
>> strings generated from scratch rather than obtained by something like
>> listdir().
>>   
>
> POSIX has byte APIs for strings, that's one source, that is most under  
> discussion.  Windows has both bytes and 16-bit APIs for strings... the  
> 16-bit APIs are generally mapped directly to UTF-16, but are not checked  
> for UTF-16 validity, so all of Martin's funny-decoded files could be  
> used for Windows file names on the 16-bit APIs.

These are existing file objects, I'll take them as source 1. They get
encoded for release by os.listdir() et al.

> And yes, strings can be  
> generated from scratch.

I take this to be source 2.

I think I agree with all the discussion that followed, and think the
real problem is the lack of utility functions to funny-encode source 2
strings for use. Hence the proposal above.

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Be smart, be safe, be paranoid.
        - Ryan Cousineau, courier at compdyn.com DoD#863, KotRB, KotKWaWCRH

From benjamin at python.org  Tue Apr 28 04:58:34 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 27 Apr 2009 21:58:34 -0500
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <20090428021117.GA25536@cskk.homeip.net>
References: <49F658A5.7080807@g.nevcal.com>
	<20090428021117.GA25536@cskk.homeip.net>
Message-ID: <1afaf6160904271958r15f2c3c0ide616c9bbc8ca0ee@mail.gmail.com>

2009/4/27 Cameron Simpson <cs at zip.com.au>:
>
> PROPOSAL: add to the PEP the following functions:
>
>   os.fsdecode(bytes) -> funny-encoded Unicode
>     This is what os.listdir() does to produce the strings it hands out.
>   os.fsencode(funny-string) -> bytes
>     This is what open(filename,..) does to turn the filename into bytes
>     for the POSIX open.
>   os.pathencode(your-string) -> funny-encoded-Unicode
>     This is what you must do to a de novo string to turn it into a
>     string suitable for use by open.
>     Importantly, for most strings not hand crafted to have weird
>     sequences in them, it is a no-op. But it will recode your puns
>     for survival.
>
> and for me, I would like to see:
>
>   os.setfilesystemencoding(coding)
>
> Currently sys.getfilesystemencoding() returns you the encoding based on
> the current locale, and (I trust) the os.* stuff encodes on that basis.
> setfilesystemencoding() would override that, unless coding==None in which
> case it reverts to the former "use the user's current locale" behaviour.
> (We have locale "C" for what one might otherwise expect None to mean:-)

Time machine! http://docs.python.org/dev/py3k/library/sys.html#sys.setfilesystemencoding

-- 
Regards,
Benjamin

From martin at v.loewis.de  Tue Apr 28 05:35:59 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 28 Apr 2009 05:35:59 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F63B19.7010306@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de>
	<49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de>
	<49F63B19.7010306@g.nevcal.com>
Message-ID: <49F6799F.5030208@v.loewis.de>

Glenn Linderman wrote:
> On approximately 4/27/2009 12:42 PM, came the following characters from
> the keyboard of Martin v. Löwis:
>>>> It's a private use area. It will never carry an official character
>>>> assignment.
>>>
>>> I know that U+F0000 - U+FFFFF is a private use area.  I don't find a
>>> definition of U+F01xx to know what the notation means.  Are you picking
>>> a particular character within the private use area, or a particular
>>> range, or what?
>>
>> It's a range. The lower-case 'x' denotes a variable half-byte, ranging
>> from 0 to F. So this is the range U+F0100..U+F01FF, giving 256 code
>> points.
> 
> 
> So you only need 128 code points, so there is something else unclear.

(please understand that this is history now, since the PEP has stopped
using PUA characters).

No. You seem to assume that all bytes < 128 decode successfully always.
I believe this assumption is wrong, in general:

py> "\x1b$B' \x1b(B".decode("iso-2022-jp") #2.x syntax
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'iso2022_jp' codec can't decode bytes in position
3-4: illegal multibyte sequence

All bytes are below 128, yet it fails to decode.
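
For readers on Python 3, the same check can be written with a bytes literal. This is only a sketch of the equivalent spelling; the exact wording of the error message is not asserted here.

```python
raw = b"\x1b$B' \x1b(B"       # every byte is below 128
print(max(raw) < 128)

try:
    raw.decode("iso-2022-jp")
    failed = False
except UnicodeDecodeError as err:
    failed = True
    print(err)

print(failed)
```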

Regards,
Martin

From martin at v.loewis.de  Tue Apr 28 05:39:40 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 05:39:40 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F643BE.4050605@g.nevcal.com>
References: <20090427075549.GA4418@cskk.homeip.net>
	<49F56F8B.7030108@g.nevcal.com> <49F60C0B.9000905@v.loewis.de>
	<49F643BE.4050605@g.nevcal.com>
Message-ID: <49F67A7C.4070602@v.loewis.de>

> I'm not suggesting the PEP should solve the problem of mounting foreign
> file systems, although if it doesn't it should probably point that out. 
> I'm just suggesting that if the people that write software to solve the
> problem of mounting foreign file systems have already solved the naming
> problem, then it might be a source of a good solution.  On the other
> hand, it might be the source of a mediocre or bad solution.  However, if
> those mounting systems have good solutions, it would be good to be
> compatible with them, rather than have yet another solution.  It was in
> that sense, of thinking about possibly existing practice, and leveraging
> an existing solution, that caused me to bring up the topic.

I think you make quite a lot of assumptions here. It would be better
to research the state of the art first, and only then propose to follow it.

Regards,
Martin

From cs at zip.com.au  Tue Apr 28 05:39:46 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Tue, 28 Apr 2009 13:39:46 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <1afaf6160904271958r15f2c3c0ide616c9bbc8ca0ee@mail.gmail.com>
Message-ID: <20090428033946.GA14685@cskk.homeip.net>

On 27Apr2009 21:58, Benjamin Peterson <benjamin at python.org> wrote:
| 2009/4/27 Cameron Simpson <cs at zip.com.au>:
| > PROPOSAL: add to the PEP the following functions:
[...]
| > and for me, I would like to see:
| >   os.setfilesystemencoding(coding)
| >
| > Currently sys.getfilesystemencoding() returns you the encoding based on
| > the current locale, and (I trust) the os.* stuff encodes on that basis.
| > setfilesystemencoding() would override that, unless coding==None in which
| > case it reverts to the former "use the user's current locale" behaviour.
| > (We have locale "C" for what one might otherwise expect None to mean:-)
| 
| Time machine! http://docs.python.org/dev/py3k/library/sys.html#sys.setfilesystemencoding

How embarrassing. I thought I'd looked.

It doesn't have the None->return-to-default mode, and I'd like to see
the word "overwritten" replaced by "overridden".

And of course if Martin's PEP gets adopted then the "e.g." clause needs
replacing:-)
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Do not taunt Happy Fun Coder.

From martin at v.loewis.de  Tue Apr 28 05:50:11 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 05:50:11 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <200904281027.20431.steve@pearwood.info>
References: <49EEBE2E.3090601@v.loewis.de>	<87ws96j7lv.fsf@uwakimon.sk.tsukuba.ac.jp>	<loom.20090427T180957-232@post.gmane.org>
	<200904281027.20431.steve@pearwood.info>
Message-ID: <49F67CF3.1030403@v.loewis.de>

>> I don't understand what you're saying. py3k filenames are all
>> unicode, even on POSIX systems, 
> 
> 
> How is that possible on POSIX systems where the underlying file system 
> uses bytes for filenames?
> 
> If I write a piece of Python code:
> 
>     filename = 'some path/some name'
> 
> I might call it a filename, I might think of it as a filename, but it 
> *isn't*, it's a string in a Python program. It isn't a filename until 
> it hits the file system, and in POSIX systems that makes it bytes.

Python automatically encodes strings with the file system encoding
before passing them to the POSIX API.
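
The automatic step can be mirrored by hand as a rough illustration. This sketch assumes an ASCII-compatible filesystem encoding; the variable names are illustrative only.

```python
import sys

filename = "some path/some name"          # a str inside the Python program
encoding = sys.getfilesystemencoding()    # codec chosen from the locale
as_bytes = filename.encode(encoding)      # the form the POSIX API receives

print(encoding)
print(as_bytes)
```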

Regards,
Martin

From stephen at xemacs.org  Tue Apr 28 06:26:36 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 28 Apr 2009 13:26:36 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F63805.6000208@voidspace.org.uk>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T154536-926@post.gmane.org>
	<87ws96j7lv.fsf@uwakimon.sk.tsukuba.ac.jp>
	<49F63805.6000208@voidspace.org.uk>
Message-ID: <87skjtjtdv.fsf@uwakimon.sk.tsukuba.ac.jp>

Michael Foord writes:

 > The problem you don't address, which is still the reality for most 
 > programmers (especially Mac OS X where filesystem encoding is UTF 8), is 
 > that programmers *are* going to treat filenames as strings.

 > The proposed PEP allows that to work for them - whatever platform their 
 > program runs on.

Sure, for values of "work" == "No exception will be raised in my
module, and some content will actually be returned."  It doesn't say
anything about what happens once those strings escape the immediate
context.  So it *encourages* those programmers to pass any problems
downstream, but only after discarding the resources needed to deal
with problems effectively.

It's not that hard to overcome that problem, but it does require a
slightly more complex API, and one that doesn't return a string but
rather a stringlike object annotated with the information about how it
was decoded.  Conversion to a string *should* be trivial; I just think
it should be invoked explicitly to make it clear where information is
being discarded.  Without an implicit conversion, the nature of the
data (ie, context-dependent structure) is made explicit.  There's a
natural place to document the problem that context must be used to
interpret the data accurately, and even add more robust processing (in
a new PEP, of course!), etc.
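
One possible shape for such an annotated object is sketched below. The class `OSName` and its attributes are hypothetical, invented here for illustration; nothing like it is proposed in the PEP, and UTF-8 is assumed as the default encoding.

```python
class OSName:
    """Hypothetical wrapper: decoded text plus a record of how it was decoded."""

    def __init__(self, raw: bytes, encoding: str = "utf-8"):
        self.raw = raw
        self.encoding = encoding
        try:
            self._text = raw.decode(encoding)
            self.clean = True       # decoded without any escaping
        except UnicodeDecodeError:
            self._text = raw.decode(encoding, "surrogateescape")
            self.clean = False      # some bytes had to be escaped

    def to_str(self) -> str:
        """Explicit conversion: the point where the annotation is discarded."""
        return self._text

    def __bytes__(self) -> bytes:
        return self.raw


good = OSName(b"notes.txt")
odd = OSName(b"caf\xe9.txt")
print(good.clean, odd.clean)
print(bytes(odd) == b"caf\xe9.txt")
```

The caller has to ask for `to_str()` explicitly, so the place where decoding information is thrown away is visible in the code rather than implicit.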

Then in the future this interface could be used as the basis of a more
robust API.  With good design (and luck) it might be subclassible or
extensible to a path object API, for example.  PEP 383 on the other
hand is a dead end as it stands.  AFAICS it gives the best possible
treatment of conversion of OS data to plain string, but we've already
got developers lining up to say "I can't use it". :-(

From stephen at xemacs.org  Tue Apr 28 06:43:12 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 28 Apr 2009 13:43:12 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <p04330106c61b9f2aad0a@[192.168.123.162]>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240237p44ba9b4bw26b40304d3c4b48a@mail.gmail.com>
	<49F18E90.9070801@nevcal.com>
	<fb73205e0904240338i64b7d10et3f3356237ff707e9@mail.gmail.com>
	<fb73205e0904240339l101bf2a9j8f3e3e96d78a8019@mail.gmail.com>
	<79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com>
	<20090424152746.GA9543@panix.com>
	<loom.20090424T153112-15@post.gmane.org>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T154536-926@post.gmane.org>
	<p04330106c61b9f2aad0a@[192.168.123.162]>
Message-ID: <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>

Tony Nelson writes:
 > At 16:09 +0000 04/27/2009, Antoine Pitrou wrote:
 > >Stephen J. Turnbull <stephen <at> xemacs.org> writes:
 > >>
 > >> I hate to break it to you, but most stages of mail processing have
 > >> very little to do with SMTP.  In particular, processing MIME
 > >> attachments often requires dealing with file names.
 > >
 > >AFAIK, the file name is only there as an indication for the user
 > >when he wants to save the file. If it's garbled a bit, no big
 > >deal.

Nobody said we were at the stage of *saving* the file!

From foom at fuhm.net  Tue Apr 28 07:19:22 2009
From: foom at fuhm.net (James Y Knight)
Date: Tue, 28 Apr 2009 01:19:22 -0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F6799F.5030208@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de>
	<49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de>
	<49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de>
Message-ID: <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>

On Apr 27, 2009, at 11:35 PM, Martin v. Löwis wrote:

> No. You seem to assume that all bytes < 128 decode successfully  
> always.
> I believe this assumption is wrong, in general:
>
> py> "\x1b$B' \x1b(B".decode("iso-2022-jp") #2.x syntax
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'iso2022_jp' codec can't decode bytes in position
> 3-4: illegal multibyte sequence
>
> All bytes are below 128, yet it fails to decode.

Surely nobody uses iso2022 as an LC_CTYPE encoding. That's expressly  
forbidden by POSIX, if I'm not mistaken...and I can't see how it would  
work, considering that it uses all the bytes from 0x20-0x7f, including  
0x2f ("/"), to represent non-ascii characters.

Hopefully it can be assumed that your locale encoding really is a non- 
overlapping superset of ASCII, as is required by POSIX...

I'm a bit scared at the prospect that U+DC2F could turn into "/", that  
just screams security vulnerability to me.  So I'd like to propose  
that only 0x80-0xFF <-> U+DC80-U+DCFF should ever be allowed to be  
encoded/decoded via the error handler.
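
A minimal sketch of that restriction, assuming a Python 3 interpreter in
which the PEP's error handler is available under the name
"surrogateescape" (the name it eventually shipped under in Python 3.1;
that name is not used in this thread). It shows high bytes round-tripping
through U+DC80..U+DCFF, while a lone surrogate below U+DC80 such as
U+DC2F is refused rather than turned into "/".

```python
# Bytes 0x80-0xFF that cannot be decoded are mapped to U+DC80..U+DCFF.
raw = b"a\xffb"
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
restored = text.encode("utf-8", "surrogateescape")

# A lone surrogate outside U+DC80..U+DCFF is not mapped back to a byte,
# so U+DC2F cannot silently become "/" (0x2F).
try:
    "\udc2f".encode("utf-8", "surrogateescape")
except UnicodeEncodeError:
    slash_smuggled = False
else:
    slash_smuggled = True
```

The design choice illustrated is that only the upper half of the byte
range is ever escaped, so ASCII path separators cannot be produced from
escaped code points.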

James

From v+python at g.nevcal.com  Tue Apr 28 07:25:15 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 27 Apr 2009 22:25:15 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F6799F.5030208@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de>
	<49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de>
	<49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de>
Message-ID: <49F6933B.7020705@g.nevcal.com>

On approximately 4/27/2009 8:35 PM, came the following characters from 
the keyboard of Martin v. Löwis:
> Glenn Linderman wrote:
>> On approximately 4/27/2009 12:42 PM, came the following characters from
>> the keyboard of Martin v. Löwis:
>>>>> It's a private use area. It will never carry an official character
>>>>> assignment.
>>>> I know that U+F0000 - U+FFFFF is a private use area.  I don't find a
>>>> definition of U+F01xx to know what the notation means.  Are you picking
>>>> a particular character within the private use area, or a particular
>>>> range, or what?
>>> It's a range. The lower-case 'x' denotes a variable half-byte, ranging
>>> from 0 to F. So this is the range U+F0100..U+F01FF, giving 256 code
>>> points.
>>
>> So you only need 128 code points, so there is something else unclear.
> 
> (please understand that this is history now, since the PEP has stopped
> using PUA characters).

Yes, but having found the latest PEP finally (at least I hope the one at 
python.org is the latest, it has quit using PUA anyway), I confirm it is 
history.  But the same issue applies to the range of half-surrogates.

> No. You seem to assume that all bytes < 128 decode successfully always.
> I believe this assumption is wrong, in general:
> 
> py> "\x1b$B' \x1b(B".decode("iso-2022-jp") #2.x syntax
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'iso2022_jp' codec can't decode bytes in position
> 3-4: illegal multibyte sequence
> 
> All bytes are below 128, yet it fails to decode.

Indeed, that was the missing piece.  I'd forgotten about the encodings 
that use escape sequences, rather than UTF-8, and DBCS.  I don't think 
those encodings are permitted by POSIX file systems, but I suppose they 
could sneak in via Environment variable values, and the like.

The switch from PUA to half-surrogates does not resolve the issues with 
the encoding not being a 1-to-1 mapping, though.  The very fact that you 
think you can get away with use of lone surrogates means that other 
people might, accidentally or intentionally, also use lone surrogates 
for some other purpose.  Even in file names.
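
The pun can be sketched in a few lines, assuming a Python 3 interpreter
where the PEP's handler is available as "surrogateescape" (an assumption
about the eventual name, not something stated in this thread). The file
name and byte values are made up for illustration.

```python
# A name obtained from the bytes level, containing one undecodable byte.
from_bytes = b"report\xff.txt".decode("utf-8", "surrogateescape")

# A string built from scratch that happens to contain the same lone surrogate.
de_novo = "report" + chr(0xDCFF) + ".txt"

# The two values are equal, so nothing in the string records its origin.
same = from_bytes == de_novo
```

Once the two strings compare equal, no later code can tell which of them
came from the file system and which was constructed by a program.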

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From robert.collins at canonical.com  Tue Apr 28 07:39:01 2009
From: robert.collins at canonical.com (Robert Collins)
Date: Tue, 28 Apr 2009 15:39:01 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F6933B.7020705@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de>
	<49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de>
	<49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de>
	<49F6933B.7020705@g.nevcal.com>
Message-ID: <1240897141.5830.12.camel@lifeless-64>

On Mon, 2009-04-27 at 22:25 -0700, Glenn Linderman wrote:
> 
> Indeed, that was the missing piece.  I'd forgotten about the
> encodings 
> that use escape sequences, rather than UTF-8, and DBCS.  I don't
> think 
> those encodings are permitted by POSIX file systems, but I suppose
> they 
> could sneak in via Environment variable values, and the like.

This may already have been discussed, and if so I apologise for the
noise.

Does the PEP take into consideration the normalising behaviour of Mac
OS X? We've had some ongoing challenges in bzr related to this.
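
A small sketch of what normalisation means here, under the assumption
(not spelled out in this thread) that the file system hands names back
in a decomposed form rather than the composed form a program wrote. The
example name is hypothetical; only the standard unicodedata module is
used.

```python
import unicodedata

composed = "caf\u00e9"                               # NFC: one code point for e-acute
decomposed = unicodedata.normalize("NFD", composed)  # "e" followed by U+0301

# The two spellings look identical but differ as strings and as bytes.
names_equal = composed == decomposed
utf8_composed = composed.encode("utf-8")
utf8_decomposed = decomposed.encode("utf-8")

# Normalising both sides to one form makes them comparable again.
round_trip = unicodedata.normalize("NFC", decomposed) == composed
```

A tool that compares the name it created with the name it reads back
would see a mismatch unless it normalises both before comparing.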

-Rob

From v+python at g.nevcal.com  Tue Apr 28 07:41:34 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 27 Apr 2009 22:41:34 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F67A7C.4070602@v.loewis.de>
References: <20090427075549.GA4418@cskk.homeip.net>
	<49F56F8B.7030108@g.nevcal.com> <49F60C0B.9000905@v.loewis.de>
	<49F643BE.4050605@g.nevcal.com> <49F67A7C.4070602@v.loewis.de>
Message-ID: <49F6970E.4000701@g.nevcal.com>

On approximately 4/27/2009 8:39 PM, came the following characters from 
the keyboard of Martin v. Löwis:
>> I'm not suggesting the PEP should solve the problem of mounting foreign
>> file systems, although if it doesn't it should probably point that out. 
>> I'm just suggesting that if the people that write software to solve the
>> problem of mounting foreign file systems have already solved the naming
>> problem, then it might be a source of a good solution.  On the other
>> hand, it might be the source of a mediocre or bad solution.  However, if
>> those mounting system have good solutions, it would be good to be
>> compatible with them, rather than have yet another solution.  It was in
>> that sense, of thinking about possibly existing practice, and leveraging
>> an existing solution, that caused me to bring up the topic.
>>     
>
> I think you make quite a lot of assumptions here. It would be better
> to research the state of the art first, and only then propose to follow it.

I didn't propose to follow it.  I only proposed an area that could be 
researched as a source of ideas and/or potential solutions.  Apparently 
there wasn't, but there could have been someone listening that had the 
results of such research on the tip of their tongue, and might have 
piped up with the techniques used.  I did, in fact, begin researching 
the topic after making the suggestion, and thus far haven't found any 
brilliant solutions from that arena.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From tmbdev at gmail.com  Tue Apr 28 08:29:23 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Tue, 28 Apr 2009 08:29:23 +0200
Subject: [Python-Dev] PEP 383 (again)
Message-ID: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>

I thought PEP-383 was a fairly neat approach, but after thinking about it, I
now think that it is wrong.

PEP-383 attempts to represent non-UTF-8 byte sequences in Unicode strings in
a reversible way.  But how do those non-UTF-8 byte sequences get into those
path names in the first place?  Most likely because an encoding other than
UTF-8 was used to write the file system, but you're now trying to interpret
its path names as UTF-8.

Quietly escaping a bad UTF-8 encoding with private Unicode characters is
unlikely to be the right thing, since using the wrong encoding likely means
that other characters are decoded incorrectly as well.   As a result, the
path name may fail in string comparisons and pattern matching, and will look
wrong to the user in print statements and dialog boxes. Therefore, when
Python encounters path names on a file system that are not consistent with
the (assumed) encoding for that file system, Python should raise an error.

If you really don't care what the string looks like and you just want an
encoding that round-trips without loss, you can probably just set your
encoding to one of the 8 bit encodings, like ISO 8859-15.   Decoding
arbitrary byte sequences to unicode strings as ISO 8859-15 is no less
correct than decoding them as the proposed "utf-8b".  In fact, the most
likely source of non-UTF-8 sequences is ISO 8859 encodings.
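
A quick sketch of that round trip, using Python's "iso8859_15" codec,
which (as far as I know) assigns a character to every one of the 256
byte values. The example name is hypothetical.

```python
all_bytes = bytes(range(256))

# Every byte value decodes under ISO 8859-15, and encoding restores the input.
as_text = all_bytes.decode("iso8859_15")
lossless = as_text.encode("iso8859_15") == all_bytes

# The price: a UTF-8 encoded name is shown as two unrelated characters
# per non-ASCII letter instead of the letter the user typed.
utf8_name = "caf\u00e9".encode("utf-8")    # b"caf\xc3\xa9"
shown = utf8_name.decode("iso8859_15")
```

The round trip is lossless at the byte level, but the decoded text is
only meaningful if the bytes really were ISO 8859-15.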

As for what the byte-oriented interfaces should do, they are simply platform
dependent.  On UNIX, they should do the obvious thing.  On Windows, they can
either hook up to the low-level byte-oriented system calls that the systems
supply, or Windows could fake it and have the byte-oriented interfaces use
UTF-8 encodings always and reject non-UTF-8 sequences as illegal (there are
already many illegal byte sequences anyway).

Tom

From martin at v.loewis.de  Tue Apr 28 08:50:02 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 08:50:02 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>
	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>
	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>
	<49F6799F.5030208@v.loewis.de>
	<875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>
Message-ID: <49F6A71A.3020809@v.loewis.de>

James Y Knight wrote:
> Hopefully it can be assumed that your locale encoding really is a
> non-overlapping superset of ASCII, as is required by POSIX...

Can you please point to the part of the POSIX spec that says that
such overlapping is forbidden?

> I'm a bit scared at the prospect that U+DCAF could turn into "/", that
> just screams security vulnerability to me.  So I'd like to propose that
> only 0x80-0xFF <-> U+DC80-U+DCFF should ever be allowed to be
> encoded/decoded via the error handler.

It would actually be U+DC2F that would turn into /.
I'm happy to exclude that range from the mapping if POSIX really
requires an encoding not to be overlapping with ASCII.

Regards,
Martin

From v+python at g.nevcal.com  Tue Apr 28 08:52:48 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 27 Apr 2009 23:52:48 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <20090428021117.GA25536@cskk.homeip.net>
References: <20090428021117.GA25536@cskk.homeip.net>
Message-ID: <49F6A7C0.6090105@g.nevcal.com>

On approximately 4/27/2009 7:11 PM, came the following characters from 
the keyboard of Cameron Simpson:
> On 27Apr2009 18:15, Glenn Linderman <v+python at g.nevcal.com> wrote:
>   
>>>>>> The problem with this, and other preceding schemes that have been
>>>>>> discussed here, is that there is no means of ascertaining whether a
>>>>>> particular file name str was obtained from a str API, or was funny-
>>>>>> decoded from a bytes API... and thus, there is no means of reliably
>>>>>> ascertaining whether a particular filename str should be passed to a
>>>>>> str API, or funny-encoded back to bytes.
>>>>>>         
>>>>>>             
>>>>> Why is it necessary that you are able to make this distinction?
>>>>>       
>>>>>           
>>>> It is necessary that programs (not me) can make the distinction, so 
>>>> that they know whether or not to do the funny-encoding.
>>>>     
>>>>         
>>> I would say this isn't so. It's important that programs know if they're
>>> dealing with strings-for-filenames, but not that they be able to figure
>>> that out "a priori" if handed a bare string (especially since they
>>> can't:-)
>>>       
>> So you agree they can't... that there are data puns.   (OK, you may not  
>> have thought that through)
>>     
>
> I agree you can't examine a string and know if it came from the os.* munging
> or from someone else's munging.
>
> I totally disagree that this is a problem.
>
> There may be puns. So what? Use the right strings for the right purpose
> and all will be well.
>
> I think what is missing here, and missing from Martin's PEP, is some
> utility functions for the os.* namespace.
>
> PROPOSAL: add to the PEP the following functions:
>
>   os.fsdecode(bytes) -> funny-encoded Unicode
>     This is what os.listdir() does to produce the strings it hands out.
>   os.fsencode(funny-string) -> bytes
>     This is what open(filename,..) does to turn the filename into bytes
>     for the POSIX open.
>   os.pathencode(your-string) -> funny-encoded-Unicode
>     This is what you must do to a de novo string to turn it into a
>     string suitable for use by open.
>     Importantly, for most strings not hand crafted to have weird
>     sequences in them, it is a no-op. But it will recode your puns
>     for survival.
>
> and for me, I would like to see:
>
>   os.setfilesystemencoding(coding)
>
> Currently sys.getfilesystemencoding() returns you the encoding based on
> the current locale, and (I trust) the os.* stuff encodes on that basis.
> setfilesystemencoding() would override that, unless coding==None, in which
> case it reverts to the former "use the user's current locale" behaviour.
> (We have locale "C" for what one might otherwise expect None to mean:-)
>
> The idea here is to let the program control the codec used for filenames
> for special purposes, without working indirectly through the locale.
>
>   
>>>> If a name is  funny-decoded when the name is accessed by a directory 
>>>> listing, it needs  to be funny-encoded in order to open the file.
>>>>         
>>> Hmm. I had thought that legitimate unicode strings already get transcoded
>>> to bytes via the mapping specified by sys.getfilesystemencoding()
>>> (the user's locale). That already happens I believe, and Martin's
>>> scheme doesn't change this. He's just funny-encoding non-decodable byte
>>> sequences, not the decoded stuff that surrounds them.
>>>       
>> So assume a non-decodable sequence in a name.  That puts us into  
>> Martin's funny-decode scheme.  His funny-decode scheme produces a bare  
>> string, indistinguishable from a bare string that would be produced by a  
>> str API that happens to contain that same sequence.  Data puns.
>>     
>
> See my proposal above. Does it address your concerns? A program still
> must know the provenance of the string, and _if_ you're working with
> non-decodable sequences in names then you should transmute them into
> the funny encoding using the os.pathencode() function described above.
>
> In this way the punning issue can be avoided.
>
> _Lacking_ such a function, your punning concern is valid.
>   

Seems like one would also desire os.pathdecode to do the reverse.  And 
also versions that take or produce bytes from funny-encoded strings.
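
The first two proposed helpers can be sketched as below. This is only an
illustration under stated assumptions: the names fs_decode and fs_encode
are hypothetical, the encoding is passed explicitly instead of being
taken from the locale, and the handler name "surrogateescape" is the one
Python 3.1 later adopted. (Python 3.2 later added os.fsencode() and
os.fsdecode() along these lines; nothing like os.pathencode() exists in
the standard library as far as I know.)

```python
def fs_decode(raw: bytes, encoding: str = "utf-8") -> str:
    """bytes -> funny-encoded str (hypothetical helper)."""
    return raw.decode(encoding, "surrogateescape")


def fs_encode(name: str, encoding: str = "utf-8") -> bytes:
    """funny-encoded str -> bytes (hypothetical helper)."""
    return name.encode(encoding, "surrogateescape")


# Made-up names: plain ASCII, valid UTF-8, and one stray Latin-1 byte.
raw_names = [b"plain.txt", b"caf\xc3\xa9.txt", b"bad\xe9.txt"]
decoded = [fs_decode(n) for n in raw_names]
restored = [fs_encode(s) for s in decoded]
```

Names that decode cleanly pass through unchanged, and only the stray
byte is escaped, so the pair is a lossless round trip for names that
start out as bytes.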

Then, if programs were re-coded to perform these transformations on what 
you call de novo strings, then the scheme would work.

But I think a large part of the incentive for the PEP is to try to 
invent a scheme that intentionally allows for the puns, so that programs 
do not need to be recoded in this manner, and yet still work.  I don't 
think such a scheme exists.

If there is going to be a required transformation from de novo strings 
to funny-encoded strings, then why not make one that people can actually 
see and compare and decode from the displayable form, by using 
displayable characters instead of lone surrogates?

>> So when open is handed the string, should it open the file with the name  
>> that matches the string, or the file with the name that funny-decodes to  
>> the same string?  It can't know, unless it knows that the string is a  
>> funny-decoded string or not.
>>     
>
> True. open() should always expect a funny-encoded name.
>
>   
>>> So it is already the case that strings get decoded to bytes by
>>> calls like open(). Martin isn't changing that.
>>>       
>> I thought the process of converting strings to bytes is called encoding.  
>> You seem to be calling it decoding?
>>     
>
> My head must be standing in the wrong place. Yes, I probably mean
> encoding here. I'm trying to accompany these terms with little pictures
> like "string->bytes" to avoid confusion.
>
>   
>>> I suppose if your program carefully constructs a unicode string riddled
>>> with half-surrogates etc and imagines something specific should happen
>>> to them on the way to being POSIX bytes then you might have a problem...
>>>       
>> Right.  Or someone else's program does that.  I only want to use Unicode  
>> file names.  But if those other file names exist, I want to be able to  
>> access them, and not accidentally get a different file.
>>     
>
> Point taken. And I think addressed by the utility function proposed
> above.
>
> [...snip normal versus odd chars for the funny-encoding ...]
>   
>>> Also, by avoiding reuse of legitimate characters in the encoding we can
>>> avoid your issue with losing track of where a string came from;
>>> legitimate characters are currently untouched by Martin's scheme, except
>>> for the normal "bytes<->string via the user's locale" translation that
>>> must already happen, and there you're aided by bytes and strings being
>>> different types.
>>>       
>> There are abnormal characters, but there are no illegal characters.   
>>     
>
> I thought half-surrogates were illegal in well formed Unicode. I confess
> to being weak in this area. By "legitimate" above I meant things like
> half-surrogates which, like quarks, should not occur alone?
>   

"Illegal" just means violating the accepted rules.  In this case, the 
accepted rules are those enforced by the file system (at the bytes or 
str API levels), and by Python (for the str manipulations).  None of 
those rules outlaw lone surrogates.  Hence, while all of the systems 
under discussion can handle all Unicode characters in one way or 
another, none of them require that all Unicode rules are followed.  Yes, 
you are correct that lone surrogates are illegal in Unicode.  No, none 
of the accepted rules for these systems require Unicode.
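
A short sketch of that distinction in Python 3: the str type itself
accepts a lone surrogate, while the strict UTF-8 codec, which does follow
the Unicode rules, refuses to encode it.

```python
lone = "\udc80"   # a lone low surrogate; Python accepts it in a str
length = len(lone)

# Strict UTF-8 encoding rejects the lone surrogate.
try:
    lone.encode("utf-8")
except UnicodeEncodeError:
    strict_utf8_accepts = False
else:
    strict_utf8_accepts = True
```

So the string can exist and be passed around freely; the rule is only
enforced at the point where a conforming encoder is asked to serialise
it.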

>> NTFS permits any 16-bit "character" code, including abnormal ones,  
>> including half-surrogates, and including full surrogate sequences that  
>> decode to PUA characters.  POSIX permits all byte sequences, including  
>> things that look like UTF-8, things that don't look like UTF-8, things  
>> that look like half-surrogates, and things that look like full surrogate  
>> sequences that decode to PUA characters.
>>     
>
> Sure. I'm not really talking about what filesystem will accept at
> the native layer, I was talking in the python funny-encoded space.
>
> [..."escaping is necessary"... I agree...]
>   
>>>> I'm certainly not experienced enough in Python development processes 
>>>> or  internals to attempt such, as yet.  But somewhere in 25 years of  
>>>> programming, I picked up the knowledge that if you want to have a 
>>>> 1-to-1  reversible mapping, you have to avoid data puns, mappings of 
>>>> two  different data values into a single data value.  Your PEP, as 
>>>> first  written, didn't seem to do that... since there are two 
>>>> interfaces from  which to obtain data values, one performing a 
>>>> mapping from bytes to  "funny invalid" Unicode, and the other 
>>>> performing no mapping, but  accepting any sort of Unicode, possibly 
>>>> including "funny invalid"  Unicode, the possibility of data puns 
>>>> seems to exist.  I may be  misunderstanding something about the use 
>>>> cases that prevent these two  sources of "funny invalid" Unicode from 
>>>> ever coexisting, but if so,  perhaps you could point it out, or 
>>>> clarify the PEP.
>>>>         
>>> Please elucidate the "second source" of strings. I'm presuming you mean
>>> strings generated from scratch rather than obtained by something like
>>> listdir().
>>>   
>>>       
>> POSIX has byte APIs for strings, that's one source, that is most under  
>> discussion.  Windows has both bytes and 16-bit APIs for strings... the  
>> 16-bit APIs are generally mapped directly to UTF-16, but are not checked  
>> for UTF-16 validity, so all of Martin's funny-decoded files could be  
>> used for Windows file names on the 16-bit APIs.
>>     
>
> These are existing file objects, I'll take them as source 1. They get
> encoded for release by os.listdir() et al.
>
>   
>> And yes, strings can be  
>> generated from scratch.
>>     
>
> I take this to be source 2.
>   

One variation of source 2 is reading output from other programs, such as 
ls (POSIX) or dir (Windows).

> I think I agree with all the discussion that followed, and think the
> real problem is lack of utility functions to funny-encode source 2
> strings for use. Hence the proposal above.

I think we understand each other now.  I think your proposal could work, 
Cameron, although when recoding applications to use your proposal, I'd 
find it easier to use the "file name object" that others have proposed.  
I think that because either your proposal or the object proposals 
require recoding the application, that they will not be accepted.  I 
think that because the PEP 383 allows data puns, that it should not be 
accepted in its present form.

I think that if your proposal is accepted, it then becomes possible 
to use an encoding that uses visible characters, which makes it easier 
for people to understand and verify.  An encoding such as the one I 
suggested, but perhaps using a more obscure character, if there is one, 
but yet doesn't violate true Unicode.  I think it should transform all 
data, from str and bytes interfaces, and produce only str values 
containing conforming Unicode, escaping all the non-conforming sequences 
in some manner.  This would make the strings truly readable, as long as 
fonts for all the characters are available.  And I had already suggested 
the utility functions you are suggesting, actually, in my first tirade 
against PEP 383 (search for "The encode and decode functions should be 
available for coders to use, that code to external
interfaces, either OS or 3rd party packages, that do not use this 
encoding scheme").  I really don't care who gets the credit 
for the idea, others may have suggested it before me, but I do care that 
the solution should provide functionality that works without 
ambiguity/data puns.

The solution that was proposed in the lead up to releasing Python 3.0 
was to offer both bytes and str interfaces (so we have those), and then 
for those that want to have a single portable implementation that can 
access all data, an object that encapsulates the differences, and the 
variant system APIs.  (file system is one, command line is another, 
environment is another, I'm not sure if there are more.)  I haven't 
heard if any progress on such an encapsulating object has been made; the 
people that proposed such have been rather quiet about this PEP.  I 
would expect that an object implementation would provide display 
strings, and APIs to submit de novo str and bytes values to an object, 
which would run the appropriate encoding on them.

Programs that want to use str interfaces on POSIX will see a subset of 
files on systems that contain files whose bytes filenames are not 
decodable.  If a sysadmin wants to standardize on UTF-8 names 
universally, they can use something like convmv to clean up existing 
file names that don't conform.  Programs that use str interfaces on 
POSIX system will work fine, but with a subset of the files.  When that 
is unacceptable, they can either be recoded to use the bytes interfaces, 
or the hopefully forthcoming object encapsulation.  The issue then will 
be what technique will be used to transform bytes into display names, 
but since the display names would never be fed back to the objects 
directly (but the object would have an interface to accept de novo str 
and de novo bytes) then it is just a display issue, and one that uses 
visible characters would seem more useful in my mind, than one that uses 
half-surrogates or PUAs.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From martin at v.loewis.de  Tue Apr 28 08:53:10 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 08:53:10 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <1240897141.5830.12.camel@lifeless-64>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>
	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>
	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>
	<49F6799F.5030208@v.loewis.de>	<49F6933B.7020705@g.nevcal.com>
	<1240897141.5830.12.camel@lifeless-64>
Message-ID: <49F6A7D6.5030809@v.loewis.de>

> Does the PEP take into consideration the normalising behaviour of Mac
> OS X? We've had some ongoing challenges in bzr related to this.

No, that's completely out of scope, AFAICT. I don't even know what the
issues are, so I'm not able to propose a solution, at the moment.

Regards,
Martin

From martin at v.loewis.de  Tue Apr 28 08:59:19 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 08:59:19 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
Message-ID: <49F6A947.1050106@v.loewis.de>

> PEP-383 attempts to represent non-UTF-8 byte sequences in Unicode
> strings in a reversible way.

That isn't really true; it is not, inherently, about UTF-8.
Instead, it tries to represent non-filesystem-encoding byte sequences
in Unicode strings in a reversible way.

> Quietly escaping a bad UTF-8 encoding with private Unicode characters is
> unlikely to be the right thing

And indeed, the PEP stopped using PUA characters.

> Therefore, when Python encounters path names on a file system
> that are not consistent with the (assumed) encoding for that file
> system, Python should raise an error. 

This is what happens currently, and users are quite unhappy about it.

> If you really don't care what the string looks like and you just want an
> encoding that round-trips without loss, you can probably just set your
> encoding to one of the 8 bit encodings, like ISO 8859-15.   Decoding
> arbitrary byte sequences to unicode strings as ISO 8859-15 is no less
> correct than decoding them as the proposed "utf-8b".  In fact, the most
> likely source of non-UTF-8 sequences is ISO 8859 encodings.

Yes, users can do that (to a degree), but they are still unhappy about
it. The approach actually fails for command line arguments.

> As for what the byte-oriented interfaces should do, they are simply
> platform dependent.  On UNIX, they should do the obvious thing.  On
> Windows, they can either hook up to the low-level byte-oriented system
> calls that the systems supply, or Windows could fake it and have the
> byte-oriented interfaces use UTF-8 encodings always and reject non-UTF-8
> sequences as illegal (there are already many illegal byte sequences
> anyway).

As is, these interfaces are incomplete - they don't support command
line arguments, or environment variables. If you want to complete them,
you should write a PEP.

Regards,
Martin

From tmbdev at gmail.com  Tue Apr 28 09:30:01 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Tue, 28 Apr 2009 09:30:01 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <49F6A947.1050106@v.loewis.de>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> 
	<49F6A947.1050106@v.loewis.de>
Message-ID: <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>

> > Therefore, when Python encounters path names on a file system
> > that are not consistent with the (assumed) encoding for that file
> > system, Python should raise an error.
>
> This is what happens currently, and users are quite unhappy about it.

We need to keep "users" and "programmers" distinct here.

Programmers may find it inconvenient that they have to spend time figuring
out and dealing with platform-dependent file system encoding issues and
errors.  But internationalization and unicode are hard, that's just a fact
of life.

End users, however, are going to be quite unhappy if they get a string of
gibberish for a file name because you decided to interpret some non-Unicode
string as UTF-8-with-extra-bytes.

Or some Python program might copy files from an ISO8859-15 encoded file
system to a UTF-8 encoded file system, and instead of getting an error when
the encodings are set incorrectly, Python would quietly create ISO8859-15
encoded file names, making the target file system inconsistent.

There is a lot of potential for major problems for end users with your
proposals.  In both cases, what should happen is that the end user gets an
error, submits a bug, and the programmer figures out how to deal with the
encoding issues correctly.

> Yes, users can do that (to a degree), but they are still unhappy about
> it. The approach actually fails for command line arguments.

As it should: if I give an ISO8859-15 encoded command line argument to a
Python program that expects a UTF-8 encoding, the Python program should tell
me that there is something wrong when it notices that.  Quietly continuing
is the wrong thing to do.

If we follow your approach, that ISO8859-15 string will get turned into an
escaped unicode string inside Python.  If I understand your proposal
correctly, if it's an output file name and gets passed to Python's open
function, Python will then encode that string and end up with an ISO8859-15
byte sequence, which it will write to disk literally, even if the encoding
for the system is UTF-8.   That's the wrong thing to do.
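
The scenario can be sketched as follows, assuming the PEP's handler is
available in Python 3 as "surrogateescape" (an assumption about naming)
and using a made-up argument value.

```python
# An ISO 8859-15 encoded argument arriving in a process that assumes UTF-8.
argument = "caf\u00e9".encode("iso8859_15")   # b"caf\xe9"

# Inside Python the stray byte becomes an escaped code point ...
inside_python = argument.decode("utf-8", "surrogateescape")

# ... and encoding it for the file system restores the original byte.
written_to_disk = inside_python.encode("utf-8", "surrogateescape")

# The resulting name is not valid UTF-8.
try:
    written_to_disk.decode("utf-8")
except UnicodeDecodeError:
    valid_utf8 = False
else:
    valid_utf8 = True
```

The bytes written are exactly the bytes received, which is the
round-trip property the PEP aims for and also the behaviour objected to
here.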

> As is, these interfaces are incomplete - they don't support command
> line arguments, or environment variables. If you want to complete them,
> you should write a PEP.

There's no point in scratching when there's no itch.

Tom

PS:

> > Quietly escaping a bad UTF-8 encoding with private Unicode characters is
> > unlikely to be the right thing
>
> And indeed, the PEP stopped using PUA characters.

Let me rephrase this: "quietly escaping a bad UTF-8 encoding is unlikely to
be the right thing"; it doesn't matter how you do it.

From phd at phd.pp.ru  Tue Apr 28 09:58:06 2009
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Tue, 28 Apr 2009 11:58:06 +0400
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
Message-ID: <20090428075806.GB23828@phd.pp.ru>

On Tue, Apr 28, 2009 at 09:30:01AM +0200, Thomas Breuel wrote:
> Programmers may find it inconvenient that they have to spend time figuring
> out and dealing with platform-dependent file system encoding issues and
> errors.  But internationalization and unicode are hard, that's just a fact
> of life.

   As long as it's hard there will be no internationalization. A fact of life,
damn it. Programmers are lazy, and have many problems to solve.

> end user gets an
> error, submits a bug, and the programmer figures out how to deal with the
> encoding issues correctly.

   And the programmer answers "The program expects a correct
environment, good filenames, etc." and closes the issue with the resolution
"User error, will not fix".

   I am not arguing for or against the PEP in question. Python certainly
has to have a way to make portable i18n less hard or else the number of
portable internationalized programs will be about zero. What the way should
be - I don't know.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From tmbdev at gmail.com  Tue Apr 28 10:37:45 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Tue, 28 Apr 2009 10:37:45 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <20090428075806.GB23828@phd.pp.ru>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> 
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> 
	<20090428075806.GB23828@phd.pp.ru>
Message-ID: <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>

>
>
>    As long as it's hard, there will be no internationalization. A fact of life,
> damn it. Programmers are lazy, and have many problems to solve.

PEP 383 doesn't make it any easier; it just turns one set of problems into
another.  Actually, it makes it worse, since any problems that show up now
show up far from the source of the problem, and since it can lead to
security problems and/or data loss.

>    And the programmer answers "The program expects a correct
> environment, good filenames, etc." and closes the issue with the resolution
> "User error, will not fix".

The problem may well be with the program using the wrong encodings or
incorrectly ignoring encoding information.  Furthermore, even if it is user
error, the program needs to validate its inputs and put up a meaningful
error message, not mangle the disk.  To detect such program bugs, it's
important that when Python detects an incorrect encoding, it doesn't
quietly continue with an incorrect string.

Furthermore, if you don't provide clear error messages, it often takes a
significant amount of time for each issue to determine that it is user
error.

>   I am not arguing for or against the PEP in question. Python certainly
> has to have a way to make portable i18n less hard or else the number of
> portable internationalized programs will be about zero. What the way should
> be - I don't know.

Returning an error for an incorrect encoding doesn't make
internationalization harder, it makes it easier because it makes debugging
easier.

Tom

From phd at phd.pp.ru  Tue Apr 28 11:00:11 2009
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Tue, 28 Apr 2009 13:00:11 +0400
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>
Message-ID: <20090428090011.GA27583@phd.pp.ru>

On Tue, Apr 28, 2009 at 10:37:45AM +0200, Thomas Breuel wrote:
> Returning an error for an incorrect encoding doesn't make
> internationalization harder, it makes it easier because it makes debugging
> easier.

   What is a "correct encoding"?

   I have an FTP server to which clients with different local encodings
are connecting. FTP protocol doesn't have a notion of encoding so filenames
on the filesystem are in koi8-r, cp1251 and utf-8 encodings - all in one
directory! What should os.listdir() return for that directory? What is a
correct encoding for that directory?!
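
To make the mixed-encoding directory concrete, here is a minimal Python 3
sketch. The file name is a made-up example and no real directory is touched;
it only illustrates that the same name uploaded by three kinds of client ends
up as three different byte strings, and that a successful decode does not
reveal which encoding was actually used.

```python
# Hypothetical example name; any Cyrillic text would do.
name = "файл"

# The bytes each kind of client would store on the server's filesystem.
stored = {enc: name.encode(enc) for enc in ("koi8-r", "cp1251", "utf-8")}

# Three clients, three different byte sequences for the same name.
assert len(set(stored.values())) == 3

# The koi8-r bytes also decode "successfully" as cp1251, but to different
# text, so a successful decode proves nothing about the real encoding.
misread = stored["koi8-r"].decode("cp1251")
assert misread != name

# Of the three decoders, only UTF-8 rejects these legacy bytes outright.
try:
    stored["koi8-r"].decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

This is why no single answer to "what is the correct encoding for that
directory" exists: the bytes carry no label.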

   If any program starts to raise errors Python becomes completely unusable
for me! But is there anything I can debug here?

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From p.f.moore at gmail.com  Tue Apr 28 11:20:44 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 28 Apr 2009 10:20:44 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F658A5.7080807@g.nevcal.com>
References: <20090427211447.GA4291@cskk.homeip.net>
	<49F658A5.7080807@g.nevcal.com>
Message-ID: <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>

2009/4/28 Glenn Linderman <v+python at g.nevcal.com>:
> So assume a non-decodable sequence in a name.  That puts us into Martin's
> funny-decode scheme.  His funny-decode scheme produces a bare string,
> indistinguishable from a bare string that would be produced by a str API
> that happens to contain that same sequence.  Data puns.
>
> So when open is handed the string, should it open the file with the name
> that matches the string, or the file with the name that funny-decodes to the
> same string?  It can't know, unless it knows that the string is a
> funny-decoded string or not.

Sorry for picking on Glenn's comment - it's only one of many in this
thread. But it seems to me that there is an assumption that problems
will arise when code gets a potentially funny-decoded string and
doesn't know where it came from.

Is that a real concern? How many programs really don't know where
their data came from? Maybe a general-purpose library routine *might*
just need to document explicitly how it handles funny-encoded data (I
can't actually imagine anything that would, but I'll concede it may be
possible) but that's just a matter of documenting your assumptions -
no better or worse than many other cases.

This all sounds similar to the idea of "tainted" data in security - if
you lose track of untrusted data from the environment, you expose
yourself to potential security issues. So the same techniques should
be relevant here (including ignoring it if your application isn't such
that it's a concern!)

I've yet to hear anyone claim that they would have an actual problem
with a specific piece of code they have written. (NB, if such a claim
has been made, feel free to point me to it - I admit I've been
skimming this thread at times).

Paul.

From tmbdev at gmail.com  Tue Apr 28 11:32:26 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Tue, 28 Apr 2009 11:32:26 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <20090428090011.GA27583@phd.pp.ru>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> 
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> 
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> 
	<20090428090011.GA27583@phd.pp.ru>
Message-ID: <7e51d15d0904280232w5a2dc186id67791feb9bf21e3@mail.gmail.com>

On Tue, Apr 28, 2009 at 11:00, Oleg Broytmann <phd at phd.pp.ru> wrote:

> On Tue, Apr 28, 2009 at 10:37:45AM +0200, Thomas Breuel wrote:
> > Returning an error for an incorrect encoding doesn't make
> > internationalization harder, it makes it easier because it makes debugging
> > easier.
>
>    What is a "correct encoding"?
>
>   I have an FTP server to which clients with different local encodings
> are connecting. FTP protocol doesn't have a notion of encoding so filenames
> on the filesystem are in koi8-r, cp1251 and utf-8 encodings - all in one
> directory! What should os.listdir() return for that directory? What is a
> correct encoding for that directory?!

I don't know what it should do (ftplib needs to worry about that). I do know
what it shouldn't do, however: it should not return a utf-8b string which,
when used to create a file, will create a file reproducing the byte sequence
of the remote machine; that's wrong.

>   If any program starts to raise errors Python becomes completely unusable
> for me! But is there anything I can debug here?

If we follow PEP 383, you will get lots of errors anyway because those
strings, when encoded in utf-8b, will result in an error when you try to
write them on a Windows file system or any other system that doesn't allow
the byte sequences that the utf-8b encodes.

Tom

From phd at phd.pp.ru  Tue Apr 28 11:52:23 2009
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Tue, 28 Apr 2009 13:52:23 +0400
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <7e51d15d0904280232w5a2dc186id67791feb9bf21e3@mail.gmail.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>
	<20090428090011.GA27583@phd.pp.ru>
	<7e51d15d0904280232w5a2dc186id67791feb9bf21e3@mail.gmail.com>
Message-ID: <20090428095223.GB27583@phd.pp.ru>

On Tue, Apr 28, 2009 at 11:32:26AM +0200, Thomas Breuel wrote:
> On Tue, Apr 28, 2009 at 11:00, Oleg Broytmann <phd at phd.pp.ru> wrote:
> >   I have an FTP server to which clients with different local encodings
> > are connecting. FTP protocol doesn't have a notion of encoding so filenames
> > on the filesystem are in koi8-r, cp1251 and utf-8 encodings - all in one
> > directory! What should os.listdir() return for that directory? What is a
> > correct encoding for that directory?!
> 
> I don't know what it should do (ftplib needs to worry about that).

   There is no ftplib there. The FTP server is ProFTPd; the FTP clients are
of all sorts, one being, e.g., an FTP client built into an automatic web camera.
   I use Python programs to process files after they have been uploaded.
The programs access the FTP directory as part of the local filesystem.

> I do know
> what it shouldn't do, however: it should not return a utf-8b string which,
> when used to create a file, will create a file reproducing the byte sequence
> of the remote machine; that's wrong.

   That's certainly wrong. But at least the approach allows Python programs
to list all files in a directory - currently AFAIU os.listdir() silently
skips undecodable filenames. And after a program gets all the files it can
process them further - it can clean up filenames (base64-encode them, e.g.),
but at least it can do something, where currently it cannot.
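
A minimal sketch of the clean-up idea mentioned above, assuming the program
works on bytes file names (os.listdir() returns bytes names when it is given
a bytes path). The function name and the "b64-" prefix are invented for this
illustration and are not part of any proposal in the thread.

```python
import base64

def printable_name(raw: bytes, encoding: str = "utf-8") -> str:
    """Return raw decoded as text, or a base64 stand-in if it cannot be decoded.

    The "b64-" prefix is an arbitrary marker chosen for this sketch.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return "b64-" + base64.urlsafe_b64encode(raw).decode("ascii")

# A caller could then clean a whole directory listing, for example:
#   cleaned = [printable_name(n) for n in os.listdir(b".")]
```

Decodable names pass through unchanged, and undecodable ones are replaced by
an ASCII-only stand-in from which the original bytes can still be recovered.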

PS. It seems I started to argue for the PEP. Well, well...

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From solipsis at pitrou.net  Tue Apr 28 13:49:47 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 28 Apr 2009 11:49:47 +0000 (UTC)
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
References: <20090427211447.GA4291@cskk.homeip.net>
	<49F658A5.7080807@g.nevcal.com>
	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>
Message-ID: <loom.20090428T114723-520@post.gmane.org>

Paul Moore <p.f.moore <at> gmail.com> writes:
> 
> I've yet to hear anyone claim that they would have an actual problem
> with a specific piece of code they have written.

Yep, that's the problem. Lots of theoretical problems no one has ever encountered
brought up against a PEP which resolves some actual problems people encounter on
a regular basis.

For the record, I'm +1 on the PEP being accepted and implemented as soon as
possible (preferably before 3.1).

Regards

Antoine.

From jianchun.zhou at gmail.com  Tue Apr 28 13:55:40 2009
From: jianchun.zhou at gmail.com (Jianchun Zhou)
Date: Tue, 28 Apr 2009 19:55:40 +0800
Subject: [Python-Dev] Can not run under python 2.6
Message-ID: <2b767f890904280455j2cbdf444i187841113df1df2b@mail.gmail.com>

Hi, there:

I am new to Python, and I have run into a problem:

I have an application named canola. It was written for Python 2.5 and
runs normally under Python 2.5.

But under Python 2.6 it fails with:

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/terra/core/plugin_manager.py", line 151, in _load_plugins
    classes = plg.load()
  File "/usr/lib/python2.6/site-packages/terra/core/plugin_manager.py", line 94, in load
    mod = self._ldr.load()
  File "/usr/lib/python2.6/site-packages/terra/core/module_loader.py", line 42, in load
    mod = __import__(modpath, fromlist=[mod_name])
ImportError: Import by filename is not supported.

Does anybody have an idea what I should do?

-- 
Best Regards

From dickinsm at gmail.com  Tue Apr 28 13:56:59 2009
From: dickinsm at gmail.com (Mark Dickinson)
Date: Tue, 28 Apr 2009 12:56:59 +0100
Subject: [Python-Dev] One more proposed formatting change for 3.1
Message-ID: <5c6f2a5d0904280456k1fa5ade0gad1aad54364002d1@mail.gmail.com>

Here's one more proposed change, this time for formatting
of floats using format() and the empty presentation type.
To avoid repeating myself, here's the text from the issue
I just opened:

http://bugs.python.org/issue5864

"""
In all versions of Python from 2.6 up, I get the following behaviour:

>>> format(123.456, '.4')
'123.5'
>>> format(1234.56, '.4')
'1235.0'
>>> format(12345.6, '.4')
'1.235e+04'

The first and third results are as I expect, but the second is somewhat
misleading: it gives 5 significant digits when only 4 were requested,
and moreover the last digit is incorrect.

I propose that Python 2.7 and Python 3.1 be changed so that the output
for the second line above is '1.235e+03'.
"""

This issue seems fairly clear cut to me, and I doubt that there's been
enough uptake of 'format' yet for this to risk significant breakage.  So
unless there are objections I'll plan to make this change before this
weekend's beta.
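
As a rough illustration of the complaint in the issue text, the sketch below
counts the significant digits in the two candidate outputs. The helper
function is invented for this example; the two strings are the ones quoted
above. The final loop just prints what the running interpreter produces, with
no claim about which version is in use.

```python
def significant_digits(text: str) -> int:
    """Count the significant digits in a formatted float such as '1.235e+03'."""
    mantissa = text.lower().split("e")[0]        # drop any exponent part
    digits = mantissa.lstrip("+-").replace(".", "")
    return len(digits.lstrip("0"))               # leading zeros are not significant

# Outputs quoted in the issue text.
reported = "1235.0"
proposed = "1.235e+03"

print(significant_digits(reported))   # 5, although '.4' asked for 4
print(significant_digits(proposed))   # 4

# What the running interpreter produces for the three inputs.
for value in (123.456, 1234.56, 12345.6):
    print(repr(value), "->", format(value, ".4"))
```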

Mark

From p.f.moore at gmail.com  Tue Apr 28 13:57:10 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 28 Apr 2009 12:57:10 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <loom.20090428T114723-520@post.gmane.org>
References: <20090427211447.GA4291@cskk.homeip.net>
	<49F658A5.7080807@g.nevcal.com>
	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>
	<loom.20090428T114723-520@post.gmane.org>
Message-ID: <79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com>

2009/4/28 Antoine Pitrou <solipsis at pitrou.net>:
> Paul Moore <p.f.moore <at> gmail.com> writes:
>>
>> I've yet to hear anyone claim that they would have an actual problem
>> with a specific piece of code they have written.
>
> Yep, that's the problem. Lots of theoretical problems no one has ever encountered
> brought up against a PEP which resolves some actual problems people encounter on
> a regular basis.
>
> For the record, I'm +1 on the PEP being accepted and implemented as soon as
> possible (preferably before 3.1).

In case it's not clear, I am also +1 on the PEP as it stands.

Paul.

From fuzzyman at voidspace.org.uk  Tue Apr 28 14:03:42 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Tue, 28 Apr 2009 13:03:42 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com>
References: <20090427211447.GA4291@cskk.homeip.net>	<49F658A5.7080807@g.nevcal.com>	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>	<loom.20090428T114723-520@post.gmane.org>
	<79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com>
Message-ID: <49F6F09E.2020506@voidspace.org.uk>

Paul Moore wrote:
> 2009/4/28 Antoine Pitrou <solipsis at pitrou.net>:
>   
>> Paul Moore <p.f.moore <at> gmail.com> writes:
>>     
>>> I've yet to hear anyone claim that they would have an actual problem
>>> with a specific piece of code they have written.
>>>       
>> Yep, that's the problem. Lots of theoretical problems no one has ever encountered
>> brought up against a PEP which resolves some actual problems people encounter on
>> a regular basis.
>>
>> For the record, I'm +1 on the PEP being accepted and implemented as soon as
>> possible (preferably before 3.1).
>>     
>
> In case it's not clear, I am also +1 on the PEP as it stands.
>   

Me 2

Michael
> Paul.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk
>   

-- 
http://www.ironpythoninaction.com/

From fuzzyman at voidspace.org.uk  Tue Apr 28 14:06:46 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Tue, 28 Apr 2009 13:06:46 +0100
Subject: [Python-Dev] Can not run under python 2.6
In-Reply-To: <2b767f890904280455j2cbdf444i187841113df1df2b@mail.gmail.com>
References: <2b767f890904280455j2cbdf444i187841113df1df2b@mail.gmail.com>
Message-ID: <49F6F156.6040901@voidspace.org.uk>

Jianchun Zhou wrote:
> Hi, there:
>
> I am new to python, and now I got a trouble:
>
> I have an application named canola, it is written under python 2.5, 
> and can run normally under python 2.5
>
> But when it comes under python 2.6, problem up, it says:
>
> Traceback (most recent call last):
>   File 
> "/usr/lib/python2.6/site-packages/terra/core/plugin_manager.py", line 
> 151, in _load_plugins
>     classes = plg.load()
>   File 
> "/usr/lib/python2.6/site-packages/terra/core/plugin_manager.py", line 
> 94, in load
>     mod = self._ldr.load()
>   File "/usr/lib/python2.6/site-packages/terra/core/module_loader.py", 
> line 42, in load
>     mod = __import__(modpath, fromlist=[mod_name])
> ImportError: Import by filename is not supported.
>
> Any body any idea what should I do?

The Python-Dev mailing list is for the development of Python itself, not
for developing with Python. You will get a much better response asking on the
comp.lang.python (python-list) or python-tutor newsgroups / mailing 
lists. comp.lang.python has both google groups and gmane gateways and so 
is easy to post to.

For the particular problem you mention it is an intentional change and 
so the code in canola will need to be modified in order to run under 
Python 2.6.
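
For readers who hit the same ImportError: __import__ expects a dotted module
name rather than a file path. The following is only a sketch of one way to
load a module from a path in present-day Python 3, using importlib.util,
which did not exist in Python 2.6 (as far as I know, imp.load_source played
a similar role there). The function and argument names are illustrative and
are not taken from canola.

```python
import importlib.util
import sys

def load_module_from_path(module_name, path):
    """Load the Python source file at `path` as a module named `module_name`."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError("cannot load %r from %r" % (module_name, path))
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module   # register before executing, as import does
    spec.loader.exec_module(module)
    return module
```

A plugin loader would call this with a name of its own choosing and the
plugin's file path, instead of passing the path to __import__.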

All the best,

Michael Foord

>
> -- 
> Best Regards

-- 
http://www.ironpythoninaction.com/

From jianchun.zhou at gmail.com  Tue Apr 28 14:20:06 2009
From: jianchun.zhou at gmail.com (Jianchun Zhou)
Date: Tue, 28 Apr 2009 20:20:06 +0800
Subject: [Python-Dev] Can not run under python 2.6
In-Reply-To: <49F6F156.6040901@voidspace.org.uk>
References: <2b767f890904280455j2cbdf444i187841113df1df2b@mail.gmail.com>
	<49F6F156.6040901@voidspace.org.uk>
Message-ID: <2b767f890904280520j2dbe469di5f580c835b240a83@mail.gmail.com>

OK, Thanks a lot.

On Tue, Apr 28, 2009 at 8:06 PM, Michael Foord <fuzzyman at voidspace.org.uk> wrote:

> Jianchun Zhou wrote:
>
>> Hi, there:
>>
>> I am new to python, and now I got a trouble:
>>
>> I have an application named canola, it is written under python 2.5, and
>> can run normally under python 2.5
>>
>> But when it comes under python 2.6, problem up, it says:
>>
>> Traceback (most recent call last):
>>  File "/usr/lib/python2.6/site-packages/terra/core/plugin_manager.py",
>> line 151, in _load_plugins
>>    classes = plg.load()
>>  File "/usr/lib/python2.6/site-packages/terra/core/plugin_manager.py",
>> line 94, in load
>>    mod = self._ldr.load()
>>  File "/usr/lib/python2.6/site-packages/terra/core/module_loader.py", line
>> 42, in load
>>    mod = __import__(modpath, fromlist=[mod_name])
>> ImportError: Import by filename is not supported.
>>
>> Any body any idea what should I do?
>>
>
> The Python-Dev mailing list is for the development of Python and not with
> Python. You will get a much better response asking on the comp.lang.python
> (python-list) or python-tutor newsgroups / mailing lists. comp.lang.python
> has both google groups and gmane gateways and so is easy to post to.
>
> For the particular problem you mention it is an intentional change and so
> the code in canola will need to be modified in order to run under Python
> 2.6.
>
> All the best,
>
> Michael Foord
>
>
>> --
>> Best Regards
>
>
> --
> http://www.ironpythoninaction.com/
>
>

-- 
Best Regards

From l.mastrodomenico at gmail.com  Tue Apr 28 14:29:19 2009
From: l.mastrodomenico at gmail.com (Lino Mastrodomenico)
Date: Tue, 28 Apr 2009 14:29:19 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <7e51d15d0904280232w5a2dc186id67791feb9bf21e3@mail.gmail.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>
	<20090428090011.GA27583@phd.pp.ru>
	<7e51d15d0904280232w5a2dc186id67791feb9bf21e3@mail.gmail.com>
Message-ID: <cc93256f0904280529n7692c9c7v8076a76fe246597a@mail.gmail.com>

2009/4/28 Thomas Breuel <tmbdev at gmail.com>:
> If we follow PEP 383, you will get lots of errors anyway because those
> strings, when encoded in utf-8b, will result in an error when you try to
> write them on a Windows file system or any other system that doesn't allow
> the byte sequences that the utf-8b encodes.

I'm not sure if when you say "write them on a Windows FS" you mean
from within Windows itself or a filesystem mounted on another OS, so
I'll cover both cases.

Let's suppose that I use Python 2.x or something else to create a file
with name b'\xff'. My (Linux) system has a sane configuration and the
filesystem encoding is UTF-8, so it's an invalid name but the kernel
will blindly accept it anyway.

With this PEP, Python 3.1 listdir() will convert b'\xff' to the string '\udcff'.
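
The conversion described above can be reproduced on the string level. This is
a minimal sketch, assuming a released Python 3 where the error handler is
available under the name 'surrogateescape' (the thread refers to the scheme
as utf-8b); it does not touch the filesystem.

```python
raw = b"\xff"                      # not valid UTF-8

# Decoding with the 'surrogateescape' error handler maps the undecodable
# byte 0xFF to the lone surrogate U+DCFF instead of raising.
name = raw.decode("utf-8", "surrogateescape")
assert name == "\udcff"

# Encoding with the same handler restores the original byte.
assert name.encode("utf-8", "surrogateescape") == raw

# A strict encode refuses the lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

So the string is only meaningful to code that encodes it again with the same
handler; a strict encoder treats it as an error.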

Now if this string somehow ends up in a Python 3.1 program running on
Windows and it tries to create a file with this name, it will work (no
exception will be raised). The Windows GUI will display the standard
"invalid character" symbol (an empty box) when listing this file, but
this seems reasonable since the original file was displayed as "?" by
the Linux console and with another invalid character symbol by the
GNOME file manager.

OTOH if I write the same file on a Windows filesystem mounted on
another OS, an automatic translation will be in place (probably
done by the OS kernel) from the user-visible filesystem encoding (see
e.g. the "iocharset" or "utf8" mount options for vfat on Linux) to
UTF-16. Which means that the write will fail with something like:

IOError: [Errno 22] invalid filename: b'/media/windows_disk/\xff'

(The "problem" is that a vfat filesystem mounted with the "utf8"
option on Linux will only accept byte sequences that are valid UTF-8,
or at least reasonably similar: e.g. b'\xed\xb3\xbf' is accepted.)

Again this seems reasonable since it already happens in Python 2 and
with pretty much any other software, including GNU cp.

I don't see how Martin can do better than this.

Well, ok, I guess he could break into my house and rename the original
file to something sane...

-- 
Lino Mastrodomenico

From ronaldoussoren at mac.com  Tue Apr 28 14:30:43 2009
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Tue, 28 Apr 2009 14:30:43 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F6F09E.2020506@voidspace.org.uk>
References: <20090427211447.GA4291@cskk.homeip.net>
	<49F658A5.7080807@g.nevcal.com>
	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>
	<loom.20090428T114723-520@post.gmane.org>
	<79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com>
	<49F6F09E.2020506@voidspace.org.uk>
Message-ID: <1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com>

For what it's worth, the OSX APIs seem to behave as follows:

* If you create a file with a non-UTF-8 name on an HFS+ filesystem, the
system automatically encodes the name.

That is, open(chr(255), 'w') will silently create a file named '%FF'
instead of the name you'd expect on a unix system.

* If you mount an NFS filesystem from a linux host and that directory
contains a file named chr(255):

- unix-level tools will see a file with the expected name (just like
on linux)
- Cocoa's NSFileManager returns u"?" as the filename; that is, when the
filename cannot be decoded using UTF-8, the name returned by the
high-level API is mangled. This is regardless of the setting of LANG.
- I haven't found a way yet to access files whose names are not valid
UTF-8 using the high-level Cocoa APIs.

The latter two are interesting because Cocoa has a unicode filesystem
API on top of a POSIX C-API, just like Python 3.x. I guess the chosen
behaviour works out on OSX (where users are unlikely to run into this
issue), but could be more problematic on other POSIX systems.

Ronald

On 28 Apr, 2009, at 14:03, Michael Foord wrote:

> Paul Moore wrote:
>> 2009/4/28 Antoine Pitrou <solipsis at pitrou.net>:
>>
>>> Paul Moore <p.f.moore <at> gmail.com> writes:
>>>
>>>> I've yet to hear anyone claim that they would have an actual  
>>>> problem
>>>> with a specific piece of code they have written.
>>>>
>>> Yep, that's the problem. Lots of theoretical problems no one has
>>> ever encountered
>>> brought up against a PEP which resolves some actual problems  
>>> people encounter on
>>> a regular basis.
>>>
>>> For the record, I'm +1 on the PEP being accepted and implemented  
>>> as soon as
>>> possible (preferably before 3.1).
>>>
>>
>> In case it's not clear, I am also +1 on the PEP as it stands.
>>
>
> Me 2
>
> Michael
>> Paul.
>
>
> -- 
> http://www.ironpythoninaction.com/

From tmbdev at gmail.com  Tue Apr 28 14:37:33 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Tue, 28 Apr 2009 14:37:33 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <loom.20090428T114723-520@post.gmane.org>
References: <20090427211447.GA4291@cskk.homeip.net>
	<49F658A5.7080807@g.nevcal.com> 
	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> 
	<loom.20090428T114723-520@post.gmane.org>
Message-ID: <7e51d15d0904280537n22168cfl16c58f727be1755e@mail.gmail.com>

>
> Yep, that's the problem. Lots of theoretical problems no one has ever
> encountered
> brought up against a PEP which resolves some actual problems people
> encounter on
> a regular basis.

How can you bring up practical problems against something that hasn't been
implemented?

The fact that no other language or library does this is perhaps an
indication that it isn't the right thing to do.

But the biggest problem with the proposal is that it isn't needed: if you
want to be able to turn arbitrary byte sequences into unicode strings and
back, just set your encoding to iso8859-15.  That already works and it
doesn't require any changes.
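
The round-trip claim for iso8859-15 can be checked directly. The sketch below
assumes Python 3's built-in iso8859-15 codec and an arbitrary example name;
it also shows the cost, namely that a UTF-8 encoded name is displayed as
unrelated characters.

```python
all_bytes = bytes(range(256))

# Python's iso8859-15 codec maps every byte value to some character, so
# decoding never fails and encoding restores the original bytes.
text = all_bytes.decode("iso8859-15")
assert text.encode("iso8859-15") == all_bytes

# The price: a UTF-8 encoded name is shown as two unrelated characters.
utf8_name = "é".encode("utf-8")          # b'\xc3\xa9'
print(utf8_name.decode("iso8859-15"))    # Ã©
```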

Tom

From hrvoje.niksic at avl.com  Tue Apr 28 14:41:19 2009
From: hrvoje.niksic at avl.com (Hrvoje Niksic)
Date: Tue, 28 Apr 2009 14:41:19 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <26924021.1861174.1240921767958.JavaMail.xicrypt@atgrzls001>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>	<49F6A947.1050106@v.loewis.de>	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>	<20090428075806.GB23828@phd.pp.ru>	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>	<20090428090011.GA27583@phd.pp.ru>	<7e51d15d0904280232w5a2dc186id67791feb9bf21e3@mail.gmail.com>
	<26924021.1861174.1240921767958.JavaMail.xicrypt@atgrzls001>
Message-ID: <49F6F96F.5050507@avl.com>

Lino Mastrodomenico wrote:
> Let's suppose that I use Python 2.x or something else to create a file
> with name b'\xff'. My (Linux) system has a sane configuration and the
> filesystem encoding is UTF-8, so it's an invalid name but the kernel
> will blindly accept it anyway.
> 
> With this PEP, Python 3.1 listdir() will convert b'\xff' to the string '\udcff'.

One question that really bothers me about this proposal is the following:

Assume a UTF-8 locale.  A file named b'\xff', being an invalid UTF-8 
sequence, will be converted to the half-surrogate '\udcff'.  However, a 
file named b'\xed\xb3\xbf', a valid[1] UTF-8 sequence, will also be 
converted to '\udcff'.  Those are quite different POSIX pathnames; how 
will Python know which one it was when I later pass '\udcff' to open()?

A poster hinted at this question, but I haven't seen it answered, yet.

[1]
I'm assuming that it's valid UTF8 because it passes through Python 2.5's 
'\xed\xb3\xbf'.decode('utf-8').  I don't claim to be a UTF-8 expert.
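
For comparison, this is how the two byte strings from the question behave
with the 'surrogateescape' error handler in a released Python 3. It is a
sketch of that interpreter's behaviour only, not a statement about the PEP
draft under discussion: as far as I can tell, the strict UTF-8 decoder there
rejects encoded surrogates, so each of the three bytes is escaped separately.

```python
invalid = b"\xff"                     # never valid in UTF-8
encoded_surrogate = b"\xed\xb3\xbf"   # what Python 2.5 decoded to u'\udcff'

a = invalid.decode("utf-8", "surrogateescape")
b = encoded_surrogate.decode("utf-8", "surrogateescape")

print(ascii(a))   # '\udcff'
print(ascii(b))   # '\udced\udcb3\udcbf'

# Both results encode back to the bytes they came from.
assert a.encode("utf-8", "surrogateescape") == invalid
assert b.encode("utf-8", "surrogateescape") == encoded_surrogate
```

Under that behaviour the two file names map to different strings, so open()
can tell them apart.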

From hrvoje.niksic at avl.com  Tue Apr 28 14:46:11 2009
From: hrvoje.niksic at avl.com (Hrvoje Niksic)
Date: Tue, 28 Apr 2009 14:46:11 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <15546941.1861678.1240922288709.JavaMail.xicrypt@atgrzls001>
References: <20090427211447.GA4291@cskk.homeip.net>	<49F658A5.7080807@g.nevcal.com>
	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>
	<loom.20090428T114723-520@post.gmane.org>
	<15546941.1861678.1240922288709.JavaMail.xicrypt@atgrzls001>
Message-ID: <49F6FA93.7080302@avl.com>

Thomas Breuel wrote:
> But the biggest problem with the proposal is that it isn't needed: if 
> you want to be able to turn arbitrary byte sequences into unicode 
> strings and back, just set your encoding to iso8859-15.  That already 
> works and it doesn't require any changes.

Are you proposing to unconditionally encode file names as iso8859-15, or 
to do so only when undecodeable bytes are encountered?

If you unconditionally set encoding to iso8859-15, then you are 
effectively reverting to treating file names as bytes, regardless of the 
locale.  You're also angering a lot of European users who expect 
iso8859-2, etc.

If you switch to iso8859-15 only in the presence of undecodable UTF-8, 
then you have the same round-trip problem as the PEP: both b'\xff' and 
b'\xc3\xbf' will be converted to u'\u00ff' without a way to 
unambiguously recover the original file name.
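
The ambiguity of a conditional fallback can be shown in a few lines. The
function below is a hypothetical rendering of "switch to iso8859-15 only when
UTF-8 decoding fails"; the name is invented and nobody in the thread posted
such code.

```python
def decode_with_fallback(raw: bytes) -> str:
    """Hypothetical scheme: try UTF-8 first, fall back to iso8859-15."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso8859-15")

# Two different file names on disk...
first = decode_with_fallback(b"\xff")       # decoded via the fallback
second = decode_with_fallback(b"\xc3\xbf")  # decoded as valid UTF-8

# ...end up as the same string, so the original bytes cannot be recovered.
assert first == second == "\u00ff"
```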

From rdmurray at bitdance.com  Tue Apr 28 14:47:40 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Tue, 28 Apr 2009 08:47:40 -0400 (EDT)
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> 
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0904280826500.1740@kimball.webabinitio.net>

On Tue, 28 Apr 2009 at 09:30, Thomas Breuel wrote:
>>> Therefore, when Python encounters path names on a file system
>>> that are not consistent with the (assumed) encoding for that file
>>> system, Python should raise an error.
>>
>> This is what happens currently, and users are quite unhappy about it.
>
> We need to keep "users" and "programmers" distinct here.
>
> Programmers may find it inconvenient that they have to spend time figuring
> out and dealing with platform-dependent file system encoding issues and
> errors.  But internationalization and unicode are hard, that's just a fact
> of life.

And most programmers won't do it, because most programmers write for
an English-speaking audience and have no clue about Unicode issues.
That is probably slowly changing, but it is still true, I think.

> End users, however, are going to be quite unhappy if they get a string of
> gibberish for a file name because you decided to interpret some non-Unicode
> string as UTF-8-with-extra-bytes.

No, end users expect the gibberish, because they get it all the time
(at least on Unix) when dealing with international filenames.  They
expect to be able to manipulate such files _despite_ the gibberish.
(I speak here as an end user who does this!!)

> Or some Python program might copy files from an ISO8859-15 encoded file
> system to a UTF-8 encoded file system, and instead of getting an error when
> the encodings are set incorrectly, Python would quietly create ISO8859-15
> encoded file names, making the target file system inconsistent.

As will almost all Unix programs, and the Unix OS itself.  On Unix,
you can't make the file system inconsistent by doing this, because
filenames are just byte strings with no NULs.

How _does_ Windows handle this?  Would a Windows program complain, or
would it happily record the gibberish?  I suspect the latter, but
I don't use Windows so I don't know.

> There is a lot of potential for major problems for end users with your
> proposals.  In both cases, what should happen is that the end user gets an
> error, submits a bug, and the programmer figures out how to deal with the
> encoding issues correctly.

What would actually happen is that the user would abandon the program
that didn't work for one (not written in Python) that did.  If the
programmer was lucky they'd get a bug report, which they wouldn't
be able to do anything about since Python wouldn't be providing the
tools to let them fix it (i.e., there are currently no bytes interfaces
for environ or the command line in Python 3).

>> Yes, users can do that (to a degree), but they are still unhappy about
>> it. The approach actually fails for command line arguments
>
> As it should: if I give an ISO8859-15 encoded command line argument to a
> Python program that expects a UTF-8 encoding, the Python program should tell
> me that there is something wrong when it notices that.  Quietly continuing
> is the wrong thing to do.

Imagine I am on a Unix system, and I have gotten from somewhere a
file whose name is encoded in something other than UTF-8 (I have a
number of those on my system).  Now imagine that I want to run a Python
program against that file, passing the name in on the command line.
I type the program name, the first few (non-mangled) characters, and hit
tab for completion, and my shell automagically puts the escaped bytes
onto the command line.  Or perhaps I cut and paste from an 'ls' listing
into a quoted string on the command line.

Python is now getting the mangled filename passed in on the command
line, and if the Python program can't manipulate that file like any
other file on my disk I am going to be mightily pissed.

This is the _reality_ of current Unix systems, like it or not.  The same
apparently applies to Windows, though in that case the mangled names may
be fewer and you tend to pick them from a GUI interface rather than do
cut-and-paste or tab completion.

> If we follow your approach, that ISO8859-15 string will get turned into an
> escaped unicode string inside Python.  If I understand your proposal
> correctly, if it's a output file name and gets passed to Python's open
> function, Python will then decode that string and end up with an ISO8859-15
> byte sequence, which it will write to disk literally, even if the encoding
> for the system is UTF-8.   That's the wrong thing to do.

Right.  Like I said, that's what most (almost all) Unix/Linux programs
_do_.

Now, in some future world where everyone (including Windows) acts like
we are hearing OS X does and rejects the garbled encoding _at the OS
level_, then we'd be able to trust the file system encoding (FSDO trust)
and there would be no need for this PEP or any similar solution.

--David

From l.mastrodomenico at gmail.com  Tue Apr 28 15:01:32 2009
From: l.mastrodomenico at gmail.com (Lino Mastrodomenico)
Date: Tue, 28 Apr 2009 15:01:32 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F6933B.7020705@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>
	<49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de>
	<49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de>
	<49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de>
	<49F6933B.7020705@g.nevcal.com>
Message-ID: <cc93256f0904280601i231c78a2l9a7aa1ecd56ad4c0@mail.gmail.com>

2009/4/28 Glenn Linderman <v+python at g.nevcal.com>:
> The switch from PUA to half-surrogates does not resolve the issues with the
> encoding not being a 1-to-1 mapping, though.  The very fact that you think
> you can get away with use of lone surrogates means that other people might,
> accidentally or intentionally, also use lone surrogates for some other
> purpose.  Even in file names.

It does solve this issue, because (unlike e.g. U+F01FF) '\udcff' is
not a valid Unicode character (not a character at all, really) and the
only way you can put this in a POSIX filename is if you use a very
lenient UTF-8 encoder that gives you b'\xed\xb3\xbf'.

Since this byte sequence doesn't represent a valid character when
decoded with UTF-8, it should simply be considered an invalid UTF-8
sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
'\udcff').

Martin: maybe the PEP should say this explicitly?

Note that the round-trip works without ambiguities between '\udcff' in
the filename:

b'\xed\xb3\xbf' -> '\udced\udcb3\udcbf' -> b'\xed\xb3\xbf'

and b'\xff' in the filename, decoded by Python to '\udcff':

b'\xff' -> '\udcff' -> b'\xff'
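
The two round trips can be checked with a short sketch.  It assumes the
'surrogateescape' error handler found in later Python 3 releases as a
stand-in for the "utf-8b" behaviour discussed here, together with a
strict UTF-8 decoder; the names fs_decode and fs_encode are
illustrative only:

```python
# Assumption: 'surrogateescape' plays the role of the thread's "utf-8b"
# handler, layered on a strict UTF-8 codec.
def fs_decode(raw: bytes) -> str:
    return raw.decode('utf-8', 'surrogateescape')


def fs_encode(name: str) -> bytes:
    return name.encode('utf-8', 'surrogateescape')


lone_byte = b'\xff'                  # invalid UTF-8
lenient_surrogate = b'\xed\xb3\xbf'  # what a lenient encoder emits for U+DCFF

decoded_lone = fs_decode(lone_byte)
decoded_lenient = fs_decode(lenient_surrogate)

# The two file names stay distinct as strings and map back to their bytes.
print(ascii(decoded_lone), ascii(decoded_lenient))
print(fs_encode(decoded_lone), fs_encode(decoded_lenient))
```

With a strict decoder, every byte of the three-byte sequence is escaped
separately, which is what keeps the mapping unambiguous.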

-- 
Lino Mastrodomenico

From solipsis at pitrou.net  Tue Apr 28 15:03:46 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 28 Apr 2009 13:03:46 +0000 (UTC)
Subject: [Python-Dev]
	=?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System_C?=
	=?utf-8?q?haracter=09Interfaces?=
References: <20090427211447.GA4291@cskk.homeip.net>
	<49F658A5.7080807@g.nevcal.com>
	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>
	<loom.20090428T114723-520@post.gmane.org>
	<7e51d15d0904280537n22168cfl16c58f727be1755e@mail.gmail.com>
Message-ID: <loom.20090428T125933-601@post.gmane.org>

Thomas Breuel <tmbdev <at> gmail.com> writes:
> 
> How can you bring up practical problems against something that hasn't been
> implemented?

The PEP is simple enough that you can simulate its effect by manually computing
the resulting unicode string for a hypothetical broken filename. Several people
have already done so in this thread.

> The fact that no other language or library does this is perhaps an indication
> that it isn't the right thing to do.

According to some messages, it seems Java and Mono actually use this kind of
workaround. Though I haven't checked (I don't use those languages).

> But the biggest problem with the proposal is that it isn't needed: if you want
> to be able to turn arbitrary byte sequences into unicode strings and back, just
> set your encoding to iso8859-15.  That already works

That doesn't work at all. With your proposal, any non-ASCII filename will be
unreadable; not only the broken ones.

Antoine.

From hrvoje.niksic at avl.com  Tue Apr 28 15:06:17 2009
From: hrvoje.niksic at avl.com (Hrvoje Niksic)
Date: Tue, 28 Apr 2009 15:06:17 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>
	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>
	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>
	<49F6799F.5030208@v.loewis.de>	<49F6933B.7020705@g.nevcal.com>
	<30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001>
Message-ID: <49F6FF49.6010205@avl.com>

Lino Mastrodomenico wrote:
> Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid character when
> decoded with UTF-8, it should simply be considered an invalid UTF-8
> sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
> '\udcff').

"Should be considered" or "will be considered"?  Python 3.0's UTF-8 
decoder happily accepts it and returns u'\udcff':

 >>> b'\xed\xb3\xbf'.decode('utf-8')
'\udcff'

If the PEP depends on this being changed, it should be mentioned in the PEP.

From solipsis at pitrou.net  Tue Apr 28 15:13:37 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 28 Apr 2009 13:13:37 +0000 (UTC)
Subject: [Python-Dev] lone surrogates in utf-8
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>
	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>
	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>
	<49F6799F.5030208@v.loewis.de>	<49F6933B.7020705@g.nevcal.com>
	<30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001>
	<49F6FF49.6010205@avl.com>
Message-ID: <loom.20090428T131112-377@post.gmane.org>

Hrvoje Niksic <hrvoje.niksic <at> avl.com> writes:
> 
> "Should be considered" or "will be considered"?  Python 3.0's UTF-8 
> decoder happily accepts it and returns u'\udcff':
> 
>  >>> b'\xed\xb3\xbf'.decode('utf-8')
> '\udcff'

Yes, there is already a bug entry for it:
http://bugs.python.org/issue3672

I think we could happily fix it for 3.1 (perhaps leaving 2.7 unchanged for
compatibility reasons - I don't know if some people may rely on the current
behaviour).

From l.mastrodomenico at gmail.com  Tue Apr 28 15:14:19 2009
From: l.mastrodomenico at gmail.com (Lino Mastrodomenico)
Date: Tue, 28 Apr 2009 15:14:19 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F6FF49.6010205@avl.com>
References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com>
	<49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com>
	<49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com>
	<49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com>
	<30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001>
	<49F6FF49.6010205@avl.com>
Message-ID: <cc93256f0904280614xd6847d4y8aad4f24d731fce6@mail.gmail.com>

2009/4/28 Hrvoje Niksic <hrvoje.niksic at avl.com>:
> Lino Mastrodomenico wrote:
>>
>> Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid
>> character when
>> decoded with UTF-8, it should simply be considered an invalid UTF-8
>> sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
>> '\udcff').
>
> "Should be considered" or "will be considered"?  Python 3.0's UTF-8 decoder
> happily accepts it and returns u'\udcff':
>
>>>> b'\xed\xb3\xbf'.decode('utf-8')
> '\udcff'

Only for the new utf-8b encoding (if Martin agrees), while the
existing utf-8 is fine as is (or at least waaay outside the scope of
this PEP).

-- 
Lino Mastrodomenico

From p.f.moore at gmail.com  Tue Apr 28 15:19:55 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 28 Apr 2009 14:19:55 +0100
Subject: [Python-Dev] One more proposed formatting change for 3.1
In-Reply-To: <5c6f2a5d0904280456k1fa5ade0gad1aad54364002d1@mail.gmail.com>
References: <5c6f2a5d0904280456k1fa5ade0gad1aad54364002d1@mail.gmail.com>
Message-ID: <79990c6b0904280619n21002694j3ba63026cf954f53@mail.gmail.com>

2009/4/28 Mark Dickinson <dickinsm at gmail.com>:
> Here's one more proposed change, this time for formatting
> of floats using format() and the empty presentation type.
> To avoid repeating myself, here's the text from the issue
> I just opened:
>
> http://bugs.python.org/issue5864
>
> """
> In all versions of Python from 2.6 up, I get the following behaviour:
>
>>>> format(123.456, '.4')
> '123.5'
>>>> format(1234.56, '.4')
> '1235.0'
>>>> format(12345.6, '.4')
> '1.235e+04'
>
> The first and third results are as I expect, but the second is somewhat
> misleading: it gives 5 significant digits when only 4 were requested,
> and moreover the last digit is incorrect.
>
> I propose that Python 2.7 and Python 3.1 be changed so that the output
> for the second line above is '1.235e+03'.
> """
>
> This issue seems fairly clear cut to me, and I doubt that there's been
> enough uptake of 'format' yet for this to risk significant breakage.  So
> unless there are objections I'll plan to make this change before this
> weekend's beta.

+1
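
The three calls quoted above can be replayed directly.  This is only a
minimal sketch; it assumes a current Python 3 interpreter in which the
proposed change was adopted, which this message itself does not confirm:

```python
# Format three floats with precision 4 and the empty presentation type.
values = [123.456, 1234.56, 12345.6]
results = [format(v, '.4') for v in values]

for value, text in zip(values, results):
    # Under the proposal, no result shows more than 4 significant digits.
    print(value, '->', text)
```

Under the proposed rule the second value switches to exponent notation
instead of printing a misleading fifth digit.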

From duncan.booth at suttoncourtenay.org.uk  Tue Apr 28 15:22:45 2009
From: duncan.booth at suttoncourtenay.org.uk (Duncan Booth)
Date: Tue, 28 Apr 2009 13:22:45 +0000 (UTC)
Subject: [Python-Dev] PEP 383 (again)
References: <26924021.1861174.1240921767958.JavaMail.xicrypt@atgrzls001>
	<49F6F96F.5050507@avl.com>
Message-ID: <Xns9BFB924C0E8EEduncanrcpcouk@127.0.0.1>

Hrvoje Niksic <hrvoje.niksic at avl.com> wrote:

> Assume a UTF-8 locale.  A file named b'\xff', being an invalid UTF-8 
> sequence, will be converted to the half-surrogate '\udcff'.  However,
> a file named b'\xed\xb3\xbf', a valid[1] UTF-8 sequence, will also be 
> converted to '\udcff'.  Those are quite different POSIX pathnames; how
> will Python know which one it was when I later pass '\udcff' to
> open()? 
> 
> 
> [1]
> I'm assuming that it's valid UTF8 because it passes through Python
> 2.5's '\xed\xb3\xbf'.decode('utf-8').  I don't claim to be a UTF-8
> expert.

I'm not a UTF-8 expert either, but I got bitten by this yesterday. I was 
uploading a file to a Google Search Appliance and it was rejected as 
invalid UTF-8 despite having been encoded into UTF-8 by Python.

The cause was a byte sequence which decoded to a half surrogate similar to 
your example above. Python will happily decode and encode such sequences, 
but as I found to my cost other systems reject them.

Reading Wikipedia implies that Python is wrong to accept these sequences
and I think (though I'm not a lawyer) that RFC 3629 also implies this:

"The definition of UTF-8 prohibits encoding character numbers between 
U+D800 and U+DFFF, which are reserved for use with the UTF-16 encoding form 
(as surrogate pairs) and do not directly represent characters."

 and

"Implementations of the decoding algorithm above MUST protect against 
decoding invalid sequences."
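
A quick way to test whether a byte string would survive a strict
consumer is to run it through a strict decoder first.  The sketch below
assumes a Python 3 release whose 'utf-8' codec follows RFC 3629 on
surrogates (later releases do; 2.5 and 3.0, as discussed here, do not).
The function name is_strict_utf8 is made up for illustration:

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if data decodes under the interpreter's strict UTF-8 codec."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# Bytes that a lenient encoder produces for the lone surrogate U+DCFF.
surrogate_bytes = b'\xed\xb3\xbf'
# Ordinary two-byte encoding of U+00FF.
plain_bytes = b'\xc3\xbf'

surrogate_ok = is_strict_utf8(surrogate_bytes)
plain_ok = is_strict_utf8(plain_bytes)

# A strict encoder likewise refuses to emit a lone surrogate.
try:
    '\udcff'.encode('utf-8')
    encoder_refuses = False
except UnicodeEncodeError:
    encoder_refuses = True

print(surrogate_ok, plain_ok, encoder_refuses)
```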

From murman at gmail.com  Tue Apr 28 16:00:50 2009
From: murman at gmail.com (Michael Urman)
Date: Tue, 28 Apr 2009 09:00:50 -0500
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <49EEBE2E.3090601@v.loewis.de>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T154536-926@post.gmane.org>
	<p04330106c61b9f2aad0a@192.168.123.162>
	<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>

On Mon, Apr 27, 2009 at 23:43, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Nobody said we were at the stage of *saving* the [attachment]!

But speaking of saving files, I think that's the biggest hole in this
that has been nagging at the back of my mind. This PEP intends to
allow easy access to filenames and other environment strings which are
not restricted to known encodings. What happens if the detected
encoding changes? There may be difficulties de/serializing these
names, such as for an MRU list.

Since the serialization of the Unicode string is likely to use UTF-8,
and the string for such a file will include half surrogates, the
application may raise an exception when encoding the names for a
configuration file. These encoding exceptions will be as rare as the
unusual names (which the careful I18N-aware developer has probably
eradicated from his system), and thus will appear late.

Or say de/serialization succeeds. Since the resulting Unicode string
differs depending on the encoding (which is a good thing; it is
supposed to make most cases mostly readable), when the filesystem
encoding changes (say from legacy to UTF-8), the "name" changes, and
deserialized references to it become stale.

This can probably be handled through careful use of the same
encoding/decoding scheme, if relevant, but that sounds like we've just
moved the problem from fs/environment access to serialization. Is that
good enough? For other uses the API knew whether it was
environmentally aware, but serialization probably will not. Should
this PEP make recommendations about how to save filenames in
configuration files?

-- 
Michael Urman

From stephen at xemacs.org  Tue Apr 28 16:09:33 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 28 Apr 2009 23:09:33 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
In-Reply-To: <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>
References: <20090427211447.GA4291@cskk.homeip.net>
	<49F658A5.7080807@g.nevcal.com>
	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>
Message-ID: <87ocugkgyq.fsf@uwakimon.sk.tsukuba.ac.jp>

Paul Moore writes:

 > But it seems to me that there is an assumption that problems will
 > arise when code gets a potentially funny-decoded string and doesn't
 > know where it came from.
 > 
 > Is that a real concern?

Yes, it's a real concern.  I don't think it's possible to show a small
piece of code one could point at and say "without a better API I bet
you can't write this correctly," though.  Rather, my experience with
Emacs and various mail packages is that without type information it is
impossible to keep track of the myriad bits and pieces of text that
are recombining like pig flu, and eventually one breaks out and causes
an error.  It's usually easy to fix, but so are the next hundred
similar regressions, and in the meantime a hundred users have suffered
more or less damage or at least annoyance.

There's no question that dealing with escapes of funny-decoded strings
to unprepared code paths is mission creep compared to Martin's stated
purpose for PEP 383, but it is also a real problem.

From stephen at xemacs.org  Tue Apr 28 16:24:55 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 28 Apr 2009 23:24:55 +0900
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>
Message-ID: <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>

Thomas Breuel writes:

 > PEP 383 doesn't make it any easier; it just turns one set of
 > problems into another.

That's false.  There is an interesting class of problems of the form
"get a list of names from the OS and allow the user to select from it,
and retrieve corresponding content."  People are *very* often able to
decode complete gibberish, as long as it's the only gibberish in a
list.  Ditto partial gibberish.  In that case, PEP 383 allows the
content retrieval operation to complete.

There are probably other problems that this PEP solves.

 > Actually, it makes it worse,

Again, it gives you different problems, which may be better and may be
worse according to the user's requirements.  Currently, you often get
an exception, and running the program again is no help.  The user must
clean up the list to make progress.  This may or may not be within the
user's capacity (eg, read-only media).

 > since any problems that show up now show up far from the source of
 > the problem, and since it can lead to security problems and/or data
 > loss.

Yes.  This is a point I have been at pains to argue elsewhere in this
thread.  However, it is "mission creep": Martin didn't volunteer to
write a PEP for it, he volunteered to write a PEP to solve the
"roundtrip the value of os.listdir()" problem.  And he succeeded, up
to some minor details.

 > The problem may well be with the program using the wrong encodings or
 > incorrectly ignoring encoding information.  Furthermore, even if it is user
 > error, the program needs to validate its inputs and put up a meaningful
 > error message, not mangle the disk.  To detect such program bugs, it's
 > important that when Python detects an incorrect encoding that it doesn't
 > quietly continue with an incorrect string.

I agree.  Guido, however, responded that "Practicality beats purity"
to a similar point in the PEP 263 discussion.

Be aware that you're fighting an uphill battle here.

From martin at v.loewis.de  Tue Apr 28 18:46:19 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 18:46:19 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
Message-ID: <49F732DB.8050101@v.loewis.de>

> If we follow your approach, that ISO8859-15 string will get turned into
> an escaped unicode string inside Python.  If I understand your proposal
> correctly, if it's a output file name and gets passed to Python's open
> function, Python will then decode that string and end up with an
> ISO8859-15 byte sequence, which it will write to disk literally, even if
> the encoding for the system is UTF-8.   That's the wrong thing to do.

I don't think anything can, or should be, done about that. If you had
byte-oriented interfaces (as you do in 2.x), exactly the same thing would
happen: the name of the file will be the very same byte sequence as the
one passed on the command line. Most Unix users here agree that this is
the right thing to happen.

Regards,
Martin

From martin at v.loewis.de  Tue Apr 28 18:49:23 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 18:49:23 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <cc93256f0904280601i231c78a2l9a7aa1ecd56ad4c0@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>	
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	
	<49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de>	
	<49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de>	
	<49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de>	
	<49F6933B.7020705@g.nevcal.com>
	<cc93256f0904280601i231c78a2l9a7aa1ecd56ad4c0@mail.gmail.com>
Message-ID: <49F73393.90901@v.loewis.de>

> It does solve this issue, because (unlike e.g. U+F01FF) '\udcff' is
> not a valid Unicode character (not a character at all, really) and the
> only way you can put this in a POSIX filename is if you use a very
> lenient  UTF-8 encoder that gives you b'\xed\xb3\xbf'.
> 
> Since this byte sequence doesn't represent a valid character when
> decoded with UTF-8, it should simply be considered an invalid UTF-8
> sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
> '\udcff').
> 
> Martin: maybe the PEP should say this explicitly?

Sure, will do.

Regards,
Martin

From martin at v.loewis.de  Tue Apr 28 19:00:37 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 28 Apr 2009 19:00:37 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>	
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>	
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>	
	<loom.20090427T111118-510@post.gmane.org>	
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>	
	<loom.20090427T154536-926@post.gmane.org>	
	<p04330106c61b9f2aad0a@192.168.123.162>	
	<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>
	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
Message-ID: <49F73635.6010105@v.loewis.de>

> Since the serialization of the Unicode string is likely to use UTF-8,
> and the string for  such a file will include half surrogates, the
> application may raise an exception when encoding the names for a
> configuration file. These encoding exceptions will be as rare as the
> unusual names (which the careful I18N aware developer has probably
> eradicated from his system), and thus will appear late.

There are trade-offs to any solution; if there was a solution without
trade-offs, it would be implemented already.

The Python UTF-8 codec will happily encode half-surrogates; people argue
that it is a bug that it does so, however, it would help in this
specific case.

An alternative that doesn't suffer from the risk of not being able to
store decoded strings would have been the use of PUA characters, but
people rejected it because of the potential ambiguities. So they clearly
dislike one risk more than the other. UTF-8b is primarily meant as
an in-memory representation.

> Or say de/serialization succeeds. Since the resulting Unicode string
> differs depending on the encoding (which is a good thing; it is
> supposed to make most cases mostly readable), when the filesystem
> encoding changes (say from legacy to UTF-8), the "name" changes, and
> deserialized references to it become stale.

That problem has nothing to do with the PEP. If the encoding changes,
LRU entries may get stale even if there were no encoding errors at
all. Suppose the old encoding was Latin-1, and the new encoding is
KOI8-R, then all file names are decodable before and afterwards, yet
the string representation changes. Applications that want to protect
themselves against that happening need to store byte representations
of the file names, not character representations. Depending on the
configuration file format, that may or may not be possible.
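
As a rough illustration of storing byte representations, the sketch
below writes a file name into a text-only configuration value by way of
base64.  The helper names, the fixed ENCODING constant, and the use of
the 'surrogateescape' handler from later Python 3 releases are all
assumptions made for the example, not part of the PEP:

```python
import base64

ENCODING = 'utf-8'  # assumed file system encoding for this sketch


def name_to_config(name: str) -> str:
    """Turn a (possibly escaped) file name into an ASCII-only config value."""
    raw = name.encode(ENCODING, 'surrogateescape')
    return base64.b64encode(raw).decode('ascii')


def name_from_config(text: str) -> str:
    """Recover the file name string from the stored bytes."""
    raw = base64.b64decode(text)
    return raw.decode(ENCODING, 'surrogateescape')


# A name containing a byte that is not valid UTF-8.
original_bytes = b'caf\xe9.txt'
name = original_bytes.decode(ENCODING, 'surrogateescape')
stored = name_to_config(name)
restored = name_from_config(stored)

# The same bytes read under two fully decodable encodings give different
# strings, which is why a stored *string* goes stale when the locale changes.
as_latin1 = original_bytes.decode('latin-1')
as_koi8r = original_bytes.decode('koi8_r')
print(stored, ascii(restored), as_latin1 != as_koi8r)
```

The stored value depends only on the bytes on disk, so it can be decoded
again under whatever encoding is in effect when the file is read back.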

I find the case pretty artificial, though: if the locale encoding
changes, all file names will look incorrect to the user, so he'll
quickly switch back, or rename all the files. As an application
supporting a LRU list, I would remove/hide all entries that don't
correspond to existing files - after all, the user may as well have
deleted the file in the LRU list.

Regards,
Martin

From martin at v.loewis.de  Tue Apr 28 19:08:37 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 19:08:37 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F6FF49.6010205@avl.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>	<49F6799F.5030208@v.loewis.de>	<49F6933B.7020705@g.nevcal.com>	<30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001>
	<49F6FF49.6010205@avl.com>
Message-ID: <49F73815.1010806@v.loewis.de>

> If the PEP depends on this being changed, it should be mentioned in the
> PEP.

The PEP says that the utf-8b codec decodes invalid bytes into low
surrogates. I have now clarified that a strict definition of UTF-8
is assumed for utf-8b.

Regards,
Martin

From foom at fuhm.net  Tue Apr 28 19:53:42 2009
From: foom at fuhm.net (James Y Knight)
Date: Tue, 28 Apr 2009 13:53:42 -0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F6A71A.3020809@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>
	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>
	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>
	<49F6799F.5030208@v.loewis.de>
	<875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>
	<49F6A71A.3020809@v.loewis.de>
Message-ID: <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>

On Apr 28, 2009, at 2:50 AM, Martin v. Löwis wrote:

> James Y Knight wrote:
>> Hopefully it can be assumed that your locale encoding really is a
>> non-overlapping superset of ASCII, as is required by POSIX...
>
> Can you please point to the part of the POSIX spec that says that
> such overlapping is forbidden?

I can't find it...I would've thought it would be on this page:
http://opengroup.org/onlinepubs/007908775/xbd/charset.html
but it's not (at least, not obviously). That does say (effectively)  
that all encodings must be supersets of ASCII and use the same  
codepoints, though.

However, ISO-2022 being inappropriate for LC_CTYPE usage is the entire  
reason why EUC-JP was created, so I'm pretty sure that it is in fact  
inappropriate, and I cannot find any evidence of it ever being used on  
any system.

 From http://en.wikipedia.org/wiki/EUC-JP:
"To get the EUC form of an ISO-2022 character, the most significant  
bit of each 7-bit byte of the original ISO 2022 codes is set (by  
adding 128 to each of these original 7-bit codes); this allows  
software to easily distinguish whether a particular byte in a  
character string belongs to the ISO-646 code or the ISO-2022 (EUC)  
code."

Also:
http://www.cl.cam.ac.uk/~mgk25/ucs/iso2022-wc.html

>> I'm a bit scared at the prospect that U+DCAF could turn into "/", that
>> just screams security vulnerability to me.  So I'd like to propose that
>> only 0x80-0xFF <-> U+DC80-U+DCFF should ever be allowed to be
>> encoded/decoded via the error handler.
>
> It would actually be U+DC2F that would turn into /.

Yes, I meant to say DC2F, sorry for the confusion.

> I'm happy to exclude that range from the mapping if POSIX really
> requires an encoding not to be overlapping with ASCII.

I think it has to be excluded from mapping in order to not introduce  
security issues.

However...

There's also SHIFT-JIS to worry about...which apparently some people  
actually want to use as their default encoding, despite it being  
broken to do so. RedHat apparently refuses to provide it as a locale  
charset (due to its brokenness), and it's also not available by  
default on my Debian system. People do unfortunately seem to actually  
use it in real life.

https://bugzilla.redhat.com/show_bug.cgi?id=136290

So, I'd like to propose this:
The "python-escape" error handler, when given a non-decodable byte from
0x80 to 0xFF, will produce values of U+DC80 to U+DCFF. A non-decodable
byte from 0x00 to 0x7F will be converted to U+0000-U+007F. On the
encoding side, values from U+DC80 to U+DCFF are encoded into 0x80 to
0xFF, and all other characters are treated in whatever way the encoding
would normally treat them.

This proposal obviously works for all non-overlapping ASCII supersets,  
where 0x00 to 0x7F always decode to U+00 to U+7F. But it also works  
for Shift-JIS and other similar ASCII-supersets with overlaps in  
trailing bytes of a multibyte sequence. So, a sequence like  
"\x81\xFD".decode("shift-jis", "python-escape") will turn into  
u"\uDC81\u00fd". Which will then properly encode back into "\x81\xFD".

The character sets this *doesn't* work for are: ebcdic code pages  
(obviously completely unsuitable for a locale encoding on unix),  
iso2022-* (covered above), and shift-jisx0213 (because it has replaced  
\ with yen, and - with overline).

If it's desirable to work with shift_jisx0213, a modification of the  
proposal can be made: Change the second sentence to: "When given a
non-decodable byte from 0x00 to 0x7F, that byte must be the second or
later byte in a multibyte sequence. In such a case, the error handler  
will produce the encoding of that byte if it was standing alone (thus  
in most encodings, \x00-\x7f turn into U+00-U+7F)."

It sounds from https://bugzilla.novell.com/show_bug.cgi?id=162501 like  
some people do actually use shift_jisx0213, unfortunately.
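
The base proposal above (before the shift_jisx0213 modification) can be
sketched as a custom codec error handler. This is only an illustration
under stated assumptions: it targets Python 3, the handler name
"sketch-escape" and the function name are made up for this example, and
it is exercised here only with UTF-8.

```python
import codecs

def sketch_escape(exc):
    """Hypothetical handler following the proposal above (not PEP 383 itself)."""
    bad = exc.object[exc.start:exc.end]
    if isinstance(exc, UnicodeDecodeError):
        # 0x80-0xFF -> U+DC80-U+DCFF; 0x00-0x7F -> U+0000-U+007F.
        text = "".join(chr(0xDC00 + b) if b >= 0x80 else chr(b) for b in bad)
        return text, exc.end
    if isinstance(exc, UnicodeEncodeError):
        # Only U+DC80-U+DCFF may turn back into raw bytes, so U+DC2F
        # can never come out as "/".
        if all(0xDC80 <= ord(c) <= 0xDCFF for c in bad):
            return bytes(ord(c) - 0xDC00 for c in bad), exc.end
    # Anything else is reported as the original error.
    raise exc

codecs.register_error("sketch-escape", sketch_escape)

raw = b"caf\xe9/\xff"
text = raw.decode("utf-8", "sketch-escape")
roundtrip = text.encode("utf-8", "sketch-escape")
```

With this sketch the two undecodable bytes come back unchanged on
encoding, while a string containing U+DC2F is refused rather than being
turned into a path separator.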

James

From tmbdev at gmail.com  Tue Apr 28 20:38:44 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Tue, 28 Apr 2009 20:38:44 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> 
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> 
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> 
	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>

>
> However, it is "mission creep": Martin didn't volunteer to
> write a PEP for it, he volunteered to write a PEP to solve the
> "roundtrip the value of os.listdir()" problem.  And he succeeded, up
> to some minor details.

Yes, it solves that problem.  But that doesn't come without cost.

Most importantly, now Python writes illegal UTF-8 strings even if the user
chose a UTF-8 encoding.   That means that illegal UTF-8 encodings can
propagate anywhere, without warning.

Furthermore, I don't believe that PEP 383 works consistently on Windows, and
it causes programs to behave differently in unintuitive ways on Windows and
Linux.

I'll suggest an alternative in a separate message.

Tom

From martin at v.loewis.de  Tue Apr 28 20:45:57 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 20:45:57 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
	<49F6A947.1050106@v.loewis.de>	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
	<20090428075806.GB23828@phd.pp.ru>	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>
	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>
Message-ID: <49F74EE5.6060305@v.loewis.de>

> Furthermore, I don't believe that PEP 383 works consistently on Windows,

What makes you say that? PEP 383 will have no effect on Windows,
compared to the status quo, whatsoever.

Regards,
Martin

From v+python at g.nevcal.com  Tue Apr 28 20:48:37 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Tue, 28 Apr 2009 11:48:37 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F73635.6010105@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de>
Message-ID: <49F74F85.9010800@g.nevcal.com>

On approximately 4/28/2009 10:00 AM, came the following characters from 
the keyboard of Martin v. Löwis:

> An alternative that doesn't suffer from the risk of not being able to
> store decoded strings would have been the use of PUA characters, but
> people rejected it because of the potential ambiguities. So they clearly
> dislike one risk more than the other. UTF-8b is primarily meant as
> an in-memory representation.

The UTF-8b representation suffers from the same potential ambiguities as 
the PUA characters... perhaps slightly less likely in practice, due to 
the use of Unicode-illegal characters, but exactly the same theoretical 
likelihood in the space of Python-acceptable character codes.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From google at mrabarnett.plus.com  Tue Apr 28 20:55:09 2009
From: google at mrabarnett.plus.com (MRAB)
Date: Tue, 28 Apr 2009 19:55:09 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>	<49F6799F.5030208@v.loewis.de>	<875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>	<49F6A71A.3020809@v.loewis.de>
	<873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>
Message-ID: <49F7510D.7070603@mrabarnett.plus.com>

James Y Knight wrote:
> 
> On Apr 28, 2009, at 2:50 AM, Martin v. Löwis wrote:
> 
>> James Y Knight wrote:
>>> Hopefully it can be assumed that your locale encoding really is a
>>> non-overlapping superset of ASCII, as is required by POSIX...
>>
>> Can you please point to the part of the POSIX spec that says that
>> such overlapping is forbidden?
> 
> I can't find it...I would've thought it would be on this page:
> http://opengroup.org/onlinepubs/007908775/xbd/charset.html
> but it's not (at least, not obviously). That does say (effectively) that 
> all encodings must be supersets of ASCII and use the same codepoints, 
> though.
> 
> However, ISO-2022 being inappropriate for LC_CTYPE usage is the entire 
> reason why EUC-JP was created, so I'm pretty sure that it is in fact 
> inappropriate, and I cannot find any evidence of it ever being used on 
> any system.
> 
>  From http://en.wikipedia.org/wiki/EUC-JP:
> "To get the EUC form of an ISO-2022 character, the most significant bit 
> of each 7-bit byte of the original ISO 2022 codes is set (by adding 128 
> to each of these original 7-bit codes); this allows software to easily 
> distinguish whether a particular byte in a character string belongs to 
> the ISO-646 code or the ISO-2022 (EUC) code."
> 
> Also:
> http://www.cl.cam.ac.uk/~mgk25/ucs/iso2022-wc.html
> 
> 
>>> I'm a bit scared at the prospect that U+DCAF could turn into "/", that
>>> just screams security vulnerability to me.  So I'd like to propose that
>>> only 0x80-0xFF <-> U+DC80-U+DCFF should ever be allowed to be
>>> encoded/decoded via the error handler.
>>
>> It would be actually U+DC2f that would turn into /.
> 
> Yes, I meant to say DC2F, sorry for the confusion.
> 
>> I'm happy to exclude that range from the mapping if POSIX really
>> requires an encoding not to be overlapping with ASCII.
> 
> I think it has to be excluded from mapping in order to not introduce 
> security issues.
> 
> However...
> 
> There's also SHIFT-JIS to worry about...which apparently some people 
> actually want to use as their default encoding, despite it being broken 
> to do so. RedHat apparently refuses to provide it as a locale charset 
> (due to its brokenness), and it's also not available by default on my 
> Debian system. People do unfortunately seem to actually use it in real 
> life.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=136290
> 
> So, I'd like to propose this:
> The "python-escape" error handler when given a non-decodable byte from 
> 0x80 to 0xFF will produce values of U+DC80 to U+DCFF. When given a 
> non-decodable byte from 0x00 to 0x7F, it will be converted to 
> U+0000-U+007F. On the encoding side, values from U+DC80 to U+DCFF are 
> encoded into 0x80 to 0xFF, and all other characters are treated in 
> whatever way the encoding would normally treat them.
> 
> This proposal obviously works for all non-overlapping ASCII supersets, 
> where 0x00 to 0x7F always decode to U+00 to U+7F. But it also works for 
> Shift-JIS and other similar ASCII-supersets with overlaps in trailing 
> bytes of a multibyte sequence. So, a sequence like 
> "\x81\xFD".decode("shift-jis", "python-escape") will turn into 
> u"\uDC81\u00fd". Which will then properly encode back into "\x81\xFD".
> 
> The character sets this *doesn't* work for are: ebcdic code pages 
> (obviously completely unsuitable for a locale encoding on unix), 
> iso2022-* (covered above), and shift-jisx0213 (because it has replaced \ 
> with yen, and - with overline).
> 
> If it's desirable to work with shift_jisx0213, a modification of the 
> proposal can be made: Change the second sentence to: "When given a 
> non-decodable byte from 0x00 to 0x7F, that byte must be the second or 
> later byte in a multibyte sequence. In such a case, the error handler 
> will produce the encoding of that byte if it was standing alone (thus in 
> most encodings, \x00-\x7f turn into U+00-U+7F)."
> 
> It sounds from https://bugzilla.novell.com/show_bug.cgi?id=162501 like 
> some people do actually use shift_jisx0213, unfortunately.
> 
I've been thinking of "python-escape" only in terms of UTF-8, the only
encoding mentioned in the PEP. In UTF-8, bytes 0x00 to 0x7F are
decodable.

But if you're talking about using it with other encodings, eg
shift-jisx0213, then I'd suggest the following:

1. Bytes 0x00 to 0xFF which can't normally be decoded are decoded to
half surrogates U+DC00 to U+DCFF.

2. Bytes which would have decoded to half surrogates U+DC00 to U+DCFF
are treated as though they are undecodable bytes.

3. Half surrogates U+DC00 to U+DCFF which can be produced by decoding
are encoded to bytes 0x00 to 0xFF.

4. Codepoints, including half surrogates U+DC00 to U+DCFF, which can't
be produced by decoding raise an exception.

I think I've covered all the possibilities. :-)
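
Rule 2 is what keeps the round trip unambiguous. As a hedged
illustration, assuming a Python 3 interpreter in which the PEP's handler
is available under the name "surrogateescape" (this thread calls it
"python-escape"), the UTF-8 codec refuses the byte sequence that would
otherwise decode to U+DC80 and escapes each of its bytes instead:

```python
# Assumes Python 3 with the "surrogateescape" error handler.
samples = [b"\xff", b"\xed\xb2\x80", b"plain"]

decoded = [s.decode("utf-8", "surrogateescape") for s in samples]
encoded = [d.encode("utf-8", "surrogateescape") for d in decoded]

# b"\xed\xb2\x80" would be the UTF-8 form of U+DC80, but the decoder
# rejects it, so it cannot collide with the escape for byte 0x80.
collides = decoded[1] == "\udc80"
```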

From tmbdev at gmail.com  Tue Apr 28 21:01:58 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Tue, 28 Apr 2009 21:01:58 +0200
Subject: [Python-Dev] a suggestion ... Re:  PEP 383 (again)
Message-ID: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>

I think we should break up this problem into several parts:

(1) Should the default UTF-8 decoder fail if it gets an illegal byte
sequence?

It's probably OK for the default decoder to be lenient in some way (see
below).

(2) Should the default UTF-8 encoder for file system operations be allowed
to generate illegal byte sequences?

I think that's a definite no; if I set the encoding for a device to UTF-8, I
never want Python to try to write illegal UTF-8 strings to my device.

(3) What kind of representation should the UTF-8 decoder return for illegal
inputs?

There are actually several choices: (a) it could guess what the actual
encoding is and use that, (b) it could return a valid unicode string that
indicates the illegal characters but does not re-encode to the original byte
sequence, or (c) it could return some kind of non-standard representation
that encodes back into the original byte sequence.

PEP 383 violates (2), and I think that's a bad thing.

I think the best solution would be to use (3a) and fall back to (3b) if that
doesn't work.  If people try to write those strings, they will always get
written as correctly encoded UTF-8 strings.

If people really want the option of (3c), then I think encoders related to
the file system should by default reject those strings as illegal because
the potential problems from writing them are just too serious.  Printing
routines and UI routines could display them without error (but some clear
indication), of course.
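
A minimal sketch of the behaviour asked for in (2), assuming Python 3
and its "surrogateescape" handler as a stand-in for the handler
discussed here (the file name is made up): the strict UTF-8 encoder
refuses a string carrying an escaped byte, and the raw byte is written
only when the caller opts in.

```python
name = b"report-\xff.txt".decode("utf-8", "surrogateescape")

try:
    name.encode("utf-8")           # strict: lone surrogates are rejected
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Opting in reproduces the original, non-UTF-8 byte sequence.
raw = name.encode("utf-8", "surrogateescape")
# A lossy alternative that always yields valid UTF-8.
safe = name.encode("utf-8", "replace")
```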

There is yet another option, which is arguably the "right" one: make the
results of os.listdir() subclasses of string that keep track of where they
came from.  If you write back to the same device, it just writes the same
byte sequence.  But if you write to other devices and the byte sequence is
illegal according to its encoding, you get an error.

Tom

From zooko at zooko.com  Tue Apr 28 20:51:43 2009
From: zooko at zooko.com (Zooko O'Whielacronx)
Date: Tue, 28 Apr 2009 12:51:43 -0600
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F6FA93.7080302@avl.com>
References: <20090427211447.GA4291@cskk.homeip.net>	<49F658A5.7080807@g.nevcal.com>
	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>
	<loom.20090428T114723-520@post.gmane.org>
	<15546941.1861678.1240922288709.JavaMail.xicrypt@atgrzls001>
	<49F6FA93.7080302@avl.com>
Message-ID: <F1A4F1B6-94BC-4588-A328-0737288BC58B@zooko.com>

On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote:

> Are you proposing to unconditionally encode file names as  
> iso8859-15, or to do so only when undecodeable bytes are encountered?

For what it is worth, what we have previously planned to do for the  
Tahoe project is the second of these -- decode using some 1-byte  
encoding such as iso-8859-1, iso-8859-15, or windows-1252 only in the  
case that attempting to decode the bytes using the local alleged  
encoding failed.

> If you switch to iso8859-15 only in the presence of undecodable  
> UTF-8, then you have the same round-trip problem as the PEP: both  
> b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a  
> way to unambiguously recover the original file name.

Why do you say that?  It seems to work as I expected here:

 >>> '\xff'.decode('iso-8859-15')
u'\xff'
 >>> '\xc3\xbf'.decode('iso-8859-15')
u'\xc3\xbf'
 >>>
 >>>
 >>>
 >>> '\xff'.decode('cp1252')
u'\xff'
 >>> '\xc3\xbf'.decode('cp1252')
u'\xc3\xbf'
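
For comparison, a sketch of the fallback scheme itself rather than a
direct iso-8859-15 decode, written for Python 3 and assuming UTF-8 is
the locale's alleged encoding (the function name is hypothetical).
Under these assumptions both byte strings end up as the same text,
which appears to be the case the quoted question is about:

```python
def decode_with_fallback(raw, primary="utf-8", fallback="iso-8859-15"):
    """Hypothetical helper: try the alleged encoding, else a 1-byte one."""
    try:
        return raw.decode(primary)
    except UnicodeDecodeError:
        return raw.decode(fallback)

a = decode_with_fallback(b"\xff")       # invalid UTF-8, falls back
b = decode_with_fallback(b"\xc3\xbf")   # valid UTF-8 for U+00FF
```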

Regards,

Zooko

From google at mrabarnett.plus.com  Tue Apr 28 21:04:35 2009
From: google at mrabarnett.plus.com (MRAB)
Date: Tue, 28 Apr 2009 20:04:35 +0100
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <49F74EE5.6060305@v.loewis.de>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>	<49F6A947.1050106@v.loewis.de>	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>	<20090428075806.GB23828@phd.pp.ru>	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>
	<49F74EE5.6060305@v.loewis.de>
Message-ID: <49F75343.9020602@mrabarnett.plus.com>

Martin v. Löwis wrote:
>> Furthermore, I don't believe that PEP 383 works consistently on Windows,
> 
> What makes you say that? PEP 383 will have no effect on Windows,
> compared to the status quo, whatsoever.
> 
You could argue that if Windows is actually returning UTF-16 with half
surrogates that they should be altered to conform to what UTF-8 would
have returned.

From v+python at g.nevcal.com  Tue Apr 28 21:07:54 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Tue, 28 Apr 2009 12:07:54 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>	<49F6799F.5030208@v.loewis.de>	<875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>	<49F6A71A.3020809@v.loewis.de>
	<873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>
Message-ID: <49F7540A.9010500@g.nevcal.com>

On approximately 4/28/2009 10:53 AM, came the following characters from 
the keyboard of James Y Knight:
> 
> On Apr 28, 2009, at 2:50 AM, Martin v. Löwis wrote:
> 
>> James Y Knight wrote:
>>> Hopefully it can be assumed that your locale encoding really is a
>>> non-overlapping superset of ASCII, as is required by POSIX...
>>
>> Can you please point to the part of the POSIX spec that says that
>> such overlapping is forbidden?
> 
> I can't find it...I would've thought it would be on this page:
> http://opengroup.org/onlinepubs/007908775/xbd/charset.html
> but it's not (at least, not obviously). That does say (effectively) that 
> all encodings must be supersets of ASCII and use the same codepoints, 
> though.
> 
> However, ISO-2022 being inappropriate for LC_CTYPE usage is the entire 
> reason why EUC-JP was created, so I'm pretty sure that it is in fact 
> inappropriate, and I cannot find any evidence of it ever being used on 
> any system.

It would seem from the definition of ISO-2022 that what it calls "escape 
sequences" is in your POSIX spec called "locking-shift encoding". 
Therefore, the second bullet item under the "Character Encoding" heading 
prohibits use of ISO-2022, for whatever uses that document defines 
(which, since you referenced it, I assume means locales, and possibly 
file system encodings, but I'm not familiar with the structure of all 
the POSIX standards documents).

"A locking-shift encoding (where the state of the character is determined
by a shift code that may affect more than the single character following
it) cannot be defined with the current character set description file
format. Use of a locking-shift encoding with any of the standard
utilities in the XCU specification or with any of the functions in the
XSH specification that do not specifically mention the effects of
state-dependent encoding is implementation-dependent."

>  From http://en.wikipedia.org/wiki/EUC-JP:
> "To get the EUC form of an ISO-2022 character, the most significant bit 
> of each 7-bit byte of the original ISO 2022 codes is set (by adding 128 
> to each of these original 7-bit codes); this allows software to easily 
> distinguish whether a particular byte in a character string belongs to 
> the ISO-646 code or the ISO-2022 (EUC) code."
> 
> Also:
> http://www.cl.cam.ac.uk/~mgk25/ucs/iso2022-wc.html
> 
> 
>>> I'm a bit scared at the prospect that U+DCAF could turn into "/", that
>>> just screams security vulnerability to me.  So I'd like to propose that
>>> only 0x80-0xFF <-> U+DC80-U+DCFF should ever be allowed to be
>>> encoded/decoded via the error handler.
>>
>> It would be actually U+DC2f that would turn into /.
> 
> Yes, I meant to say DC2F, sorry for the confusion.
> 
>> I'm happy to exclude that range from the mapping if POSIX really
>> requires an encoding not to be overlapping with ASCII.
> 
> I think it has to be excluded from mapping in order to not introduce 
> security issues.
> 
> However...
> 
> There's also SHIFT-JIS to worry about...which apparently some people 
> actually want to use as their default encoding, despite it being broken 
> to do so. RedHat apparently refuses to provide it as a locale charset 
> (due to its brokenness), and it's also not available by default on my 
> Debian system. People do unfortunately seem to actually use it in real 
> life.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=136290
> 
> So, I'd like to propose this:
> The "python-escape" error handler when given a non-decodable byte from 
> 0x80 to 0xFF will produce values of U+DC80 to U+DCFF. When given a 
> non-decodable byte from 0x00 to 0x7F, it will be converted to 
> U+0000-U+007F. On the encoding side, values from U+DC80 to U+DCFF are 
> encoded into 0x80 to 0xFF, and all other characters are treated in 
> whatever way the encoding would normally treat them.
> 
> This proposal obviously works for all non-overlapping ASCII supersets, 
> where 0x00 to 0x7F always decode to U+00 to U+7F. But it also works for 
> Shift-JIS and other similar ASCII-supersets with overlaps in trailing 
> bytes of a multibyte sequence. So, a sequence like 
> "\x81\xFD".decode("shift-jis", "python-escape") will turn into 
> u"\uDC81\u00fd". Which will then properly encode back into "\x81\xFD".
> 
> The character sets this *doesn't* work for are: ebcdic code pages 
> (obviously completely unsuitable for a locale encoding on unix), 

Why is that obvious?  The only thing I saw that could exclude EBCDIC 
would be the requirement that the codes be positive in a char, but on a 
system where the C compiler treats char as unsigned, EBCDIC would qualify.

Of course, the use of EBCDIC would also restrict the other possible code 
pages to those derived from EBCDIC (rather than the bulk of code pages 
that are derived from ASCII), due to:

"If the encoded values associated with each member of the portable
character set are not invariant across all locales supported by the
implementation, the results achieved by an application accessing those
locales are unspecified."

> iso2022-* (covered above), and shift-jisx0213 (because it has replaced \ 
> with yen, and - with overline).
> 
> If it's desirable to work with shift_jisx0213, a modification of the 
> proposal can be made: Change the second sentence to: "When given a 
> non-decodable byte from 0x00 to 0x7F, that byte must be the second or 
> later byte in a multibyte sequence. In such a case, the error handler 
> will produce the encoding of that byte if it was standing alone (thus in 
> most encodings, \x00-\x7f turn into U+00-U+7F)."
> 
> It sounds from https://bugzilla.novell.com/show_bug.cgi?id=162501 like 
> some people do actually use shift_jisx0213, unfortunately.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From tmbdev at gmail.com  Tue Apr 28 21:24:40 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Tue, 28 Apr 2009 21:24:40 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <49F74EE5.6060305@v.loewis.de>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> 
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> 
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> 
	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> 
	<49F74EE5.6060305@v.loewis.de>
Message-ID: <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com>

On Tue, Apr 28, 2009 at 20:45, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> > Furthermore, I don't believe that PEP 383 works consistently on Windows,
>
> What makes you say that? PEP 383 will have no effect on Windows,
> compared to the status quo, whatsoever.
>

That's what you believe, but it's not clear to me that that follows from
your proposal.

Your proposal says that utf-8b would be used for file systems, but then you
also say that it might be used for command line arguments and environment
variables.  So, which specific APIs will it be used with on Windows and on
POSIX systems?   Or will utf-8b simply not be available on Windows at all?
What happens if I create a Python version of tar, utf-8b strings slip in
there, and I try to use them on Windows?

You also assume that all Windows file system functions strictly conform to
UTF-16 in practice (not just on paper).  Have you verified that?  It
certainly isn't true across all versions of Windows (since NT originally
used UCS-2).   What's the situation on Windows CE?

Another question on Linux: what happens when I decode a file system path
with utf-8b and then pass the resulting unicode string to Gnome?  To Qt?  To
windows.forms?  To Java?  To a unicode regular expression library?  To
wprintf?  AFAIK, the behavior of most libraries is undefined for the kinds
of unicode strings you construct, and it may be undefined in a bad way
(crash, buffer overflow, whatever).

Tom

From zooko at zooko.com  Tue Apr 28 21:50:55 2009
From: zooko at zooko.com (Zooko O'Whielacronx)
Date: Tue, 28 Apr 2009 13:50:55 -0600
Subject: [Python-Dev] a suggestion ... Re:  PEP 383 (again)
In-Reply-To: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
Message-ID: <944CCCB4-36E5-40D5-8F69-67C45F7FD640@zooko.com>

On Apr 28, 2009, at 13:01 PM, Thomas Breuel wrote:

> (2) Should the default UTF-8 encoder for file system operations be  
> allowed to generate illegal byte sequences?
>
> I think that's a definite no; if I set the encoding for a device to  
> UTF-8, I never want Python to try to write illegal UTF-8 strings to  
> my device.
...
> If people really want the option of (3c), then I think encoders  
> related to the file system should by default reject those strings  
> as illegal because the potential problems from writing them are  
> just too serious.  Printing routines and UI routines could display  
> them without error (but some clear indication), of course.

For what it is worth, sometimes we have to write bytes to a POSIX  
filesystem even though those bytes are not the encoding of any string  
in the filesystem's "alleged encoding".  The reason is that it is  
common for there to be filenames which are not the encodings of  
anything in the filesystem's alleged encoding, and the user expects  
my tool (Tahoe-LAFS [1]) to copy that name to a distributed storage  
grid and then copy it back unchanged.  Even though, I re-iterate,  
that name is *not* a valid encoding of anything in the current encoding.

This doesn't argue that this behavior has to be the *default*  
behavior, but it is sometimes necessary.

It's too bad that POSIX is so far behind Mac OS X in this respect.   
(Also so far behind Windows, but I use Mac as the example to show how  
it is possible to build a better system on top of POSIX.)  Hopefully  
David Wheeler's proposals to tighten the requirements in Linux  
filesystems will catch on: [2].

Regards,

Zooko

[1] http://allmydata.org
[2] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

From martin at v.loewis.de  Tue Apr 28 22:04:12 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 22:04:12 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>
	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>
	<49F74EE5.6060305@v.loewis.de>
	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com>
Message-ID: <49F7613C.9000901@v.loewis.de>

> Your proposal says that utf-8b would be used for file systems, but then
> you also say that it might be used for command line arguments and
> environment variables.  So, which specific APIs will it be used with on
> Windows and on POSIX systems?

On Windows, the Wide APIs are already used throughout the code base,
e.g. SetEnvironmentVariableW/_wenviron. If you need to find out the
specific API for a specific functionality, please read the source code.

> Or will utf-8b simply not be available
> on Windows at all?

It will be available, but it won't be used automatically for
anything.

> What happens if I create a Python version of tar,
> utf-8b strings slip in there, and I try to use them on Windows?

No need to create it - the tarfile module is already there. By
"in there", do you mean on the file system, or in the tarfile?

> You also assume that all Windows file system functions strictly conform
> to UTF-16 in practice (not just on paper).  Have you verified that?

No, I don't assume that. I assume that all functions are strictly
available in a Wide character version, and have verified that they are.

> What's the situation on Windows CE?

I can't see how this question is relevant to the PEP. The PEP says this:

# On Windows, Python uses the wide character APIs to access
# character-oriented APIs, allowing direct conversion of the
# environmental data to Python str objects.

This is what it already does, and this is what it will continue to do.

> Another question on Linux: what happens when I decode a file system path
> with utf-8b and then pass the resulting unicode string to Gnome?  To
> Qt?

You probably get mojibake or an error; I didn't try.

> To windows.forms?  To Java?

How do you do that, on Linux?

> To a unicode regular expression library?

You mean, SRE? SRE will match the code points as individual characters,
class Cs. You should have been able to find out that for yourself.
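
A small check of this, assuming Python 3, its "surrogateescape" handler
as a stand-in for the handler discussed here, and the standard re and
unicodedata modules: an escaped byte is a single code point of general
category Cs, and the regular expression engine treats it like any
other single character.

```python
import re
import unicodedata

text = b"a\xffb".decode("utf-8", "surrogateescape")

# The escaped byte is one lone low surrogate.
category = unicodedata.category(text[1])
# "." matches it as an ordinary single code point.
match = re.fullmatch(r"a(.)b", text)
```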

> To wprintf?

Depends on the wprintf implementation.

> AFAIK, the behavior of most libraries is
> undefined for the kinds of unicode strings you construct, and it may be
> undefined in a bad way (crash, buffer overflow, whatever).

Indeed so. This is intentional. If you can crash Python that way,
nothing gets worse by this PEP - you can then *already* crash Python
in that way.

Regards,
Martin

From martin at v.loewis.de  Tue Apr 28 22:05:22 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 22:05:22 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <49F75343.9020602@mrabarnett.plus.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>	<49F6A947.1050106@v.loewis.de>	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>	<20090428075806.GB23828@phd.pp.ru>	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>	<49F74EE5.6060305@v.loewis.de>
	<49F75343.9020602@mrabarnett.plus.com>
Message-ID: <49F76182.9010000@v.loewis.de>

MRAB wrote:
> Martin v. Löwis wrote:
>>> Furthermore, I don't believe that PEP 383 works consistently on Windows,
>>
>> What makes you say that? PEP 383 will have no effect on Windows,
>> compared to the status quo, whatsoever.
>>
> You could argue that if Windows is actually returning UTF-16 with half
> surrogates that they should be altered to conform to what UTF-8 would
> have returned.

Perhaps - but this is not what the PEP specifies (and intentionally so).

Regards,
Martin

From v+python at g.nevcal.com  Tue Apr 28 22:16:34 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Tue, 28 Apr 2009 13:16:34 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F7510D.7070603@mrabarnett.plus.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>	<49F6799F.5030208@v.loewis.de>	<875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>	<49F6A71A.3020809@v.loewis.de>	<873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>
	<49F7510D.7070603@mrabarnett.plus.com>
Message-ID: <49F76422.4010806@g.nevcal.com>

On approximately 4/28/2009 11:55 AM, came the following characters from 
the keyboard of MRAB:
> I've been thinking of "python-escape" only in terms of UTF-8, the only
> encoding mentioned in the PEP. In UTF-8, bytes 0x00 to 0x7F are
> decodable.

UTF-8 is only mentioned in the sense of having special handling for 
re-encoding; all the other locales/encodings are implicit.  But I also 
went down that path to some extent.

> But if you're talking about using it with other encodings, eg
> shift-jisx0213, then I'd suggest the following:
> 
> 1. Bytes 0x00 to 0xFF which can't normally be decoded are decoded to
> half surrogates U+DC00 to U+DCFF.

This makes 256 different escape codes.

> 2. Bytes which would have decoded to half surrogates U+DC00 to U+DCFF
> are treated as though they are undecodable bytes.

This provides escaping for the 256 different escape codes, which is 
lacking from the PEP.

> 3. Half surrogates U+DC00 to U+DCFF which can be produced by decoding
> are encoded to bytes 0x00 to 0xFF.

This reverses the escaping.

> 4. Codepoints, including half surrogates U+DC00 to U+DCFF, which can't
> be produced by decoding raise an exception.

This is confusing.  Did you mean "excluding" instead of "including"?

> I think I've covered all the possibilities. :-)

You might have.  Seems like there could be a simpler scheme, though...

1. Define an escape codepoint.  It could be U+003F or U+DC00 or U+F817 
or pretty much any defined Unicode codepoint outside the range U+0100 to 
U+01FF (see rule 3 for why).  Only one escape codepoint is needed, which 
is easier for humans to comprehend.

2. When the escape codepoint is decoded from the byte stream for a bytes 
interface or found in a str on the str interface, double it.

3. When an undecodable byte 0xPQ is found, decode to the escape 
codepoint, followed by codepoint U+01PQ, where P and Q are hex digits.

4. When encoding, a sequence of two escape codepoints would be encoded 
as one escape codepoint, and a sequence of the escape codepoint followed 
by codepoint U+01PQ would be encoded as byte 0xPQ.  Escape codepoints 
not followed by the escape codepoint, or by a codepoint in the range 
U+0100 to U+01FF would raise an exception.

5. Provide functions that will perform the same decoding and encoding as 
would be done by the system calls, for both bytes and str interfaces.
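
A minimal sketch of rules 1 to 4 on the bytes side, assuming U+003F as the 
escape codepoint and UTF-8 as the underlying encoding.  The names ESC, 
escape_decode and escape_encode are made up for illustration; nothing 
like them exists in the PEP or in the standard library.

```python
ESC = "?"  # assumption: U+003F chosen as the single escape codepoint


def escape_decode(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw, escaping each undecodable byte 0xPQ as ESC + U+01PQ."""
    out = []
    pos = 0
    while pos < len(raw):
        try:
            # Rule 2: a genuine escape codepoint in the data is doubled.
            out.append(raw[pos:].decode(encoding).replace(ESC, ESC * 2))
            break
        except UnicodeDecodeError as exc:
            # exc.start is relative to the slice that was being decoded.
            good = raw[pos:pos + exc.start].decode(encoding)
            out.append(good.replace(ESC, ESC * 2))
            # Rule 3: escape exactly one offending byte, then retry.
            bad = raw[pos + exc.start]
            out.append(ESC + chr(0x100 + bad))
            pos += exc.start + 1
    return "".join(out)


def escape_encode(text: str, encoding: str = "utf-8") -> bytes:
    """Reverse escape_decode (rule 4); malformed escapes raise ValueError."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != ESC:
            # Per-character encoding is fine for a stateless encoding.
            out += ch.encode(encoding)
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("escape codepoint at end of string")
        nxt = text[i + 1]
        if nxt == ESC:
            out += ESC.encode(encoding)      # doubled escape -> one escape
        elif 0x100 <= ord(nxt) <= 0x1FF:
            out.append(ord(nxt) - 0x100)     # ESC + U+01PQ -> byte 0xPQ
        else:
            raise ValueError("invalid escape sequence at index %d" % i)
        i += 2
    return bytes(out)


print(ascii(escape_decode(b"a?b\xff")))
print(escape_encode(escape_decode(b"a?b\xff")))
```

The sketch encodes one character at a time, so it only suits stateless 
encodings such as UTF-8; a stateful encoding would need more care.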

This differs from my previous proposal in three ways:

A. Doesn't put a marker at the beginning of the string (which I said 
wasn't necessary even then).

B. Allows for a choice of escape codepoint; the previous proposal 
suggested a specific one.  The final solution will still have only a 
single one, though: an implementation choice, not a user choice.

C. Uses the range U+0100 to U+01FF for the escape codes, rather than 
U+0000 to U+00FF.  This avoids introducing the NULL character and escape 
characters into the decoded str representation, yet still uses 
characters for which glyphs are commonly available, are non-combining, 
and are easily distinguishable one from another.

Rationale:

The use of codepoints with visible glyphs makes the escaped string 
friendlier to display systems, and to people.  I still recommend using 
U+003F as the escape codepoint, but certainly one with a typically 
visible glyph available.  This avoids what I consider to be an annoyance 
with the PEP, that the codepoints used are not ones that are easily 
displayed, so undecodable names could easily result in long strings of 
indistinguishable substitution characters.

It, like MRAB's proposal, also avoids data puns, which is a major 
problem with the PEP.  I consider this proposal to be easier to 
understand than MRAB's proposal, or the PEP, because of the single 
escape codepoint and the use of visible characters.

This proposal, like my initial one, also decodes and encodes (just the 
escape codes) values on the str interfaces.  This is necessary to avoid 
data puns on systems that provide both types of interfaces.

This proposal could be used for programs that use str values, and easily 
migrates to a solution that provides an object that provides an 
abstraction for system interfaces that have two forms.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From martin at v.loewis.de  Tue Apr 28 22:25:07 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 22:25:07 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F74F85.9010800@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
Message-ID: <49F76623.8060903@v.loewis.de>

> The UTF-8b representation suffers from the same potential ambiguities as
> the PUA characters... 

Not at all the same ambiguities. Here, again, the two choices:

A. use PUA characters to represent undecodable bytes, in particular for
   UTF-8 (the PEP actually never proposed this to happen).
   This introduces an ambiguity: two different files in the same
   directory may decode to the same string name, if one has the PUA
   character, and the other has a non-decodable byte that gets decoded
   to the same PUA character.

B. use UTF-8b, representing the byte with ill-formed surrogate codes.
   The same ambiguity does *NOT* exist. If a file on disk already
   contains an invalid surrogate code in its file name, then the UTF-8b
   decoder will recognize this as invalid, and decode it byte-for-byte,
   into three surrogate codes. Hence, the file names that are different
   on disk are also different in memory. No ambiguity.
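
To make case B concrete, here is a small sketch.  It assumes an
interpreter in which the error handler is available under the name
"surrogateescape" (the name it shipped under in Python 3.1 and later);
the two byte strings are made-up file names and the helper names are
hypothetical.

```python
# Two different on-disk names (made up for illustration).
name_with_bad_byte = b"\xff"            # a single undecodable byte
name_with_surrogate = b"\xed\xb3\xbf"   # what a lenient encoder emits for U+DCFF


def utf8b_decode(raw: bytes) -> str:
    # Undecodable bytes 0x80..0xFF become lone surrogates U+DC80..U+DCFF.
    return raw.decode("utf-8", "surrogateescape")


def utf8b_encode(text: str) -> bytes:
    # Lone surrogates U+DC80..U+DCFF turn back into the original bytes.
    return text.encode("utf-8", "surrogateescape")


a = utf8b_decode(name_with_bad_byte)
b = utf8b_decode(name_with_surrogate)

print(ascii(a))   # one surrogate
print(ascii(b))   # three surrogates, one per input byte
print(a != b)     # the names stay distinct in memory
print(utf8b_encode(a) == name_with_bad_byte,
      utf8b_encode(b) == name_with_surrogate)
```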

Regards,
Martin

From v+python at g.nevcal.com  Tue Apr 28 22:34:21 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Tue, 28 Apr 2009 13:34:21 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <cc93256f0904280601i231c78a2l9a7aa1ecd56ad4c0@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>	
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	
	<49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de>	
	<49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de>	
	<49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de>	
	<49F6933B.7020705@g.nevcal.com>
	<cc93256f0904280601i231c78a2l9a7aa1ecd56ad4c0@mail.gmail.com>
Message-ID: <49F7684D.4040904@g.nevcal.com>

On approximately 4/28/2009 6:01 AM, came the following characters from 
the keyboard of Lino Mastrodomenico:
> 2009/4/28 Glenn Linderman <v+python at g.nevcal.com>:
>> The switch from PUA to half-surrogates does not resolve the issues with the
>> encoding not being a 1-to-1 mapping, though.  The very fact that you  think
>> you can get away with use of lone surrogates means that other people might,
>> accidentally or intentionally, also use lone surrogates for some other
>> purpose.  Even in file names.
> 
> It does solve this issue, because (unlike e.g. U+F01FF) '\udcff' is
> not a valid Unicode character (not a character at all, really) and the
> only way you can put this in a POSIX filename is if you use a very
> lenient  UTF-8 encoder that gives you b'\xed\xb3\xbf'.

Wrong.

An 8859-1 locale allows any byte sequence to be placed into a POSIX filename.

And while U+DCFF is illegal alone in Unicode, it is not illegal in 
Python str values.  And from my testing, Python 3's current UTF-8 
encoder will happily provide exactly the bytes value you mention when 
given U+DCFF.

> Since this byte sequence doesn't represent a valid character when
> decoded with UTF-8, it should simply be considered an invalid UTF-8
> sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
> '\udcff').
> 
> Martin: maybe the PEP should say this explicitly?
> 
> Note that the round-trip works without ambiguities between '\udcff' in
> the filename:
> 
> b'\xed\xb3\xbf' -> '\udced\udcb3\udcbf' -> b'\xed\xb3\xbf'
> 
> and b'\xff' in the filename, decoded by Python to '\udcff':
> 
> b'\xff' -> '\udcff' -> b'\xff'

Others have made this suggestion, and it is helpful to the PEP, but not 
sufficient.  As implemented as an error handler, I'm not sure that the 
b'\xed\xb3\xbf' sequence would trigger the error handler, if the UTF-8 
decoder is happy with it.  Which, in my testing, it is.
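
Whether the strict codec accepts that sequence is easy to probe on 
whatever interpreter is at hand.  A sketch follows; the helper names are 
hypothetical, and the answers depend on the Python version (the UTF-8 
codec in later 3.x releases rejects encoded surrogates in both 
directions, so no expected output is shown).

```python
def strict_decoder_accepts(raw: bytes) -> bool:
    """True if the plain UTF-8 decoder takes raw without complaint."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def strict_encoder_accepts(text: str) -> bool:
    """True if the plain UTF-8 encoder takes text without complaint."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# The three-byte form of U+DCFF, and the lone surrogate itself.
print(strict_decoder_accepts(b"\xed\xb3\xbf"))
print(strict_encoder_accepts("\udcff"))
```

If the first probe prints False, the error handler does get called for 
that sequence, which is the condition the round trip depends on.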

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From v+python at g.nevcal.com  Tue Apr 28 22:37:07 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Tue, 28 Apr 2009 13:37:07 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F76623.8060903@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de>
Message-ID: <49F768F3.8080304@g.nevcal.com>

On approximately 4/28/2009 1:25 PM, came the following characters from 
the keyboard of Martin v. Löwis:
>> The UTF-8b representation suffers from the same potential ambiguities as
>> the PUA characters... 
> 
> Not at all the same ambiguities. Here, again, the two choices:
> 
> A. use PUA characters to represent undecodable bytes, in particular for
>    UTF-8 (the PEP actually never proposed this to happen).
>    This introduces an ambiguity: two different files in the same
>    directory may decode to the same string name, if one has the PUA
>    character, and the other has a non-decodable byte that gets decoded
>    to the same PUA character.
> 
> B. use UTF-8b, representing the byte with ill-formed surrogate codes.
>    The same ambiguity does *NOT* exist. If a file on disk already
>    contains an invalid surrogate code in its file name, then the UTF-8b
>    decoder will recognize this as invalid, and decode it byte-for-byte,
>    into three surrogate codes. Hence, the file names that are different
>    on disk are also different in memory. No ambiguity.

C. File on disk with the invalid surrogate code, accessed via the str 
interface, no decoding happens, matches in memory the file on disk with 
the byte that translates to the same surrogate, accessed via the bytes 
interface.  Ambiguity.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From martin at v.loewis.de  Tue Apr 28 23:01:14 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 23:01:14 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F7684D.4040904@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>	
	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	
	<49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de>	
	<49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de>	
	<49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de>	
	<49F6933B.7020705@g.nevcal.com>
	<cc93256f0904280601i231c78a2l9a7aa1ecd56ad4c0@mail.gmail.com>
	<49F7684D.4040904@g.nevcal.com>
Message-ID: <49F76E9A.6090701@v.loewis.de>

> Others have made this suggestion, and it is helpful to the PEP, but not
> sufficient.  As implemented as an error handler, I'm not sure that the
> b'\xed\xb3\xbf' sequence would trigger the error handler, if the UTF-8
> decoder is happy with it.  Which, in my testing, it is.

Rest assured that the utf-8b codec will work the way it is specified.

Regards,
Martin

From google at mrabarnett.plus.com  Tue Apr 28 23:01:44 2009
From: google at mrabarnett.plus.com (MRAB)
Date: Tue, 28 Apr 2009 22:01:44 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F76422.4010806@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>	<49F6799F.5030208@v.loewis.de>	<875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>	<49F6A71A.3020809@v.loewis.de>	<873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>
	<49F7510D.7070603@mrabarnett.plus.com>
	<49F76422.4010806@g.nevcal.com>
Message-ID: <49F76EB8.4030900@mrabarnett.plus.com>

Glenn Linderman wrote:
> On approximately 4/28/2009 11:55 AM, came the following characters from 
> the keyboard of MRAB:
>> I've been thinking of "python-escape" only in terms of UTF-8, the only
>> encoding mentioned in the PEP. In UTF-8, bytes 0x00 to 0x7F are
>> decodable.
> 
> 
> UTF-8 is only mentioned in the sense of having special handling for 
> re-encoding; all the other locales/encodings are implicit.  But I also 
> went down that path to some extent.
> 
> 
>> But if you're talking about using it with other encodings, eg
>> shift-jisx0213, then I'd suggest the following:
>>
>> 1. Bytes 0x00 to 0xFF which can't normally be decoded are decoded to
>> half surrogates U+DC00 to U+DCFF.
> 
> 
> This makes 256 different escape codes.
> 
> 
Speaking personally, I won't call them 'escape codes'. I'd use the term
'escape code' to mean a character that changes the interpretation of the
next character(s).

>> 2. Bytes which would have decoded to half surrogates U+DC00 to U+DCFF
>> are treated as though they are undecodable bytes.
> 
> 
> This provides escaping for the 256 different escape codes, which is 
> lacking from the PEP.
> 
> 
>> 3. Half surrogates U+DC00 to U+DCFF which can be produced by decoding
>> are encoded to bytes 0x00 to 0xFF.
> 
> 
> This reverses the escaping.
> 
> 
>> 4. Codepoints, including half surrogates U+DC00 to U+DCFF, which can't
>> be produced by decoding raise an exception.
> 
> 
> This is confusing.  Did you mean "excluding" instead of "including"?
> 
Perhaps I should've said "Any codepoint which can't be produced by
decoding should raise an exception".

For example, decoding with UTF-8b will never produce U+DC00, therefore
attempting to encode U+DC00 should raise an exception and not produce
0x00.
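
A sketch of that rule, written against the handler under the name it
later shipped with ("surrogateescape", Python 3.1 and later) and with
UTF-8 as the encoding.  Only U+DC80 to U+DCFF can come out of decoding
there, so only those are accepted when encoding.  The helper names are
made up.

```python
def encode_name(text: str) -> bytes:
    return text.encode("utf-8", "surrogateescape")


def is_encodable(text: str) -> bool:
    try:
        encode_name(text)
    except UnicodeEncodeError:
        return False
    return True


# U+DCFF can be produced by decoding byte 0xFF, so it encodes back to it.
print(encode_name("\udcff"))

# U+DC00 and U+DC41 would map to bytes 0x00 and 0x41, which always decode
# normally in UTF-8; U+D800 is outside the escape range altogether.
for cp in (0xDC00, 0xDC41, 0xD800):
    print("U+%04X" % cp, is_encodable(chr(cp)))
```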

> 
>> I think I've covered all the possibilities. :-)
> 
> 
> You might have.  Seems like there could be a simpler scheme, though...
> 
> 1. Define an escape codepoint.  It could be U+003F or U+DC00 or U+F817 
> or pretty much any defined Unicode codepoint outside the range U+0100 to 
> U+01FF (see rule 3 for why).  Only one escape codepoint is needed, this 
> is easier for humans to comprehend.
> 
> 2. When the escape codepoint is decoded from the byte stream for a bytes 
> interface or found in a str on the str interface, double it.
> 
> 3. When an undecodable byte 0xPQ is found, decode to the escape 
> codepoint, followed by codepoint U+01PQ, where P and Q are hex digits.
> 
> 4. When encoding, a sequence of two escape codepoints would be encoded 
> as one escape codepoint, and a sequence of the escape codepoint followed 
> by codepoint U+01PQ would be encoded as byte 0xPQ.  Escape codepoints 
> not followed by the escape codepoint, or by a codepoint in the range 
> U+0100 to U+01FF would raise an exception.
> 
> 5. Provide functions that will perform the same decoding and encoding as 
> would be done by the system calls, for both bytes and str interfaces.
> 
> 
> This differs from my previous proposal in three ways:
> 
> A. Doesn't put a marker at the beginning of the string (which I said 
> wasn't necessary even then).
> 
> B. Allows for a choice of escape codepoint, the previous proposal 
> suggested a specific one.  But the final solution will only have a 
> single one, not a user choice, but an implementation choice.
> 
> C. Uses the range U+0100 to U+01FF for the escape codes, rather than 
> U+0000 to U+00FF.  This avoids introducing the NULL character and escape 
> characters into the decoded str representation, yet still uses 
> characters for which glyphs are commonly available, are non-combining, 
> and are easily distinguishable one from another.
> 
> Rationale:
> 
> The use of codepoints with visible glyphs makes the escaped string 
> friendlier to display systems, and to people.  I still recommend using 
> U+003F as the escape codepoint, but certainly one with a typically 
> visible glyph available.  This avoids what I consider to be an annoyance 
> with the PEP, that the codepoints used are not ones that are easily 
> displayed, so undecodable names could easily result in long strings of 
> indistinguishable substitution characters.
> 
Perhaps the escape character should be U+005C. ;-)

> It, like MRAB's proposal, also avoids data puns, which is a major 
> problem with the PEP.  I consider this proposal to be easier to 
> understand than MRAB's proposal, or the PEP, because of the single 
> escape codepoint and the use of visible characters.
> 
> This proposal, like my initial one, also decodes and encodes (just the 
> escape codes) values on the str interfaces.  This is necessary to avoid 
> data puns on systems that provide both types of interfaces.
> 
> This proposal could be used for programs that use str values, and easily 
> migrates to a solution that provides an object that provides an 
> abstraction for system interfaces that have two forms.
> 

From martin at v.loewis.de  Tue Apr 28 23:02:59 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Apr 2009 23:02:59 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F768F3.8080304@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com>
Message-ID: <49F76F03.8040702@v.loewis.de>

Glenn Linderman wrote:
> On approximately 4/28/2009 1:25 PM, came the following characters from
> the keyboard of Martin v. Löwis:
>>> The UTF-8b representation suffers from the same potential ambiguities as
>>> the PUA characters... 
>>
>> Not at all the same ambiguities. Here, again, the two choices:
>>
>> A. use PUA characters to represent undecodable bytes, in particular for
>>    UTF-8 (the PEP actually never proposed this to happen).
>>    This introduces an ambiguity: two different files in the same
>>    directory may decode to the same string name, if one has the PUA
>>    character, and the other has a non-decodable byte that gets decoded
>>    to the same PUA character.
>>
>> B. use UTF-8b, representing the byte with ill-formed surrogate codes.
>>    The same ambiguity does *NOT* exist. If a file on disk already
>>    contains an invalid surrogate code in its file name, then the UTF-8b
>>    decoder will recognize this as invalid, and decode it byte-for-byte,
>>    into three surrogate codes. Hence, the file names that are different
>>    on disk are also different in memory. No ambiguity.
> 
> C. File on disk with the invalid surrogate code, accessed via the str
> interface, no decoding happens, matches in memory the file on disk with
> the byte that translates to the same surrogate, accessed via the bytes
> interface.  Ambiguity.

Is that an alternative to A and B?

Regards,
Martin

From tmbdev at gmail.com  Wed Apr 29 00:30:42 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Wed, 29 Apr 2009 00:30:42 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <49F7613C.9000901@v.loewis.de>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> 
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> 
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> 
	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> 
	<49F74EE5.6060305@v.loewis.de>
	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> 
	<49F7613C.9000901@v.loewis.de>
Message-ID: <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com>

>
> On Windows, the Wide APIs are already used throughout the code base,
>  e.g. SetEnvironmentVariableW/_wenviron. If you need to find out the
> specific API for a specific functionality, please read the source code.
> [...]
>
> No, I don't assume that. I assume that all functions are strictly
> available in a Wide character version, and have verified that they are.

The wide APIs use UTF-16.  UTF-16 suffers from the same problem as UTF-8:
not all sequences of words are valid UTF-16 sequences.  In particular,
sequences containing isolated surrogates are not well-formed according
to the Unicode standard.  Therefore, the existence of a wide character API
function does not guarantee that the wide character strings it returns can
be converted into valid unicode strings.  And, in fact, Windows Vista
happily creates files with malformed UTF-16 encodings, and os.listdir()
happily returns them.
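
To illustrate what "well-formed" means here, a small sketch that checks a
sequence of 16-bit code units for unpaired surrogates.  The function name
is made up; it is neither a Windows nor a Python API.

```python
from typing import Sequence


def is_well_formed_utf16(units: Sequence[int]) -> bool:
    """Return True if every surrogate in units is part of a proper pair."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        else:
            i += 1
    return True


print(is_well_formed_utf16([0x0041, 0xD83D, 0xDE00]))  # 'A' plus a valid pair
print(is_well_formed_utf16([0x0041, 0xDC00]))          # isolated low surrogate
```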

> If you can crash Python that way,
> nothing gets worse by this PEP - you can then *already* crash Python
> in that way.

Yes, but AFAIK, Python does not currently have functions that, as part of
correct usage and normal operation, are intended to generate malformed
unicode strings.

Under your proposal, passing the output from a correctly implemented file
system or other OS function to a correctly written library using unicode
strings may crash Python.  In order to avoid that, every library that's
built into Python would have to be checked and updated to deal with both the
Unicode standard and your extension to it.

Tom

From solipsis at pitrou.net  Wed Apr 29 00:46:17 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 28 Apr 2009 22:46:17 +0000 (UTC)
Subject: [Python-Dev] PEP 383 (again)
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>
	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>
	<49F74EE5.6060305@v.loewis.de>
	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com>
	<49F7613C.9000901@v.loewis.de>
	<7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com>
Message-ID: <loom.20090428T224019-608@post.gmane.org>

Thomas Breuel <tmbdev <at> gmail.com> writes:
> 
> And, in fact, Windows Vista happily creates files with malformed UTF-16
> encodings, and os.listdir() happily returns them.

The PEP won't change that, so what's the problem exactly?

> Under your proposal, passing the output from a correctly implemented file
> system or other OS function to a correctly written library using unicode
> strings may crash Python.

That's a very dishonest formulation. It cannot crash Python; it can only crash
hypothetical third-party programs or libraries with deficient error checking and
unreasonable assumptions about input data.

(and, of course, you haven't even proven those programs or libraries exist)

Antoine.

From v+python at g.nevcal.com  Wed Apr 29 00:52:22 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Tue, 28 Apr 2009 15:52:22 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F76F03.8040702@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com>
	<49F76F03.8040702@v.loewis.de>
Message-ID: <49F788A6.3040702@g.nevcal.com>

On approximately 4/28/2009 2:02 PM, came the following characters from 
the keyboard of Martin v. Löwis:
> Glenn Linderman wrote:
>> On approximately 4/28/2009 1:25 PM, came the following characters from
>> the keyboard of Martin v. Löwis:
>>>> The UTF-8b representation suffers from the same potential ambiguities as
>>>> the PUA characters... 
>>> Not at all the same ambiguities. Here, again, the two choices:
>>>
>>> A. use PUA characters to represent undecodable bytes, in particular for
>>>    UTF-8 (the PEP actually never proposed this to happen).
>>>    This introduces an ambiguity: two different files in the same
>>>    directory may decode to the same string name, if one has the PUA
>>>    character, and the other has a non-decodable byte that gets decoded
>>>    to the same PUA character.
>>>
>>> B. use UTF-8b, representing the byte with ill-formed surrogate codes.
>>>    The same ambiguity does *NOT* exist. If a file on disk already
>>>    contains an invalid surrogate code in its file name, then the UTF-8b
>>>    decoder will recognize this as invalid, and decode it byte-for-byte,
>>>    into three surrogate codes. Hence, the file names that are different
>>>    on disk are also different in memory. No ambiguity.
>> C. File on disk with the invalid surrogate code, accessed via the str
>> interface, no decoding happens, matches in memory the file on disk with
>> the byte that translates to the same surrogate, accessed via the bytes
>> interface.  Ambiguity.
> 
> Is that an alternative to A and B?

I guess it is an adjunct to case B, the current PEP.

It is what happens when using the PEP on a system that provides both 
bytes and str interfaces, and both get used.

On a Windows system, perhaps the ambiguous case would be the use of the 
str API and bytes APIs producing different memory names for the same 
file that contains a (Unicode-illegal) half surrogate.  The 
half-surrogate would seem to get decoded to 3 half surrogates if 
accessed via the bytes interface, but only one via the str interface. 
The version with 3 half surrogates could match another name that 
actually contains 3 half surrogates, that is accessed via the str interface.

I can't actually tell by reading the PEP whether it affects Windows 
bytes interfaces or is only implemented on POSIX, so that POSIX has a 
str interface.

If it is only implemented on POSIX, then the current scheme (now 
escaping the hundreds of escape codes) could work, within a single 
platform... but it would still suffer from displaying garbage (sequences 
of replacement characters) in file listings displayed or printed.  There 
is no way, once the string is adjusted to contain replacement characters 
for display, to distinguish one file name from another, if they are 
identical except for a same-length sequence of different undecodable bytes.
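
A sketch of that last point.  It assumes the surrogate scheme under the 
name it later shipped with (the "surrogateescape" error handler) and 
assumes the display layer falls back to errors="replace"; the byte 
strings are made-up names and for_display is a hypothetical helper.

```python
# Two names that differ only in one undecodable byte.
name1 = b"report-\xff.txt".decode("utf-8", "surrogateescape")
name2 = b"report-\xfe.txt".decode("utf-8", "surrogateescape")


def for_display(name: str) -> str:
    """Make a name printable by substituting whatever cannot be encoded."""
    return name.encode("utf-8", "replace").decode("utf-8")


print(name1 == name2)                            # distinct in memory
print(for_display(name1), for_display(name2))    # same text once displayed
```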

The concept of a function that allows the same decoding and encoding 
process for 3rd party interfaces is still missing from the PEP; 
implementation of the PEP would require that all interfaces to 3rd party 
software that accept file names would have to be transcoded by the 
interface layer.  Or else such software would have to use the bytes 
interfaces directly, and if they do, there is no need for the PEP.

So I see the PEP as a partial solution to a limited problem, that on the 
one hand potentially produces indistinguishable sequences of replacement 
characters in filenames, rather than the mojibake (which is at least 
distinguishable), and on the other hand, doesn't help software that also 
uses 3rd party libraries to avoid the use of bytes APIs for accessing 
file names.  There are other encodings that produce more distinguishable 
mojibake, and would work in the same situations as the PEP.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From cs at zip.com.au  Wed Apr 29 01:06:55 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Wed, 29 Apr 2009 09:06:55 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F6A7C0.6090105@g.nevcal.com>
Message-ID: <20090428230655.GA23830@cskk.homeip.net>

I think I may be able to resolve Glenn's issues with the scheme lower
down (through careful use of definitions and hand waving).

On 27Apr2009 23:52, Glenn Linderman <v+python at g.nevcal.com> wrote:
> On approximately 4/27/2009 7:11 PM, came the following characters from  
> the keyboard of Cameron Simpson:
[...]
>> There may be puns. So what? Use the right strings for the right purpose
>> and all will be well.
>>
>> I think what is missing here, and missing from Martin's PEP, is some
>> utility functions for the os.* namespace.
>>
>> PROPOSAL: add to the PEP the following functions:
>>
>>   os.fsdecode(bytes) -> funny-encoded Unicode
>>     This is what os.listdir() does to produce the strings it hands out.
>>   os.fsencode(funny-string) -> bytes
>>     This is what open(filename,..) does to turn the filename into bytes
>>     for the POSIX open.
>>   os.pathencode(your-string) -> funny-encoded-Unicode
>>     This is what you must do to a de novo string to turn it into a
>>     string suitable for use by open.
>>     Importantly, for most strings not hand crafted to have weird
>>     sequences in them, it is a no-op. But it will recode your puns
>>     for survival.
[...]
>>> So assume a non-decodable sequence in a name.  That puts us into   
>>> Martin's funny-decode scheme.  His funny-decode scheme produces a 
>>> bare  string, indistinguishable from a bare string that would be 
>>> produced by a  str API that happens to contain that same sequence.  
>>> Data puns.
>>>     
>>
>> See my proposal above. Does it address your concerns? A program still
>> must know the provenance of the string, and _if_ you're working with
>> non-decodable sequences in names then you should transmute them into
>> the funny encoding using the os.pathencode() function described above.
>>
>> In this way the punning issue can be avoided.
>> _Lacking_ such a function, your punning concern is valid.
>
> Seems like one would also desire os.pathdecode to do the reverse.

Yes.

> And  
> also versions that take or produce bytes from funny-encoded strings.

Isn't that the first two functions above?

> Then, if programs were re-coded to perform these transformations on what  
> you call de novo strings, then the scheme would work.
> But I think a large part of the incentive for the PEP is to try to  
> invent a scheme that intentionally allows for the puns, so that programs  
> do not need to be recoded in this manner, and yet still work.  I don't  
> think such a scheme exists.

I agree no such scheme exists. I don't think it can, just using strings.

But _unless_ you have made a de novo handcrafted string with
ill-formed sequences in it, you don't need to bother because you
won't _have_ puns. If Martin's using half surrogates to encode
"undecodable" bytes, then no normal string should conflict because a
normal string will contain _only_ Unicode scalar values. Half surrogate
code points are not such.

The advantage here is that unless you've deliberately constructed an
ill-formed unicode string, you _do_not_ need to recode into
funny-encoding, because you are already compatible. Somewhat like one
doesn't need to recode ASCII into UTF-8, because ASCII is unchanged.

> If there is going to be a required transformation from de novo strings  
> to funny-encoded strings, then why not make one that people can actually  
> see and compare and decode from the displayable form, by using  
> displayable characters instead of lone surrogates?

Because that would _not_ be a no-op for well formed Unicode strings.

That reason is sufficient for me.

I consider the fact that well-formed Unicode -> funny-encoded is a no-op
to be an enormous feature of Martin's scheme.

Unless I'm missing something, there _are_no_puns_ between funny-encoded
strings and well formed unicode strings.
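
A minimal sketch of the no-op property, assuming a Python 3 in which the
PEP's error handler is available under the name "surrogateescape" (an
assumption on my part; this thread calls it python-escape) and assuming
a UTF-8 locale:

```python
# Assumption: the PEP 383 handler is registered as "surrogateescape".
well_formed = "r\u00e9sum\u00e9.txt"

# For a well-formed string the funny-encoder agrees with the strict codec,
# so no recoding step is needed before handing it to an os.* interface.
strict_bytes = well_formed.encode("utf-8")
funny_bytes = well_formed.encode("utf-8", "surrogateescape")

# A byte that does not decode becomes a lone surrogate, which is not a
# Unicode scalar value and so cannot collide with a well-formed string.
odd_name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")
round_trip = odd_name.encode("utf-8", "surrogateescape")
```

Here strict_bytes and funny_bytes are identical, while odd_name carries
U+DCE9 in place of the stray byte and converts back to the original
bytes.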

>>>> I suppose if your program carefully constructs a unicode string riddled
>>>> with half-surrogates etc and imagines something specific should happen
>>>> to them on the way to being POSIX bytes then you might have a problem...
>>>>       
>>> Right.  Or someone else's program does that.

I've just spent a cosy 20 minutes with my copy of Unicode 5.0 and a
coffee, reading section 3.9 (Unicode Encoding Forms).

I now do not believe your scenario makes sense.

Someone can construct a Python3 string containing code points that
include surrogates. Granted.

However such a string is not meaningful because it is not well-formed
(D85).  It's ill-formed (D84). It is not sane to expect it to
translate into a POSIX byte sequence, be it UTF-8 or anything else,
unless it is accompanied by some kind of explicit mapping provided by
the programmer.  Absent that mapping, it's nonsense in much the same
way that a non-decodable UTF-8 byte sequence is nonsense.

For example, Martin's funny-encoding is such an explicit mapping.
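
To make that concrete, here is a small sketch, again assuming the PEP's
handler is spelled "surrogateescape" and that it maps only U+DC80 to
U+DCFF back to bytes (both are my assumptions, not statements from the
PEP text quoted here):

```python
def try_encode(text, errors="strict"):
    """Return the UTF-8 bytes for text, or None if encoding is refused."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError:
        return None

ill_formed = "name\udcff"        # low surrogate in the funny-encoding range
other_surrogate = "name\ud802"   # a surrogate the mapping says nothing about

strict_result = try_encode(ill_formed)                    # no mapping supplied
mapped_result = try_encode(ill_formed, "surrogateescape")
unmapped_result = try_encode(other_surrogate, "surrogateescape")
```

Without a mapping the ill-formed string has no byte form; with the
explicit mapping U+DCFF becomes the byte 0xFF; a surrogate outside the
mapping is still refused.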

>>>I only want to use 
>>> Unicode  file names.  But if those other file names exist, I want to 
>>> be able to  access them, and not accidentally get a different file.

But those other names _don't_ exist.

>>>> Also, by avoiding reuse of legitimate characters in the encoding we can
>>>> avoid your issue with losing track of where a string came from;
>>>> legitimate characters are currently untouched by Martin's scheme, except
>>>> for the normal "bytes<->string via the user's locale" translation that
>>>> must already happen, and there you're aided by bytes and strings being
>>>> different types.
>>>>       
>>> There are abnormal characters, but there are no illegal characters.   
>>
>> I thought half-surrogates were illegal in well formed Unicode. I confess
>> to being weak in this area. By "legitimate" above I meant things like
>> half-surrogates which, like quarks, should not occur alone?
>
> "Illegal" just means violating the accepted rules.

I think that either we've lost track of what each other is saying,
or you're wrong here. And my poor terminology hasn't been helping.

What we've got:

  (1) Byte sequence file names in the POSIX file system.
      It doesn't matter whether the underlying storage is a real POSIX
      filesystem or a mostly POSIX one like MacOSX HFS or a remotely
      attached non-POSIX filesystem like a Windows one, because we're
      talking through the POSIX API, and it is handing us byte
      sequences, which we expect may contain anything except a NUL.

  (2) Under Martin's scheme, os.listdir() et al hand us (and accept)
      funny-encoded Python3 strings, which are strings of Unicode code
      units (D77).
      Particularly, if there were bytes in the POSIX byte string that
      did not decode into Unicode scalar values (D76) then each such
      byte is encoded as a surrogate (D71,72,73,74).

      It is important to note here that because surrogates are _not_
      Unicode scalar values, there is no punning between the two sets
      of values.

  (3) Other Python3 strings that have not been through Martin's mangler
      in either direction. Ordinary strings.

Your concern is that, handed a string, a programmer could misuse (3) as
(2) or vice versa because of punning.

In a well-formed unicode string there are no surrogates; surrogates only
occur in UTF-16 _encodings_ of Unicode strings (D75).

Therefore, it _is_ possible to inspect a string, if one cared, to see if
it is funny-encoded or "raw". One may get two different answers:

  - If there are surrogate code units then it must be funny-encoded
    and will therefore work perfectly if handed to a os.* interface.

  - If there are no surrogate code units then it may be funny encoded or it
    may not have been through Martin's funny-encoder, you can't tell.
    However, this doesn't matter because the encoder is a no-op for such
    strings.
    Therefore it will work perfectly if handed to an os.* interface.
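
The inspection described above can be sketched in a few lines. The
helper name has_surrogates is hypothetical, and the decode call assumes
the PEP's handler is available as "surrogateescape":

```python
def has_surrogates(text):
    """Return True if text contains any surrogate code point (U+D800..U+DFFF)."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)

plain = "notes.txt"
# Assumption: "surrogateescape" is the PEP 383 handler.
funny = b"notes-\xff.txt".decode("utf-8", "surrogateescape")

plain_flag = has_surrogates(plain)   # well-formed: encoder is a no-op
funny_flag = has_surrogates(funny)   # surrogate present: must be funny-encoded
```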

The only gap in this is a specially crafted string containing surrogate
code points that did not come via Martin's encoder. But such a string
cannot come from a user interface, which will accept only characters
and therefore only include unicode scalar values.

Such a string can only be explicitly constructed (eg with a \uD802
code point). And if something constructs such a string, it must have in
mind an explicit interpretation of those code points, which means it is
the _constructor_ on whom the burden of translation lies.

Does this make sense to you, or have you a counter example in mind?

> In this case, the  
> accepted rules are those enforced by the file system (at the bytes or  
> str API levels), and by Python (for the str manipulations).  None of  
> those rules outlaw lone surrogates.  Hence, while all of the systems  
> under discussion can handle all Unicode characters in one way or  
> another, none of them require that all Unicode rules are followed.  Yes,  
> you are correct that lone surrogates are illegal in Unicode.  No, none  
> of the accepted rules for these systems require Unicode.

However, Martin's scheme explicitly translates these ill-formed
sequences into Python3 strings and back, losslessly. You can have
surrogates in the filesystem storage/API on Windows. You can have
non-UTF-8-decodable sequences in the POSIX filesystem layer too.
They're all taken in and handled.

In Python3 space, one might have a bytes object with a raw POSIX
byte filename in it. Presumably one can also have a byte string with a
raw (UTF-16) Windows filename in it. They're not strings, so no
confusion.

But there's no _string_ for these things without a matching
string<->bytestring mapping associated with it.

If you have a Python3 string which is well-formed Unicode, then you can
hand it to the os.* interfaces and the Right Thing will happen (on
Windows just because it stores Unicode and on POSIX provided you agree
that your locale/getfilesystemencoding() is the right thing).

If you have a string that isn't well-formed, then the meaning of any
code points which are not Unicode scalar values is not well defined
without some auxiliary stuff in the app.

>>> NTFS permits any 16-bit "character" code, including abnormal ones,   
>>> including half-surrogates, and including full surrogate sequences 
>>> that  decode to PUA characters.  POSIX permits all byte sequences, 
>>> including  things that look like UTF-8, things that don't look like 
>>> UTF-8, things  that look like half-surrogates, and things that look 
>>> like full surrogate  sequences that decode to PUA characters.

See above. I think this is addressed.

[...]
>> These are existing file objects, I'll take them as source 1. They get
>> encoded for release by os.listdir() et al.
>>   
>>> And yes, strings can be  generated from scratch.
>>
>> I take this to be source 2.
>
> One variation of source 2 is reading output from other programs, such as  
> ls (POSIX) or dir (Windows).

Sure. But that is reading byte sequences, and one must again know the
encoding. If that is known and the input decoded happily into Unicode
scalar values, then there is no issue. If the input didn't decode, then
one must make some decision about what the non-decodable bits mean.

>> I think I agree with all the discussion that followed, and think the
>> real problem is lack of utility functions to funny-encode source 2
>> strings for use. Hence the proposal above.
>
> I think we understand each other now.  I think your proposal could work,  
> Cameron, although when recoding applications to use your proposal, I'd  
> find it easier to use the "file name object" that others have proposed.   
> I think that because either your proposal or the object proposals  
> require recoding the application, that they will not be accepted.  I  
> think that because the PEP 383 allows data puns, that it should not be  
> accepted in its present form.

I'm of the opinion now that the puns can only occur when the source 2
string has surrogates, and either those surrogates are chosen to match
the funny-encoding, in which case the pun is not a pun, or the
surrogates are chosen according to a different scheme in which case
source 2 is obliged to provide a mapping.

A source 2 string of only Unicode scalar values doesn't need remapping.

> I think if your proposal is accepted, that it then becomes possible  
> to use an encoding that uses visible characters, which makes it easier  
> for people to understand and verify.  An encoding such as the one I  
> suggested, but perhaps using a more obscure character, if there is one,  
> but yet doesn't violate true Unicode.

I think any scheme that uses any Unicode scalar value as an escape
character _inherently_ introduces puns, and puns that are easier to
encounter.

I think the real strength of Martin's scheme is exactly that bytes strings
that needed the funny-encoding _do_ produce ill-formed Unicode strings,
because such strings _cannot_ conflict with well-formed strings.

I think your desire for a human readable encoding is valid, but it should
be a further purely "presentation" step, somewhat like quoted-printable
encoding in MIME, and not the scheme used by Martin.

> I think it should transform all  
> data, from str and bytes interfaces, and produce only str values  
> containing conforming Unicode, escaping all the non-conforming sequences  
> in some manner.  This would make the strings truly readable, as long as  
> fonts for all the characters are available.

But I think it would just move the punning. A human readable string with
readable escapes in it may be funny-encoded. _Or_ it may be "raw", with
funny-encoding yet to happen; after all one might weirdly be dealing
with a filename which contained post-funny-encode visible sequences in
it.

So you're right back to _guessing_ what you're looking at.

With the surrogate scheme you only have to guess if there are surrogates,
but then you _know_ that you're dealing with a special encoding scheme;
it is certain - the guess is about which scheme.

If you're working in a domain with no ill-formed strings you never need
to worry at all.

With a visible/printable-encoding such as you advocate the guess is about
whether the scheme has even been used, which is why I think it is worse.

> And I had already suggested  
> the utility functions you are suggesting, actually, in my first tirade  
> against PEP 383 (search for "The encode and decode functions should be  
> available for coders to use, that code to external
> interfaces, either OS or 3rd party packages, that do not use this  
> encoding scheme").

I must have missed that sentence. But it sounds like we want the same
facilities at least.

> The solution that was proposed in the lead up to releasing Python 3.0  
> was to offer both bytes and str interfaces (so we have those), and then  
> for those that want to have a single portable implementation that can  
> access all data, an object that encapsulates the differences, and the  
> variant system APIs.  (file system is one, command line is another,  
> environment is another, I'm not sure if there are more.)  I haven't  
> heard if any progress on such an encapsulating object has been made; the  
> people that proposed such have been rather quiet about this PEP.  I  
> would expect that an object implementation would provide display  
> strings, and APIs to submit de novo str and bytes values to an object,  
> which would run the appropriate encoding on them.

I think covering these other cases is quite messy, if only because
there's not even agreement amongst existing command line apps about all
that stuff.

Regarding "APIs to submit de novo str and bytes values to an object,  
which would run the appropriate encoding on them" I think such a
facility for de novo strings must require the caller to provide a
handler/mapper for the not-well-formed parts of such strings if they
occur.

> Programs that want to use str interfaces on POSIX will see a subset of  
> files on systems that contain files whose bytes filenames are not  
> decodable.

Not under Martin's scheme, because all bytes filenames _are_ decoded.

> If a sysadmin wants to standardize on UTF-8 names  
> universally, they can use something like convmv to clean up existing  
> file names that don't conform.  Programs that use str interfaces on  
> POSIX system will work fine, but with a subset of the files.  When that  
> is unacceptable, they can either be recoded to use the bytes interfaces,  
> or the hopefully forthcoming object encapsulation.  The issue then will  
> be what technique will be used to transform bytes into display names,  
> but since the display names would never be fed back to the objects  
> directly (but the object would have an interface to accept de novo str  
> and de novo bytes) then it is just a display issue, and one that uses  
> visible characters would seem more useful in my mind, than one that uses  
> half-surrogates or PUAs.

I agree it might be handy to have a display function, but isn't repr()
exactly that, now I think of it?

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

"waste cycles drawing trendy 3D junk"   - Mac Eudora v3 config option

From v+python at g.nevcal.com  Wed Apr 29 01:02:13 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Tue, 28 Apr 2009 16:02:13 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F76EB8.4030900@mrabarnett.plus.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>	<49F6799F.5030208@v.loewis.de>	<875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>	<49F6A71A.3020809@v.loewis.de>	<873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>	<49F7510D.7070603@mrabarnett.plus.com>	<49F76422.4010806@g.nevcal.com>
	<49F76EB8.4030900@mrabarnett.plus.com>
Message-ID: <49F78AF5.3080406@g.nevcal.com>

On approximately 4/28/2009 2:01 PM, came the following characters from 
the keyboard of MRAB:
> Glenn Linderman wrote:
>> On approximately 4/28/2009 11:55 AM, came the following characters 
>> from the keyboard of MRAB:
>>> I've been thinking of "python-escape" only in terms of UTF-8, the only
>>> encoding mentioned in the PEP. In UTF-8, bytes 0x00 to 0x7F are
>>> decodable.
>>
>>
>> UTF-8 is only mentioned in the sense of having special handling for 
>> re-encoding; all the other locales/encodings are implicit.  But I also 
>> went down that path to some extent.
>>
>>
>>> But if you're talking about using it with other encodings, eg
>>> shift-jisx0213, then I'd suggest the following:
>>>
>>> 1. Bytes 0x00 to 0xFF which can't normally be decoded are decoded to
>>> half surrogates U+DC00 to U+DCFF.
>>
>>
>> This makes 256 different escape codes.
>>
>>
> Speaking personally, I won't call them 'escape codes'. I'd use the term
> 'escape code' to mean a character that changes the interpretation of the
> next character(s).

OK, I won't be offended if you don't call them 'escape codes'. :)  But 
what else to call them?

My use of that term is a bit backwards, perhaps... what happens is that 
because these 256 half surrogates are used to decode otherwise 
undecodable bytes, they themselves must be "escaped" or translated into 
something different, when they appear in the byte sequence.  The process 
described reserves a set of codepoints for use, and requires that that 
same set of codepoints be translated using a similar mechanism to avoid 
their untranslated appearance in the resulting str.  Escape codes have 
the same sort of characteristic... by replacing their normal use for 
some other use, they must themselves have a replacement.

Anyway, I think we are communicating successfully.

>>> 2. Bytes which would have decoded to half surrogates U+DC00 to U+DCFF
>>> are treated as though they are undecodable bytes.
>>
>>
>> This provides escaping for the 256 different escape codes, which is 
>> lacking from the PEP.
>>
>>
>>> 3. Half surrogates U+DC00 to U+DCFF which can be produced by decoding
>>> are encoded to bytes 0x00 to 0xFF.
>>
>>
>> This reverses the escaping.
>>
>>
>>> 4. Codepoints, including half surrogates U+DC00 to U+DCFF, which can't
>>> be produced by decoding raise an exception.
>>
>>
>> This is confusing.  Did you mean "excluding" instead of "including"?
>>
> Perhaps I should've said "Any codepoint which can't be produced by
> decoding should raise an exception".

Yes, your rephrasing is clearer, regarding your intention.

> For example, decoding with UTF-8b will never produce U+DC00, therefore
> attempting to encode U+DC00 should raise an exception and not produce
> 0x00.

Decoding with UTF-8b might never produce U+DC00, but then again, it 
won't handle the random byte string, either.
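
MRAB's example can be checked directly. This is only a sketch, and it
assumes that Python's "surrogateescape" handler behaves like the UTF-8b
scheme under discussion (my assumption; the name does not appear in
this thread):

```python
# Decode every byte that is undecodable on its own in UTF-8 (0x80..0xFF).
decoded = [bytes([b]).decode("utf-8", "surrogateescape")
           for b in range(0x80, 0x100)]
produced = {ord(ch) for ch in decoded}

# U+DC00 is never produced by decoding, so encoding it should be refused
# rather than silently yielding the byte 0x00.
try:
    "\udc00".encode("utf-8", "surrogateescape")
    dc00_encodes = True
except UnicodeEncodeError:
    dc00_encodes = False
```

Under that assumption the produced set is exactly U+DC80 to U+DCFF, and
dc00_encodes ends up False.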

>>> I think I've covered all the possibilities. :-)
>>
>>
>> You might have.  Seems like there could be a simpler scheme, though...
>>
>> 1. Define an escape codepoint.  It could be U+003F or U+DC00 or U+F817 
>> or pretty much any defined Unicode codepoint outside the range U+0100 
>> to U+01FF (see rule 3 for why).  Only one escape codepoint is needed, 
>> this is easier for humans to comprehend.
>>
>> 2. When the escape codepoint is decoded from the byte stream for a 
>> bytes interface or found in a str on the str interface, double it.
>>
>> 3. When an undecodable byte 0xPQ is found, decode to the escape 
>> codepoint, followed by codepoint U+01PQ, where P and Q are hex digits.
>>
>> 4. When encoding, a sequence of two escape codepoints would be encoded 
>> as one escape codepoint, and a sequence of the escape codepoint 
>> followed by codepoint U+01PQ would be encoded as byte 0xPQ.  Escape 
>> codepoints not followed by the escape codepoint, or by a codepoint in 
>> the range U+0100 to U+01FF would raise an exception.
>>
>> 5. Provide functions that will perform the same decoding and encoding 
>> as would be done by the system calls, for both bytes and str interfaces.
>>
>>
>> This differs from my previous proposal in three ways:
>>
>> A. Doesn't put a marker at the beginning of the string (which I said 
>> wasn't necessary even then).
>>
>> B. Allows for a choice of escape codepoint, the previous proposal 
>> suggested a specific one.  But the final solution will only have a 
>> single one, not a user choice, but an implementation choice.
>>
>> C. Uses the range U+0100 to U+01FF for the escape codes, rather than 
>> U+0000 to U+00FF.  This avoids introducing the NULL character and 
>> escape characters into the decoded str representation, yet still uses 
>> characters for which glyphs are commonly available, are non-combining, 
>> and are easily distinguishable one from another.
>>
>> Rationale:
>>
>> The use of codepoints with visible glyphs makes the escaped string 
>> friendlier to display systems, and to people.  I still recommend using 
>> U+003F as the escape codepoint, but certainly one with a typically 
>> visible glyph available.  This avoids what I consider to be an 
>> annoyance with the PEP, that the codepoints used are not ones that are 
>> easily displayed, so undecodable names could easily result in long 
>> strings of indistinguishable substitution characters.
>>
> Perhaps the escape character should be U+005C. ;-)

Windows users everywhere would love you for that one :)

>> It, like MRAB's proposal, also avoids data puns, which is a major 
>> problem with the PEP.  I consider this proposal to be easier to 
>> understand than MRAB's proposal, or the PEP, because of the single 
>> escape codepoint and the use of visible characters.
>>
>> This proposal, like my initial one, also decodes and encodes (just the 
>> escape codes) values on the str interfaces.  This is necessary to 
>> avoid data puns on systems that provide both types of interfaces.
>>
>> This proposal could be used for programs that use str values, and 
>> easily migrates to a solution that provides an object that provides an 
>> abstraction for system interfaces that have two forms.
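
Rules 1 through 4 of the scheme quoted above can be sketched roughly as
follows.  This is an illustration only: the function names are
hypothetical, U+003F is used as the escape codepoint, UTF-8 is assumed
as the locale encoding, and Python's "surrogateescape" handler is
borrowed purely as a convenient way to locate the undecodable bytes.

```python
ESC = "?"  # rule 1: a single escape codepoint (U+003F here)

def escape_decode(raw, encoding="utf-8"):
    """Decode bytes, writing each undecodable byte 0xPQ as ESC + U+01PQ."""
    out = []
    # Assumption: surrogateescape marks undecodable bytes as U+DC80..U+DCFF.
    for ch in raw.decode(encoding, "surrogateescape"):
        cp = ord(ch)
        if ch == ESC:
            out.append(ESC + ESC)                          # rule 2
        elif 0xDC80 <= cp <= 0xDCFF:
            out.append(ESC + chr(0x0100 + cp - 0xDC00))    # rule 3
        else:
            out.append(ch)
    return "".join(out)

def escape_encode(text, encoding="utf-8"):
    """Reverse escape_decode; raise ValueError on a malformed escape (rule 4)."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != ESC:
            out += ch.encode(encoding)
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("escape codepoint at end of string")
        nxt = text[i + 1]
        if nxt == ESC:
            out += ESC.encode(encoding)       # doubled escape -> one escape
        elif 0x0100 <= ord(nxt) <= 0x01FF:
            out.append(ord(nxt) - 0x0100)     # ESC + U+01PQ -> byte 0xPQ
        else:
            raise ValueError("escape codepoint not followed by a valid code")
        i += 2
    return bytes(out)

shown = escape_decode(b"caf\xe9?.txt")
restored = escape_encode(shown)
```

With these definitions the stray byte 0xE9 shows up as "?" followed by
U+01E9, the literal "?" is doubled, and restored equals the original
bytes.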

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From a.badger at gmail.com  Wed Apr 29 04:09:42 2009
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Tue, 28 Apr 2009 19:09:42 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <F1A4F1B6-94BC-4588-A328-0737288BC58B@zooko.com>
References: <20090427211447.GA4291@cskk.homeip.net>	<49F658A5.7080807@g.nevcal.com>	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>	<loom.20090428T114723-520@post.gmane.org>	<15546941.1861678.1240922288709.JavaMail.xicrypt@atgrzls001>	<49F6FA93.7080302@avl.com>
	<F1A4F1B6-94BC-4588-A328-0737288BC58B@zooko.com>
Message-ID: <49F7B6E6.20808@gmail.com>

Zooko O'Whielacronx wrote:
> On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote:
>> If you switch to iso8859-15 only in the presence of undecodable UTF-8,
>> then you have the same round-trip problem as the PEP: both b'\xff' and
>> b'\xc3\xbf' will be converted to u'\u00ff' without a way to
>> unambiguously recover the original file name.
> 
> Why do you say that?  It seems to work as I expected here:
> 
>>>> '\xff'.decode('iso-8859-15')
> u'\xff'
>>>> '\xc3\xbf'.decode('iso-8859-15')
> u'\xc3\xbf'
>>>>
>>>>
>>>>
>>>> '\xff'.decode('cp1252')
> u'\xff'
>>>> '\xc3\xbf'.decode('cp1252')
> u'\xc3\xbf'
> 

You're not showing that this is a fallback path.  What won't work is
first trying a local encoding (in the following example, utf-8) and then
if that doesn't work, trying a one-byte encoding like iso8859-15:

try:
    file1 = '\xff'.decode('utf-8')
except UnicodeDecodeError:
    file1 = '\xff'.decode('iso-8859-15')
print repr(file1)

try:
    file2 = '\xc3\xbf'.decode('utf-8')
except UnicodeDecodeError:
    file2 = '\xc3\xbf'.decode('iso-8859-15')
print repr(file2)

That prints:
  u'\xff'
  u'\xff'

The two encodings can map different bytes to the same unicode code point
so you can't do this type of thing without recording what encoding was
used in the translation.
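
The same fallback can be written for Python 3, where the inputs are
bytes objects.  The helper name decode_with_fallback is hypothetical,
and the last two lines assume the PEP's handler is available as
"surrogateescape", shown only for contrast:

```python
def decode_with_fallback(raw):
    """Try UTF-8 first, then fall back to a one-byte encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-15")

file1 = decode_with_fallback(b"\xff")       # not UTF-8, so fallback is used
file2 = decode_with_fallback(b"\xc3\xbf")   # valid UTF-8 for U+00FF

# Assumption: with the PEP's handler the two names stay distinct.
pep1 = b"\xff".decode("utf-8", "surrogateescape")
pep2 = b"\xc3\xbf".decode("utf-8", "surrogateescape")
```

file1 and file2 come out equal, which is the collision described above;
pep1 and pep2 do not.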

-Toshio

From cs at zip.com.au  Wed Apr 29 04:33:53 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Wed, 29 Apr 2009 12:33:53 +1000
Subject: [Python-Dev] Python-Dev PEP 383: Non-decodable Bytes in
	System Character Interfaces
In-Reply-To: <loom.20090428T114723-520@post.gmane.org>
Message-ID: <20090429023353.GA11210@cskk.homeip.net>

On 28Apr2009 11:49, Antoine Pitrou <solipsis at pitrou.net> wrote:
| Paul Moore <p.f.moore <at> gmail.com> writes:
| > 
| > I've yet to hear anyone claim that they would have an actual problem
| > with a specific piece of code they have written.
| 
| Yep, that's the problem. Lots of theoretical problems no one has ever encountered
| brought up against a PEP which resolves some actual problems people encounter on
| a regular basis.
| 
| For the record, I'm +1 on the PEP being accepted and implemented as soon as
| possible (preferably before 3.1).

I am also +1 on this.

I would like utility functions to perform:
  os-bytes->funny-encoded
  funny-encoded->os-bytes
or explicit example code snippets for same in the PEP text.
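
As a rough idea of what such snippets might look like: the two function
names below are hypothetical, and the handler name "surrogateescape" is
an assumption about how the PEP's handler is spelled.  Later Python 3
releases provide os.fsdecode() and os.fsencode() for this purpose.

```python
import sys

# The encoding the interpreter uses for file names on this system.
FS_ENCODING = sys.getfilesystemencoding()

def os_bytes_to_funny(raw, encoding=FS_ENCODING):
    """os-bytes -> funny-encoded str (what os.listdir() would hand out)."""
    return raw.decode(encoding, "surrogateescape")

def funny_to_os_bytes(name, encoding=FS_ENCODING):
    """funny-encoded str -> os-bytes (what open() would pass to POSIX)."""
    return name.encode(encoding, "surrogateescape")

raw_name = b"report-\xff\xfe.dat"
as_str = os_bytes_to_funny(raw_name)
back = funny_to_os_bytes(as_str)
```

The exact contents of as_str depend on the filesystem encoding, but
back should equal raw_name whatever that encoding is.
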
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

This person is currently undergoing electric shock therapy at Agnews
Developmental Center in San Jose, California. All his opinions are static,
please ignore him.  Thank you,  Nurse Ratched
- the sig quote of Bob "Another beer, please" Christ <bhatch at netcom.com>

From rdmurray at bitdance.com  Wed Apr 29 04:40:04 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Tue, 28 Apr 2009 22:40:04 -0400 (EDT)
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F768F3.8080304@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> 
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> 
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T154536-926@post.gmane.org>
	<p04330106c61b9f2aad0a@192.168.123.162>
	<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>
	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com>
Message-ID: <Pine.LNX.4.64.0904282238280.1740@kimball.webabinitio.net>

On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote:
> C. File on disk with the invalid surrogate code, accessed via the str 
> interface, no decoding happens, matches in memory the file on disk with the 
> byte that translates to the same surrogate, accessed via the bytes interface. 
> Ambiguity.

Unless I'm missing something, one of these is type str, and the other is 
type bytes, so no ambiguity.

--David

From cs at zip.com.au  Wed Apr 29 04:40:26 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Wed, 29 Apr 2009 12:40:26 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <7e51d15d0904280537n22168cfl16c58f727be1755e@mail.gmail.com>
Message-ID: <20090429024026.GA15177@cskk.homeip.net>

On 28Apr2009 14:37, Thomas Breuel <tmbdev at gmail.com> wrote:
| But the biggest problem with the proposal is that it isn't needed: if you
| want to be able to turn arbitrary byte sequences into unicode strings and
| back, just set your encoding to iso8859-15.  That already works and it
| doesn't require any changes.

No it doesn't. It does transcode without throwing exceptions. On POSIX.
(On Windows? I doubt it - Windows isn't using an 8-bit scheme, I
believe.) But it utterly destroys any hope of working in any other locale
nicely. The PEP lets you work losslessly in other locales.

It _may_ require some app care for particular very weird strings
that don't come from the filesystem, but as far as I can see only in
circumstances where such care would be needed anyway i.e. you've got to
do special stuff for weirdness in the first place. Weird == "ill-formed
unicode string" here.
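
A short sketch of the difference, assuming a UTF-8 locale and assuming
the PEP's handler is available as "surrogateescape":

```python
raw = "\u00e9.txt".encode("utf-8")   # a file name written in a UTF-8 locale

# iso8859-15 never raises and round-trips, but the text is wrong.
as_latin = raw.decode("iso8859-15")
latin_back = as_latin.encode("iso8859-15")

# The locale encoding plus the PEP's handler round-trips and reads correctly.
as_funny = raw.decode("utf-8", "surrogateescape")
funny_back = as_funny.encode("utf-8", "surrogateescape")
```

Both approaches recover the original bytes, but as_latin holds two
unrelated characters where as_funny holds the single intended one.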

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

I just kept it wide-open thinking it would correct itself.
Then I ran out of talent.       - C. Fittipaldi

From a.badger at gmail.com  Wed Apr 29 04:39:20 2009
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Tue, 28 Apr 2009 19:39:20 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F73635.6010105@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de>
Message-ID: <49F7BDD8.3010202@gmail.com>

Martin v. Löwis wrote:
>> Since the serialization of the Unicode string is likely to use UTF-8,
>> and the string for  such a file will include half surrogates, the
>> application may raise an exception when encoding the names for a
>> configuration file. These encoding exceptions will be as rare as the
>> unusual names (which the careful I18N aware developer has probably
>> eradicated from his system), and thus will appear late.
> 
> There are trade-offs to any solution; if there was a solution without
> trade-offs, it would be implemented already.
> 
> The Python UTF-8 codec will happily encode half-surrogates; people argue
> that it is a bug that it does so, however, it would help in this
> specific case.

Can we use this encoding scheme for writing into files as well?  We've
turned the filename with undecodable bytes into a string with half
surrogates.  Putting that string into a file has to turn them into bytes
at some level.  Can we use the python-escape error handler to achieve
that somehow?
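
One way this might look, as a sketch only: it assumes the handler is
spelled "surrogateescape" and that open() accepts it through its errors
argument.  Note that the resulting file is then not valid UTF-8, since
the raw byte is written back out as-is.

```python
import os
import tempfile

# A file name containing one undecodable byte, as the PEP would present it.
name = b"bad-\xff-name".decode("utf-8", "surrogateescape")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names.cfg")   # hypothetical config file
    # Write the name using the same handler for the file's own encoding.
    with open(path, "w", encoding="utf-8",
              errors="surrogateescape", newline="\n") as fh:
        fh.write(name + "\n")
    with open(path, "rb") as fh:
        on_disk = fh.read()
    # Reading with the same handler recovers the funny-encoded string.
    with open(path, encoding="utf-8",
              errors="surrogateescape", newline="\n") as fh:
        read_back = fh.read().rstrip("\n")
```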

-Toshio

From larry at hastings.org  Wed Apr 29 05:03:24 2009
From: larry at hastings.org (Larry Hastings)
Date: Tue, 28 Apr 2009 20:03:24 -0700
Subject: [Python-Dev] Proposed: a new function-based C API for declaring
	Python types
Message-ID: <49F7C37C.5090305@hastings.org>

EXECUTIVE SUMMARY

I've written a patch against py3k trunk creating a new function-based
API for creating extension types in C.  This allows PyTypeObject to
become a (mostly) private structure.

THE PROBLEM

Here's how you create an extension type using the current API.

  * First, find some code that already has a working type declaration.
    Copy and paste their fifty-line PyTypeObject declaration, then
    hack it up until it looks like what you need.

  * Next--hey!  There *is* no next, you're done.  You can immediately
    create an object using your type and pass it into the Python
    interpreter and it would work fine.  You are encouraged to call
    PyType_Ready(), but this isn't required and it's often skipped.

This approach causes two problems.

  1) The Python interpreter *must support* and *cannot change*
     the PyTypeObject structure, forever.  Any meaningful change to
     the structure will break every extension.   This has many
     consequences:
       a) Fields that are no longer used must be left in place,
          forever, as ignored placeholders if need be.  Py3k cleaned
          up a lot of these, but it's already picked up a new one
          ("tp_compare" is now "tp_reserved").
       b) Internal implementation details of the type system must
          be public.
       c) The interpreter can't even use a different structure
          internally, because extensions are free to pass in objects
          using PyTypeObjects the interpreter has never seen before.

  2) As a programming interface this lacks a certain gentility.  It
     clearly *works*, but it requires programmers to copy and paste
     with a large structure mostly containing NULLs, which they must
     pick carefully through to change just a few fields.

THE SOLUTION

My patch creates a new function-based extension type definition API.
You create a type by calling PyType_New(), then call various accessor
functions on the type (PyType_SetString and the like), and when your
type has been completely populated you must call PyType_Activate()
to enable it for use.

With this API available, extension authors no longer need to directly
see the innards of the PyTypeObject structure.  Well, most of the
fields anyway.  There are a few shortcut macros in CPython that need
to continue working for performance reasons, so the "tp_flags" and
"tp_dealloc" fields need to remain publicly visible.

One feature worth mentioning is that the API is type-safe.  Many such
APIs would have had one generic "PyType_SetPointer", taking an
identifier for the field and a void * for its value, but this would
have lost type safety.  Another approach would have been to have one
accessor per field ("PyType_SetAddFunction"), but this would have
exploded the number of functions in the API.  My API splits the
difference: each distinct *type* has its own set of accessors
("PyType_GetSSizeT") which takes an identifier specifying which
field you wish to get or set.

SIDE-EFFECTS OF THE API

The major change resulting from this API: all PyTypeObjects must now
be *pointers* rather than static instances.  For example, the external
declaration of PyType_Type itself changes from this:
    PyAPI_DATA(PyTypeObject) PyType_Type;
to this:
    PyAPI_DATA(PyTypeObject *) PyType_Type;

This gives rise to the first headache caused by the API: type casts
on type objects.  It took me a day and a half to realize that this,
from Modules/_weakref.c:
        PyModule_AddObject(m, "ref",
                           (PyObject *) &_PyWeakref_RefType);
really needed to be this:
        PyModule_AddObject(m, "ref",
                           (PyObject *) _PyWeakref_RefType);

Hopefully I've already found most of these in CPython itself, but
this sort of code surely lurks in extensions yet to be touched.

(Pro-tip: if you're working with this patch, and you see a crash,
and gdb shows you something like this at the top of the stack:
    #0  0x081056d8 in visit_decref (op=0x8247aa0, data=0x0)
                   at Modules/gcmodule.c:323
    323             if (PyObject_IS_GC(op)) {
your problem is an errant &, likely on a type object you're passing
in to the interpreter.  Think--what did you touch recently?  Or debug
it by salting your code with calls to collect(NUM_GENERATIONS-1).)

Another irksome side-effect of the API: because of "tp_flags" and
"tp_dealloc", I now have two declarations of PyTypeObject.  There's
the externally-visible one in Include/object.h, which lets external
parties see "tp_dealloc" and "tp_flags".  Then there's the internal
one in Objects/typeprivate.h which is the real structure.  Since
declaring a type twice is a no-no, the external one is gated on
    #ifndef PY_TYPEPRIVATE
If you're a normal Python extension programmer, you'd include Python.h
as normal:
    #include "Python.h"
Python implementation files that need to see the real PyTypeObject
structure now look like this:
    #define PY_TYPEPRIVATE
    #include "Python.h"
    #include "../Objects/typeprivate.h"

Also, since the structure of PyTypeObject hasn't yet changed, there
are a bunch of fields in PyTypeObject that are externally visible that
I don't want to be visible.  To ensure no one was using them, I renamed
them to "mysterious_object_0" and "mysterious_object_1" and the like.
Before this patch gets accepted, I want to reorder the fields in
PyTypeObject (which we can! because it's private!) so that these public
fields are at the top of both the external and internal structures.

THE UPGRADE PATH

Python internally declares a great many types, and I haven't attempted
to convert them all.  Instead there's a conversion header file that
does most of the work for you.  Here's how one would apply it to an
existing type.

1. Where your file currently has this:
    #include "Python.h"
   change it to this:
    #define PY_TYPEPRIVATE
    #include "Python.h"
    #include "pytypeconvert.h"

2. Whenever you declare a type, change it from this:
    static PyTypeObject YourExtension_Type = {
   to this:
    static PyTypeObject *YourExtension_Type;
    static PyTypeObject _YourExtension_Type = {

   Use NULL for your metaclass.  For example, change this:
    PyObject_HEAD_INIT(&PyType_Type),
   to this:
    PyObject_HEAD_INIT(NULL),

   Also use NULL for your baseclass.  For example, change this:
    &PyDict_Type, /* tp_base */
   to this:
    NULL, /* tp_base */

3. In your module's init function, add this:
    CONVERT_TYPE(YourExtension_Type,
        metaclass, baseclass, "description of type");
   "metaclass" and "baseclass" should be the metaclass and baseclass
   for your type, the ones you just set to NULL in step 2.  If you
   had NULL before for the baseclass, use NULL here too.

4. If you have any static object declarations, set their ob_type to
   NULL in the static declaration, then set it explicitly in your
   init function.  If your object uses a locally-defined type,
   be sure to do this *after* the CONVERT_TYPE line for that type.
   (See _Py_EllipsisObject for an example.)

5. Anywhere you're using existing Python type declarations
   you must remove the & from the front.

The conversion header file *also* redefines PyTypeObject.  But this
time it redefines it to the existing definition, and that definition
will stay the same forever.  That's the whole point: if you have an
existing Python 3.0 extension, it won't have to change if we change
the internal definition of PyTypeObject.

(Why bother with this conversion process, with few py3k extensions
in the wild?  This patch was started quite a while ago, when it
seemed plausible the API would get backported to 2.x.  Now I'm not
so sure that will happen.)

THE CURRENT PATCH

I've uploaded a patch to the tracker:
    http://bugs.python.org/issue5872
It applies cleanly to py3k/trunk (r72081).  But the code is awfully
grubby.

 * I haven't dealt with any types I can't build, and I can't build
   a lot of the extensions.  I'm using Linux, and I don't have the
   dev headers for many libraries on my laptop, and I haven't touched
   Windows or Mac stuff.

 * I created some new build warnings which should obviously be fixed.

 * With the patch installed, py3k trunk builds and installs.  It does
   *not* pass the regression test suite.  (It crashes.)  I don't think
   this'll be too bad, it's just taken me this long to get it as far
   as I have.

 * There are some internal scaffolds and hacks that should be purged
   by the final patch.

 * There's no documentation.  If you'd like to see how you'd use the
   new API, currently the best way to learn is to read
   Include/pytypeconvert.h.

 * I don't like the PY_TYPEPRIVATE hack.  I only used it 'cause it
   sucks less than the other approaches I've thought of.  I welcome
   your suggestions.

   The second-best approach I've come up with: make PyTypeObject
   genuinely private, and declare a different structure containing just
   the head of PyTypeObject.   Let's call it PyTypeObjectHead.  Then,
   for the convenience macros that use "dealloc" and "flags", cast the
   object to PyTypeObjectHead before dereferencing.  This abandons type
   safety, and given my longing for type safety while developing this
   patch I'd prefer to not make loss of type safety an official API.

THE FEEDBACK I SEEK

My understanding is that the feature-freeze for Python 3.1 is in a
little over a week.  Given the current stability level and untestedness
of the patch, and the lateness of the hour... is there any chance this
would be accepted into Python 3.1?  If so, I'll need to act fast.  If
not, I might as well relax, huh.

My thanks to Neal Norwitz for suggesting this project, and Brett Cannon
for some recent encouragement.  (And another person who I discussed it
with so long ago I forgot who it was... maybe Fredrik Lundh?)

/larry/

From cs at zip.com.au  Wed Apr 29 05:27:40 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Wed, 29 Apr 2009 13:27:40 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F768F3.8080304@g.nevcal.com>
Message-ID: <20090429032740.GA31335@cskk.homeip.net>

On 28Apr2009 13:37, Glenn Linderman <v+python at g.nevcal.com> wrote:
> On approximately 4/28/2009 1:25 PM, came the following characters from  
> the keyboard of Martin v. Löwis:
>>> The UTF-8b representation suffers from the same potential ambiguities as
>>> the PUA characters... 
>>
>> Not at all the same ambiguities. Here, again, the two choices:
>>
>> A. use PUA characters to represent undecodable bytes, in particular for
>>    UTF-8 (the PEP actually never proposed this to happen).
>>    This introduces an ambiguity: two different files in the same
>>    directory may decode to the same string name, if one has the PUA
>>    character, and the other has a non-decodable byte that gets decoded
>>    to the same PUA character.
>>
>> B. use UTF-8b, representing the byte with ill-formed surrogate codes.
>>    The same ambiguity does *NOT* exist. If a file on disk already
>>    contains an invalid surrogate code in its file name, then the UTF-8b
>>    decoder will recognize this as invalid, and decode it byte-for-byte,
>>    into three surrogate codes. Hence, the file names that are different
>>    on disk are also different in memory. No ambiguity.
>
> C. File on disk with the invalid surrogate code, accessed via the str  
> interface, no decoding happens, matches in memory the file on disk with  
> the byte that translates to the same surrogate, accessed via the bytes  
> interface.  Ambiguity.

Is this a Windows example, or (now I think on it) an equivalent POSIX example
of using the PEP where the locale encoding is UTF-16?

In either case, I would say one could make an argument for being stricter
in reading in OS-native sequences. Grant that NTFS doesn't prevent
half-surrogates in filenames, and likewise that POSIX won't because to
the OS they're just bytes. On decoding, require well-formed data. When
you hit ill-formed data, treat the nasty half surrogate as a PAIR of
bytes to be escaped in the resulting decode.

Ambiguity avoided.
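
Martin's case B can be checked along these lines.  This is only a
sketch: it assumes a Python where the PEP's handler is available under
its released name "surrogateescape", and it uses plain UTF-8 with that
handler as a stand-in for UTF-8b.

```python
# Sketch: UTF-8 plus the "surrogateescape" handler stands in for UTF-8b.
HANDLER = "surrogateescape"

# File 1: a name consisting of the single undecodable byte 0xE9.
name_a = b"\xe9".decode("utf-8", HANDLER)

# File 2: the three bytes a lax encoder would emit for the lone
# surrogate U+DCE9 ("surrogatepass" lets us build them for the test).
ill_formed = "\udce9".encode("utf-8", "surrogatepass")
name_b = ill_formed.decode("utf-8", HANDLER)

# name_a is one surrogate; name_b is three, one per rejected byte,
# so the two files stay distinct in memory and both round-trip.
distinct = name_a != name_b
round_trip = name_b.encode("utf-8", HANDLER)
```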

I'm more concerned with your (yours? someone else's?) mention of shift
characters. I'm unfamiliar with these encodings: to translate such a
thing into a Latin example, is it the case that there are schemes with
valid encodings that look like:

  [SHIFT] a b c

which would produce "ABC" in unicode, which is ambiguous with:

  A B C

which would also produce "ABC"?
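
A possible concrete instance of what I'm asking about, as a sketch
only.  It assumes CPython's stateful "iso2022_jp" codec, where the
escape sequence ESC ( B (select ASCII) plays the role of [SHIFT]; I
have not checked how the PEP would treat such an encoding.

```python
# Sketch using CPython's stateful "iso2022_jp" codec.
plain = b"ABC"
with_shift = b"\x1b(BABC"   # redundant "switch to ASCII" escape, then ABC

text_plain = plain.decode("iso2022_jp")
text_shift = with_shift.decode("iso2022_jp")

# Both byte strings decode to the same text, so re-encoding cannot
# recover which byte form the name had on disk.
round_trip = text_shift.encode("iso2022_jp")
```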

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Helicopters are considerably more expensive [than fixed wing aircraft],
which is only right because they don't actually fly, but just beat
the air into submission.        - Paul Tomblin

From v+python at g.nevcal.com  Wed Apr 29 05:29:16 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Tue, 28 Apr 2009 20:29:16 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <Pine.LNX.4.64.0904282238280.1740@kimball.webabinitio.net>
References: <49EEBE2E.3090601@v.loewis.de>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T154536-926@post.gmane.org>
	<p04330106c61b9f2aad0a@192.168.123.162>
	<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>
	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com>
	<Pine.LNX.4.64.0904282238280.1740@kimball.webabinitio.net>
Message-ID: <49F7C98C.60406@g.nevcal.com>

On approximately 4/28/2009 7:40 PM, came the following characters from 
the keyboard of R. David Murray:
> On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote:
>> C. File on disk with the invalid surrogate code, accessed via the str 
>> interface, no decoding happens, matches in memory the file on disk 
>> with the byte that translates to the same surrogate, accessed via the 
>> bytes interface. Ambiguity.
> 
> Unless I'm missing something, one of these is type str, and the other is 
> type bytes, so no ambiguity.

You are missing that the bytes value would get decoded to a str; thus 
both are str; so ambiguity is possible.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From v+python at g.nevcal.com  Wed Apr 29 05:32:15 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Tue, 28 Apr 2009 20:32:15 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <20090428230655.GA23830@cskk.homeip.net>
References: <20090428230655.GA23830@cskk.homeip.net>
Message-ID: <49F7CA3F.6000206@g.nevcal.com>

On approximately 4/28/2009 4:06 PM, came the following characters from 
the keyboard of Cameron Simpson:
> I think I may be able to resolve Glenn's issues with the scheme lower
> down (through careful use of definitions and hand waving).
>   

Close.  You at least resolved what you thought my issue was.  And, you 
did make me more comfortable with the idea that I, in programs I write, 
would not be adversely affected by the PEP if implemented.  While I can 
see that the PEP no doubt solves the os.listdir / open problem on POSIX 
systems for Python 3 + PEP programs that don't use 3rd party libraries, 
it does require programs that do use 3rd party libraries to be recoded 
with your functions -- which so far the PEP hasn't embraced.  Or, to use 
the bytes APIs directly to get file names for 3rd party libraries -- but 
the directly ported, filenames-as-strings type of applications that 
could call 3rd party filenames-as-bytes libraries in 2.x must be tweaked 
to do something different than they did before.

> On 27Apr2009 23:52, Glenn Linderman <v+python at g.nevcal.com> wrote:
>   
>> On approximately 4/27/2009 7:11 PM, came the following characters from  
>> the keyboard of Cameron Simpson:
>>     
> [...]
>   
>>> There may be puns. So what? Use the right strings for the right purpose
>>> and all will be well.
>>>
>>> I think what is missing here, and missing from Martin's PEP, is some
>>> utility functions for the os.* namespace.
>>>
>>> PROPOSAL: add to the PEP the following functions:
>>>
>>>   os.fsdecode(bytes) -> funny-encoded Unicode
>>>     This is what os.listdir() does to produce the strings it hands out.
>>>   os.fsencode(funny-string) -> bytes
>>>     This is what open(filename,..) does to turn the filename into bytes
>>>     for the POSIX open.
>>>   os.pathencode(your-string) -> funny-encoded-Unicode
>>>     This is what you must do to a de novo string to turn it into a
>>>     string suitable for use by open.
>>>     Importantly, for most strings not hand crafted to have weird
>>>     sequences in them, it is a no-op. But it will recode your puns
>>>     for survival.
>>>       
> [...]
>   
>>>> So assume a non-decodable sequence in a name.  That puts us into   
>>>> Martin's funny-decode scheme.  His funny-decode scheme produces a 
>>>> bare  string, indistinguishable from a bare string that would be 
>>>> produced by a  str API that happens to contain that same sequence.  
>>>> Data puns.
>>>>     
>>>>         
>>> See my proposal above. Does it address your concerns? A program still
>>> must know the provenance of the string, and _if_ you're working with
>>> non-decodable sequences in names then you should transmute them into
>>> the funny encoding using the os.pathencode() function described above.
>>>
>>> In this way the punning issue can be avoided.
>>> _Lacking_ such a function, your punning concern is valid.
>>>       
>> Seems like one would also desire os.pathdecode to do the reverse.
>>     
>
> Yes.
>
>   
>> And  
>> also versions that take or produce bytes from funny-encoded strings.
>>     
>
> Isn't that the first two functions above?
>   

Yes, sorry.

>> Then, if programs were re-coded to perform these transformations on what  
>> you call de novo strings, then the scheme would work.
>> But I think a large part of the incentive for the PEP is to try to  
>> invent a scheme that intentionally allows for the puns, so that programs  
>> do not need to be recoded in this manner, and yet still work.  I don't  
>> think such a scheme exists.
>>     
>
> I agree no such scheme exists. I don't think it can, just using strings.
>
> But _unless_ you have made a de novo handcrafted string with
> ill-formed sequences in it, you don't need to bother because you
> won't _have_ puns. If Martin's using half surrogates to encode
> "undecodable" bytes, then no normal string should conflict because a
> normal string will contain _only_ Unicode scalar values. Half surrogate
> code points are not such.
>
> The advantage here is that unless you've deliberately constructed an
> ill-formed unicode string, you _do_not_ need to recode into
> funny-encoding, because you are already compatible. Somewhat like one
> doesn't need to recode ASCII into UTF-8, because ASCII is unchanged.
>   

Right.  And I don't intend to generate ill-formed Unicode strings, in my 
programs.  But I might well read their names from other sources.

It is nice, and thank you for emphasizing (although I already did 
realize it, back there in the far reaches of the brain) that all the 
data puns are between ill-formed Unicode strings, and undecodable bytes 
strings.  That is a nice property of the PEP's encoding/decoding 
method.  I'm not sure it outweighs the disadvantage of taking unreadable 
gibberish, and producing indecipherable gibberish (codepoints with no 
glyphs), though, when there are ways to produce decipherable gibberish 
instead... or at least mostly-decipherable gibberish.  Another idea 
forms.... described below.

>> If there is going to be a required transformation from de novo strings  
>> to funny-encoded strings, then why not make one that people can actually  
>> see and compare and decode from the displayable form, by using  
>> displayable characters instead of lone surrogates?
>>     
>
> Because that would _not_ be a no-op for well formed Unicode strings.
>
> That reason is sufficient for me.
>
> I consider the fact that well-formed Unicode -> funny-encoded is a no-op
> to be an enormous feature of Martin's scheme.
>
> Unless I'm missing something, there _are_no_puns_ between funny-encoded
> strings and well formed unicode strings.
>   

I think you are correct regarding where the puns are.  I agree that not 
perturbing well-formed Unicode is a benefit.
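
To make the no-op property concrete, here is a small sketch.  The
helper name is made up for illustration, "surrogateescape" is the name
the PEP's handler was released under, and UTF-8 is assumed as the
locale encoding.

```python
def has_lone_surrogate(s: str) -> bool:
    """Hypothetical helper: True if s contains any surrogate code point."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)

# A well-formed name, including a character outside the BMP.
well_formed = "r\u00e9sum\u00e9 \U0001F600.txt"

# A name with two undecodable 0xE9 bytes, decoded the PEP's way.
funny = b"r\xe9sum\xe9.txt".decode("utf-8", "surrogateescape")

# For well-formed text the handler changes nothing on the way out.
same_bytes = (well_formed.encode("utf-8", "surrogateescape")
              == well_formed.encode("utf-8"))
```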

>>>>> I suppose if your program carefully constructs a unicode string riddled
>>>>> with half-surrogates etc and imagines something specific should happen
>>>>> to them on the way to being POSIX bytes then you might have a problem...
>>>>>       
>>>>>           
>>>> Right.  Or someone else's program does that.
>>>>         
>
> I've just spent a cosy 20 minutes with my copy of Unicode 5.0 and a
> coffee, reading section 3.9 (Unicode Encoding Forms).
>
> I now do not believe your scenario makes sense.
>
> Someone can construct a Python3 string containing code points that
> include surrogates. Granted.
>
> However such a string is not meaningful because it is not well-formed
> (D85).  It's ill-formed (D84). It is not sane to expect it to
> translate into a POSIX byte sequence, be it UTF-8 or anything else,
> unless it is accompanied by some kind of explicit mapping provided by
> the programmer.  Absent that mapping, it's nonsense in much the same
> way that a non-decodable UTF-8 byte sequence is nonsense.
>
> For example, Martin's funny-encoding is such an explicit mapping.
>   

Such a string can be meaningful if it is used as a file name... it is 
the name of the file.  I will agree that it would not be a word in any 
language, because it is composed of things that are not characters / 
codepoints, if that is what you meant.

>>>> I only want to use 
>>>> Unicode  file names.  But if those other file names exist, I want to 
>>>> be able to  access them, and not accidentally get a different file.
>>>>         
>
> But those other names _don't_ exist.
>   

They do if someone constructs them.

>>>>> Also, by avoiding reuse of legitimate characters in the encoding we can
>>>>> avoid your issue with losing track of where a string came from;
>>>>> legitimate characters are currently untouched by Martin's scheme, except
>>>>> for the normal "bytes<->string via the user's locale" translation that
>>>>> must already happen, and there you're aided by bytes and strings being
>>>>> different types.
>>>>>       
>>>>>           
>>>> There are abnormal characters, but there are no illegal characters.   
>>>>         
>>> I thought half-surrogates were illegal in well formed Unicode. I confess
>>> to being weak in this area. By "legitimate" above I meant things like
>>> half-surrogates which, like quarks, should not occur alone?
>>>       
>> "Illegal" just means violating the accepted rules.
>>     
>
> I think that either we've lost track of what each other is saying,
> or you're wrong here. And my poor terminology hasn't been helping.
>
> What we've got:
>
>   (1) Byte sequence file names in the POSIX file system.
>       It doesn't matter whether the underlying storage is a real POSIX
>       filesystem or a mostly POSIX one like MacOSX HFS or a remotely
>       attached non-POSIX filesystem like a Windows one, because we're
>       talking through the POSIX API, and it is handing us byte
>       sequences, which we expect may contain anything except a NUL.
>
>   (2) Under Martin's scheme, os.listdir() et al hand us (and accept)
>       funny-encoded Python3 strings, which are strings of Unicode code
>       units (D77).
>       Particularly, if there were bytes in the POSIX byte string that
>       did not decode into Unicode scalar values (D76) then each such
>       byte is encoded as a surrogate (D71,72,73,74).
>
>       It is important to note here that because surrogates are _not_
>       Unicode scalar values, there is no punning between the two sets
>       of values.
>
>   (3) Other Python3 strings that have not been through Martin's mangler
>       in either direction. Ordinary strings.
>
> Your concern is that, handed a string, a programmer could misuse (3) as
> (2) or vice versa because of punning.
>
> In a well-formed unicode string there are no surrogates; surrogates only
> occur in UTF-16 _encodings_ of Unicode strings (D75).
>
> Therefore, it _is_ possible to inspect a string, if one cared, to see if
> it is funny-encoded or "raw". One may get two different answers:
>
>   - If there are surrogate code units then it must be funny-encoded
>     and will therefore work perfectly if handed to a os.* interface.
>
>   - If there are no surrogate code units then it may be funny-encoded or it
>     may not have been through Martin's funny-encoder, you can't tell.
>     However, this doesn't matter because the encoder is a no-op for such
>     strings.
>     Therefore it will work perfectly if handed to an os.* interface.
>
> The only gap in this is a specially crafted string containing surrogate
> code points that did not come via Martin's encoder. But such a string
> cannot come from a user interface, which will accept only characters
> and therefore only include unicode scalar values.
>
> Such a string can only be explicitly constructed (eg with a \uD802
> code point). And if something constructs such a string, it must have in
> mind an explicit interpretation of those code points, which means it is
> the _constructor_ on whom the burden of translation lies.
>
> Does this make sense to you, or have you a counter example in mind?
>   

Lots of configuration systems permit schemes like C's \x to be used to 
create strings.  Whether you perceive that to be a user interface or 
not, or believe that such things should be part of a user interface or 
not, they exist.  Whether they validate that such strings are properly 
constructed Unicode text or should or should not do such validation, is 
open for discussion, but I'd be surprised if there are not some such 
schemes that don't do such checking, and consider it a feature.  Why 
make the file name longer than necessary, when you can just use all 
these nice illegal codepoints to keep it shorter instead?  Instead of 5 
characters for a filename sequence counter, someone might stuff it in 1 
character, in binary, and think they were clever.  I've seen such 
techniques, although not specifically in Python, since I'm fairly new to 
reading Python code.

So I consider it not beyond the realm of possibility to encounter lone 
surrogate code units in strings that haven't been through Martin's 
funny-encoder.  Hence, I disbelieve that the gap you mention can be ignored.
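
A sketch of the kind of collision I mean.  The file name and the
config-file origin are invented for illustration; "surrogateescape" is
the name the PEP's handler was released under, and a UTF-8 locale is
assumed.

```python
# A string built from a \u escape, e.g. read out of a config file,
# that has never been through the PEP's decoder.
from_config = "log\udc80.dat"

# A POSIX file name containing the undecodable byte 0x80, decoded the
# PEP's way.
from_listdir = b"log\x80.dat".decode("utf-8", "surrogateescape")

# The two strings are indistinguishable, and the hand-built one encodes
# to the bytes of the other file's name.
puns = from_config == from_listdir
on_disk = from_config.encode("utf-8", "surrogateescape")
```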

>> In this case, the  
>> accepted rules are those enforced by the file system (at the bytes or  
>> str API levels), and by Python (for the str manipulations).  None of  
>> those rules outlaw lone surrogates.  Hence, while all of the systems  
>> under discussion can handle all Unicode characters in one way or  
>> another, none of them require that all Unicode rules are followed.  Yes,  
>> you are correct that lone surrogates are illegal in Unicode.  No, none  
>> of the accepted rules for these systems require Unicode.
>>     
>
> However, Martin's scheme explicitly translates these ill-formed
> sequences into Python3 strings and back, losslessly. You can have
> surrogates in the filesystem storage/API on Windows. You can have
> non-UTF-8-decodable sequences in the POSIX filesystem layer too.
> They're all taken in and handled.
>   

It is still not clear (1) whether the PEP would be implemented on 
Windows, and (2) if it is, whether it prevents lone surrogates from being 
obtained from the str APIs by transcoding them into 3 lone surrogates.  If 
it doesn't transcode from the str APIs, but does funny-decode from the 
bytes APIs, then it would seem there is still the possibility of data 
puns on Windows.

> In Python3 space, one might have a bytes object with a raw POSIX
> byte filename in it. Presumably one can also have a byte string with a
> raw (UTF-16) Windows filename in it. They're not strings, so no
> confusion.
>
> But there's no _string_ for these things without a matching
> string<->bytestring mapping associated with it.
>
> If you have a Python3 string which is well-formed Unicode, then you can
> hand it to the os.* interfaces and the Right Thing will happen (on
> Windows just because it stored Unicode and on POSIX provided you agree
> that your locale/getfilesystemencoding() is the right thing).
>
> If you have a string that isn't well-formed, then the meaning of any
> code points which are not Unicode scalar values is not well defined
> without some auxiliary stuff in the app.
>
>   
>>>> NTFS permits any 16-bit "character" code, including abnormal ones,   
>>>> including half-surrogates, and including full surrogate sequences 
>>>> that  decode to PUA characters.  POSIX permits all byte sequences, 
>>>> including  things that look like UTF-8, things that don't look like 
>>>> UTF-8, things  that look like half-surrogates, and things that look 
>>>> like full surrogate  sequences that decode to PUA characters.
>>>>         
>
> See above. I think this is addressed.
>   

Without transcoding on the str APIs, which I haven't seen mentioned, I 
don't think so.

> [...]
>   
>>> These are existing file objects, I'll take them as source 1. They get
>>> encoded for release by os.listdir() et al.
>>>   
>>>       
>>>> And yes, strings can be  generated from scratch.
>>>>         
>>> I take this to be source 2.
>>>       
>> One variation of source 2 is reading output from other programs, such as  
>> ls (POSIX) or dir (Windows).
>>     
>
> Sure. But that is reading byte sequences, and one must again know the
> encoding. If that is known and the input decoded happily into Unicode
> scalar values, then there is no issue. If the input didn't decode, then
> one must make some decision about what the non-decodable bits mean.
>   

Sure.  So the PEP needs your functions, or the equivalent.  Last I 
checked, they weren't there.
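
For concreteness, a sketch of what the first two of those functions
might look like.  The names fs_decode and fs_encode are placeholders,
the UTF-8 default is an assumption standing in for the filesystem
encoding, and "surrogateescape" is the name the PEP's handler was
released under.  (Later Python releases do provide os.fsdecode() and
os.fsencode(); this is not their implementation.)

```python
def fs_decode(raw: bytes, encoding: str = "utf-8") -> str:
    """Placeholder for the proposed os.fsdecode(): bytes -> funny-encoded str."""
    return raw.decode(encoding, "surrogateescape")


def fs_encode(name: str, encoding: str = "utf-8") -> bytes:
    """Placeholder for the proposed os.fsencode(): funny-encoded str -> bytes."""
    return name.encode(encoding, "surrogateescape")


# A name read as bytes from ls output, with an undecodable 0xFF byte.
raw = b"bad\xffname"
as_text = fs_decode(raw)
back = fs_encode(as_text)
```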

>>> I think I agree with all the discussion that followed, and think the
>>> real problem is lack of utility functions to funny-encode source 2
>>> strings for use. Hence the proposal above.
>>>       
>> I think we understand each other now.  I think your proposal could work,  
>> Cameron, although when recoding applications to use your proposal, I'd  
>> find it easier to use the "file name object" that others have proposed.   
>> I think that because either your proposal or the object proposals  
>> require recoding the application, that they will not be accepted.  I  
>> think that because the PEP 383 allows data puns, that it should not be  
>> accepted in its present form.
>>     
>
> I'm of the opinion now that the puns can only occur when the source 2
> string has surrogates, and either those surrogates are chosen to match
> the funny-encoding, in which case the pun is not a pun, or the
> surrogates are chosen according to a different scheme in which case
> source 2 is obliged to provide a mapping.
>
> A source 2 string of only Unicode scalar values doesn't need remapping.
>   

A correct translation of source 2 strings would be obliged to call one 
of your functions, that doesn't exist in the PEP, because it appears the 
PEP wants to assume that such strings don't exist, unless it creates 
them.  So this takes porting effort for programs generating and 
consuming such strings, to avoid being mangled by the PEP.  That isn't 
necessary today, only post-PEP.

>> I think if your proposal is accepted, that it then becomes possible  
>> to use an encoding that uses visible characters, which makes it easier  
>> for people to understand and verify.  An encoding such as the one I  
>> suggested, but perhaps using a more obscure character, if there is one,  
>> but yet doesn't violate true Unicode.
>>     
>
> I think any scheme that uses any Unicode scalar value as an escape
> character _inherently_ introduces puns, and puns that are easier to
> encounter.
>
> I think the real strength of Martin's scheme is exactly that bytes strings
> that needed the funny-encoding _do_ produce ill-formed Unicode strings,
> because such strings _cannot_ conflict with well-formed strings.
>
> I think your desire for a human readable encoding is valid, but it should
> be a further purely "presentation" step, somewhat like quoted-printable
> encoding in MIME, and not the scheme used by Martin.
>   

Another step?  Even more porting effort?  For a PEP that is trying to 
avoid porting effort?

But maybe there is a compromise that mostly meets both goals: use U+DC10 
as a (high-flying) escape character.  It is not printable, so the 
substitution glyph will likely get displayed by display functions.  Then 
transcode illegal bytes to the range U+0100 to U+01FF, and transcode 
existing U+DC10 to U+DC10 U+DC10. 

1) This is an easy to understand scheme, and illegal byte values would 
become displayable, but would each be preceded by the substitution glyph 
for the U+DC10. 

2) There would be no need to transcode other lone surrogates... on the 
other hand, any illegal code values could be treated as illegal bytes 
and transcoded, making the strings more nearly legal, and more uniformly 
displayable.

3) The property that all potential data puns are among ill-formed 
Unicode strings is still retained.

4) Because the result string is nearly legal Unicode (except for the 
escape characters U+DC10), it becomes uniformly comparable and different 
strings can be visibly different.

5) It is still necessary to transcode names from str interfaces, to 
escape any U+DC10 characters, at least, which is also required by this 
PEP to avoid data puns on systems that have both str and bytes interfaces.
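
A minimal sketch of the U+DC10 scheme described above, assuming UTF-8 as the
underlying encoding. The names decode_name, encode_name, escape_str and the
handler name "dc10escape" are invented for this illustration; nothing like
them exists in the PEP or in Python.

```python
import codecs

ESCAPE = "\udc10"  # escape code point proposed in the message above


def _escape_bad_bytes(err):
    """Decode error handler: byte 0xPQ becomes ESCAPE followed by U+01PQ."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join(ESCAPE + chr(0x100 + b) for b in bad), err.end


# "dc10escape" is a made-up handler name used only in this sketch.
codecs.register_error("dc10escape", _escape_bad_bytes)


def decode_name(raw, encoding="utf-8"):
    """Bytes interface: produce a str with visible escapes."""
    return raw.decode(encoding, "dc10escape")


def escape_str(name):
    """Str interface: double any U+DC10 that is already in the name."""
    return name.replace(ESCAPE, ESCAPE + ESCAPE)


def encode_name(text, encoding="utf-8"):
    """Inverse of decode_name; a doubled or dangling escape is rejected."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != ESCAPE:
            out += ch.encode(encoding)
            i += 1
            continue
        nxt = text[i + 1:i + 2]
        if not nxt or not 0x100 <= ord(nxt) <= 0x1FF:
            raise ValueError("escape not followed by U+0100..U+01FF")
        out.append(ord(nxt) - 0x100)
        i += 2
    return bytes(out)
```

With these definitions, decode_name(b"abc\xff") gives "abc" followed by
U+DC10 U+01FF, and encode_name turns that back into the original bytes.
A doubled escape stands for a real U+DC10, which has no byte form in strict
UTF-8, so this sketch leaves that case to the str interfaces.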

>> I think it should transform all  
>> data, from str and bytes interfaces, and produce only str values  
>> containing conforming Unicode, escaping all the non-conforming sequences  
>> in some manner.  This would make the strings truly readable, as long as  
>> fonts for all the characters are available.
>>     
>
> But I think it would just move the punning. A human readable string with
> readable escapes in it may be funny-encoded. _Or_ it may be "raw", with
> funny-encoding yet to happen; after all one might weirdly be dealing
> with a filename which contained post-funny-encode visible sequences in
> it.
>
> So you're right back to _guessing_ what you're looking at.
>
> With the surrogate scheme you only have to guess if there are surrogates,
> but then you _know_ that you're dealing with a special encoding scheme;
> it is certain - the guess is about which scheme.
>   

I think you mean you don't have to guess if there are lone surrogates... 
you can look and see.

> If you're working in a domain with no ill-formed strings you never need
> to worry at all.
>
> With a visible/printable-encoding such as you advocate the guess is about
> whether the scheme has even been used, which is why I think it is worse.
>   

So the above scheme, using a U+DC10 escape character, meets your 
desirable truisms about lone surrogates being the trigger for knowing 
that you are dealing with bizarro names, but being uncertain about which 
kind, and also makes the results lots more readable.

I still think there is a need to provide the encoding and decoding 
functions, for both bytes and de novo strings.

>> And I had already suggested  
>> the utility functions you are suggesting, actually, in my first tirade  
>> against PEP 383 (search for "The encode and decode functions should be  
>> available for coders to use, that code to external
>> interfaces, either OS or 3rd party packages, that do not use this  
>> encoding scheme").
>>     
>
> I must have missed that sentence. But it sounds like we want the same
> facilities at least.
>
>   
>> The solution that was proposed in the lead up to releasing Python 3.0  
>> was to offer both bytes and str interfaces (so we have those), and then  
>> for those that want to have a single portable implementation that can  
>> access all data, an object that encapsulates the differences, and the  
>> variant system APIs.  (file system is one, command line is another,  
>> environment is another, I'm not sure if there are more.)  I haven't  
>> heard if any progress on such an encapsulating object has been made; the  
>> people that proposed such have been rather quiet about this PEP.  I  
>> would expect that an object implementation would provide display  
>> strings, and APIs to submit de novo str and bytes values to an object,  
>> which would run the appropriate encoding on them.
>>     
>
> I think covering these other cases is quite messy, if only because
> there's not even agreement amongst existing command line apps about all
> that stuff.
>
> Regarding "APIs to submit de novo str and bytes values to an object,  
> which would run the appropriate encoding on them" I think such a
> facility for de novo strings must require the caller to provide a
> handler/mapper for the not-well-formed parts of such strings if they
> occur.
>   

The caller shouldn't have to supply anything.  The same encoding that is 
applied to str system interfaces that supply strings should be applied 
to de novo strings.  It is just a matter of transcoding a de novo string 
into the "right form" that it can then be encoded by the system encoder 
to produce the original string again, if it goes to a str interface, or 
to an equivalent bytes string, if it goes to a bytes interface.

>> Programs that want to use str interfaces on POSIX will see a subset of  
>> files on systems that contain files whose bytes filenames are not  
>> decodable.
>>     
>
> Not under Martin's scheme, because all bytes filenames _are_ decoded.
>   

I think I was speaking of the status quo, here, not with the PEP.

>> If a sysadmin wants to standardize on UTF-8 names  
>> universally, they can use something like convmv to clean up existing  
>> file names that don't conform.  Programs that use str interfaces on  
>> POSIX system will work fine, but with a subset of the files.  When that  
>> is unacceptable, they can either be recoded to use the bytes interfaces,  
>> or the hopefully forthcoming object encapsulation.  The issue then will  
>> be what technique will be used to transform bytes into display names,  
>> but since the display names would never be fed back to the objects  
>> directly (but the object would have an interface to accept de novo str  
>> and de novo bytes) then it is just a display issue, and one that uses  
>> visible characters would seem more useful in my mind, than one that uses  
>> half-surrogates or PUAs.
>>     
>
> I agree it might be handy to have a display function, but isn't repr()
> exactly that, now I think of it?

repr is a display function that produces rather ugly results in most 
non-ASCII cases.  But then again, one could use repr as the 
funny-encoding scheme, too...  I don't think we want to use repr for 
either case, actually.  Of course, with Py 3, if the file names were 
objects, and could have reprlib customizations...  :) :)

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From tmbdev at gmail.com  Wed Apr 29 07:12:46 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Wed, 29 Apr 2009 07:12:46 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <loom.20090428T224019-608@post.gmane.org>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> 
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> 
	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> 
	<49F74EE5.6060305@v.loewis.de>
	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> 
	<49F7613C.9000901@v.loewis.de>
	<7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> 
	<loom.20090428T224019-608@post.gmane.org>
Message-ID: <7e51d15d0904282212j681084f3i72be4eb316428499@mail.gmail.com>

>
> It cannot crash Python; it can only crash
> hypothetical third-party programs or libraries with deficient error
> checking and
> unreasonable assumptions about input data.

The error checking isn't necessarily deficient.  For example, a safe and
legitimate thing to do is for third party libraries to throw a C++
exception, raise a Python exception, or delete the half surrogate.  Any of
those would break one of the use cases people have been talking about,
namely being able to present the output from os.listdir() to the user, say
in a file selector, and then access that file.
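
A small sketch of that failure mode, assuming a UTF-8 file system encoding
and assuming the handler name "surrogateescape" (the name the PEP's handler
later shipped under; it is not used in this thread). The helper
strip_surrogates is hypothetical and stands in for a library that deletes
half surrogates.

```python
raw = b"photo-\xe9.jpg"  # a Latin-1 name found on a UTF-8 system
shown = raw.decode("utf-8", "surrogateescape")  # what a str listing would hold


def strip_surrogates(text):
    """Hypothetical library behaviour: silently drop surrogate code points."""
    return "".join(ch for ch in text if not 0xD800 <= ord(ch) <= 0xDFFF)


cleaned = strip_surrogates(shown)

# The cleaned name no longer maps back to the bytes on disk.
round_trip_ok = cleaned.encode("utf-8") == raw

# A strict encoder refuses the escaped name outright.
try:
    shown.encode("utf-8")
    strict_accepts = True
except UnicodeEncodeError:
    strict_accepts = False
```

Either reaction is defensible for the library, yet in both cases the name
that comes back can no longer be used to open the file.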

> (and, of course, you haven't even proven those programs or libraries exist)

PEP 383 is a proposal that suggests changing Python such that malformed
unicode strings become a required part of Python and such that Python writes
illegal UTF-8 encodings to UTF-8 encoded file systems.  Those are big
changes, and it's legitimate to ask that PEP 383 address the implications of
that choice before it's made.

Tom

From martin at v.loewis.de  Wed Apr 29 07:45:08 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Apr 2009 07:45:08 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
	<49F6A947.1050106@v.loewis.de>
	<7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com>
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>
	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>
	<49F74EE5.6060305@v.loewis.de>
	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com>
	<49F7613C.9000901@v.loewis.de>
	<7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com>
Message-ID: <49F7E964.9050700@v.loewis.de>

> The wide APIs use UTF-16.  UTF-16 suffers from the same problem as
> UTF-8: not all sequences of words are valid UTF-16 sequences.  In
> particular, sequences containing isolated surrogate pairs are not
> well-formed according to the Unicode standard.  Therefore, the existence
> of a wide character API function does not guarantee that the wide
> character strings it returns can be converted into valid unicode
> strings.  And, in fact, Windows Vista happily creates files with
> malformed UTF-16 encodings, and os.listdir() happily returns them.

Whatever. What does that have to do with PEP 383? Your claim was
that PEP 383 may have unfortunate effects on Windows, and I'm telling
you that it won't, because the behavior of Python on Windows won't
change at all. So whatever the problem - it's there already, and the
PEP is not going to change it.

I personally don't see a problem here - *of course* os.listdir will
report invalid utf-16 encodings, if that's what is stored on disk.
It doesn't matter whether the file names are valid wrt. some
specification. What matters is that you can access all the files.
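
To make "invalid utf-16" concrete, here is a sketch of a name holding a lone
surrogate. It assumes Python 3.4 or later, where the strict UTF-16 codec
rejects lone surrogates and the "surrogatepass" handler lets them through;
the file name itself is made up.

```python
lone = "abc\udc10"  # "abc" followed by an unpaired low surrogate

# Strict UTF-16 treats the unpaired surrogate as an error.
try:
    lone.encode("utf-16-le")
    strict_accepts = True
except UnicodeEncodeError:
    strict_accepts = False

# With surrogatepass the code unit is written as-is: four 16-bit units.
wire = lone.encode("utf-16-le", "surrogatepass")
restored = wire.decode("utf-16-le", "surrogatepass")
```

The str survives the trip unchanged, which is all that is needed to get at
the file, even though the sequence is not well-formed UTF-16.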

Regards,
Martin

From martin at v.loewis.de  Wed Apr 29 07:52:23 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Apr 2009 07:52:23 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F788A6.3040702@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com>
	<49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com>
Message-ID: <49F7EB17.4010309@v.loewis.de>

>>> C. File on disk with the invalid surrogate code, accessed via the str
>>> interface, no decoding happens, matches in memory the file on disk with
>>> the byte that translates to the same surrogate, accessed via the bytes
>>> interface.  Ambiguity.
>>
>> Is that an alternative to A and B?
> 
> I guess it is an adjunct to case B, the current PEP.
> 
> It is what happens when using the PEP on a system that provides both
> bytes and str interfaces, and both get used.

Your formulation is a bit too stenographic to me, but please trust me
that there is *no* ambiguity in the case you construct.

By "accessed via the str interface", I assume you do something like

  fn = "some string"
  open(fn)

You are wrong in assuming "no decoding happens", and that "matches
in memory the file on disk" (whatever that means - how do I match
a file on disk in memory??????). What happens instead is that fn
gets *encoded* with the file system encoding, and the python-escape
handler. This will *not* produce an ambiguity.

If you think there is an ambiguity in that you can use both the
byte interface and the string interface to access the same file:
this would be a ridiculous interpretation. *Of course* you can
access /etc/passwd both as "/etc/passwd" and b"/etc/passwd",
there is nothing ambiguous about that.
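
As a rough illustration of the encoding step, assuming a UTF-8 file system
encoding and assuming the handler name "surrogateescape" (the name under
which the handler later shipped; this thread calls it python-escape). The
helpers name_to_bytes and bytes_to_name are made up for the sketch.

```python
FS_ENCODING = "utf-8"  # assumed file system encoding for this sketch


def name_to_bytes(fn):
    """What a str file name is turned into before it reaches the OS."""
    return fn.encode(FS_ENCODING, "surrogateescape")


def bytes_to_name(raw):
    """What a bytes file name from the OS is turned into for str callers."""
    return raw.decode(FS_ENCODING, "surrogateescape")


raw = b"report-\xff.txt"      # not valid UTF-8
fn = bytes_to_name(raw)       # the bad byte becomes U+DCFF
back = name_to_bytes(fn)      # and is restored on the way out
plain = name_to_bytes("/etc/passwd")
```

Every bytes name gets exactly one str spelling and encodes back to the same
bytes, and an ordinary name such as "/etc/passwd" is unaffected.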

Regards,
Martin

From martin at v.loewis.de  Wed Apr 29 08:04:52 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Apr 2009 08:04:52 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F7BDD8.3010202@gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F7BDD8.3010202@gmail.com>
Message-ID: <49F7EE04.6090701@v.loewis.de>

>> The Python UTF-8 codec will happily encode half-surrogates; people argue
>> that it is a bug that it does so, however, it would help in this
>> specific case.
> 
> Can we use this encoding scheme for writing into files as well?  We've
> turned the filename with undecodable bytes into a string with half
> surrogates.  Putting that string into a file has to turn them into bytes
> at some level.  Can we use the python-escape error handler to achieve
> that somehow?

Sure: if you are aware that what you write to the stream is actually
a file name, you should encode it with the file system encoding, and
the python-escape handler. However, it's questionable that the same
approach is right for the rest of the data that goes into the file.

If you use a different encoding on the stream, yet still use the
python-escape handler, you may end up with completely non-sensical
bytes. In practice, it probably won't be that bad - python-escape
has likely escaped all non-ASCII bytes, so that on re-encoding with
a different encoding, only the ASCII characters get encoded, which
likely will work fine.
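
A sketch of both situations, assuming the handler name "surrogateescape"
(the name the handler later shipped under) and a made-up file name.

```python
raw = b"caf\xc3\xa9"  # bytes of the name as stored on disk

# Case 1: ASCII file system encoding, so every non-ASCII byte is escaped.
escaped = raw.decode("ascii", "surrogateescape")
as_utf8 = escaped.encode("utf-8", "surrogateescape")
as_latin1 = escaped.encode("latin-1", "surrogateescape")

# Case 2: UTF-8 file system encoding decodes the name to real characters,
# so writing it with a different stream encoding yields different bytes.
decoded = raw.decode("utf-8", "surrogateescape")
other = decoded.encode("latin-1", "surrogateescape")
```

In case 1 the stream encoding does not matter and the original bytes come
back. In case 2 the Latin-1 stream receives b"caf\xe9", which is not the
name on disk.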

Regards,
Martin

From martin at v.loewis.de  Wed Apr 29 08:07:10 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 29 Apr 2009 08:07:10 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <20090429032740.GA31335@cskk.homeip.net>
References: <20090429032740.GA31335@cskk.homeip.net>
Message-ID: <49F7EE8E.1030404@v.loewis.de>

> I'm more concerned with your (yours? someone else's?) mention of shift
> characters. I'm unfamiliar with these encodings: to translate such a
> thing into a Latin example, is it the case that there are schemes with
> valid encodings that look like:
> 
>   [SHIFT] a b c
> 
> which would produce "ABC" in unicode, which is ambiguous with:
> 
>   A B C
> 
> which would also produce "ABC"?

No: the "shift" in "shift-jis" is not really about the shift key.
See http://en.wikipedia.org/wiki/Shift-JIS

Regards,
Martin

From martin at v.loewis.de  Wed Apr 29 08:27:18 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Apr 2009 08:27:18 +0200
Subject: [Python-Dev] Python-Dev PEP 383: Non-decodable Bytes in	System
 Character?Interfaces
In-Reply-To: <20090429023353.GA11210@cskk.homeip.net>
References: <20090429023353.GA11210@cskk.homeip.net>
Message-ID: <49F7F346.2010003@v.loewis.de>

> I would like utility functions to perform:
>   os-bytes->funny-encoded
>   funny-encoded->os-bytes
> or explicit example code snippets for same in the PEP text.

Done!

Martin

From tmbdev at gmail.com  Wed Apr 29 08:53:36 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Wed, 29 Apr 2009 08:53:36 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <49F7E964.9050700@v.loewis.de>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> 
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> 
	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> 
	<49F74EE5.6060305@v.loewis.de>
	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> 
	<49F7613C.9000901@v.loewis.de>
	<7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> 
	<49F7E964.9050700@v.loewis.de>
Message-ID: <7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com>

On Wed, Apr 29, 2009 at 07:45, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> Your claim was
> that PEP 383 may have unfortunate effects on Windows,

No, I simply think that PEP 383 is not sufficiently specified to be able to
tell.

> and I'm telling
> you that it won't, because the behavior of Python on Windows won't
> change at all.

A justification for your proposal is that there are differences between
Python on UNIX and Windows that you would like to reduce.  But depending on
where you introduce utf-8b coding on UNIX, you may also have to introduce it
on Windows in order to keep the platforms consistent.

> So whatever the problem - it's there already, and the
> PEP is not going to change it.

OK, so you are saying that under PEP 383, utf-8b wouldn't be used anywhere
on Windows by default.  That's not clear from your proposal.

It's also not clear from your proposal where utf-8b will get used on UNIX
systems.  Some of the places that have been suggested are: open, os.listdir,
sys.argv, os.getenv. There are other potential ones, like print, write, and
os.system.  And what about text file and string conversions: will utf-8b
become the default, or optional, or unavailable?

Each of those choices potentially has significant implications.  I'm just
asking what those choices are so that one can then talk about the
implications and see whether this proposal is a good one or whether other
alternatives are better.

Tom

From v+python at g.nevcal.com  Wed Apr 29 08:54:21 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Tue, 28 Apr 2009 23:54:21 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F7EB17.4010309@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com>
	<49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com>
	<49F7EB17.4010309@v.loewis.de>
Message-ID: <49F7F99D.8070606@g.nevcal.com>

On approximately 4/28/2009 10:52 PM, came the following characters from 
the keyboard of Martin v. Löwis:
>>>> C. File on disk with the invalid surrogate code, accessed via the str
>>>> interface, no decoding happens, matches in memory the file on disk with
>>>> the byte that translates to the same surrogate, accessed via the bytes
>>>> interface.  Ambiguity.
>>> Is that an alternative to A and B?
>> I guess it is an adjunct to case B, the current PEP.
>>
>> It is what happens when using the PEP on a system that provides both
>> bytes and str interfaces, and both get used.
> 
> Your formulation is a bit too stenographic to me, but please trust me
> that there is *no* ambiguity in the case you construct.

No Martin, the point of reviewing the PEP is to _not_ trust you, even 
though you are generally very knowledgeable and very trustworthy.  It is 
much easier to find problems before something is released, or even 
coded, than it is afterwards.

> By "accessed via the str interface", I assume you do something like
> 
>   fn = "some string"
>   open(fn)
> 
> You are wrong in assuming "no decoding happens", and that "matches
> in memory the file on disk" (whatever that means - how do I match
> a file on disk in memory??????). What happens instead is that fn
> gets *encoded* with the file system encoding, and the python-escape
> handler. This will *not* produce an ambiguity.

You assumed, and maybe I wasn't clear in my statement.

By "accessed via the str interface" I mean that (on Windows) the wide 
string interface would be used to obtain a file name.  Now, suppose that 
the file name returned contains "abc" followed by the half-surrogate 
U+DC10 -- four 16-bit codes.

Then, ask for the same filename via the bytes interface, using UTF-8 
encoding.  The PEP says that the above name would get translated to 
"abc" followed by 3 half-surrogates, corresponding to the 3 UTF-8 bytes 
used to represent the half-surrogate that is actually in the file name, 
specifically U+DCED U+DCB0 U+DC90.  This means that one name on disk can 
be seen as two different names in memory.

Now posit another file which, when accessed via the str interface, has 
the name "abc" followed by U+DCED U+DCB0 U+DC90.

Looks ambiguous to me.  Now if you have a scheme for handling this case, 
fine, but I don't understand it from what is written in the PEP.
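
The byte arithmetic behind this example can be checked with a short sketch.
It assumes the UTF-8 codec is the one in play, and it uses the handler names
"surrogatepass" and "surrogateescape" that later Python releases provide;
whether Windows would ever take this path is a separate question.

```python
lone = "abc\udc10"  # the name as seen through the str interface

# Generalized UTF-8 form of the name: U+DC10 becomes ED B0 90.
utf8_like = lone.encode("utf-8", "surrogatepass")

# Those three bytes are not valid UTF-8, so each one is escaped on decoding.
seen_via_bytes = utf8_like.decode("utf-8", "surrogateescape")

# Going the other way, U+DC10 lies outside U+DC80..U+DCFF and is refused.
try:
    lone.encode("utf-8", "surrogateescape")
    rejected = False
except UnicodeEncodeError:
    rejected = True
```

So under these assumptions the bytes view of the name is "abc" followed by
U+DCED U+DCB0 U+DC90, as described, while the str "abc" plus U+DC10 cannot
itself be turned into bytes by the escaping handler.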

> If you think there is an ambiguity in that you can use both the
> byte interface and the string interface to access the same file:
> this would be a ridiculous interpretation. *Of course* you can
> access /etc/passwd both as "/etc/passwd" and b"/etc/passwd",
> there is nothing ambiguous about that.

Yes, this would be a ridiculous interpretation of "ambiguous".

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From martin at v.loewis.de  Wed Apr 29 09:17:23 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Apr 2009 09:17:23 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>
	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>
	<49F74EE5.6060305@v.loewis.de>
	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com>
	<49F7613C.9000901@v.loewis.de>
	<7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com>
	<49F7E964.9050700@v.loewis.de>
	<7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com>
Message-ID: <49F7FF03.2090909@v.loewis.de>

> OK, so you are saying that under PEP 383, utf-8b wouldn't be used
> anywhere on Windows by default.  That's not clear from your proposal.

You didn't read it carefully enough. The first three paragraphs of
the "Specification" section make that clear.

Regards,
Martin

From martin at v.loewis.de  Wed Apr 29 09:29:05 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Apr 2009 09:29:05 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F7F99D.8070606@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com>
	<49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com>
	<49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com>
Message-ID: <49F801C1.2070109@v.loewis.de>

>>>>> C. File on disk with the invalid surrogate code, accessed via the str
>>>>> interface, no decoding happens, matches in memory the file on disk
>>>>> with
>>>>> the byte that translates to the same surrogate, accessed via the bytes
>>>>> interface.  Ambiguity.
>>>> Is that an alternative to A and B?
>>> I guess it is an adjunct to case B, the current PEP.
>>>
>>> It is what happens when using the PEP on a system that provides both
>>> bytes and str interfaces, and both get used.
>>
>> Your formulation is a bit too stenographic to me, but please trust me
>> that there is *no* ambiguity in the case you construct.
> 
> 
> No Martin, the point of reviewing the PEP is to _not_ trust you, even
> though you are generally very knowledgeable and very trustworthy.  It is
> much easier to find problems before something is released, or even
> coded, than it is afterwards.

Sure. However, that requires you to provide meaningful, reproducible
counter-examples, rather than a stenographic formulation that might
hint some problem you apparently see (which I believe is just not
there).

> You assumed, and maybe I wasn't clear in my statement.
> 
> By "accessed via the str interface" I mean that (on Windows) the wide
> string interface would be used to obtain a file name.

What does that mean? What specific interface are you referring to to
obtain file names? Most of the time, file names are obtained by the
user entering them on the keyboard. GUI applications are completely
out of the scope of the PEP.

> Now, suppose that
> the file name returned contains "abc" followed by the half-surrogate
> U+DC10 -- four 16-bit codes.

Ok, so perhaps you might be talking about os.listdir here. Communication
would be much easier if I would not need to guess what you may mean.

Also, why is U+DC10 four 16-bit codes?

> Then, ask for the same filename via the bytes interface, using UTF-8
> encoding.

How do you do that on Windows? You cannot just pick an encoding, such
as UTF-8, and pass that to the byte interface, and expect it to work.
If you use the byte interface, you need to encode in the file system
encoding, of course.

Also, what do you mean by "ask for"?????? WHAT INTERFACE ARE YOU
USING???? Please use specific python code.

> The PEP says that the above name would get translated to
> "abc" followed by 3 half-surrogates, corresponding to the 3 UTF-8 bytes
> used to represent the half-surrogate that is actually in the file name,
> specifically U+DCED U+DCB0 U+DC90.  This means that one name on disk can
> be seen as two different names in memory.

You are relying on false assumptions here, namely that the UTF-8
encoding would play any role.

What would happen instead is that the "mbcs" encoding would be used. The
"mbcs" encoding, by design from Microsoft, will never report an error,
so the error handler will not be invoked at all.

> Now posit another file which, when accessed via the str interface, has
> the name "abc" followed by U+DCED U+DCB0 U+DC90.
> 
> Looks ambiguous to me.  Now if you have a scheme for handling this case,
> fine, but I don't understand it from what is written in the PEP.

You were just making false assumptions in your reasoning, assumptions
that are way beyond the scope of the PEP.

Regards,
Martin

From baptiste13z at free.fr  Wed Apr 29 09:38:38 2009
From: baptiste13z at free.fr (Baptiste Carvello)
Date: Wed, 29 Apr 2009 09:38:38 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F76422.4010806@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>	<49F6799F.5030208@v.loewis.de>	<875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>	<49F6A71A.3020809@v.loewis.de>	<873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>	<49F7510D.7070603@mrabarnett.plus.com>
	<49F76422.4010806@g.nevcal.com>
Message-ID: <gt906m$1ag$1@ger.gmane.org>

Glenn Linderman wrote:
> 
> 3. When an undecodable byte 0xPQ is found, decode to the escape 
> codepoint, followed by codepoint U+01PQ, where P and Q are hex digits.
> 

The problem with this strategy is: paths are often sliced, so your 2 codepoints 
could get separated. The good thing with the PEP's strategy is that 1 character 
stays 1 character.
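
A short sketch of the slicing concern, assuming UTF-8, the handler name
"surrogateescape" for the PEP's one-character mapping, and U+DC10 as the
escape code point of the two-codepoint proposal. The file name is made up.

```python
raw = b"dir/\xff.txt"

# PEP-style mapping: one character per undecodable byte.
one_to_one = raw.decode("utf-8", "surrogateescape")

# Two-codepoint proposal: escape code point, then U+01PQ for byte 0xPQ.
two_code = "dir/" + "\udc10" + chr(0x100 + 0xFF) + ".txt"

cut = 5  # slice at a position chosen from the byte layout
prefix_a = one_to_one[:cut]
prefix_b = two_code[:cut]

# The one-to-one prefix still encodes to the matching byte prefix.
prefix_a_bytes = prefix_a.encode("utf-8", "surrogateescape")

# The two-codepoint prefix ends in a dangling escape.
dangling = prefix_b.endswith("\udc10")
```

Because string indexes and byte indexes line up in the first form, code that
slices paths keeps working; in the second form the same slice separates the
escape from the character it introduces.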

Baptiste

From v+python at g.nevcal.com  Wed Apr 29 09:49:37 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 29 Apr 2009 00:49:37 -0700
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <49F7FF03.2090909@v.loewis.de>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>	<20090428075806.GB23828@phd.pp.ru>	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>	<49F74EE5.6060305@v.loewis.de>	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com>	<49F7613C.9000901@v.loewis.de>	<7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com>	<49F7E964.9050700@v.loewis.de>	<7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com>
	<49F7FF03.2090909@v.loewis.de>
Message-ID: <49F80691.80403@g.nevcal.com>

On approximately 4/29/2009 12:17 AM, came the following characters from 
the keyboard of Martin v. Löwis:
>> OK, so you are saying that under PEP 383, utf-8b wouldn't be used
>> anywhere on Windows by default.  That's not clear from your proposal.
> 
> You didn't read it carefully enough. The first three paragraphs of
> the "Specification" section make that clear.

Sorry, rereading those paragraphs even with this declaration in mind, 
does not make that clear.  It is not enough to have a solution that 
works; it is necessary to communicate that solution clearly enough that 
people understand it.  By the huge amount of feedback you have received, 
it is clear that either the solution doesn't work, or that it wasn't 
communicated clearly.

The following comments are an attempt to help you make the PEP clear, 
based on your above declaration that UTF-8b wouldn't be used on Windows. 
  I may still be unclear about what you mean, but if you can accept 
these enhancements to the PEP, then maybe we are approaching a common 
understanding; if not, you should be aware that the PEP still needs 
clarification.

In the first paragraph, you should make it clear that Python 3.0 does 
not use the Windows bytes interfaces, if it doesn't.  "Python uses 
*only* the wide character APIs..." would suffice.  As stated, it seems 
like Python *does* use the wide character APIs, but leaves open the 
possibility that it might use byte APIs also.  A short description of 
what happens on Windows when Python code uses bytes APIs would also be 
helpful.

In the second paragraph, it speaks of "currently" but then speaks of 
using the half-surrogates.  I don't believe that happens "currently". 
You did change tense, but that paragraph is quite confusing, currently, 
because of the tense change.  You should describe there, the action that 
is currently taken by Python for non-decodable bytes, and then in the 
next paragraph talk about what the PEP changes.

The 4th paragraph is now confusing too... would it not be the decode 
error handler that returns the byte strings, in addition to the Unicode 
strings?

The 5th paragraph has apparently confused some people into thinking this 
PEP only applies to locales using UTF-8 encodings; you should have an 
"else clause" to clear that up, pointing out that the reverse encoding 
of half-surrogates by other encodings already produces errors, that 
UTF-8 is a special case, not the only case.

The code added to the discussion has mismatched (), making me wonder if 
it is complete.  There is a reasonable possibility that only the final ) 
is missing.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From cs at zip.com.au  Wed Apr 29 10:17:44 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Wed, 29 Apr 2009 18:17:44 +1000
Subject: [Python-Dev] Python-Dev PEP 383: Non-decodable Bytes in
	System Character Interfaces
In-Reply-To: <49F7F346.2010003@v.loewis.de>
Message-ID: <20090429081744.GA18296@cskk.homeip.net>

On 29Apr2009 08:27, Martin v. Löwis <martin at v.loewis.de> wrote:
| > I would like utility functions to perform:
| >   os-bytes->funny-encoded
| >   funny-encoded->os-bytes
| > or explicit example code snippets for same in the PEP text.
| 
| Done!

Thanks!
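
For readers following along, the two conversions requested above can be
sketched roughly as follows.  This is only an illustration, not the PEP's
text: the helper names are hypothetical, the encoding is passed explicitly
rather than taken from the locale, and the error handler is assumed under
the name it eventually shipped with, "surrogateescape" (Python 3.1 and
later).

```python
def os_bytes_to_funny(raw: bytes, encoding: str = "utf-8") -> str:
    """os-bytes -> funny-encoded: undecodable bytes become lone surrogates."""
    return raw.decode(encoding, "surrogateescape")


def funny_to_os_bytes(name: str, encoding: str = "utf-8") -> bytes:
    """funny-encoded -> os-bytes: lone surrogates turn back into raw bytes."""
    return name.encode(encoding, "surrogateescape")


raw_name = b"caf\xe9.txt"              # Latin-1 bytes, not valid UTF-8
funny = os_bytes_to_funny(raw_name)    # 'caf\udce9.txt'
restored = funny_to_os_bytes(funny)    # identical to raw_name
```

Later Python versions (3.2 onward) expose os.fsdecode and os.fsencode,
which do much the same thing using the filesystem encoding.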
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

From hrvoje.niksic at avl.com  Wed Apr 29 10:29:32 2009
From: hrvoje.niksic at avl.com (Hrvoje Niksic)
Date: Wed, 29 Apr 2009 10:29:32 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <22573349.1882424.1240944925201.JavaMail.xicrypt@atgrzls001>
References: <20090427211447.GA4291@cskk.homeip.net>	<49F658A5.7080807@g.nevcal.com>
	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>
	<loom.20090428T114723-520@post.gmane.org>
	<15546941.1861678.1240922288709.JavaMail.xicrypt@atgrzls001>
	<49F6FA93.7080302@avl.com>
	<22573349.1882424.1240944925201.JavaMail.xicrypt@atgrzls001>
Message-ID: <49F80FEC.9000900@avl.com>

Zooko O'Whielacronx wrote:
>> If you switch to iso8859-15 only in the presence of undecodable  
>> UTF-8, then you have the same round-trip problem as the PEP: both  
>> b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a  
>> way to unambiguously recover the original file name.
> 
> Why do you say that?  It seems to work as I expected here:
> 
>  >>> '\xff'.decode('iso-8859-15')
> u'\xff'
>  >>> '\xc3\xbf'.decode('iso-8859-15')
> u'\xc3\xbf'

Here is what I mean by "switch to iso8859-15" only in the presence of 
undecodable UTF-8:

def file_name_to_unicode(fn, encoding):
     try:
         return fn.decode(encoding)
     except UnicodeDecodeError:
         return fn.decode('iso-8859-15')

Now, assume a UTF-8 locale and try to use it on the provided example 
file names.

 >>> file_name_to_unicode(b'\xff', 'utf-8')
'ÿ'
 >>> file_name_to_unicode(b'\xc3\xbf', 'utf-8')
'ÿ'

That is the ambiguity I was referring to -- two different byte sequences 
result in the same unicode string.
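
The contrast with the PEP's escaping approach can be sketched as below.
This assumes Python 3.1 or later, where the PEP's error handler is
available as "surrogateescape" (an assumption relative to this thread,
which predates that name); the list names are made up for illustration.

```python
def file_name_to_unicode(fn: bytes, encoding: str) -> str:
    """The fallback scheme from above: try the locale encoding, then Latin-9."""
    try:
        return fn.decode(encoding)
    except UnicodeDecodeError:
        return fn.decode("iso-8859-15")


names = [b"\xff", b"\xc3\xbf"]

# Fallback scheme: both byte strings collapse to the same text.
fallback = [file_name_to_unicode(n, "utf-8") for n in names]

# Escaping scheme (assumed handler name "surrogateescape"): they stay distinct.
escaped = [n.decode("utf-8", "surrogateescape") for n in names]
round_trip = [s.encode("utf-8", "surrogateescape") for s in escaped]
```

With the fallback both entries come out as the single character U+00FF,
while the escaped forms differ and encode back to the original bytes.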

From baptiste13z at free.fr  Wed Apr 29 10:43:49 2009
From: baptiste13z at free.fr (Baptiste Carvello)
Date: Wed, 29 Apr 2009 10:43:49 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <cc93256f0904280614xd6847d4y8aad4f24d731fce6@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<49F184C6.8000905@g.nevcal.com>	<49F30083.5050506@v.loewis.de>
	<49F559A4.8050400@g.nevcal.com>	<49F60A8A.8090603@v.loewis.de>
	<49F63B19.7010306@g.nevcal.com>	<49F6799F.5030208@v.loewis.de>
	<49F6933B.7020705@g.nevcal.com>	<30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001>	<49F6FF49.6010205@avl.com>
	<cc93256f0904280614xd6847d4y8aad4f24d731fce6@mail.gmail.com>
Message-ID: <gt940t$fja$1@ger.gmane.org>

Lino Mastrodomenico wrote:
> 
> Only for the new utf-8b encoding (if Martin agrees), while the
> existing utf-8 is fine as is (or at least waaay outside the scope of
> this PEP).
> 

This is questionable. This would have the consequence that \udcxx in a python 
string would sometimes mean a surrogate, and sometimes mean raw bytes, depending 
on the history of the string.

By contrast, if the new utf-8b codec would *supersede* the old one, \udcxx would 
always mean raw bytes (at least on UCS-4 builds, where surrogates are unused). 
Thus ambiguity could be avoided.

Baptiste

From baptiste13z at free.fr  Wed Apr 29 11:09:09 2009
From: baptiste13z at free.fr (Baptiste Carvello)
Date: Wed, 29 Apr 2009 11:09:09 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F6A7C0.6090105@g.nevcal.com>
References: <20090428021117.GA25536@cskk.homeip.net>
	<49F6A7C0.6090105@g.nevcal.com>
Message-ID: <gt95gd$kbt$1@ger.gmane.org>

Glenn Linderman wrote:

> 
> If there is going to be a required transformation from de novo strings 
> to funny-encoded strings, then why not make one that people can actually 
> see and compare and decode from the displayable form, by using 
> displayable characters instead of lone surrogates?
> 

The problem with your "escape character" scheme is that the meaning is lost with 
slicing of the strings, which is a very common operation.

>>
>> I thought half-surrogates were illegal in well formed Unicode. I confess
>> to being weak in this area. By "legitimate" above I meant things like
>> half-surrogates which, like quarks, should not occur alone?
>>   
> 
> "Illegal" just means violating the accepted rules.  In this case, the 
> accepted rules are those enforced by the file system (at the bytes or 
> str API levels), and by Python (for the str manipulations).  None of 
> those rules outlaw lone surrogates.  [...]
> 

Python could as well *specify* that lone surrogates are illegal, as their 
meaning is undefined by Unicode. If this rule is respected language-wise, there 
is no ambiguity. It might be unrealistic on Windows, though.

This rule could even be specified only for strings that represent filesystem 
paths. Sure, they are the same type as other strings, but the programmer usually 
knows if a given string is intended to be a path or not.

Baptiste

From tmbdev at gmail.com  Wed Apr 29 11:19:01 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Wed, 29 Apr 2009 11:19:01 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F801C1.2070109@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de> <49F73635.6010105@v.loewis.de> 
	<49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> 
	<49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> 
	<49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> 
	<49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de>
Message-ID: <7e51d15d0904290219v625d23cdy8812939da404e309@mail.gmail.com>

> Sure. However, that requires you to provide meaningful, reproducible
> counter-examples, rather than a stenographic formulation that might
> hint some problem you apparently see (which I believe is just not
> there).

Well, here's another one: PEP 383 would disallow UTF-8 encodings of half
surrogates.  But such encodings are currently supported by Python, and they
are used as part of CESU-8 coding.  That's, in fact, a common way of
converting UTF-16 to UTF-8.  How are you going to deal with existing code
that relies on being able to code half surrogates as UTF-8?
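
To make the concern concrete, here is a small sketch of how lone
surrogates interact with the UTF-8 codec.  It reflects the behaviour of
later Python 3 releases (3.1 onward), where the strict codec rejects lone
surrogates and a separate "surrogatepass" handler lets them through; it is
not a statement about what the PEP text specified at this point in the
thread, and the variable names are illustrative only.

```python
lone = "\udc10"

try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False          # the strict codec refuses a lone surrogate

# "surrogatepass" emits the three-byte form used for surrogates in
# CESU-8-style data, and accepts it again when decoding.
passed = lone.encode("utf-8", "surrogatepass")
back = passed.decode("utf-8", "surrogatepass")
```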

Tom

From solipsis at pitrou.net  Wed Apr 29 11:25:17 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 29 Apr 2009 09:25:17 +0000 (UTC)
Subject: [Python-Dev] PEP 383 (again)
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>
	<20090428075806.GB23828@phd.pp.ru>
	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>
	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>
	<49F74EE5.6060305@v.loewis.de>
	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com>
	<49F7613C.9000901@v.loewis.de>
	<7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com>
	<loom.20090428T224019-608@post.gmane.org>
	<7e51d15d0904282212j681084f3i72be4eb316428499@mail.gmail.com>
Message-ID: <loom.20090429T091542-128@post.gmane.org>

Thomas Breuel <tmbdev <at> gmail.com> writes:
> 
> The error checking isn't necessarily deficient. For example, a safe and
> legitimate thing to do is for third party libraries to throw a C++ exception,
> raise a Python exception, or delete the half surrogate.

Do you have any concrete examples of this behaviour? When e.g. Nautilus shows
some illegal UTF-8 filenames in a UTF-8 locale, it replaces the offending bytes
with placeholders rather than crash in your face.

> PEP 383 is a proposal that suggests changing Python such that malformed
> unicode strings become a required part of Python and such that Python writes
> illegal UTF-8 encodings to UTF-8 encoded file systems.

That's again a misleading statement.
It only writes an "illegal encoding" if it received one from the filesystem in
the first place. A clean filesystem will only receive clean filenames.

> Those are big changes, and it's legitimate to ask that PEP 383 address the
> implications of that choice before it's made.

No, it's legitimate to ask that /you/ back up your arguments with concrete
facts. It's difficult to demonstrate the non-existence of a problem. On the
other hand, you can easily demonstrate that it exists, if it really does.

By the way, most of those libraries under Unix would take a char * as input, so
they wouldn't deal with an "illegal unicode string", they would deal with the
original byte string.

Regards

Antoine.

From v+python at g.nevcal.com  Wed Apr 29 11:38:32 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 29 Apr 2009 02:38:32 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <gt906m$1ag$1@ger.gmane.org>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>	<49F6799F.5030208@v.loewis.de>	<875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>	<49F6A71A.3020809@v.loewis.de>	<873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>	<49F7510D.7070603@mrabarnett.plus.com>	<49F76422.4010806@g.nevcal.com>
	<gt906m$1ag$1@ger.gmane.org>
Message-ID: <49F82018.5060407@g.nevcal.com>

On approximately 4/29/2009 12:38 AM, came the following characters from 
the keyboard of Baptiste Carvello:
> Glenn Linderman wrote:
>>
>> 3. When an undecodable byte 0xPQ is found, decode to the escape 
>> codepoint, followed by codepoint U+01PQ, where P and Q are hex digits.
>>
> 
> The problem with this strategy is: paths are often sliced, so your 2 
> codepoints could get separated. The good thing with the PEP's strategy 
> is that 1 character stays 1 character.
> 
> Baptiste

Except for half-surrogates that are in the file names already, which get 
converted to 3 characters.
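
Both points can be seen in a short sketch, assuming a Python version
(3.1 or later) where the PEP's handler exists under the name
"surrogateescape"; the variable names are illustrative only.

```python
# One undecodable byte becomes exactly one character, so slices stay valid.
name = b"ab\xffcd".decode("utf-8", "surrogateescape")
head = name[:3]                                  # 'ab\udcff'
head_bytes = head.encode("utf-8", "surrogateescape")

# Bytes that spell a UTF-8-encoded lone surrogate (here U+DC10) are not
# accepted as one character; each byte is escaped separately.
pre_existing = b"\xed\xb0\x90".decode("utf-8", "surrogateescape")
```

Here name has five characters and the slice encodes back to b"ab\xff",
while the three bytes ED B0 90 come back as three escaped characters.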

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From v+python at g.nevcal.com  Wed Apr 29 11:56:05 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 29 Apr 2009 02:56:05 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F801C1.2070109@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com>
	<49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com>
	<49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com>
	<49F801C1.2070109@v.loewis.de>
Message-ID: <49F82435.3060205@g.nevcal.com>

On approximately 4/29/2009 12:29 AM, came the following characters from 
the keyboard of Martin v. Löwis:
>>>>>> C. File on disk with the invalid surrogate code, accessed via the str
>>>>>> interface, no decoding happens, matches in memory the file on disk
>>>>>> with
>>>>>> the byte that translates to the same surrogate, accessed via the bytes
>>>>>> interface.  Ambiguity.
>>>>> Is that an alternative to A and B?
>>>> I guess it is an adjunct to case B, the current PEP.
>>>>
>>>> It is what happens when using the PEP on a system that provides both
>>>> bytes and str interfaces, and both get used.
>>> Your formulation is a bit too stenographic to me, but please trust me
>>> that there is *no* ambiguity in the case you construct.
>>
>> No Martin, the point of reviewing the PEP is to _not_ trust you, even
>> though you are generally very knowledgeable and very trustworthy.  It is
>> much easier to find problems before something is released, or even
>> coded, than it is afterwards.
> 
> Sure. However, that requires you to provide meaningful, reproducible
> counter-examples, rather than a stenographic formulation that might
> hint some problem you apparently see (which I believe is just not
> there).
> 
>> You assumed, and maybe I wasn't clear in my statement.
>>
>> By "accessed via the str interface" I mean that (on Windows) the wide
>> string interface would be used to obtain a file name.
> 
> What does that mean? What specific interface are you referring to to
> obtain file names? Most of the time, file names are obtained by the
> user entering them on the keyboard. GUI applications are completely
> out of the scope of the PEP.
> 
>> Now, suppose that
>> the file name returned contains "abc" followed by the half-surrogate
>> U+DC10 -- four 16-bit codes.
> 
> Ok, so perhaps you might be talking about os.listdir here. Communication
> would be much easier if I would not need to guess what you may mean.

os.listdir("")

> 
> Also, why is U+DC10 four 16-bit codes?

It isn't.

First 16-bit code is U+0061
Second 16-bit code is U+0062
Third 16-bit code is U+0063
Fourth 16-bit code is U+DC10
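
As a rough check of the four code units listed above, the string can be
encoded to UTF-16 and unpacked.  This sketch assumes Python 3.4 or later,
where the "surrogatepass" handler works with the UTF-16 codecs; it is not
how Windows itself returns the name.

```python
import struct

name = "abc\udc10"
# "surrogatepass" lets the lone surrogate through the UTF-16 encoder.
raw = name.encode("utf-16-le", "surrogatepass")
# Unpack the bytes as little-endian unsigned 16-bit code units.
units = struct.unpack("<%dH" % (len(raw) // 2), raw)
```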

>> Then, ask for the same filename via the bytes interface, using UTF-8
>> encoding.
> 
> How do you do that on Windows? You cannot just pick an encoding, such
> as UTF-8, and pass that to the byte interface, and expect it to work.
> If you use the byte interface, you need to encode in the file system
> encoding, of course.
> 
> Also, what do you mean by "ask for"?????? WHAT INTERFACE ARE YOU
> USING???? Please use specific python code.

os.listdir(b"")

I find that on my Windows system, with all ASCII path file names, that I 
get quite different results when I pass os.listdir an empty str vs an 
empty bytes.

Rather than keep you guessing, I get the root directory contents from 
the empty str, and the current directory contents from an empty bytes. 
That is rather unexpected.

So I guess I'd better suggest that a specific, equivalent directory name 
be passed in either bytes or str form.

>> The PEP says that the above name would get translated to
>> "abc" followed by 3 half-surrogates, corresponding to the 3 UTF-8 bytes
>> used to represent the half-surrogate that is actually in the file name,
>> specifically U+DCED U+DCB0 U+DC90.  This means that one name on disk can
>> be seen as two different names in memory.
> 
> You are relying on false assumptions here, namely that the UTF-8
> encoding would play any role.
> 
> What would happen instead is that the "mbcs" encoding would be used. The
> "mbcs" encoding, by design from Microsoft, will never report an error,
> so the error handler will not be invoked at all.

So what you are saying here is that Python doesn't use the "A" forms of 
the Windows APIs for filenames, but only the "W" forms, and uses lossy 
decoding (from MS) to the current code page (which can never be UTF-8 on 
Windows).

You are further saying that Python doesn't give the programmer control 
over the codec that is used to convert from W results to bytes, so that 
on Windows, it is impossible to obtain a bytes result containing UTF-8 
from os.listdir, even though sys.setfilesystemencoding exists, and 
sys.getfilesystemencoding is affected by it, and the latter is 
documented as returning "mbcs", and as returning the codec that should 
be used by the application to convert str to bytes for filenames. 
(Python 3.0.1).

While I can hear a "that is outside the scope of the PEP" coming, this 
documentation is confusing, to say the least.

>> Now posit another file which, when accessed via the str interface, has
>> the name "abc" followed by U+DCED U+DCB0 U+DC90.
>>
>> Looks ambiguous to me.  Now if you have a scheme for handling this case,
>> fine, but I don't understand it from what is written in the PEP.
> 
> You were just making false assumptions in your reasoning, assumptions
> that are way beyond the scope of the PEP.

Absolutely correct.  I was making what seemed to be reasonable 
assumptions about Python internals on Windows, and several of them are 
false.  Misleading documentation contributed: the listdir documentation 
doesn't specify that bytes and str parameters affect whether or not the 
current directory is honored, and sys.getfilesystemencoding reflects the 
result of sys.setfilesystemencoding, rather than returning, on Windows, 
the "mbcs" that Python uses to create bytes forms of filenames from W 
forms of filenames, even after sys.setfilesystemencoding is called. 
Things are a little clearer in the documentation for 
sys.setfilesystemencoding, which does say the encoding isn't used by 
Windows -- so why is it permitted to change it, if it has no effect?

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From rdmurray at bitdance.com  Wed Apr 29 13:07:11 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 29 Apr 2009 07:07:11 -0400 (EDT)
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F7C98C.60406@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> 
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> 
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T154536-926@post.gmane.org>
	<p04330106c61b9f2aad0a@192.168.123.162>
	<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>
	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com>
	<Pine.LNX.4.64.0904282238280.1740@kimball.webabinitio.net>
	<49F7C98C.60406@g.nevcal.com>
Message-ID: <Pine.LNX.4.64.0904290650470.1740@kimball.webabinitio.net>

On Tue, 28 Apr 2009 at 20:29, Glenn Linderman wrote:
> On approximately 4/28/2009 7:40 PM, came the following characters from the 
> keyboard of R. David Murray:
>>  On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote:
>> >  C. File on disk with the invalid surrogate code, accessed via the str 
>> >  interface, no decoding happens, matches in memory the file on disk with 
>> >  the byte that translates to the same surrogate, accessed via the bytes 
>> >  interface. Ambiguity.
>>
>>  Unless I'm missing something, one of these is type str, and the other is
>>  type bytes, so no ambiguity.
>
>
> You are missing that the bytes value would get decoded to a str; thus both 
> are str; so ambiguity is possible.

Only if you as the programmer decode it.  Now, I don't understand the
subtleties of Unicode enough to know if Martin has already successfully
addressed this concern in another fashion, but personally I think that
if you as a programmer are comparing funnydecoded-str strings gotten
via a string interface with normal-decoded strings gotten via a bytes
interface, that we could claim that your program has a bug.

--David

From cs at zip.com.au  Wed Apr 29 13:36:53 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Wed, 29 Apr 2009 21:36:53 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F82435.3060205@g.nevcal.com>
Message-ID: <20090429113653.GA22908@cskk.homeip.net>

On 29Apr2009 02:56, Glenn Linderman <v+python at g.nevcal.com> wrote:
> os.listdir(b"")
>
> I find that on my Windows system, with all ASCII path file names, that I  
> get quite different results when I pass os.listdir an empty str vs an  
> empty bytes.
>
> Rather than keep you guessing, I get the root directory contents from  
> the empty str, and the current directory contents from an empty bytes.  
> That is rather unexpected.
>
> So I guess I'd better suggest that a specific, equivalent directory name  
> be passed in either bytes or str form.

I think you may have uncovered an implementation bug rather than an
encoding issue (because I'd expect "" and b"" to be equivalent).

In ancient times, "" was a valid UNIX name for the working directory.
POSIX disallows that, and requires people to use ".".

Maybe you're seeing an artifact; did python move from UNIX to Windows or the
other way around in its porting history? I'd guess the former.

Do you get differing results from listdir(".") and listdir(b".") ?
How's python2 behave for ""? (Since there's no b"" in python2.)
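
One way to compare the two interfaces on an explicit, equivalent
directory is sketched below.  It assumes a later Python 3 (3.2 or newer,
for os.fsencode, os.fsdecode and tempfile.TemporaryDirectory) and uses a
scratch directory with one made-up file name, so it says nothing about
the "" versus b"" case itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Create a single file with a plain ASCII name.
    open(os.path.join(d, "abc.txt"), "w").close()
    as_str = sorted(os.listdir(d))
    as_bytes = sorted(os.listdir(os.fsencode(d)))
    # Decode the bytes names the same way Python decodes them for str results.
    same = as_str == sorted(os.fsdecode(n) for n in as_bytes)
```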

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

'Supposing a tree fell down, Pooh, when we were underneath it?'
'Supposing it didn't,' said Pooh after careful thought.

From v+python at g.nevcal.com  Wed Apr 29 13:47:00 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 29 Apr 2009 04:47:00 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <Pine.LNX.4.64.0904290650470.1740@kimball.webabinitio.net>
References: <49EEBE2E.3090601@v.loewis.de>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T154536-926@post.gmane.org>
	<p04330106c61b9f2aad0a@192.168.123.162>
	<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>
	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com>
	<Pine.LNX.4.64.0904282238280.1740@kimball.webabinitio.net>
	<49F7C98C.60406@g.nevcal.com>
	<Pine.LNX.4.64.0904290650470.1740@kimball.webabinitio.net>
Message-ID: <49F83E34.4020005@g.nevcal.com>

On approximately 4/29/2009 4:07 AM, came the following characters from 
the keyboard of R. David Murray:
> On Tue, 28 Apr 2009 at 20:29, Glenn Linderman wrote:
>> On approximately 4/28/2009 7:40 PM, came the following characters from 
>> the keyboard of R. David Murray:
>>>  On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote:
>>> >  C. File on disk with the invalid surrogate code, accessed via the str
>>> >  interface, no decoding happens, matches in memory the file on disk with
>>> >  the byte that translates to the same surrogate, accessed via the bytes
>>> >  interface. Ambiguity.
>>>
>>>  Unless I'm missing something, one of these is type str, and the other is
>>>  type bytes, so no ambiguity.
>>
>>
>> You are missing that the bytes value would get decoded to a str; thus 
>> both are str; so ambiguity is possible.
> 
> Only if you as the programmer decode it.  Now, I don't understand the
> subtleties of Unicode enough to know if Martin has already successfully
> addressed this concern in another fashion, but personally I think that
> if you as a programmer are comparing funnydecoded-str strings gotten
> via a string interface with normal-decoded strings gotten via a bytes
> interface, that we could claim that your program has a bug.

Hopefully Martin will clarify the PEP as I suggested in another branch 
of this thread.  He has eventually convinced me that this ambiguity is 
not possible, via email discussion, but the PEP is certainly less than 
sufficiently explanatory to make that obvious.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From v+python at g.nevcal.com  Wed Apr 29 14:06:57 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 29 Apr 2009 05:06:57 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <20090429113653.GA22908@cskk.homeip.net>
References: <20090429113653.GA22908@cskk.homeip.net>
Message-ID: <49F842E1.1060008@g.nevcal.com>

On approximately 4/29/2009 4:36 AM, came the following characters from 
the keyboard of Cameron Simpson:
> On 29Apr2009 02:56, Glenn Linderman <v+python at g.nevcal.com> wrote:
>   
>> os.listdir(b"")
>>
>> I find that on my Windows system, with all ASCII path file names, that I  
>> get quite different results when I pass os.listdir an empty str vs an  
>> empty bytes.
>>
>> Rather than keep you guessing, I get the root directory contents from  
>> the empty str, and the current directory contents from an empty bytes.  
>> That is rather unexpected.
>>
>> So I guess I'd better suggest that a specific, equivalent directory name  
>> be passed in either bytes or str form.
>>     
>
> I think you may have uncovered an implementation bug rather than an
> encoding issue (because I'd expect "" and b"" to be equivalent).
>   

Me too.

> In ancient times, "" was a valid UNIX name for the working directory.
> POSIX disallows that, and requires people to use ".".
>
> Maybe you're seeing an artifact; did python move from UNIX to Windows or the
> other way around in its porting history? I'd guess the former.
>
> Do you get differing results from listdir(".") and listdir(b".") ?
>   

No.  Both are the same as b""

> How's python2 behave for ""? (Since there's no b"" in python2.)

Python2 os.listdir("") produces the same thing as Python3 os.listdir(b"")
Python2 os.listdir(u"") produces the same thing as Python3 os.listdir("")

Another phenomenon of note:

I created a directory named ábc.  (Windows XP, Python 3.0.1, Python 
2.6.1, SetConsoleOutputCP(65001))
Python3 os.listdir(b".") prints it as b"\xe1bc"
Python2 os.listdir(".") prints it as b"\xe1bc"
Python2 os.listdir(u".") prints it as u"\xe1bc"
Python3 os.listdir(".") prints it as "bc"

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From stephen at xemacs.org  Wed Apr 29 14:18:42 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 29 Apr 2009 21:18:42 +0900
Subject: [Python-Dev]  a suggestion ... Re:  PEP 383 (again)
In-Reply-To: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
Message-ID: <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>

Thomas Breuel writes:

 > PEP 383 violated (2), and I think that's a bad thing.

The whole purpose of PEP 383 is to send the exact same bytes that were
read from the OS back to the OS => violating (2) (for whatever the
apparent system file-encoding is, not limited to UTF-8), and that has
overwhelmingly popular support.

Note that this won't happen automatically, either, AIUI.  The PEP's
proposed implementation is as an error handler, and this would need to
be specified explicitly.  It's not intended to be the default.

 > I think the best solution would be to use (3a) and fall back to (3b) if that
 > doesn't work.  If people try to write those strings, they will always get
 > written as correctly encoded UTF-8 strings.

The intended audience aren't trying to write anything in particular,
though.  They just want to repeat verbatim what the OS told them.

 > There is yet another option, which is arguably the "right" one: make the
 > results of os.listdir() subclasses of string that keep track of where they
 > came from.

Sure.  This has been mentioned by several people.  Martin has no
intention of doing it in PEP 383, though, so it will need a new PEP.
It has gotten strong pushback from several people, as well.

From stephen at xemacs.org  Wed Apr 29 15:14:18 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 29 Apr 2009 22:14:18 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
In-Reply-To: <gt940t$fja$1@ger.gmane.org>
References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com>
	<49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com>
	<49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com>
	<49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com>
	<30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001>
	<49F6FF49.6010205@avl.com>
	<cc93256f0904280614xd6847d4y8aad4f24d731fce6@mail.gmail.com>
	<gt940t$fja$1@ger.gmane.org>
Message-ID: <87d4avk3f9.fsf@uwakimon.sk.tsukuba.ac.jp>

Baptiste Carvello writes:

 > By contrast, if the new utf-8b codec would *supersede* the old one,
 > \udcxx would always mean raw bytes (at least on UCS-4 builds, where
 > surrogates are unused). Thus ambiguity could be avoided.

Unfortunately, that's false.  It could have come from a literal string
(similar to the text above ;-), a C extension, or a string slice (on
16-bit builds), and there may be other ways to do it.  The only way to
avoid ambiguity is to change the definition of a Python string to be
*valid* Unicode (possibly with Python extensions such as PEP 383 for
internal use only).  But Guido has rejected that in the past;
validation is the application's problem, not Python's.

Nor is a UCS-4 build exempt.  IIRC Guido specifically envisioned
Python strings being used to build up code point sequences to be
directly output, which means that a UCS-4 string might nonetheless
contain surrogates being added to a string intended to be sent as
UTF-16 output simply by truncating the 32-bit code units to 16 bits.
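
The literal-string case can be illustrated with a short sketch, assuming
the handler under its later name "surrogateescape" (Python 3.1+); the
variable names are made up for the example.

```python
from_literal = "\udcff"                                   # typed in source
from_os = b"\xff".decode("utf-8", "surrogateescape")      # escaped raw byte

# The two strings compare equal, so their history cannot be recovered.
indistinguishable = from_literal == from_os
# Encoding with the same handler turns either one into the byte 0xFF.
as_bytes = from_literal.encode("utf-8", "surrogateescape")
```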

From stephen at xemacs.org  Wed Apr 29 15:39:26 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 29 Apr 2009 22:39:26 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F73635.6010105@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>
	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>
	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>
	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T111118-510@post.gmane.org>
	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>
	<loom.20090427T154536-926@post.gmane.org>
	<p04330106c61b9f2aad0a@192.168.123.162>
	<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>
	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de>
Message-ID: <87bpqfk29d.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. Löwis" writes:

 > I find the case pretty artificial, though: if the locale encoding
 > changes, all file names will look incorrect to the user, so he'll
 > quickly switch back, or rename all the files.

It's not necessarily the case that the locale encoding changes, but
rather the name of the file.  I have a couple of directories where I
have Japanese in both EUC-JP and UTF-8, for example.  (The
applications where I never bothered to do a conversion from EUC to
UTF-8 are things like stripping MIME attachments from messages and
saving them to files when I changed my default.)

So I have a little Emacs Lisp function that tries EUC or UTF8
depending on date and falls back to the other on a decode error.
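
A rough Python equivalent of that fallback, as a sketch only. The
function name decode_filename and the idea of passing the preferred
order as a tuple are assumptions for illustration; the original is
Emacs Lisp and picks the order from the file date.

```python
def decode_filename(raw, encodings=("utf-8", "euc_jp")):
    """Return (text, encoding) for the first encoding that decodes raw."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("no candidate encoding could decode %r" % (raw,))


name = "\u65e5\u672c\u8a9e"          # the word "Japanese" in kanji
old_style = name.encode("euc_jp")    # as saved before the switch
new_style = name.encode("utf-8")     # as saved after the switch

result_old = decode_filename(old_style)
result_new = decode_filename(new_style)
```

This works for this name because its EUC-JP bytes are not valid UTF-8
and vice versa; for names where both decodings succeed, the order of
the candidates decides, which is why a date heuristic helps.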

Another possible situation would be a user program in the user's
locale communicating with a daemon running in some other locale (quite
likely POSIX).

So while out of scope of the PEP, I don't think it's at all
artificial.

From skip at pobox.com  Wed Apr 29 16:30:53 2009
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 29 Apr 2009 09:30:53 -0500 (CDT)
Subject: [Python-Dev] string to float containing whitespace
Message-ID: <20090429143053.3AE1D10276C9@montanaro.dyndns.org>

Someone please tell me I'm not going mad.  I could have sworn that once upon
a time attempting to convert numeric strings to ints or floats if they
contained whitespace raised an exception.  As far back as 1.5.2 it appears
that float(), string.atof() and string.atoi() allow whitespace.  Maybe I'm
thinking of trailing non-numeric, non-whitespace characters.

Skip

From amauryfa at gmail.com  Wed Apr 29 17:13:42 2009
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Wed, 29 Apr 2009 17:13:42 +0200
Subject: [Python-Dev] string to float containing whitespace
In-Reply-To: <20090429143053.3AE1D10276C9@montanaro.dyndns.org>
References: <20090429143053.3AE1D10276C9@montanaro.dyndns.org>
Message-ID: <e27efe130904290813l58274056s76c22e539362d349@mail.gmail.com>

Hi,

2009/4/29  <skip at pobox.com>:
> Someone please tell me I'm not going mad.  I could have sworn that once upon
> a time attempting to convert numeric strings to ints or floats if they
> contained whitespace raised an exception.  As far back as 1.5.2 it appears
> that float(), string.atof() and string.atoi() allow whitespace.  Maybe I'm
> thinking of trailing non-numeric, non-whitespace characters.

You are maybe referring to the Decimal constructor:
   decimal.Decimal(" 123")
fails with 2.5, but works with 2.6. (issue 1780)
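
A small sketch of the 2.6-and-later behaviour, written for Python 3.
The variable names are illustrative only.

```python
from decimal import Decimal, InvalidOperation

# Leading (and trailing) whitespace is accepted by the constructor.
padded = Decimal(" 123")
plain = Decimal("123")

# Whitespace inside the number is still rejected.
try:
    Decimal("12 3")
    embedded_ok = True
except InvalidOperation:
    embedded_ok = False
```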

-- 
Amaury Forgeot d'Arc

From skip at pobox.com  Wed Apr 29 17:26:17 2009
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 29 Apr 2009 10:26:17 -0500
Subject: [Python-Dev] string to float containing whitespace
In-Reply-To: <e27efe130904290813l58274056s76c22e539362d349@mail.gmail.com>
References: <20090429143053.3AE1D10276C9@montanaro.dyndns.org>
	<e27efe130904290813l58274056s76c22e539362d349@mail.gmail.com>
Message-ID: <18936.29081.376818.250362@montanaro.dyndns.org>

    Amaury> You are maybe referring to the Decimal constructor:
    Amaury>    decimal.Decimal(" 123")
    Amaury> fails with 2.5, but works with 2.6. (issue 1780)

Highly unlikely, since my recollection is from way back in the early days.
Also, I have yet to actually use the decimal module. :-/

Skip

From theandromedan at gmail.com  Wed Apr 29 17:42:11 2009
From: theandromedan at gmail.com (Paul Franz)
Date: Wed, 29 Apr 2009 11:42:11 -0400
Subject: [Python-Dev] Installing Python 2.5.4 from Source under Windows
Message-ID: <49F87553.7070501@gmail.com>

I have looked and looked and looked. But I cannot find any directions 
on how to install the version of Python built using Microsoft's 
compiler. It builds. I get the dlls and the exe's. But there is no 
documentation that says how to install what has been built. I have read 
every readme and stopped by the IRC channel and there seems to be nothing.

Any ideas where I can look?

Paul Franz

From aahz at pythoncraft.com  Wed Apr 29 18:03:00 2009
From: aahz at pythoncraft.com (Aahz)
Date: Wed, 29 Apr 2009 09:03:00 -0700
Subject: [Python-Dev] Installing Python 2.5.4 from Source under Windows
In-Reply-To: <49F87553.7070501@gmail.com>
References: <49F87553.7070501@gmail.com>
Message-ID: <20090429160300.GA10295@panix.com>

On Wed, Apr 29, 2009, Paul Franz wrote:
>
> I have looked and looked and looked. But I cannot find any directions  
> on how to install the version of Python built using Microsoft's  
> compiler. It builds. I get the dlls and the exe's. But there is no  
> documentation that says how to install what has been built. I have read  
> every readme and stopped by the IRC channel and there seems to be nothing.
>
> Any ideas where I can look?

Please use comp.lang.python -- python-dev is for discussion of core
development.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

From theandromedan at gmail.com  Wed Apr 29 19:08:11 2009
From: theandromedan at gmail.com (Paul Franz)
Date: Wed, 29 Apr 2009 13:08:11 -0400
Subject: [Python-Dev] Installing Python 2.5.4 from Source under Windows
In-Reply-To: <20090429160300.GA10295@panix.com>
References: <49F87553.7070501@gmail.com> <20090429160300.GA10295@panix.com>
Message-ID: <49F8897B.5080805@gmail.com>

Ok. I will ask on the python-list.

Paul Franz

Aahz wrote:
> On Wed, Apr 29, 2009, Paul Franz wrote:
>   
>> I have looked and looked and looked. But I cannot find any directions  
>> on how to install the version of Python built using Microsoft's  
>> compiler. It builds. I get the dlls and the exe's. But there is no  
>> documentation that says how to install what has been built. I have read  
>> every readme and stopped by the IRC channel and there seems to be nothing.
>>
>> Any ideas where I can look?
>>     
>
> Please use comp.lang.python -- python-dev is for discussion of core
> development.
>   

From larry at hastings.org  Wed Apr 29 22:01:38 2009
From: larry at hastings.org (Larry Hastings)
Date: Wed, 29 Apr 2009 13:01:38 -0700
Subject: [Python-Dev] Proposed: add support for UNC paths to all functions
	in ntpath
Message-ID: <49F8B222.7070204@hastings.org>

I've written a patch for Python 3.1 that changes os.path so it handles 
UNC paths on Windows:

   http://bugs.python.org/issue5799

In a Windows path string, a UNC path functions *exactly* like a drive
letter.  This patch means that the Python path split/join functions
treat them as if they were.

For instance:
    >>> splitdrive("A:\\FOO\\BAR.TXT")
    ("A:", "\\FOO\\BAR.TXT")

With this patch applied:
    >>> splitdrive("\\\\HOSTNAME\\SHARE\\FOO\\BAR.TXT")
    ("\\\\HOSTNAME\\SHARE", "\\FOO\\BAR.TXT")
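
The two calls above can be tried on any platform by importing ntpath
directly. This is a sketch that assumes a Python 3 release in which
the UNC-aware splitdrive described here is present; the host and share
names are the placeholders from the example.

```python
import ntpath  # the Windows flavour of os.path, importable anywhere

# Classic drive-letter path.
drive_letter = ntpath.splitdrive("A:\\FOO\\BAR.TXT")

# UNC path: the \\host\share prefix plays the role of the drive.
unc = ntpath.splitdrive("\\\\HOSTNAME\\SHARE\\FOO\\BAR.TXT")
```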

This methodology only breaks down in one place: there is no "default
directory" for a UNC share point.  E.g. you can say
    >>> os.chdir("c:")
or
    >>> os.chdir("c:foo\\bar")
but you can't say
    >>> os.chdir("\\\\hostname\\share")
But this is irrelevant to the patch.

Here's what my patch changes:
* Modify join, split, splitdrive, and ismount to add explicit support
  for UNC paths.  (The other functions pick up support from these four.)
* Simplify isabs and normpath, now that they don't need to be delicate
  about UNC paths.
* Modify existing unit tests and add new ones.
* Document the changes to the API.
* Deprecate splitunc, with a warning and a documentation remark.

This patch adds one subtle change I hadn't expected.  If you call
split() with a drive letter followed by a trailing slash, it returns the
trailing slash as part of the "head" returned.  E.g.
    >>> os.path.split("\\")
    ("\\", "")
    >>> os.path.split("A:\\")
    ("A:\\", "")
This is mentioned in the documentation, as follows:
    Trailing slashes are stripped from head unless it is the root
    (one or more slashes only).

For some reason, when os.path.split was called with a UNC path with only
a trailing slash, it stripped the trailing slash:
    >>> os.path.split("\\\\hostname\\share\\")
    ("\\\\hostname\\share", "")
My patch changes this behavior; you would now see:
    >>> os.path.split("\\\\hostname\\share\\")
    ("\\\\hostname\\share\\", "")
I think it's an improvement--this is more consistent.  Note that this
does *not* break the documented requirement that
os.path.join(*os.path.split(path)) == path; that continues to work fine.
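
A quick way to check the round-trip requirement for the cases
discussed, again using ntpath directly so it runs on any platform.
This is a sketch that assumes a Python 3 release with the patched
behaviour; the sample paths are illustrative.

```python
import ntpath

paths = [
    "\\",
    "A:\\",
    "A:\\FOO\\BAR.TXT",
    "\\\\hostname\\share\\",
    "\\\\hostname\\share\\FOO\\BAR.TXT",
]

# join() takes head and tail as separate arguments, hence the star.
rejoined = {p: ntpath.join(*ntpath.split(p)) for p in paths}

# The UNC root keeps its trailing slash in the head, like "A:\\" does.
unc_root = ntpath.split("\\\\hostname\\share\\")
```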

In the interests of full disclosure: I submitted a patch providing this
exact behavior just over ten years ago.  GvR accepted it into Python
1.5.2b2 (marked "*EXPERIMENTAL*") and removed it from 1.5.2c1.

You can read GvR's commentary upon removing it; see comments in
Misc/HISTORY <http://svn.python.org/view/python/trunk/Misc/HISTORY> dated "Tue Apr  6 19:38:18 1999".  If memory serves
correctly, the problems cited were only on Cygwin.  At the time Cygwin
used "ntpath", and it supported "//a/foo" as an alias for "A:\\FOO". 
You can see how this would cause Cygwin problems.

In the intervening decade, two highly relevant things have happened:
* Python no longer uses ntpath for os.path on Cygwin.  Instead it uses
  posixpath.
* Cygwin removed the "//a/foo" drive letter hack.  In fact, I believe it
  now supports UNC paths.
Therefore this patch will have no effect on Cygwin users.

What do you think?

/larry/

From martin at v.loewis.de  Wed Apr 29 22:06:33 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Apr 2009 22:06:33 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <49F80691.80403@g.nevcal.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>	<20090428075806.GB23828@phd.pp.ru>	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>	<49F74EE5.6060305@v.loewis.de>	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com>	<49F7613C.9000901@v.loewis.de>	<7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com>	<49F7E964.9050700@v.loewis.de>	<7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com>
	<49F7FF03.2090909@v.loewis.de> <49F80691.80403@g.nevcal.com>
Message-ID: <49F8B349.30901@v.loewis.de>

> In the first paragraph, you should make it clear that Python 3.0 does
> not use the Windows bytes interfaces, if it doesn't.  "Python uses
> *only* the wide character APIs..." would suffice.

That's not quite exact. It uses both ANSI and Wide APIs - depending
on whether you pass bytes as input or strings. Please see the Python
source code to find out how this works, and what that means.

> As stated, it seems
> like Python *does* use the wide character APIs, but leaves open the
> possibility that it might use byte APIs also.  A short description of
> what happens on Windows when Python code uses bytes APIs would also be
> helpful.

I'm at a loss how to make the text more clear than it already is. I'm
really not good at writing long essays, with a lot of
explanatory-but-non-normative text. I also think that explanations do
not belong in the section titled specification, nor does a full
description of the status quo belong in the PEP at all. The reader
should consult the current Python source code if in doubt what the
status quo is.

> In the second paragraph, it speaks of "currently" but then speaks of
> using the half-surrogates.  I don't believe that happens "currently".
> You did change tense, but that paragraph is quite confusing, currently,
> because of the tense change.  You should describe there, the action that
> is currently taken by Python for non-decodable bytes, and then in the
> next paragraph talk about what the PEP changes.

Thanks, fixed.

> The 4th paragraph is now confusing too... would it not be the decode
> error handler that returns the byte strings, in addition to the Unicode
> strings?

No, why do you think so? That's intended as stated.

> The 5th paragraph has apparently confused some people into thinking this
> PEP only applies to locales using UTF-8 encodings; you should have an
> "else clause" to clear that up, pointing out that the reverse encoding
> of half-surrogates by other encodings already produces errors, that
> UTF-8 is a special case, not the only case.

I have fixed that by extending the third paragraph.

> The code added to the discussion has mismatched (), making me wonder if
> it is complete.  There is a reasonable possibility that only the final )
> is missing.

Indeed; this is now also fixed.

Regards,
Martin

From martin at v.loewis.de  Wed Apr 29 22:15:12 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Apr 2009 22:15:12 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <7e51d15d0904290219v625d23cdy8812939da404e309@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de> <49F73635.6010105@v.loewis.de>
	<49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de>
	<49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de>
	<49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de>
	<49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de>
	<7e51d15d0904290219v625d23cdy8812939da404e309@mail.gmail.com>
Message-ID: <49F8B550.9070808@v.loewis.de>

>     Sure. However, that requires you to provide meaningful, reproducible
>     counter-examples, rather than a stenographic formulation that might
>     hint some problem you apparently see (which I believe is just not
>     there).
> 
> 
> Well, here's another one: PEP 383 would disallow UTF-8 encodings of half
> surrogates.  But such encodings are currently supported by Python, and
> they are used as part of CESU-8 coding.  That's, in fact, a common way
> of converting UTF-16 to UTF-8.  How are you going to deal with existing
> code that relies on being able to code half surrogates as UTF-8?

Can you please elaborate? What code specifically are you talking about?

Regards,
Martin

From martin at v.loewis.de  Wed Apr 29 22:28:54 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Apr 2009 22:28:54 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F82435.3060205@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com>
	<49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com>
	<49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com>
	<49F801C1.2070109@v.loewis.de> <49F82435.3060205@g.nevcal.com>
Message-ID: <49F8B886.5020700@v.loewis.de>

>>>>>>> C. File on disk with the invalid surrogate code, accessed via the
>>>>>>> str interface, no decoding happens, matches in memory the file on disk
>>>>>>> with the byte that translates to the same surrogate, accessed via the
>>>>>>> bytes interface.  Ambiguity.
>> What does that mean? What specific interface are you referring to to
>> obtain file names? 
> 
> os.listdir("")
> 
> os.listdir(b"")
> 
> So I guess I'd better suggest that a specific, equivalent directory name
> be passed in either bytes or str form.

[Leaving the issue of the empty string apparently having different
meanings aside ...]

Ok. Now I understand the example. So you do

os.listdir("c:/tmp")
os.listdir(b"c:/tmp")

and you have a file in c:/tmp that is named "abc\uDC10".

> So what you are saying here is that Python doesn't use the "A" forms of
> the Windows APIs for filenames, but only the "W" forms, and uses lossy
> decoding (from MS) to the current code page (which can never be UTF-8 on
> Windows).

Actually, it does use the A form, in the second listdir example. This,
in turn (inside Windows), uses the lossy CP_ACP encoding. You get back
a byte string; the listdirs should give

["abc\uDC10"]
[b"abc?"]

(not quite sure about the second - I only guess that CP_ACP will replace
the half surrogate with a question mark).

So where is the ambiguity here?
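
The str-in/str-out and bytes-in/bytes-out pairing can be seen with a
portable sketch like the following. It does not reproduce the
"abc\uDC10" file (that needs a Windows file system) and says nothing
about which Windows API is called underneath; the temporary directory
and the file name abc.txt are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one plain ASCII file name to list.
    open(os.path.join(tmp, "abc.txt"), "w").close()

    as_str = os.listdir(tmp)                  # str path -> list of str
    as_bytes = os.listdir(os.fsencode(tmp))   # bytes path -> list of bytes
```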

> You are further saying that Python doesn't give the programmer control
> over the codec that is used to convert from W results to bytes, so that
> on Windows, it is impossible to obtain a bytes result containing UTF-8
> from os.listdir, even though sys.setfilesystemencoding exists, and
> sys.getfilesystemencoding is affected by it, and the latter is
> documented as returning "mbcs", and as returning the codec that should
> be used by the application to convert str to bytes for filenames.
> (Python 3.0.1).

Not exactly. You *can* do setfilesystemencoding on Windows, but it has
no effect, as the Python file system encoding is never used on Windows.
For a string, it passes it to the W API as is; for bytes, it passes it
to the A API as-is. Python never invokes any codec here.

> While I can hear a "that is outside the scope of the PEP" coming, this
> documentation is confusing, to say the least.

Only because you are apparently unaware of the status quo. If you would
study the current Python source code, it would be all very clear.

> Things are a little clearer in the documentation for
> sys.setfilesystemencoding, which does say the encoding isn't used by
> Windows -- so why is it permitted to change it, if it has no effect?).

As in many cases: because nobody contributed code to make it behave
otherwise. It's not that the file system encoding is "mbcs" - the
file system encoding is simply unused on Windows (but that wasn't
always the case, in particular not when Windows 9x still had to
be supported).

Regards,
Martin

From martin at v.loewis.de  Wed Apr 29 22:35:17 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Apr 2009 22:35:17 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <87bpqfk29d.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <49EEBE2E.3090601@v.loewis.de>	<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>	<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>	<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>	<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>	<loom.20090427T111118-510@post.gmane.org>	<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>	<loom.20090427T154536-926@post.gmane.org>	<p04330106c61b9f2aad0a@192.168.123.162>	<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>	<49F73635.6010105@v.loewis.de>
	<87bpqfk29d.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <49F8BA05.2070906@v.loewis.de>

> So while out of scope of the PEP, I don't think it's at all
> artificial.

Sure - but I see this as the same case as "the file got renamed".
If you have a LRU list in your app, and a file gets renamed, then
the LRU list breaks (unless you also store the inode number in the
LRU list, and look up the file by inode number - or object UUID
on NTFS, possibly using distributed link tracking).
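
A sketch of the inode idea for the LRU-list case, assuming a file
system where a rename within one directory keeps the inode number.
The helper names file_key and find_by_key are hypothetical, and the
linear directory scan is only for illustration.

```python
import os
import tempfile


def file_key(path):
    """Identify a file by (device, inode) instead of by name."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def find_by_key(directory, key):
    """Return the current path of the file with this key, or None."""
    for entry in os.listdir(directory):
        candidate = os.path.join(directory, entry)
        if file_key(candidate) == key:
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "old-name.txt")
    new = os.path.join(tmp, "new-name.txt")
    open(old, "w").close()

    remembered = file_key(old)      # what the LRU list would store
    os.rename(old, new)             # the name changes underneath the app

    old_still_exists = os.path.exists(old)
    found_name = os.path.basename(find_by_key(tmp, remembered))
```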

Regards,
Martin

From tjreedy at udel.edu  Wed Apr 29 22:59:57 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 29 Apr 2009 16:59:57 -0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F842E1.1060008@g.nevcal.com>
References: <20090429113653.GA22908@cskk.homeip.net>
	<49F842E1.1060008@g.nevcal.com>
Message-ID: <gtaf4d$ppb$1@ger.gmane.org>

Glenn Linderman wrote:
> On approximately 4/29/2009 4:36 AM, came the following characters from 
> the keyboard of Cameron Simpson:
>> On 29Apr2009 02:56, Glenn Linderman <v+python at g.nevcal.com> wrote:
>>  
>>> os.listdir(b"")
>>>
>>> I find that on my Windows system, with all ASCII path file names, 
>>> that I  get quite different results when I pass os.listdir an empty 
>>> str vs an  empty bytes.
>>>
>>> Rather than keep you guessing, I get the root directory contents 
>>> from  the empty str, and the current directory contents from an empty 
>>> bytes.  That is rather unexpected.
>>>
>>> So I guess I'd better suggest that a specific, equivalent directory 
>>> name  be passed in either bytes or str form.
>>>     
>>
>> I think you may have uncovered an implementation bug rather than an
>> encoding issue (because I'd expect "" and b"" to be equivalent).
>>   
> 
> Me too.

Sounds like an issue for the tracker.

From tjreedy at udel.edu  Wed Apr 29 23:03:30 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 29 Apr 2009 17:03:30 -0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <7e51d15d0904290219v625d23cdy8812939da404e309@mail.gmail.com>
References: <49EEBE2E.3090601@v.loewis.de> <49F73635.6010105@v.loewis.de>
	<49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de>
	<49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de>
	<49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de>
	<49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de>
	<7e51d15d0904290219v625d23cdy8812939da404e309@mail.gmail.com>
Message-ID: <gtafb2$ppb$2@ger.gmane.org>

Thomas Breuel wrote:
> 
>     Sure. However, that requires you to provide meaningful, reproducible
>     counter-examples, rather than a stenographic formulation that might
>     hint some problem you apparently see (which I believe is just not
>     there).
> 
> 
> Well, here's another one: PEP 383 would disallow UTF-8 encodings of half 
> surrogates. 

By my reading, the current Unicode 5.1 definition of 'UTF-8' disallows that.
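
For reference, a sketch of how this looks in current Python 3
releases, where (as far as I know) the strict UTF-8 codec rejects lone
surrogates and the "surrogatepass" handler allows them. The helper
cesu8_encode is hypothetical, written only to show what the CESU-8
form of a supplementary character looks like.

```python
lone = "\ud800"  # a lone high surrogate

try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Explicitly asking for surrogates to be passed through.
passed = lone.encode("utf-8", "surrogatepass")


def cesu8_encode(text):
    """Encode each UTF-16 code unit of text as its own UTF-8 sequence."""
    utf16 = text.encode("utf-16-be")
    units = [int.from_bytes(utf16[i:i + 2], "big")
             for i in range(0, len(utf16), 2)]
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


supplementary = "\U00010000"
as_utf8 = supplementary.encode("utf-8")   # one 4-byte sequence
as_cesu8 = cesu8_encode(supplementary)    # two 3-byte sequences
```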

> But such encodings are currently supported by Python, and 
> they are used as part of CESU-8 coding.  That's, in fact, a common way 
> of converting UTF-16 to UTF-8.  How are you going to deal with existing 
> code that relies on being able to code half surrogates as UTF-8?

From v+python at g.nevcal.com  Wed Apr 29 23:09:26 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 29 Apr 2009 14:09:26 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F8B886.5020700@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com>
	<49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com>
	<49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com>
	<49F801C1.2070109@v.loewis.de> <49F82435.3060205@g.nevcal.com>
	<49F8B886.5020700@v.loewis.de>
Message-ID: <49F8C206.5070801@g.nevcal.com>

On approximately 4/29/2009 1:28 PM, came the following characters from 
the keyboard of Martin v. Löwis:
>>>>>>>> C. File on disk with the invalid surrogate code, accessed via the
>>>>>>>> str interface, no decoding happens, matches in memory the file on disk
>>>>>>>> with the byte that translates to the same surrogate, accessed via the
>>>>>>>> bytes interface.  Ambiguity.
>>> What does that mean? What specific interface are you referring to to
>>> obtain file names? 
>> os.listdir("")
>>
>> os.listdir(b"")
>>
>> So I guess I'd better suggest that a specific, equivalent directory name
>> be passed in either bytes or str form.
> 
> [Leaving the issue of the empty string apparently having different
> meanings aside ...]
> 
> Ok. Now I understand the example. So you do
> 
> os.listdir("c:/tmp")
> os.listdir(b"c:/tmp")
> 
> and you have a file in c:/tmp that is named "abc\uDC10".
> 
>> So what you are saying here is that Python doesn't use the "A" forms of
>> the Windows APIs for filenames, but only the "W" forms, and uses lossy
>> decoding (from MS) to the current code page (which can never be UTF-8 on
>> Windows).
> 
> Actually, it does use the A form, in the second listdir example. This,
> in turn (inside Windows), uses the lossy CP_ACP encoding. You get back
> a byte string; the listdirs should give
> 
> ["abc\uDC10"]
> [b"abc?"]
> 
> (not quite sure about the second - I only guess that CP_ACP will replace
> the half surrogate with a question mark).
> 
> So where is the ambiguity here?

None.  But not everyone can read all the Python source code to try to 
understand it; they expect the documentation to help them avoid that. 
Because the documentation is lacking in this area, it makes your 
concisely stated PEP rather hard to understand.

Thanks for clarifying the Windows behavior, here.  A little more 
clarification in the PEP could have avoided lots of discussion.  It 
would seem that a PEP, proposed to modify a poorly documented (and 
therefore likely poorly understood) area, should be educational about 
the status quo, as well as presenting the suggested change.  Or is it 
the Python philosophy that the PEPs should be as incomprehensible as 
possible, to generate large discussions?

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From martin at v.loewis.de  Wed Apr 29 23:17:32 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Apr 2009 23:17:32 +0200
Subject: [Python-Dev] string to float containing whitespace
In-Reply-To: <20090429143053.3AE1D10276C9@montanaro.dyndns.org>
References: <20090429143053.3AE1D10276C9@montanaro.dyndns.org>
Message-ID: <49F8C3EC.8020001@v.loewis.de>

skip at pobox.com wrote:
> Someone please tell me I'm not going mad.  I could have sworn that once upon
> a time attempting to convert numeric strings to ints or floats if they
> contained whitespace raised an exception.  As far back as 1.5.2 it appears
> that float(), string.atof() and string.atoi() allow whitespace.  Maybe I'm
> thinking of trailing non-numeric, non-whitespace characters.

Maybe you remember truly *embedded* whitespace:

py> float("1. 3")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for float(): 1. 3
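
The same distinction as a small Python 3 sketch: surrounding
whitespace is accepted, embedded whitespace is not. The variable
names are illustrative only.

```python
# Leading and trailing whitespace are both tolerated.
leading = float("  1.5")
trailing = float("1.5\n")
as_int = int(" 42 ")

# Whitespace inside the literal is an error.
try:
    float("1. 3")
    embedded_ok = True
except ValueError:
    embedded_ok = False
```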

Regards,
Martin

From martin at v.loewis.de  Wed Apr 29 23:19:25 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Apr 2009 23:19:25 +0200
Subject: [Python-Dev] a suggestion ... Re:  PEP 383 (again)
In-Reply-To: <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <49F8C45D.6060302@v.loewis.de>

> The whole purpose of PEP 383 is to send the exact same bytes that were
> read from the OS back to the OS => violating (2) (for whatever the
> apparent system file-encoding is, not limited to UTF-8), and that has
> overwhelmingly popular support.
> 
> Note that this won't happen automatically, either, AIUI.  The PEP's
> proposed implementation is as an error handler, and this would need to
> be specified explicitly.  It's not intended to be the default.

Actually, no: the error handler will be automatically used in all places
that convert file names to bytes. I have clarified the PEP to make that
explicit. IOW, it replaces the current "strict" setting in these cases.
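
A sketch of the byte-for-byte round trip, assuming the handler is
available under the name "surrogateescape" as in released Python 3.
The helper roundtrip and the sample file name are made up for
illustration. In released Python 3, as far as I know, os.fsdecode and
os.fsencode apply this handler for you on POSIX systems.

```python
def roundtrip(raw, encoding):
    """Decode OS bytes to str and encode the result back to bytes."""
    text = raw.decode(encoding, "surrogateescape")
    return text.encode(encoding, "surrogateescape")


# Bytes that are valid in neither ASCII nor UTF-8.
raw_name = b"caf\xe9-\xff.txt"

# With the default "strict" handler the name cannot be decoded at all.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The round trip is not limited to UTF-8.
results = {enc: roundtrip(raw_name, enc) for enc in ("ascii", "utf-8")}
```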

Regards,
Martin

From cs at zip.com.au  Wed Apr 29 23:49:31 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Thu, 30 Apr 2009 07:49:31 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <gtafb2$ppb$2@ger.gmane.org>
Message-ID: <20090429214931.GA3303@cskk.homeip.net>

On 29Apr2009 17:03, Terry Reedy <tjreedy at udel.edu> wrote:
> Thomas Breuel wrote:
>>     Sure. However, that requires you to provide meaningful, reproducible
>>     counter-examples, rather than a stenographic formulation that might
>>     hint some problem you apparently see (which I believe is just not
>>     there).
>>
>> Well, here's another one: PEP 383 would disallow UTF-8 encodings of 
>> half surrogates. 
>
> By my reading, the current Unicode 5.1 definition of 'UTF-8' disallows that.

5.0 also disallows it. No surprise I guess.
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Out on the road, feeling the breeze, passing the cars.  - Bob Seger

From v+python at g.nevcal.com  Thu Apr 30 00:17:42 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 29 Apr 2009 15:17:42 -0700
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <49F8B349.30901@v.loewis.de>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>	<20090428075806.GB23828@phd.pp.ru>	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>	<49F74EE5.6060305@v.loewis.de>	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com>	<49F7613C.9000901@v.loewis.de>	<7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com>	<49F7E964.9050700@v.loewis.de>	<7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com>
	<49F7FF03.2090909@v.loewis.de> <49F80691.80403@g.nevcal.com>
	<49F8B349.30901@v.loewis.de>
Message-ID: <49F8D206.2000104@g.nevcal.com>

On approximately 4/29/2009 1:06 PM, came the following characters from 
the keyboard of Martin v. Löwis:

 > Thanks, fixed.

Thanks for your fixes.  They are helpful.

> I'm at a loss how to make the text more clear than it already is. I'm
> really not good at writing long essays, with a lot of
> explanatory-but-non-normative text. I also think that explanations do
> not belong in the section titled specification, nor does a full
> description of the status quo belong in the PEP at all. The reader
> should consult the current Python source code if in doubt what the
> status quo is.

The status quo is what justifies the existence of the PEP.  If the 
status quo were perfect, there would be no need for the PEP.

The status quo should be described in the Rationale.  Some of it is. 
The rest of it should be.  If there is a need for this PEP for POSIX, 
but not Windows, the reason why should be given (Para 2 in Rationale 
seems to try to describe that, but doesn't go far enough), and also the 
reason that cross-platform code can install this PEP's error handler on 
both platforms, yet it won't affect bytes interfaces on Windows.  These 
are two omissions that have both caused large amounts of discussion.

Attempting to understand the Python source code is a good thing, but 
there is a lot to understand, and few will achieve a full understanding.

>> The 4th paragraph is now confusing too... would it not be the decode
>> error handler that returns the byte strings, in addition to the Unicode
>> strings?
> 
> No, why do you think so? That's intended as stated.

Here, a use case, or several, in the PEP could help clarify why it would 
be the encode error handler that would return both the bytes string and 
the Unicode string.  And why the decode error handler would not need to.

Seems that if the decode handler preserved the bytes from the OS, and 
made them available as well as the decoded Unicode, that could be 
interesting to the application that is wanting to manipulate the file.

The encode handler is given the Unicode, so it is not clear why it
should also return it.  I guess if there is an error during the encode
process (can there be?) then having both the bytes and the Unicode for
comparison could be useful for error reporting.

But I shouldn't have to guess.  The PEP should explain how these things 
are useful.  The discussion section could be extended with use cases for 
both the encode and decode cases.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From cs at zip.com.au  Thu Apr 30 00:45:32 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Thu, 30 Apr 2009 08:45:32 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <87d4avk3f9.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20090429224532.GA11604@cskk.homeip.net>

On 29Apr2009 22:14, Stephen J. Turnbull <stephen at xemacs.org> wrote:
| Baptiste Carvello writes:
|  > By contrast, if the new utf-8b codec would *supersede* the old one,
|  > \udcxx would always mean raw bytes (at least on UCS-4 builds, where
|  > surrogates are unused). Thus ambiguity could be avoided.
| 
| Unfortunately, that's false.  It could have come from a literal string
| (similar to the text above ;-), a C extension, or a string slice (on
| 16-bit builds), and there may be other ways to do it.  The only way to
| avoid ambiguity is to change the definition of a Python string to be
| *valid* Unicode (possibly with Python extensions such as PEP 383 for
| internal use only).  But Guido has rejected that in the past;
| validation is the application's problem, not Python's.
| 
| Nor is a UCS-4 build exempt.  IIRC Guido specifically envisioned
| Python strings being used to build up code point sequences to be
| directly output, which means that a UCS-4 string might none-the-less
| contain surrogates being added to a string intended to be sent as
| UTF-16 output simply by truncating the 32-bit code units to 16 bits.

Wouldn't you then be bypassing the implicit encoding anyway, at least to
some extent, and thus not trip over the PEP?
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Clemson is the Harvard of cardboard packaging.
- overheard by WIRED at the Intelligent Printing conference Oct2006

From fuzzyman at voidspace.org.uk  Thu Apr 30 00:50:08 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Wed, 29 Apr 2009 23:50:08 +0100
Subject: [Python-Dev] Proposed: add support for UNC paths to all
 functions in ntpath
In-Reply-To: <49F8B222.7070204@hastings.org>
References: <49F8B222.7070204@hastings.org>
Message-ID: <49F8D9A0.7000104@voidspace.org.uk>

Larry Hastings wrote:
>
> I've written a patch for Python 3.1 that changes os.path so it handles 
> UNC paths on Windows:
>
>   http://bugs.python.org/issue5799

+1 for the feature. I have to deal with Windows networks from time to 
time and this would be useful.

Michael

>
> In a Windows path string, a UNC path functions *exactly* like a drive
> letter.  This patch means that the Python path split/join functions
> treat UNC paths as if they were drive letters.
>
> For instance:
>    >>> splitdrive("A:\\FOO\\BAR.TXT")
>    ("A:", "\\FOO\\BAR.TXT")
>
> With this patch applied:
>    >>> splitdrive("\\\\HOSTNAME\\SHARE\\FOO\\BAR.TXT")
>    ("\\\\HOSTNAME\\SHARE", "\\FOO\\BAR.TXT")
>
> This methodology only breaks down in one place: there is no "default
> directory" for a UNC share point.  E.g. you can say
>    >>> os.chdir("c:")
> or
>    >>> os.chdir("c:foo\\bar")
> but you can't say
>    >>> os.chdir("\\\\hostname\\share")
> But this is irrelevant to the patch.
>
> Here's what my patch changes:
> * Modify join, split, splitdrive, and ismount to add explicit support
>  for UNC paths.  (The other functions pick up support from these four.)
> * Simplify isabs and normpath, now that they don't need to be delicate
>  about UNC paths.
> * Modify existing unit tests and add new ones.
> * Document the changes to the API.
> * Deprecate splitunc, with a warning and a documentation remark.
>
> This patch adds one subtle change I hadn't expected.  If you call
> split() with a drive letter followed by a trailing slash, it returns the
> trailing slash as part of the "head" returned.  E.g.
>    >>> os.path.split("\\")
>    ("\\", "")
>    >>> os.path.split("A:\\")
>    ("A:\\", "")
> This is mentioned in the documentation, as follows:
>    Trailing slashes are stripped from head unless it is the root
>    (one or more slashes only).
>
> For some reason, when os.path.split was called with a UNC path with only
> a trailing slash, it stripped the trailing slash:
>    >>> os.path.split("\\\\hostname\\share\\")
>    ("\\\\hostname\\share", "")
> My patch changes this behavior; you would now see:
>    >>> os.path.split("\\\\hostname\\share\\")
>    ("\\\\hostname\\share\\", "")
> I think it's an improvement--this is more consistent.  Note that this
> does *not* break the documented requirement that
> os.path.join(*os.path.split(path)) == path; that continues to work fine.
>
>
> In the interests of full disclosure: I submitted a patch providing this
> exact behavior just over ten years ago.  GvR accepted it into Python
> 1.5.2b2 (marked "*EXPERIMENTAL*") and removed it from 1.5.2c1.
>
> You can read GvR's commentary upon removing it; see comments in
> Misc/HISTORY <http://svn.python.org/view/python/trunk/Misc/HISTORY> 
> dated "Tue Apr  6 19:38:18 1999".  If memory serves
> correctly, the problems cited were only on Cygwin.  At the time Cygwin
> used "ntpath", and it supported "//a/foo" as an alias for "A:\\FOO". 
> You can see how this would cause Cygwin problems.
>
> In the intervening decade, two highly relevant things have happened:
> * Python no longer uses ntpath for os.path on Cygwin.  Instead it uses
>  posixpath.
> * Cygwin removed the "//a/foo" drive letter hack.  In fact, I believe it
>  now supports UNC paths.
> Therefore this patch will have no effect on Cygwin users.
>
>
> What do you think?
>
>
> /larry/
>
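
The splitdrive and split results quoted above can be checked on any
platform, since ntpath is importable outside Windows. This is only a
sketch, assuming a current Python 3 in which ntpath treats a UNC share
point like a drive; split_examples is a hypothetical helper name, not
part of the patch.

```python
import ntpath  # Windows path rules; importable on POSIX as well


def split_examples():
    """Return splitdrive/split results for the paths quoted in the thread."""
    return {
        # Plain drive letter: the drive is "A:".
        "drive": ntpath.splitdrive("A:\\FOO\\BAR.TXT"),
        # UNC path: host plus share plays the role of the drive.
        "unc": ntpath.splitdrive("\\\\HOSTNAME\\SHARE\\FOO\\BAR.TXT"),
        # UNC share root with only a trailing slash.
        "unc_root": ntpath.split("\\\\hostname\\share\\"),
    }
```

Printing split_examples() shows each (drive, rest) or (head, tail) pair;
joining the "unc_root" pair back together with ntpath.join(*pair) should
reproduce the original string.
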

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From barry at barrys-emacs.org  Thu Apr 30 00:41:16 2009
From: barry at barrys-emacs.org (Barry Scott)
Date: Wed, 29 Apr 2009 23:41:16 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49EEBE2E.3090601@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
Message-ID: <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org>

On 22 Apr 2009, at 07:50, Martin v. Löwis wrote:

>
> If the locale's encoding is UTF-8, the file system encoding is set to
> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
>

Forgive me if this has been covered. I've been reading this thread for
a long time and still have 100-odd replies to go...

How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?

I'm guessing that an app has to understand that filenames come in two
forms, unicode and bytes, if it's not UTF-8 data. Why not simply return
a string if it's valid UTF-8 and otherwise return bytes? Then in the app
you check the type of the object, string or bytes, and deal with
reporting errors appropriately.

Barry

From eric at trueblade.com  Thu Apr 30 00:59:25 2009
From: eric at trueblade.com (Eric Smith)
Date: Wed, 29 Apr 2009 18:59:25 -0400
Subject: [Python-Dev] Proposed: add support for UNC paths to all
 functions in ntpath
In-Reply-To: <49F8D9A0.7000104@voidspace.org.uk>
References: <49F8B222.7070204@hastings.org> <49F8D9A0.7000104@voidspace.org.uk>
Message-ID: <49F8DBCD.6050504@trueblade.com>

Michael Foord wrote:
> Larry Hastings wrote:
>>
>> I've written a patch for Python 3.1 that changes os.path so it handles 
>> UNC paths on Windows:
>>
>>   http://bugs.python.org/issue5799
> 
> +1 for the feature. I have to deal with Windows networks from time to 
> time and this would be useful.

+1 from me, too. I haven't looked at the implementation, but for sure 
the feature would be welcome.

>> In the interests of full disclosure: I submitted a patch providing this
>> exact behavior just over ten years ago.  GvR accepted it into Python
>> 1.5.2b2 (marked "*EXPERIMENTAL*") and removed it from 1.5.2c1.

>> In the intervening decade, two highly relevant things have happened:
>> * Python no longer uses ntpath for os.path on Cygwin.  Instead it uses
>>  posixpath.
>> * Cygwin removed the "//a/foo" drive letter hack.  In fact, I believe it
>>  now supports UNC paths.
>> Therefore this patch will have no effect on Cygwin users.

Yes, cygwin supports UNC paths with //host/share, and they use 
/cygdrive/a, etc., to refer to physical drives. It's been that way for 
as long as I recall, at least 7 years.

From cs at zip.com.au  Thu Apr 30 01:28:52 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Thu, 30 Apr 2009 09:28:52 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org>
Message-ID: <20090429232852.GA26172@cskk.homeip.net>

On 29Apr2009 23:41, Barry Scott <barry at barrys-emacs.org> wrote:
> On 22 Apr 2009, at 07:50, Martin v. Löwis wrote:
>> If the locale's encoding is UTF-8, the file system encoding is set to
>> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
>> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
>
> Forgive me if this has been covered. I've been reading this thread for
> a long time and still have 100-odd replies to go...
>
> How do I get a printable Unicode version of these path strings if they
> contain non-Unicode data?

Personally, I'd use repr(). One might ask, what would you expect to see
if you were printing such a string?

> I'm guessing that an app has to understand that filenames come in two
> forms, unicode and bytes, if it's not UTF-8 data. Why not simply return
> a string if it's valid UTF-8 and otherwise return bytes? Then in the app
> you check the type of the object, string or bytes, and deal with
> reporting errors appropriately.

Because it complicates the app enormously, for every app.
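
To make that cost concrete, here is a minimal sketch of the "string if
valid, otherwise bytes" idea. Both decode_or_bytes and describe are
hypothetical names invented for illustration; nothing like them is
proposed in the thread.

```python
def decode_or_bytes(raw):
    """Hypothetical helper: str if raw is valid UTF-8, else the bytes unchanged."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw


def describe(names):
    """Every consumer of such names needs a type branch like this one."""
    out = []
    for name in names:
        if isinstance(name, bytes):
            out.append("undecodable: " + repr(name))
        else:
            out.append("text: " + name)
    return out
```

The isinstance() branch is the part that would have to be repeated
wherever a filename is displayed, joined, compared or logged.
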

It would be _nice_ to just call os.listdir() et al with strings, get
strings, and not worry.

With strings becoming unicode in Python3, on POSIX you have an issue of
deciding how to get its filenames-are-bytes into a string and the
reverse. One could naively map the byte values to the same Unicode code
points, but that results in strings that do not contain the same
characters as the user/app expects for byte values above 127.
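
As a small illustration of that naive mapping (which is what decoding as
Latin-1 does), assume a filename that was created as UTF-8 text. The
variable names are made up for this sketch.

```python
raw = "caf\u00e9".encode("utf-8")  # the bytes a UTF-8 user stored: b'caf\xc3\xa9'

# Naive mapping: each byte becomes the code point with the same value.
naive = raw.decode("latin-1")

# Decoding with the convention the user actually used.
intended = raw.decode("utf-8")
```

Here naive has five characters (the last two are U+00C3 and U+00A9),
while intended has the four characters the user typed.
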

Since POSIX does not really have a filesystem level character encoding,
just a user environment setting that says how the current user encodes
characters into bytes (UTF-8 is increasingly common and useful, but
it is not universal), it is more useful to decode filenames on the
assumption that they represent characters in the user's (current) encoding
convention; that way when things are displayed they are meaningful,
and they interoperate well with strings made by the user/app. If all
the filenames were actually encoded that way when made, that works. But
different users may adopt different conventions, and indeed a user may
have used ASCII or an ISO8859-* encoding in the past and be transitioning
to something else now, so they will have a bunch of files in different
encodings.

The PEP uses the user's current encoding, with a handler that maps byte
sequences that don't decode to valid Unicode scalar values in a fashion
that is reversible. That is, you get "strings" out of listdir() and
those strings will go back in (eg to open()) perfectly robustly.
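
The reversibility can be sketched with the error handler that current
Python 3 releases call "surrogateescape" (the thread refers to the same
idea as utf-8b). The roundtrip function is a hypothetical name used only
for this example.

```python
def roundtrip(raw, encoding="utf-8"):
    """Decode bytes to str and encode them back, escaping undecodable bytes."""
    # Undecodable bytes 0x80..0xFF become lone surrogates U+DC80..U+DCFF.
    name = raw.decode(encoding, errors="surrogateescape")
    # Encoding with the same handler turns those surrogates back into bytes.
    back = name.encode(encoding, errors="surrogateescape")
    return name, back
```

For b"caf\xff.txt" the decoded name contains U+DCFF where the bad byte
was, and the re-encoded bytes equal the input.
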

Previous approaches would either silently hide non-decodable names in
listdir() results or throw exceptions when the decode failed or mangle
things non-reversibly. I believe Python3 went with the first option
there.

The PEP at least lets programs naively access all files that exist,
and create a filename from any well-formed unicode string provided that
the filesystem encoding permits the name to be encoded.

The lengthy discussion mostly revolves around:

  - Glenn points out that strings that came _not_ from listdir, and that are
    _not_ well-formed unicode (== "have bare surrogates in them") but that
    were intended for use as filenames will conflict with the PEP's scheme -
    programs must know that these strings came from outside and must be
    translated into the PEP's funny-encoding before use in the os.*
    functions. Previous to the PEP they would get used directly and
    encode differently after the PEP, thus producing different POSIX
    filenames. Breakage.

  - Glenn would like the encoding to use Unicode scalar values only,
    using a rare-in-filenames character.
    That would avoid the issue with "outside" strings that contain
    surrogates. To my mind it just moves the punning from rare illegal
    strings to merely uncommon but legal characters.

  - Some parties think it would be better to not return strings from
    os.listdir but a subclass of string (or at least a duck-type of
    string) that knows where it came from and is also handily
    recognisable as not-really-a-string for purposes of deciding
    whether it is PEP-funny-encoded by direct inspection.
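
The first and third points can be sketched in a few lines. This assumes
the handler names of current Python 3: "surrogateescape" for the PEP's
scheme, and "surrogatepass" used here only as a stand-in for "encode the
surrogate code point literally", which is an assumption about what an
outside string's author might have intended. has_escaped_bytes is a
hypothetical helper.

```python
def has_escaped_bytes(name):
    """True if name contains a lone surrogate in U+DC80..U+DCFF."""
    return any("\udc80" <= ch <= "\udcff" for ch in name)


# A string built by the program itself, not read from the OS.
outside = "report\udcff"

# Stand-in for encoding the code point literally (assumption, see above).
literal = outside.encode("utf-8", errors="surrogatepass")

# What the PEP's scheme produces for the same string.
escaped = outside.encode("utf-8", errors="surrogateescape")
```

The two byte strings differ (three trailing bytes versus one), which is
the pun the first point describes; has_escaped_bytes() is the kind of
direct inspection the third point wants to avoid relying on.
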

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

The peever can look at the best day in his life and sneer at it.
        - Jim Hill, JennyGfest '95

From aahz at pythoncraft.com  Thu Apr 30 04:50:50 2009
From: aahz at pythoncraft.com (Aahz)
Date: Wed, 29 Apr 2009 19:50:50 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
In-Reply-To: <20090429232852.GA26172@cskk.homeip.net>
References: <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org>
	<20090429232852.GA26172@cskk.homeip.net>
Message-ID: <20090430025050.GB1544@panix.com>

On Thu, Apr 30, 2009, Cameron Simpson wrote:
>
> The lengthy discussion mostly revolves around:
> 
>   - Glenn points out that strings that came _not_ from listdir, and that are
>     _not_ well-formed unicode (== "have bare surrogates in them") but that
>     were intended for use as filenames will conflict with the PEP's scheme -
>     programs must know that these strings came from outside and must be
>     translated into the PEP's funny-encoding before use in the os.*
>     functions. Previous to the PEP they would get used directly and
>     encode differently after the PEP, thus producing different POSIX
>     filenames. Breakage.
> 
>   - Glenn would like the encoding to use Unicode scalar values only,
>     using a rare-in-filenames character.
>     That would avoid the issue with "outside" strings that contain
>     surrogates. To my mind it just moves the punning from rare illegal
>     strings to merely uncommon but legal characters.
> 
>   - Some parties think it would be better to not return strings from
>     os.listdir but a subclass of string (or at least a duck-type of
>     string) that knows where it came from and is also handily
>     recognisable as not-really-a-string for purposes of deciding
>     whether it is PEP-funny-encoded by direct inspection.

Assuming people agree that this is an accurate summary, it should be
incorporated into the PEP.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

From aahz at pythoncraft.com  Thu Apr 30 05:16:29 2009
From: aahz at pythoncraft.com (Aahz)
Date: Wed, 29 Apr 2009 20:16:29 -0700
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <49F8B349.30901@v.loewis.de>
References: <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>
	<49F74EE5.6060305@v.loewis.de>
	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com>
	<49F7613C.9000901@v.loewis.de>
	<7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com>
	<49F7E964.9050700@v.loewis.de>
	<7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com>
	<49F7FF03.2090909@v.loewis.de> <49F80691.80403@g.nevcal.com>
	<49F8B349.30901@v.loewis.de>
Message-ID: <20090430031629.GB25125@panix.com>

On Wed, Apr 29, 2009, "Martin v. Löwis" wrote:
>
> I'm at a loss how to make the text more clear than it already is. I'm
> really not good at writing long essays, with a lot of
> explanatory-but-non-normative text. I also think that explanations do
> not belong in the section titled specification, nor does a full
> description of the status quo belong in the PEP at all. The reader
> should consult the current Python source code if in doubt what the
> status quo is.

Perhaps not a full description of the status quo, but the PEP definitely
needs a good summary -- remember that PEPs are not just for the time that
they are written, but also for the future.  While telling people to "read
the source, Luke" makes some sense at a specific point in time, I don't
think that requiring a trawl through code history is fair.

And, yes, PEP-writing is painful.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

From tmbdev at gmail.com  Thu Apr 30 05:16:20 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Thu, 30 Apr 2009 05:16:20 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> 
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>

>
> The whole purpose of PEP 383 is to send the exact same bytes that were
> read from the OS back to the OS => violating (2) (for whatever the
> apparent system file-encoding is, not limited to UTF-8),

It's fine to read a file name from a file system and write the same file
back as the same raw byte sequence.  That I don't have a problem with; it's
not quite right, but it's harmless.

The problem with this PEP is that the malformed unicode it produces can end
up in so many other places: as file names on another file system, in string
processing libraries, in text files, in databases, in user interfaces,
etc.   Some of those destinations will use the utf-8b decoder, so they will
get byte sequences that never could occur before and that are illegal under
unicode.
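
One way to see what happens when such a string reaches a destination
that does not know about the scheme: in current Python 3, the strict
UTF-8 encoder refuses lone surrogates. The sketch below relies on that
behaviour; try_strict_utf8 is a hypothetical name.

```python
def try_strict_utf8(text):
    """Return the strict UTF-8 encoding of text, or None if it is rejected."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        # Lone surrogates (such as escaped filename bytes) end up here.
        return None
```

So a filename containing U+DCFF cannot be written to a strictly encoded
text file or database field without an explicit decision by the caller.
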

Nobody knows what will happen.  And, yes, Martin is proposing that this is
the default behavior.

There are several other issues that are unresolved: utf-8b makes some
current practices illegal; for example, it might break CESU-8 encodings.
Also, what are Jython and IronPython supposed to do on UNIX?  Can they
implement these semantics at all?

> and that has overwhelmingly popular support.

I think people don't fully understand the tradeoffs.  I certainly don't.
Although there is a slight benefit, there are unknown and potentially large
costs. We'd be changing Python's entire unicode string behavior for the sake
of one use case.  Since our uses of Python actually involve a lot of
unicode, I am wary of having malformed unicode crop up legally in Python
code.

And that's why I think this proposal should be shelved for a while until
people have had more time to try to understand the issues and also come up
with alternative proposals.  Once this is adopted and implemented in
C-Python, Python is stuck with it forever.

Tom

From curt at hagenlocher.org  Thu Apr 30 05:40:07 2009
From: curt at hagenlocher.org (Curt Hagenlocher)
Date: Wed, 29 Apr 2009 20:40:07 -0700
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
Message-ID: <d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>

On Wed, Apr 29, 2009 at 8:16 PM, Thomas Breuel <tmbdev at gmail.com> wrote:
>
> Also, what are Jython and IronPython supposed to do on UNIX?  Can they
> implement these semantics at all?

IronPython will inherit whatever behavior Mono has implemented. The
Microsoft CLR defines the native string type as UTF-16 and all of the
managed APIs for things like file names and environmental variables
operate on UTF-16 strings -- there simply are no byte string APIs.

I assume that Mono does the same but I don't have any Mono experience.

--
Curt Hagenlocher
curt at hagenlocher.org

From steve at pearwood.info  Thu Apr 30 05:45:52 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 30 Apr 2009 13:45:52 +1000
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
Message-ID: <200904301345.52776.steve@pearwood.info>

On Thu, 30 Apr 2009 01:16:20 pm Thomas Breuel wrote:
> And that's why I think this proposal should be shelved for a while
> until people have had more time to try to understand the issues and
> also come up with alternative proposals.  Once this is adopted and
> implemented in C-Python, Python is stuck with it forever.

+1 on this. I'm going to quote the Zen here:

Now is better than never.
Although never is often better than *right* now.

I don't understand the proposal and issues. I see a lot of people 
claiming that they do, and then spending all their time either 
talking past each other, or disagreeing. If everyone who claims they 
understand the issues actually does, why is it so hard to reach a 
consensus?

I'd like to see some real examples of how things can break in the 
current system, and I'd like any potential solution to be made 
available as a third-party package before it goes into the standard 
library (if possible). Currently, we're reduced to trying to predict 
the consequences of implementing the PEP, instead of being able to 
try it out and see.

Even something like a test suite would be useful: here are a bunch of 
malformed file names, and this is what happens when you try to work 
with them. Please, let's see some code we can run, not more words.

-- 
Steven D'Aprano 

From tjreedy at udel.edu  Thu Apr 30 05:46:44 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 29 Apr 2009 23:46:44 -0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F8C206.5070801@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>	<49F73635.6010105@v.loewis.de>
	<49F74F85.9010800@g.nevcal.com>	<49F76623.8060903@v.loewis.de>
	<49F768F3.8080304@g.nevcal.com>	<49F76F03.8040702@v.loewis.de>
	<49F788A6.3040702@g.nevcal.com>	<49F7EB17.4010309@v.loewis.de>
	<49F7F99D.8070606@g.nevcal.com>	<49F801C1.2070109@v.loewis.de>
	<49F82435.3060205@g.nevcal.com>	<49F8B886.5020700@v.loewis.de>
	<49F8C206.5070801@g.nevcal.com>
Message-ID: <gtb6v5$gu2$1@ger.gmane.org>

Glenn Linderman wrote:
> On approximately 4/29/2009 1:28 PM, came the following characters from 

>> So where is the ambiguity here?
> 
> None.  But not everyone can read all the Python source code to try to 
> understand it; they expect the documentation to help them avoid that. 
> Because the documentation is lacking in this area, it makes your 
> concisely stated PEP rather hard to understand.

If you think a section of the doc is grossly inadequate, and there is no 
existing issue on the tracker, feel free to add one.

> Thanks for clarifying the Windows behavior, here.  A little more 
> clarification in the PEP could have avoided lots of discussion.  It 
> would seem that a PEP, proposed to modify a poorly documented (and 
> therefore likely poorly understood) area, should be educational about 
> the status quo, as well as presenting the suggested change.

Where the PEP proposes to change, it should start with the status quo. 
But Martin's somewhat reasonable position is that since he is not 
proposing to change behavior on Windows, it is not his responsibility to 
document what he is not proposing to change more adequately.  This 
means, of course, that any observed change on Windows would then be a 
bug, or at least a break of the promise.  On the other hand, I can see 
that this is enough related to what he is proposing to change that 
better doc would help.

tjr

From martin at v.loewis.de  Thu Apr 30 06:48:24 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 06:48:24 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <49F8D206.2000104@g.nevcal.com>
References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com>	<20090428075806.GB23828@phd.pp.ru>	<7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com>	<87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>	<49F74EE5.6060305@v.loewis.de>	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com>	<49F7613C.9000901@v.loewis.de>	<7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com>	<49F7E964.9050700@v.loewis.de>	<7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com>
	<49F7FF03.2090909@v.loewis.de> <49F80691.80403@g.nevcal.com>
	<49F8B349.30901@v.loewis.de> <49F8D206.2000104@g.nevcal.com>
Message-ID: <49F92D98.3020403@v.loewis.de>

> But I shouldn't have to guess.  The PEP should explain how these things
> are useful.  The discussion section could be extended with use cases for
> both the encode and decode cases.

See PEP 293.

Regards,
Martin

From martin at v.loewis.de  Thu Apr 30 06:52:18 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 06:52:18 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org>
References: <49EEBE2E.3090601@v.loewis.de>
	<83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org>
Message-ID: <49F92E82.9040702@v.loewis.de>

> How do I get a printable Unicode version of these path strings if they
> contain non-Unicode data?

Define "printable". One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
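
A minimal sketch of that regular-expression approach, assuming the
range in question is the U+DC80..U+DCFF block used for escaped bytes;
the printable function is a hypothetical name.

```python
import re

# Lone surrogates that stand for undecodable bytes 0x80..0xFF.
_ESCAPED = re.compile("[\udc80-\udcff]")


def printable(name):
    """Replace every escaped byte in name with a question mark."""
    return _ESCAPED.sub("?", name)
```

The result is for display only: it no longer round-trips to the original
filename, so the unmodified string should be kept for any later open().
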

> I'm guessing that an app has to understand that filenames come in two
> forms, unicode and bytes, if it's not UTF-8 data. Why not simply return
> a string if it's valid UTF-8 and otherwise return bytes?

That would have been an alternative solution, and the one that 2.x uses
for listdir. People didn't like it.

Regards,
Martin

From martin at v.loewis.de  Thu Apr 30 07:17:38 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 30 Apr 2009 07:17:38 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <200904301345.52776.steve@pearwood.info>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
	<200904301345.52776.steve@pearwood.info>
Message-ID: <49F93472.5010509@v.loewis.de>

> I don't understand the proposal and issues. I see a lot of people 
> claiming that they do, and then spending all their time either 
> talking past each other, or disagreeing. If everyone who claims they 
> understand the issues actually does, why is it so hard to reach a 
> consensus?

Because the problem is difficult, and any solution has trade-offs.
People disagree on which trade-offs are worse than others.

> I'd like to see some real examples of how things can break in the 
> current system

Suppose I create a new directory, and run the following script
in 3.x:

py> import os
py> open("x","w").close()
py> open(b"\xff","w").close()
py> os.listdir(".")
['x']

If I quit Python, I can now do

martin at mira:~/work/3k/t$ ls
?  x
martin at mira:~/work/3k/t$ ls -b
\377  x

As you can see, there are two files in the current directory, but
only one of them is reported by os.listdir. The same happens to
command line arguments and environment variables: Python might swallow
some of them.
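
For comparison, the same experiment can be scripted so it cleans up
after itself. This sketch assumes Linux (a filesystem that accepts
arbitrary bytes in names) and a Python 3 release that implements the
PEP; list_both_ways is a hypothetical helper.

```python
import os
import tempfile


def list_both_ways():
    """Create 'x' and a file named b'\\xff' in a temp dir; list as bytes and str."""
    with tempfile.TemporaryDirectory() as tmp:
        btmp = os.fsencode(tmp)
        open(os.path.join(btmp, b"x"), "w").close()
        open(os.path.join(btmp, b"\xff"), "w").close()
        as_bytes = sorted(os.listdir(btmp))  # bytes in, bytes out
        as_str = sorted(os.listdir(tmp))     # str in, str out
    return as_bytes, as_str
```

Under those assumptions both listings contain two entries, and passing
each str name through os.fsencode() gives back the bytes names.
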

> and I'd like any potential solution to be made 
> available as a third-party package before it goes into the standard 
> library (if possible).

Unfortunately, at least for my solution, this isn't possible. I need
to change the implementation of the existing file IO APIs.

> Currently, we're reduced to trying to predict 
> the consequences of implementing the PEP, instead of being able to 
> try it out and see.

In a sense, this is one of the primary points of the PEP process:
to discuss a specification before the effort to produce an
implementation is started.

> Even something like a test suite would be useful: here are a bunch of 
> malformed file names, and this is what happens when you try to work 
> with them. Please, let's see some code we can run, not more words.

Just try my example above, on a Linux system, in a UTF-8 locale.

Regards,
Martin

From martin at v.loewis.de  Thu Apr 30 07:24:34 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 07:24:34 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
Message-ID: <49F93612.4080100@v.loewis.de>

Curt Hagenlocher wrote:
> On Wed, Apr 29, 2009 at 8:16 PM, Thomas Breuel <tmbdev at gmail.com> wrote:
>> Also, what are Jython and IronPython supposed to do on UNIX?  Can they
>> implement these semantics at all?
> 
> IronPython will inherit whatever behavior Mono has implemented. The
> Microsoft CLR defines the native string type as UTF-16 and all of the
> managed APIs for things like file names and environmental variables
> operate on UTF-16 strings -- there simply are no byte string APIs.
> 
> I assume that Mono does the same but I don't have any Mono experience.

Marcin Kowalczyk once did a review, at

http://mail.python.org/pipermail/python-3000/2007-September/010450.html

It may have changed since then; at the time, Mono would omit
non-decodable files in directory listings, and would refuse to start
if a non-decodable command line argument is passed. The environment
variable MONO_EXTERNAL_ENCODINGS can be set to specify what
encodings should be tried in what order.

However, I don't think it is relevant for the PEP: as Curt says, these
details will be inherited from the VM; the mechanism proposed is really
specific to CPython. To implement it on the other VMs, those would have
to either implement it natively, or provide byte-oriented APIs to allow
Jython/IronPython to implement it on top of it (the latter being not
realistic or useful).

Regards,
Martin

From martin at v.loewis.de  Thu Apr 30 07:29:53 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 07:29:53 +0200
Subject: [Python-Dev] PEP 383 (again)
In-Reply-To: <20090430031629.GB25125@panix.com>
References: <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com>	<49F74EE5.6060305@v.loewis.de>	<7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com>	<49F7613C.9000901@v.loewis.de>	<7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com>	<49F7E964.9050700@v.loewis.de>	<7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com>	<49F7FF03.2090909@v.loewis.de>
	<49F80691.80403@g.nevcal.com>	<49F8B349.30901@v.loewis.de>
	<20090430031629.GB25125@panix.com>
Message-ID: <49F93751.7060701@v.loewis.de>

> Perhaps not a full description of the status quo, but the PEP definitely
> needs a good summary

I completely agree, and believe that the PEP *does* have a good
summary - it has both an abstract, and a rationale, and both say
exactly what I want them to say. If people want them to say different
things, they have to tell me what specifically they want them to say
(perhaps even with specific formulations). If they can't communicate
their requests to me, I can't comply.

Regards,
Martin

From martin at v.loewis.de  Thu Apr 30 07:42:12 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 07:42:12 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F8C206.5070801@g.nevcal.com>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>
	<49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com>
	<49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com>
	<49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com>
	<49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com>
	<49F801C1.2070109@v.loewis.de> <49F82435.3060205@g.nevcal.com>
	<49F8B886.5020700@v.loewis.de> <49F8C206.5070801@g.nevcal.com>
Message-ID: <49F93A34.4050904@v.loewis.de>

> Thanks for clarifying the Windows behavior, here.  A little more
> clarification in the PEP could have avoided lots of discussion.  It
> would seem that a PEP, proposed to modify a poorly documented (and
> therefore likely poorly understood) area, should be educational about
> the status quo, as well as presenting the suggested change.  Or is it
> the Python philosophy that the PEPs should be as incomprehensible as
> possible, to generate large discussions?

Certainly not. See PEP 277 for a specification of
how file names are handled on Windows.

Large discussions could be reduced if readers would try to
constructively comment on the PEP, rather than making counter-proposals,
or making statements about the PEP without making their implied
assumptions explicit.

Regards,
Martin

From asmodai at in-nomine.org  Thu Apr 30 08:13:09 2009
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Thu, 30 Apr 2009 08:13:09 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <49F93472.5010509@v.loewis.de>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
	<200904301345.52776.steve@pearwood.info>
	<49F93472.5010509@v.loewis.de>
Message-ID: <20090430061309.GH9749@nexus.in-nomine.org>

-On [20090430 07:18], "Martin v. Löwis" (martin at v.loewis.de) wrote:
>Suppose I create a new directory, and run the following script
>in 3.x:
>
>py> open("x","w").close()
>py> open(b"\xff","w").close()
>py> os.listdir(".")
>['x']

That is actually a regression in 3.x:

Python 2.6.1 (r261:67515, Mar  8 2009, 11:36:21) 
>>> import os
>>> open("x","w").close()
>>> open(b"\xff","w").close()
>>> os.listdir(".")
['x', '\xff']

[Apologies if that was completely clear through the entire discussion, but
I've lost track at a given point.]

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Heart is the engine of your body, but Mind is the engine of Life...

From tmbdev at gmail.com  Thu Apr 30 08:28:28 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Thu, 30 Apr 2009 08:28:28 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <gtafb2$ppb$2@ger.gmane.org>
References: <49EEBE2E.3090601@v.loewis.de> <49F76623.8060903@v.loewis.de> 
	<49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> 
	<49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> 
	<49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de> 
	<7e51d15d0904290219v625d23cdy8812939da404e309@mail.gmail.com> 
	<gtafb2$ppb$2@ger.gmane.org>
Message-ID: <7e51d15d0904292328g3f97b19ele58e76a8b82c9d80@mail.gmail.com>

On Wed, Apr 29, 2009 at 23:03, Terry Reedy <tjreedy at udel.edu> wrote:

> Thomas Breuel wrote:
>
>>
>>    Sure. However, that requires you to provide meaningful, reproducible
>>    counter-examples, rather than a stenographic formulation that might
>>    hint at some problem you apparently see (which I believe is just not
>>    there).
>>
>>
>> Well, here's another one: PEP 383 would disallow UTF-8 encodings of half
>> surrogates.
>>
>
> By my reading, the current Unicode 5.1 definition of 'UTF-8' disallows
> that.

If we use conformance to Unicode 5.1 as the basis for our discussion, then
PEP 383 is off the table anyway.  I'm all for strict Unicode compliance.
But apparently, the Python community doesn't care.

CESU-8 is described in Unicode Technical Report #26, so it at least has some
official recognition.  More importantly, it's also widely used.  So, my
question: what are the implications of PEP 383 for CESU-8 encodings on
Python?

My meta-point is: there are probably many more such issues hidden away and
it is a really bad idea to rush something like PEP 383 out.  Unicode is hard
anyway, and tinkering with its semantics requires a lot of thought.

Tom

From tmbdev at gmail.com  Thu Apr 30 08:32:51 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Thu, 30 Apr 2009 08:32:51 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> 
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> 
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
Message-ID: <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>

On Thu, Apr 30, 2009 at 05:40, Curt Hagenlocher <curt at hagenlocher.org> wrote:

>  IronPython will inherit whatever behavior Mono has implemented. The
> Microsoft CLR defines the native string type as UTF-16 and all of the
> managed APIs for things like file names and environmental variables
> operate on UTF-16 strings -- there simply are no byte string APIs.

Yes.  Now think about the implications.  This means that adopting PEP 383
will make IronPython and Jython running on UNIX intrinsically incompatible
with CPython running on UNIX, and there's no way to fix that.

Tom

From martin at v.loewis.de  Thu Apr 30 08:41:28 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 30 Apr 2009 08:41:28 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <20090430061309.GH9749@nexus.in-nomine.org>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>	<200904301345.52776.steve@pearwood.info>	<49F93472.5010509@v.loewis.de>
	<20090430061309.GH9749@nexus.in-nomine.org>
Message-ID: <49F94818.7060701@v.loewis.de>

Jeroen Ruigrok van der Werven wrote:
> -On [20090430 07:18], "Martin v. Löwis" (martin at v.loewis.de) wrote:
>> Suppose I create a new directory, and run the following script
>> in 3.x:
>>
>> py> open("x","w").close()
>> py> open(b"\xff","w").close()
>> py> os.listdir(".")
>> ['x']
> 
> That is actually a regression in 3.x:

Correct - and precisely the issue that this PEP wants to address.

For comparison, do os.listdir(u"."), though:

py> os.listdir(u".")
[u'x', '\xff']
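
A minimal sketch of the round trip the PEP is after, for readers who want
to try it. It assumes a later Python 3 release (3.1 or newer), where the
error handler this thread calls python-escape is available under the name
"surrogateescape"; the byte string is just an illustrative value.

```python
# Sketch only: "surrogateescape" is the released name of the handler
# that this thread still refers to as python-escape.
raw = b"\xff"                      # a single byte that is not valid UTF-8

# Decoding maps the undecodable byte to a lone (half) surrogate.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))

# Encoding with the same handler restores the original byte.
assert name.encode("utf-8", "surrogateescape") == raw
```

The point of the design is that the str form is only a carrier: nothing is
lost as long as the same handler is used in both directions.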

Regards,
Martin

From martin at v.loewis.de  Thu Apr 30 08:42:21 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 08:42:21 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
Message-ID: <49F9484D.9010507@v.loewis.de>

Thomas Breuel wrote:
> On Thu, Apr 30, 2009 at 05:40, Curt Hagenlocher <curt at hagenlocher.org
> <mailto:curt at hagenlocher.org>> wrote:
> 
>     IronPython will inherit whatever behavior Mono has implemented. The
>     Microsoft CLR defines the native string type as UTF-16 and all of the
>     managed APIs for things like file names and environmental variables
>     operate on UTF-16 strings -- there simply are no byte string APIs.
> 
> 
> Yes.  Now think about the implications.  This means that adopting PEP
> 383 will make IronPython and Jython running on UNIX intrinsically
> incompatible with CPython running on UNIX, and there's no way to fix that. 

*Not* adopting the PEP will also make CPython and IronPython
incompatible, and there's no way to fix that.

Regards,
Martin

From tmbdev at gmail.com  Thu Apr 30 09:21:54 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Thu, 30 Apr 2009 09:21:54 +0200
Subject: [Python-Dev] what Windows and Linux really do Re: PEP 383 (again)
Message-ID: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com>

Given the stated rationale of PEP 383, I was wondering what Windows actually
does.  So, I created some ISO8859-15 and ISO8859-8 encoded file names on a
device, plugged them into my Windows Vista machine, and fired up Python 3.0.

First, os.listdir("f:") returns a list of strings for those file names...
but those unicode strings are illegal.

You can't even print them without getting an error from Python.  In fact,
you also can't print strings containing the proposed half-surrogate
encodings either: in both cases, the output encoder rejects them with a
UnicodeEncodeError.   (If not even Python, with its generally lenient
attitude, can print those things, some other libraries probably will fail,
too.)

What about round tripping? So, if you take a malformed file name from an
external device (say, because it was actually encoded iso8859-15 or East
Asian) and write it to an NTFS directory, it seems to write malformed UTF-16
file names.  In essence, Windows doesn't really use unicode, it just
implements 16bit raw character strings, just like UNIX historically
implements raw 8bit character strings.

Then I tried the same thing on my Ubuntu 9.04 machine.    It turns out that,
unlike Windows, Linux seems to be moving to consistent use of valid
UTF-8.  If you plug in an external device and nothing else is known about
it, it gets mounted with the utf8 option and the kernel actually seems to
enforce UTF-8 encoding.   I think this calls into question the rationale
behind PEP 383, and we should first look into what the roadmap for
UNIX/Linux and UTF-8 actually is.  UNIX may have consistent unicode support
(via UTF-8) before Windows.

As I was saying, I think PEP 383 needs a lot more thought and research...

Tom

From tmbdev at gmail.com  Thu Apr 30 09:26:10 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Thu, 30 Apr 2009 09:26:10 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <49F9484D.9010507@v.loewis.de>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> 
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> 
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com> 
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> 
	<49F9484D.9010507@v.loewis.de>
Message-ID: <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>

> > Yes.  Now think about the implications.  This means that adopting PEP
> > 383 will make IronPython and Jython running on UNIX intrinsically
> > incompatible with CPython running on UNIX, and there's no way to fix
> > that.
>
> *Not* adopting the PEP will also make CPython and IronPython
> incompatible, and there's no way to fix that.
>

CPython and IronPython are incompatible.  And they will stay incompatible if
the PEP is adopted.

They would become compatible if CPython adopted Mono and/or Java semantics.

Since both have had to deal with this, have you looked at what they actually
do before proposing PEP 383?  What did you find?  Why did you choose an
incompatible approach for PEP 383?

Tom

From v+python at g.nevcal.com  Thu Apr 30 09:29:36 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Thu, 30 Apr 2009 00:29:36 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <gtb6v5$gu2$1@ger.gmane.org>
References: <49EEBE2E.3090601@v.loewis.de>		<79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com>		<87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp>		<79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com>		<87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T111118-510@post.gmane.org>		<87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp>		<loom.20090427T154536-926@post.gmane.org>		<p04330106c61b9f2aad0a@192.168.123.162>		<87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410904280700i83563f0qf61b8eb017575bdb@mail.gmail.com>	<49F73635.6010105@v.loewis.de>	<49F74F85.9010800@g.nevcal.com>	<49F76623.8060903@v.loewis.de>	<49F768F3.8080304@g.nevcal.com>	<49F76F03.8040702@v.loewis.de>	<49F788A6.3040702@g.nevcal.com>	<49F7EB17.4010309@v.loewis.de>	<49F7F99D.8070606@g.nevcal.com>	<49F801C1.2070109@v.loewis.de>	<49F82435.3060205@g.nevcal.com>	<49F8B886.5020700@v.loewis.de>	<49F8C206.5070801@g.nevcal.com>
	<gtb6v5$gu2$1@ger.gmane.org>
Message-ID: <49F95360.8070808@g.nevcal.com>

On approximately 4/29/2009 8:46 PM, came the following characters from 
the keyboard of Terry Reedy:
> Glenn Linderman wrote:
>> On approximately 4/29/2009 1:28 PM, came the following characters from 
> 
>>> So where is the ambiguity here?
>>
>> None.  But not everyone can read all the Python source code to try to 
>> understand it; they expect the documentation to help them avoid that. 
>> Because the documentation is lacking in this area, it makes your 
>> concisely stated PEP rather hard to understand.
> 
> If you think a section of the doc is grossly inadequate, and there is no 
> existing issue on the tracker, feel free to add one.
> 
>> Thanks for clarifying the Windows behavior, here.  A little more 
>> clarification in the PEP could have avoided lots of discussion.  It 
>> would seem that a PEP, proposed to modify a poorly documented (and 
>> therefore likely poorly understood) area, should be educational about 
>> the status quo, as well as presenting the suggested change.
> 
> Where the PEP proposes to change, it should start with the status quo. 
> But Martin's somewhat reasonable position is that since he is not 
> proposing to change behavior on Windows, it is not his responsibility to 
> document what he is not proposing to change more adequately.  This 
> means, of course, that any observed change on Windows would then be a 
> bug, or at least a break of the promise.  On the other hand, I can see 
> that this is enough related to what he is proposing to change that 
> better doc would help.

Yes; the very fact that the PEP discusses Windows, speaks about 
cross-platform code, and doesn't explicitly state that no Windows 
functionality will change, is confusing.

An example of how to initialize things within a sample cross-platform 
application might help, especially if that initialization only happens 
if the platform is POSIX, or is commented to the effect that it has no 
effect on Windows, but makes POSIX happy.  Or maybe it is all buried 
within the initialization of Python itself, and is not exposed to the 
application at all.  I still haven't figured that out, but was not (and 
am still not) as concerned about that as ensuring that the overall 
algorithms are functional and useful and user-friendly.  Showing it 
might have been helpful in making it clear that no Windows functionality 
would change, however.

A statement that additional features are being added to allow 
cross-platform programs to deal with non-decodable bytes obtained from 
POSIX APIs using the same code that already works on Windows, would have 
made things much clearer.  The present Abstract does, in fact, talk only 
about POSIX, but later statements about Windows muddy the water.

Rationale paragraph 3 explicitly talks about cross-platform programs 
needing to work one way on Windows and another way on POSIX to deal with 
all the cases.  It calls that a proposal, which I guess it is for 
command line and environment, but it is already implemented in both 
bytes and str forms for file names... so that further muddies the water.

It is, of course, easier to point out deficiencies in a document than to 
write a better document; however, it is incumbent upon the PEP author to 
write a PEP that is good enough to get approved, and that means making 
it understandable enough that people are in favor... or to respond to 
the plethora of comments until people are in favor.  I'm not sure which 
one is more time-consuming.

I've reached the point, based on PEP and comment responses, where I now 
believe that the PEP is a solution to the problem it is trying to solve, 
and doesn't create ambiguities in the naming.  I don't believe it is the 
best solution.

The basic problem is the overuse of fake characters... normalizing them 
for display results in large data loss -- many characters would be 
translated to the same replacement characters.

Solutions exist that would allow the use of fewer different fake 
characters in the strings, while still having a fake character as the 
escape character, to preserve the invariant that all the strings 
manipulated by python-escape from the PEP were, and become, strings 
containing fake characters (from a strict Unicode perspective), which is 
a nice invariant*.  There even exist solutions that would use only one 
fake character (repeatedly if necessary), and all other characters 
generated would be displayable characters.  This would ease the burden 
on the program in displaying the strings, and also on the user that 
might view the resulting mojibake in trying to differentiate one such 
string from another.  Those are outlined in various emails in this 
thread, although some include my misconception that strings obtained via 
Unicode-enabled OS APIs would also need to be encoded and altered.  If 
there is any interest in using a more readable encoding, I'd be glad to 
rework them to remove those misconceptions.

* It would be nice to point out that invariant in the PEP, also.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From v+python at g.nevcal.com  Thu Apr 30 09:38:29 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Thu, 30 Apr 2009 00:38:29 -0700
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <49F93472.5010509@v.loewis.de>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>	<200904301345.52776.steve@pearwood.info>
	<49F93472.5010509@v.loewis.de>
Message-ID: <49F95575.5010408@g.nevcal.com>

On approximately 4/29/2009 10:17 PM, came the following characters from 
the keyboard of Martin v. Löwis:
>> I don't understand the proposal and issues. I see a lot of people 
>> claiming that they do, and then spending all their time either 
>> talking past each other, or disagreeing. If everyone who claims they 
>> understand the issues actually does, why is it so hard to reach a 
>> consensus?
> 
> Because the problem is difficult, and any solution has trade-offs.
> People disagree on which trade-offs are worse than others.
> 
>> I'd like to see some real examples of how things can break in the 
>> current system
> 
> Suppose I create a new directory, and run the following script
> in 3.x:
> 
> py> open("x","w").close()
> py> open(b"\xff","w").close()
> py> os.listdir(".")
> ['x']

but...

py> os.listdir(b".")
[b'x', b'\xff']
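
The difference between the two calls can be checked with a small script.
This is only a sketch: it assumes a POSIX system and a Python 3 release in
which os.listdir accepts a bytes argument, and the helper name
list_both_ways is made up for illustration. No expected output is shown
because the result depends on the Python version and the file system.

```python
import os
import tempfile


def list_both_ways():
    """Create 'x' and a file named by the byte 0xff, then list the directory twice."""
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "x"), "w").close()
        try:
            # Use a bytes path so that the raw byte reaches the file system.
            open(os.path.join(os.fsencode(d), b"\xff"), "w").close()
        except OSError:
            pass  # some file systems refuse names that are not valid UTF-8
        as_str = sorted(os.listdir(d))                 # str in, str out
        as_bytes = sorted(os.listdir(os.fsencode(d)))  # bytes in, bytes out
    return as_str, as_bytes


as_str, as_bytes = list_both_ways()
print([ascii(n) for n in as_str])
print(as_bytes)
```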

> If I quit Python, I can now do
> 
> martin at mira:~/work/3k/t$ ls
> ?  x
> martin at mira:~/work/3k/t$ ls -b
> \377  x
> 
> As you can see, there are two files in the current directory, but
> only one of them is reported by os.listdir. The same happens to
> command line arguments and environment variables: Python might swallow
> some of them.

There is presently no solution for command line and environment 
variables, I guess... which adds some amount of urgency to the 
implementation of _something_, even if not this PEP.
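
For reference, a sketch of how the same byte-preserving conversion looks
through the os module in later Python 3 releases (os.fsdecode and
os.fsencode were added in 3.2). It is guarded for POSIX, since Windows
uses different rules, and the file name is hypothetical.

```python
import os

raw = b"report-\xff.txt"   # hypothetical name containing a non-UTF-8 byte

if os.name == "posix":
    text = os.fsdecode(raw)          # bytes -> str with the file system encoding
    assert os.fsencode(text) == raw  # str -> bytes returns the original bytes
    print(ascii(text))
```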

>> and I'd like any potential solution to be made 
>> available as a third-party package before it goes into the standard 
>> library (if possible).
> 
> Unfortunately, at least for my solution, this isn't possible. I need
> to change the implementation of the existing file IO APIs.

Other than initializing them to use UTF-8b instead of UTF-8, and to use 
the new python-escape handler?  I'm sure if I read the code for that, 
I'd be able to figure out the answer...  I don't find any documented way 
of adding an encoding/decoding handler to the file IO encoding 
technique, though, which lends credence to your statement, but then that 
could also be an oversight on my part.

One could envision a staged implementation: the addition of the ability 
to add encoding/decoding handlers to the file IO encoding/decoding 
process, and the external selection of your new python-escape handler 
during application startup.  That way, the hooks would be in the file 
system to allow your solution to be used, but not require that it be 
used; competing solutions using similar technology could be implemented 
and evaluated.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From v+python at g.nevcal.com  Thu Apr 30 09:58:16 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Thu, 30 Apr 2009 00:58:16 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System	Character
 Interfaces
In-Reply-To: <20090430025050.GB1544@panix.com>
References: <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org>	<20090429232852.GA26172@cskk.homeip.net>
	<20090430025050.GB1544@panix.com>
Message-ID: <49F95A18.4040907@g.nevcal.com>

On approximately 4/29/2009 7:50 PM, came the following characters from 
the keyboard of Aahz:
> On Thu, Apr 30, 2009, Cameron Simpson wrote:
>> The lengthy discussion mostly revolves around:
>>
>>   - Glenn points out that strings that came _not_ from listdir, and that are
>>     _not_ well-formed unicode (== "have bare surrogates in them") but that
>>     were intended for use as filenames will conflict with the PEP's scheme -
>>     programs must know that these strings came from outside and must be
>>     translated into the PEP's funny-encoding before use in the os.*
>>     functions. Previous to the PEP they would get used directly and
>>     encode differently after the PEP, thus producing different POSIX
>>     filenames. Breakage.
>>
>>   - Glenn would like the encoding to use Unicode scalar values only,
>>     using a rare-in-filenames character.
>>     That would avoid the issue with "outside" strings that contain
>>     surrogates. To my mind it just moves the punning from rare illegal
>>     strings to merely uncommon but legal characters.
>>
>>   - Some parties think it would be better to not return strings from
>>     os.listdir but a subclass of string (or at least a duck-type of
>>     string) that knows where it came from and is also handily
>>     recognisable as not-really-a-string for purposes of deciding
>>     whether it is PEP-funny-encoded by direct inspection.
> 
> Assuming people agree that this is an accurate summary, it should be
> incorporated into the PEP.

I'll agree that, once other misconceptions were explained away, the 
remaining issues are those Cameron summarized.  Thanks for the summary!

Point two could be modified because I've changed my opinion; I like the 
invariant Cameron first (I think) explicitly stated about the PEP as it 
stands, and that I just reworded in another message, that the strings 
that are altered by the PEP in either direction are in the subset of 
strings that contain fake (from a strict Unicode viewpoint) characters. 
I still think an encoding that uses mostly real characters that have 
assigned glyphs would be better than the encoding in the PEP; but would 
now suggest that an escape character be a fake character.

I'll note here that while the PEP encoding causes illegal bytes to be 
translated to one fake character, the 3-byte sequence that looks like 
the range of fake characters would also be translated to a sequence of 3 
fake characters.  This is 512 combinations that must be translated, and 
understood by the user (or at least by the programmer).  The "escape 
sequence" approach requires changing only 257 combinations, and each 
altered combination would result in exactly 2 characters.  Hence, this 
seems simpler to understand, and to manually encode and decode for 
debugging purposes.
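
The three-character case can be seen directly with the codec. This sketch
assumes Python 3.1 or later, where the handler discussed here is named
"surrogateescape" and a separate "surrogatepass" handler exists; it shows
the behaviour only and takes no side on the escape-sequence alternative.

```python
lone = "\udcff"                                  # one fake character

# The three bytes that "look like" UTF-8 for that character.
three = lone.encode("utf-8", "surrogatepass")

# Decoding those bytes as a file name escapes each byte separately,
# giving three fake characters rather than one.
decoded = three.decode("utf-8", "surrogateescape")
print(ascii(decoded), len(decoded))

# The round trip back to bytes is still exact.
assert decoded.encode("utf-8", "surrogateescape") == three
```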

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From martin at v.loewis.de  Thu Apr 30 10:21:39 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 10:21:39 +0200
Subject: [Python-Dev] what Windows and Linux really do Re: PEP 383
	(again)
In-Reply-To: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com>
References: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com>
Message-ID: <49F95F93.5090204@v.loewis.de>

Thomas Breuel wrote:
> Given the stated rationale of PEP 383, I was wondering what Windows
> actually does.  So, I created some ISO8859-15 and ISO8859-8 encoded file
> names on a device, plugged them into my Windows Vista machine, and fired
> up Python 3.0.

How did you do that, and what were the specific names that you
had chosen? How does explorer display the file names?

> First, os.listdir("f:") returns a list of strings for those file
> names... but those unicode strings are illegal.

What was the exact result that you got?

> You can't even print them without getting an error from Python.

This is unrelated to the PEP. Try to run the same code in IDLE,
or use the ascii() function.
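
The printing problem and the ascii() workaround can be reproduced without
any file system involved. The sketch below assumes Python 3; the string is
a made-up file name containing one lone surrogate.

```python
name = "caf\udce9"   # hypothetical file name carrying one lone surrogate

try:
    name.encode("utf-8")          # what a strict UTF-8 output stream must do
    strict_ok = True
except UnicodeEncodeError as exc:
    strict_ok = False
    print("cannot encode:", exc.reason)

# ascii() escapes everything outside ASCII, so the result is always printable.
safe = ascii(name)
print(safe)
```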

> What about round tripping? So, if you take a malformed file name from an
> external device (say, because it was actually encoded iso8859-15 or East
> Asian) and write it to an NTFS directory, it seems to write malformed
> UTF-16 file names.  In essence, Windows doesn't really use unicode, it
> just implements 16bit raw character strings, just like UNIX historically
> implements raw 8bit character strings.

I think you misinterpreted what you saw. To find out what way you
misinterpreted it, we would have to know what it is that you saw.

> I think this calls into
> question the rationale behind PEP 383, and we should first look into
> what the roadmap for UNIX/Linux and UTF-8 actually is.  UNIX may have
> consistent unicode support (via UTF-8) before Windows.

If so, PEP 383 won't hurt. If you never get decode errors for file
names, you can just ignore PEP 383. It's only for those of us who do
get decode errors.

Regards,
Martin

From martin at v.loewis.de  Thu Apr 30 10:25:57 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 10:25:57 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
Message-ID: <49F96095.4000208@v.loewis.de>

> CPython and IronPython are incompatible.  And they will stay
> incompatible if the PEP is adopted.
> 
> They would become compatible if CPython adopted Mono and/or Java
> semantics. 

Which one should it adopt? Mono semantics, or Java semantics?

> Since both have had to deal with this, have you looked at what they
> actually do before proposing PEP 383?  What did you find?  

See

http://mail.python.org/pipermail/python-3000/2007-September/010450.html

> Why did you choose an incompatible approach for PEP 383?

Because in Python, we want to be able to access all files on disk.
Neither Java nor Mono are capable of doing that.

Regards,
Martin

From martin at v.loewis.de  Thu Apr 30 10:48:27 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 10:48:27 +0200
Subject: [Python-Dev] PEP 383 and GUI libraries
Message-ID: <49F965DB.6050601@v.loewis.de>

I checked how GUI libraries deal with half surrogates.
In pygtk, a warning gets issued to the console

/tmp/helloworld.py:71: PangoWarning: Invalid UTF-8 string passed to
pango_layout_set_text()
  self.window.show()

and then the widget contains three crossed boxes.

wxpython (in its wxgtk version) behaves the same way.

PyQt displays a single square box.

Regards,
Martin

From v+python at g.nevcal.com  Thu Apr 30 10:55:12 2009
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Thu, 30 Apr 2009 01:55:12 -0700
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <49F965DB.6050601@v.loewis.de>
References: <49F965DB.6050601@v.loewis.de>
Message-ID: <49F96770.4080206@g.nevcal.com>

On approximately 4/30/2009 1:48 AM, came the following characters from 
the keyboard of Martin v. Löwis:
> I checked how GUI libraries deal with half surrogates.
> In pygtk, a warning gets issued to the console
> 
> /tmp/helloworld.py:71: PangoWarning: Invalid UTF-8 string passed to
> pango_layout_set_text()
>   self.window.show()
> 
> and then the widget contains three crossed boxes.
> 
> wxpython (in its wxgtk version) behaves the same way.
> 
> PyQt displays a single square box.

Interesting.

Did you use a name with other characters?  Were they displayed?  Both 
before and after the surrogates?

Did you use one or three half surrogates, to produce the three crossed 
boxes?

Did you use one or three half surrogates, to produce the single square box?

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From martin at v.loewis.de  Thu Apr 30 11:08:08 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 11:08:08 +0200
Subject: [Python-Dev] PEP 382 update
Message-ID: <49F96A78.6060404@v.loewis.de>

Guido found out that I had misunderstood the existing
pkg mechanism: If a "zope" package is imported, and
it uses pkgutil.extend_path, then it won't glob for files
ending in .pkg, but instead searches the path for
files named zope.pkg.

IOW, this is unsuitable as a foundation of PEP 382. I have
now changed the PEP to call the files .pth, more in line
with how top-level .pth files work, and added a statement
that the import feature of .pth files is not provided for
package .pth files (use __init__.py instead).
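
For readers unfamiliar with the existing mechanism, here is a minimal
sketch of the lookup described above. It assumes only the documented
behaviour of pkgutil.extend_path; the temporary directory, the
"extra_zope_dir" name and the demo_pkg_lookup helper are made up for
the illustration.

```python
import os
import sys
import tempfile
from pkgutil import extend_path


def demo_pkg_lookup():
    """Show that extend_path reads a file named <package>.pkg on sys.path."""
    root = tempfile.mkdtemp()
    extra = os.path.join(root, "extra_zope_dir")  # hypothetical extra location
    os.mkdir(extra)
    # The file must be named after the package: zope.pkg, not any *.pkg.
    with open(os.path.join(root, "zope.pkg"), "w") as f:
        f.write(extra + "\n")
    sys.path.insert(0, root)
    try:
        # Normally called from zope/__init__.py as
        #   __path__ = extend_path(__path__, __name__)
        return extend_path([], "zope"), extra
    finally:
        sys.path.remove(root)


path, extra = demo_pkg_lookup()
print(extra in path)
```

Each non-comment line of zope.pkg is appended to the package path as is,
which is why the file name has to match the package name exactly.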

Regards,
Martin

From martin at v.loewis.de  Thu Apr 30 11:12:32 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 11:12:32 +0200
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <49F96770.4080206@g.nevcal.com>
References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com>
Message-ID: <49F96B80.5090808@v.loewis.de>

> Did you use a name with other characters?  Were they displayed?  Both
> before and after the surrogates?

Yes, yes, and yes (IOW, I put the surrogate in the middle).

> Did you use one or three half surrogates, to produce the three crossed
> boxes?

Only one, and it produced three boxes - probably one for each UTF-8 byte
that pango considered invalid.
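
A rough way to check the "one box per byte" guess, assuming the half
surrogate reaches pango serialized by plain UTF-8 rules. The sample
code point U+DCFF and the is_valid_utf8 helper are assumptions for the
illustration, and the 'surrogatepass' handler needs Python 3.1 or later.

```python
# Hypothetical sample: the half surrogate PEP 383 would use for byte 0xFF.
half_surrogate = "\udcff"

# 'surrogatepass' makes the UTF-8 codec serialize a lone surrogate
# instead of rejecting it.
raw = half_surrogate.encode("utf-8", "surrogatepass")


def is_valid_utf8(data):
    """Return True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(raw), is_valid_utf8(raw))
```

One half surrogate becomes three bytes that a strict decoder rejects,
which would be consistent with three crossed boxes.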

> Did you use one or three half surrogates, to produce the single square box?

Again, only one. Apparently, PyQt passes the Python Unicode string to Qt
in a character-by-character representation, rather than going through UTF-8.

Regards,
Martin

From tmbdev at gmail.com  Thu Apr 30 11:20:10 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Thu, 30 Apr 2009 11:20:10 +0200
Subject: [Python-Dev] what Windows and Linux really do Re: PEP 383
	(again)
In-Reply-To: <49F95F93.5090204@v.loewis.de>
References: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com> 
	<49F95F93.5090204@v.loewis.de>
Message-ID: <7e51d15d0904300220o1e8a78f9pc54bf3fe8148bd67@mail.gmail.com>

On Thu, Apr 30, 2009 at 10:21, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> Thomas Breuel wrote:
> > Given the stated rationale of PEP 383, I was wondering what Windows
> > actually does.  So, I created some ISO8859-15 and ISO8859-8 encoded file
> > names on a device, plugged them into my Windows Vista machine, and fired
> > up Python 3.0.
>
> How did you do that, and what were the specific names that you
> had chosen?

There are several different ways I tried it.  The easiest was to mount a
vfat file system with various encodings on Linux and use the Python byte
interface to write file names, then plug that flash drive into Windows.

> I think you misinterpreted what you saw. To find out what way you
> misinterpreted it, we would have to know what it is that you saw.

I didn't interpret it much at all.  I'm just saying that the PEP 383
assumption that these problems can't occur on Windows isn't true.

I can plug in a flash drive with malformed strings, and somewhere between
the disk and Python, something maps those strings onto unicode in some way,
and it's done in a way that's different from PEP 383.  Mono and Java must
have their own solutions that are different from PEP 383.

My point remains that I think PEP 383 shouldn't be rushed through, and one
should look more carefully first at what the Windows kernel does in these
situations, and what Mono and Java do.

Tom

From martin at v.loewis.de  Thu Apr 30 11:24:43 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 11:24:43 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System	Character
 Interfaces
In-Reply-To: <20090430025050.GB1544@panix.com>
References: <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org>	<20090429232852.GA26172@cskk.homeip.net>
	<20090430025050.GB1544@panix.com>
Message-ID: <49F96E5B.5010107@v.loewis.de>

> Assuming people agree that this is an accurate summary, it should be
> incorporated into the PEP.

Done!

Regards,
Martin

From martin at v.loewis.de  Thu Apr 30 11:35:02 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 11:35:02 +0200
Subject: [Python-Dev] what Windows and Linux really do Re: PEP 383
	(again)
In-Reply-To: <7e51d15d0904300220o1e8a78f9pc54bf3fe8148bd67@mail.gmail.com>
References: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com>
	<49F95F93.5090204@v.loewis.de>
	<7e51d15d0904300220o1e8a78f9pc54bf3fe8148bd67@mail.gmail.com>
Message-ID: <49F970C6.1000407@v.loewis.de>

> There are several different ways I tried it.  The easiest was to mount a
> vfat file system with various encodings on Linux and use the Python byte
> interface to write file names, then plug that flash drive into Windows.

So can you share precisely what you have done, to allow others to
reproduce it?

>     I think you misinterpreted what you saw. To find out what way you
>     misinterpreted it, we would have to know what it is that you saw.
> 
> 
> I didn't interpret it much at all.  I'm just saying that the PEP 383
> assumption that these problems can't occur on Windows isn't true.

What are "these problems", and where does PEP 383 say they can't occur
on Windows? What could Python do differently on Windows?

> I can plug in a flash drive with malformed strings, and somewhere
> between the disk and Python, something maps those strings onto unicode
> in some way, and it's done in a way that's different from PEP 383.

Of course it is. The Windows FAT driver has chosen some mapping for the
file names to Unicode, and most likely not the encoding that you meant
it to use.

There is now no way for a Win32 application to find out how the file
name is actually represented on disk, short of implementing the FAT
file system itself.

So what Python does is the best possible solution already - report the
file names as-is, with no interpretation.

> My point remains that I think PEP 383 shouldn't be rushed through, and
> one should look more carefully first at what the Windows kernel does in
> these situations, and what Mono and Java do.

These questions really have been studied on this list for the last eight
years, over and over again. It's not being rushed.

Regards,
Martin

From martin at v.loewis.de  Thu Apr 30 11:42:13 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 11:42:13 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>	<49F6799F.5030208@v.loewis.de>	<875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>	<49F6A71A.3020809@v.loewis.de>
	<873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>
Message-ID: <49F97275.3010307@v.loewis.de>

> I think it has to be excluded from mapping in order to not introduce
> security issues.

I think you are right. I have now excluded ASCII bytes from being
mapped, effectively not supporting any encodings that are not ASCII
compatible. Does that sound ok?
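
A small sketch of what the exclusion means in practice, assuming the
handler name 'surrogateescape' under which this error handler is
available in Python 3.1 and later. The sample bytes and the
encodes_back helper are made up.

```python
raw = b"a/b\xff"  # hypothetical file name with one non-decodable byte

# Decoding: only the non-ASCII byte 0xFF is escaped, as U+DCFF.
text = raw.decode("ascii", "surrogateescape")


def encodes_back(s):
    """Return True if the string can be turned back into bytes."""
    try:
        s.encode("ascii", "surrogateescape")
        return True
    except UnicodeEncodeError:
        return False


print(ascii(text))
print(encodes_back(text))        # U+DCFF maps back to byte 0xFF
print(encodes_back("a\udc2fb"))  # U+DC2F would be "/" if ASCII were mapped
```

Because U+DC00..U+DC7F never map back to ASCII bytes, an escaped
string cannot smuggle in a "/" or a NUL that was not visible as such
in the decoded name.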

Regards,
Martin

From tmbdev at gmail.com  Thu Apr 30 11:44:40 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Thu, 30 Apr 2009 11:44:40 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <49F96095.4000208@v.loewis.de>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> 
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> 
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com> 
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> 
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> 
	<49F96095.4000208@v.loewis.de>
Message-ID: <7e51d15d0904300244w16654258x67c116937bc60210@mail.gmail.com>

>
> > Since both have had to deal with this, have you looked at what they
> > actually do before proposing PEP 383?  What did you find?
>
> See
>
> http://mail.python.org/pipermail/python-3000/2007-September/010450.html
>

Thanks, that's very useful.

> > Why did you choose an incompatible approach for PEP 383?
>
> Because in Python, we want to be able to access all files on disk.
> Neither Java nor Mono are capable of doing that.

OK, so what's wrong with os.listdir() and similar functions returning a
unicode string for strings that correctly encode/decode, and with byte
strings for strings that are not valid unicode?

The file I/O functions already seem to deal with byte strings correctly, you
never get byte strings on platforms that are fully unicode, and they are
well supported.

Tom

From martin at v.loewis.de  Thu Apr 30 12:32:47 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 12:32:47 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <7e51d15d0904300244w16654258x67c116937bc60210@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>
	<7e51d15d0904300244w16654258x67c116937bc60210@mail.gmail.com>
Message-ID: <49F97E4F.7080100@v.loewis.de>

> OK, so what's wrong with os.listdir() and similar functions returning a
> unicode string for strings that correctly encode/decode, and with byte
> strings for strings that are not valid unicode? 

See http://bugs.python.org/issue3187
in particular msg71655

Regards,
Martin

From solipsis at pitrou.net  Thu Apr 30 12:48:36 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 30 Apr 2009 10:48:36 +0000 (UTC)
Subject: [Python-Dev] what Windows and Linux really do Re: PEP 383
	(again)
References: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com>
Message-ID: <loom.20090430T104515-831@post.gmane.org>

Thomas Breuel <tmbdev <at> gmail.com> writes:
> 
> So, I created some ISO8859-15 and ISO8859-8 encoded file names on a device,
> plugged them into my Windows Vista machine, and fired up Python 3.0. First,
> os.listdir("f:") returns a list of strings for those file names... but those
> unicode strings are illegal.

Sorry, when you report such experiments, is it too much to ask for a cut and
paste of your Python session?

You are being unhelpful with such unsubstantiated statements, and your mails are
taking a lot of valuable bandwidth.

Antoine.

From tmbdev at gmail.com  Thu Apr 30 12:56:03 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Thu, 30 Apr 2009 12:56:03 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <49F97E4F.7080100@v.loewis.de>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> 
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> 
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com> 
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> 
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> 
	<49F96095.4000208@v.loewis.de>
	<7e51d15d0904300244w16654258x67c116937bc60210@mail.gmail.com> 
	<49F97E4F.7080100@v.loewis.de>
Message-ID: <7e51d15d0904300356k208304ech7d934d10bb809c38@mail.gmail.com>

On Thu, Apr 30, 2009 at 12:32, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> > OK, so what's wrong with os.listdir() and similar functions returning a
> > unicode string for strings that correctly encode/decode, and with byte
> > strings for strings that are not valid unicode?
>
> See http://bugs.python.org/issue3187
> in particular msg71655
>

Why didn't you point to that discussion from PEP 383?  And why didn't
you point to Kowalczyk's message on encodings in Mono, Java, etc. from the
PEP?  You could have saved us all a lot of time.

Under the set of constraints that Guido imposes, plus the requirement that
round-trip works for illegal encodings, there is no other solution than PEP
383.  That doesn't make PEP 383 right--I still think it's a bad
decision--but it makes it pointless to discuss it any further.

Tom

From p.f.moore at gmail.com  Thu Apr 30 13:02:08 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 30 Apr 2009 12:02:08 +0100
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <49F97E4F.7080100@v.loewis.de>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>
	<7e51d15d0904300244w16654258x67c116937bc60210@mail.gmail.com>
	<49F97E4F.7080100@v.loewis.de>
Message-ID: <79990c6b0904300402p5f8adff3r10c70f6279944f56@mail.gmail.com>

2009/4/30 "Martin v. Löwis" <martin at v.loewis.de>:
>> OK, so what's wrong with os.listdir() and similar functions returning a
>> unicode string for strings that correctly encode/decode, and with byte
>> strings for strings that are not valid unicode?
>
> See http://bugs.python.org/issue3187
> in particular msg71655

Can I suggest that a pointer to this issue be added to the PEP? It
certainly seems like a lot of the discussion of options available is
captured there. And the fact that Guido's views are noted there is
also useful (as he hasn't been contributing to this thread).

2009/4/30 Thomas Breuel <tmbdev at gmail.com>:
>> > Since both have had to deal with this, have you looked at what they
>> > actually do before proposing PEP 383?  What did you find?
>>
>> See
>>
>> http://mail.python.org/pipermail/python-3000/2007-September/010450.html
>
> Thanks, that's very useful.

This reference could probably be usefully added to the PEP as well.

Paul.

From glyph at divmod.com  Thu Apr 30 13:26:34 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Thu, 30 Apr 2009 11:26:34 -0000
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <49F96095.4000208@v.loewis.de>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>
Message-ID: <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>

On 08:25 am, martin at v.loewis.de wrote:
>>Why did you choose an incompatible approach for PEP 383?
>
>Because in Python, we want to be able to access all files on disk.
>Neither Java nor Mono are capable of doing that.

Java is not capable of doing that.  Mono, as I keep pointing out, is. 
It uses NULLs to escape invalid UNIX filenames.  Please see:

http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding

"The upshot to all this is that Mono.Unix and Mono.Unix.Native can list, 
access, and open all files on your filesystem, regardless of encoding."

From tmbdev at gmail.com  Thu Apr 30 13:34:47 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Thu, 30 Apr 2009 13:34:47 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> 
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> 
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com> 
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> 
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> 
	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
Message-ID: <7e51d15d0904300434q7b667ed9lf989738c7fbdfc36@mail.gmail.com>

>
> Java is not capable of doing that.  Mono, as I keep pointing out, is. It
> uses NULLs to escape invalid UNIX filenames.  Please see:
>
> http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding
>
> "The upshot to all this is that Mono.Unix and Mono.Unix.Native can list,
> access, and open all files on your filesystem, regardless of encoding."
>

OK, so why not adopt the Mono solution in CPython?  It seems to produce
valid unicode strings, removing at least one issue with PEP 383.  It also
means that IronPython and CPython actually would be compatible.

Tom

From rdmurray at bitdance.com  Thu Apr 30 13:49:18 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Thu, 30 Apr 2009 07:49:18 -0400 (EDT)
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
Message-ID: <Pine.LNX.4.64.0904300737170.3802@kimball.webabinitio.net>

On Thu, 30 Apr 2009 at 11:26, glyph at divmod.com wrote:
> On 08:25 am, martin at v.loewis.de wrote:
>> > Why did you choose an incompatible approach for PEP 383?
>> 
>> Because in Python, we want to be able to access all files on disk.
>> Neither Java nor Mono are capable of doing that.
>
> Java is not capable of doing that.  Mono, as I keep pointing out, is. It uses 
> NULLs to escape invalid UNIX filenames.  Please see:
>
> http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding
>
> "The upshot to all this is that Mono.Unix and Mono.Unix.Native can list, 
> access, and open all files on your filesystem, regardless of encoding."

And then it goes on to say: "You won't be able to pass non-Unicode
filenames as command-line arguments."(*)  Not only that, but you can't
reliably use such files with System.IO (whatever that is, but it
sounds pretty basic).  This support is only available "within the
Mono.Unix and Mono.Unix.Native namespaces".  Now, I don't know what
that means (never having touched Mono), but it doesn't sound like
it simplifies cross-platform support, which is what PEP 383 is aiming for.

So it doesn't sound like Mono has solved the problem that Martin is
trying to solve, even if it is possible to put Unix specific code into
your Mono app to deal with byte filenames on disk from within your GUI.

FWIW I'm +1 on seeing PEP 383 in 3.1, if Martin can manage the patch
in time.

--David

(*) I'd argue that in an important sense that makes Martin's statement
about Mono being unable to access all files on disk a true statement; but,
then, I freely admit that I have a bias against GUI programs in general :)

From martin at v.loewis.de  Thu Apr 30 14:25:28 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 14:25:28 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <7e51d15d0904300356k208304ech7d934d10bb809c38@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>
	<7e51d15d0904300244w16654258x67c116937bc60210@mail.gmail.com>
	<49F97E4F.7080100@v.loewis.de>
	<7e51d15d0904300356k208304ech7d934d10bb809c38@mail.gmail.com>
Message-ID: <49F998B8.8080602@v.loewis.de>

> Why didn't you point to that discussion from the PEP 383?  And why
> didn't you point to Kowalczyk's message on encodings in Mono, Java, etc.
> from the PEP?  

Because I assumed that readers of the PEP would know (and I'm sure
many of them do - this has been *really* discussed over and over again).

> Under the set of constraints that Guido imposes, plus the requirement
> that round-trip works for illegal encodings, there is no other solution
> than PEP 383.

Well, there actually is an alternative: expose byte-oriented interfaces
in parallel with the string-oriented ones. In the rationale, the PEP
explains why I consider this the worse choice.

Regards,
Martin

From tmbdev at gmail.com  Thu Apr 30 14:59:55 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Thu, 30 Apr 2009 14:59:55 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <Pine.LNX.4.64.0904300737170.3802@kimball.webabinitio.net>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> 
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> 
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com> 
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> 
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> 
	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
	<Pine.LNX.4.64.0904300737170.3802@kimball.webabinitio.net>
Message-ID: <7e51d15d0904300559v40c4cc53x498184af9485bc17@mail.gmail.com>

>
> And then it goes on to say: "You won't be able to pass non-Unicode
> filenames as command-line arguments."(*)  Not only that, but you can't
> reliably use such files with System.IO (whatever that is, but it
> sounds pretty basic).  This support is only available "within the
> Mono.Unix and Mono.Unix.Native namespaces".  Now, I don't know what
> that means (never having touched Mono), but it doesn't sound like
> it simplifies cross-platform support, which is what PEP 383 is aiming for.

The problem there isn't how the characters are quoted, but that they are
quoted at all, and that the ECMA and Microsoft libraries don't understand
this quoting convention.  Since command line parsing is handled through
ECMA, you happen not to be able to get at those files (that's fixable, but
why bother).

The analogous problem exists with Martin's proposal on Python: if you pass a
unicode string from Python to some library through a unicode API and that
library attempts to open the file, it will fail because it doesn't use the
proposed Python utf-8b decoder.  There just is no way to fix that, no matter
which quoting convention you use.
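
To make the failure mode concrete, here is a sketch under stated
assumptions: strict_library_encode stands in for a hypothetical
third-party library that converts names with plain UTF-8, and
'surrogateescape' is the name the utf-8b style handler carries in
Python 3.1 and later.

```python
def strict_library_encode(name):
    """Stand-in for a library that encodes file names with strict UTF-8."""
    return name.encode("utf-8")


original = b"report\xff.txt"  # hypothetical non-UTF-8 file name
escaped = original.decode("utf-8", "surrogateescape")

try:
    strict_library_encode(escaped)
    library_ok = True
except UnicodeEncodeError:
    library_ok = False

# Code that knows the convention still recovers the original bytes.
round_trip = escaped.encode("utf-8", "surrogateescape")
print(library_ok, round_trip == original)
```

The name survives only as long as every layer that converts it back
to bytes uses the same error handler.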

In contrast to PEP 383, quoting with U+0000 at least results in valid unicode
strings in Python.  And command line arguments (and environment variables
etc.) would work in Python because in Python, those should also use the new
encoding for invalid UTF-8 inputs.

Tom

From martin at v.loewis.de  Thu Apr 30 15:04:59 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 30 Apr 2009 15:04:59 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>	<49F9484D.9010507@v.loewis.de>	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
Message-ID: <49F9A1FB.4090104@v.loewis.de>

>> Because in Python, we want to be able to access all files on disk.
>> Neither Java nor Mono are capable of doing that.
> 
> Java is not capable of doing that.  Mono, as I keep pointing out, is. It
> uses NULLs to escape invalid UNIX filenames.  Please see:
> 
> http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding
> 
> "The upshot to all this is that Mono.Unix and Mono.Unix.Native can list,
> access, and open all files on your filesystem, regardless of encoding."

I think this is misleading. With Mono 2.0.1, I get

** (/tmp/a.exe:30553): WARNING **: FindNextFile: Bad encoding for
'/home/martin/work/3k/t/\xff'
Consider using MONO_EXTERNAL_ENCODINGS

when running the program

using System.IO;
class X{
  public static void Main(string[] args){
    DirectoryInfo di = new DirectoryInfo(".");
    foreach(FileInfo fi in di.GetFiles())
      System.Console.WriteLine("Next:"+fi.Name);
  }
}

On the other hand, when I write

using Mono.Unix;
class X{
  public static void Main(string[] args){
    UnixDirectoryInfo di = new UnixDirectoryInfo(".");
    foreach(UnixFileSystemInfo fi in di.GetFileSystemEntries())
      System.Console.WriteLine("Next:"+fi.Name);
  }
}

I do indeed get all files listed (and can also find out the other
stat results). Of course, the resulting application will be
mono-specific (it links with Mono.Posix), and not work on Microsoft
.NET anymore. IOW, IronPython likely won't use this API.

Python, of course, already has the equivalent of that: os.listdir,
with a byte parameter, will give you access to all files. If
you wanted to closely emulate the Mono API, you could set
the file system encoding to the mono-lookalike codec.
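
A minimal sketch of the byte-oriented listing mentioned above. The
list_names_lossless helper is hypothetical; os.fsencode and os.fsdecode
are assumed to be available (Python 3.2 and later), where on POSIX they
apply the file system encoding with the PEP 383 error handler.

```python
import os


def list_names_lossless(directory="."):
    """List a directory as (raw bytes, decoded text) pairs."""
    # Passing bytes makes os.listdir return the names as bytes,
    # so no entry is lost to a decoding error.
    raw_names = os.listdir(os.fsencode(directory))
    return [(raw, os.fsdecode(raw)) for raw in raw_names]


for raw, text in list_names_lossless("."):
    # Every decoded name converts back to the bytes found on disk.
    print(ascii(text), os.fsencode(text) == raw)
```

The decoded form is convenient for display, while the raw bytes remain
what is actually passed back to the operating system.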

Regards,
Martin

From martin at v.loewis.de  Thu Apr 30 15:06:57 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 15:06:57 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <7e51d15d0904300434q7b667ed9lf989738c7fbdfc36@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
	<7e51d15d0904300434q7b667ed9lf989738c7fbdfc36@mail.gmail.com>
Message-ID: <49F9A271.8050700@v.loewis.de>

> OK, so why not adopt the Mono solution in CPython?  It seems to produce
> valid unicode strings, removing at least one issue with PEP 383.  It
> also means that IronPython and CPython actually would be compatible.

See my other message. The Mono solution may not be what you expect it to be.

Regards,
Martin

From tmbdev at gmail.com  Thu Apr 30 15:10:47 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Thu, 30 Apr 2009 15:10:47 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <49F9A1FB.4090104@v.loewis.de>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> 
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> 
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com> 
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> 
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> 
	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
	<49F9A1FB.4090104@v.loewis.de>
Message-ID: <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com>

>
>
> > "The upshot to all this is that Mono.Unix and Mono.Unix.Native can list,
> > access, and open all files on your filesystem, regardless of encoding."
>
> I think this is misleading. With Mono 2.0.1, I get

This has nothing to do with how Mono quotes.  The reason for this is that
Mono quotes at all and that the Mono developers decided not to change
System.IO to understand UNIX quoting.

If Mono used PEP 383 quoting, this would fail the same way.

And analogous failures will exist with PEP 383 in Python, because there will
be more and more libraries with unicode interfaces that then use their own
internal decoder (which doesn't understand utf8b) to get a UNIX file name.

Tom

From martin at v.loewis.de  Thu Apr 30 15:32:01 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 15:32:01 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
	<49F9A1FB.4090104@v.loewis.de>
	<7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com>
Message-ID: <49F9A851.5010006@v.loewis.de>

> This has nothing to do with how Mono quotes.  The reason for this is
> that Mono quotes at all and that the Mono developers decided not to
> change System.IO to understand UNIX quoting. 
> 
> If Mono used PEP 383 quoting, this would fail the same way. 
> 
> And analogous failures will exist with PEP 383 in Python, because there
> will be more and more libraries with unicode interfaces that then use
> their own internal decoder (which doesn't understand utf8b) to get a
> UNIX file name.

What's an analogous failure? Or, rather, why would a failure analogous
to the one I got when using System.IO.DirectoryInfo ever exist in
Python?

Regards,
Martin

From aahz at pythoncraft.com  Thu Apr 30 15:42:36 2009
From: aahz at pythoncraft.com (Aahz)
Date: Thu, 30 Apr 2009 06:42:36 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
In-Reply-To: <49F95360.8070808@g.nevcal.com>
References: <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com>
	<49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com>
	<49F801C1.2070109@v.loewis.de> <49F82435.3060205@g.nevcal.com>
	<49F8B886.5020700@v.loewis.de> <49F8C206.5070801@g.nevcal.com>
	<gtb6v5$gu2$1@ger.gmane.org> <49F95360.8070808@g.nevcal.com>
Message-ID: <20090430134236.GA12664@panix.com>

[top-posting for once to preserve full quoting]

Glenn,

Could you please reduce your suggestions into sample text for the PEP?
We seem to be now at the stage where nobody is objecting to the PEP, so
the focus should be on making the PEP clearer.

If you still want to create an alternative PEP implementation, please
provide step-by-step walkthroughs, preferably in a new thread -- if you
did previously provide that, it's gotten lost in the flood of messages.

On Thu, Apr 30, 2009, Glenn Linderman wrote:
> On approximately 4/29/2009 8:46 PM, came the following characters from  
> the keyboard of Terry Reedy:
>> Glenn Linderman wrote:
>>> On approximately 4/29/2009 1:28 PM, came the following characters 
>>> from 
>>
>>>> So where is the ambiguity here?
>>>
>>> None.  But not everyone can read all the Python source code to try to 
>>> understand it; they expect the documentation to help them avoid that. 
>>> Because the documentation is lacking in this area, it makes your  
>>> concisely stated PEP rather hard to understand.
>>
>> If you think a section of the doc is grossly inadequate, and there is 
>> no existing issue on the tracker, feel free to add one.
>>
>>> Thanks for clarifying the Windows behavior, here.  A little more  
>>> clarification in the PEP could have avoided lots of discussion.  It  
>>> would seem that a PEP, proposed to modify a poorly documented (and  
>>> therefore likely poorly understood) area, should be educational about 
>>> the status quo, as well as presenting the suggested change.
>>
>> Where the PEP proposes to change, it should start with the status quo.  
>> But Martin's somewhat reasonable position is that since he is not  
>> proposing to change behavior on Windows, it is not his responsibility 
>> to document what he is not proposing to change more adequately.  This  
>> means, of course, that any observed change on Windows would then be a  
>> bug, or at least a break of the promise.  On the other hand, I can see  
>> that this is enough related to what he is proposing to change that  
>> better doc would help.
>
>
> Yes; the very fact that the PEP discusses Windows, speaks about  
> cross-platform code, and doesn't explicitly state that no Windows  
> functionality will change, is confusing.
>
> An example of how to initialize things within a sample cross-platform  
> application might help, especially if that initialization only happens  
> if the platform is POSIX, or is commented to the effect that it has no  
> effect on Windows, but makes POSIX happy.  Or maybe it is all buried  
> within the initialization of Python itself, and is not exposed to the  
> application at all.  I still haven't figured that out, but was not (and  
> am still not) as concerned about that as ensuring that the overall  
> algorithms are functional and useful and user-friendly.  Showing it  
> might have been helpful in making it clear that no Windows functionality  
> would change, however.
>
> A statement that additional features are being added to allow  
> cross-platform programs to deal with non-decodable bytes obtained from  
> POSIX APIs using the same code that already works on Windows, would have  
> made things much clearer.  The present Abstract does, in fact, talk only  
> about POSIX, but later statements about Windows muddy the water.
>
> Rationale paragraph 3, explicitly talks about cross-platform programs  
> needing to work one way on Windows and another way on POSIX to deal with  
> all the cases.  It calls that a proposal, which I guess it is for  
> command line and environment, but it is already implemented in both  
> bytes and str forms for file names... so that further muddies the water.
>
> It is, of course, easier to point out deficiencies in a document than to  
> write a better document; however, it is incumbent upon the PEP author to  
> write a PEP that is good enough to get approved, and that means making  
> it understandable enough that people are in favor... or to respond to  
> the plethora of comments until people are in favor.  I'm not sure which  
> one is more time-consuming.
>
> I've reached the point, based on PEP and comment responses, where I now  
> believe that the PEP is a solution to the problem it is trying to solve,  
> and doesn't create ambiguities in the naming.  I don't believe it is the  
> best solution.
>
> The basic problem is the overuse of fake characters... normalizing them  
> for display results in large data loss -- many characters would be  
> translated to the same replacement characters.
>
> Solutions exist that would allow the use of fewer different fake  
> characters in the strings, while still having a fake character as the  
> escape character, to preserve the invariant that all the strings  
> manipulated by python-escape from the PEP were, and become, strings  
> containing fake characters (from a strict Unicode perspective), which is  
> a nice invariant*.  There even exist solutions that would use only one  
> fake character (repeatedly if necessary), and all other characters  
> generated would be displayable characters.  This would ease the burden  
> on the program in displaying the strings, and also on the user that  
> might view the resulting mojibake in trying to differentiate one such  
> string from another.  Those are outlined in various emails in this  
> thread, although some include my misconception that strings obtained via  
>  Unicode-enabled OS APIs would also need to be encoded and altered.  If  
> there is any interest in using a more readable encoding, I'd be glad to  
> rework them to remove those misconceptions.
>
> * It would be nice to point out that invariant in the PEP, also.
>
>
> -- 
> Glenn -- http://nevcal.com/
> ===========================
> A protocol is complete when there is nothing left to remove.
> -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair

From google at mrabarnett.plus.com  Thu Apr 30 16:01:01 2009
From: google at mrabarnett.plus.com (MRAB)
Date: Thu, 30 Apr 2009 15:01:01 +0100
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <49F9A271.8050700@v.loewis.de>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>	<87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp>	<7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com>	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>	<49F9484D.9010507@v.loewis.de>	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>	<49F96095.4000208@v.loewis.de>	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>	<7e51d15d0904300434q7b667ed9lf989738c7fbdfc36@mail.gmail.com>
	<49F9A271.8050700@v.loewis.de>
Message-ID: <49F9AF1D.8060100@mrabarnett.plus.com>

Martin v. Löwis wrote:
>> OK, so why not adopt the Mono solution in CPython?  It seems to produce
>> valid unicode strings, removing at least one issue with PEP 383.  It
>> also means that IronPython and CPython actually would be compatible.
> 
> See my other message. The Mono solution may not be what you expect it to be.
> 
Have we considered discussing the problem with the developers and users
of the other languages to reach a common solution?

From apt.shansen at gmail.com  Thu Apr 30 16:05:29 2009
From: apt.shansen at gmail.com (Stephen Hansen)
Date: Thu, 30 Apr 2009 07:05:29 -0700
Subject: [Python-Dev] what Windows and Linux really do Re: PEP 383
	(again)
In-Reply-To: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com>
References: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com>
Message-ID: <7a9c25c20904300705i2237ab62pb9f14b19c46252b@mail.gmail.com>

>
> You can't even print them without getting an error from Python.  In fact,
> you also can't print strings containing the proposed half-surrogate
> encodings either: in both cases, the output encoder rejects them with a
> UnicodeEncodeError.   (If not even Python, with its generally lenient
> attitude, can print those things, some other libraries probably will fail,
> too.)
>

I think you may be confusing two completely separate things; it's a
long-known issue that the windows console is simply not a Unicode-aware
display device naturally. You have to manually set the codepage (by typing
'chcp 65001' -- that's utf8) *and* manually make sure you have a
unicode-enabled font chosen for it (which for console fonts is extremely
limited to none, and last I looked the default font didn't support unicode)
before you can even try to successfully print valid unicode. The default
codepage is 437 (for me at least; I think it depends on which language of
Windows you're using) which is ASCII-/ish/.

You have to do your test in an environment which actually supports
displaying unicode at all, or it's meaningless.

Personally and for all the use cases I have to deal with at work, I would
/love/ to see this PEP succeed. Being able to query a list of files in a
directory and get them -all-, display them all to a user
(which necessitates it being converted to unicode one way or the other. I
don't care if certain characters don't display: as long as any arbitrary
file will always end up looking like a distinct series of readable and
unreadable glyphs so the user can select it clearly), and then perform
operations on any selected file regardless of whatever nonsense may be going
on underneath with confused users and encodings... in a cross-platform way,
would be a tremendous boon to future py3k porting efforts. I ramble.
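
A minimal sketch of that workflow, under stated assumptions: the file names
are simulated as byte strings rather than read from a real directory, the
error handler is spelled "surrogateescape" (the name under which PEP 383
later shipped; this thread calls it utf-8b), and display_form is a
hypothetical helper, not part of any library.

```python
# Simulated POSIX directory listing; two names contain bytes that are not UTF-8.
raw_names = [b"report.txt", b"caf\xe9.txt", b"caf\xea.txt"]

# PEP 383 style decoding: every name becomes a str, nothing is dropped.
names = [raw.decode("utf-8", errors="surrogateescape") for raw in raw_names]


def display_form(name):
    """Return a printable label; lone surrogates are shown as \\udcXX escapes."""
    return name.encode("utf-8", errors="backslashreplace").decode("utf-8")


# Each file gets a distinct, printable label the user can pick from.
labels = [display_form(name) for name in names]

# The user selects a label; the program maps it back to the original str
# and recovers the exact bytes needed to operate on the file.
chosen = names[labels.index("caf\\udcea.txt")]
chosen_bytes = chosen.encode("utf-8", errors="surrogateescape")
```

The labels are only for display; the program keeps the original str and
hands that, not the label, to the file APIs.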

If there's inconsistent encodings used by users on a posix system so that
they can only make sense of half of what the names really are... that's for
other programs to deal with. I just want to be able to access the files they
tell me they want.

For anyone who is doing something low-level, they can use the bytes API.

--Stephen

From guido at python.org  Thu Apr 30 16:39:31 2009
From: guido at python.org (Guido van Rossum)
Date: Thu, 30 Apr 2009 07:39:31 -0700
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <49F96B80.5090808@v.loewis.de>
References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> 
	<49F96B80.5090808@v.loewis.de>
Message-ID: <ca471dc20904300739l68f9d224xb7643f8cd004d220@mail.gmail.com>

FWIW, I'm in agreement with this PEP (i.e. its status is now
Accepted). Martin, you can update the PEP and start the
implementation.

On Thu, Apr 30, 2009 at 2:12 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Did you use a name with other characters?  Were they displayed?  Both
>> before and after the surrogates?
>
> Yes, yes, and yes (IOW, I put the surrogate in the middle).
>
>> Did you use one or three half surrogates, to produce the three crossed
>> boxes?
>
> Only one, and it produced three boxes - probably one for each UTF-8 byte
> that pango considered invalid.
>
>> Did you use one or three half surrogates, to produce the single square box?
>
> Again, only one. Apparently, PyQt passes the Python Unicode string to Qt
> in a character-by-character representation, rather than going through UTF-8.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tmbdev at gmail.com  Thu Apr 30 16:42:45 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Thu, 30 Apr 2009 16:42:45 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <49F9A851.5010006@v.loewis.de>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> 
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com> 
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> 
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> 
	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
	<49F9A1FB.4090104@v.loewis.de>
	<7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> 
	<49F9A851.5010006@v.loewis.de>
Message-ID: <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>

>
> What's an analogous failure? Or, rather, why would a failure analogous
> to the one I got when using System.IO.DirectoryInfo ever exist in
> Python?

Mono.Unix uses an encoder and a decoder that knows about special quoting
rules.  System.IO uses a different encoder and decoder because it's a
reimplementation of a Microsoft library and the Mono developers chose not to
implement Mono.Unix quoting rules in it.  There is nothing technical
preventing System.IO from using the Mono.Unix codec, it's just that the
developers didn't want to change the behavior of an ECMA and Microsoft
library.

The analogous phenomenon will exist in Python with PEP 383.  Let's say I
have a C library with wide character interfaces and I pass it a unicode
string from Python.(*)  That C library now turns that unicode string into
UTF-8 for writing to disk using its internal UTF-8 converter.   The result
is that the file can be opened using Python's "open", but it can't be opened
using the other library.  There simply is no way you can guarantee that all
libraries turn unicode strings into pathnames using utf-8b.   I'm not
arguing about whether that's good or bad anymore, since it's obvious that
the only proposal acceptable to Guido uses some form of non-standard
encoding / quoting.
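
A minimal sketch of that scenario, assuming the error handler is spelled
"surrogateescape" (the name under which PEP 383 later shipped; this thread
calls it utf-8b). The function strict_library_encode is a hypothetical
stand-in for a third-party library's internal UTF-8 converter, not a real API.

```python
# Raw POSIX file name containing a byte (0xE9) that is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# PEP 383 style decoding: the undecodable byte becomes the lone surrogate U+DCE9.
name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = name.encode("utf-8", errors="surrogateescape")


def strict_library_encode(text):
    """Hypothetical library converter: plain UTF-8, unaware of the escaping."""
    return text.encode("utf-8")


# The strict converter rejects the lone surrogate instead of producing a path.
try:
    strict_library_encode(name)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Python's own file APIs would accept the name, while the strict converter
refuses it, which is the mismatch described above.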

I'm simply pointing out that the failure you observed with System.IO has
nothing to do with which quoting convention you choose, but results from the
fact that the developers of System.IO are not using the same encoder/decoder
as Mono.Unix (in that case, by choice).

So, I don't see any reason to prefer your half surrogate quoting to the Mono
U+0000-based quoting.  Both seem to achieve the same goal with respect to
round tripping file names, displaying them, etc., but Mono quoting actually
results in valid unicode strings.  It works because null is the one
character that's not legal in a UNIX path name.

So, why do you prefer half surrogate coding to U+0000 quoting?

Tom

(*) There's actually a second, subtle issue.  PEP 383 intends utf-8b only to
be used for file names.  But that means that I might have to bind the first
argument to TIFFOpen with utf-8b conversion, while I might have to bind
other arguments with utf-8 conversion.

From glyph at divmod.com  Thu Apr 30 17:19:36 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Thu, 30 Apr 2009 15:19:36 -0000
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
	<49F9A1FB.4090104@v.loewis.de>
	<7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com>
	<49F9A851.5010006@v.loewis.de>
	<7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>
Message-ID: <20090430151936.12555.425993626.divmod.xquotient.10039@weber.divmod.com>

On 02:42 pm, tmbdev at gmail.com wrote:
>So, why do you prefer half surrogate coding to U+0000 quoting?

I have also been eagerly waiting for an answer to this question.  I am 
afraid I have lost it somewhere in the storm of this thread :).

Martin, if you're going to stick with the half-surrogate trick, would 
you mind adding a section to the PEP on "alternate encoding strategies", 
explaining why the NULL method was not selected?

From p.f.moore at gmail.com  Thu Apr 30 17:04:30 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 30 Apr 2009 16:04:30 +0100
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
	<49F9A1FB.4090104@v.loewis.de>
	<7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com>
	<49F9A851.5010006@v.loewis.de>
	<7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>
Message-ID: <79990c6b0904300804u7f54b6e6g1f4efe25fc8f1533@mail.gmail.com>

2009/4/30 Thomas Breuel <tmbdev at gmail.com>:
> The analogous phenomenon will exist in Python with PEP 383.  Let's say I
> have a C library with wide character interfaces and I pass it a unicode
> string from Python.(*)
[...]
> (*) There's actually a second, subtle issue.  PEP 383 intends utf-8b only to
> be used for file names.  But that means that I might have to bind the first
> argument to TIFFOpen with utf-8b conversion, while I might have to bind
> other arguments with utf-8 conversion.

The footnote seems to imply that you have a concrete case rather than
a hypothetical one. The discussion would be much easier if you would
supply the concrete details. Then other participants in the discussion
could offer concrete suggestions on how your issue could be addressed.

Of course, there are 2 provisos here:

1. Maybe you don't care any more, having accepted that the PEP is
going to be implemented. That's fine, but there's also no point
continuing to argue your case in that event.
2. Maybe you aren't going to accept suggestions that don't conform to
your idea of how things should be done. In which case, your reasoning
is circular, and you're wasting people's time.

Sorry, that sounds grumpy. But I get a headache at the best of times
trying to understand Unicode issues, and theoretical, vague,
descriptions of problems just make my headache worse...

I suggest the discussion should be dropped now, as the PEP has been accepted.
Paul.

From martin at v.loewis.de  Thu Apr 30 17:35:19 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 17:35:19 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
	<49F9A1FB.4090104@v.loewis.de>
	<7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com>
	<49F9A851.5010006@v.loewis.de>
	<7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>
Message-ID: <49F9C537.9020801@v.loewis.de>

>     What's an analogous failure? Or, rather, why would a failure analogous
>     to the one I got when using System.IO.DirectoryInfo ever exist in
>     Python?
> 
> 
> Mono.Unix uses an encoder and a decoder that knows about special quoting
> rules.  System.IO uses a different encoder and decoder because it's a
> reimplementation of a Microsoft library and the Mono developers chose
> not to implement Mono.Unix quoting rules in it.  There is nothing
> technical preventing System.IO from using the Mono.Unix codec, it's just
> that the developers didn't want to change the behavior of an ECMA and
> Microsoft library.
> 
> The analogous phenomenon will exist in Python with PEP 383.  Let's say I
> have a C library with wide character interfaces and I pass it a unicode
> string from Python.(*)  That C library now turns that unicode string
> into UTF-8 for writing to disk using its internal UTF-8 converter.

What specific library do you have in mind? Would it always use UTF-8?
If so, it will fail in many other ways, as well - if the locale charset
is different from UTF-8.

I fail to see the analogy. In Python, the standard library works,
and the extension fails; in Mono, it's actually vice versa, and not
at all analogous.

> So, I don't see any reason to prefer your half surrogate quoting to the
> Mono U+0000-based quoting.  Both seem to achieve the same goal with
> respect to round tripping file names, displaying them, etc., but Mono
> quoting actually results in valid unicode strings.  It works because
> null is the one character that's not legal in a UNIX path name.
> 
> So, why do you prefer half surrogate coding to U+0000 quoting?

If I pass a string with an embedded U+0000 to gtk, gtk will truncate
the string, and stop rendering it at this character. This is worse than
what it does for invalid UTF-8 sequences. Chances are fairly high that
other C libraries will fail in the same way, in particular if they
expect char* (which is very common in C).
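
The truncation can be sketched with ctypes, which exposes a buffer the way a
C function taking char* sees it. The escaped name here is purely
illustrative: the exact layout Mono uses after the U+0000 marker is an
assumption, and no gtk call is involved.

```python
import ctypes

# A name with an embedded U+0000 escape marker, encoded for a char* API.
# The "e9" after the marker is an illustrative payload, not Mono's real format.
escaped = "caf\u0000e9.txt".encode("utf-8")

buf = ctypes.create_string_buffer(escaped)

# .value reads the buffer as a NUL-terminated C string, so it stops at the marker.
seen_by_c = buf.value

# .raw returns the whole buffer, including everything after the first NUL.
full_buffer = buf.raw
```

Everything after the marker is still in memory, but code that treats the
buffer as a C string never looks at it.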

So I prefer the half surrogate because its failure mode is better th

> (*) There's actually a second, sutble issue.  PEP 383 intends utf-8b
> only to be used for file names.  But that means that I might have to
> bind the first argument to TIFFOpen with utf-8b conversion, while I
> might have to bind other arguments with utf-8 conversion.

I couldn't find a Python wrapper for libtiff. If a wrapper was written,
it would indeed have to use the file system encoding for the file name
parameters. However, it would have to do that even without PEP 383,
since the file name should be encoded in the locale's encoding, not
in UTF-8, anyway.
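
A minimal sketch of what such a wrapper's argument conversion might look
like. Both helper functions are hypothetical and no libtiff binding is
assumed; the "surrogateescape" handler name is the one under which PEP 383
later shipped.

```python
import sys

# Encoding Python uses for file names on this system.
FS_ENCODING = sys.getfilesystemencoding()


def encode_filename_arg(name):
    """Hypothetical helper: convert a file name argument for a char* parameter."""
    return name.encode(FS_ENCODING, errors="surrogateescape")


def encode_text_arg(text):
    """Hypothetical helper: convert an ordinary text argument as strict UTF-8."""
    return text.encode("utf-8")


# A raw name as the OS might report it, decoded the PEP 383 way and passed back.
raw = b"scan\xff.tif"
name = raw.decode(FS_ENCODING, errors="surrogateescape")
round_tripped = encode_filename_arg(name)

# Non-file-name text takes the ordinary UTF-8 path.
description = encode_text_arg("caf\u00e9")
```

The two helpers differ only in codec and error handler, so the binding has
to know which parameters are file names.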

Regards,
Martin

From murman at gmail.com  Thu Apr 30 17:43:02 2009
From: murman at gmail.com (Michael Urman)
Date: Thu, 30 Apr 2009 10:43:02 -0500
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
	<49F9A1FB.4090104@v.loewis.de>
	<7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com>
	<49F9A851.5010006@v.loewis.de>
	<7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>
Message-ID: <dcbbbb410904300843p764d8f78lf5ec710611baa0c6@mail.gmail.com>

On Thu, Apr 30, 2009 at 09:42, Thomas Breuel <tmbdev at gmail.com> wrote:
> So, I don't see any reason to prefer your half surrogate quoting to the Mono
> U+0000-based quoting.  Both seem to achieve the same goal with respect to
> round tripping file names, displaying them, etc., but Mono quoting actually
> results in valid unicode strings.  It works because null is the one
> character that's not legal in a UNIX path name.

This seems to summarize only half of the problem. Mono's U+0000
quoting creates a string which is an invalid filename; PEP 383's
creates one which is an unsanctioned collection of code units. Neither
can be passed directly to the posix filesystem in question. I favor
PEP 383 because its Unicode strings can be usefully passed to most
APIs that would display it usefully. Mono's U+0000 probably truncates
most strings. And since such non-valid Unicode strings can occur on
the Windows filesystem, I don't find their use in PEP 383 to be a
flaw.

-- 
Michael Urman

From martin at v.loewis.de  Thu Apr 30 18:07:33 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 30 Apr 2009 18:07:33 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <20090430151936.12555.425993626.divmod.xquotient.10039@weber.divmod.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>	<49F9484D.9010507@v.loewis.de>	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>	<49F96095.4000208@v.loewis.de>	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>	<49F9A1FB.4090104@v.loewis.de>	<7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com>	<49F9A851.5010006@v.loewis.de>	<7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>
	<20090430151936.12555.425993626.divmod.xquotient.10039@weber.divmod.com>
Message-ID: <49F9CCC5.2010504@v.loewis.de>

> Martin, if you're going to stick with the half-surrogate trick, would
> you mind adding a section to the PEP on "alternate encoding strategies",
> explaining why the NULL method was not selected?

In the PEP process, it isn't my job to criticize competing proposals.
Instead, proponents of competing proposals should write alternative
PEPs, which then get criticized on their own. As the PEP author, I would
have to collect the objections to the PEP in the PEP, which I did;
I'm not convinced that I would have to also collect all alternative
proposals that people come up with in the PEP (except when they are in
fact amendments that I accept).

I hope I had made it clear that I don't try to "shoot down" alternative
proposals, but have rather asked people making alternative proposals
to write their own PEPs. At some point (when the amount of alternative
proposals grew unreasonably), I stopped responding to each and every
alternative proposal that this should be proposed in a separate PEP.

Wrt. escaping with U+0000: I personally disliked it because I considered
it difficult to implement. In particular, on encoding: how do you
arrange the encoder not to encode the NUL character in the encoding, as
it would surely be a valid character? The surrogate approach works
much better here, as it will automatically invoke the error handler.
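
The difference can be sketched with a custom codec error handler. The handler
name "demo_escape" and its implementation are hypothetical, written only to
show when the encoder calls the handler; this is not the PEP's actual
implementation.

```python
import codecs

calls = []  # records every substring the encoder hands to the error handler


def demo_escape(exc):
    """Hypothetical handler: map U+DC80..U+DCFF back to bytes 0x80..0xFF."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    calls.append(bad)
    out = bytearray()
    for ch in bad:
        code = ord(ch)
        if not 0xDC80 <= code <= 0xDCFF:
            raise exc
        out.append(code - 0xDC00)
    # Returning bytes tells the encoder to copy them into the output as-is.
    return bytes(out), exc.end


codecs.register_error("demo_escape", demo_escape)

# A lone surrogate is unencodable, so the encoder invokes the handler for it.
with_surrogate = "caf\udce9".encode("utf-8", errors="demo_escape")

# U+0000 is a valid character: it is encoded directly and the handler never runs.
with_nul = "caf\u0000".encode("utf-8", errors="demo_escape")
```

A U+0000-based scheme would therefore need changes inside the encoder
itself, because no error is ever raised for the NUL character.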

With further testing, I found that in practice, the proposal also
suffers from the problem that the character would be taken as a
terminating character by APIs - I found that to be a real problem
in gtk, and have added that to the PEP.

Regards,
Martin

From glyph at divmod.com  Thu Apr 30 18:26:25 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Thu, 30 Apr 2009 16:26:25 -0000
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <49F9C537.9020801@v.loewis.de>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
	<49F9A1FB.4090104@v.loewis.de>
	<7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com>
	<49F9A851.5010006@v.loewis.de>
	<7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>
	<49F9C537.9020801@v.loewis.de>
Message-ID: <20090430162625.12555.1123571271.divmod.xquotient.10122@weber.divmod.com>

On 03:35 pm, martin at v.loewis.de wrote:
>>So, why do you prefer half surrogate coding to U+0000 quoting?
>
>If I pass a string with an embedded U+0000 to gtk, gtk will truncate
>the string, and stop rendering it at this character. This is worse than
>what it does for invalid UTF-8 sequences. Chances are fairly high that
>other C libraries will fail in the same way, in particular if they
>expect char* (which is very common in C).

Hmm.  I believe the intended failure mode here, for PyGTK at least, is 
actually this:

    TypeError: GtkLabel.set_text() argument 1 must be string without null 
bytes, not unicode

APIs in PyGTK which accept NULLs and silently truncate are probably 
broken.  Although perhaps I've just made your point even more strongly; 
one because the behavior is inconsistent, and two because it sometimes 
raises an exception if a NULL is present, and apparently the goal here 
is to prevent exceptions from being raised anywhere in the process.

For this idiom to be of any use to GTK programs, 
gtk.FileChooser.get_filename() will probably need to be changed, since 
(in py2) it currently returns a str, not unicode.

The PEP should say something about how GUI libraries should handle file 
choosers, so that they'll be consistent and compatible with the standard 
library.  Perhaps only that file choosers need to take this PEP into 
account, and the rest is obvious.  Or maybe the right thing for GTK to 
do would be to continue to use bytes on POSIX and convert to text on 
Windows, since open(), listdir() et al. will continue to accept bytes 
for filenames?

>So I prefer the half surrogate because its failure mode is better th

Heh heh heh.

From glyph at divmod.com  Thu Apr 30 18:35:25 2009
From: glyph at divmod.com (glyph at divmod.com)
Date: Thu, 30 Apr 2009 16:35:25 -0000
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <49F9CCC5.2010504@v.loewis.de>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>
	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>
	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>
	<49F9484D.9010507@v.loewis.de>
	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
	<49F9A1FB.4090104@v.loewis.de>
	<7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com>
	<49F9A851.5010006@v.loewis.de>
	<7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>
	<20090430151936.12555.425993626.divmod.xquotient.10039@weber.divmod.com>
	<49F9CCC5.2010504@v.loewis.de>
Message-ID: <20090430163525.12555.1432542229.divmod.xquotient.10137@weber.divmod.com>

On 04:07 pm, martin at v.loewis.de wrote:
>>Martin, if you're going to stick with the half-surrogate trick, would
>>you mind adding a section to the PEP on "alternate encoding 
>>strategies",
>>explaining why the NULL method was not selected?
>
>In the PEP process, it isn't my job to criticize competing proposals.
>Instead, proponents of competing proposals should write alternative
>PEPs, which then get criticized on their own. As the PEP author, I 
>would
>have to collect the objections to the PEP in the PEP, which I did;
>I'm not convinced that I would have to also collect all alternative
>proposals that people come up with in the PEP (except when they are in
>fact amendments that I accept).

Fair enough.  I have probably misunderstood the process.  I dimly 
recalled reading some PEPs which addressed alternate approaches in this 
way and I thought it was part of the process.

Anyway, congratulations on getting the PEP accepted, good luck with the 
implementation.  Thanks for addressing my question.

From martin at v.loewis.de  Thu Apr 30 18:21:03 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 18:21:03 +0200
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <20090430162625.12555.1123571271.divmod.xquotient.10122@weber.divmod.com>
References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com>	<d2155e360904292040u7d915e4qc463371db569f22b@mail.gmail.com>	<7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com>	<49F9484D.9010507@v.loewis.de>	<7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>	<49F96095.4000208@v.loewis.de>	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>	<49F9A1FB.4090104@v.loewis.de>	<7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com>	<49F9A851.5010006@v.loewis.de>	<7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>	<49F9C537.9020801@v.loewis.de>
	<20090430162625.12555.1123571271.divmod.xquotient.10122@weber.divmod.com>
Message-ID: <49F9CFEF.2050401@v.loewis.de>

>> If I pass a string with an embedded U+0000 to gtk, gtk will truncate
>> the string, and stop rendering it at this character. This is worse than
>> what it does for invalid UTF-8 sequences. Chances are fairly high that
>> other C libraries will fail in the same way, in particular if they
>> expect char* (which is very common in C).
> 
> Hmm.  I believe the intended failure mode here, for PyGTK at least, is
> actually this:
> 
>    TypeError: GtkLabel.set_text() argument 1 must be string without null
> bytes, not unicode

It may depend on the widget also, I tried it with wxMessageDialog
(I only had the wx example available, and am using wxgtk).

> APIs in PyGTK which accept NULLs and silently truncate are probably
> broken.  Although perhaps I've just made your point even more strongly;
> one because the behavior is inconsistent, and two because it sometimes
> raises an exception if a NULL is present, and apparently the goal here
> is to prevent exceptions from being raised anywhere in the process.

Indeed so.

> For this idiom to be of any use to GTK programs,
> gtk.FileChooser.get_filename() will probably need to be changed, since
> (in py2) it currently returns a str, not unicode.

Perhaps - the entire PEP is about Python 3 only. I don't know whether
PyGTK already works with 3.x.

> The PEP should say something about how GUI libraries should handle file
> choosers, so that they'll be consistent and compatible with the standard
> library.  Perhaps only that file choosers need to take this PEP into
> account, and the rest is obvious.  Or maybe the right thing for GTK to
> do would be to continue to use bytes on POSIX and convert to text on
> Windows, since open(), listdir() et al. will continue to accept bytes
> for filenames?

In Python 3, the file chooser should definitely return strings, and it
would be good if they were PEP 383 compliant.

>> So I prefer the half surrogate because its failure mode is better th
> 
> Heh heh heh.

And it wasn't even intentional :-)

Martin

From stephen at xemacs.org  Thu Apr 30 18:39:52 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 01 May 2009 01:39:52 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <20090429224532.GA11604@cskk.homeip.net>
References: <87d4avk3f9.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20090429224532.GA11604@cskk.homeip.net>
Message-ID: <87y6tihz8n.fsf@uwakimon.sk.tsukuba.ac.jp>

Cameron Simpson writes:
 > On 29Apr2009 22:14, Stephen J. Turnbull <stephen at xemacs.org> wrote:
 > | Baptiste Carvello writes:
 > |  > By contrast, if the new utf-8b codec would *supercede* the old one,
 > |  > \udcxx would always mean raw bytes (at least on UCS-4 builds, where
 > |  > surrogates are unused). Thus ambiguity could be avoided.
 > | 
 > | Unfortunately, that's false.  [Because Python strings are
 > | intended to be used as containers for widechars which are to be
 > | interpreted as Unicode when that makes sense, but there's no
 > | restriction against nonsense code points, including in UCS-4
 > | Python.]

[...]

 > Wouldn't you then be bypassing the implicit encoding anyway, at least to
 > some extent, and thus not trip over the PEP?

Sure.  I'm not really arguing the PEP here; the point is that under
the current definition of Python strings, ambiguity is unavoidable.
The best we can ask for is fewer exceptions, and an attempt to reduce
ambiguity to a bare minimum in the code paths that we open up when we
make a definition that allows a formerly erroneous computation to
succeed.

Martin is well aware of this, and the PEP is clear enough about that (to
me, but I'm a mail and multilingual editor internals kinda guy<wink>).
I'd rather have more validation of strings, but *shrug* Martin's doing
the work.

OTOH, the Unicode fans need to understand that past policy of Python
is not to validate; Python is intended to provide all the tools needed
to write validating apps, but it isn't one itself.  Martin's PEP is
quite narrow in that sense.  All it is about is an invertible encoding
of broken encodings.  It does have the downside that it guarantees
that Python itself can produce non-conforming strings, but that's not
the end of the world, and an app can keep track of them or even refuse
them by setting the error handler, if it wants to.
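
A minimal sketch of "refuse them" versus "keep track of them", assuming a
current Python 3 in which the PEP's scheme is available as the
"surrogateescape" error handler (a name that postdates this thread). The
byte string and variable names are hypothetical, chosen only for
illustration:

```python
raw = b"caf\xe9"   # hypothetical file name bytes, not valid UTF-8

# Refusing: the default "strict" handler rejects the undecodable byte.
try:
    raw.decode("utf-8", "strict")
    refused = False
except UnicodeDecodeError:
    refused = True

# Keeping track: after a lenient decode, the escaped bytes are exactly
# the code points in the range U+DC80..U+DCFF.
name = raw.decode("utf-8", "surrogateescape")
escaped = [ch for ch in name if "\udc80" <= ch <= "\udcff"]
```

Either branch is a policy decision for the application; the codec itself
does no validation beyond what the chosen error handler implies.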

From dripton at ripton.net  Thu Apr 30 18:44:35 2009
From: dripton at ripton.net (David Ripton)
Date: Thu, 30 Apr 2009 09:44:35 -0700
Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again)
In-Reply-To: <49F9CFEF.2050401@v.loewis.de>
References: <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com>
	<49F96095.4000208@v.loewis.de>
	<20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com>
	<49F9A1FB.4090104@v.loewis.de>
	<7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com>
	<49F9A851.5010006@v.loewis.de>
	<7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com>
	<49F9C537.9020801@v.loewis.de>
	<20090430162625.12555.1123571271.divmod.xquotient.10122@weber.divmod.com>
	<49F9CFEF.2050401@v.loewis.de>
Message-ID: <20090430164435.GA314@vidar.dreamhost.com>

On 2009.04.30 18:21:03 +0200, "Martin v. Löwis" wrote:
> Perhaps - the entire PEP is about Python 3 only. I don't know whether
> PyGTK already works with 3.x.

It does not.  There is a bug in the Gnome tracker for it, and I believe
some work has been done to start porting PyGObject, but it appears that
a full PyGTK on Python 3 is a ways off.

-- 
David Ripton    dripton at ripton.net

From google at mrabarnett.plus.com  Thu Apr 30 20:02:23 2009
From: google at mrabarnett.plus.com (MRAB)
Date: Thu, 30 Apr 2009 19:02:23 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
Message-ID: <49F9E7AF.1050306@mrabarnett.plus.com>

One further question: should the encoder accept a string like
u'\uDCC2\uDC80'? That would encode to b'\xC2\x80', which, when decoded,
would give u'\x80'. Does the PEP only guarantee that strings decoded
from the filesystem are reversible, but not check what might be de novo
strings?

From jimjjewett at gmail.com  Thu Apr 30 21:03:47 2009
From: jimjjewett at gmail.com (Jim Jewett)
Date: Thu, 30 Apr 2009 15:03:47 -0400
Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable
Message-ID: <fb6fbf560904301203v7b110cc6j6adb79a3637e88d4@mail.gmail.com>

Jared Grubb wrote:

> Ok, so if I understand, the situation is:
> * python points to 2.x version
> * python3 points to 3.x version
> * need to be able to run certain 3k scripts from cmdline (since we're
>    talking about shebangs) using Python3k even though "python"
>    points to  2.x

> So, if I got the situation right, then do these same scripts
> understand that PYTHONPATH and PYTHONHOME and all the others
> are also  probably pointing to 2.x code?

Would it make sense to introduce PYTHON2PATH and PYTHON3PATH (or even
PYTHON27PATH and PYTHON32PATH) et al.?

Or is this an area where we just figure that whoever moved the file
locations around for distribution can hardcode things properly?

-jJ

From martin at v.loewis.de  Thu Apr 30 21:10:37 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 21:10:37 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <49F9E7AF.1050306@mrabarnett.plus.com>
References: <49F9E7AF.1050306@mrabarnett.plus.com>
Message-ID: <49F9F7AD.8090704@v.loewis.de>

MRAB wrote:
> One further question: should the encoder accept a string like
> u'\uDCC2\uDC80'? That would encode to b'\xC2\x80'

Indeed so.

> which, when decoded, would give u'\x80'.

Assuming the encoding is UTF-8, yes.

> Does the PEP only guarantee that strings decoded
> from the filesystem are reversible, but not check what might be de novo
> strings?

Exactly so.
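
A minimal sketch of that distinction, assuming a current Python 3 in
which the PEP's scheme is exposed as the "surrogateescape" error handler
(a name that postdates this thread). The sample bytes and variable names
are illustrative only:

```python
# Bytes that are not valid UTF-8 survive a decode/encode round trip.
raw = b"caf\xe9"                       # Latin-1 bytes, invalid as UTF-8
name = raw.decode("utf-8", "surrogateescape")
assert name == "caf\udce9"             # undecodable byte 0xE9 -> U+DCE9
assert name.encode("utf-8", "surrogateescape") == raw

# A de novo string built from escape code points is accepted by the
# encoder, but it does not round-trip: the bytes form valid UTF-8.
de_novo = "\udcc2\udc80"
encoded = de_novo.encode("utf-8", "surrogateescape")
assert encoded == b"\xc2\x80"
decoded = encoded.decode("utf-8", "surrogateescape")
assert decoded == "\x80" and decoded != de_novo
```

The guarantee runs in one direction only: bytes to string to bytes is
lossless, while string to bytes to string is not checked.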

Regards,
Martin

From mike.klaas at gmail.com  Thu Apr 30 21:10:51 2009
From: mike.klaas at gmail.com (Mike Klaas)
Date: Thu, 30 Apr 2009 12:10:51 -0700
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <ca471dc20904300739l68f9d224xb7643f8cd004d220@mail.gmail.com>
References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com>
	<49F96B80.5090808@v.loewis.de>
	<ca471dc20904300739l68f9d224xb7643f8cd004d220@mail.gmail.com>
Message-ID: <D38BC97C-217E-4F72-886B-500A34773E28@gmail.com>

On 30-Apr-09, at 7:39 AM, Guido van Rossum wrote:

> FWIW, I'm in agreement with this PEP (i.e. its status is now
> Accepted). Martin, you can update the PEP and start the
> implementation.

+1

Kudos to Martin for seeing this through with (imo) considerable  
patience and dignity.

-Mike

From larry at hastings.org  Thu Apr 30 21:32:32 2009
From: larry at hastings.org (Larry Hastings)
Date: Thu, 30 Apr 2009 12:32:32 -0700
Subject: [Python-Dev] Proposed: add support for UNC paths to all
 functions in ntpath
In-Reply-To: <49F8DBCD.6050504@trueblade.com>
References: <49F8B222.7070204@hastings.org> <49F8D9A0.7000104@voidspace.org.uk>
	<49F8DBCD.6050504@trueblade.com>
Message-ID: <49F9FCD0.80208@hastings.org>

Counting the votes for http://bugs.python.org/issue5799 :

    +1 from Mark Hammond (via private mail)
    +1 from Paul Moore (via the tracker)
    +1 from Tim Golden (in Python-ideas, though what he literally said
    was "I'm up for it")
    +1 from Michael Foord
    +1 from Eric Smith

There have been no other votes.

Is that enough consensus for it to go in?  If so, are there any core 
developers who could help me get it in before the 3.1 feature freeze?  
The patch should be in good shape; it has unit tests and updated 
documentation.

/larry/

From piet at cs.uu.nl  Thu Apr 30 21:33:05 2009
From: piet at cs.uu.nl (Piet van Oostrum)
Date: Thu, 30 Apr 2009 21:33:05 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com> (Ronald Oussoren's
	message of "Tue\, 28 Apr 2009 14\:30\:43 +0200")
References: <20090427211447.GA4291@cskk.homeip.net>
	<49F658A5.7080807@g.nevcal.com>
	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>
	<loom.20090428T114723-520@post.gmane.org>
	<79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com>
	<49F6F09E.2020506@voidspace.org.uk>
	<1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com>
Message-ID: <m2ocueq6mm.fsf@cs.uu.nl>

>>>>> Ronald Oussoren <ronaldoussoren at mac.com> (RO) wrote:

>RO> For what it's worth, the OSX API's seem to behave as follows:
>RO> * If you create a file with a non-UTF8 name on an HFS+ filesystem the
>RO> system automatically encodes the name.

>RO> That is,  open(chr(255), 'w') will silently create a file named '%FF'
>RO> instead of the name you'd expect on a unix system.

Not for me (I am using Python 2.6.2).

>>> f = open(chr(255), 'w')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
>>> 

I once got a tar file from a Linux system which contained a file with a
non-ASCII, ISO-8859-1 encoded filename. The tar file refused to be
unpacked on a HFS+ filesystem.
-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org

From barry at barrys-emacs.org  Thu Apr 30 21:43:24 2009
From: barry at barrys-emacs.org (Barry Scott)
Date: Thu, 30 Apr 2009 20:43:24 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F92E82.9040702@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
	<83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org>
	<49F92E82.9040702@v.loewis.de>
Message-ID: <3710EB34-DB71-4AD3-90D3-9657733B6DD3@barrys-emacs.org>

On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:

>> How do I get a printable unicode version of these path strings if they
>> contain non-unicode data?
>
> Define "printable". One way would be to use a regular expression,
> replacing all codes in a certain range with a question mark.

What I mean by printable is that the string must be valid unicode
that I can print to a UTF-8 console or place as text in a UTF-8
web page.

I think your PEP gives me a string that will not encode to
valid UTF-8 that the outside of python world likes. Did I get this
point wrong?

>
>
>> I'm guessing that an app has to understand that filenames come in
>> two forms, unicode and bytes, if it's not utf-8 data. Why not simply
>> return string if it's valid utf-8, otherwise return bytes?
>
> That would have been an alternative solution, and the one that 2.x
> uses for listdir. People didn't like it.

In our application we are running fedora with the assumption that the
filenames are UTF-8. When Windows systems FTP files to our system
the files are in CP-1251(?) and not valid UTF-8.

What we have to do is detect these non-UTF-8 filenames and get the
users to rename them.

Having an algorithm that says "if it's a string, no problem; if it's
bytes, deal with the exceptions" seems simple.

How do I do this detection with the PEP proposal?
Do I end up using the byte interface and doing the utf-8 decode
myself?

Barry

From google at mrabarnett.plus.com  Thu Apr 30 21:54:42 2009
From: google at mrabarnett.plus.com (MRAB)
Date: Thu, 30 Apr 2009 20:54:42 +0100
Subject: [Python-Dev] Proposed: add support for UNC paths to all
 functions in ntpath
In-Reply-To: <49F9FCD0.80208@hastings.org>
References: <49F8B222.7070204@hastings.org>
	<49F8D9A0.7000104@voidspace.org.uk>	<49F8DBCD.6050504@trueblade.com>
	<49F9FCD0.80208@hastings.org>
Message-ID: <49FA0202.2020203@mrabarnett.plus.com>

Larry Hastings wrote:
> 
> 
> Counting the votes for http://bugs.python.org/issue5799 :
> 
>    +1 from Mark Hammond (via private mail)
>    +1 from Paul Moore (via the tracker)
>    +1 from Tim Golden (in Python-ideas, though what he literally said
>    was "I'm up for it")
>    +1 from Michael Foord
>    +1 from Eric Smith
> 
> There have been no other votes.
> 
> Is that enough consensus for it to go in?  If so, are there any core 
> developers who could help me get it in before the 3.1 feature freeze?  
> The patch should be in good shape; it has unit tests and updated 
> documentation.
> 
+1 from me.

From nad at acm.org  Thu Apr 30 21:54:50 2009
From: nad at acm.org (Ned Deily)
Date: Thu, 30 Apr 2009 12:54:50 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
References: <20090427211447.GA4291@cskk.homeip.net>
	<49F658A5.7080807@g.nevcal.com>
	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>
	<loom.20090428T114723-520@post.gmane.org>
	<79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com>
	<49F6F09E.2020506@voidspace.org.uk>
	<1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com>
	<m2ocueq6mm.fsf@cs.uu.nl>
Message-ID: <nad-D3B823.12544930042009@ger.gmane.org>

In article <m2ocueq6mm.fsf at cs.uu.nl>, Piet van Oostrum <piet at cs.uu.nl> 
wrote:
> >>>>> Ronald Oussoren <ronaldoussoren at mac.com> (RO) wrote:
> >RO> For what it's worth, the OSX API's seem to behave as follows:
> >RO> * If you create a file with a non-UTF8 name on an HFS+ filesystem the
> >RO> system automatically encodes the name.
> 
> >RO> That is,  open(chr(255), 'w') will silently create a file named '%FF'
> >RO> instead of the name you'd expect on a unix system.
> 
> Not for me (I am using Python 2.6.2).
> 
> >>> f = open(chr(255), 'w')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
> >>> 

What version of OSX are you using?  On Tiger 10.4.11 I see the failure 
you see but on Leopard 10.5.6 the behavior Ronald reports.

-- 
 Ned Deily,
 nad at acm.org

From martin at v.loewis.de  Thu Apr 30 22:06:33 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Apr 2009 22:06:33 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <3710EB34-DB71-4AD3-90D3-9657733B6DD3@barrys-emacs.org>
References: <49EEBE2E.3090601@v.loewis.de>
	<83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org>
	<49F92E82.9040702@v.loewis.de>
	<3710EB34-DB71-4AD3-90D3-9657733B6DD3@barrys-emacs.org>
Message-ID: <49FA04C9.70906@v.loewis.de>

>>> How do I get a printable unicode version of these path strings if they
>>> contain non-unicode data?
>>
>> Define "printable". One way would be to use a regular expression,
>> replacing all codes in a certain range with a question mark.
> 
> What I mean by printable is that the string must be valid unicode
> that I can print to a UTF-8 console or place as text in a UTF-8
> web page.
> 
> I think your PEP gives me a string that will not encode to
> valid UTF-8 that the outside of python world likes. Did I get this
> point wrong?

You are right. However, if your *only* requirement is that it should
be printable, then this is fairly underspecified. One way to get
a printable string would be this function

def printable_string(unprintable):
  return ""

This will always return a printable version of the input string...

> In our application we are running fedora with the assumption that the
> filenames are UTF-8. When Windows systems FTP files to our system
> the files are in CP-1251(?) and not valid UTF-8.

That would be a bug in your FTP server, no? If you want all file names
to be UTF-8, then your FTP server should arrange for that.

> Having an algorithm that says "if it's a string, no problem; if it's
> bytes, deal with the exceptions" seems simple.
> 
> How do I do this detection with the PEP proposal?
> Do I end up using the byte interface and doing the utf-8 decode
> myself?

No, you should encode using the "strict" error handler, with the
locale encoding. If the file name encodes successfully, it's correct,
otherwise, it's broken.
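
A minimal sketch of that check, assuming a current Python 3 where the
PEP's scheme is the "surrogateescape" error handler. The helper name
is_clean_filename and the sample byte strings are hypothetical; UTF-8 is
passed explicitly here, whereas real code would more likely use
sys.getfilesystemencoding():

```python
def is_clean_filename(name, encoding="utf-8"):
    """Return True if name encodes strictly, i.e. holds no escaped bytes."""
    try:
        name.encode(encoding, "strict")
    except UnicodeEncodeError:
        return False
    return True

# Simulate what the file system interfaces would hand back for a valid
# UTF-8 name and for a name uploaded in a Windows code page.
good = b"report.txt".decode("utf-8", "surrogateescape")
bad = b"\xee\xf2\xf7\xb8\xf2.txt".decode("utf-8", "surrogateescape")

for candidate in (good, bad):
    status = "ok" if is_clean_filename(candidate) else "needs renaming"
    print(ascii(candidate), status)
```

Names that fail the check can still be passed back to open() or
os.rename() unchanged, so the application can offer the rename itself.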

Regards,
Martin

From google at mrabarnett.plus.com  Thu Apr 30 22:07:41 2009
From: google at mrabarnett.plus.com (MRAB)
Date: Thu, 30 Apr 2009 21:07:41 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <3710EB34-DB71-4AD3-90D3-9657733B6DD3@barrys-emacs.org>
References: <49EEBE2E.3090601@v.loewis.de>	<83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org>	<49F92E82.9040702@v.loewis.de>
	<3710EB34-DB71-4AD3-90D3-9657733B6DD3@barrys-emacs.org>
Message-ID: <49FA050D.4090104@mrabarnett.plus.com>

Barry Scott wrote:
> 
> On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:
> 
>>> How do I get a printable unicode version of these path strings if they
>>> contain non-unicode data?
>>
>> Define "printable". One way would be to use a regular expression,
>> replacing all codes in a certain range with a question mark.
> 
> What I mean by printable is that the string must be valid unicode
> that I can print to a UTF-8 console or place as text in a UTF-8
> web page.
> 
> I think your PEP gives me a string that will not encode to
> valid UTF-8 that the outside of python world likes. Did I get this
> point wrong?
> 
> 
>>
>>
>>> I'm guessing that an app has to understand that filenames come in
>>> two forms, unicode and bytes, if it's not utf-8 data. Why not simply
>>> return string if it's valid utf-8, otherwise return bytes?
>>
>> That would have been an alternative solution, and the one that 2.x uses
>> for listdir. People didn't like it.
> 
> In our application we are running fedora with the assumption that the
> filenames are UTF-8. When Windows systems FTP files to our system
> the files are in CP-1251(?) and not valid UTF-8.
> 
> What we have to do is detect these non-UTF-8 filenames and get the
> users to rename them.
> 
> Having an algorithm that says "if it's a string, no problem; if it's
> bytes, deal with the exceptions" seems simple.
> 
> How do I do this detection with the PEP proposal?
> Do I end up using the byte interface and doing the utf-8 decode
> myself?
> 
What do you do currently?

The PEP just offers a way of reading all filenames as Unicode, if that's
what you want. So what if the strings can't be encoded to normal UTF-8!
The filenames aren't valid UTF-8 anyway! :-)

From foom at fuhm.net  Thu Apr 30 22:20:31 2009
From: foom at fuhm.net (James Y Knight)
Date: Thu, 30 Apr 2009 16:20:31 -0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49F97275.3010307@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>	<49F6799F.5030208@v.loewis.de>	<875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>	<49F6A71A.3020809@v.loewis.de>
	<873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>
	<49F97275.3010307@v.loewis.de>
Message-ID: <36EBC80A-EBF2-4C4E-B948-48AA30E63911@fuhm.net>

On Apr 30, 2009, at 5:42 AM, Martin v. Löwis wrote:
> I think you are right. I have now excluded ASCII bytes from being
> mapped, effectively not supporting any encodings that are not ASCII
> compatible. Does that sound ok?

Yes. The practical upshot of this is that users who brokenly use  
"ja_JP.SJIS" as their locale (which, note, first requires editing some  
files in /var/lib/locales manually to enable its use..) may still have  
python not work with invalid-in-shift-jis filenames. Since that locale  
is widely recognized as a bad idea to use, and is not supported by any  
distros, it certainly doesn't bother me that it isn't 100% supported  
in python. It seems like the most common reason why people want to use  
SJIS is to make old pre-unicode apps work right in WINE -- in which  
case it doesn't actually affect unix python at all.

I'd personally be fine with python just declaring that the filesystem- 
encoding will *always* be utf-8b and ignore the locale...but I expect  
some other people might complain about that. Of course, application  
authors can decide to do that themselves by calling  
sys.setfilesystemencoding('utf-8b') at the start of their program.

James

From tmbdev at gmail.com  Thu Apr 30 22:55:48 2009
From: tmbdev at gmail.com (Thomas Breuel)
Date: Thu, 30 Apr 2009 22:55:48 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <m2ocueq6mm.fsf@cs.uu.nl>
References: <20090427211447.GA4291@cskk.homeip.net>
	<49F658A5.7080807@g.nevcal.com> 
	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> 
	<loom.20090428T114723-520@post.gmane.org>
	<79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com> 
	<49F6F09E.2020506@voidspace.org.uk>
	<1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com> 
	<m2ocueq6mm.fsf@cs.uu.nl>
Message-ID: <7e51d15d0904301355u2268bf0te06769792f697cc7@mail.gmail.com>

>
> Not for me (I am using Python 2.6.2).
>
> >>> f = open(chr(255), 'w')
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
> >>>

You can get the same error on Linux:

$ python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f=open(chr(255),'w')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
>>>

(Some file system drivers do not enforce valid utf8 yet, but I suspect they
will in the future.)

Tom

From barry at barrys-emacs.org  Thu Apr 30 23:13:43 2009
From: barry at barrys-emacs.org (Barry Scott)
Date: Thu, 30 Apr 2009 22:13:43 +0100
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <49FA04C9.70906@v.loewis.de>
References: <49EEBE2E.3090601@v.loewis.de>
	<83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org>
	<49F92E82.9040702@v.loewis.de>
	<3710EB34-DB71-4AD3-90D3-9657733B6DD3@barrys-emacs.org>
	<49FA04C9.70906@v.loewis.de>
Message-ID: <3D703962-7B3A-4BBC-95DB-ACD90838F13B@barrys-emacs.org>

On 30 Apr 2009, at 21:06, Martin v. Löwis wrote:

>>>> How do I get a printable unicode version of these path strings if
>>>> they contain non-unicode data?
>>>
>>> Define "printable". One way would be to use a regular expression,
>>> replacing all codes in a certain range with a question mark.
>>
>> What I mean by printable is that the string must be valid unicode
>> that I can print to a UTF-8 console or place as text in a UTF-8
>> web page.
>>
>> I think your PEP gives me a string that will not encode to
>> valid UTF-8 that the outside of python world likes. Did I get this
>> point wrong?
>
> You are right. However, if your *only* requirement is that it should
> be printable, then this is fairly underspecified. One way to get
> a printable string would be this function
>
> def printable_string(unprintable):
>  return ""

Ha ha! Indeed this works, but I would have to try to turn enough of the
string into a reasonable hint at the name of the file so the user has
some chance of knowing what is being reported.
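
A minimal sketch of such a hint, following the regular-expression idea
quoted earlier and assuming a current Python 3 where the PEP's scheme is
the "surrogateescape" error handler. The helper name printable_hint and
the sample bytes are hypothetical:

```python
import re

# Code points U+DC80..U+DCFF are the ones the PEP uses for raw bytes.
_ESCAPED = re.compile("[\udc80-\udcff]")

def printable_hint(name):
    """Replace each escaped byte with '?' so the result is valid Unicode."""
    return _ESCAPED.sub("?", name)

# Simulate a name whose bytes are Latin-1 rather than UTF-8.
name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")
hint = printable_hint(name)
print(hint)
```

The hint is for display only; the original string is what has to be
passed back to the file system interfaces.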

>
>
> This will always return a printable version of the input string...
>
>> In our application we are running fedora with the assumption that the
>> filenames are UTF-8. When Windows systems FTP files to our system
>> the files are in CP-1251(?) and not valid UTF-8.
>
> That would be a bug in your FTP server, no? If you want all file names
> to be UTF-8, then your FTP server should arrange for that.

Not a bug; it's the lack of a feature. We use ProFTPd, which has just
implemented what is required. I forget the exact details (they are at
work): when the FTP client asks for the FEAT of the FTP server, the
server can say to use UTF-8. Supporting that in the server was
apparently non-trivial.

>
>
>> Having an algorithm that says "if it's a string, no problem; if it's
>> bytes, deal with the exceptions" seems simple.
>>
>> How do I do this detection with the PEP proposal?
>> Do I end up using the byte interface and doing the utf-8 decode
>> myself?
>
> No, you should encode using the "strict" error handler, with the
> locale encoding. If the file name encodes successfully, it's correct,
> otherwise, it's broken.

O.k. I understand.

Barry

From benjamin at python.org  Thu Apr 30 23:25:16 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 30 Apr 2009 16:25:16 -0500
Subject: [Python-Dev] 3.1 beta deferred
Message-ID: <1afaf6160904301425h4b420827w3c51eafd097e9c73@mail.gmail.com>

Hi everyone!
In the interest of letting Martin implement PEP 383 for 3.1, I am
deferring the release of the 3.1 beta until next Wednesday, May 6th.

Thank you,
Benjamin

From tjreedy at udel.edu  Thu Apr 30 23:39:10 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 30 Apr 2009 17:39:10 -0400
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
	Interfaces
In-Reply-To: <36EBC80A-EBF2-4C4E-B948-48AA30E63911@fuhm.net>
References: <49EEBE2E.3090601@v.loewis.de>	<fb73205e0904240059u42329631hdbf34a2d94cd873e@mail.gmail.com>	<49F184C6.8000905@g.nevcal.com>	<49F30083.5050506@v.loewis.de>	<49F559A4.8050400@g.nevcal.com>	<49F60A8A.8090603@v.loewis.de>	<49F63B19.7010306@g.nevcal.com>	<49F6799F.5030208@v.loewis.de>	<875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net>	<49F6A71A.3020809@v.loewis.de>	<873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net>	<49F97275.3010307@v.loewis.de>
	<36EBC80A-EBF2-4C4E-B948-48AA30E63911@fuhm.net>
Message-ID: <gtd5pu$asg$2@ger.gmane.org>

James Y Knight wrote:
> On Apr 30, 2009, at 5:42 AM, Martin v. Löwis wrote:
>> I think you are right. I have now excluded ASCII bytes from being
>> mapped, effectively not supporting any encodings that are not ASCII
>> compatible. Does that sound ok?
> 
> Yes. The practical upshot of this is that users who brokenly use 
> "ja_JP.SJIS" as their locale (which, note, first requires editing some 
> files in /var/lib/locales manually to enable its use..) may still have 
> python not work with invalid-in-shift-jis filenames. Since that locale 
> is widely recognized as a bad idea to use, and is not supported by any 
> distros, it certainly doesn't bother me that it isn't 100% supported in 
> python. It seems like the most common reason why people want to use SJIS 
> is to make old pre-unicode apps work right in WINE -- in which case it 
> doesn't actually affect unix python at all.
> 
> I'd personally be fine with python just declaring that the 
> filesystem-encoding will *always* be utf-8b and ignore the locale...but 
> I expect some other people might complain about that. Of course, 
> application authors can decide to do that themselves by calling 
> sys.setfilesystemencoding('utf-8b') at the start of their program.

It seems to me that the 3.1+ doc set (or wiki) could be usefully 
extended with a How-to on working with filenames.  I am not sure that 
everything useful fits anywhere in particular in the ref manuals.

From a.badger at gmail.com  Thu Apr 30 23:35:42 2009
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Thu, 30 Apr 2009 14:35:42 -0700
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
 Interfaces
In-Reply-To: <7e51d15d0904301355u2268bf0te06769792f697cc7@mail.gmail.com>
References: <20090427211447.GA4291@cskk.homeip.net>	<49F658A5.7080807@g.nevcal.com>
	<79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com>
	<loom.20090428T114723-520@post.gmane.org>	<79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com>
	<49F6F09E.2020506@voidspace.org.uk>	<1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com>
	<m2ocueq6mm.fsf@cs.uu.nl>
	<7e51d15d0904301355u2268bf0te06769792f697cc7@mail.gmail.com>
Message-ID: <49FA19AE.9060402@gmail.com>

Thomas Breuel wrote:
>     Not for me (I am using Python 2.6.2).
> 
>     >>> f = open(chr(255), 'w')
>     Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>     IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
>     >>>
> 
> 
> You can get the same error on Linux:
> 
> $ python
> Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
> [GCC 4.3.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> f=open(chr(255),'w')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
> >>>
> 
> (Some file system drivers do not enforce valid utf8 yet, but I suspect
> they will in the future.)
> 
Do you suspect that from discussing the issue with kernel developers or
reading a thread on lkml?  If not, then your suspicion seems to be
pretty groundless....

The fact that VFAT enforces an encoding does not lend itself to your
argument for two reasons:

1) VFAT is not a Unix filesystem.  It's a filesystem that's compatible
with Windows/DOS.  If Windows and DOS have filesystem encodings, then it
makes sense for that driver to enforce that as well.  Filesystems
intended to be used natively on Linux/Unix do not necessarily make this
design decision.

2) The encoding is specified when mounting the filesystem.  This means
that you can still mix encodings in a number of ways.  If you mount with
an encoding that has full byte coverage, for instance, each user can put
filenames from different encodings on there.  If you mount with utf8 on
a system which uses euc-jp as the default encoding, you can have full
paths that contain a mix of utf-8 and euc-jp.  Etc.

-Toshio


From tjreedy at udel.edu  Thu Apr 30 23:41:56 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 30 Apr 2009 17:41:56 -0400
Subject: [Python-Dev] 3.1 beta deferred
In-Reply-To: <1afaf6160904301425h4b420827w3c51eafd097e9c73@mail.gmail.com>
References: <1afaf6160904301425h4b420827w3c51eafd097e9c73@mail.gmail.com>
Message-ID: <gtd5v3$asg$3@ger.gmane.org>

Benjamin Peterson wrote:
> Hi everyone!
> In the interest of letting Martin implement PEP 383 for 3.1, I am
> deferring the release of the 3.1 beta until next Wednesday, May 6th.

That might also give time for Larry Hastings' UNC path patch.
(and anything else essentially ready ;-)