From python at mrabarnett.plus.com  Mon Aug  1 00:52:04 2011
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 31 Jul 2011 23:52:04 +0100
Subject: [Python-Dev] urllib bug in Python 3.2.1?
Message-ID: <4E35DC94.2090208@mrabarnett.plus.com>

Someone over at StackOverflow has a problem with urlopen in Python 3.2.1:

 
http://stackoverflow.com/questions/6892573/problem-with-urlopen/6892843#6892843

This is the code:

     from urllib.request import urlopen
     f = urlopen('http://online.wsj.com/mdc/public/page/2_3020-tips.html?mod=topnav_2_3000')
     page = f.read()
     f.close()

With Python 3.1 and Python 3.2 it works OK, but with Python 3.2.1 the
read returns an empty string (I checked it myself).

From nad at acm.org  Mon Aug  1 01:06:48 2011
From: nad at acm.org (Ned Deily)
Date: Sun, 31 Jul 2011 16:06:48 -0700
Subject: [Python-Dev] urllib bug in Python 3.2.1?
References: <4E35DC94.2090208@mrabarnett.plus.com>
Message-ID: <nad-9C5EFC.16064831072011@news.gmane.org>

In article <4E35DC94.2090208 at mrabarnett.plus.com>,
 MRAB <python at mrabarnett.plus.com> wrote:
> Someone over at StackOverflow has a problem with urlopen in Python 3.2.1:
> 
>  
> http://stackoverflow.com/questions/6892573/problem-with-urlopen/6892843#6892843
> 
> This is the code:
> 
>      from urllib.request import urlopen
>      f = urlopen('http://online.wsj.com/mdc/public/page/2_3020-tips.html?mod=topnav_2_3000')
>      page = f.read()
>      f.close()
> 
> With Python 3.1 and Python 3.2 it works OK, but with Python 3.2.1 the
> read returns an empty string (I checked it myself).

http://bugs.python.org/issue12576

-- 
 Ned Deily,
 nad at acm.org


From rdmurray at bitdance.com  Tue Aug  2 04:22:20 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Mon, 01 Aug 2011 22:22:20 -0400
Subject: [Python-Dev] [Python-checkins] cpython (3.2): Skip
	test_getsetlocale_issue1813() on Fedora due to setlocale() bug.
In-Reply-To: <E1Qo1oB-0007HZ-53@dinsdale.python.org>
References: <E1Qo1oB-0007HZ-53@dinsdale.python.org>
Message-ID: <20110802022221.9FE582506C6@webabinitio.net>

On Tue, 02 Aug 2011 01:22:03 +0200, stefan.krah <python-checkins at python.org> wrote:
> http://hg.python.org/cpython/rev/68b5f87566fb
> changeset:   71683:68b5f87566fb
> branch:      3.2
> parent:      71679:1f9ca1819d7c
> user:        Stefan Krah <skrah at bytereef.org>
> date:        Tue Aug 02 01:06:16 2011 +0200
> summary:
>   Skip test_getsetlocale_issue1813() on Fedora due to setlocale() bug.
> See: https://bugzilla.redhat.com/show_bug.cgi?id=726536
> 
> files:
>   Lib/test/test_locale.py |  3 +++
>   1 files changed, 3 insertions(+), 0 deletions(-)
> 
> 
> diff --git a/Lib/test/test_locale.py b/Lib/test/test_locale.py
> --- a/Lib/test/test_locale.py
> +++ b/Lib/test/test_locale.py
> @@ -1,4 +1,5 @@
>  from test.support import run_unittest, verbose
> +from platform import linux_distribution
>  import unittest
>  import locale
>  import sys
> @@ -391,6 +392,8 @@
>          # crasher from bug #7419
>          self.assertRaises(locale.Error, locale.setlocale, 12345)
> 
> +    @unittest.skipIf(linux_distribution()[0] == 'Fedora', "Fedora setlocale() "
> +                     "bug: https://bugzilla.redhat.com/show_bug.cgi?id=726536")
>      def test_getsetlocale_issue1813(self):
>          # Issue #1813: setting and getting the locale under a Turkish locale
>          oldlocale = locale.setlocale(locale.LC_CTYPE)

Why 'Fedora'?  This bug affects more than just Fedora:  as I reported on
the issue, I'm seeing it on Gentoo as well.  (Also, including the issue
number in the commit message is helpful).

Note that since the bug report says that "Gentoo has been including this
fix for two years", the fact that it is failing on my Gentoo system
would seem to indicate that something about the fix is not right.

So, I'm not sure this skip is even valid.  I'm not sure we've finished
diagnosing the bug.

If there are any helpful tests I can run on Gentoo, please let me know.

--
R. David Murray           http://www.bitdance.com

From nadeem.vawda at gmail.com  Tue Aug  2 10:17:52 2011
From: nadeem.vawda at gmail.com (Nadeem Vawda)
Date: Tue, 2 Aug 2011 10:17:52 +0200
Subject: [Python-Dev] [Python-checkins] cpython: Issue #11651: Move
 options for running tests into a Python script.
In-Reply-To: <4E372CB3.60408@udel.edu>
References: <E1Qo0mw-0002O6-EP@dinsdale.python.org> <4E372CB3.60408@udel.edu>
Message-ID: <CANF4RMksgibY_jLiEu4zdVwJ=1pokE=fRsAVNhb_1OU9nKwu6Q@mail.gmail.com>

Thanks for catching that. Fixed in 0b52b6f1bfab.

Nadeem

From stefan at bytereef.org  Tue Aug  2 10:48:07 2011
From: stefan at bytereef.org (Stefan Krah)
Date: Tue, 2 Aug 2011 10:48:07 +0200
Subject: [Python-Dev] [Python-checkins] cpython (3.2):
	Skip test_getsetlocale_issue1813() on Fedora due to setlocale() bug.
In-Reply-To: <20110802022221.9FE582506C6@webabinitio.net>
References: <E1Qo1oB-0007HZ-53@dinsdale.python.org>
	<20110802022221.9FE582506C6@webabinitio.net>
Message-ID: <20110802084807.GA1830@sleipnir.bytereef.org>

R. David Murray <rdmurray at bitdance.com> wrote:
> On Tue, 02 Aug 2011 01:22:03 +0200, stefan.krah <python-checkins at python.org> wrote:
> >   Skip test_getsetlocale_issue1813() on Fedora due to setlocale() bug.
> > See: https://bugzilla.redhat.com/show_bug.cgi?id=726536
> > +    @unittest.skipIf(linux_distribution()[0] == 'Fedora', "Fedora setlocale() "
> > +                     "bug: https://bugzilla.redhat.com/show_bug.cgi?id=726536")
> 
> Why 'Fedora'?  This bug affects more than just Fedora:  as I reported on
> the issue, I'm seeing it on Gentoo as well.  (Also, including the issue
> number in the commit message is helpful).
> 
> Note that since the bug report says that "Gentoo has been including this
> fix for two years", the fact that it is failing on my Gentoo system
> would seem to indicate that something about the fix is not right.
> 
> So, I'm not sure this skip is even valid.  I'm not sure we've finished
> diagnosing the bug.

Fedora's glibc has an additional issue with the Turkish 'I' that can
be reproduced by the simple C program in:

  https://bugzilla.redhat.com/show_bug.cgi?id=726536


I disabled the test specifically on Fedora because a) Fedora seems to be
the only Linux buildbot where this test fails and b) this does not seem
like a Python issue to me.


Since you say that the fix for issue #1813 might not be right, do
you think that the fix should work around this glibc issue?


> If there are any helpful tests I can run on Gentoo, please let me know.

Yes, you could run the small test program. If you get the same results
as on Fedora, then I wonder why the Gentoo buildbots are green.

Do they have tr_TR and tr_TR.iso8859-9 installed? 


Stefan Krah



From ronaldoussoren at mac.com  Tue Aug  2 10:40:20 2011
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Tue, 02 Aug 2011 10:40:20 +0200
Subject: [Python-Dev] [Python-checkins] cpython (3.2): Skip
 test_getsetlocale_issue1813() on Fedora due to setlocale() bug.
In-Reply-To: <20110802022221.9FE582506C6@webabinitio.net>
References: <E1Qo1oB-0007HZ-53@dinsdale.python.org>
	<20110802022221.9FE582506C6@webabinitio.net>
Message-ID: <DF7BC001-A4B9-4412-9F9C-14D3DD90EAF2@mac.com>


On 2 Aug, 2011, at 4:22, R. David Murray wrote:

> On Tue, 02 Aug 2011 01:22:03 +0200, stefan.krah <python-checkins at python.org> wrote:
>> http://hg.python.org/cpython/rev/68b5f87566fb
>> changeset:   71683:68b5f87566fb
>> branch:      3.2
>> parent:      71679:1f9ca1819d7c
>> user:        Stefan Krah <skrah at bytereef.org>
>> date:        Tue Aug 02 01:06:16 2011 +0200
>> summary:
>>  Skip test_getsetlocale_issue1813() on Fedora due to setlocale() bug.
>> See: https://bugzilla.redhat.com/show_bug.cgi?id=726536
>> 
>> files:
>>  Lib/test/test_locale.py |  3 +++
>>  1 files changed, 3 insertions(+), 0 deletions(-)
>> 
>> 
>> diff --git a/Lib/test/test_locale.py b/Lib/test/test_locale.py
>> --- a/Lib/test/test_locale.py
>> +++ b/Lib/test/test_locale.py
>> @@ -1,4 +1,5 @@
>> from test.support import run_unittest, verbose
>> +from platform import linux_distribution
>> import unittest
>> import locale
>> import sys
>> @@ -391,6 +392,8 @@
>>         # crasher from bug #7419
>>         self.assertRaises(locale.Error, locale.setlocale, 12345)
>> 
>> +    @unittest.skipIf(linux_distribution()[0] == 'Fedora', "Fedora setlocale() "
>> +                     "bug: https://bugzilla.redhat.com/show_bug.cgi?id=726536")
>>     def test_getsetlocale_issue1813(self):
>>         # Issue #1813: setting and getting the locale under a Turkish locale
>>         oldlocale = locale.setlocale(locale.LC_CTYPE)
> 
> Why 'Fedora'?  This bug affects more than just Fedora:  as I reported on
> the issue, I'm seeing it on Gentoo as well.  (Also, including the issue
> number in the commit message is helpful).
> 
> Note that since the bug report says that "Gentoo has been including this
> fix for two years", the fact that it is failing on my Gentoo system
> would seem to indicate that something about the fix is not right.
> 
> So, I'm not sure this skip is even valid.  I'm not sure we've finished
> diagnosing the bug.
> 
> If there are any helpful tests I can run on Gentoo, please let me know.

Wouldn't it be better to mark this as an expected failure on the affected platforms? Skipping the test unconditionally will skip the test even when Fedora gets around to fixing this issue.
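
Something along these lines could work (an untested sketch; unittest has
no conditional expectedFailure, so the expected_failure_on helper below
is hypothetical):

    import unittest
    from platform import linux_distribution

    def expected_failure_on(distros):
        # Hypothetical helper: apply expectedFailure only on the
        # distributions where the glibc bug is known to occur.
        def decorator(func):
            if linux_distribution()[0] in distros:
                return unittest.expectedFailure(func)
            return func
        return decorator

    class LocaleTest(unittest.TestCase):
        @expected_failure_on({'Fedora'})
        def test_getsetlocale_issue1813(self):
            ...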

Ronald

> 
> --
> R. David Murray           http://www.bitdance.com
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/ronaldoussoren%40mac.com

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2224 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110802/74b3caa0/attachment.bin>

From stefan at bytereef.org  Tue Aug  2 12:12:37 2011
From: stefan at bytereef.org (Stefan Krah)
Date: Tue, 2 Aug 2011 12:12:37 +0200
Subject: [Python-Dev] [Python-checkins] cpython (3.2):
	Skip test_getsetlocale_issue1813() on Fedora due to setlocale() bug.
In-Reply-To: <20110802084807.GA1830@sleipnir.bytereef.org>
References: <E1Qo1oB-0007HZ-53@dinsdale.python.org>
	<20110802022221.9FE582506C6@webabinitio.net>
	<20110802084807.GA1830@sleipnir.bytereef.org>
Message-ID: <20110802101237.GA12366@sleipnir.bytereef.org>

Stefan Krah <stefan at bytereef.org> wrote:
> Fedora's glibc has an additional issue with the Turkish 'I' that can
> be reproduced by the simple C program in:
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=726536

OK, this runs successfully on Ubuntu Lucid and FreeBSD (if you change
the first tr_TR to tr_TR.UTF-8).

But it fails on Debian lenny, as does test_getsetlocale_issue1813().


I suspect many buildbots are green because they don't have tr_TR and
tr_TR.iso8859-9 installed.



Synopsis for the people who don't want to wade through the bug reports:


If this is a valid C program ...

#include <stdio.h>
#include <locale.h>
int
main(void)
{
    char *s;
    printf("%s\n", setlocale(LC_CTYPE, "tr_TR"));
    printf("%s\n", setlocale(LC_CTYPE, NULL));
    s = setlocale(LC_CTYPE, "tr_TR.ISO8859-9");
    printf("%s\n", s ? s : "null");
    return 0;
}

..., several systems (Fedora 14, Debian lenny) have a glibc bug that
is exposed by test_getsetlocale_issue1813(). People usually don't
see this because tr_TR and tr_TR.iso8859-9 aren't installed.
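
Roughly the same check from a Python prompt (untested sketch; assumes
the tr_TR locales are installed):

    import locale
    # Mirrors the C program above; Python raises locale.Error where
    # setlocale() would return NULL.
    print(locale.setlocale(locale.LC_CTYPE, "tr_TR"))
    print(locale.setlocale(locale.LC_CTYPE))  # query, like setlocale(LC_CTYPE, NULL)
    try:
        print(locale.setlocale(locale.LC_CTYPE, "tr_TR.ISO8859-9"))
    except locale.Error:
        print("null")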



Stefan Krah



From scott+python-dev at scottdial.com  Tue Aug  2 12:20:47 2011
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Tue, 02 Aug 2011 06:20:47 -0400
Subject: [Python-Dev] [Python-checkins] cpython (3.2):
 Skip test_getsetlocale_issue1813() on Fedora due to setlocale() bug.
In-Reply-To: <20110802084807.GA1830@sleipnir.bytereef.org>
References: <E1Qo1oB-0007HZ-53@dinsdale.python.org>
	<20110802022221.9FE582506C6@webabinitio.net>
	<20110802084807.GA1830@sleipnir.bytereef.org>
Message-ID: <4E37CF7F.2020900@scottdial.com>

On 8/2/2011 4:48 AM, Stefan Krah wrote:
> R. David Murray <rdmurray at bitdance.com> wrote:
>> If there are any helpful tests I can run on Gentoo, please let me know.
> 
> Yes, you could run the small test program. If you get the same results
> as on Fedora, then I wonder why the Gentoo buildbots are green.
> 
> Do they have tr_TR and tr_TR.iso8859-9 installed? 

Highly doubtful. It is a normal part of the Gentoo install process to
select the locales that you want for the system. Even the example list
of locales doesn't include any Turkish locales, so one would have had to
go to specific effort to add that one.

-- 
Scott Dial
scott at scottdial.com

From rdmurray at bitdance.com  Tue Aug  2 13:31:46 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Tue, 02 Aug 2011 07:31:46 -0400
Subject: [Python-Dev] [Python-checkins] cpython (3.2): Skip
	test_getsetlocale_issue1813() on Fedora due to setlocale() bug.
In-Reply-To: <20110802101237.GA12366@sleipnir.bytereef.org>
References: <E1Qo1oB-0007HZ-53@dinsdale.python.org>
	<20110802022221.9FE582506C6@webabinitio.net>
	<20110802084807.GA1830@sleipnir.bytereef.org>
	<20110802101237.GA12366@sleipnir.bytereef.org>
Message-ID: <20110802113146.E6D312506C6@webabinitio.net>

On Tue, 02 Aug 2011 12:12:37 +0200, Stefan Krah <stefan at bytereef.org> wrote:
> Stefan Krah <stefan at bytereef.org> wrote:
> > Fedora's glibc has an additional issue with the Turkish 'I' that can
> > be reproduced by the simple C program in:
> > 
> >   https://bugzilla.redhat.com/show_bug.cgi?id=726536
> 
> OK, this runs successfully on Ubuntu Lucid and FreeBSD (if you change
> the first tr_TR to tr_TR.UTF-8).
> 
> But it fails on Debian lenny, as does test_getsetlocale_issue1813().
> 
> I suspect many buildbots are green because they don't have tr_TR and
> tr_TR.iso8859-9 installed.

This is true for my Gentoo buildbots.  Once we've figured out the
best way to handle this, I'll fix that (install the other locales) for
my two.

> Synopsis for the people who don't want to wade through the bug reports:
> 
> If this is a valid C program ...
> 
> #include <stdio.h>
> #include <locale.h>
> int
> main(void)
> {
>     char *s;
>     printf("%s\n", setlocale(LC_CTYPE, "tr_TR"));
>     printf("%s\n", setlocale(LC_CTYPE, NULL));
>     s = setlocale(LC_CTYPE, "tr_TR.ISO8859-9");
>     printf("%s\n", s ? s : "null");
>     return 0;
> }
> 
> ..., several systems (Fedora 14, Debian lenny) have a glibc bug that
> is exposed by test_getsetlocale_issue1813(). People usually don't
> see this because tr_TR and tr_TR.iso8859-9 aren't installed.

I get null as the final output of that regardless of whether I use
'tr_TR' or 'tr_TR.utf8'.

This is with glibc-2.13-r2 (the r2 is Gentoo's mod number).

I'll attach this to the bug report, too; perhaps the discussion should
move there.

--
R. David Murray           http://www.bitdance.com

From solipsis at pitrou.net  Tue Aug  2 14:16:01 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 2 Aug 2011 14:16:01 +0200
Subject: [Python-Dev] cpython (3.2): Fix closes Issue12676 - Invalid
 identifier used in TypeError message in
References: <E1QoCIN-0000fo-3c@dinsdale.python.org>
Message-ID: <20110802141601.3a09784a@pitrou.net>

On Tue, 02 Aug 2011 12:33:55 +0200
senthil.kumaran <python-checkins at python.org> wrote:
>                  raise TypeError("data should be a bytes-like object\
> -                        or an iterable, got %r " % type(it))
> +                        or an iterable, got %r " % type(data))

There are still a lot of spaces in your message. You should use string
literal concatenation instead:

                raise TypeError(
                        "data should be a bytes-like object "
                        "or an iterable, got %r"
                        % type(data))
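
The difference matters because a backslash continuation inside a string
literal keeps the next line's indentation in the message, while adjacent
literals are concatenated cleanly. An illustrative sketch:

    msg1 = "data should be a bytes-like object\
                            or an iterable"      # embeds the spaces
    msg2 = ("data should be a bytes-like object "
            "or an iterable")                    # no stray spaces
    print(repr(msg1))
    print(repr(msg2))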
 



From chris at simplistix.co.uk  Tue Aug  2 19:48:11 2011
From: chris at simplistix.co.uk (Chris Withers)
Date: Tue, 02 Aug 2011 18:48:11 +0100
Subject: [Python-Dev] email-6.0.0.a1
In-Reply-To: <20110719212139.D5D732500D5@webabinitio.net>
References: <20110719212139.D5D732500D5@webabinitio.net>
Message-ID: <4E38385B.4080201@simplistix.co.uk>

On 19/07/2011 22:21, R. David Murray wrote:
> The basic additional API is that a 'source' attribute contains the
> text the generator read from the input source, and a 'value' attribute
> that contains the value with all the Content-Transfer-Encoding stuff
> undone so that you have a real unicode string.  By changing a policy
> setting, you can have that value as the string value of the header.
> You can also assign a string with non-ASCII characters to a header, and
> the right thing will happen.  (Well, eventually it will happen...right
> now it only works correctly for unstructured headers).  Further, Date
> headers have a datetime attribute (and accept being set to a datetime),
> and address headers have attributes for accessing the individual addresses
> in the header.  Other structured headers will eventually grow additional
> attributes as well.

This all sounds pretty awesome, congrats :-)

Has the header wrapping bug that was all part of the big headers mess 
been resolved now?

cheers,

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
             - http://www.simplistix.co.uk

From rdmurray at bitdance.com  Tue Aug  2 23:27:06 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Tue, 02 Aug 2011 17:27:06 -0400
Subject: [Python-Dev] email-6.0.0.a1
In-Reply-To: <4E38385B.4080201@simplistix.co.uk>
References: <20110719212139.D5D732500D5@webabinitio.net>
	<4E38385B.4080201@simplistix.co.uk>
Message-ID: <20110802212706.E6099B14005@webabinitio.net>

On Tue, 02 Aug 2011 18:48:11 +0100, Chris Withers <chris at simplistix.co.uk> wrote:
> On 19/07/2011 22:21, R. David Murray wrote:
> > The basic additional API is that a 'source' attribute contains the
> > text the generator read from the input source, and a 'value' attribute
> > that contains the value with all the Content-Transfer-Encoding stuff
> > undone so that you have a real unicode string.  By changing a policy
> > setting, you can have that value as the string value of the header.
> > You can also assign a string with non-ASCII characters to a header, and
> > the right thing will happen.  (Well, eventually it will happen...right
> > now it only works correctly for unstructured headers).  Further, Date
> > headers have a datetime attribute (and accept being set to a datetime),
> > and address headers have attributes for accessing the individual addresses
> > in the header.  Other structured headers will eventually grow additional
> > attributes as well.
> 
> This all sounds pretty awesome, congrats :-)
> 
> Has the header wrapping bug that was all part of the big headers mess 
> been resolved now?

If it is the bug I think you are talking about, it was resolved in 3.2.1.
If there's still an open header wrapping bug (other than the one about
smime and spaces after the ':') please let me know the issue number,
as I don't see any in my list.

There may still be an issue with whitespace padding in the encoded
word context; I haven't tested issue 1467619 since I made my
other changes.  If it is not fixed in 3.2.1 already, it will be
fixed in email6 by the time I finish the new wrapping code for that.

--David

From solipsis at pitrou.net  Wed Aug  3 00:42:28 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 3 Aug 2011 00:42:28 +0200
Subject: [Python-Dev] cpython: NEWS note for bbeda42ea6a8
References: <E1QoNU1-0007BT-Sv@dinsdale.python.org>
Message-ID: <20110803004228.5bde2dc7@pitrou.net>

On Wed, 03 Aug 2011 00:30:41 +0200
benjamin.peterson <python-checkins at python.org> wrote:
> 
> diff --git a/Misc/NEWS b/Misc/NEWS
> --- a/Misc/NEWS
> +++ b/Misc/NEWS
> @@ -10,6 +10,8 @@
>  Core and Builtins
>  -----------------
>  
> +- Add ThreadError to threading.__all__.
> +

This should surely be in the library section.

Regards

Antoine.



From eric at trueblade.com  Wed Aug  3 05:06:42 2011
From: eric at trueblade.com (Eric Smith)
Date: Tue, 02 Aug 2011 23:06:42 -0400
Subject: [Python-Dev] Fwd: [Python-checkins] devguide: Add Sandro to the
	list of core developers
Message-ID: <4E38BB42.6060009@trueblade.com>

Speaking of developers.rst, could whoever added Jason Coombs also update
developers.rst? I've added Jason to the committers mailing list.

Thanks.
Eric.

-------- Original Message --------
Subject: [Python-checkins] devguide: Add Sandro to the list of core
developers
Date: Tue, 02 Aug 2011 14:58:38 +0200
From: antoine.pitrou <python-checkins at python.org>
Reply-To: python-dev at python.org
To: python-checkins at python.org

http://hg.python.org/devguide/rev/2783106b0ccc
changeset:   438:2783106b0ccc
user:        Antoine Pitrou <solipsis at pitrou.net>
date:        Tue Aug 02 14:56:56 2011 +0200
summary:
  Add Sandro to the list of core developers

files:
  developers.rst |  4 ++++
  1 files changed, 4 insertions(+), 0 deletions(-)


diff --git a/developers.rst b/developers.rst
--- a/developers.rst
+++ b/developers.rst
@@ -24,6 +24,10 @@
 Permissions History
 -------------------

+- Sandro Tosi was given push privileges on Aug 1 2011 by Antoine Pitrou,
+  for documentation and other contributions, on recommendation by Ezio
+  Melotti, R. David Murray and others.
+
- Charles-François Natali was given push privileges on May 19 2011 by Antoine
   Pitrou, for general contributions, on recommendation by Victor Stinner,
   Brian Curtin and others.

-- 
Repository URL: http://hg.python.org/devguide

From g.brandl at gmx.net  Wed Aug  3 08:22:54 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 03 Aug 2011 08:22:54 +0200
Subject: [Python-Dev] cpython: expose sched.h functions (closes #12655)
In-Reply-To: <E1QoNU2-0007Ba-QL@dinsdale.python.org>
References: <E1QoNU2-0007Ba-QL@dinsdale.python.org>
Message-ID: <j1ape9$rut$1@dough.gmane.org>

On 03.08.2011 00:30, benjamin.peterson wrote:
> http://hg.python.org/cpython/rev/89e92e684b37
> changeset:   71704:89e92e684b37
> user:        Benjamin Peterson <benjamin at python.org>
> date:        Tue Aug 02 17:30:04 2011 -0500
> summary:
>   expose sched.h functions (closes #12655)

> +static PyObject *
> +posix_sched_setaffinity(PyObject *self, PyObject *args)
> +{
> +    pid_t pid;
> +    Py_cpu_set *cpu_set;
> +
> +    if (!PyArg_ParseTuple(args, _Py_PARSE_PID "O!|sched_setaffinity",

[...]

> +static PyObject *
> +posix_sched_getaffinity(PyObject *self, PyObject *args)
> +{
> +    pid_t pid;
> +    int ncpus;
> +    Py_cpu_set *res;
> +
> +    if (!PyArg_ParseTuple(args, _Py_PARSE_PID "i|sched_getaffinity",

These should be separated by ":", not "|", if I'm not mistaken?

Georg


From solipsis at pitrou.net  Wed Aug  3 15:23:19 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 3 Aug 2011 15:23:19 +0200
Subject: [Python-Dev] cpython (3.2): Fix closes issue12683 - urljoin to
 work with relative join of svn scheme.
References: <E1QoYyx-0003xs-IX@dinsdale.python.org>
Message-ID: <20110803152319.3844f598@pitrou.net>

On Wed, 03 Aug 2011 12:47:23 +0200
senthil.kumaran <python-checkins at python.org> wrote:
> 
> diff --git a/Lib/test/test_urlparse.py b/Lib/test/test_urlparse.py
> --- a/Lib/test/test_urlparse.py
> +++ b/Lib/test/test_urlparse.py
> @@ -371,6 +371,8 @@
>          self.checkJoin('http:///', '..','http:///')
>          self.checkJoin('', 'http://a/b/c/g?y/./x','http://a/b/c/g?y/./x')
>          self.checkJoin('', 'http://a/./g', 'http://a/./g')
> +        self.checkJoin('svn://pathtorepo/dir1', 'dir2', 'svn://pathtorepo/dir2')
> +        self.checkJoin('svn://pathtorepo/dir1', 'dir2', 'svn://pathtorepo/dir2')

This is the same test repeated. Perhaps you meant svn+ssh?

Regards

Antoine.



From senthil at uthcode.com  Wed Aug  3 15:56:57 2011
From: senthil at uthcode.com (Senthil Kumaran)
Date: Wed, 3 Aug 2011 21:56:57 +0800
Subject: [Python-Dev] cpython (3.2): Fix closes Issue12676 - Invalid
 identifier used in TypeError message in
In-Reply-To: <20110802141601.3a09784a@pitrou.net>
References: <E1QoCIN-0000fo-3c@dinsdale.python.org>
	<20110802141601.3a09784a@pitrou.net>
Message-ID: <20110803135657.GA2477@mathmagic>

On Tue, Aug 02, 2011 at 02:16:01PM +0200, Antoine Pitrou wrote:
> There are still a lot of spaces in your message. You should use string

Yes, I did not realize that... :( Georg fixed this in his commit.

Thanks,
Senthil

From senthil at uthcode.com  Wed Aug  3 15:57:35 2011
From: senthil at uthcode.com (Senthil Kumaran)
Date: Wed, 3 Aug 2011 21:57:35 +0800
Subject: [Python-Dev] cpython (3.2): Fix closes issue12683 - urljoin to
 work with relative join of svn scheme.
In-Reply-To: <20110803152319.3844f598@pitrou.net>
References: <E1QoYyx-0003xs-IX@dinsdale.python.org>
	<20110803152319.3844f598@pitrou.net>
Message-ID: <20110803135735.GB2477@mathmagic>

On Wed, Aug 03, 2011 at 03:23:19PM +0200, Antoine Pitrou wrote:
> This is the same test repeated. Perhaps you meant svn+ssh?

oops, thanks for the catch. yes, I did mean svn+ssh. I shall change
it.

-- 
Senthil

From ethan at stoneleaf.us  Wed Aug  3 22:36:00 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 03 Aug 2011 13:36:00 -0700
Subject: [Python-Dev] unittest bug
Message-ID: <4E39B130.4080504@stoneleaf.us>

My apologies for posting here first, but I'm not yet confident enough in 
my bug searching fu, and duplicates are a pain.

Here's the issue:

from unittest import *
class MyTest(TestCase):
     def test_add(self):
         self.assertEqual(1,(2-1),"Sample Subraction Test")


if __name__ == '__main__':
     main()

I know this isn't the normal way to use unittest, but since __init__ 
goes to the trouble of defining __all__ I would think it was supported. 
  However, it doesn't work -- I added some print statements to show 
where the problem lies (in unittest.loader.TestLoader.loadTestsFromModule):

----------------------------------------------------------------------
checking <class 'unittest.case.FunctionTestCase'>
         added
checking <class '__main__.MyTest'>
         added
checking <class 'unittest.case.SkipTest'>
checking <class 'unittest.case.TestCase'>
         added
checking <class 'unittest.loader.TestLoader'>
checking <class 'unittest.result.TestResult'>
checking <class 'unittest.suite.TestSuite'>
checking <class 'unittest.runner.TextTestResult'>
checking <class 'unittest.runner.TextTestRunner'>
checking <module 'builtins' (built-in)>
checking None
checking None
checking 'test_add.py'
checking '__main__'
checking None
checking <unittest.loader.TestLoader object at 0x00C92BF0>
checking <function expectedFailure at 0x00C7D930>
checking <function findTestCases at 0x00C8CA50>
checking <function getTestCaseNames at 0x00C8C9C0>
checking <function installHandler at 0x00C97A08>
checking <class 'unittest.main.TestProgram'>
checking <function makeSuite at 0x00C8CA08>
checking <function registerResult at 0x00C97978>
checking <function removeHandler at 0x00C97A50>
checking <function removeResult at 0x00C979C0>
checking <function skip at 0x00C7D858>
checking <function skipIf at 0x00C7D8A0>
checking <function skipUnless at 0x00C7D8E8>

test =
<unittest.suite.TestSuite tests=[<unittest.suite.TestSuite 
tests=[<unittest.case.FunctionTestCase tec=runTest>]>, 
<unittest.suite.TestSuite tests=[<__main__.
MyTest testMethod=test_add>]>, <unittest.suite.TestSuite tests=[]>]>
---------------------------------------------------------------------

compared with running using the `import unittest` method:
---------------------------------------------------------------------
checking <class '__main__.MyTest'>
         added
checking <module 'builtins' (built-in)>
checking None
checking None
checking 'test_add_right.py'
checking '__main__'
checking None
checking <module 'unittest' from 'C:\python32\lib\unittest\__init__.py'>

test =
<unittest.suite.TestSuite tests=[<unittest.suite.TestSuite 
tests=[<__main__.MyTest testMethod=test_add>]>]>
---------------------------------------------------------------------

As you can see, the TestLoader is getting false positives from 
case.FunctionTestCase and case.TestCase.  This a problem because, 
besides running more tests than it should, this happens:

E.
======================================================================
Traceback (most recent call last):
   File "test_add.py", line 8, in <module>
     main()
   File "C:\python32\lib\unittest\main.py", line 125, in __init__
     self.runTests()
   File "C:\python32\lib\unittest\main.py", line 271, in runTests
     self.result = testRunner.run(self.test)
   File "C:\python32\lib\unittest\runner.py", line 175, in run
     result.printErrors()
   File "C:\python32\lib\unittest\runner.py", line 109, in printErrors
     self.printErrorList('ERROR', self.errors)
   File "C:\python32\lib\unittest\runner.py", line 115, in printErrorList
     self.stream.writeln("%s: %s" % (flavour,self.getDescription(test)))
   File "C:\python32\lib\unittest\runner.py", line 47, in getDescription
     return '\n'.join((str(test), doc_first_line))
   File "C:\python32\lib\unittest\case.py", line 1246, in __str__
     self._testFunc.__name__)
AttributeError: 'str' object has no attribute '__name__'

I'll be happy to file a bug report if someone can confirm this hasn't 
already been filed.

Thanks for the help!

~Ethan~

PS
No, that's not my code. ;)

From fuzzyman at voidspace.org.uk  Wed Aug  3 23:32:22 2011
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Wed, 3 Aug 2011 22:32:22 +0100
Subject: [Python-Dev] unittest bug
In-Reply-To: <4E39B130.4080504@stoneleaf.us>
References: <4E39B130.4080504@stoneleaf.us>
Message-ID: <0EFECB4D-72D8-4B71-B3CD-C28B20A662C3@voidspace.org.uk>

On 3 Aug 2011, at 21:36, Ethan Furman wrote:
> My apologies for posting here first, but I'm not yet confident enough in my bug searching fu, and duplicates are a pain.
> 
> Here's the issue:
> 
> from unittest import *


That's the bug right there. Just import TestCase and main and everything should work fine. Using "import *" is not recommended except at the interactive interpreter, and it doesn't play well with unittest.main, which does magic introspection to find tests to run.
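
A corrected version of your example, as a sketch:

    # Importing only the needed names keeps unittest.main's module
    # introspection from picking up unittest's own classes:
    from unittest import TestCase, main

    class MyTest(TestCase):
        def test_add(self):
            self.assertEqual(1, 2 - 1)

    if __name__ == '__main__':
        main()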

Michael

> class MyTest(TestCase):
>    def test_add(self):
>        self.assertEqual(1,(2-1),"Sample Subraction Test")
> 
> 
> if __name__ == '__main__':
>    main()
> 
> I know this isn't the normal way to use unittest, but since __init__ goes to the trouble of defining __all__ I would think it was supported.  However, it doesn't work -- I added some print statements to show where the problem lies (in unittest.loader.TestLoader.loadTestsFromModule):
> 
> ----------------------------------------------------------------------
> checking <class 'unittest.case.FunctionTestCase'>
>        added
> checking <class '__main__.MyTest'>
>        added
> checking <class 'unittest.case.SkipTest'>
> checking <class 'unittest.case.TestCase'>
>        added
> checking <class 'unittest.loader.TestLoader'>
> checking <class 'unittest.result.TestResult'>
> checking <class 'unittest.suite.TestSuite'>
> checking <class 'unittest.runner.TextTestResult'>
> checking <class 'unittest.runner.TextTestRunner'>
> checking <module 'builtins' (built-in)>
> checking None
> checking None
> checking 'test_add.py'
> checking '__main__'
> checking None
> checking <unittest.loader.TestLoader object at 0x00C92BF0>
> checking <function expectedFailure at 0x00C7D930>
> checking <function findTestCases at 0x00C8CA50>
> checking <function getTestCaseNames at 0x00C8C9C0>
> checking <function installHandler at 0x00C97A08>
> checking <class 'unittest.main.TestProgram'>
> checking <function makeSuite at 0x00C8CA08>
> checking <function registerResult at 0x00C97978>
> checking <function removeHandler at 0x00C97A50>
> checking <function removeResult at 0x00C979C0>
> checking <function skip at 0x00C7D858>
> checking <function skipIf at 0x00C7D8A0>
> checking <function skipUnless at 0x00C7D8E8>
> 
> test =
> <unittest.suite.TestSuite tests=[<unittest.suite.TestSuite tests=[<unittest.case.FunctionTestCase tec=runTest>]>, <unittest.suite.TestSuite tests=[<__main__.
> MyTest testMethod=test_add>]>, <unittest.suite.TestSuite tests=[]>]>
> ---------------------------------------------------------------------
> 
> compared with running using the `import unittest` method:
> ---------------------------------------------------------------------
> checking <class '__main__.MyTest'>
>        added
> checking <module 'builtins' (built-in)>
> checking None
> checking None
> checking 'test_add_right.py'
> checking '__main__'
> checking None
> checking <module 'unittest' from 'C:\python32\lib\unittest\__init__.py'>
> 
> test =
> <unittest.suite.TestSuite tests=[<unittest.suite.TestSuite tests=[<__main__.MyTest testMethod=test_add>]>]>
> ---------------------------------------------------------------------
> 
> As you can see, the TestLoader is getting false positives from case.FunctionTestCase and case.TestCase.  This is a problem because, besides running more tests than it should, this happens:
> 
> E.
> ======================================================================
> Traceback (most recent call last):
>  File "test_add.py", line 8, in <module>
>    main()
>  File "C:\python32\lib\unittest\main.py", line 125, in __init__
>    self.runTests()
>  File "C:\python32\lib\unittest\main.py", line 271, in runTests
>    self.result = testRunner.run(self.test)
>  File "C:\python32\lib\unittest\runner.py", line 175, in run
>    result.printErrors()
>  File "C:\python32\lib\unittest\runner.py", line 109, in printErrors
>    self.printErrorList('ERROR', self.errors)
>  File "C:\python32\lib\unittest\runner.py", line 115, in printErrorList
>    self.stream.writeln("%s: %s" % (flavour,self.getDescription(test)))
>  File "C:\python32\lib\unittest\runner.py", line 47, in getDescription
>    return '\n'.join((str(test), doc_first_line))
>  File "C:\python32\lib\unittest\case.py", line 1246, in __str__
>    self._testFunc.__name__)
> AttributeError: 'str' object has no attribute '__name__'
> 
> I'll be happy to file a bug report if someone can confirm this hasn't already been filed.
> 
> Thanks for the help!
> 
> ~Ethan~
> 
> PS
> No, that's not my code. ;)
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk
> 




--
http://www.voidspace.org.uk/


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing 
http://www.sqlite.org/different.html






From ethan at stoneleaf.us  Wed Aug  3 23:58:31 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 03 Aug 2011 14:58:31 -0700
Subject: [Python-Dev] unittest bug
In-Reply-To: <0EFECB4D-72D8-4B71-B3CD-C28B20A662C3@voidspace.org.uk>
References: <4E39B130.4080504@stoneleaf.us>
	<0EFECB4D-72D8-4B71-B3CD-C28B20A662C3@voidspace.org.uk>
Message-ID: <4E39C487.9090403@stoneleaf.us>

Michael Foord wrote:
> On 3 Aug 2011, at 21:36, Ethan Furman wrote:
>> My apologies for posting here first, but I'm not yet confident enough in my bug searching fu, and duplicates are a pain.
>>
>> Here's the issue:
>>
>> from unittest import *
> 
> That's the bug right there. Just import TestCase and main and everything should work fine. Using "import *" is not recommended except at the interactive interpreter and it doesn't play well with unittest.main which does magic introspection to find tests to run.

If from xxx import * is not supported, why provide __all__?  At the very 
least the lack of a warning is a documentation bug.

~Ethan~

From fuzzyman at voidspace.org.uk  Wed Aug  3 23:44:50 2011
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Wed, 3 Aug 2011 22:44:50 +0100
Subject: [Python-Dev] unittest bug
In-Reply-To: <4E39C487.9090403@stoneleaf.us>
References: <4E39B130.4080504@stoneleaf.us>
	<0EFECB4D-72D8-4B71-B3CD-C28B20A662C3@voidspace.org.uk>
	<4E39C487.9090403@stoneleaf.us>
Message-ID: <EA5285EA-004E-4B67-9AF0-A166B58AA838@voidspace.org.uk>

On 3 Aug 2011, at 22:58, Ethan Furman wrote:
> Michael Foord wrote:
>> On 3 Aug 2011, at 21:36, Ethan Furman wrote:
>>> My apologies for posting here first, but I'm not yet confident enough in my bug searching fu, and duplicates are a pain.
>>> 
>>> Here's the issue:
>>> 
>>> from unittest import *
>> That's the bug right there. Just import TestCase and main and everything should work fine. Using "import *" is not recommended except at the interactive interpreter and it doesn't play well with unittest.main which does magic introspection to find tests to run.
> 
> If from xxx import * is not supported, why provide __all__?  

a) to define the public API
b) to limit the symbols exported - that is not the same as having main() work with import *; they're orthogonal
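
For instance (a minimal sketch):

    # mymodule.py
    __all__ = ['public_func']     # (a) documents the public API

    def public_func():
        return 42

    def internal_func():          # (b) left out of "from mymodule import *"
        return -1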

> At the very least the lack of a warning is a documentation bug.
> 

Feel free to propose a patch fixing that problem (on the issue tracker please).

All the best,

Michael Foord

> ~Ethan~
> 




--
http://www.voidspace.org.uk/


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing 
http://www.sqlite.org/different.html






From ethan at stoneleaf.us  Thu Aug  4 01:00:52 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 03 Aug 2011 16:00:52 -0700
Subject: [Python-Dev] unittest bug
In-Reply-To: <EA5285EA-004E-4B67-9AF0-A166B58AA838@voidspace.org.uk>
References: <4E39B130.4080504@stoneleaf.us>
	<0EFECB4D-72D8-4B71-B3CD-C28B20A662C3@voidspace.org.uk>
	<4E39C487.9090403@stoneleaf.us>
	<EA5285EA-004E-4B67-9AF0-A166B58AA838@voidspace.org.uk>
Message-ID: <4E39D324.6050600@stoneleaf.us>

Michael Foord wrote:
> On 3 Aug 2011, at 22:58, Ethan Furman wrote:
>> Michael Foord wrote:
>>> On 3 Aug 2011, at 21:36, Ethan Furman wrote:
>>>> My apologies for posting here first, but I'm not yet confident enough in my bug searching fu, and duplicates are a pain.
>>>>
>>>> Here's the issue:
>>>>
>>>> from unittest import *
 >>>
>>> That's the bug right there. Just import TestCase and main and everything should work fine. Using "import *" is not recommended except at the interactive interpreter and it doesn't play well with unittest.main which does magic introspection to find tests to run.
 >>
>> If from xxx import * is not supported, why provide __all__?  
> 
> a) to define the public API

In trying to refute this, I found 
http://docs.python.org/py3k/reference/simple_stmts.html?highlight=__all__#the-import-statement
and learned something new.  Thanks, Michael!

I think I'll withdraw my bug report, however -- since
`from ... import *` is already noted as usually bad practice it should 
fall on the shoulders of the modules where it is /supposed/ to work to 
advertise that, and absent any such advertisement it should not be used.

~Ethan~

From g.brandl at gmx.net  Thu Aug  4 07:42:22 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 04 Aug 2011 07:42:22 +0200
Subject: [Python-Dev] Daily reference leaks (65c412586901): sum=0
In-Reply-To: <E1QooYj-0006z2-Ga@ap.vmr.nerim.net>
References: <E1QooYj-0006z2-Ga@ap.vmr.nerim.net>
Message-ID: <j1dbe8$oii$1@dough.gmane.org>

On 04.08.2011 05:25, solipsis at pitrou.net wrote:
> results for 65c412586901 on branch "default" 
> --------------------------------------------
> 
> 
> 
> Command line was: ['./python', '-m', 'test.regrtest', '-uall', '-R',
> '3:3:/home/antoine/cpython/refleaks/reflogso7nu3', '-x']

Do we need this mail even if there are no leaks to report?

Georg


From ncoghlan at gmail.com  Thu Aug  4 07:54:54 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 4 Aug 2011 15:54:54 +1000
Subject: [Python-Dev] Daily reference leaks (65c412586901): sum=0
In-Reply-To: <j1dbe8$oii$1@dough.gmane.org>
References: <E1QooYj-0006z2-Ga@ap.vmr.nerim.net> <j1dbe8$oii$1@dough.gmane.org>
Message-ID: <CADiSq7fmnLwN3Xzsz5aeBbLTd-kar-2Z6nth06H4XZ8Q=Ragdw@mail.gmail.com>

On Thu, Aug 4, 2011 at 3:42 PM, Georg Brandl <g.brandl at gmx.net> wrote:
> On 04.08.2011 05:25, solipsis at pitrou.net wrote:
>> results for 65c412586901 on branch "default"
>> --------------------------------------------
>>
>>
>>
>> Command line was: ['./python', '-m', 'test.regrtest', '-uall', '-R',
>> '3:3:/home/antoine/cpython/refleaks/reflogso7nu3', '-x']
>
> Do we need this mail even if there are no leaks to report?

I find it useful in order to tell the difference between "no leaks to
report" and "refleak checking job is no longer running"

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From solipsis at pitrou.net  Thu Aug  4 13:12:00 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 4 Aug 2011 13:12:00 +0200
Subject: [Python-Dev] Daily reference leaks (65c412586901): sum=0
References: <E1QooYj-0006z2-Ga@ap.vmr.nerim.net> <j1dbe8$oii$1@dough.gmane.org>
	<CADiSq7fmnLwN3Xzsz5aeBbLTd-kar-2Z6nth06H4XZ8Q=Ragdw@mail.gmail.com>
Message-ID: <20110804131200.026acdc0@pitrou.net>

On Thu, 4 Aug 2011 15:54:54 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Thu, Aug 4, 2011 at 3:42 PM, Georg Brandl <g.brandl at gmx.net> wrote:
> > On 04.08.2011 05:25, solipsis at pitrou.net wrote:
> >> results for 65c412586901 on branch "default"
> >> --------------------------------------------
> >>
> >>
> >>
> >> Command line was: ['./python', '-m', 'test.regrtest', '-uall', '-R',
> >> '3:3:/home/antoine/cpython/refleaks/reflogso7nu3', '-x']
> >
> > Do we need this mail even if there are no leaks to report?
> 
> I find it useful in order to tell the difference between "no leaks to
> report" and "refleak checking job is no longer running"

That's exactly why I'm sending it every day :)

Regards

Antoine.



From rpjday at crashcourse.ca  Fri Aug  5 15:01:01 2011
From: rpjday at crashcourse.ca (Robert P. J. Day)
Date: Fri, 5 Aug 2011 09:01:01 -0400 (EDT)
Subject: [Python-Dev] what is the significance of "plat-linux2" in the
 python build process?
Message-ID: <alpine.DEB.2.02.1108050739250.14752@localhost6.localdomain6>


  (note:  i'm not a python dev subscriber so please make sure you CC
me with any advice and i'm hoping this desperate plea for assistance
is at least enough on-point for this list that someone can help me
out.  and, yes, this is rather verbose but i wanted to supply all of
the relevant details in one shot.)

  i asked about this on the general python help list but i suspect
it's more appropriate to ask developers about this and i'm hoping
someone can clear this up for me.

  i'm building an embedded system using wind river linux 4.2 (WRL
4.2), and part of that build process involves downloading, patching
and compiling python 2.6.2 for the eventual target filesystem.  this
is all being done on my 64-bit ubuntu 11.04 system and, to keep things
simple, i'm not even cross-compiling, i've selected a common 64-bit PC
as the target, so i should be able to ignore any cross compile-related
glitches i might have had.

  this build process works just fine for everyone else on the planet
but it fails for me because i'm doing something apparently no one else
has tried -- i'm running a (hand-rolled) linux 3.x kernel on my build
host and it *seems* that that's what's messing up the python
compilation somewhere in the WRL build scripts.  (as a side note, i
have run across other issues in the WRL build system where it was
simply never imagined that one might want to build on a system running
a 3.x kernel, so that's why i'm suspecting it has something to do with
that.  apparently, i'm out there on the bleeding edge and this is what
might be causing me grief.)

  the symptom seems to be that there is confusion in the python build
process between two directories: "plat-linux2" and "plat-linux3".  my
first simple question would be: what do those names represent and how
should they appear in the build process?

  as a benchmark, i downloaded an *absolutely stock* Python-2.6.2
tarball, untarred it, ran "./configure", then searched for any
references to those strings just so i could have a basis for
comparison.  so, immediately after the configure, here's what i found
for the stock 2.6.2 python tarball:

$ find . -name "plat-linux*"
./Lib/plat-linux2
$

$ grep -r plat-linux *
Doc/install/index.rst:   ['', '/usr/local/lib/python2.3', '/usr/local/lib/python2.3/plat-linux2',
Doc/install/index.rst:'/www/python/lib/pythonX.Y/plat-linux2', ...]``.
Misc/RPM/python-2.6.spec:%{__prefix}/%{libdirname}/python%{libvers}/plat-linux2
Misc/HISTORY:	* Lib/plat-sunos5/CDIO.py, Lib/plat-linux2/CDROM.py:
Misc/HISTORY:e.g. lib-tk, lib-stdwin, plat-win, plat-linux2, plat-sunos5, dos-8x3.
$

  so, before the build, there are a few references to plat-linux2 and
*none* to plat-linux3.  i then ran "make" (which seemed to work just
fine) and here's the result of a similar search after the make:

$ find . -name "plat-linux*"
./Lib/plat-linux2
$

  so that's still the same after the build, there is no plat-linux3
file or directory that's been created.  however, if i do a recursive
grep:

$ grep -r plat-linux3 *
Binary file libpython2.6.a matches
Binary file Modules/getpath.o matches
Binary file python matches
$

  then it's clear that the string "plat-linux3" is now embedded in a
small number of the build results.  so what does that string
represent?  why is it there and what does "2" mean compared to "3" in
this context?  and, most importantly, even though it's there, it
didn't stop the build from completing.

  now take a look at the tail end of the output of the WRL build of
python, where things go wrong (what is clearly a packaging step so i'm
well aware that this is somewhat outside the scope of normal python
building):

===== begin output =====
... snip ...
Checking for unpackaged file(s): /home/rpjday/WindRiver/projects/42/python/host-cross/bin/../lib64/rpm/check-files /home/rpjday/WindRiver/projects/42/python/build/INSTALL_STAGE/python-2.6.2
error: Installed (but unpackaged) file(s) found:
   /usr/lib64/python2.6/plat-linux3/IN.py
   /usr/lib64/python2.6/plat-linux3/IN.pyc
   /usr/lib64/python2.6/plat-linux3/IN.pyo
   /usr/lib64/python2.6/plat-linux3/regen


RPM build errors:
    File not found: /home/rpjday/WindRiver/projects/42/python/build/INSTALL_STAGE/python-2.6.2/usr/lib64/python2.6/plat-linux2
    Installed (but unpackaged) file(s) found:
   /usr/lib64/python2.6/plat-linux3/IN.py
   /usr/lib64/python2.6/plat-linux3/IN.pyc
   /usr/lib64/python2.6/plat-linux3/IN.pyo
   /usr/lib64/python2.6/plat-linux3/regen
/home/rpjday/WindRiver/projects/42/python/scripts/packages.mk:2661: *** [python.install] Error 1
/home/rpjday/WindRiver/projects/42/python/scripts/packages.mk:3017: *** [python.buildlogger] Error 2
/home/rpjday/WindRiver/projects/42/python/scripts/packages.mk:3225: *** [python.build] Error 2
make: *** [python] Error 2
make: Leaving directory `/home/rpjday/WindRiver/projects/42/python/build'

===== end output =====

  note the obvious reference to this "plat-linux3" directory that
appears out of nowhere that never existed in the stock build.  and if
i wander down to the WRL python build directory:

$ find . -name plat-linux*
./Lib/plat-linux2
./Lib/plat-linux3
$

$ find Lib/plat-linux[23]
Lib/plat-linux2
Lib/plat-linux2/CDROM.py
Lib/plat-linux2/DLFCN.py
Lib/plat-linux2/IN.py
Lib/plat-linux2/regen
Lib/plat-linux2/TYPES.py
Lib/plat-linux3
Lib/plat-linux3/IN.py
Lib/plat-linux3/regen
$

  and this is as far as i got before confusion set in.  it seems that
the WRL build process is getting confused about which of those two
values to use for the build and ends up scattering generated artifacts
across both directories, at which point the packaging step gets
confused and gives up.

  if anyone can clarify what might be going on here and what *should*
be happening, i'd be grateful.  i realize i'm asking for remote
diagnosis on a proprietary build system, i'm just wondering what a
native python build *should* look like and what it should produce.
thanks muchly for any guidance.

rday

-- 

========================================================================
Robert P. J. Day                                 Ottawa, Ontario, CANADA
                        http://crashcourse.ca

Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
========================================================================

From solipsis at pitrou.net  Fri Aug  5 15:31:09 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 5 Aug 2011 15:31:09 +0200
Subject: [Python-Dev] what is the significance of "plat-linux2" in the
 python build process?
References: <alpine.DEB.2.02.1108050739250.14752@localhost6.localdomain6>
Message-ID: <20110805153109.79c181b9@pitrou.net>

On Fri, 5 Aug 2011 09:01:01 -0400 (EDT)
"Robert P. J. Day" <rpjday at crashcourse.ca> wrote:
> 
>   this build process works just fine for everyone else on the planet
> but it fails for me because i'm doing something apparently no one else
> has tried -- i'm running a (hand-rolled) linux 3.x kernel on my build
> host and it *seems* that that's what's messing up the python
> compilation somewhere in the WRL build scripts.  (as a side note, i
> have run across other issues in the WRL build system where it was
> simply never imagined that one might want to build on a system running
> a 3.x kernel, so that's why i'm suspecting it has something to do with
> that.  apparently, i'm out there on the bleeding edge and this is what
> might be causing me grief.)

You could take a look at http://bugs.python.org/issue12326
The current 2.7 branch should work for you, you'll have to get it from
the Mercurial repository.

That said, the plat-* stuff is quite useless as it is. There's a patch
here to improve its usefulness slightly:
http://bugs.python.org/issue12619

Although I would question the existence of such undocumented modules,
which are hardly even used internally.
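
For what it's worth, the plat-* directory name comes from sys.platform,
which in these CPython versions embeds the build host's kernel major
version -- hence plat-linux2 vs. plat-linux3. A quick check, as a sketch:

    import sys
    # On a CPython configured under a 3.x kernel (before the issue 12326
    # fix), this prints 'linux3', and the platform modules end up under
    # Lib/plat-linux3 instead of Lib/plat-linux2.
    print(sys.platform)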

Regards

Antoine.



From status at bugs.python.org  Fri Aug  5 18:07:26 2011
From: status at bugs.python.org (Python tracker)
Date: Fri,  5 Aug 2011 18:07:26 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20110805160726.BD8A21D084@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2011-07-29 - 2011-08-05)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    2899 (+10)
  closed 21579 (+32)
  total  24478 (+42)

Open issues with patches: 1255 


Issues opened (27)
==================

#1813: Codec lookup failing under turkish locale
http://bugs.python.org/issue1813  reopened by skrah

#9723: Add shlex.quote
http://bugs.python.org/issue9723  reopened by eric.araujo

#12656: test.test_asyncore: add tests for AF_INET6 and AF_UNIX sockets
http://bugs.python.org/issue12656  opened by neologix

#12657: Cannot override JSON encoding of basic type subclasses
http://bugs.python.org/issue12657  opened by barry

#12659: Add tests for packaging.tests.support
http://bugs.python.org/issue12659  opened by eric.araujo

#12660: test_gdb fails when installed
http://bugs.python.org/issue12660  opened by pitrou

#12661: Add a new shutil.cleartree function to shutil module
http://bugs.python.org/issue12661  opened by chin

#12662: Allow configparser to process duplicate options
http://bugs.python.org/issue12662  opened by ojab

#12666: map semantic change not documented in What's New
http://bugs.python.org/issue12666  opened by jason.coombs

#12668: 3.2 What's New: it's integer->string, not the opposite
http://bugs.python.org/issue12668  opened by sandro.tosi

#12669: test_curses skipped on buildbots
http://bugs.python.org/issue12669  opened by nadeem.vawda

#12672: Some problems in documentation extending/newtypes.html
http://bugs.python.org/issue12672  opened by eli.bendersky

#12675: tokenize module happily tokenizes code with syntax errors
http://bugs.python.org/issue12675  opened by Gareth.Rees

#12677: Turtle, fix right/left rotation orientation
http://bugs.python.org/issue12677  opened by sandro.tosi

#12678: test_packaging and test_distutils failures under Windows
http://bugs.python.org/issue12678  opened by pitrou

#12680: cPickle.loads is not thread safe due to non-thread-safe import
http://bugs.python.org/issue12680  opened by Sagiv.Malihi

#12681: unittest expectedFailure could take a message argument like sk
http://bugs.python.org/issue12681  opened by r.david.murray

#12682: Meaning of 'accepted' resolution as documented in devguide
http://bugs.python.org/issue12682  opened by r.david.murray

#12684: profile does not dump stats on exception like cProfile does
http://bugs.python.org/issue12684  opened by anacrolix

#12686: argparse - document (and improve?) use of SUPPRESS with help=
http://bugs.python.org/issue12686  opened by derks

#12687: Python 3.2 fails to load protocol 0 pickle
http://bugs.python.org/issue12687  opened by vinay.sajip

#12690: Tix bug 2643483
http://bugs.python.org/issue12690  opened by Gary.Levin

#12691: tokenize.untokenize is broken
http://bugs.python.org/issue12691  opened by Gareth.Rees

#12692: test_urllib2net is triggering a ResourceWarning
http://bugs.python.org/issue12692  opened by brett.cannon

#12693: test.support.transient_internet prints to stderr when verbose 
http://bugs.python.org/issue12693  opened by brett.cannon

#12694: crlf.py script from Tools doesn't work with Python 3.2
http://bugs.python.org/issue12694  opened by bialix

#12696: pydoc error page due to lacking permissions on ./*
http://bugs.python.org/issue12696  opened by gagern



Most recent 15 issues with no replies (15)
==========================================

#12696: pydoc error page due to lacking permissions on ./*
http://bugs.python.org/issue12696

#12694: crlf.py script from Tools doesn't work with Python 3.2
http://bugs.python.org/issue12694

#12684: profile does not dump stats on exception like cProfile does
http://bugs.python.org/issue12684

#12672: Some problems in documentation extending/newtypes.html
http://bugs.python.org/issue12672

#12668: 3.2 What's New: it's integer->string, not the opposite
http://bugs.python.org/issue12668

#12662: Allow configparser to process duplicate options
http://bugs.python.org/issue12662

#12660: test_gdb fails when installed
http://bugs.python.org/issue12660

#12659: Add tests for packaging.tests.support
http://bugs.python.org/issue12659

#12657: Cannot override JSON encoding of basic type subclasses
http://bugs.python.org/issue12657

#12656: test.test_asyncore: add tests for AF_INET6 and AF_UNIX sockets
http://bugs.python.org/issue12656

#12653: Provide accelerators for all buttons in Windows installers
http://bugs.python.org/issue12653

#12645: test.support.import_fresh_module - incorrect doc
http://bugs.python.org/issue12645

#12639: msilib Directory.start_component() fails if keyfile is not Non
http://bugs.python.org/issue12639

#12623: "universal newlines" subprocess support broken with select- an
http://bugs.python.org/issue12623

#12622: failfast argument to TextTestRunner not documented
http://bugs.python.org/issue12622



Most recent 15 issues waiting for review (15)
=============================================

#12684: profile does not dump stats on exception like cProfile does
http://bugs.python.org/issue12684

#12677: Turtle, fix right/left rotation orientation
http://bugs.python.org/issue12677

#12668: 3.2 What's New: it's integer->string, not the opposite
http://bugs.python.org/issue12668

#12661: Add a new shutil.cleartree function to shutil module
http://bugs.python.org/issue12661

#12656: test.test_asyncore: add tests for AF_INET6 and AF_UNIX sockets
http://bugs.python.org/issue12656

#12652: Keep test.support docs out of the global docs index
http://bugs.python.org/issue12652

#12650: Subprocess leaks fd upon kill()
http://bugs.python.org/issue12650

#12646: zlib.Decompress.decompress/flush do not raise any exceptions w
http://bugs.python.org/issue12646

#12639: msilib Directory.start_component() fails if keyfile is not Non
http://bugs.python.org/issue12639

#12633: sys.modules doc entry should reflect restrictions
http://bugs.python.org/issue12633

#12627: Implement PEP 394: The "python" Command on Unix-Like Systems
http://bugs.python.org/issue12627

#12625: sporadic test_unittest failure
http://bugs.python.org/issue12625

#12619: Automatically regenerate platform-specific modules
http://bugs.python.org/issue12619

#12618: py_compile cannot create files in current directory
http://bugs.python.org/issue12618

#12614: Allow to explicitly set the method of urllib.request.Request
http://bugs.python.org/issue12614



Top 10 most discussed issues (10)
=================================

#12675: tokenize module happily tokenizes code with syntax errors
http://bugs.python.org/issue12675   8 msgs

#11049: add tests for test.support
http://bugs.python.org/issue11049   7 msgs

#7424: segmentation fault in listextend during install
http://bugs.python.org/issue7424   6 msgs

#11572: bring Lib/copy.py to 100% coverage
http://bugs.python.org/issue11572   6 msgs

#12648: Wrong import module search order on Windows
http://bugs.python.org/issue12648   6 msgs

#12682: Meaning of 'accepted' resolution as documented in devguide
http://bugs.python.org/issue12682   6 msgs

#1813: Codec lookup failing under turkish locale
http://bugs.python.org/issue1813   5 msgs

#12652: Keep test.support docs out of the global docs index
http://bugs.python.org/issue12652   5 msgs

#8639: Allow callable objects in inspect.getfullargspec
http://bugs.python.org/issue8639   4 msgs

#9968: cgi.FieldStorage: Give control about the directory used for up
http://bugs.python.org/issue9968   4 msgs



Issues closed (33)
==================

#9788: atexit and execution order
http://bugs.python.org/issue9788  closed by eric.araujo

#11104: distutils sdist ignores MANIFEST
http://bugs.python.org/issue11104  closed by eric.araujo

#11281: smtplib: add ability to bind to specific source IP address/por
http://bugs.python.org/issue11281  closed by python-dev

#11651: Improve test targets in Makefile
http://bugs.python.org/issue11651  closed by nadeem.vawda

#11699: Doc for optparse.OptionParser.get_option_group is wrong
http://bugs.python.org/issue11699  closed by eli.bendersky

#11933: newer() function in dep_util.py mixes up new vs. old files due
http://bugs.python.org/issue11933  closed by eric.araujo

#12183: Document behaviour of shutil.copy2 and copystat with symlinks
http://bugs.python.org/issue12183  closed by python-dev

#12295: Fix ResourceWarning in turtledemo help window
http://bugs.python.org/issue12295  closed by eric.araujo

#12331: lib2to3 and packaging tests fail because they write into prote
http://bugs.python.org/issue12331  closed by eric.araujo

#12464: tempfile.TemporaryDirectory.cleanup follows symbolic links
http://bugs.python.org/issue12464  closed by neologix

#12531: documentation index entries for * and **
http://bugs.python.org/issue12531  closed by eli.bendersky

#12540: "Restart Shell" command leaves pythonw.exe processes running
http://bugs.python.org/issue12540  closed by ned.deily

#12562: calling mmap twice fails on Windows
http://bugs.python.org/issue12562  closed by pitrou

#12626: run test cases based on a glob filter
http://bugs.python.org/issue12626  closed by pitrou

#12631: Mutable Sequence Type in .remove() is consistent only with lis
http://bugs.python.org/issue12631  closed by petri.lehtinen

#12654: sum() works with bytes objects
http://bugs.python.org/issue12654  closed by benjamin.peterson

#12655: Expose sched.h functions
http://bugs.python.org/issue12655  closed by python-dev

#12658: Build fails in a non-checkout directory
http://bugs.python.org/issue12658  closed by pitrou

#12663: ArgumentParser.error writes to stderr not to stdout
http://bugs.python.org/issue12663  closed by python-dev

#12664: Path variable - Windows installer
http://bugs.python.org/issue12664  closed by r.david.murray

#12665: Dictionary view example has error in set ops
http://bugs.python.org/issue12665  closed by sandro.tosi

#12667: Better logging.handler.SMTPHandler doc for 'secure' argument
http://bugs.python.org/issue12667  closed by python-dev

#12670: Fix struct code after forward declaration on ctypes doc
http://bugs.python.org/issue12670  closed by sandro.tosi

#12671: urlopen returning empty string
http://bugs.python.org/issue12671  closed by mrabarnett

#12673: SEGFAULT error on OpenBSD (sparc)
http://bugs.python.org/issue12673  closed by r.david.murray

#12674: pydoc str.split does not find the method
http://bugs.python.org/issue12674  closed by r.david.murray

#12676: Bug in http.client
http://bugs.python.org/issue12676  closed by python-dev

#12679: ThreadError is not in threading.__all__
http://bugs.python.org/issue12679  closed by python-dev

#12683: urlparse.urljoin different behavior for different scheme
http://bugs.python.org/issue12683  closed by python-dev

#12685: The backslash escape doesn't concatenate two strings in one in
http://bugs.python.org/issue12685  closed by benjamin.peterson

#12688: ConfigParser.__init__(iterpolation=None) documentation != beha
http://bugs.python.org/issue12688  closed by lukasz.langa

#12689: IDLE crashes after pressing ctrl+space
http://bugs.python.org/issue12689  closed by r.david.murray

#12695: subprocess.Popen: OSError: [Errno 9] Bad file descriptor
http://bugs.python.org/issue12695  closed by gagern

From chris at simplistix.co.uk  Fri Aug  5 19:35:56 2011
From: chris at simplistix.co.uk (Chris Withers)
Date: Fri, 05 Aug 2011 18:35:56 +0100
Subject: [Python-Dev] cpython (2.7): note Ellipsis syntax
In-Reply-To: <C8B25BD0-D450-467F-900C-693A3F243F1F@gmail.com>
References: <E1QnB1X-0007a7-Ae@dinsdale.python.org>
	<j11e0p$v9e$1@dough.gmane.org>
	<CAPZV6o_sFBEptadqdP738Qs5xbzxCraXLRJissC9zbrQ2x9KTg@mail.gmail.com>
	<j12slu$pcc$1@dough.gmane.org>
	<C8B25BD0-D450-467F-900C-693A3F243F1F@gmail.com>
Message-ID: <4E3C29FC.8090501@simplistix.co.uk>

On 31/07/2011 07:47, Raymond Hettinger wrote:
>
> It's really nice for stub functions:
>
> def foo(x):
>      ...

I guess pass is too pass-é?

;-)

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
             - http://www.simplistix.co.uk

From jimjjewett at gmail.com  Fri Aug  5 23:55:33 2011
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 5 Aug 2011 17:55:33 -0400
Subject: [Python-Dev] [Python-checkins] cpython: #11572: improvements to
 copy module tests along with removal of old test suite
In-Reply-To: <E1QpRb2-0005rw-8H@dinsdale.python.org>
References: <E1QpRb2-0005rw-8H@dinsdale.python.org>
Message-ID: <CA+OGgf5PMZn1Sn5=xNmfY=Kh5jeQd1pkoU-7A5C5ehTHtXLJOQ@mail.gmail.com>

Why was the old test suite removed?

Even if everything is covered by the test file (and that isn't clear
from this checkin), I don't see anything wrong with a quick test that
doesn't require loading the whole testing apparatus.  (I would have no
objection to including a comment saying that the majority of the tests
are in the test file; I just wonder why they have to be removed
entirely.)

On Fri, Aug 5, 2011 at 5:06 PM, sandro.tosi <python-checkins at python.org> wrote:
> http://hg.python.org/cpython/rev/74e79b2c114a
> changeset:   71749:74e79b2c114a
> user:        Sandro Tosi <sandro.tosi at gmail.com>
> date:        Fri Aug 05 23:05:35 2011 +0200
> summary:
>   #11572: improvements to copy module tests along with removal of old test suite
>
> files:
>   Lib/copy.py           |   65 -----------
>   Lib/test/test_copy.py |  168 ++++++++++++++++-------------
>   2 files changed, 95 insertions(+), 138 deletions(-)
>
>
> diff --git a/Lib/copy.py b/Lib/copy.py
> --- a/Lib/copy.py
> +++ b/Lib/copy.py
> @@ -323,68 +323,3 @@
>  # Helper for instance creation without calling __init__
>  class _EmptyClass:
>      pass
> -
> -def _test():
> -    l = [None, 1, 2, 3.14, 'xyzzy', (1, 2), [3.14, 'abc'],
> -         {'abc': 'ABC'}, (), [], {}]
> -    l1 = copy(l)
> -    print(l1==l)
> -    l1 = map(copy, l)
> -    print(l1==l)
> -    l1 = deepcopy(l)
> -    print(l1==l)
> -    class C:
> -        def __init__(self, arg=None):
> -            self.a = 1
> -            self.arg = arg
> -            if __name__ == '__main__':
> -                import sys
> -                file = sys.argv[0]
> -            else:
> -                file = __file__
> -            self.fp = open(file)
> -            self.fp.close()
> -        def __getstate__(self):
> -            return {'a': self.a, 'arg': self.arg}
> -        def __setstate__(self, state):
> -            for key, value in state.items():
> -                setattr(self, key, value)
> -        def __deepcopy__(self, memo=None):
> -            new = self.__class__(deepcopy(self.arg, memo))
> -            new.a = self.a
> -            return new
> -    c = C('argument sketch')
> -    l.append(c)
> -    l2 = copy(l)
> -    print(l == l2)
> -    print(l)
> -    print(l2)
> -    l2 = deepcopy(l)
> -    print(l == l2)
> -    print(l)
> -    print(l2)
> -    l.append({l[1]: l, 'xyz': l[2]})
> -    l3 = copy(l)
> -    import reprlib
> -    print(map(reprlib.repr, l))
> -    print(map(reprlib.repr, l1))
> -    print(map(reprlib.repr, l2))
> -    print(map(reprlib.repr, l3))
> -    l3 = deepcopy(l)
> -    print(map(reprlib.repr, l))
> -    print(map(reprlib.repr, l1))
> -    print(map(reprlib.repr, l2))
> -    print(map(reprlib.repr, l3))
> -    class odict(dict):
> -        def __init__(self, d = {}):
> -            self.a = 99
> -            dict.__init__(self, d)
> -        def __setitem__(self, k, i):
> -            dict.__setitem__(self, k, i)
> -            self.a
> -    o = odict({"A" : "B"})
> -    x = deepcopy(o)
> -    print(o, x)
> -
> -if __name__ == '__main__':
> -    _test()
> diff --git a/Lib/test/test_copy.py b/Lib/test/test_copy.py
> --- a/Lib/test/test_copy.py
> +++ b/Lib/test/test_copy.py
> @@ -17,7 +17,7 @@
>      # Attempt full line coverage of copy.py from top to bottom
>
>      def test_exceptions(self):
> -        self.assertTrue(copy.Error is copy.error)
> +        self.assertIs(copy.Error, copy.error)
>          self.assertTrue(issubclass(copy.Error, Exception))
>
>      # The copy() method
> @@ -54,20 +54,26 @@
>      def test_copy_reduce_ex(self):
>          class C(object):
>              def __reduce_ex__(self, proto):
> +                c.append(1)
>                  return ""
>              def __reduce__(self):
> -                raise support.TestFailed("shouldn't call this")
> +                self.fail("shouldn't call this")
> +        c = []
>          x = C()
>          y = copy.copy(x)
> -        self.assertTrue(y is x)
> +        self.assertIs(y, x)
> +        self.assertEqual(c, [1])
>
>      def test_copy_reduce(self):
>          class C(object):
>              def __reduce__(self):
> +                c.append(1)
>                  return ""
> +        c = []
>          x = C()
>          y = copy.copy(x)
> -        self.assertTrue(y is x)
> +        self.assertIs(y, x)
> +        self.assertEqual(c, [1])
>
>      def test_copy_cant(self):
>          class C(object):
> @@ -91,7 +97,7 @@
>                   "hello", "hello\u1234", f.__code__,
>                   NewStyle, range(10), Classic, max]
>          for x in tests:
> -            self.assertTrue(copy.copy(x) is x, repr(x))
> +            self.assertIs(copy.copy(x), x)
>
>      def test_copy_list(self):
>          x = [1, 2, 3]
> @@ -185,9 +191,9 @@
>          x = [x, x]
>          y = copy.deepcopy(x)
>          self.assertEqual(y, x)
> -        self.assertTrue(y is not x)
> -        self.assertTrue(y[0] is not x[0])
> -        self.assertTrue(y[0] is y[1])
> +        self.assertIsNot(y, x)
> +        self.assertIsNot(y[0], x[0])
> +        self.assertIs(y[0], y[1])
>
>      def test_deepcopy_issubclass(self):
>          # XXX Note: there's no way to test the TypeError coming out of
> @@ -227,20 +233,26 @@
>      def test_deepcopy_reduce_ex(self):
>          class C(object):
>              def __reduce_ex__(self, proto):
> +                c.append(1)
>                  return ""
>              def __reduce__(self):
> -                raise support.TestFailed("shouldn't call this")
> +                self.fail("shouldn't call this")
> +        c = []
>          x = C()
>          y = copy.deepcopy(x)
> -        self.assertTrue(y is x)
> +        self.assertIs(y, x)
> +        self.assertEqual(c, [1])
>
>      def test_deepcopy_reduce(self):
>          class C(object):
>              def __reduce__(self):
> +                c.append(1)
>                  return ""
> +        c = []
>          x = C()
>          y = copy.deepcopy(x)
> -        self.assertTrue(y is x)
> +        self.assertIs(y, x)
> +        self.assertEqual(c, [1])
>
>      def test_deepcopy_cant(self):
>          class C(object):
> @@ -264,14 +276,14 @@
>                   "hello", "hello\u1234", f.__code__,
>                   NewStyle, range(10), Classic, max]
>          for x in tests:
> -            self.assertTrue(copy.deepcopy(x) is x, repr(x))
> +            self.assertIs(copy.deepcopy(x), x)
>
>      def test_deepcopy_list(self):
>          x = [[1, 2], 3]
>          y = copy.deepcopy(x)
>          self.assertEqual(y, x)
> -        self.assertTrue(x is not y)
> -        self.assertTrue(x[0] is not y[0])
> +        self.assertIsNot(x, y)
> +        self.assertIsNot(x[0], y[0])
>
>      def test_deepcopy_reflexive_list(self):
>          x = []
> @@ -279,16 +291,26 @@
>          y = copy.deepcopy(x)
>          for op in comparisons:
>              self.assertRaises(RuntimeError, op, y, x)
> -        self.assertTrue(y is not x)
> -        self.assertTrue(y[0] is y)
> +        self.assertIsNot(y, x)
> +        self.assertIs(y[0], y)
>          self.assertEqual(len(y), 1)
>
> +    def test_deepcopy_empty_tuple(self):
> +        x = ()
> +        y = copy.deepcopy(x)
> +        self.assertIs(x, y)
> +
>      def test_deepcopy_tuple(self):
>          x = ([1, 2], 3)
>          y = copy.deepcopy(x)
>          self.assertEqual(y, x)
> -        self.assertTrue(x is not y)
> -        self.assertTrue(x[0] is not y[0])
> +        self.assertIsNot(x, y)
> +        self.assertIsNot(x[0], y[0])
> +
> +    def test_deepcopy_tuple_of_immutables(self):
> +        x = ((1, 2), 3)
> +        y = copy.deepcopy(x)
> +        self.assertIs(x, y)
>
>      def test_deepcopy_reflexive_tuple(self):
>          x = ([],)
> @@ -296,16 +318,16 @@
>          y = copy.deepcopy(x)
>          for op in comparisons:
>              self.assertRaises(RuntimeError, op, y, x)
> -        self.assertTrue(y is not x)
> -        self.assertTrue(y[0] is not x[0])
> -        self.assertTrue(y[0][0] is y)
> +        self.assertIsNot(y, x)
> +        self.assertIsNot(y[0], x[0])
> +        self.assertIs(y[0][0], y)
>
>      def test_deepcopy_dict(self):
>          x = {"foo": [1, 2], "bar": 3}
>          y = copy.deepcopy(x)
>          self.assertEqual(y, x)
> -        self.assertTrue(x is not y)
> -        self.assertTrue(x["foo"] is not y["foo"])
> +        self.assertIsNot(x, y)
> +        self.assertIsNot(x["foo"], y["foo"])
>
>      def test_deepcopy_reflexive_dict(self):
>          x = {}
> @@ -315,8 +337,8 @@
>              self.assertRaises(TypeError, op, y, x)
>          for op in equality_comparisons:
>              self.assertRaises(RuntimeError, op, y, x)
> -        self.assertTrue(y is not x)
> -        self.assertTrue(y['foo'] is y)
> +        self.assertIsNot(y, x)
> +        self.assertIs(y['foo'], y)
>          self.assertEqual(len(y), 1)
>
>      def test_deepcopy_keepalive(self):
> @@ -349,7 +371,7 @@
>          x = C([42])
>          y = copy.deepcopy(x)
>          self.assertEqual(y, x)
> -        self.assertTrue(y.foo is not x.foo)
> +        self.assertIsNot(y.foo, x.foo)
>
>      def test_deepcopy_inst_deepcopy(self):
>          class C:
> @@ -362,8 +384,8 @@
>          x = C([42])
>          y = copy.deepcopy(x)
>          self.assertEqual(y, x)
> -        self.assertTrue(y is not x)
> -        self.assertTrue(y.foo is not x.foo)
> +        self.assertIsNot(y, x)
> +        self.assertIsNot(y.foo, x.foo)
>
>      def test_deepcopy_inst_getinitargs(self):
>          class C:
> @@ -376,8 +398,8 @@
>          x = C([42])
>          y = copy.deepcopy(x)
>          self.assertEqual(y, x)
> -        self.assertTrue(y is not x)
> -        self.assertTrue(y.foo is not x.foo)
> +        self.assertIsNot(y, x)
> +        self.assertIsNot(y.foo, x.foo)
>
>      def test_deepcopy_inst_getstate(self):
>          class C:
> @@ -390,8 +412,8 @@
>          x = C([42])
>          y = copy.deepcopy(x)
>          self.assertEqual(y, x)
> -        self.assertTrue(y is not x)
> -        self.assertTrue(y.foo is not x.foo)
> +        self.assertIsNot(y, x)
> +        self.assertIsNot(y.foo, x.foo)
>
>      def test_deepcopy_inst_setstate(self):
>          class C:
> @@ -404,8 +426,8 @@
>          x = C([42])
>          y = copy.deepcopy(x)
>          self.assertEqual(y, x)
> -        self.assertTrue(y is not x)
> -        self.assertTrue(y.foo is not x.foo)
> +        self.assertIsNot(y, x)
> +        self.assertIsNot(y.foo, x.foo)
>
>      def test_deepcopy_inst_getstate_setstate(self):
>          class C:
> @@ -420,8 +442,8 @@
>          x = C([42])
>          y = copy.deepcopy(x)
>          self.assertEqual(y, x)
> -        self.assertTrue(y is not x)
> -        self.assertTrue(y.foo is not x.foo)
> +        self.assertIsNot(y, x)
> +        self.assertIsNot(y.foo, x.foo)
>
>      def test_deepcopy_reflexive_inst(self):
>          class C:
> @@ -429,8 +451,8 @@
>          x = C()
>          x.foo = x
>          y = copy.deepcopy(x)
> -        self.assertTrue(y is not x)
> -        self.assertTrue(y.foo is y)
> +        self.assertIsNot(y, x)
> +        self.assertIs(y.foo, y)
>
>      # _reconstruct()
>
> @@ -440,9 +462,9 @@
>                  return ""
>          x = C()
>          y = copy.copy(x)
> -        self.assertTrue(y is x)
> +        self.assertIs(y, x)
>          y = copy.deepcopy(x)
> -        self.assertTrue(y is x)
> +        self.assertIs(y, x)
>
>      def test_reconstruct_nostate(self):
>          class C(object):
> @@ -451,9 +473,9 @@
>          x = C()
>          x.foo = 42
>          y = copy.copy(x)
> -        self.assertTrue(y.__class__ is x.__class__)
> +        self.assertIs(y.__class__, x.__class__)
>          y = copy.deepcopy(x)
> -        self.assertTrue(y.__class__ is x.__class__)
> +        self.assertIs(y.__class__, x.__class__)
>
>      def test_reconstruct_state(self):
>          class C(object):
> @@ -467,7 +489,7 @@
>          self.assertEqual(y, x)
>          y = copy.deepcopy(x)
>          self.assertEqual(y, x)
> -        self.assertTrue(y.foo is not x.foo)
> +        self.assertIsNot(y.foo, x.foo)
>
>      def test_reconstruct_state_setstate(self):
>          class C(object):
> @@ -483,7 +505,7 @@
>          self.assertEqual(y, x)
>          y = copy.deepcopy(x)
>          self.assertEqual(y, x)
> -        self.assertTrue(y.foo is not x.foo)
> +        self.assertIsNot(y.foo, x.foo)
>
>      def test_reconstruct_reflexive(self):
>          class C(object):
> @@ -491,8 +513,8 @@
>          x = C()
>          x.foo = x
>          y = copy.deepcopy(x)
> -        self.assertTrue(y is not x)
> -        self.assertTrue(y.foo is y)
> +        self.assertIsNot(y, x)
> +        self.assertIs(y.foo, y)
>
>      # Additions for Python 2.3 and pickle protocol 2
>
> @@ -506,12 +528,12 @@
>          x = C([[1, 2], 3])
>          y = copy.copy(x)
>          self.assertEqual(x, y)
> -        self.assertTrue(x is not y)
> -        self.assertTrue(x[0] is y[0])
> +        self.assertIsNot(x, y)
> +        self.assertIs(x[0], y[0])
>          y = copy.deepcopy(x)
>          self.assertEqual(x, y)
> -        self.assertTrue(x is not y)
> -        self.assertTrue(x[0] is not y[0])
> +        self.assertIsNot(x, y)
> +        self.assertIsNot(x[0], y[0])
>
>      def test_reduce_5tuple(self):
>          class C(dict):
> @@ -523,12 +545,12 @@
>          x = C([("foo", [1, 2]), ("bar", 3)])
>          y = copy.copy(x)
>          self.assertEqual(x, y)
> -        self.assertTrue(x is not y)
> -        self.assertTrue(x["foo"] is y["foo"])
> +        self.assertIsNot(x, y)
> +        self.assertIs(x["foo"], y["foo"])
>          y = copy.deepcopy(x)
>          self.assertEqual(x, y)
> -        self.assertTrue(x is not y)
> -        self.assertTrue(x["foo"] is not y["foo"])
> +        self.assertIsNot(x, y)
> +        self.assertIsNot(x["foo"], y["foo"])
>
>      def test_copy_slots(self):
>          class C(object):
> @@ -536,7 +558,7 @@
>          x = C()
>          x.foo = [42]
>          y = copy.copy(x)
> -        self.assertTrue(x.foo is y.foo)
> +        self.assertIs(x.foo, y.foo)
>
>      def test_deepcopy_slots(self):
>          class C(object):
> @@ -545,7 +567,7 @@
>          x.foo = [42]
>          y = copy.deepcopy(x)
>          self.assertEqual(x.foo, y.foo)
> -        self.assertTrue(x.foo is not y.foo)
> +        self.assertIsNot(x.foo, y.foo)
>
>      def test_deepcopy_dict_subclass(self):
>          class C(dict):
> @@ -562,7 +584,7 @@
>          y = copy.deepcopy(x)
>          self.assertEqual(x, y)
>          self.assertEqual(x._keys, y._keys)
> -        self.assertTrue(x is not y)
> +        self.assertIsNot(x, y)
>          x['bar'] = 1
>          self.assertNotEqual(x, y)
>          self.assertNotEqual(x._keys, y._keys)
> @@ -575,8 +597,8 @@
>          y = copy.copy(x)
>          self.assertEqual(list(x), list(y))
>          self.assertEqual(x.foo, y.foo)
> -        self.assertTrue(x[0] is y[0])
> -        self.assertTrue(x.foo is y.foo)
> +        self.assertIs(x[0], y[0])
> +        self.assertIs(x.foo, y.foo)
>
>      def test_deepcopy_list_subclass(self):
>          class C(list):
> @@ -586,8 +608,8 @@
>          y = copy.deepcopy(x)
>          self.assertEqual(list(x), list(y))
>          self.assertEqual(x.foo, y.foo)
> -        self.assertTrue(x[0] is not y[0])
> -        self.assertTrue(x.foo is not y.foo)
> +        self.assertIsNot(x[0], y[0])
> +        self.assertIsNot(x.foo, y.foo)
>
>      def test_copy_tuple_subclass(self):
>          class C(tuple):
> @@ -604,8 +626,8 @@
>          self.assertEqual(tuple(x), ([1, 2], 3))
>          y = copy.deepcopy(x)
>          self.assertEqual(tuple(y), ([1, 2], 3))
> -        self.assertTrue(x is not y)
> -        self.assertTrue(x[0] is not y[0])
> +        self.assertIsNot(x, y)
> +        self.assertIsNot(x[0], y[0])
>
>      def test_getstate_exc(self):
>          class EvilState(object):
> @@ -633,10 +655,10 @@
>          obj = C()
>          x = weakref.ref(obj)
>          y = _copy(x)
> -        self.assertTrue(y is x)
> +        self.assertIs(y, x)
>          del obj
>          y = _copy(x)
> -        self.assertTrue(y is x)
> +        self.assertIs(y, x)
>
>      def test_copy_weakref(self):
>          self._check_weakref(copy.copy)
> @@ -652,7 +674,7 @@
>          u[a] = b
>          u[c] = d
>          v = copy.copy(u)
> -        self.assertFalse(v is u)
> +        self.assertIsNot(v, u)
>          self.assertEqual(v, u)
>          self.assertEqual(v[a], b)
>          self.assertEqual(v[c], d)
> @@ -682,8 +704,8 @@
>          v = copy.deepcopy(u)
>          self.assertNotEqual(v, u)
>          self.assertEqual(len(v), 2)
> -        self.assertFalse(v[a] is b)
> -        self.assertFalse(v[c] is d)
> +        self.assertIsNot(v[a], b)
> +        self.assertIsNot(v[c], d)
>          self.assertEqual(v[a].i, b.i)
>          self.assertEqual(v[c].i, d.i)
>          del c
> @@ -702,12 +724,12 @@
>          self.assertNotEqual(v, u)
>          self.assertEqual(len(v), 2)
>          (x, y), (z, t) = sorted(v.items(), key=lambda pair: pair[0].i)
> -        self.assertFalse(x is a)
> +        self.assertIsNot(x, a)
>          self.assertEqual(x.i, a.i)
> -        self.assertTrue(y is b)
> -        self.assertFalse(z is c)
> +        self.assertIs(y, b)
> +        self.assertIsNot(z, c)
>          self.assertEqual(z.i, c.i)
> -        self.assertTrue(t is d)
> +        self.assertIs(t, d)
>          del x, y, z, t
>          del d
>          self.assertEqual(len(v), 1)
> @@ -720,7 +742,7 @@
>          f.b = f.m
>          g = copy.deepcopy(f)
>          self.assertEqual(g.m, g.b)
> -        self.assertTrue(g.b.__self__ is g)
> +        self.assertIs(g.b.__self__, g)
>          g.b()
>
>
>
> --
> Repository URL: http://hg.python.org/cpython
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://mail.python.org/mailman/listinfo/python-checkins
>
>

From sandro.tosi at gmail.com  Sat Aug  6 00:03:11 2011
From: sandro.tosi at gmail.com (Sandro Tosi)
Date: Sat, 6 Aug 2011 00:03:11 +0200
Subject: [Python-Dev] [Python-checkins] cpython: #11572: improvements to
 copy module tests along with removal of old test suite
In-Reply-To: <CA+OGgf5PMZn1Sn5=xNmfY=Kh5jeQd1pkoU-7A5C5ehTHtXLJOQ@mail.gmail.com>
References: <E1QpRb2-0005rw-8H@dinsdale.python.org>
	<CA+OGgf5PMZn1Sn5=xNmfY=Kh5jeQd1pkoU-7A5C5ehTHtXLJOQ@mail.gmail.com>
Message-ID: <CAPdtAj31pPcvoFApEuemzEROORwP2BaH57VHc_mrMZBrEi2GOQ@mail.gmail.com>

Hi Jim,

On Fri, Aug 5, 2011 at 23:55, Jim Jewett <jimjjewett at gmail.com> wrote:
> Why was the old test suite removed?
>
> Even if everything is covered by the test file (and that isn't clear
> from this checkin), I don't see anything wrong with a quick test that
> doesn't require loading the whole testing apparatus.  (I would have no
> objection to including a comment saying that the majority of the tests
> are in the test file; I just wonder why they have to be removed
> entirely.)

I see these reasons mainly:

- it adds nothing to the stdlib (where it was included): they are
tests, so they should be in the test suite
- it's unmaintained, since all the work on new tests or any change
will happen in the test_copy.py file and not in copy.py (the same is
true for any other module)
- also, running the tests for a single module is easy (in this
case, I keep using copy as the example):

./python -m test test_copy

and it has the advantage of running the whole test suite for that
module, not just some random code.

I plan to do other changes like this in the coming days/weeks, so
thanks for the question :) since it brings the topic up on python-dev
where others can comment.

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From solipsis at pitrou.net  Sat Aug  6 00:07:04 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 6 Aug 2011 00:07:04 +0200
Subject: [Python-Dev] [Python-checkins] cpython: #11572: improvements to
 copy module tests along with removal of old test suite
References: <E1QpRb2-0005rw-8H@dinsdale.python.org>
	<CA+OGgf5PMZn1Sn5=xNmfY=Kh5jeQd1pkoU-7A5C5ehTHtXLJOQ@mail.gmail.com>
Message-ID: <20110806000704.6ab08fb7@pitrou.net>

On Fri, 5 Aug 2011 17:55:33 -0400
Jim Jewett <jimjjewett at gmail.com> wrote:
> Why was the old test suite removed?
> 
> Even if everything is covered by the test file (and that isn't clear
> from this checkin), I don't see anything wrong with a quick test that
> doesn't require loading the whole testing apparatus.  (I would have no
> objection to including a comment saying that the majority of the tests
> are in the test file; I just wonder why they have to be removed
> entirely.)

Nobody ever runs such tests when they are not part of the official
regression test suite, which makes them barely useful. Looking at them,
they don't seem very advanced and are probably covered by test_copy
already.

The only reason to have special code in the __main__ section of stdlib
modules is when it provides some (interactive) service to the user (for
example, "python -m zipfile" will give you a trivial equivalent of the
zip/unzip commands).
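
As a sketch of that kind of __main__ service (the module and function
names here are made up for illustration, not actual stdlib code):

import sys

def extract(archive, dest):
    # Placeholder for the module's real functionality.
    print("extracting %s to %s" % (archive, dest))

if __name__ == '__main__':
    if len(sys.argv) != 3:
        sys.exit("usage: python -m mymodule ARCHIVE DEST")
    extract(sys.argv[1], sys.argv[2])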

Regards

Antoine.



From cwg at falma.de  Sat Aug  6 13:55:36 2011
From: cwg at falma.de (Christoph Groth)
Date: Sat, 06 Aug 2011 13:55:36 +0200
Subject: [Python-Dev] inconsistent __abstractmethods__ behavior;
	lack of documentation
Message-ID: <87bow24uvb.fsf@falma.de>

Hi,

while playing with abstract base classes and looking at their
implementation, I've stumbled across the following issue.  With Python
3.2, the script

class Foo(object):
    __abstractmethods__ = ['boo']
class Bar(object):
    pass
Bar.__abstractmethods__ = ['boo']
f = Foo()
b = Bar()

produces the following output

Traceback (most recent call last):
  File "/home/cwg/test2.py", line 9, in <module>
    b = Bar()
TypeError: Can't instantiate abstract class Bar with abstract methods boo

This seems to violate PEP 3119: nothing there says that setting the
__abstractmethods__ attribute during class definition (as in "Foo")
should have no effect.

I think this happens because CPython uses the Py_TPFLAGS_IS_ABSTRACT
flag to check whether a class is abstract.  Apparently, this flag is not
set when the dictionary of the class contains __abstractmethods__
already upon creation.
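
For contrast, the supported route, where ABCMeta computes
__abstractmethods__ itself, behaves as documented; a minimal sketch:

from abc import ABCMeta, abstractmethod

class Baz(metaclass=ABCMeta):
    @abstractmethod
    def boo(self):
        ...

# ABCMeta.__new__ assigns Baz.__abstractmethods__ after the class is
# created, which also sets the abstract flag, so this raises
# "TypeError: Can't instantiate abstract class Baz ...":
b = Baz()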

As a second issue, the special __abstractmethods__ attribute (which is a
feature of the interpreter) is not mentioned anywhere in the
documentation.

If these are confirmed to be bugs, I can enter them into the issue
tracker.

Christoph


From guido at python.org  Sat Aug  6 14:29:09 2011
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Aug 2011 08:29:09 -0400
Subject: [Python-Dev] inconsistent __abstractmethods__ behavior;
	lack of documentation
In-Reply-To: <87bow24uvb.fsf@falma.de>
References: <87bow24uvb.fsf@falma.de>
Message-ID: <CAP7+vJJ7WqiDhXJiaWP7pFkuMM9znExrCfG57m+p5omm+MUhBg@mail.gmail.com>

Christoph,

Do you realize that __xxx__ names can have any semantics they darn
well please? If a particular __xxx__ name (or some aspect of it) is
undocumented that's not a bug (not even a doc bug), it just means
"hands off".

That said, there may well be a bug, but it would be in the behavior of
those things that *are* documented.

--Guido

On Sat, Aug 6, 2011 at 7:55 AM, Christoph Groth <cwg at falma.de> wrote:
> Hi,
>
> while playing with abstract base classes and looking at their
> implementation, I've stumbled across the following issue.  With Python
> 3.2, the script
>
> class Foo(object):
>     __abstractmethods__ = ['boo']
> class Bar(object):
>     pass
> Bar.__abstractmethods__ = ['boo']
> f = Foo()
> b = Bar()
>
> produces the following output
>
> Traceback (most recent call last):
>   File "/home/cwg/test2.py", line 9, in <module>
>     b = Bar()
> TypeError: Can't instantiate abstract class Bar with abstract methods boo
>
> This seems to violate PEP 3119: nothing there says that setting the
> __abstractmethods__ attribute during class definition (as in "Foo")
> should have no effect.
>
> I think this happens because CPython uses the Py_TPFLAGS_IS_ABSTRACT
> flag to check whether a class is abstract.  Apparently, this flag is not
> set when the dictionary of the class contains __abstractmethods__
> already upon creation.
>
> As a second issue, the special __abstractmethods__ attribute (which is a
> feature of the interpreter) is not mentioned anywhere in the
> documentation.
>
> If these are confirmed to be bugs, I can enter them into the issue
> tracker.
>
> Christoph
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
>



-- 
--Guido van Rossum (python.org/~guido)

From cwg at falma.de  Sat Aug  6 14:54:38 2011
From: cwg at falma.de (Christoph Groth)
Date: Sat, 06 Aug 2011 14:54:38 +0200
Subject: [Python-Dev] inconsistent __abstractmethods__ behavior;
	lack of documentation
References: <87bow24uvb.fsf@falma.de>
	<CAP7+vJJ7WqiDhXJiaWP7pFkuMM9znExrCfG57m+p5omm+MUhBg@mail.gmail.com>
Message-ID: <8739he4s4x.fsf@falma.de>

Guido,

thanks for the quick reply!  Of course I am aware that __xxx__ names are
special.  But I was assuming that the features of a Python interpreter
which are necessary to execute the pure-Python modules of the standard
library are supposed to be documented.

Christoph


From tjreedy at udel.edu  Sat Aug  6 23:58:14 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 06 Aug 2011 17:58:14 -0400
Subject: [Python-Dev] inconsistent __abstractmethods__ behavior;
	lack of documentation
In-Reply-To: <CAP7+vJJ7WqiDhXJiaWP7pFkuMM9znExrCfG57m+p5omm+MUhBg@mail.gmail.com>
References: <87bow24uvb.fsf@falma.de>
	<CAP7+vJJ7WqiDhXJiaWP7pFkuMM9znExrCfG57m+p5omm+MUhBg@mail.gmail.com>
Message-ID: <j1kddp$a2a$1@dough.gmane.org>

On 8/6/2011 8:29 AM, Guido van Rossum wrote:

> Do you realize that __xxx__ names can have any semantics they darn
> well please?

That does not seem to be the issue Christoph raised.

> If a particular __xxx__ name (or some aspect of it) is
> undocumented that's not a bug (not even a doc bug), it just means
> "hands off".

"__abstractmethods__" is used in the stdlib at least in abc.py:

 95 class ABCMeta(type):
...
116     def __new__(mcls, name, bases, namespace):
...
123             for name in getattr(base, "__abstractmethods__", set()):
124                 value = getattr(cls, name, None)
125                 if getattr(value, "__isabstractmethod__", False):
126                     abstracts.add(name)
127         cls.__abstractmethods__ = frozenset(abstracts)

Since this module implements a PEP (3119) and is not marked as CPython
specific, it should run correctly on all implementations. So
implementors need to know what the above means.

The doc to abc.py invites readers to read this code:
**Source code:** :source:`Lib/abc.py`

For both reasons, this attribute appears to be part of Python rather 
than being private to CPython. If so, the special name *should* be 
documented somewhere.

If it should *never* be used anywhere else (which I suspect after seeing 
that it is not used in numbers.py), that could be said.

"__abstractmethods__: A special attribute used within ABCmeta.__new__ 
that should never be used anywhere else as is has a special-case effect 
for this one use."

The problem with intentionally completely not documenting names publicly 
accessible in the stdlib code or from the interactive interpreter is 
that the non-documentation is not documented, and so the issue of 
documentation will repeatedly arise. The special names section of 'data 
model' could have a subsection for such. "The following special names 
are not documented as to their meaning as users should ignore them." or 
some such.

-- 
Terry Jan Reedy


From guido at python.org  Sun Aug  7 14:45:41 2011
From: guido at python.org (Guido van Rossum)
Date: Sun, 7 Aug 2011 08:45:41 -0400
Subject: [Python-Dev] inconsistent __abstractmethods__ behavior;
	lack of documentation
In-Reply-To: <j1kddp$a2a$1@dough.gmane.org>
References: <87bow24uvb.fsf@falma.de>
	<CAP7+vJJ7WqiDhXJiaWP7pFkuMM9znExrCfG57m+p5omm+MUhBg@mail.gmail.com>
	<j1kddp$a2a$1@dough.gmane.org>
Message-ID: <CAP7+vJLBYrS2ufDaOFMAzz6e9jBX8kNSEWU7KijVLuNH=KHvKg@mail.gmail.com>

On Sat, Aug 6, 2011 at 5:58 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 8/6/2011 8:29 AM, Guido van Rossum wrote:
>
>> Do you realize that __xxx__ names can have any semantics they darn
>> well please?
>
> That does not seem to be the issue Christoph raised.

I apologize, I was too fast on this one. My only excuse is that
Christoph didn't indicate he was trying to figure out what another
Python implementation should do -- only that he was "playing with ABCs
and looking at their implementation". Looking around more I agree that
*for implementers of Python* there needs to be some documentation of
__abstractmethods__; alternatively, another Python implementation
might have to provide a different implementation of abc.py.

>> If a particular __xxx__ name (or some aspect of it) is
>> undocumented that's not a bug (not even a doc bug), it just means
>> "hands off".
>
> "__abstractmethods__" is used in the stdlib at least in abc.py:
>
>  95 class ABCMeta(type):
> ...
> 116     def __new__(mcls, name, bases, namespace):
> ...
> 123             for name in getattr(base, "__abstractmethods__", set()):
> 124                 value = getattr(cls, name, None)
> 125                 if getattr(value, "__isabstractmethod__", False):
> 126                     abstracts.add(name)
> 127         cls.__abstractmethods__ = frozenset(abstracts)
>
> Since this module implements a PEP (3119) and is not marked as CPython
> specific, it should run correctly on all implementations.

I wouldn't draw that conclusion.

IMO its occurrence as a "pure-Python" module in the stdlib today says
nothing about how much of it is tied to CPython or not. That can only
be explained in comments, docstrings or offline documentation. (Though
there may be some emerging convention that CPython-specific code in
the stdlib must be marked in some way that can be detected by tools,
I'm not aware that much progress has been made in this area. But
admittedly I am not the expert here.)

> So implementors
> need to know what the above means.

Here I agree.

> The doc to abc.py invites readers to read this code:
> **Source code:** :source:`Lib/abc.py`

Maybe that was an unwise shortcut.

> For both reasons, this attribute appears to be part of Python rather than
> being private to CPython. If so, the special name *should* be documented
> somewhere.

I'm happy to agree in this case, but I disagree that you could
conclude all this from the evidence you have so far shown.

> If it should *never* be used anywhere else (which I suspect after seeing
> that it is not used in numbers.py), that could be said.
>
> "__abstractmethods__: A special attribute used within ABCmeta.__new__ that
> should never be used anywhere else as is has a special-case effect for this
> one use."
>
> The problem with intentionally completely not documenting names publicly
> accessible in the stdlib code or from the interactive interpreter is that
> the non-documentation is not documented, and so the issue of documentation
> will repeatedly arise. The special names section of 'data model' could have
> a subsection for such. "The following special names are not documented as to
> their meaning as users should ignore them." or some such.

I disagree. If you see a __dunder__ name which has no documentation
you should simply refrain from using it, and no harm will come to you.
There is a clearly stated rule in the language reference saying this.
It also documents some specific __dunder__ names that are significant
for users and can be used in certain specific ways (__init__,
__name__, etc.). But I see no reason for a requirement to have an
exhaustive list of undocumented __dunder__ names, regardless if they
are supposed to be special for CPython only, for all Python versions,
for the stdlib only, or whatever.

-- 
--Guido van Rossum (python.org/~guido)

From ncoghlan at gmail.com  Sun Aug  7 15:03:04 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 7 Aug 2011 23:03:04 +1000
Subject: [Python-Dev] inconsistent __abstractmethods__ behavior;
	lack of documentation
In-Reply-To: <CAP7+vJLBYrS2ufDaOFMAzz6e9jBX8kNSEWU7KijVLuNH=KHvKg@mail.gmail.com>
References: <87bow24uvb.fsf@falma.de>
	<CAP7+vJJ7WqiDhXJiaWP7pFkuMM9znExrCfG57m+p5omm+MUhBg@mail.gmail.com>
	<j1kddp$a2a$1@dough.gmane.org>
	<CAP7+vJLBYrS2ufDaOFMAzz6e9jBX8kNSEWU7KijVLuNH=KHvKg@mail.gmail.com>
Message-ID: <CADiSq7fjzGT4Jv8MyiTNsucmQau3B3uDZLiJHhyzT0Bka008ww@mail.gmail.com>

On Sun, Aug 7, 2011 at 10:45 PM, Guido van Rossum <guido at python.org> wrote:
> On Sat, Aug 6, 2011 at 5:58 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>> The problem with intentionally completely not documenting names publicly
>> accessible in the stdlib code or from the interactive interpreter is that
>> the non-documentation is not documented, and so the issue of documentation
>> will repeatedly arise. The special names section of 'data model' could have
>> a subsection for such. "The following special names are not documented as to
>> their meaning as users should ignore them." or some such.
>
> I disagree. If you see a __dunder__ name which has no documentation
> you should simply refrain from using it, and no harm will come to you.
> There is a clearly stated rule in the language reference saying this.
> It also documents some specific __dunder__ names that are significant
> for users and can be used in certain specific ways (__init__,
> __name__, etc.). But I see no reason for a requirement to have an
> exhaustive list of undocumented __dunder__ names, regardless if they
> are supposed to be special for CPython only, for all Python versions,
> for the stdlib only, or whatever.

Indeed, the way it tends to work out in practice (especially for pure
Python code) is that the other implementations will just copy the
internal details from CPython, and only if that turns out to be
problematic for some reason will they suggest a clarification in the
language reference to separate out the details of Python-the-language
from CPython-the-implementation in a particular case. That doesn't
happen very often, but when it does it generally seems to be because
they want to do something more sane than what we do, but the more sane
behaviour is hard for us to implement for some reason. Even more
rarely such questions may expose an outright bug in the reference
implementation (e.g. the operand precedence bug for sequence objects
implemented in C that's on my to-do list for 3.3).

Cheers,
Nick.

-- 
Nick Coghlan  |  ncoghlan at gmail.com  |  Brisbane, Australia

From doug.hellmann at gmail.com  Sun Aug  7 17:09:05 2011
From: doug.hellmann at gmail.com (Doug Hellmann)
Date: Sun, 7 Aug 2011 11:09:05 -0400
Subject: [Python-Dev] "Meet the Team" on Python Insider
Message-ID: <FA4C619B-F527-4990-8956-2F3A62B137BB@gmail.com>

[Renewing this request for participation since there are a few new members since the original request went out.]

We are running a series of interviews with the Python developers on the python-dev blog (http://blog.python.org). There is a short list of questions below this message. If you would like to be included in the series, please reply directly to me with your answers.

We will be doing one or two posts per week, depending on the number of responses and availability of other information to post. We'll have to wait until we have several responses before we start, so we don't announce a new series and then post two messages before it peters out. Please help us by sending your answers as quickly as you can so we can tell what we'll be dealing with.

Posts will be published in roughly the order the responses are received, with the text exactly as you send it (unedited, except for formatting it as HTML). If the questions don't apply to you, you can't remember the answer, or don't have an answer, then you can either leave that question blank or interpret it more broadly and give some related information. Blank questions will be omitted from the posts.

Thanks,
Doug



Personal information:

name
location (city, country, whatever you want to give--we don't need your mailing address)
home page or blog url

Questions:

1. How long have you been using Python?

2. How long have you been a core committer?

3. How did you get started as a core developer? Do you remember your first commit?

4. Which parts of Python are you working on now?

5. What do you do with Python when you aren't doing core development work? (day job, other projects, etc.)

6. What do you do when you aren't programming?


From victor.stinner at haypocalc.com  Mon Aug  8 22:26:04 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 08 Aug 2011 22:26:04 +0200
Subject: [Python-Dev] urllib bug in Python 3.2.1?
In-Reply-To: <nad-9C5EFC.16064831072011@news.gmane.org>
References: <4E35DC94.2090208@mrabarnett.plus.com>
	<nad-9C5EFC.16064831072011@news.gmane.org>
Message-ID: <4E40465C.2080500@haypocalc.com>

>> With Python 3.1 and Python 3.2 it works OK, but with Python 3.2.1 the
>> read returns an empty string (I checked it myself).
>
> http://bugs.python.org/issue12576

The bug is now fixed. Can you release a Python 3.2.2, maybe only with 
this fix?

Victor

From tjreedy at udel.edu  Tue Aug  9 01:35:22 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 08 Aug 2011 19:35:22 -0400
Subject: [Python-Dev] urllib bug in Python 3.2.1?
In-Reply-To: <4E40465C.2080500@haypocalc.com>
References: <4E35DC94.2090208@mrabarnett.plus.com>
	<nad-9C5EFC.16064831072011@news.gmane.org>
	<4E40465C.2080500@haypocalc.com>
Message-ID: <j1prs1$99d$1@dough.gmane.org>

On 8/8/2011 4:26 PM, Victor Stinner wrote:
>>> With Python 3.1 and Python 3.2 it works OK, but with Python 3.2.1 the
>>> read returns an empty string (I checked it myself).
>>
>> http://bugs.python.org/issue12576
>
> The bug is now fixed. Can you release a Python 3.2.2, maybe only with
> this fix?

Any new release should also have
http://bugs.python.org/issue12540
which fixes another bad regression.

-- 
Terry Jan Reedy


From doug.hellmann at gmail.com  Tue Aug  9 03:45:36 2011
From: doug.hellmann at gmail.com (Doug Hellmann)
Date: Mon, 8 Aug 2011 21:45:36 -0400
Subject: [Python-Dev] "Meet the Team" on Python Insider
In-Reply-To: <FA4C619B-F527-4990-8956-2F3A62B137BB@gmail.com>
References: <FA4C619B-F527-4990-8956-2F3A62B137BB@gmail.com>
Message-ID: <9D783268-7A06-4366-8207-AD43935FC1A1@gmail.com>

I should have made clear that if you have already completed the survey, we still have your data in the queue. The invitation is for anyone who has not yet sent us the info, including new team members.

Doug

On Aug 7, 2011, at 11:09 AM, Doug Hellmann wrote:

> [Renewing this request for participation since there are a few new members since the original request went out.]
> 
> We are running a series of interviews with the Python developers on the python-dev blog (http://blog.python.org). There is a short list of questions below this message. If you would like to be included in the series, please reply directly to me with your answers.
> 
> We will be doing one or two posts per week, depending on the number of responses and availability of other information to post. We'll have to wait until we have several responses before we start, so we don't announce a new series and then post two messages before it peters out. Please help us by sending your answers as quickly as you can so we can tell what we'll be dealing with.
> 
> Posts will be published in roughly the order the responses are received, with the text exactly as you send it (unedited, except for formatting it as HTML). If the questions don't apply to you, you can't remember the answer, or don't have an answer, then you can either leave that question blank or interpret it more broadly and give some related information. Blank questions will be omitted from the posts.
> 
> Thanks,
> Doug
> 
> 
> 
> Personal information:
> 
> name
> location (city, country, whatever you want to give--we don't need your mailing address)
> home page or blog url
> 
> Questions:
> 
> 1. How long have you been using Python?
> 
> 2. How long have you been a core committer?
> 
> 3. How did you get started as a core developer? Do you remember your first commit?
> 
> 4. Which parts of Python are you working on now?
> 
> 5. What do you do with Python when you aren't doing core development work? (day job, other projects, etc.)
> 
> 6. What do you do when you aren't programming?
> 


From g.brandl at gmx.net  Tue Aug  9 08:02:45 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 09 Aug 2011 08:02:45 +0200
Subject: [Python-Dev] urllib bug in Python 3.2.1?
In-Reply-To: <j1prs1$99d$1@dough.gmane.org>
References: <4E35DC94.2090208@mrabarnett.plus.com>
	<nad-9C5EFC.16064831072011@news.gmane.org>
	<4E40465C.2080500@haypocalc.com> <j1prs1$99d$1@dough.gmane.org>
Message-ID: <j1qii4$vc7$1@dough.gmane.org>

On 09.08.2011 01:35, Terry Reedy wrote:
> On 8/8/2011 4:26 PM, Victor Stinner wrote:
>>>> With Python 3.1 and Python 3.2 it works OK, but with Python 3.2.1 the
>>>> read returns an empty string (I checked it myself).
>>>
>>> http://bugs.python.org/issue12576
>>
>> The bug is now fixed. Can you release a Python 3.2.2, maybe only with
>> this fix?
> 
> Any new release should also have
> http://bugs.python.org/issue12540
> which fixes another bad regression.

I can certainly release a version with these two fixes.  Question is, should
we call it 3.2.2, or 3.2.1.1 (3.2.1p1)?

Georg


From tjreedy at udel.edu  Tue Aug  9 09:08:24 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 09 Aug 2011 03:08:24 -0400
Subject: [Python-Dev] urllib bug in Python 3.2.1?
In-Reply-To: <j1qii4$vc7$1@dough.gmane.org>
References: <4E35DC94.2090208@mrabarnett.plus.com>
	<nad-9C5EFC.16064831072011@news.gmane.org>
	<4E40465C.2080500@haypocalc.com> <j1prs1$99d$1@dough.gmane.org>
	<j1qii4$vc7$1@dough.gmane.org>
Message-ID: <j1qmdg$mn0$1@dough.gmane.org>

On 8/9/2011 2:02 AM, Georg Brandl wrote:
> On 09.08.2011 01:35, Terry Reedy wrote:
>> On 8/8/2011 4:26 PM, Victor Stinner wrote:
>>>>> With Python 3.1 and Python 3.2 it works OK, but with Python 3.2.1 the
>>>>> read returns an empty string (I checked it myself).
>>>>
>>>> http://bugs.python.org/issue12576
>>>
>>> The bug is now fixed. Can you release a Python 3.2.2, maybe only with
>>> this fix?
>>
>> Any new release should also have
>> http://bugs.python.org/issue12540
>> which fixes another bad regression.
>
> I can certainly release a version with these two fixes.  Question is, should
> we call it 3.2.2, or 3.2.1.1 (3.2.1p1)?

I believe precedent and practicality say 3.2.2. How much more, if
anything, is up to you. The important question is whether Martin is
willing to do a Windows installer, as 12540 only affects Windows.

-- 
Terry Jan Reedy


From socketpair at gmail.com  Tue Aug  9 11:31:47 2011
From: socketpair at gmail.com (Марк Коренберг)
Date: Tue, 9 Aug 2011 15:31:47 +0600
Subject: [Python-Dev] GIL removal question
Message-ID: <CAEmTpZGe2J6poDUW3sihHS3LHDdQ3cq5gWqfty_=z5W8R0R3-Q@mail.gmail.com>

Probably I am re-inventing the wheel, but I would like the developers
to tell me why the GIL cannot be removed in the following way:

1. Remove the GIL completely, along with all of its current logic.
2. Add its own RW lock to every mutable object (like list or dict);
   a sketch follows below.
3. Add an RW lock to every context instance.
4. Use RW locks when accessing members of object instances.
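
A minimal sketch of the per-object locking in step 2 (illustrative
code only; Python's standard library has no RW lock, so a plain mutex
stands in for one):

import threading

class LockedDict:
    # Each mutable container carries its own lock.  A real
    # implementation would live in C and use a reader/writer lock.
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def __getitem__(self, key):
        with self._lock:        # "read" side
            return self._data[key]

    def __setitem__(self, key, value):
        with self._lock:        # "write" side
            self._data[key] = value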

The only reason I see not to do this is the performance of
single-threaded applications. Why not reduce the locking functions for
these four cases to no-op stubs while only one thread is present? For
example, take this source:
--------------------------------
import threading

def x():
    i = 1000
    while i:
        i -= 1

a = threading.Thread(target=x)
b = threading.Thread(target=x)
a.start()
b.start()
a.join()
b.join()
--------------------------------
In my variant this would run fully in parallel, as no common object is
locked for long (only the global context, while a.xxxx = yyyy is
executed). I think the performance of such code would be higher than
with the GIL.

The other significant argument against my variant, I think, is the
abundance of atomic processor instructions in each thread, which hurts
performance.

Also, I know my variant is incompatible with existing code.

In summary: please state clearly why my variant has not been
implemented.

Thanks.

-- 
Segmentation fault

From arfrever.fta at gmail.com  Tue Aug  9 15:53:17 2011
From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis)
Date: Tue, 9 Aug 2011 15:53:17 +0200
Subject: [Python-Dev] urllib bug in Python 3.2.1?
In-Reply-To: <j1qii4$vc7$1@dough.gmane.org>
References: <4E35DC94.2090208@mrabarnett.plus.com>
	<j1prs1$99d$1@dough.gmane.org> <j1qii4$vc7$1@dough.gmane.org>
Message-ID: <201108091553.18632.Arfrever.FTA@gmail.com>

On 2011-08-09 08:02:45, Georg Brandl wrote:
> On 09.08.2011 01:35, Terry Reedy wrote:
> > On 8/8/2011 4:26 PM, Victor Stinner wrote:
> >>>> With Python 3.1 and Python 3.2 it works OK, but with Python 3.2.1 the
> >>>> read returns an empty string (I checked it myself).
> >>>
> >>> http://bugs.python.org/issue12576
> >>
> >> The bug is now fixed. Can you release a Python 3.2.2, maybe only with
> >> this fix?
> > 
> > Any new release should also have
> > http://bugs.python.org/issue12540
> > which fixes another bad regression.
> 
> I can certainly release a version with these two fixes.  Question is, should
> we call it 3.2.2, or 3.2.1.1 (3.2.1p1)?

I would suggest that a normal release with all changes committed on the 3.2 branch be created.

-- 
Arfrever Frehtes Taifersar Arahesis

From stefan_ml at behnel.de  Tue Aug  9 16:11:07 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 09 Aug 2011 16:11:07 +0200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <CAEmTpZGe2J6poDUW3sihHS3LHDdQ3cq5gWqfty_=z5W8R0R3-Q@mail.gmail.com>
References: <CAEmTpZGe2J6poDUW3sihHS3LHDdQ3cq5gWqfty_=z5W8R0R3-Q@mail.gmail.com>
Message-ID: <j1rf5r$kmh$1@dough.gmane.org>

Марк Коренберг, 09.08.2011 11:31:
> In summary: please state clearly why my variant has not been
> implemented.

This question comes up on the different Python lists every once in a while. 
In general, if you want something to be implemented in a specific way, feel 
free to provide the implementation.

There were several attempts to remove the GIL from the interpreter; you can
look them up in the archives of this mailing list. They all failed to 
provide competitive performance, especially for the single-threaded case, 
and were therefore deemed inappropriate "solutions" to the "problem".

Note that I put "problem" into quotes, simply because it is controversial
whether the GIL actually *is* a problem. This question has also been
discussed and rediscussed at great length on the different Python lists.

Stefan


From barry at python.org  Tue Aug  9 16:25:29 2011
From: barry at python.org (Barry Warsaw)
Date: Tue, 9 Aug 2011 10:25:29 -0400
Subject: [Python-Dev] urllib bug in Python 3.2.1?
In-Reply-To: <j1qii4$vc7$1@dough.gmane.org>
References: <4E35DC94.2090208@mrabarnett.plus.com>
	<nad-9C5EFC.16064831072011@news.gmane.org>
	<4E40465C.2080500@haypocalc.com> <j1prs1$99d$1@dough.gmane.org>
	<j1qii4$vc7$1@dough.gmane.org>
Message-ID: <20110809102529.0f60dd93@resist.wooz.org>

On Aug 09, 2011, at 08:02 AM, Georg Brandl wrote:

>I can certainly release a version with these two fixes.  Question is, should
>we call it 3.2.2, or 3.2.1.1 (3.2.1p1)?

Definitely 3.2.2.

-Barry


From g.brandl at gmx.net  Tue Aug  9 20:36:50 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 09 Aug 2011 20:36:50 +0200
Subject: [Python-Dev] urllib bug in Python 3.2.1?
In-Reply-To: <20110809102529.0f60dd93@resist.wooz.org>
References: <4E35DC94.2090208@mrabarnett.plus.com>
	<nad-9C5EFC.16064831072011@news.gmane.org>
	<4E40465C.2080500@haypocalc.com> <j1prs1$99d$1@dough.gmane.org>
	<j1qii4$vc7$1@dough.gmane.org>
	<20110809102529.0f60dd93@resist.wooz.org>
Message-ID: <j1ruo2$ust$1@dough.gmane.org>

On 09.08.2011 16:25, Barry Warsaw wrote:
> On Aug 09, 2011, at 08:02 AM, Georg Brandl wrote:
> 
>>I can certainly release a version with these two fixes.  Question is, should
>>we call it 3.2.2, or 3.2.1.1 (3.2.1p1)?
> 
> Definitely 3.2.2.

OK, 3.2.2 it is.  I will have to have a closer look at the other changes in
the branch to decide if it'll be a single(double)-fix only release.

Schedule would be roughly as follows: rc1 this Friday/Saturday, then I'm on
vacation for a little more than one week, so final would be the weekend of
27/28 August.

Georg


From dave at dabeaz.com  Wed Aug 10 13:09:07 2011
From: dave at dabeaz.com (David Beazley)
Date: Wed, 10 Aug 2011 06:09:07 -0500
Subject: [Python-Dev] GIL removal question
In-Reply-To: <mailman.75.1312970403.18995.python-dev@python.org>
References: <mailman.75.1312970403.18995.python-dev@python.org>
Message-ID: <6474188B-CC1C-4B33-9E6C-9D2ACFC637D9@dabeaz.com>

> 
> Message: 1
> Date: Tue, 9 Aug 2011 15:31:47 +0600
> From: ???? ????????? <socketpair at gmail.com>
> To: python-dev at python.org
> Subject: [Python-Dev] GIL removal question
> Message-ID:
> 	<CAEmTpZGe2J6poDUW3sihHS3LHDdQ3cq5gWqfty_=z5W8R0R3-Q at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> I'm probably trying to reinvent the wheel. I want developers to tell me
> why we cannot remove the GIL in the following way:
> 
> 1. Remove the GIL completely, along with all its current logic.
> 2. Add its own RW lock to every mutable object (like list or dict).
> 3. Add RW locks to every context instance.
> 4. Use RW locks when accessing members of object instances.

You're forgetting step 5.

5. Put fine-grain locks around all reference counting operations (or rewrite all of Python's memory management and garbage collection from scratch).

> The only reason I can see not to do that is the performance of
> single-threaded applications.

After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse.  Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.
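
(To make steps 2-4 concrete, here is a minimal pure-Python sketch of the
kind of per-object readers-writer locking the original post describes.
The RWLock and GuardedDict names are invented for illustration, and this
deliberately ignores the reference-counting problem from step 5, which
pure Python cannot even express:)

    import threading

    class RWLock:
        # Many concurrent readers OR a single writer.
        def __init__(self):
            self._cond = threading.Condition()
            self._readers = 0
            self._writing = False

        def acquire_read(self):
            with self._cond:
                while self._writing:
                    self._cond.wait()
                self._readers += 1

        def release_read(self):
            with self._cond:
                self._readers -= 1
                if not self._readers:
                    self._cond.notify_all()

        def acquire_write(self):
            with self._cond:
                while self._writing or self._readers:
                    self._cond.wait()
                self._writing = True

        def release_write(self):
            with self._cond:
                self._writing = False
                self._cond.notify_all()

    class GuardedDict:
        # Step 2 applied to a dict: every single access now pays for a
        # lock round-trip, which is exactly where the single-threaded
        # slowdown comes from.
        def __init__(self):
            self._lock = RWLock()
            self._data = {}

        def __getitem__(self, key):
            self._lock.acquire_read()
            try:
                return self._data[key]
            finally:
                self._lock.release_read()

        def __setitem__(self, key, value):
            self._lock.acquire_write()
            try:
                self._data[key] = value
            finally:
                self._lock.release_write()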

Just as an aside, I recently did some experiments with the fabled patch to remove the GIL from Python 1.4 (mainly for my own historical curiosity).   On Linux, the performance isn't just slightly worse: single-threaded code runs about 6-7 times slower, and threaded code runs even worse.  So, basically everything runs like a dog.  No GIL though.

Cheers,
Dave


From ncoghlan at gmail.com  Wed Aug 10 13:15:37 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 10 Aug 2011 21:15:37 +1000
Subject: [Python-Dev] GIL removal question
In-Reply-To: <6474188B-CC1C-4B33-9E6C-9D2ACFC637D9@dabeaz.com>
References: <mailman.75.1312970403.18995.python-dev@python.org>
	<6474188B-CC1C-4B33-9E6C-9D2ACFC637D9@dabeaz.com>
Message-ID: <CADiSq7euK-=W9LrtEUqgJJJUjdHkX99ciJZEZ+pMduRAo90vfg@mail.gmail.com>

On Wed, Aug 10, 2011 at 9:09 PM, David Beazley <dave at dabeaz.com> wrote:
> You're forgetting step 5.
>
> 5. Put fine-grain locks around all reference counting operations (or rewrite all of Python's memory management and garbage collection from scratch).
...
> After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse.  Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.

PyPy would actually make a significantly better basis for this kind of
experimentation, since they *don't* use reference counting for their
memory management.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From dave at dabeaz.com  Wed Aug 10 13:32:27 2011
From: dave at dabeaz.com (David Beazley)
Date: Wed, 10 Aug 2011 06:32:27 -0500
Subject: [Python-Dev] GIL removal question
In-Reply-To: <CADiSq7euK-=W9LrtEUqgJJJUjdHkX99ciJZEZ+pMduRAo90vfg@mail.gmail.com>
References: <mailman.75.1312970403.18995.python-dev@python.org>
	<6474188B-CC1C-4B33-9E6C-9D2ACFC637D9@dabeaz.com>
	<CADiSq7euK-=W9LrtEUqgJJJUjdHkX99ciJZEZ+pMduRAo90vfg@mail.gmail.com>
Message-ID: <9E289F07-B0DA-47E6-B46C-22FAE17D4A0D@dabeaz.com>


On Aug 10, 2011, at 6:15 AM, Nick Coghlan wrote:

> On Wed, Aug 10, 2011 at 9:09 PM, David Beazley <dave at dabeaz.com> wrote:
>> You're forgetting step 5.
>> 
>> 5. Put fine-grain locks around all reference counting operations (or rewrite all of Python's memory management and garbage collection from scratch).
> ...
>> After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse.  Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.
> 
> PyPy would actually make a significantly better basis for this kind of
> experimentation, since they *don't* use reference counting for their
> memory management.
> 

That's an experiment that would be pretty interesting.  I think the real question would boil down to what *else* do they have to lock to make everything work.  Reference counting is a huge bottleneck for CPython to be sure, but it's definitely not the only issue that has to be addressed in making a free-threaded Python.

Cheers,
Dave
  

From ncoghlan at gmail.com  Wed Aug 10 13:42:22 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 10 Aug 2011 21:42:22 +1000
Subject: [Python-Dev] GIL removal question
In-Reply-To: <9E289F07-B0DA-47E6-B46C-22FAE17D4A0D@dabeaz.com>
References: <mailman.75.1312970403.18995.python-dev@python.org>
	<6474188B-CC1C-4B33-9E6C-9D2ACFC637D9@dabeaz.com>
	<CADiSq7euK-=W9LrtEUqgJJJUjdHkX99ciJZEZ+pMduRAo90vfg@mail.gmail.com>
	<9E289F07-B0DA-47E6-B46C-22FAE17D4A0D@dabeaz.com>
Message-ID: <CADiSq7cCaaaG6OuiPqgZ82sw_vOJYVmd1J=M1h=CdqEPUiY9fw@mail.gmail.com>

On Wed, Aug 10, 2011 at 9:32 PM, David Beazley <dave at dabeaz.com> wrote:
> On Aug 10, 2011, at 6:15 AM, Nick Coghlan wrote:
>> PyPy would actually make a significantly better basis for this kind of
>> experimentation, since they *don't* use reference counting for their
>> memory management.
>
> That's an experiment that would be pretty interesting.  I think the real question would boil down to what *else* do they have to lock to make everything work.  Reference counting is a huge bottleneck for CPython to be sure, but it's definitely not the only issue that has to be addressed in making a free-threaded Python.
>

Yeah, the problem reduces back to the 4 steps in the original post.
Still not trivial, since there's quite a bit of internal interpreter
state to protect, but significantly more feasible than dealing with
CPython's reference counting. However, you do get additional
complexities like the JIT compiler coming into play, so it is really a
question that would need to be raised directly with the PyPy dev team.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From guido at python.org  Wed Aug 10 13:43:09 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 10 Aug 2011 07:43:09 -0400
Subject: [Python-Dev] GIL removal question
In-Reply-To: <9E289F07-B0DA-47E6-B46C-22FAE17D4A0D@dabeaz.com>
References: <mailman.75.1312970403.18995.python-dev@python.org>
	<6474188B-CC1C-4B33-9E6C-9D2ACFC637D9@dabeaz.com>
	<CADiSq7euK-=W9LrtEUqgJJJUjdHkX99ciJZEZ+pMduRAo90vfg@mail.gmail.com>
	<9E289F07-B0DA-47E6-B46C-22FAE17D4A0D@dabeaz.com>
Message-ID: <CAP7+vJJ9XiAYF3c8+QvDLdgDJHRnXCTUv22tO17jRuU_cuPDsw@mail.gmail.com>

On Wed, Aug 10, 2011 at 7:32 AM, David Beazley <dave at dabeaz.com> wrote:
>
> On Aug 10, 2011, at 6:15 AM, Nick Coghlan wrote:
>
>> On Wed, Aug 10, 2011 at 9:09 PM, David Beazley <dave at dabeaz.com> wrote:
>>> You're forgetting step 5.
>>>
>>> 5. Put fine-grain locks around all reference counting operations (or rewrite all of Python's memory management and garbage collection from scratch).
>> ...
>>> After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse.  Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.
>>
>> PyPy would actually make a significantly better basis for this kind of
>> experimentation, since they *don't* use reference counting for their
>> memory management.
>>
>
> That's an experiment that would be pretty interesting.  I think the real question would boil down to what *else* do they have to lock to make everything work.  Reference counting is a huge bottleneck for CPython to be sure, but it's definitely not the only issue that has to be addressed in making a free-threaded Python.

They have a specific plan, based on Software Transactional Memory:
http://morepypy.blogspot.com/2011/06/global-interpreter-lock-or-how-to-kill.html

Personally, I'm not holding my breath, because STM in other areas has
so far captured many imaginations without bringing practical results
(I keep hearing about it as this promising theory that needs more work
to implement, sort of like String Theory in theoretical physics).
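
(For readers who haven't met the idea: STM replaces explicit locking with
optimistic transactions that read a snapshot, compute, and retry if
anything changed underneath them. Below is a toy pure-Python illustration
of that read-validate-commit cycle; Ref and atomically are invented names,
and this is in no way PyPy's actual design:)

    import threading

    class Ref(object):
        # A transactional cell: a value plus a version counter.
        def __init__(self, value):
            self.value = value
            self.version = 0
            self.lock = threading.Lock()

    def atomically(refs, transaction):
        # Optimistically read a snapshot, compute new values, then
        # commit only if no ref changed in the meantime; else retry.
        while True:
            snapshot = [(r, r.value, r.version) for r in refs]
            updates = transaction(dict((r, v) for r, v, _ in snapshot))
            for r in sorted(refs, key=id):  # fixed order avoids deadlock
                r.lock.acquire()
            try:
                if all(r.version == ver for r, _, ver in snapshot):
                    for r, v in updates.items():
                        r.value = v
                        r.version += 1
                    return
            finally:
                for r in refs:
                    r.lock.release()

    # Transfer between two "accounts" without one global lock:
    a, b = Ref(10), Ref(0)
    atomically([a, b], lambda vals: {a: vals[a] - 1, b: vals[b] + 1})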

But I'm also not denying that Armin Rigo has a brain the size of the
planet, and PyPy *has* already made much real, practical progress.

-- 
--Guido van Rossum (python.org/~guido)

From fijall at gmail.com  Wed Aug 10 17:20:28 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Wed, 10 Aug 2011 17:20:28 +0200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <CAP7+vJJ9XiAYF3c8+QvDLdgDJHRnXCTUv22tO17jRuU_cuPDsw@mail.gmail.com>
References: <mailman.75.1312970403.18995.python-dev@python.org>
	<6474188B-CC1C-4B33-9E6C-9D2ACFC637D9@dabeaz.com>
	<CADiSq7euK-=W9LrtEUqgJJJUjdHkX99ciJZEZ+pMduRAo90vfg@mail.gmail.com>
	<9E289F07-B0DA-47E6-B46C-22FAE17D4A0D@dabeaz.com>
	<CAP7+vJJ9XiAYF3c8+QvDLdgDJHRnXCTUv22tO17jRuU_cuPDsw@mail.gmail.com>
Message-ID: <CAK5idxQGvrCrjvQerEFo11yx7ksWF=q7s7Nc8jZL+-rfMn4umw@mail.gmail.com>

On Wed, Aug 10, 2011 at 1:43 PM, Guido van Rossum <guido at python.org> wrote:
> On Wed, Aug 10, 2011 at 7:32 AM, David Beazley <dave at dabeaz.com> wrote:
>>
>> On Aug 10, 2011, at 6:15 AM, Nick Coghlan wrote:
>>
>>> On Wed, Aug 10, 2011 at 9:09 PM, David Beazley <dave at dabeaz.com> wrote:
>>>> You're forgetting step 5.
>>>>
>>>> 5. Put fine-grain locks around all reference counting operations (or rewrite all of Python's memory management and garbage collection from scratch).
>>> ...
> >>>> After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse.  Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.
>>>
>>> PyPy would actually make a significantly better basis for this kind of
>>> experimentation, since they *don't* use reference counting for their
>>> memory management.
>>>
>>
> >> That's an experiment that would be pretty interesting.  I think the real question would boil down to what *else* do they have to lock to make everything work.  Reference counting is a huge bottleneck for CPython to be sure, but it's definitely not the only issue that has to be addressed in making a free-threaded Python.
>
> They have a specific plan, based on Software Transactional Memory:
> http://morepypy.blogspot.com/2011/06/global-interpreter-lock-or-how-to-kill.html
>
> Personally, I'm not holding my breath, because STM in other areas has
> so far captured many imaginations without bringing practical results
> (I keep hearing about it as this promising theory that needs more work
> to implement, sort-of like String Theory in theoretical physics).

Note that PyPy's plan does *not* assume the end result will be
comparable in the single-threaded case. The goal is to be able to
compile two *different* pypy's, one fast single-threaded, one
gil-less, but with a significant overhead. The trick is to get this
working in a way that does not increase maintenance burden. It's also
research, so among other things it might not work.

Cheers,
fijal

From riscutiavlad at gmail.com  Wed Aug 10 18:14:49 2011
From: riscutiavlad at gmail.com (Vlad Riscutia)
Date: Wed, 10 Aug 2011 09:14:49 -0700
Subject: [Python-Dev] GIL removal question
In-Reply-To: <CAK5idxQGvrCrjvQerEFo11yx7ksWF=q7s7Nc8jZL+-rfMn4umw@mail.gmail.com>
References: <mailman.75.1312970403.18995.python-dev@python.org>
	<6474188B-CC1C-4B33-9E6C-9D2ACFC637D9@dabeaz.com>
	<CADiSq7euK-=W9LrtEUqgJJJUjdHkX99ciJZEZ+pMduRAo90vfg@mail.gmail.com>
	<9E289F07-B0DA-47E6-B46C-22FAE17D4A0D@dabeaz.com>
	<CAP7+vJJ9XiAYF3c8+QvDLdgDJHRnXCTUv22tO17jRuU_cuPDsw@mail.gmail.com>
	<CAK5idxQGvrCrjvQerEFo11yx7ksWF=q7s7Nc8jZL+-rfMn4umw@mail.gmail.com>
Message-ID: <CAJ-9HZ0PTv+VqxuDTiLxvT2D2c4EL7BD168rV-ns4brw3EVwkg@mail.gmail.com>

Removing the GIL is interesting work, and multiple people would probably
be willing to contribute. Threading and synchronization is a deep topic,
and if just one person toys around with removing the GIL he might not see
a performance improvement (not meaning to offend anyone who tried this,
honestly). So what about forking a branch for this work, with some good
benchmarks in place, and having the community contribute? Say the first
step would be just replacing the GIL with some fine-grained locks, with
the expected performance degradation, and afterwards we try to
incrementally improve on this.

Thank you,
Vlad

On Wed, Aug 10, 2011 at 8:20 AM, Maciej Fijalkowski <fijall at gmail.com>wrote:

> On Wed, Aug 10, 2011 at 1:43 PM, Guido van Rossum <guido at python.org>
> wrote:
> > On Wed, Aug 10, 2011 at 7:32 AM, David Beazley <dave at dabeaz.com> wrote:
> >>
> >> On Aug 10, 2011, at 6:15 AM, Nick Coghlan wrote:
> >>
> >>> On Wed, Aug 10, 2011 at 9:09 PM, David Beazley <dave at dabeaz.com>
> wrote:
> >>>> You're forgetting step 5.
> >>>>
> >>>> 5. Put fine-grain locks around all reference counting operations (or
> rewrite all of Python's memory management and garbage collection from
> scratch).
> >>> ...
> >>>> After implementing the aforementioned step 5, you will find that the
> performance of everything, including the threaded code, will be quite a bit
> worse.  Frankly, this is probably the most significant obstacle to have any
> kind of GIL-less Python with reasonable performance.
> >>>
> >>> PyPy would actually make a significantly better basis for this kind of
> >>> experimentation, since they *don't* use reference counting for their
> >>> memory management.
> >>>
> >>
> >> That's an experiment that would be pretty interesting.  I think the real
> question would boil down to what *else* do they have to lock to make
> everything work.   Reference counting is a huge bottleneck for CPython to be
> sure, but it's definitely not the only issue that has to be addressed in
> making a free-threaded Python.
> >
> > They have a specific plan, based on Software Transactional Memory:
> >
> http://morepypy.blogspot.com/2011/06/global-interpreter-lock-or-how-to-kill.html
> >
> > Personally, I'm not holding my breath, because STM in other areas has
> > so far captured many imaginations without bringing practical results
> > (I keep hearing about it as this promising theory that needs more work
> > to implement, sort-of like String Theory in theoretical physics).
>
> Note that the PyPy's plan does *not* assume the end result will be
> comparable in the single-threaded case. The goal is to be able to
> compile two *different* pypy's, one fast single-threaded, one
> gil-less, but with a significant overhead. The trick is to get this
> working in a way that does not increase maintenance burden. It's also
> research, so among other things it might not work.
>
> Cheers,
> fijal
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/riscutiavlad%40gmail.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110810/6103014a/attachment.html>

From brian.curtin at gmail.com  Wed Aug 10 18:19:12 2011
From: brian.curtin at gmail.com (Brian Curtin)
Date: Wed, 10 Aug 2011 11:19:12 -0500
Subject: [Python-Dev] GIL removal question
In-Reply-To: <CAJ-9HZ0PTv+VqxuDTiLxvT2D2c4EL7BD168rV-ns4brw3EVwkg@mail.gmail.com>
References: <mailman.75.1312970403.18995.python-dev@python.org>
	<6474188B-CC1C-4B33-9E6C-9D2ACFC637D9@dabeaz.com>
	<CADiSq7euK-=W9LrtEUqgJJJUjdHkX99ciJZEZ+pMduRAo90vfg@mail.gmail.com>
	<9E289F07-B0DA-47E6-B46C-22FAE17D4A0D@dabeaz.com>
	<CAP7+vJJ9XiAYF3c8+QvDLdgDJHRnXCTUv22tO17jRuU_cuPDsw@mail.gmail.com>
	<CAK5idxQGvrCrjvQerEFo11yx7ksWF=q7s7Nc8jZL+-rfMn4umw@mail.gmail.com>
	<CAJ-9HZ0PTv+VqxuDTiLxvT2D2c4EL7BD168rV-ns4brw3EVwkg@mail.gmail.com>
Message-ID: <CAD+XWwp4v4ef6B_1xEAqe1wPsESRnbg4iY+gY2ACB-K2QdPvwg@mail.gmail.com>

On Wed, Aug 10, 2011 at 11:14, Vlad Riscutia <riscutiavlad at gmail.com> wrote:

> Removing GIL is interesting work and probably multiple people are willing
> to contribute. Threading and synchronization is a deep topic and it might be
> that if just one person toys around with removing GIL he might not see
> performance improvement (not meaning to offend anyone who tried this,
> honestly) but what about forking a branch for this work, with some good
> benchmarks in place and have community contribute? Let's say first step
> would be just replacing GIL with some fine grained locks with expected
> performance degradation but afterwards we can try to incrementally improve
> on this.
>
> Thank you,
> Vlad
>

Feel free to start this: http://hg.python.org/cpython
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110810/f9d974ac/attachment.html>

From ericsnowcurrently at gmail.com  Wed Aug 10 19:04:05 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 10 Aug 2011 11:04:05 -0600
Subject: [Python-Dev] GIL removal question
In-Reply-To: <CAD+XWwp4v4ef6B_1xEAqe1wPsESRnbg4iY+gY2ACB-K2QdPvwg@mail.gmail.com>
References: <mailman.75.1312970403.18995.python-dev@python.org>
	<6474188B-CC1C-4B33-9E6C-9D2ACFC637D9@dabeaz.com>
	<CADiSq7euK-=W9LrtEUqgJJJUjdHkX99ciJZEZ+pMduRAo90vfg@mail.gmail.com>
	<9E289F07-B0DA-47E6-B46C-22FAE17D4A0D@dabeaz.com>
	<CAP7+vJJ9XiAYF3c8+QvDLdgDJHRnXCTUv22tO17jRuU_cuPDsw@mail.gmail.com>
	<CAK5idxQGvrCrjvQerEFo11yx7ksWF=q7s7Nc8jZL+-rfMn4umw@mail.gmail.com>
	<CAJ-9HZ0PTv+VqxuDTiLxvT2D2c4EL7BD168rV-ns4brw3EVwkg@mail.gmail.com>
	<CAD+XWwp4v4ef6B_1xEAqe1wPsESRnbg4iY+gY2ACB-K2QdPvwg@mail.gmail.com>
Message-ID: <CALFfu7C4b8T-euSZKzR2ZFnvcUNe0USObUVUqKmyAJog3rRS7Q@mail.gmail.com>

On Wed, Aug 10, 2011 at 10:19 AM, Brian Curtin <brian.curtin at gmail.com> wrote:
> On Wed, Aug 10, 2011 at 11:14, Vlad Riscutia <riscutiavlad at gmail.com> wrote:
>>
>> Removing GIL is interesting work and probably multiple people are willing
>> to contribute. Threading and synchronization is a deep topic and it might be
>> that if just one person toys around with removing GIL he might not see
>> performance improvement (not meaning to offend anyone who tried this,
>> honestly) but what about forking a branch for this work, with some good
>> benchmarks in place and have community contribute? Let's say first step
>> would be just replacing GIL with some fine grained locks with expected
>> performance degradation but afterwards we can try to incrementally improve
>> on this.
>> Thank you,
>> Vlad
>
> Feel free to start this:?http://hg.python.org/cpython

+1 on not waiting for someone else to do it if you have an idea. :)

Bitbucket makes it really easy for anyone to fork a repo into a new
project, and they keep an up-to-date mirror of the CPython repo:

https://bitbucket.org/mirror/cpython/overview

-eric

> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/ericsnowcurrently%40gmail.com
>
>

From raymond.hettinger at gmail.com  Wed Aug 10 19:19:06 2011
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Wed, 10 Aug 2011 10:19:06 -0700
Subject: [Python-Dev] GIL removal question
In-Reply-To: <CADiSq7euK-=W9LrtEUqgJJJUjdHkX99ciJZEZ+pMduRAo90vfg@mail.gmail.com>
References: <mailman.75.1312970403.18995.python-dev@python.org>
	<6474188B-CC1C-4B33-9E6C-9D2ACFC637D9@dabeaz.com>
	<CADiSq7euK-=W9LrtEUqgJJJUjdHkX99ciJZEZ+pMduRAo90vfg@mail.gmail.com>
Message-ID: <8A2B2AFE-8AD4-4379-A2E9-64987CA52B68@gmail.com>


On Aug 10, 2011, at 4:15 AM, Nick Coghlan wrote:

>> After implementing the aforementioned step 5, you will find that the performance of everything, including the threaded code, will be quite a bit worse.  Frankly, this is probably the most significant obstacle to have any kind of GIL-less Python with reasonable performance.
> 
> PyPy would actually make a significantly better basis for this kind of
> experimentation, since they *don't* use reference counting for their
> memory management.

Jython may be a better choice.  It is all about concurrency.  Its dicts are built on top of Java's ConcurrentHashMap for example.



Raymond
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110810/29aa4b7a/attachment-0001.html>

From brian.curtin at gmail.com  Wed Aug 10 20:55:56 2011
From: brian.curtin at gmail.com (Brian Curtin)
Date: Wed, 10 Aug 2011 13:55:56 -0500
Subject: [Python-Dev] Moving forward with the concurrent package
Message-ID: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>

Now that we have concurrent.futures, is there any plan for multiprocessing
to follow suit? PEP 3148 mentions a hope to add or move things in the future
[0], which would be now.

[0] http://www.python.org/dev/peps/pep-3148/#naming
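
(For context, the executor API already lets the same calling code run
against either a thread or a process back end; in this small example,
work is just a stand-in CPU-bound function:)

    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def work(n):
        return sum(i * i for i in range(n))

    if __name__ == '__main__':
        # Identical calling code for both back ends.
        for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
            with cls(max_workers=4) as executor:
                print(list(executor.map(work, [10 ** 5] * 4)))
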
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110810/b0b00acc/attachment.html>

From benjamin at python.org  Wed Aug 10 21:54:33 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 10 Aug 2011 14:54:33 -0500
Subject: [Python-Dev] Moving forward with the concurrent package
In-Reply-To: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>
References: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>
Message-ID: <CAPZV6o8DBO1vuYnTR=wmcX9LNYm70RBRi0LWX4uMo9GxLXm3TQ@mail.gmail.com>

2011/8/10 Brian Curtin <brian.curtin at gmail.com>:
> Now that we have concurrent.futures, is there any plan for multiprocessing
> to follow suit? PEP 3148 mentions a hope to add or move things in the future

Is there some sort of concrete proposal? The PEP just seems to mention
it as an idea.

In general, -1. I think we don't need to be moving things around more
to little advantage.


-- 
Regards,
Benjamin

From fijall at gmail.com  Wed Aug 10 22:06:58 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Wed, 10 Aug 2011 22:06:58 +0200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <8A2B2AFE-8AD4-4379-A2E9-64987CA52B68@gmail.com>
References: <mailman.75.1312970403.18995.python-dev@python.org>
	<6474188B-CC1C-4B33-9E6C-9D2ACFC637D9@dabeaz.com>
	<CADiSq7euK-=W9LrtEUqgJJJUjdHkX99ciJZEZ+pMduRAo90vfg@mail.gmail.com>
	<8A2B2AFE-8AD4-4379-A2E9-64987CA52B68@gmail.com>
Message-ID: <CAK5idxTHA+sTQReLG1X17sD5jTRH01AO-7iG3vkVThBkccnuLg@mail.gmail.com>

On Wed, Aug 10, 2011 at 7:19 PM, Raymond Hettinger
<raymond.hettinger at gmail.com> wrote:
>
> On Aug 10, 2011, at 4:15 AM, Nick Coghlan wrote:
>
> After implementing the aforementioned step 5, you will find that the
> performance of everything, including the threaded code, will be quite a bit
> worse.  Frankly, this is probably the most significant obstacle to have any
> kind of GIL-less Python with reasonable performance.
>
> PyPy would actually make a significantly better basis for this kind of
> experimentation, since they *don't* use reference counting for their
> memory management.
>
> Jython may be a better choice.  It is all about concurrency.  Its dicts are
> built on top of Java's ConcurrentHashMap for example.
>

Jython is a kind of boring choice because it does not have a GIL at all
(same as IronPython). It might *work* for what you're trying to
achieve, but GIL removal is not really that interesting there.

From solipsis at pitrou.net  Wed Aug 10 22:36:43 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 10 Aug 2011 22:36:43 +0200
Subject: [Python-Dev] Moving forward with the concurrent package
References: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>
	<CAPZV6o8DBO1vuYnTR=wmcX9LNYm70RBRi0LWX4uMo9GxLXm3TQ@mail.gmail.com>
Message-ID: <20110810223643.5fadea2d@msiwind>

On Wed, 10 Aug 2011 14:54:33 -0500,
Benjamin Peterson <benjamin at python.org> wrote:
> 2011/8/10 Brian Curtin <brian.curtin at gmail.com>:
> > Now that we have concurrent.futures, is there any plan for
> > multiprocessing to follow suit? PEP 3148 mentions a hope to add or move
> > things in the future
> 
> Is there some sort of concrete proposal? The PEP just seems to mention
> it as an idea.
> 
> In general, -1. I think we don't need to be moving things around more
> to little advantage.

Agreed. Also, flat is better than nested. Whoever wants to populate the
concurrent package should work on new features to be added to it, rather
than plans to rename things around.

Regards

Antoine.



From brian.curtin at gmail.com  Wed Aug 10 22:45:40 2011
From: brian.curtin at gmail.com (Brian Curtin)
Date: Wed, 10 Aug 2011 15:45:40 -0500
Subject: [Python-Dev] Moving forward with the concurrent package
In-Reply-To: <20110810223643.5fadea2d@msiwind>
References: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>
	<CAPZV6o8DBO1vuYnTR=wmcX9LNYm70RBRi0LWX4uMo9GxLXm3TQ@mail.gmail.com>
	<20110810223643.5fadea2d@msiwind>
Message-ID: <CAD+XWwo2F5L=4xNC0dTd5fx=isQXn3ngho59=XOcVFrB5uUJcg@mail.gmail.com>

On Wed, Aug 10, 2011 at 15:36, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Wed, 10 Aug 2011 14:54:33 -0500,
> Benjamin Peterson <benjamin at python.org> wrote:
> > 2011/8/10 Brian Curtin <brian.curtin at gmail.com>:
> > > Now that we have concurrent.futures, is there any plan for
> > > multiprocessing to follow suit? PEP 3148 mentions a hope to add or move
> > > things in the future
> >
> > Is there some sort of concrete proposal? The PEP just seems to mention
> > it as an idea.
> >
> > In general, -1. I think we don't need to be moving things around more
> > to little advantage.
>
> Agreed. Also, flat is better than nested. Whoever wants to populate the
> concurrent package should work on new features to be added to it, rather
> than plans to rename things around.


I agree with flat being better than nested and won't be pushing to move
things around, but the creation of the concurrent package seemed like a
place to put those things. I just found myself typing
"concurrent.multiprocessing" a minute ago, so I figured I'd put it out
there.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110810/749bc784/attachment.html>

From raymond.hettinger at gmail.com  Wed Aug 10 22:46:41 2011
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Wed, 10 Aug 2011 13:46:41 -0700
Subject: [Python-Dev] Moving forward with the concurrent package
In-Reply-To: <20110810223643.5fadea2d@msiwind>
References: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>
	<CAPZV6o8DBO1vuYnTR=wmcX9LNYm70RBRi0LWX4uMo9GxLXm3TQ@mail.gmail.com>
	<20110810223643.5fadea2d@msiwind>
Message-ID: <A90DDE78-EB52-42CA-8964-B6ECCEA4ED05@gmail.com>


On Aug 10, 2011, at 1:36 PM, Antoine Pitrou wrote:

> On Wed, 10 Aug 2011 14:54:33 -0500,
> Benjamin Peterson <benjamin at python.org> wrote:
>> 2011/8/10 Brian Curtin <brian.curtin at gmail.com>:
>>> Now that we have concurrent.futures, is there any plan for
>>> multiprocessing to follow suit? PEP 3148 mentions a hope to add or move
>>> things in the future
>> 
>> Is there some sort of concrete proposal? The PEP just seems to mention
>> it as an idea.
>> 
>> In general, -1. I think we don't need to be moving things around more
>> to little advantage.
> 
> Agreed. Also, flat is better than nested. Whoever wants to populate the
> concurrent package should work on new features to be added to it, rather
> than plans to rename things around.

I concur.


Raymond

From sandro.tosi at gmail.com  Wed Aug 10 23:02:42 2011
From: sandro.tosi at gmail.com (Sandro Tosi)
Date: Wed, 10 Aug 2011 23:02:42 +0200
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Fix closes
 Issue12722 - link heapq source in the text format in the
In-Reply-To: <4E42E24B.8020601@udel.edu>
References: <E1Qr9Fh-0002Rv-2b@dinsdale.python.org> <4E42E24B.8020601@udel.edu>
Message-ID: <CAPdtAj2YcGSOyDK1NVWpYJVT=qHNV4Tw50ifKnh8W7njGAn=sg@mail.gmail.com>

On Wed, Aug 10, 2011 at 21:55, Terry Reedy <tjreedy at udel.edu> wrote:
>
>>
>> ? ? Latest version of the `heapq Python source code
>>
>> -<http://svn.python.org/view/python/branches/release27-maint/Lib/heapq.py?view=markup>`_
>>
>> +<http://svn.python.org/view/*checkout*/python/branches/release27-maint/Lib/heapq.py?content-type=text%2Fplain>`_
>
> Should links be to the hg repository instead of svn?
> Is svn updated from hg?
> I thought is was (mostly) historical read-only.

I made the same remark to Senthil on IRC, and it came out that the web
frontend for hg.p.o doesn't allow for a nice way to specify a branch
(other than 'default'); it's something like
hg.python.org/cpython/<last cset id on a given branch>/path/to/file.py
which is almost always outdated :)

What do we use to provide the web part of hg.p.o? Maybe we can just
ask the developers of this tool to provide (or advertise) a proper way
to select a branch. If someone can provide me some info, I can do the
"ask the devs" part.

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From ezio.melotti at gmail.com  Wed Aug 10 22:58:24 2011
From: ezio.melotti at gmail.com (Ezio Melotti)
Date: Wed, 10 Aug 2011 23:58:24 +0300
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Fix closes
 Issue12722 - link heapq source in the text format in the
In-Reply-To: <CAPdtAj2YcGSOyDK1NVWpYJVT=qHNV4Tw50ifKnh8W7njGAn=sg@mail.gmail.com>
References: <E1Qr9Fh-0002Rv-2b@dinsdale.python.org> <4E42E24B.8020601@udel.edu>
	<CAPdtAj2YcGSOyDK1NVWpYJVT=qHNV4Tw50ifKnh8W7njGAn=sg@mail.gmail.com>
Message-ID: <4E42F0F0.1070203@gmail.com>

On 11/08/2011 0.02, Sandro Tosi wrote:
> On Wed, Aug 10, 2011 at 21:55, Terry Reedy<tjreedy at udel.edu>  wrote:
>>>      Latest version of the `heapq Python source code
>>>
>>> -<http://svn.python.org/view/python/branches/release27-maint/Lib/heapq.py?view=markup>`_
>>>
>>> +<http://svn.python.org/view/*checkout*/python/branches/release27-maint/Lib/heapq.py?content-type=text%2Fplain>`_
>> Should links be to the hg repository instead of svn?
>> Is svn updated from hg?
>> I thought is was (mostly) historical read-only.
> I made the same remark to Senthil on IRC, and came out that web
> frontend for hg.p.o doesn't allow for a nice way to specify a branch
> (different than 'default'), it's something like
> hg.python.org/cpython/<last cset id on a given branch>/path/to/file.py
> which is almost always outdated :)

hg.python.org/cpython/2.7/path/to/file.py should work just fine.

IIRC the reason why we don't do it on 2.x is because we don't have the 
'source' directive available in Sphinx and therefore we would have to 
update all the links manually to link to h.p.o instead of s.p.o.

Best Regards,
Ezio Melotti

>
> What do we use to provide the web part of hg.p.o? maybe we can just
> ask the developers of this tool to provide (or advertize) a proper way
> to select a branch. If some can provide me some info, I can do the
> "ask the devs" part.
>
> Cheers,


From jnoller at gmail.com  Wed Aug 10 23:34:42 2011
From: jnoller at gmail.com (Jesse Noller)
Date: Wed, 10 Aug 2011 17:34:42 -0400
Subject: [Python-Dev] Moving forward with the concurrent package
In-Reply-To: <CAD+XWwo2F5L=4xNC0dTd5fx=isQXn3ngho59=XOcVFrB5uUJcg@mail.gmail.com>
References: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>
	<CAPZV6o8DBO1vuYnTR=wmcX9LNYm70RBRi0LWX4uMo9GxLXm3TQ@mail.gmail.com>
	<20110810223643.5fadea2d@msiwind>
	<CAD+XWwo2F5L=4xNC0dTd5fx=isQXn3ngho59=XOcVFrB5uUJcg@mail.gmail.com>
Message-ID: <CACQrdO=qfwAg8PMFmnQnujvoKFiWTMDM4e0vzi52=G1wqjn2EQ@mail.gmail.com>

On Wed, Aug 10, 2011 at 4:45 PM, Brian Curtin <brian.curtin at gmail.com> wrote:
> On Wed, Aug 10, 2011 at 15:36, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>
>> On Wed, 10 Aug 2011 14:54:33 -0500,
>> Benjamin Peterson <benjamin at python.org> wrote:
>> > 2011/8/10 Brian Curtin <brian.curtin at gmail.com>:
>> > > Now that we have concurrent.futures, is there any plan for
>> > > multiprocessing to follow suit? PEP 3148 mentions a hope to add or
>> > > move
>> > > things in the future
>> >
>> > Is there some sort of concrete proposal? The PEP just seems to mention
>> > it as an idea.
>> >
>> > In general, -1. I think we don't need to be moving things around more
>> > to little advantage.
>>
>> Agreed. Also, flat is better than nested. Whoever wants to populate the
>> concurrent package should work on new features to be added to it, rather
>> than plans to rename things around.
>
> I agree with flat being better than nested and won't be pushing to move
> things around, but the creation of the concurrent package seemed like a
> place to put those things. I just found myself typing
> "concurrent.multiprocessing" a minute ago, so I figured I'd put it out
> there.

I would like to move certain *features* of multiprocessing into that
namespace - some things like map and others don't belong in the
multiprocessing namespace, and should have been put into concurrent.*
a long time ago.

As for my plans: I had intended on making multiprocessing a closer
corollary to threading, and moving the bigger features that should
have been broken out into a different package (such as
http://bugs.python.org/issue12708) and the managers.

Those plans are obviously stalled as my time is being spent elsewhere.
I disagree on the "flat is better than nested" point -
multiprocessing's namespace is flat - but bloated, and many of its
features could work just as well in a threaded context (e.g., they are
generally useful outside of multiprocessing alone).

Regardless; currently I can't lead this, and multiprocessing-sig is silent.

Jesse

From benjamin at python.org  Wed Aug 10 23:37:16 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 10 Aug 2011 16:37:16 -0500
Subject: [Python-Dev] Moving forward with the concurrent package
In-Reply-To: <A90DDE78-EB52-42CA-8964-B6ECCEA4ED05@gmail.com>
References: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>
	<CAPZV6o8DBO1vuYnTR=wmcX9LNYm70RBRi0LWX4uMo9GxLXm3TQ@mail.gmail.com>
	<20110810223643.5fadea2d@msiwind>
	<A90DDE78-EB52-42CA-8964-B6ECCEA4ED05@gmail.com>
Message-ID: <CAPZV6o9kuKjOGD8a68doyrCsF0gey-MkYsKs98pcJVeo6wiPMw@mail.gmail.com>

2011/8/10 Raymond Hettinger <raymond.hettinger at gmail.com>:
>
> On Aug 10, 2011, at 1:36 PM, Antoine Pitrou wrote:
>
>> On Wed, 10 Aug 2011 14:54:33 -0500,
>> Benjamin Peterson <benjamin at python.org> wrote:
>>> 2011/8/10 Brian Curtin <brian.curtin at gmail.com>:
>>>> Now that we have concurrent.futures, is there any plan for
>>>> multiprocessing to follow suit? PEP 3148 mentions a hope to add or move
>>>> things in the future
>>>
>>> Is there some sort of concrete proposal? The PEP just seems to mention
>>> it as an idea.
>>>
>>> In general, -1. I think we don't need to be moving things around more
>>> to little advantage.
>>
>> Agreed. Also, flat is better than nested. Whoever wants to populate the
>> concurrent package should work on new features to be added to it, rather
>> than plans to rename things around.
>
> I concur.

So we could put yourself, Antoine, and me in the concurrent package. :)

Sorry,
Benjamin

From ncoghlan at gmail.com  Thu Aug 11 01:03:35 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 11 Aug 2011 09:03:35 +1000
Subject: [Python-Dev] Moving forward with the concurrent package
In-Reply-To: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>
References: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>
Message-ID: <CADiSq7fceLr6nShMhH-mx6C9aKo5mPJfm2Rwje5iOfToLnS_oQ@mail.gmail.com>

On Thu, Aug 11, 2011 at 4:55 AM, Brian Curtin <brian.curtin at gmail.com> wrote:
> Now that we have concurrent.futures, is there any plan for multiprocessing
> to follow suit? PEP 3148 mentions a hope to add or move things in the future
> [0], which would be now.

As Jesse said, moving multiprocessing or threading wholesale was never
part of the plan. The main motivator of that comment in PEP 3148 was
the idea of creating 'concurrent.pool', which would provide a
concurrent worker pool API modelled on multiprocessing.Pool that
supported either threads or processes as the back end, just like the
executor model in concurrent.futures.
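
(A purely hypothetical sketch of what such a common entry point could
look like -- no concurrent.pool module exists, and every name below is
invented; the two back ends are just multiprocessing's existing pools:)

    import multiprocessing
    import multiprocessing.dummy

    def Pool(workers=None, use_threads=False):
        # One signature, two back ends: multiprocessing.dummy provides
        # a thread-backed pool with the multiprocessing.Pool API.
        if use_threads:
            return multiprocessing.dummy.Pool(workers)
        return multiprocessing.Pool(workers)

    # pool = Pool(4, use_threads=True)
    # results = pool.map(some_function, some_iterable)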

The basic approach is to look at a feature in threading or
multiprocessing that is only available in one of them and ask the
question: Does it make sense to allow a project to switch easily
between a threading strategy and a multiprocessing strategy when using
this feature?

If the answer to that question is yes (as it was for
concurrent.futures itself, and as I believe it to be for
multiprocessing.Pool), then a feature request (and probably a PEP)
proposing the definition of a common API in the concurrent namespace
would be appropriate.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From jnoller at gmail.com  Thu Aug 11 01:06:36 2011
From: jnoller at gmail.com (Jesse Noller)
Date: Wed, 10 Aug 2011 19:06:36 -0400
Subject: [Python-Dev] Moving forward with the concurrent package
In-Reply-To: <CADiSq7fceLr6nShMhH-mx6C9aKo5mPJfm2Rwje5iOfToLnS_oQ@mail.gmail.com>
References: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>
	<CADiSq7fceLr6nShMhH-mx6C9aKo5mPJfm2Rwje5iOfToLnS_oQ@mail.gmail.com>
Message-ID: <CACQrdOkGPxjgxVAMUCe-_p=iAaijGYBJ1fBJXxo7_qoDfaheEQ@mail.gmail.com>

On Wed, Aug 10, 2011 at 7:03 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Thu, Aug 11, 2011 at 4:55 AM, Brian Curtin <brian.curtin at gmail.com> wrote:
>> Now that we have concurrent.futures, is there any plan for multiprocessing
>> to follow suit? PEP 3148 mentions a hope to add or move things in the future
>> [0], which would be now.
>
> As Jesse said, moving multiprocessing or threading wholesale was never
> part of the plan. The main motivator of that comment in PEP 3148 was
> the idea of creating 'concurrent.pool', which would provide a
> concurrent worker pool API modelled on multiprocessing.Pool that
> supported either threads or processes as the back end, just like the
> executor model in concurrent.futures.
>
> The basic approach is to look at a feature in threading or
> multiprocessing that is only available in one of them and ask the
> question: Does it make sense to allow a project to switch easily
> between a threading strategy and a multiprocessing strategy when using
> this feature?
>
> If the answer to that question is yes (as it was for
> concurrent.futures itself, and as I believe it to be for
> multiprocessing.Pool), then a feature request (and probably a PEP)
> proposing the definition of a common API in the concurrent namespace
> would be appropriate.
>

Precisely. Thank you Nick, want a job working for PyCon? ;)

From senthil at uthcode.com  Thu Aug 11 02:13:49 2011
From: senthil at uthcode.com (Senthil Kumaran)
Date: Thu, 11 Aug 2011 08:13:49 +0800
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Fix closes
 Issue12722 - link heapq source in the text format in the
In-Reply-To: <4E42F0F0.1070203@gmail.com>
References: <E1Qr9Fh-0002Rv-2b@dinsdale.python.org> <4E42E24B.8020601@udel.edu>
	<CAPdtAj2YcGSOyDK1NVWpYJVT=qHNV4Tw50ifKnh8W7njGAn=sg@mail.gmail.com>
	<4E42F0F0.1070203@gmail.com>
Message-ID: <20110811001349.GA2146@mathmagic>

On Wed, Aug 10, 2011 at 11:58:24PM +0300, Ezio Melotti wrote:
> 
> hg.python.org/cpython/2.7/path/to/file.py should work just fine.

The correct path seems to be:

http://hg.python.org/cpython/file/2.7/Lib/<modulefile.py>
> 
> IIRC the reason why we don't do it on 2.x is because we don't have
> the 'source' directive available in Sphinx and therefore we would
> have to update all the links manually to link to h.p.o instead of
> s.p.o.

I see.  Does Sphinx have any such directive already? How is it
supposed to behave?

-- 
Senthil

From raymond.hettinger at gmail.com  Thu Aug 11 02:20:35 2011
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Wed, 10 Aug 2011 17:20:35 -0700
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Fix closes
	Issue12722 - link heapq source in the text format in the
In-Reply-To: <4E42F0F0.1070203@gmail.com>
References: <E1Qr9Fh-0002Rv-2b@dinsdale.python.org> <4E42E24B.8020601@udel.edu>
	<CAPdtAj2YcGSOyDK1NVWpYJVT=qHNV4Tw50ifKnh8W7njGAn=sg@mail.gmail.com>
	<4E42F0F0.1070203@gmail.com>
Message-ID: <14107F2C-DC29-44FC-8DC5-45C6AEB2A598@gmail.com>


On Aug 10, 2011, at 1:58 PM, Ezio Melotti wrote:

> we would have to update all the links manually to link to h.p.o instead of s.p.o.

sed is your friend.


Raymond

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110810/9574659a/attachment.html>

From g.brandl at gmx.net  Thu Aug 11 07:26:14 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 11 Aug 2011 07:26:14 +0200
Subject: [Python-Dev] cpython: News item for #12724
In-Reply-To: <E1QrKAR-0003ja-JK@dinsdale.python.org>
References: <E1QrKAR-0003ja-JK@dinsdale.python.org>
Message-ID: <j1vp3s$6ui$1@dough.gmane.org>

Am 11.08.2011 03:34, schrieb brian.curtin:
> http://hg.python.org/cpython/rev/3a6782f2a4a8
> changeset:   71811:3a6782f2a4a8
> user:        Brian Curtin <brian at python.org>
> date:        Wed Aug 10 20:32:10 2011 -0500
> summary:
>   News item for #12724
> 
> files:
>   Misc/NEWS |  2 ++
>   1 files changed, 2 insertions(+), 0 deletions(-)

If it gets a NEWS entry, why isn't it documented?

Georg


From solipsis at pitrou.net  Thu Aug 11 09:02:42 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 11 Aug 2011 09:02:42 +0200
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
Message-ID: <20110811090242.1083782f@msiwind>

On Thu, 11 Aug 2011 03:34:37 +0200,
brian.curtin <python-checkins at python.org> wrote:
> http://hg.python.org/cpython/rev/77a65b078852
> changeset:   71809:77a65b078852
> parent:      71803:1b4fae183da3
> user:        Brian Curtin <brian at python.org>
> date:        Wed Aug 10 20:05:21 2011 -0500
> summary:
>   Add Py_RETURN_NOTIMPLEMENTED macro. Fixes #12724.


It would seem more useful to have a generic Py_RETURN() macro rather than
specific forms for each and every common object.

Regards

Antoine.



From solipsis at pitrou.net  Thu Aug 11 09:07:47 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 11 Aug 2011 09:07:47 +0200
Subject: [Python-Dev] Moving forward with the concurrent package
References: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>
	<CADiSq7fceLr6nShMhH-mx6C9aKo5mPJfm2Rwje5iOfToLnS_oQ@mail.gmail.com>
Message-ID: <20110811090747.10643b8d@msiwind>

On Thu, 11 Aug 2011 09:03:35 +1000,
Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Thu, Aug 11, 2011 at 4:55 AM, Brian Curtin <brian.curtin at gmail.com>
> wrote:
> > Now that we have concurrent.futures, is there any plan for
> > multiprocessing to follow suit? PEP 3148 mentions a hope to add or move
> > things in the future [0], which would be now.
> 
> As Jesse said, moving multiprocessing or threading wholesale was never
> part of the plan. The main motivator of that comment in PEP 3148 was
> the idea of creating 'concurrent.pool', which would provide a
> concurrent worker pool API modelled on multiprocessing.Pool that
> supported either threads or processes as the back end, just like the
> executor model in concurrent.futures.

Executors *are* pools, so I don't know what you're talking about.

Besides, multiprocessing.Pool is quite bloated and therefore difficult to
improve. It should be slowly phased out in favour of concurrent.futures.

In general, it would be nice if people wanting to improve the concurrent
primitives made actual, concrete proposals. We've had lots of
hand-waving in that area for years, to no effect.

Regards

Antoine.



From ncoghlan at gmail.com  Thu Aug 11 09:56:59 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 11 Aug 2011 17:56:59 +1000
Subject: [Python-Dev] Moving forward with the concurrent package
In-Reply-To: <20110811090747.10643b8d@msiwind>
References: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>
	<CADiSq7fceLr6nShMhH-mx6C9aKo5mPJfm2Rwje5iOfToLnS_oQ@mail.gmail.com>
	<20110811090747.10643b8d@msiwind>
Message-ID: <CADiSq7eG89X1zSk7BkwBygwfU6n-6p4rxgWGdNz7oWi_XKWxyg@mail.gmail.com>

On Thu, Aug 11, 2011 at 5:07 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Thu, 11 Aug 2011 09:03:35 +1000,
> Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On Thu, Aug 11, 2011 at 4:55 AM, Brian Curtin <brian.curtin at gmail.com>
>> wrote:
>> > Now that we have concurrent.futures, is there any plan for
>> > multiprocessing to follow suit? PEP 3148 mentions a hope to add or move
>> > things in the future [0], which would be now.
>>
>> As Jesse said, moving multiprocessing or threading wholesale was never
>> part of the plan. The main motivator of that comment in PEP 3148 was
>> the idea of creating 'concurrent.pool', which would provide a
>> concurrent worker pool API modelled on multiprocessing.Pool that
>> supported either threads or processes as the back end, just like the
>> executor model in concurrent.futures.
>
> Executors *are* pools, so I don't know what you're talking about.

Yes, that's the point. A developer shouldn't be forced into using a
particular invocation model (i.e. futures) just to get thread or
process pool functionality - the pool should be a lower layer building
block that's provided separately.

As you say, though, nobody has stepped up for the task of actually
defining that common, lower level interface.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From yoavglazner at gmail.com  Thu Aug 11 11:58:06 2011
From: yoavglazner at gmail.com (yoav glazner)
Date: Thu, 11 Aug 2011 12:58:06 +0300
Subject: [Python-Dev] Moving forward with the concurrent package
In-Reply-To: <CADiSq7eG89X1zSk7BkwBygwfU6n-6p4rxgWGdNz7oWi_XKWxyg@mail.gmail.com>
References: <CAD+XWwrnMmafcOaNSiGvdAXf+8CpQXPVez4Rs57F-qX_K45bMQ@mail.gmail.com>
	<CADiSq7fceLr6nShMhH-mx6C9aKo5mPJfm2Rwje5iOfToLnS_oQ@mail.gmail.com>
	<20110811090747.10643b8d@msiwind>
	<CADiSq7eG89X1zSk7BkwBygwfU6n-6p4rxgWGdNz7oWi_XKWxyg@mail.gmail.com>
Message-ID: <CAJ78kjMWK9ZdqQ1K6U3La_tH6tCiBtC25onQMmLf0cqJN=dovA@mail.gmail.com>

On Thu, Aug 11, 2011 at 10:56 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Thu, Aug 11, 2011 at 5:07 PM, Antoine Pitrou <solipsis at pitrou.net>
> wrote:
> > On Thu, 11 Aug 2011 09:03:35 +1000,
> > Nick Coghlan <ncoghlan at gmail.com> wrote:
> >> On Thu, Aug 11, 2011 at 4:55 AM, Brian Curtin <brian.curtin at gmail.com>
> >> wrote:
> >> > Now that we have concurrent.futures, is there any plan for
> >> > multiprocessing to follow suit? PEP 3148 mentions a hope to add or
> move
> >> > things in the future [0], which would be now.
> >>
> >> As Jesse said, moving multiprocessing or threading wholesale was never
> >> part of the plan. The main motivator of that comment in PEP 3148 was
> >> the idea of creating 'concurrent.pool', which would provide a
> >> concurrent worker pool API modelled on multiprocessing.Pool that
> >> supported either threads or processes as the back end, just like the
> >> executor model in concurrent.futures.
> >
> > Executors *are* pools, so I don't know what you're talking about.
>

Also, the Pool from multiprocessing "works" for both threads and processes:

from multiprocessing.pool import Pool as ProcessPool
from multiprocessing.dummy import Pool as ThreadPool
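
(A small usage sketch: the two classes share the map/close/join API, so
switching back ends is a one-line change. work is a stand-in function;
it must be picklable, i.e. defined at module level, for the
process-backed pool:)

    from multiprocessing.pool import Pool as ProcessPool
    from multiprocessing.dummy import Pool as ThreadPool

    def work(n):
        return n * n

    if __name__ == '__main__':
        pool = ThreadPool(4)  # or ProcessPool(4): same API
        try:
            print(pool.map(work, range(8)))
        finally:
            pool.close()
            pool.join()
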
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110811/1acb854b/attachment.html>

From merwok at netwok.org  Thu Aug 11 16:33:51 2011
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Thu, 11 Aug 2011 16:33:51 +0200
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Fix closes
 Issue12722 - link heapq source in the text format in the
In-Reply-To: <4E42F0F0.1070203@gmail.com>
References: <E1Qr9Fh-0002Rv-2b@dinsdale.python.org>
	<4E42E24B.8020601@udel.edu>	<CAPdtAj2YcGSOyDK1NVWpYJVT=qHNV4Tw50ifKnh8W7njGAn=sg@mail.gmail.com>
	<4E42F0F0.1070203@gmail.com>
Message-ID: <4E43E84F.3050309@netwok.org>

Hi,

> IIRC the reason why we don't do it on 2.x is because we don't have the 
> 'source' directive available in Sphinx and therefore we would have to 
> update all the links manually to link to h.p.o instead of s.p.o.

In 3.2 and higher, there is a custom source role in
Doc/tools/sphinxext/pyspecific.py.  For 2.7, I volunteered to change all
links manually (sed being, as usual, my friend) but just lacked time.

Cheers

From merwok at netwok.org  Thu Aug 11 16:36:00 2011
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Thu, 11 Aug 2011 16:36:00 +0200
Subject: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the
 /usr/bin/python2 symlink upstream)
In-Reply-To: <CANaWP3zBo8cNWNHN=jxx_m3tUBk3k+vn+LYgqB+yimdTrzVxwA@mail.gmail.com>
References: <CANaWP3zBo8cNWNHN=jxx_m3tUBk3k+vn+LYgqB+yimdTrzVxwA@mail.gmail.com>
Message-ID: <4E43E8D0.40201@netwok.org>

Hi,

I've read the latest version of this PEP, as updated by Nick Coghlan in
the Mercurial repo on July 20th.  Excuse me if I repeat old arguments;
I did not reread all the threads.

In summary, I don't think the PEP is useful right now, nor that it will
set a good practice for the future.

> * Unix-like software distributions (including systems like Mac OS X and
Minor: I call these "operating systems".

> * The Python 2.x ``idle``, ``pydoc``, and ``python-config`` commands should
>   likewise be available as ``idle2``, ``pydoc2``, and ``python2-config``,
>   with the original commands invoking these versions by default, but possibly
>   invoking the Python 3.x versions instead if configured to do so by the
>   system administrator.
This item ignores that on some OSes, defining the default Python version
is not a decision made by the sysadmin.  The example I know is Debian
(and derivatives): despite what one can read on the Web, it is not a
good idea to change /usr/bin/python to point to the version you want;
the decision affects all scripts used by the system itself, and is thus
the call of the Debian Python maintainers.  (FTR, Debian developers
discussed adding /usr/bin/python2 at the latest DebConf and rejected it;
I don't know if the arguments raised are the same as mine, but maybe
Piotr or someone else will chime in in this thread.)

> This is needed as, even though the majority of distributions still alias the
> ``python`` command to Python 2, some now alias it to Python 3. Some of
> the former also do not provide a ``python2`` command; hence, there is
> currently no way for Python 2 code (or any code that invokes the Python 2
> interpreter directly rather than via ``sys.executable``) to reliably run on
> all Unix-like systems without modification, as the ``python`` command will
> invoke the wrong interpreter version on some systems, and the ``python2``
> command will fail completely on others. The recommendations in this PEP
> provide a very simple mechanism to restore cross-platform support, with
> minimal additional work required on the part of distribution maintainers.
I would like more data about this.  How many OSes have moved their
python executable to python2?  How many people does that impact?  Right
now I think that there's only Arch and Gentoo, which I would call
minority platforms.  (I'm aware that all UNIX-like free operating
systems could be considered a minority OS altogether, but we're
talking about UNIX-like OSes here :)  Doing what the majority does is
not always a good thing, but for this PEP I think that numbers can help
us assess whether the trouble/benefit ratio is worth it.

In my opinion, the current situation is clear: python is some python2.y,
python3 is a python3.y; this is not ambiguous and will still work in ten
years when we get python4.  Thus, the previous decision of python-dev to
use python3 indefinitely seems good to me.  As a script/program author,
if I use python2 in my shebangs now to support what appear to be
minority platforms, I'm breaking compatibility with a huge number of
systems.  Therefore, I don't see how this PEP makes the situation
better.  If one OS wants to change the meaning of the python command,
then its packaging tools should adapt shebangs, and its users should be
aware that the majority of existing Python scripts will break.
In short, I'm strongly -1 on this PEP: changing the meaning of python
brings much trouble for little or no benefit, and adding python2 adds
yet another compatibility problem.
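
(To make the shebang question concrete, here is what a Python-2-only
script would look like under the PEP; on a system without a python2
name this fails outright, while the traditional form silently gets
whichever interpreter the OS aliased to python:)

    #!/usr/bin/env python2
    # -*- coding: utf-8 -*-
    # A Python-2-only script.  The traditional alternative shebang,
    #     #!/usr/bin/env python
    # picks up whichever interpreter the OS points "python" at.
    import sys
    print sys.version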

It would be interesting to have feedback from people who lived through
the transition to Python 2.

> * The ``pythonX.X`` (e.g. ``python2.6``) commands exist on some systems, on
>   which they invoke specific minor versions of the Python interpreter. It
>   can be useful for distribution-specific packages to take advantage of these
>   utilities if they exist, since it will prevent code breakage if the default
>   minor version of a given major version is changed. However, scripts
>   intending to be cross-platform should not rely on the presence of these
>   utilities, but rather should be tested on several recent minor versions of
>   the target major version, compensating, if necessary, for the small
>   differences that exist between minor versions. This prevents the need for
>   sysadmins to install many very similar versions of the interpreter.
Here again I would be interested in more numbers.  Pythons that people
manually download and install using the provided makefile do have these
pythonx.y executables, so I thought that all OSes did likewise.

Moreover, I disagree with the implied assertion that the minor number
hardly matters (I'm paraphrasing): Python 2.6 and 2.7 *are* different,
not "very similar".  I don't know the community's usage patterns very
well, but in my experience moving from 2.x to 2.x+1, or even just
checking that your code still works, is a Big Deal.  I'd like this
whole bullet item to be removed.

> Impact on PYTHON* Environment Variables
I think this section should be named PYTHONPATH, as it is the only
envvar that it talks about.  Another minor edit: s/folder/directory/

Regards

From merwok at netwok.org  Thu Aug 11 16:39:34 2011
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Thu, 11 Aug 2011 16:39:34 +0200
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
Message-ID: <4E43E9A6.7020608@netwok.org>

Hi,

I've read PEP 402 and would like to offer comments.

I know a bit about the import system, but not down to the nitty-gritty
details of PEP 302 and __path__ computations and all this fun stuff (by
which I mean, not fun at all).  As such, I can't find nasty issues in
dark corners, but I can offer feedback as a user.  I think it's a very
well-written explanation of a very useful feature: +1 from me.  If it is
accepted, the docs will certainly be much more concise, but the PEP as a
thought process is a useful document to read.

> When new users come to Python from other languages, they are often
> confused by Python's packaging semantics.
Minor: I would reserve "packaging" for
packaging/distribution/installation/deployment matters, not Python
modules.  I suggest "Python package semantics".

> On the negative side, however, it is non-intuitive for beginners, and
> requires a more complex step to turn a module into a package.  If
> ``Foo`` begins its life as ``Foo.py``, then it must be moved and
> renamed to ``Foo/__init__.py``.
Minor: In the UNIX world, or with version control tools, moving and
renaming are one and the same thing (hg mv spam.py spam/__init__.py for
example).  Also, if you turn a module into a package, you may want to
move code around, change imports, etc., so I'm not sure the renaming
part is such a big step.  Anyway, if the import-sig people say that
users think it's a complex or costly operation, I can believe it.

> (By the way, both of these additions to the import protocol (i.e. the
> dynamically-added ``__path__``, and dynamically-created modules)
> apply recursively to child packages, using the parent package's
> ``__path__`` in place of ``sys.path`` as a basis for generating a
> child ``__path__``.  This means that self-contained and virtual
> packages can contain each other without limitation, with the caveat
> that if you put a virtual package inside a self-contained one, it's
> gonna have a really short ``__path__``!)
I don't understand the caveat or its implications.

> In other words, we don't allow pure virtual packages to be imported
> directly, only modules and self-contained packages.  (This is an
> acceptable limitation, because there is no *functional* value to
> importing such a package by itself.  After all, the module object
> will have no *contents* until you import at least one of its
> subpackages or submodules!)
> 
> Once ``zc.buildout`` has been successfully imported, though, there
> *will* be a ``zc`` module in ``sys.modules``, and trying to import it
> will of course succeed.  We are only preventing an *initial* import
> from succeeding, in order to prevent false-positive import successes
> when clashing subdirectories are present on ``sys.path``.
I find that limitation acceptable.  After all, there is no zc project,
and no zc module, just a zc namespace.  I'll just regret that it's not
possible to provide a module docstring to inform that this is a
namespace package used for X and Y.

> The resulting list (whether empty or not) is then stored in a
> ``sys.virtual_package_paths`` dictionary, keyed by module name.
This was probably said on import-sig, but here I go: yet another import
artifact in the sys module!  I hope we get ImportEngine in 3.3 to clean
up all this.

> * A new ``extend_virtual_paths(path_entry)`` function, to extend
>   existing, already-imported virtual packages' ``__path__`` attributes
>   to include any portions found in a new ``sys.path`` entry.  This
>   function should be called by applications extending ``sys.path``
>   at runtime, e.g. when adding a plugin directory or an egg to the
>   path.
Let's imagine my application Spam has a namespace spam.ext for plugins.
 To use a custom directory where plugins are stored, or a zip file with
plugins (I don't use eggs, so let me talk about zip files here), I'd
have to call sys.path.append *and* pkgutil.extend_virtual_paths?
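
If I read the proposal correctly, the calling pattern would be something
like this (just a sketch on my part: the plugin directory and the
spam.ext plugin module are made-up names, and extend_virtual_paths is
the function proposed by this PEP, not an existing pkgutil API):

    import sys
    import pkgutil

    plugin_dir = '/opt/spam/plugins'
    sys.path.append(plugin_dir)               # make the entry importable
    pkgutil.extend_virtual_paths(plugin_dir)  # update __path__ of already-
                                              # imported virtual packages
    import spam.ext.some_plugin

That second call seems easy to forget.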

> * ``ImpImporter.iter_modules()`` should be changed to also detect and
>   yield the names of modules found in virtual packages.
Is there any value in providing an argument to get the pre-PEP behavior?
 Or to look at it from a different place, how can Python code know that
some module is a virtual or pure virtual package, if that is even a
useful thing to know?

> Last, but not least, the ``imp`` module (or ``importlib``, if
> appropriate) should expose the algorithm described in the `virtual
> paths`_ section above, as a
> ``get_virtual_path(modulename, parent_path=None)`` function, so that
> creators of ``__import__`` replacements can use it.
If I'm not mistaken, the rule of thumb these days is that imp is edited
when it's absolutely necessary, otherwise code goes into importlib (more
easily written, read and maintained).

I wonder if importlib.import_module could implement the new import
semantics all by itself, so that we can benefit from this PEP in older
Pythons (importlib is on PyPI).

> * If you are changing a currently self-contained package into a
>   virtual one, it's important to note that you can no longer use its
>   ``__file__`` attribute to locate data files stored in a package
>   directory.  Instead, you must search ``__path__`` or use the
>   ``__file__`` of a submodule adjacent to the desired files, or
>   of a self-contained subpackage that contains the desired files.
Wouldn't pkgutil.get_data help here?
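
At least for self-contained packages it already hides __file__ from the
caller (a minimal sketch; the package and resource names are made up):

    import pkgutil

    # Returns the resource's bytes however the package is stored
    # (plain directory, zip file, ...), without touching spam.__file__.
    data = pkgutil.get_data('spam', 'templates/index.html')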

Besides, putting data files in a Python package is viewed very poorly by
some (mostly people following the Filesystem Hierarchy Standard), and in
distutils2/packaging, we (will) have a resources system that's as
convenient for users and more flexible for OS packagers.  Using __file__
for more than information on the module is frowned upon for other
reasons anyway (I talked with a Debian developer about this one day but
forgot the details), so I think the limitation is okay.

> * XXX what is the __file__ of a "pure virtual" package?  ``None``?
>   Some arbitrary string?  The path of the first directory with a
>   trailing separator?  No matter what we put, *some* code is
>   going to break, but the last choice might allow some code to
>   accidentally work.  Is that good or bad?
A pure virtual package having no source file, I think it should have no
__file__ at all.  I don't know if that would break more code than using
an empty string for example, but it feels righter.

> For those implementing PEP \302 importer objects:
Minor: Here I think a link would not be a nuisance (IOW remove the
backslash).

Regards

From brian.curtin at gmail.com  Thu Aug 11 16:43:44 2011
From: brian.curtin at gmail.com (Brian Curtin)
Date: Thu, 11 Aug 2011 09:43:44 -0500
Subject: [Python-Dev] cpython: News item for #12724
In-Reply-To: <j1vp3s$6ui$1@dough.gmane.org>
References: <E1QrKAR-0003ja-JK@dinsdale.python.org>
	<j1vp3s$6ui$1@dough.gmane.org>
Message-ID: <CAD+XWwqkZ-nv1BaAmtr1H-qTLe-e0taD9eVZS78w3j=d1vCA9Q@mail.gmail.com>

On Thu, Aug 11, 2011 at 00:26, Georg Brandl <g.brandl at gmx.net> wrote:

> On 11.08.2011 03:34, brian.curtin wrote:
> > http://hg.python.org/cpython/rev/3a6782f2a4a8
> > changeset:   71811:3a6782f2a4a8
> > user:        Brian Curtin <brian at python.org>
> > date:        Wed Aug 10 20:32:10 2011 -0500
> > summary:
> >   News item for #12724
> >
> > files:
> >   Misc/NEWS |  2 ++
> >   1 files changed, 2 insertions(+), 0 deletions(-)
>
> If it gets a NEWS entry, why isn't it documented?


I left it out just to see if you were paying attention :)

Now that I got caught, added in
http://hg.python.org/cpython/rev/e88362fb4950

From sandro.tosi at gmail.com  Thu Aug 11 16:47:17 2011
From: sandro.tosi at gmail.com (Sandro Tosi)
Date: Thu, 11 Aug 2011 16:47:17 +0200
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Fix closes
 Issue12722 - link heapq source in the text format in the
In-Reply-To: <4E43E84F.3050309@netwok.org>
References: <E1Qr9Fh-0002Rv-2b@dinsdale.python.org> <4E42E24B.8020601@udel.edu>
	<CAPdtAj2YcGSOyDK1NVWpYJVT=qHNV4Tw50ifKnh8W7njGAn=sg@mail.gmail.com>
	<4E42F0F0.1070203@gmail.com> <4E43E84F.3050309@netwok.org>
Message-ID: <CAPdtAj1_A1oZ2o4m7WxvtOGNnrSvoN2UUq6Vb1Qee-brWbjftg@mail.gmail.com>

On Thu, Aug 11, 2011 at 16:33, Éric Araujo <merwok at netwok.org> wrote:
> Hi,
>
>> IIRC the reason why we don't do it on 2.x is because we don't have the
>> 'source' directive available in Sphinx and therefore we would have to
>> update all the links manually to link to h.p.o instead of s.p.o.
>
> In 3.2 and higher, there is a custom source role in
> Doc/tools/sphinxext/pyspecific.py.  For 2.7, I volunteered to change all
> links manually (sed being, as usual, my friend) but just lacked time.

Is there a reason we can't use the same sphinx role in 2.7 too? And
also the same sphinx (thus sphinxext) versions on 2.7 and 3.x? That
would probably help in keeping the diffs on the documentation smaller.

Regards,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From wok at no-log.org  Thu Aug 11 17:01:21 2011
From: wok at no-log.org (Merwok)
Date: Thu, 11 Aug 2011 17:01:21 +0200
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Fix closes
 Issue12722 - link heapq source in the text format in the
In-Reply-To: <CAPdtAj1_A1oZ2o4m7WxvtOGNnrSvoN2UUq6Vb1Qee-brWbjftg@mail.gmail.com>
References: <E1Qr9Fh-0002Rv-2b@dinsdale.python.org>
	<4E42E24B.8020601@udel.edu>	<CAPdtAj2YcGSOyDK1NVWpYJVT=qHNV4Tw50ifKnh8W7njGAn=sg@mail.gmail.com>	<4E42F0F0.1070203@gmail.com>
	<4E43E84F.3050309@netwok.org>
	<CAPdtAj1_A1oZ2o4m7WxvtOGNnrSvoN2UUq6Vb1Qee-brWbjftg@mail.gmail.com>
Message-ID: <4E43EEC1.9060000@no-log.org>

On 11/08/2011 16:47, Sandro Tosi wrote:
> Is there a reason we can't use the same sphinx role in 2.7 too? And
> also the same sphinx (thus sphinxext) versions on 2.7 and 3.x? That
> would probably help in keeping the diffs on the documentation smaller.

Even though the pyspecific module is wholly private and used only for
our build process, Georg seems to follow the rule that we don't add new
features in stable branches.  I think that's why the new role was added
in 3.2 while it was in the dev phase but not to 2.7 (see #10334).  We also
use different versions of Sphinx.

Regards

From rdmurray at bitdance.com  Thu Aug 11 17:00:21 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Thu, 11 Aug 2011 11:00:21 -0400
Subject: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the
	/usr/bin/python2 symlink upstream)
In-Reply-To: <4E43E8D0.40201@netwok.org>
References: <CANaWP3zBo8cNWNHN=jxx_m3tUBk3k+vn+LYgqB+yimdTrzVxwA@mail.gmail.com>
	<4E43E8D0.40201@netwok.org>
Message-ID: <20110811150022.02A192505A7@webabinitio.net>

I think you missed the point of the PEP.  The point is to create a new,
python-dev-blessed standard that the distros will follow.  The primary
goal is so that a script can specify python2 or python3 in the #!
line and expect that to work on all compliant linux systems, which we
hope will be all of them.  Everything else is just details.  And yes,
that distinction is much more important than the distinction between
minor version numbers.  That's the whole point of python3, after all.
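
For example, a script that explicitly targets Python 2 would just start
with (my illustration, not text from the PEP):

    #!/usr/bin/env python2
    # Works on any compliant system, regardless of whether the bare
    # 'python' name there points at 2.x or 3.x.
    print "hello from a python2 script"

and likewise python3 for Python 3 code.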

--
R. David Murray           http://www.bitdance.com

From merwok at netwok.org  Thu Aug 11 17:12:22 2011
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Thu, 11 Aug 2011 17:12:22 +0200
Subject: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the
 /usr/bin/python2 symlink upstream)
In-Reply-To: <20110811150022.02A192505A7@webabinitio.net>
References: <CANaWP3zBo8cNWNHN=jxx_m3tUBk3k+vn+LYgqB+yimdTrzVxwA@mail.gmail.com>
	<4E43E8D0.40201@netwok.org>
	<20110811150022.02A192505A7@webabinitio.net>
Message-ID: <4E43F156.8040008@netwok.org>

Hi David,

> I think you missed the point of the PEP.  The point is to create a new,
> python-dev-blessed standard that the distros will follow.  The primary
> goal is so that a script can specify python2 or python3 in the #!
> line and expect that to work on all compliant linux systems, which we
> hope will be all of them.  Everything else is just details.

I'm sorry if my opinion on that main point was lost among remarks on
details.  To rephrase one part of my reply: Right now, the de facto
standard is that shebangs can use python to mean python2 and python3 to
mean python3.  Adding python2 to that and supporting making python
ambiguous seems harmful to me.

Regards

From barry at python.org  Thu Aug 11 17:39:52 2011
From: barry at python.org (Barry Warsaw)
Date: Thu, 11 Aug 2011 11:39:52 -0400
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
In-Reply-To: <4E43E9A6.7020608@netwok.org>
References: <4E43E9A6.7020608@netwok.org>
Message-ID: <20110811113952.2e257351@resist.wooz.org>

On Aug 11, 2011, at 04:39 PM, Éric Araujo wrote:

>> * XXX what is the __file__ of a "pure virtual" package?  ``None``?
>>   Some arbitrary string?  The path of the first directory with a
>>   trailing separator?  No matter what we put, *some* code is
>>   going to break, but the last choice might allow some code to
>>   accidentally work.  Is that good or bad?
>A pure virtual package having no source file, I think it should have no
>__file__ at all.  I don't know if that would break more code than using
>an empty string for example, but it feels righter.

I agree that the empty string is the worst of the choices.  no __file__ or
__file__=None is better.

-Barry

From glyph at twistedmatrix.com  Thu Aug 11 20:02:59 2011
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Thu, 11 Aug 2011 14:02:59 -0400
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
In-Reply-To: <20110811113952.2e257351@resist.wooz.org>
References: <4E43E9A6.7020608@netwok.org>
	<20110811113952.2e257351@resist.wooz.org>
Message-ID: <41282FF3-4DAF-4996-B745-E0BEA477FB01@twistedmatrix.com>

On Aug 11, 2011, at 11:39 AM, Barry Warsaw wrote:

> On Aug 11, 2011, at 04:39 PM, Éric Araujo wrote:
> 
>>> * XXX what is the __file__ of a "pure virtual" package?  ``None``?
>>>  Some arbitrary string?  The path of the first directory with a
>>>  trailing separator?  No matter what we put, *some* code is
>>>  going to break, but the last choice might allow some code to
>>>  accidentally work.  Is that good or bad?
>> A pure virtual package having no source file, I think it should have no
>> __file__ at all.  I don't know if that would break more code than using
>> an empty string for example, but it feels righter.
> 
> I agree that the empty string is the worst of the choices.  no __file__ or
> __file__=None is better.

In some sense, I agree: hacks like empty strings are likely to lead to path-manipulation bugs where the wrong file gets opened (or worse, deleted, with predictable deleterious effects).  But the whole "pure virtual" mechanism here seems to pile even more inconsistency on top of an already irritatingly inconsistent import mechanism.  I was reasonably happy with my attempt to paper over PEP 302's weirdnesses from a user perspective:

http://twistedmatrix.com/documents/11.0.0/api/twisted.python.modules.html

(or https://launchpad.net/modules if you are not a Twisted user)

Users of this API can traverse the module hierarchy with certain expectations; each module or package would have .pathEntry and .filePath attributes, each of which would refer to the appropriate place.  Of course __path__ complicates things a bit, but so it goes.

Now it seems like pure virtual packages are going to introduce a new type of special case into the hierarchy which have neither .pathEntry nor .filePath objects.

Rather than a one-by-one ad-hoc consideration of which attribute should be set to None or empty strings or "<string>" or what have you, I'd really like to see a discussion in the PEP saying what a package really is vs. what a module is, and what one can reasonably expect from it from an API and tooling perspective.  Right now I have to puzzle out the intent of the final API from the problem/solution description and thought experiment.

Despite authoring several namespace packages myself, I don't have any of the problems described in the PEP.  I just want to know how to write correct tools given this new specification.  I suspect that this PEP will be the only reference for how packages work for a long time coming (just as PEP 302 was before it) so it should really get this right.

From solipsis at pitrou.net  Thu Aug 11 20:12:41 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 11 Aug 2011 20:12:41 +0200
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
References: <4E43E9A6.7020608@netwok.org>
	<20110811113952.2e257351@resist.wooz.org>
Message-ID: <20110811201241.32c4348c@pitrou.net>

On Thu, 11 Aug 2011 11:39:52 -0400
Barry Warsaw <barry at python.org> wrote:

> On Aug 11, 2011, at 04:39 PM, Éric Araujo wrote:
> 
> >> * XXX what is the __file__ of a "pure virtual" package?  ``None``?
> >>   Some arbitrary string?  The path of the first directory with a
> >>   trailing separator?  No matter what we put, *some* code is
> >>   going to break, but the last choice might allow some code to
> >>   accidentally work.  Is that good or bad?
> >A pure virtual package having no source file, I think it should have no
> >__file__ at all.  I don't know if that would break more code than using
> >an empty string for example, but it feels righter.
> 
> I agree that the empty string is the worst of the choices.  no __file__ or
> __file__=None is better.

None should be the answer. It simplifies inspection of module data
(repr(__file__) gives you something recognizable instead of raising)
and makes sense semantically (!) since there is, indeed, no actual file
backing the module.

Regards

Antoine.



From g.brandl at gmx.net  Thu Aug 11 20:22:35 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 11 Aug 2011 20:22:35 +0200
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Fix closes
 Issue12722 - link heapq source in the text format in the
In-Reply-To: <4E43EEC1.9060000@no-log.org>
References: <E1Qr9Fh-0002Rv-2b@dinsdale.python.org>
	<4E42E24B.8020601@udel.edu>	<CAPdtAj2YcGSOyDK1NVWpYJVT=qHNV4Tw50ifKnh8W7njGAn=sg@mail.gmail.com>	<4E42F0F0.1070203@gmail.com>
	<4E43E84F.3050309@netwok.org>
	<CAPdtAj1_A1oZ2o4m7WxvtOGNnrSvoN2UUq6Vb1Qee-brWbjftg@mail.gmail.com>
	<4E43EEC1.9060000@no-log.org>
Message-ID: <j216jg$e7f$1@dough.gmane.org>

On 11.08.2011 17:01, Merwok wrote:
> On 11/08/2011 16:47, Sandro Tosi wrote:
>> Is there a reason we can't use the same sphinx role in 2.7 too? And
>> also the same sphinx (thus sphinxext) versions on 2.7 and 3.x? That
>> would probably help in keeping the diffs on the documentation smaller.
> 
> Even though the pyspecific module is wholly private and used only for
> our build process, Georg seems to follow the rule that we don't add new
> features in stable branches.  I think that's why the new role was added
> in 3.2 while it was in the dev phase but not to 2.7 (see #10334).

I think I just put it in default as a test, and intended to backport it
later when it proved useful.  You're welcome to do so now.

>  We also use different versions of Sphinx.

That doesn't matter for this role.

Georg



From pje at telecommunity.com  Thu Aug 11 20:30:51 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Thu, 11 Aug 2011 14:30:51 -0400
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
In-Reply-To: <4E43E9A6.7020608@netwok.org>
References: <4E43E9A6.7020608@netwok.org>
Message-ID: <20110811183114.701DF3A406B@sparrow.telecommunity.com>

At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote:
>Hi,
>
>I've read PEP 402 and would like to offer comments.

Thanks.

>Minor: I would reserve "packaging" for
>packaging/distribution/installation/deployment matters, not Python
>modules.  I suggest "Python package semantics".

Changing to "Python package import semantics" to hopefully be even 
clearer.  ;-)

(Nitpick: I was somewhat intentionally ambiguous because we are 
talking here about how a package is physically implemented in the 
filesystem, and that actually *is* kind of a packaging issue.  But 
it's not necessarily a *useful* intentional ambiguity, so I've no 
problem with removing it.)


>Minor: In the UNIX world, or with version control tools, moving and
>renaming are the same one thing (hg mv spam.py spam/__init__.py for
>example).  Also, if you turn a module into a package, you may want to
>move code around, change imports, etc., so I'm not sure the renaming
>part is such a big step.  Anyway, if the import-sig people say that
>users think it's a complex or costly operation, I can believe it.

It's not that it's complex or costly in anything other than *mental* 
overhead -- you have to remember to do it and it's not particularly 
obvious.  (But people on import-sig did mention this and other things 
covered by the PEP as being a frequent root cause of beginner 
inquiries on #python, Stackoverflow, et al.)


> > (By the way, both of these additions to the import protocol (i.e. the
> > dynamically-added ``__path__``, and dynamically-created modules)
> > apply recursively to child packages, using the parent package's
> > ``__path__`` in place of ``sys.path`` as a basis for generating a
> > child ``__path__``.  This means that self-contained and virtual
> > packages can contain each other without limitation, with the caveat
> > that if you put a virtual package inside a self-contained one, it's
> > gonna have a really short ``__path__``!)
>I don't understand the caveat or its implications.

Since each package's __path__ is the same length or shorter than its 
parent's by default, then if you put a virtual package inside a 
self-contained one, it will be functionally speaking no different 
than a self-contained one, in that it will have only one path 
entry.  So, it's not really useful to put a virtual package inside a 
self-contained one, even though you can do it.  (Apart from it
letting you avoid a superfluous __init__ module, assuming it's indeed
superfluous.)
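
To make that concrete, consider this hypothetical layout (my example,
not taken from the PEP):

    # site-packages/foo/__init__.py   <- 'foo' is self-contained
    # site-packages/foo/bar/baz.py    <- 'bar' has no __init__: virtual
    #
    # foo.__path__      == ['site-packages/foo']    (one entry)
    # foo.bar.__path__ is derived from foo.__path__, so it also has
    # exactly one entry: ['site-packages/foo/bar']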


> > In other words, we don't allow pure virtual packages to be imported
> > directly, only modules and self-contained packages.  (This is an
> > acceptable limitation, because there is no *functional* value to
> > importing such a package by itself.  After all, the module object
> > will have no *contents* until you import at least one of its
> > subpackages or submodules!)
> >
> > Once ``zc.buildout`` has been successfully imported, though, there
> > *will* be a ``zc`` module in ``sys.modules``, and trying to import it
> > will of course succeed.  We are only preventing an *initial* import
> > from succeeding, in order to prevent false-positive import successes
> > when clashing subdirectories are present on ``sys.path``.
>I find that limitation acceptable.  After all, there is no zc project,
>and no zc module, just a zc namespace.  I'll just regret that it's not
>possible to provide a module docstring to inform that this is a
>namespace package used for X and Y.

It *is* possible - you'd just have to put it in a "zc.py" file.  IOW, 
this PEP still allows "namespace-defining packages" to exist, as was 
requested by early commenters on PEP 382.  It just doesn't *require* 
them to exist in order for the namespace contents to be importable.
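
That is, under this PEP a file as small as this sketch:

    # zc.py -- lives on sys.path alongside the zc/ directories
    """Namespace package used for X and Y."""

would both provide the docstring and still let zc.buildout be found in
adjacent zc/ subdirectories.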


> > The resulting list (whether empty or not) is then stored in a
> > ``sys.virtual_package_paths`` dictionary, keyed by module name.
>This was probably said on import-sig, but here I go: yet another import
>artifact in the sys module!  I hope we get ImportEngine in 3.3 to clean
>up all this.

Well, I rather *like* having them there, personally, vs. having to 
learn yet another API, but oh well, whatever.  AFAIK, ImportEngine 
isn't going to do away with the need for the global ones to live 
somewhere, at least not in 3.3.


> > * A new ``extend_virtual_paths(path_entry)`` function, to extend
> >   existing, already-imported virtual packages' ``__path__`` attributes
> >   to include any portions found in a new ``sys.path`` entry.  This
> >   function should be called by applications extending ``sys.path``
> >   at runtime, e.g. when adding a plugin directory or an egg to the
> >   path.
>Let's imagine my application Spam has a namespace spam.ext for plugins.
>  To use a custom directory where plugins are stored, or a zip file with
>plugins (I don't use eggs, so let me talk about zip files here), I'd
>have to call sys.path.append *and* pkgutil.extend_virtual_paths?

As written in the current proposal, yes.  There was some discussion 
on Python-Dev about having this happen automatically, and I proposed 
that it could be done by making virtual packages' __path__ attributes 
an iterable proxy object, rather than a list:

   http://mail.python.org/pipermail/python-dev/2011-July/112429.html

(This is an open option that hasn't been added to the PEP as yet, 
because I wanted to know Guido's thoughts on the proposal as it 
stands before burdening it with more implementation detail for a 
feature (automatic updates) that he might not be very keen on to 
begin with, even it does make the semantics that much more familiar 
for Perl or PHP users.)


> > * ``ImpImporter.iter_modules()`` should be changed to also detect and
> >   yield the names of modules found in virtual packages.
>Is there any value in providing an argument to get the pre-PEP behavior?
>  Or to look at it from a different place, how can Python code know that
>some module is a virtual or pure virtual package, if that is even a
>useful thing to know?

Is it a useful thing?  Dunno.  That's why it's open for comment.  If 
the auto-update approach is used, then the __path__ of virtual 
packages will have a distinguishable type().


> > Last, but not least, the ``imp`` module (or ``importlib``, if
> > appropriate) should expose the algorithm described in the `virtual
> > paths`_ section above, as a
> > ``get_virtual_path(modulename, parent_path=None)`` function, so that
> > creators of ``__import__`` replacements can use it.
>If I'm not mistaken, the rule of thumb these days is that imp is edited
>when it's absolutely necessary, otherwise code goes into importlib (more
>easily written, read and maintained).
>
>I wonder if importlib.import_module could implement the new import
>semantics all by itself, so that we can benefit from this PEP in older
>Pythons (importlib is on PyPI).

AFAIK, *that* importlib doesn't include a reimplementation of the
full import process, though I suppose I could be wrong.  My personal 
plan was just to create a specific pep382 module to include with 
future versions of setuptools, but as things worked out, I'm not sure 
if that'll be sanely doable for pep402.


> > * If you are changing a currently self-contained package into a
> >   virtual one, it's important to note that you can no longer use its
> >   ``__file__`` attribute to locate data files stored in a package
> >   directory.  Instead, you must search ``__path__`` or use the
> >   ``__file__`` of a submodule adjacent to the desired files, or
> >   of a self-contained subpackage that contains the desired files.
>Wouldn't pkgutil.get_data help here?

Not so long as you pass it a package name instead of a module
name.  This issue exists today with namespace packages; it's not new
to virtual packages.


>Besides, putting data files in a Python package is viewed very poorly by
>some (mostly people following the Filesystem Hierarchy Standard),

ISTM that anybody who thinks that is being inconsistent in 
considering the Python code itself to not be a "data file" by that 
same criterion...  especially since one of the more common uses for 
such "data" files are for e.g. HTML templates (which usually contain 
some sort of code) or GUI resources (which are pretty tightly bound 
to the code).

Are those same people similarly concerned when a Firefox extension 
contains image files as well as JavaScript?  And if not, why is 
Python different?

IOW, I think that those people are being confused by our use of the 
term "data" and thus think of it as an entirely different sort of 
"data" than what is meant by "package data" in the Python world.  I 
am not sure what word would unconfuse (defuse?) them, but we simply 
mean "files that are part of the package but are not of a type that 
Python can import by default," not "user-modifiable data" or "data 
that has meaning or usefulness to code other than the code it was 
packaged with."

Perhaps "package-embedded resources" would be a better 
phrase?  Certainly, it implies that they're *supposed* to be embedded 
there.  ;-)


> > * XXX what is the __file__ of a "pure virtual" package?  ``None``?
> >   Some arbitrary string?  The path of the first directory with a
> >   trailing separator?  No matter what we put, *some* code is
> >   going to break, but the last choice might allow some code to
> >   accidentally work.  Is that good or bad?
>A pure virtual package having no source file, I think it should have no
>__file__ at all.  I don't know if that would break more code than using
>an empty string for example, but it feels righter.
>
> > For those implementing PEP \302 importer objects:
>Minor: Here I think a link would not be a nuisance (IOW remove the
>backslash).

Done. 


From rdmurray at bitdance.com  Thu Aug 11 20:31:33 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Thu, 11 Aug 2011 14:31:33 -0400
Subject: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the
	/usr/bin/python2 symlink upstream)
In-Reply-To: <4E43F156.8040008@netwok.org>
References: <CANaWP3zBo8cNWNHN=jxx_m3tUBk3k+vn+LYgqB+yimdTrzVxwA@mail.gmail.com>
	<4E43E8D0.40201@netwok.org>
	<20110811150022.02A192505A7@webabinitio.net>
	<4E43F156.8040008@netwok.org>
Message-ID: <20110811183133.D56E72505A7@webabinitio.net>

On Thu, 11 Aug 2011 17:12:22 +0200, =?UTF-8?B?w4lyaWMgQXJhdWpv?= <merwok at netwok.org> wrote:
> I'm sorry if my opinion on that main point was lost among remarks on
> details.  To rephrase one part of my reply: Right now, the de facto
> standard is that shebangs can use python to mean python2 and python3 to
> mean python3.  Adding python2 to that and supporting making python
> ambiguous seems harmful to me.

OK.  So you are -1 on the PEP.

I'm a big +1.

To address your argument briefly, *now* a minority of distros have python
pointing to python3.  We expect this to change.  It may not happen
for 5 years, but someday it will.  So this PEP is about preparing for
the future.

Given that, I fail to see what harm having an additional symlink named
python2 will do.

And yes this was argued about earlier and should (in theory at least)
be addressed by the PEP, which is why I'm concluding that you are -1 on
the PEP :).

--
R. David Murray           http://www.bitdance.com

From sturla at molden.no  Thu Aug 11 21:11:11 2011
From: sturla at molden.no (Sturla Molden)
Date: Thu, 11 Aug 2011 21:11:11 +0200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
Message-ID: <4E44294F.5060005@molden.no>

On 09.08.2011 11:33, ???? ????????? wrote:
> Probably I want to re-invent the wheel. I want developers to tell me
> why we cannot remove the GIL in this way:
>
> 1. Remove the GIL completely with all current logic.
> 2. Add its own RW-locking to all mutable objects (like list or dict)
> 3. Add RW-locks to every context instance
> 4. Use RW-locks when accessing members of object instances
>
> The only reason I see not to do that is the performance of
> single-threaded applications. Why not reduce the locking functions for
> these 4 cases to stubs when only one thread is present?

This has been discussed to death before, and is probably OT to this list.

There is a reason other than the speed of single-threaded applications, but
it is rather technical: As CPython uses reference counting for garbage 
collection, we would get "false sharing" of reference counts -- which 
would work as an "invisible GIL" (synchronization bottleneck) anyway. 
That is, if one processor writes to memory in a cache-line shared by 
another processor, they must stop whatever they are doing to synchronize 
the dirty cache lines with RAM. Thus, updating reference counts would 
flood the memory bus with traffic and be much worse than the GIL. 
Instead of doing useful work, the processors would be stuck 
synchronizing dirty cache lines. You can think of it as a severe traffic 
jam.

To get rid of the GIL, CPython would either need

(a) another GC method (e.g. similar to .NET or Java)

or

(b) another threading model (e.g. one interpreter per thread, as in Tcl, 
Erlang, or .NET app domains).

As CPython has neither, we are better off with the GIL.

Nobody likes the GIL; fork a project to write a GIL-free CPython if you
can. But note that:

1. With Cython, you have full manual control over the GIL. IronPython 
and Jython do not have a GIL at all.

2. Much of the FUD against the GIL is plain ignorance: The GIL slows 
down parallel computational code, but any serious number crunching 
should use numerical performance libraries (i.e. C extensions) anyway. 
Libraries are free to release the GIL or spawn threads internally. Also, 
the GIL does not matter for (a) I/O bound code such as network servers 
or clients and (b) background threads in GUI programs -- which are the 
two common use-cases for threads in Python programs. If the GIL bites 
you, it's most likely a warning that your program is badly written, 
independent of the GIL issue.

There seems to be a common misunderstanding that Python threads work 
like fibers due to the GIL. They do not! Python threads are native OS
threads and can do anything a thread can do, including executing library 
code in parallel. If one thread is blocking on I/O, the other threads 
can continue with their business.
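
A trivial way to see this (illustration only):

    import threading, time

    def wait():
        time.sleep(1)            # blocking call: the GIL is released here

    threads = [threading.Thread(target=wait) for _ in range(4)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(time.time() - start)   # ~1 second, not ~4: the waits overlapped

Replace the sleep with a socket or file read and the picture is the same.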

The only thing Python threads cannot do is access the Python interpreter 
concurrently. And the reason CPython needs that restriction is reference 
counting.

Sturla



From victor.stinner at haypocalc.com  Thu Aug 11 21:31:56 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 11 Aug 2011 21:31:56 +0200
Subject: [Python-Dev] Status of the PEP 400?
	(deprecate	codecs.StreamReader/StreamWriter)
In-Reply-To: <CAP7+vJJQv+fOP7D8E1z2ftjmwm5iap5wxTwQug8nE-5xTM0H-Q@mail.gmail.com>
References: <4E308D63.9090901@haypocalc.com>
	<4E3125D7.2030103@egenix.com>	<4E312BCB.3080301@haypocalc.com>
	<20110729171731.2059cc3e@pitrou.net>	<CADiSq7cpLRYHe8JSEC1W-Ff6rzsYk1x1c2MPjcaei-FA1PcqvQ@mail.gmail.com>
	<CAP7+vJJQv+fOP7D8E1z2ftjmwm5iap5wxTwQug8nE-5xTM0H-Q@mail.gmail.com>
Message-ID: <4E442E2C.4050700@haypocalc.com>

On 29/07/2011 19:01, Guido van Rossum wrote:
>>>> I will add your alternative to the PEP (except if you would like to do
>>>> that yourself?). If I understood correctly, you propose to:
>>>>
>>>>    * rename codecs.open() to codecs.open_stream()
>>>>    * change codecs.open() to reuse open() (and so io.TextIOWrapper)
(...)
>
> +1

Ok, most people prefer this option. Should I modify the PEP to "move"
this option to be the first/main proposal (and move my proposal to an
alternative?), or can the PEP be validated in its current state?

Victor

From tjreedy at udel.edu  Fri Aug 12 00:05:04 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 11 Aug 2011 18:05:04 -0400
Subject: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the
 /usr/bin/python2 symlink upstream)
In-Reply-To: <4E43E8D0.40201@netwok.org>
References: <CANaWP3zBo8cNWNHN=jxx_m3tUBk3k+vn+LYgqB+yimdTrzVxwA@mail.gmail.com>
	<4E43E8D0.40201@netwok.org>
Message-ID: <j21jmr$vd9$1@dough.gmane.org>

On 8/11/2011 10:36 AM, Éric Araujo wrote:

> It would be interesting to have feedback from people who lived through
> the transition to Python 2.

There was no comparable transition. Python 2.0 was basically 1.6 renamed 
for a different distributor. I regard Python 2.2, which introduced 
new-style classes, as the beginning of Python 2 as something significantly
different from Python 1. I suppose one could also point to the earlier 
intro of unicode. The new iterator protocol was also a major change. In 
any case, back compatibility was kept in all three respects (and others) 
until Python 3.

-- 
Terry Jan Reedy



From tjreedy at udel.edu  Fri Aug 12 00:21:20 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 11 Aug 2011 18:21:20 -0400
Subject: [Python-Dev] Status of the PEP 400? (deprecate
	codecs.StreamReader/StreamWriter)
In-Reply-To: <4E442E2C.4050700@haypocalc.com>
References: <4E308D63.9090901@haypocalc.com>
	<4E3125D7.2030103@egenix.com>	<4E312BCB.3080301@haypocalc.com>
	<20110729171731.2059cc3e@pitrou.net>	<CADiSq7cpLRYHe8JSEC1W-Ff6rzsYk1x1c2MPjcaei-FA1PcqvQ@mail.gmail.com>
	<CAP7+vJJQv+fOP7D8E1z2ftjmwm5iap5wxTwQug8nE-5xTM0H-Q@mail.gmail.com>
	<4E442E2C.4050700@haypocalc.com>
Message-ID: <j21klb$5ea$1@dough.gmane.org>

On 8/11/2011 3:31 PM, Victor Stinner wrote:
> On 29/07/2011 19:01, Guido van Rossum wrote:
>>>>> I will add your alternative to the PEP (except if you would like to do
>>>>> that yourself?). If I understood correctly, you propose to:
>>>>>
>>>>> * rename codecs.open() to codecs.open_stream()
>>>>> * change codecs.open() to reuse open() (and so io.TextIOWrapper)
> (...)
>>
>> +1
>
>> Ok, most people prefer this option. Should I modify the PEP to "move"
>> this option to be the first/main proposal (and move my proposal to an
>> alternative?), or can the PEP be validated in its current state?

I would relabel the above as the Minimal Change Alternative (or M.A.L.
alternative, or whatever) and possibly move it, but in any case note that
Guido (and others) accepted that alternative with consideration of more
drastic changes deferred to later.  And add an explicit reference to the
email you quoted.

-- 
Terry Jan Reedy



From ncoghlan at gmail.com  Fri Aug 12 01:36:32 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 12 Aug 2011 09:36:32 +1000
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
In-Reply-To: <20110811183114.701DF3A406B@sparrow.telecommunity.com>
References: <4E43E9A6.7020608@netwok.org>
	<20110811183114.701DF3A406B@sparrow.telecommunity.com>
Message-ID: <CADiSq7ejkoeSXwusM2npzOr2ORL=z8xi8Gc+8D3C0-XN=UYkUw@mail.gmail.com>

On Fri, Aug 12, 2011 at 4:30 AM, P.J. Eby <pje at telecommunity.com> wrote:
> At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote:
>> > The resulting list (whether empty or not) is then stored in a
>> > ``sys.virtual_package_paths`` dictionary, keyed by module name.
>> This was probably said on import-sig, but here I go: yet another import
>> artifact in the sys module!  I hope we get ImportEngine in 3.3 to clean
>> up all this.
>
> Well, I rather *like* having them there, personally, vs. having to learn yet
> another API, but oh well, whatever.  AFAIK, ImportEngine isn't going to do
> away with the need for the global ones to live somewhere, at least not in
> 3.3.

And likely not for the entire 3.x series - I shudder at the thought of
the backwards incompatibility hell associated with trying to remove
them...

The point of the ImportEngine API is that the caching elements of the
import state introduce cross dependencies between various global data
structures. Code that manipulates those data structures needs to
correctly invalidate or otherwise update the state as things change. I
seem to recall a certain programming construct that is designed to
make it easier to manage interdependent data structures...

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com  Fri Aug 12 05:10:24 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 12 Aug 2011 13:10:24 +1000
Subject: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the
 /usr/bin/python2 symlink upstream)
In-Reply-To: <4E43F156.8040008@netwok.org>
References: <CANaWP3zBo8cNWNHN=jxx_m3tUBk3k+vn+LYgqB+yimdTrzVxwA@mail.gmail.com>
	<4E43E8D0.40201@netwok.org>
	<20110811150022.02A192505A7@webabinitio.net>
	<4E43F156.8040008@netwok.org>
Message-ID: <CADiSq7eun0LfBD7VAqSRvjns3wpbZrCg4o6cN=X34Vu5T1UoTA@mail.gmail.com>

On Fri, Aug 12, 2011 at 1:12 AM, Éric Araujo <merwok at netwok.org> wrote:
> I'm sorry if my opinion on that main point was lost among remarks on
> details.  To rephrase one part of my reply: Right now, the de facto
> standard is that shebangs can use python to mean python2 and python3 to
> mean python3.  Adding python2 to that and supporting making python
> ambiguous seems harmful to me.

This PEP comes mainly out of the fact that we collectively think Arch
(the case that prompted the original discussion) are making a mistake
that will hurt their users in switching the default Python *right
now*, so the PEP is first and foremost designed to record that
consensus. However, their actions do mean that the 'python' name is
*already* ambiguous, no matter what the mainstream distros think. The
Debian maintainers may not care about that, but *I* do, as does anyone
wanting to write distro-agnostic shebang lines.

Given that some distros (large or small), along with some system
administrators, are going to want to have python refer to python3,
either now or at some point in the future, there are really only two
options available to us here:

1. Accept the reality of that situation, and propose a mechanism that
minimises the impact of the resulting ambiguity on end users of Python
by allowing developers to be explicit about their target language.
This is the approach advocated in PEP 394.

2. Tell the Arch developers (and anyone else inclined to point the
python name at python3) that they're wrong, and the python symlink
should, now and forever, always refer to a version of Python 2.x.

It's worth noting that there has never been any previous python-dev
consensus to use 'python3' indefinitely - the status quo came about
because it makes sense for the moment even to those of us that *want*
'python' to eventually refer to 'python3', so there was no previous
need for an explicit choice between the two alternatives. By
migrating, Arch has forced us to choose between either supporting
their action or else telling Python users "Don't blame us, blame the
distros that pointed python at python3" when things break.

I flat out disagree with the second approach - having to type
'python3' when a 3.x variant is the only version of Python installed
would just be dumb, and I also think playing that kind of blame game
in the face of inevitable cross-distro compatibility problems is
disrespectful to our users. If you want to get Zen about it,
'practicality beats purity', 'explicit is better than implicit' and
'In the face of ambiguity, refuse the temptation to guess' all come
down in favour of the approach in PEP 394.

If I haven't persuaded you to adjust your view up to at least a -0
(i.e. don't entirely agree, but won't object to others moving forward
with it) and you still wish to advocate for the second approach, then
I suggest creating a competing PEP in order to provide a clear
alternative proposal (with Guido or his appointed delegate having the
final say, as usual) that explains the alternative recommendation for:
- distros that have already switched their 'python' links to refer to python3
- Python developers wishing to write shebang lines that work on
multiple 2.x versions and support both platforms that define 'python2'
and those that only define 'python'

FWIW, the closest historical precedent I can recall is Red Hat's
issues when users switched the system Python from 1.5 to 2.2+, and the
lesson learned from that exercise was that distro installed utilities
should always reference a specific Python version rather than relying
on the system administrator leaving the 'python' link alone. It sounds
like Debian chose not to heed that lesson, which is unfortunate
(although, to be honest, I'm not sure how well Fedora/Red Hat heed it,
either). However, the commentary in PEP 394 based on that history
(i.e. that distros really shouldn't care where the python name points)
will remain in place.

Regards,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com  Fri Aug 12 05:14:14 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 12 Aug 2011 13:14:14 +1000
Subject: [Python-Dev] Status of the PEP 400? (deprecate
	codecs.StreamReader/StreamWriter)
In-Reply-To: <j21klb$5ea$1@dough.gmane.org>
References: <4E308D63.9090901@haypocalc.com> <4E3125D7.2030103@egenix.com>
	<4E312BCB.3080301@haypocalc.com>
	<20110729171731.2059cc3e@pitrou.net>
	<CADiSq7cpLRYHe8JSEC1W-Ff6rzsYk1x1c2MPjcaei-FA1PcqvQ@mail.gmail.com>
	<CAP7+vJJQv+fOP7D8E1z2ftjmwm5iap5wxTwQug8nE-5xTM0H-Q@mail.gmail.com>
	<4E442E2C.4050700@haypocalc.com> <j21klb$5ea$1@dough.gmane.org>
Message-ID: <CADiSq7fUwofjZ0=v3OGRCsum5z6n+f9kN3oqEnQh-v0GXqW59A@mail.gmail.com>

On Fri, Aug 12, 2011 at 8:21 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 8/11/2011 3:31 PM, Victor Stinner wrote:
>> Ok, most people prefer this option. Should I modify the PEP to "move"
>> this option has the first/main proposition (move my proposition as an
>> alternative?), or can the PEP be validated in the current state?
>
> I would relabel the above as the Minimal Change Alternative or M.A.L.
> alternative or whatever and possibly move it but in any case note that Guido
> (and others) accepted that alternative with consideration of more drastic
> changes deferred to later. And add an explicit reference to the email you
> quoted.

Yeah, definitely retitle/rewrite/rearrange to be clear what Guido
accepted and then state that any future deprecation of components in
the codecs module will be dealt with as a new PEP.

Regards,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From vinay_sajip at yahoo.co.uk  Fri Aug 12 10:47:30 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Fri, 12 Aug 2011 08:47:30 +0000 (UTC)
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
References: <4E43E9A6.7020608@netwok.org>
Message-ID: <loom.20110812T103709-106@post.gmane.org>

Éric Araujo <merwok <at> netwok.org> writes:

> Besides, putting data files in a Python package is viewed very poorly by
> some (mostly people following the Filesystem Hierarchy Standard), and in
> distutils2/packaging, we (will) have a resources system that's as
> convenient for users and more flexible for OS packagers.  Using __file__
> for more than information on the module is frowned upon for other
> reasons anyway (I talked with a Debian developer about this one day but
> forgot the details), so I think the limitation is okay.
> 

The FHS does not apply in all scenarios - not all Python code is
deployed/packaged at system level. For example, plug-ins (such as Django apps)
are often not meant to be installed by a system-level packager. This might also
be true in scenarios where Python is embedded into some other application. It's
really useful to be able to co-locate packages with their data (e.g. in a zip
file) and I don't think all instances of putting data files in a package are to
be frowned upon.

Regards,

Vinay Sajip


From solipsis at pitrou.net  Fri Aug 12 12:58:46 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 12 Aug 2011 12:58:46 +0200
Subject: [Python-Dev] PEP 3154 - pickle protocol 4
Message-ID: <20110812125846.00a75cd1@pitrou.net>


Hello,

This PEP is an attempt to foster a number of small incremental
improvements in a future pickle protocol version. The PEP process is
used in order to gather as many improvements as possible, because the
introduction of a new protocol version should be a rare occurrence.

Feel free to suggest any additions.

Regards

Antoine.


http://www.python.org/dev/peps/pep-3154/

PEP: 3154
Title: Pickle protocol version 4
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis at pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2011-08-11
Python-Version: 3.3
Post-History:
Resolution: TBD


Abstract
========

Data serialized using the pickle module must be portable across Python
versions.  It should also support the latest language features as well
as implementation-specific features.  For this reason, the pickle
module knows about several protocols (currently numbered from 0 to 3),
each of which appeared in a different Python version.  Using a
low-numbered protocol version allows exchanging data with old Python
versions, while using a high-numbered protocol allows access to newer
features and sometimes more efficient resource use (both CPU time
required for (de)serializing, and disk size / network bandwidth
required for data transfer).


Rationale
=========

The latest current protocol, coincidentally named protocol 3, appeared
with Python 3.0 and supports the new incompatible features in the
language (mainly, unicode strings by default and the new bytes
object).  The opportunity was not taken at the time to improve the
protocol in other ways.

This PEP is an attempt to foster a number of small incremental
improvements in a future new protocol version.  The PEP process is used
in order to gather as many improvements as possible, because the
introduction of a new protocol version should be a rare occurrence.


Improvements in discussion
==========================

64-bit compatibility for large objects
--------------------------------------

Current protocol versions export object sizes for various built-in types
(str, bytes) as 32-bit ints.  This forbids serialization of large data
[1]_. New opcodes are required to support very large bytes and str
objects.

Native opcodes for sets and frozensets
--------------------------------------

Many common built-in types (such as str, bytes, dict, list, tuple) have
dedicated opcodes to improve resource consumption when serializing and
deserializing them; however, sets and frozensets don't.  Adding such
opcodes would be an obvious improvement.  Also, dedicated set support
could help remove the current impossibility of pickling
self-referential sets [2]_.

Binary encoding for all opcodes
-------------------------------

The GLOBAL opcode, which is still used in protocol 3, uses the so-called
"text" mode of the pickle protocol, which involves looking for newlines
in the pickle stream.  Looking for newlines is difficult to optimize on
a non-seekable stream, and therefore a new version of GLOBAL
(BINGLOBAL?) could use a binary encoding instead.

It seems that all other opcodes emitted when using protocol 3 already
use binary encoding.



Acknowledgments
===============

(...)


References
==========

.. [1] "pickle not 64-bit ready":
   http://bugs.python.org/issue11564

.. [2] "Cannot pickle self-referencing sets":
   http://bugs.python.org/issue9269


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:



From catch-all at masklinn.net  Fri Aug 12 14:32:43 2011
From: catch-all at masklinn.net (Xavier Morel)
Date: Fri, 12 Aug 2011 14:32:43 +0200
Subject: [Python-Dev] PEP 3154 - pickle protocol 4
In-Reply-To: <20110812125846.00a75cd1@pitrou.net>
References: <20110812125846.00a75cd1@pitrou.net>
Message-ID: <A45A574B-09AA-4871-961A-DF1110B2CCCF@masklinn.net>

On 2011-08-12, at 12:58 , Antoine Pitrou wrote:
> Current protocol versions export object sizes for various built-in types
> (str, bytes) as 32-bit ints.  This forbids serialization of large data
> [1]_. New opcodes are required to support very large bytes and str
> objects.
How about changing object sizes to be 64b always? Too much overhead for the
common case (which might be smaller pickled objects)? Or a slightly more
devious scheme (e.g. a tag bit: untagged is a 31b size, tagged is 63b), which
would not require adding opcodes for that?
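
To make the tag-bit idea concrete, a rough sketch (my own encoding, not
an actual pickle wire format):

    import struct

    def encode_size(n):
        # Low bit is the tag: 0 -> 31-bit size in 4 bytes,
        #                     1 -> 63-bit size in 8 bytes.
        if n < 2**31:
            return struct.pack("<I", n << 1)
        return struct.pack("<Q", (n << 1) | 1)

    def decode_size(read):        # 'read' is a file-like read() callable
        lo = struct.unpack("<I", read(4))[0]
        if lo & 1:                # tagged: four more bytes follow
            hi = struct.unpack("<I", read(4))[0]
            return ((hi << 32) | lo) >> 1
        return lo >> 1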

> Also, dedicated set support
> could help remove the current impossibility of pickling
> self-referential sets [2]_.

Is there really no possibility of fixing recursive pickling once
and for all? Dedicated opcodes for resource consumption
purposes (and to match those of other built-in types) are
still a good idea, but being able to pickle arbitrary
recursive structures would be even better, would it not?

And if specific (new) opcodes are required to handle recursive
pickling correctly, that's the occasion.

From solipsis at pitrou.net  Fri Aug 12 15:30:09 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 12 Aug 2011 15:30:09 +0200
Subject: [Python-Dev] PEP 3154 - pickle protocol 4
In-Reply-To: <A45A574B-09AA-4871-961A-DF1110B2CCCF@masklinn.net>
References: <20110812125846.00a75cd1@pitrou.net>
	<A45A574B-09AA-4871-961A-DF1110B2CCCF@masklinn.net>
Message-ID: <1313155809.3603.18.camel@localhost.localdomain>


Hello,

On Friday 12 August 2011 at 14:32 +0200, Xavier Morel wrote:
> On 2011-08-12, at 12:58 , Antoine Pitrou wrote:
> > Current protocol versions export object sizes for various built-in types
> > (str, bytes) as 32-bit ints.  This forbids serialization of large data
> > [1]_. New opcodes are required to support very large bytes and str
> > objects.
> How about changing object sizes to be 64b always? Too much overhead for the
> common case (which might be smaller pickled objects)?

Yes, and also the old opcodes must still be supported, so there's no
maintenance gain in not exploiting them.

> Or a slightly more
> devious scheme (e.g. a tag bit: untagged is a 31b size, tagged is 63b), which
> would not require adding opcodes for that?

The opcode space is not full enough to justify this kind of
complication, IMO.

> > Also, dedicated set support
> > could help remove the current impossibility of pickling
> > self-referential sets [2]_.
> 
> Is there really no possibility of fixing recursive pickling once
> and for all? Dedicated opcodes for resource consumption
> purposes (and to match those of other built-in types) are
> still a good idea, but being able to pickle arbitrary
> recursive structures would be even better, would it not?

That's true. Actually, it seems pickling recursive sets could have
worked from the start, if a different __reduce__ had been chosen and a
__setstate__ had been defined:

>>> import pickle
>>> class X: pass
... 
>>> class myset(set):
...    def __reduce__(self):
...        return (self.__class__, (), list(self))
...    def __setstate__(self, state):
...        self.update(state)
>>> m = myset((1,2,3))
>>> x = X()
>>> x.m = m
>>> m.add(x)
>>> mm = pickle.loads(pickle.dumps(m))
>>> m
myset({1, 2, 3, <__main__.X object at 0x7fe3635c6990>})
>>> mm
myset({1, 2, 3, <__main__.X object at 0x7fe3635c6c30>})

  # m has a reference loop

>>> [x for x in m if getattr(x, 'm', None) is m]
[<__main__.X object at 0x7fe3635c6990>]

  # mm retains a similar reference loop

>>> [x for x in mm if getattr(x, 'm', None) is mm]
[<__main__.X object at 0x7fe3635c6c30>]

  # the representation is roughly as efficient as the original one

>>> len(pickle.dumps(set([1,2,3])))
36
>>> len(pickle.dumps(myset([1,2,3])))
37


We can't change set.__reduce__ (or __reduce_ex__) without a protocol
bump, though, since past Pythons would fail loading the pickles.

Regards

Antoine.



From van.lindberg at gmail.com  Fri Aug 12 16:32:23 2011
From: van.lindberg at gmail.com (VanL)
Date: Fri, 12 Aug 2011 09:32:23 -0500
Subject: [Python-Dev] GIL removal question
In-Reply-To: <4E44294F.5060005@molden.no>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no>
Message-ID: <j23dhn$9it$1@dough.gmane.org>

On 8/11/2011 2:11 PM, Sturla Molden wrote:
>
> (b) another threading model (e.g. one interpreter per thread, as in Tcl,
> Erlang, or .NET app domains).

We are close to this, in that we already have baked-in support for 
subinterpreters. Out of curiosity, why isn't this being pursued?


From pje at telecommunity.com  Fri Aug 12 17:24:57 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 12 Aug 2011 11:24:57 -0400
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
In-Reply-To: <41282FF3-4DAF-4996-B745-E0BEA477FB01@twistedmatrix.com>
References: <4E43E9A6.7020608@netwok.org>
	<20110811113952.2e257351@resist.wooz.org>
	<41282FF3-4DAF-4996-B745-E0BEA477FB01@twistedmatrix.com>
Message-ID: <20110812152512.112A53A406B@sparrow.telecommunity.com>

At 02:02 PM 8/11/2011 -0400, Glyph Lefkowitz wrote:
>Rather than a one-by-one ad-hoc consideration of which attribute 
>should be set to None or empty strings or "<string>" or what have 
>you, I'd really like to see a discussion in the PEP saying what a 
>package really is vs. what a module is, and what one can reasonably 
>expect from it from an API and tooling perspective.

The assumption I've been working from is the only guarantee I've ever 
seen the Python docs give: i.e., that a package is a module object 
with a __path__ attribute.  Modules aren't even required to have a 
__file__ attribute -- builtin modules don't, for example.  (And the
contents of __file__ are not required to have any particular 
semantics: PEP 302 notes that it can be a dummy value like 
"<frozen>", for example.)

Technically, btw, PEP 302 requires __file__ to be a string, so making 
__file__ = None will be a backwards-incompatible change.  But any 
code that walks modules in sys.modules is going to break today if it 
expects a __file__ attribute to exist, because 'sys' itself doesn't have one!
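
Quick check in any current interpreter:

    >>> import sys
    >>> hasattr(sys, '__file__')
    False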

So, my leaning is towards leaving off __file__, since today's code 
already has to deal with it being nonexistent, if it's working with 
arbitrary modules, and that'll produce breakage sooner rather than 
later -- the twisted.python.modules code, for example, would fail 
with a loud AttributeError, rather than going on to silently assume 
that a module with a dummy __file__ isn't a package.   (Which is NOT 
a valid assumption *now*, btw, as I'll explain below.)

Anyway, if you have any suggestions for verbiage that should be added 
to the PEP to clarify these assumptions, I'd be happy to add 
them.  However, I think that the real problem you're encountering at 
the moment has more to do with making assumptions about the Python 
import ecosystem that aren't valid today, and haven't been valid 
since at least the introduction of PEP 302, if not earlier import 
hook systems as well.


>  But the whole "pure virtual" mechanism here seems to pile even 
> more inconsistency on top of an already irritatingly inconsistent 
> import mechanism.  I was reasonably happy with my attempt to paper 
> over PEP 302's weirdnesses from a user perspective:
>
>http://twistedmatrix.com/documents/11.0.0/api/twisted.python.modules.html
>
>(or https://launchpad.net/modules if 
>you are not a Twisted user)
>
>Users of this API can traverse the module hierarchy with certain 
>expectations; each module or package would have .pathEntry and 
>.filePath attributes, each of which would refer to the appropriate 
>place.  Of course __path__ complicates things a bit, but so it goes.

I don't mean to be critical, and no doubt what you've written works 
fine for your current requirements, but on my quick attempt to skim 
through the code I found many things which appear to me to be 
incompatible with PEP 302.

That is, the above code hardcodes a variety of assumptions about the 
import system that haven't been true since Python 2.3.  (For example, 
it assumes that the contents of sys.path strings have inspectable 
semantics, that the contents of __file__ can tell you things about 
the module-ness or package-ness of a module object, etc.)

If you want to fully support PEP 302, you might want to consider 
making this a wrapper over the corresponding pkgutil APIs (available 
since Python 2.5) that do roughly the same things, but which delegate 
all path string inspection to importer objects and allow extensible 
delegation for importers that don't support the optional methods involved.

(Of course, if the pkgutil APIs are missing something you need, 
perhaps you could propose additions.)
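
For illustration, the basic shape of that usage -- this is plain
pkgutil, not a drop-in replacement for the twisted API:

    import pkgutil

    # Discover top-level modules and packages without importing them;
    # all path-string inspection is delegated to the importer objects.
    for importer, name, ispkg in pkgutil.iter_modules():
        print("%s ispkg=%s" % (name, ispkg))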


>Now it seems like pure virtual packages are going to introduce a new 
>type of special case into the hierarchy which have neither 
>.pathEntry nor .filePath objects.

The problem is that your API's notion that these things exist as 
coherent concepts was never really a valid assumption in the first 
place.  .pth files and namespace packages already meant that the idea 
of a package coming from a single path entry made no sense.  And 
namespace packages installed by setuptools' system packaging mode 
*don't have a __file__ attribute* today...  heck they don't have 
__init__ modules, either.

So, adding virtual packages isn't actually going to change anything, 
except perhaps by making these scenarios more common.


From solipsis at pitrou.net  Fri Aug 12 17:42:26 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 12 Aug 2011 17:42:26 +0200
Subject: [Python-Dev] GIL removal question
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no> <j23dhn$9it$1@dough.gmane.org>
Message-ID: <20110812174226.0cd068b1@pitrou.net>

On Fri, 12 Aug 2011 09:32:23 -0500
VanL <van.lindberg at gmail.com> wrote:
> On 8/11/2011 2:11 PM, Sturla Molden wrote:
> >
> > (b) another threading model (e.g. one interpreter per thread, as in Tcl,
> > Erlang, or .NET app domains).
> 
> We are close to this, in that we already have baked-in support for 
> subinterpreters. Out of curiosity, why isn't this being pursued?

Because it is half-baked, breaks with some features in some extension
modules, and still requires the GIL for shared data structures.

Regards

Antoine.



From status at bugs.python.org  Fri Aug 12 18:07:27 2011
From: status at bugs.python.org (Python tracker)
Date: Fri, 12 Aug 2011 18:07:27 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20110812160727.404561CC0B@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2011-08-05 - 2011-08-12)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    2923 (+24)
  closed 21602 (+23)
  total  24525 (+47)

Open issues with patches: 1264 


Issues opened (35)
==================

#12032: Tools/Scripts/crlf.py needs updating for python 3+
http://bugs.python.org/issue12032  reopened by eric.araujo

#12701: Apple's clang 2.1 (xcode 4.1, OSX 10.7) optimizer miscompiles 
http://bugs.python.org/issue12701  opened by deadshort

#12702: shutil.copytree() should use os.lutimes() to copy the metadata
http://bugs.python.org/issue12702  opened by petri.lehtinen

#12703: Improve error reporting for packaging.util.resolve_name
http://bugs.python.org/issue12703  opened by Natim

#12704: Language References does not specify exception raised by final
http://bugs.python.org/issue12704  opened by Nikratio

#12705: Make compile('1\n2\n', '', 'single') raise an exception instea
http://bugs.python.org/issue12705  opened by Devin Jeanpierre

#12706: timeout sentinel in ftplib and poplib documentation
http://bugs.python.org/issue12706  opened by orsenthil

#12707: Deprecate addinfourl getters
http://bugs.python.org/issue12707  opened by ezio.melotti

#12708: multiprocessing.Pool is missing a starmap[_async]() method.
http://bugs.python.org/issue12708  opened by hynek

#12711: Explain tracker components in devguide
http://bugs.python.org/issue12711  opened by eric.araujo

#12712: weave build_tools library identification
http://bugs.python.org/issue12712  opened by Tim.Holme

#12713: argparse: allow abbreviation of sub commands by users
http://bugs.python.org/issue12713  opened by pwil3058

#12716: Reorganize os docs for files/dirs/fds
http://bugs.python.org/issue12716  opened by benjamin.peterson

#12720: Expose linux extended filesystem attributes
http://bugs.python.org/issue12720  opened by benjamin.peterson

#12721: Chaotic use of helper functions in test_shutil for reading and
http://bugs.python.org/issue12721  opened by hynek

#12723: Provide an API in tkSimpleDialog for defining custom validatio
http://bugs.python.org/issue12723  opened by rabbidous

#12725: Docs: Odd phrase "floating seconds" in socket.html
http://bugs.python.org/issue12725  opened by Cris.Simpson

#12726: explain why locale.getlocale() does not read system's locales
http://bugs.python.org/issue12726  opened by alexis

#12728: Python re lib fails case insensitive matches on Unicode data
http://bugs.python.org/issue12728  opened by tchrist

#12729: Python lib re cannot handle Unicode properly due to narrow/wid
http://bugs.python.org/issue12729  opened by tchrist

#12730: Python's casemapping functions are untrustworthy due to narrow
http://bugs.python.org/issue12730  opened by tchrist

#12731: python lib re uses obsolete sense of \w in full violation of U
http://bugs.python.org/issue12731  opened by tchrist

#12732: Can't portably use Unicode in Python identifiers
http://bugs.python.org/issue12732  opened by tchrist

#12733: Request for grapheme support in Python re lib
http://bugs.python.org/issue12733  opened by tchrist

#12734: Request for property support in Python re lib
http://bugs.python.org/issue12734  opened by tchrist

#12735: request full Unicode collation support in std python library
http://bugs.python.org/issue12735  opened by tchrist

#12737: string.title()  is overzealous by upcasing combining marks ina
http://bugs.python.org/issue12737  opened by tchrist

#12738: Bug in multiprocessing.JoinableQueue() implementation on Ubunt
http://bugs.python.org/issue12738  opened by Michael.Hall

#12739: read stuck with multithreading and simultaneous subprocess.Pop
http://bugs.python.org/issue12739  opened by SAPikachu

#12740: Add struct.Struct.nmemb
http://bugs.python.org/issue12740  opened by skrah

#12741: Implementation of shutil.move
http://bugs.python.org/issue12741  opened by David.Townshend

#12742: Add support for CESU-8 encoding
http://bugs.python.org/issue12742  opened by adalx

#12743: C API marshalling doc contains XXX
http://bugs.python.org/issue12743  opened by JJeffries

#12715: Add symlink support to shutil functions
http://bugs.python.org/issue12715  opened by petri.lehtinen

#12736: Request for python casemapping functions to use full not simpl
http://bugs.python.org/issue12736  opened by tchrist



Most recent 15 issues with no replies (15)
==========================================

#12743: C API marshalling doc contains XXX
http://bugs.python.org/issue12743

#12742: Add support for CESU-8 encoding
http://bugs.python.org/issue12742

#12741: Implementation of shutil.move
http://bugs.python.org/issue12741

#12740: Add struct.Struct.nmemb
http://bugs.python.org/issue12740

#12739: read stuck with multithreading and simultaneous subprocess.Pop
http://bugs.python.org/issue12739

#12737: string.title()  is overzealous by upcasing combining marks ina
http://bugs.python.org/issue12737

#12736: Request for python casemapping functions to use full not simpl
http://bugs.python.org/issue12736

#12735: request full Unicode collation support in std python library
http://bugs.python.org/issue12735

#12733: Request for grapheme support in Python re lib
http://bugs.python.org/issue12733

#12732: Can't portably use Unicode in Python identifiers
http://bugs.python.org/issue12732

#12731: python lib re uses obsolete sense of \w in full violation of U
http://bugs.python.org/issue12731

#12730: Python's casemapping functions are untrustworthy due to narrow
http://bugs.python.org/issue12730

#12728: Python re lib fails case insensitive matches on Unicode data
http://bugs.python.org/issue12728

#12726: explain why locale.getlocale() does not read system's locales
http://bugs.python.org/issue12726

#12725: Docs: Odd phrase "floating seconds" in socket.html
http://bugs.python.org/issue12725



Most recent 15 issues waiting for review (15)
=============================================

#12740: Add struct.Struct.nmemb
http://bugs.python.org/issue12740

#12723: Provide an API in tkSimpleDialog for defining custom validatio
http://bugs.python.org/issue12723

#12721: Chaotic use of helper functions in test_shutil for reading and
http://bugs.python.org/issue12721

#12720: Expose linux extended filesystem attributes
http://bugs.python.org/issue12720

#12711: Explain tracker components in devguide
http://bugs.python.org/issue12711

#12708: multiprocessing.Pool is missing a starmap[_async]() method.
http://bugs.python.org/issue12708

#12691: tokenize.untokenize is broken
http://bugs.python.org/issue12691

#12684: profile does not dump stats on exception like cProfile does
http://bugs.python.org/issue12684

#12668: 3.2 What's New: it's integer->string, not the opposite
http://bugs.python.org/issue12668

#12666: map semantic change not documented in What's New
http://bugs.python.org/issue12666

#12656: test.test_asyncore: add tests for AF_INET6 and AF_UNIX sockets
http://bugs.python.org/issue12656

#12652: Keep test.support docs out of the global docs index
http://bugs.python.org/issue12652

#12650: Subprocess leaks fd upon kill()
http://bugs.python.org/issue12650

#12646: zlib.Decompress.decompress/flush do not raise any exceptions w
http://bugs.python.org/issue12646

#12639: msilib Directory.start_component() fails if keyfile is not Non
http://bugs.python.org/issue12639



Top 10 most discussed issues (10)
=================================

#12682: Meaning of 'accepted' resolution as documented in devguide
http://bugs.python.org/issue12682  10 msgs

#12301: Use :data:`sys.thing` instead of ``sys.thing`` throughout
http://bugs.python.org/issue12301   9 msgs

#12191: Add shutil.chown to allow to use user and group name (and not 
http://bugs.python.org/issue12191   8 msgs

#12666: map semantic change not documented in What's New
http://bugs.python.org/issue12666   7 msgs

#12672: Some problems in documentation extending/newtypes.html
http://bugs.python.org/issue12672   7 msgs

#2857: Add "java modified utf-8" codec
http://bugs.python.org/issue2857   6 msgs

#12721: Chaotic use of helper functions in test_shutil for reading and
http://bugs.python.org/issue12721   6 msgs

#12541: Accepting Badly formed headers in urllib HTTPBasicAuth
http://bugs.python.org/issue12541   5 msgs

#12701: Apple's clang 2.1 (xcode 4.1, OSX 10.7) optimizer miscompiles 
http://bugs.python.org/issue12701   5 msgs

#12715: Add symlink support to shutil functions
http://bugs.python.org/issue12715   5 msgs



Issues closed (24)
==================

#10087: HTML calendar is broken
http://bugs.python.org/issue10087  closed by python-dev

#10741: PyGILState_GetThisThreadState() lacks a doc entry
http://bugs.python.org/issue10741  closed by sandro.tosi

#12437: _ctypes.dlopen does not include errno in OSError
http://bugs.python.org/issue12437  closed by pitrou

#12575: add a AST validator
http://bugs.python.org/issue12575  closed by python-dev

#12608: crash in PyAST_Compile when running Python code
http://bugs.python.org/issue12608  closed by meador.inge

#12661: Add a new shutil.cleartree function to shutil module
http://bugs.python.org/issue12661  closed by chin

#12662: Add support for duplicate options in configparser
http://bugs.python.org/issue12662  closed by lukasz.langa

#12677: Turtle, fix right/left rotation orientation
http://bugs.python.org/issue12677  closed by sandro.tosi

#12687: Python 3.2 fails to load protocol 0 pickle
http://bugs.python.org/issue12687  closed by pitrou

#12694: crlf.py script from Tools doesn't work with Python 3.2
http://bugs.python.org/issue12694  closed by r.david.murray

#12697: timeit documention still refers to the timeit.py script
http://bugs.python.org/issue12697  closed by python-dev

#12698: urllib does not use no_proxy when it has blanks
http://bugs.python.org/issue12698  closed by python-dev

#12699: strange behaviour of locale.getlocale() -> None, None
http://bugs.python.org/issue12699  closed by ned.deily

#12700: test_faulthandler fails on Mac OS X Lion
http://bugs.python.org/issue12700  closed by haypo

#12709: In multiprocessing, error_callback isn't documented for map_as
http://bugs.python.org/issue12709  closed by sandro.tosi

#12710: GTK crash
http://bugs.python.org/issue12710  closed by sandro.tosi

#12714: argparse.ArgumentParser.add_argument() documentation error
http://bugs.python.org/issue12714  closed by r.david.murray

#12717: ConfigParser._Chainmap error in 2.7.2
http://bugs.python.org/issue12717  closed by rhettinger

#12718: Logical mistake of importer method in logging.config.BaseConfi
http://bugs.python.org/issue12718  closed by vinay.sajip

#12719: Direct access to tp_dict can lead to stale attributes
http://bugs.python.org/issue12719  closed by python-dev

#12722: Link to heapq source from docs.python.org not working
http://bugs.python.org/issue12722  closed by python-dev

#12724: Add Py_RETURN_NOTIMPLEMENTED
http://bugs.python.org/issue12724  closed by brian.curtin

#12727: "make" always reruns asdl_c.py
http://bugs.python.org/issue12727  closed by python-dev

#11047: Bad description for an entry in whatsnew/2.7
http://bugs.python.org/issue11047  closed by python-dev

From barry at python.org  Fri Aug 12 18:19:23 2011
From: barry at python.org (Barry Warsaw)
Date: Fri, 12 Aug 2011 12:19:23 -0400
Subject: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the
 /usr/bin/python2 symlink upstream)
In-Reply-To: <CADiSq7eun0LfBD7VAqSRvjns3wpbZrCg4o6cN=X34Vu5T1UoTA@mail.gmail.com>
References: <CANaWP3zBo8cNWNHN=jxx_m3tUBk3k+vn+LYgqB+yimdTrzVxwA@mail.gmail.com>
	<4E43E8D0.40201@netwok.org>
	<20110811150022.02A192505A7@webabinitio.net>
	<4E43F156.8040008@netwok.org>
	<CADiSq7eun0LfBD7VAqSRvjns3wpbZrCg4o6cN=X34Vu5T1UoTA@mail.gmail.com>
Message-ID: <20110812121923.16216dd1@resist.wooz.org>

On Aug 12, 2011, at 01:10 PM, Nick Coghlan wrote:

>1. Accept the reality of that situation, and propose a mechanism that
>minimises the impact of the resulting ambiguity on end users of Python
>by allowing developers to be explicit about their target language.
>This is the approach advocated in PEP 394.
>
>2. Tell the Arch developers (and anyone else inclined to point the
>python name at python3) that they're wrong, and the python symlink
>should, now and forever, always refer to a version of Python 2.x.

FWIW, although I generally support the PEP, I also think that distros
themselves have a responsibility to ensure their #! lines are correct, for
scripts they install.  Meaning, if it requires rewriting the #! line on OS
package install, so be it.
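
(Concretely: a script that needs Python 2 would ship with

    #!/usr/bin/env python2

and a distro whose /usr/bin/python points elsewhere would rewrite that
line when it builds the OS package.)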

-Barry

From catch-all at masklinn.net  Fri Aug 12 18:51:10 2011
From: catch-all at masklinn.net (Xavier Morel)
Date: Fri, 12 Aug 2011 18:51:10 +0200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <4E44294F.5060005@molden.no>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no>
Message-ID: <4E01DE69-C873-48AE-B810-F1E467CBF792@masklinn.net>

On 2011-08-11, at 21:11 , Sturla Molden wrote:
> 
> (b) another threading model (e.g. one interpreter per thread, as in Tcl, Erlang, or .NET app domains).
Nitpick: this is not correct re. erlang.

While it is correct that it uses "another threading model" (one could even say "no threading model"), it's not a "one interpreter per thread" model at all:

* Erlang uses "erlang processes", which are very cheap preempted *processes* (no shared memory). There have always been tens to thousands to millions of erlang processes per interpreter

* A long time ago (before 2006 and the SMP VM, that was R11B) the erlang VM was single-threaded, so all those erlang processes ran in a single OS thread. To use multiple OS threads one had to create an erlang cluster (start multiple VMs and distribute spawned processes over those). However, this was already an m:n model, there were multiple erlang processes for each VM.

* Since the introduction of the SMP VM, the erlang interpreter can create multiple *schedulers* (one per physical core by default), with each scheduler running in its own OS thread. In this model, there's a single interpreter and an m:n mapping of erlang processes to OS threads within that single interpreter. (Interestingly, because -smp generates resource contention within the interpreter, going back to pre-SMP by setting the number of schedulers per node to 1 can yield increased overall performance.)

From merwok at netwok.org  Fri Aug 12 18:51:04 2011
From: merwok at netwok.org (Éric Araujo)
Date: Fri, 12 Aug 2011 18:51:04 +0200
Subject: [Python-Dev] Backporting howto/pyporting to 2.7
Message-ID: <4E4559F8.7040507@netwok.org>

Hi everyone,

I think it would be useful to have the "Porting Python 2 Code to Python
3" HOWTO in the 2.7 docs, as I think that a lot of users consult the 2.7
docs.  Is there any reason not to do it?

Regards

From rene at stranden.com  Fri Aug 12 18:57:10 2011
From: rene at stranden.com (Rene Nejsum)
Date: Fri, 12 Aug 2011 18:57:10 +0200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <20110812174226.0cd068b1@pitrou.net>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no> <j23dhn$9it$1@dough.gmane.org>
	<20110812174226.0cd068b1@pitrou.net>
Message-ID: <3F137782-3643-4077-92F7-519C55B921CC@stranden.com>

My two Danish kroner on GIL issues...

I think I understand the background and need for GIL. Without it Python programs would have been cluttered with lock/synchronized statements and C-extensions would be harder to write. Thanks to Sturla Molden for his explanation earlier in this thread.

However, the GIL is also from a time when single-threaded programs running on single-core CPUs were the common case.

On a new MacBook Pro I have 8 cores and would expect my multithreaded Python program to run significantly faster than on a one-core CPU.

Instead the program slows down to a much worse performance than on a one-core CPU. (Have a look at David Beazley's excellent talk at PyCon 2010 and his paper  http://www.dabeaz.com/GIL/ and http://blip.tv/carlfk/mindblowing-python-gil-2243379)

In my view, the multicore performance problem is the primary problem with the GIL, even though the other issues pointed out are valid.

I still believe that the solution for Python would be to have an "every object is a thread/coroutine" solution à la 

 - ABCL (http://en.wikipedia.org/wiki/Actor-Based_Concurrent_Language) and 
 - COOC (Concurrent Object Oriented C, (ftp://tsbgw.isl.rdc.toshiba.co.jp/pub/toshiba/cooc-beta.1.1.tar.Z) 

at least looked into as an alternative to an STM solution.

But, my head is not big enough to fully understand this :-)

kind regards
/rene




From glyph at twistedmatrix.com  Fri Aug 12 19:09:25 2011
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Fri, 12 Aug 2011 13:09:25 -0400
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
In-Reply-To: <20110812152512.112A53A406B@sparrow.telecommunity.com>
References: <4E43E9A6.7020608@netwok.org>
	<20110811113952.2e257351@resist.wooz.org>
	<41282FF3-4DAF-4996-B745-E0BEA477FB01@twistedmatrix.com>
	<20110812152512.112A53A406B@sparrow.telecommunity.com>
Message-ID: <0DA48AAD-78EE-496E-BF20-023B7A0868FD@twistedmatrix.com>


On Aug 12, 2011, at 11:24 AM, P.J. Eby wrote:

> That is, the above code hardcodes a variety of assumptions about the import system that haven't been true since Python 2.3.

Thanks for this feedback.  I honestly did not realize how old and creaky this code had gotten.  It was originally developed for Python 2.4 and it certainly shows its age.  Practically speaking, the code is correct for the bundled importers, and paths and zipfiles are all we've cared about thus far.

> (For example, it assumes that the contents of sys.path strings have inspectable semantics, that the contents of __file__ can tell you things about the module-ness or package-ness of a module object, etc.)

Unfortunately, the primary goal of this code is to do something impossible - walk the module hierarchy without importing any code.  So some heuristics are necessary.  Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute or anything gross like that to improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be.
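
Something along these lines, say (purely illustrative -- and as the
reply below points out, the real PEP 402 machinery should be asked for
the path rather than second-guessed):

    import os, sys

    def predicted_path(pkgname):
        # Guess the __path__ that would be synthesized for a virtual
        # package: each sys.path directory containing a matching
        # subdirectory contributes one entry.
        return [os.path.join(entry, pkgname)
                for entry in sys.path
                if os.path.isdir(os.path.join(entry, pkgname))]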

However, the isPackage() method can and should be looking at the module if it's already loaded, and not always guessing based on paths.  The whole reason there's an 'importPackages' flag to walk() is that some applications of this code care more about accuracy than others, so it tries to be as correct as it can be.

(Of course this is still wrong for the case where a __path__ is dynamically constructed by user code, but there's only so well one can do at that.)

> If you want to fully support PEP 302, you might want to consider making this a wrapper over the corresponding pkgutil APIs (available since Python 2.5) that do roughly the same things, but which delegate all path string inspection to importer objects and allow extensible delegation for importers that don't support the optional methods involved.

This code still needs to support Python 2.4, but I will make a note of this for future reference.

> (Of course, if the pkgutil APIs are missing something you need, perhaps you could propose additions.)

>> Now it seems like pure virtual packages are going to introduce a new type of special case into the hierarchy which have neither .pathEntry nor .filePath objects.
> 
> The problem is that your API's notion that these things exist as coherent concepts was never really a valid assumption in the first place.  .pth files and namespace packages already meant that the idea of a package coming from a single path entry made no sense.  And namespace packages installed by setuptools' system packaging mode *don't have a __file__ attribute* today...  heck they don't have __init__ modules, either.

The fact that getModule('sys') breaks is reason enough to re-visit some of these design decisions.

> So, adding virtual packages isn't actually going to change anything, except perhaps by making these scenarios more common.

In that case, I guess it's a good thing; these bugs should be dealt with.  Thanks for pointing them out.  My opinion of PEP 402 has been completely reversed - although I'd still like to see a section about the module system from a library/tools author point of view rather than a time-traveling perl user's narrative :).


From merwok at netwok.org  Fri Aug 12 19:17:20 2011
From: merwok at netwok.org (Éric Araujo)
Date: Fri, 12 Aug 2011 19:17:20 +0200
Subject: [Python-Dev] [Python-checkins] cpython (3.2): Use real word in
 English text (i.e. not code)
In-Reply-To: <4E455C8A.4030104@udel.edu>
References: <E1QruDF-00087v-KO@dinsdale.python.org> <4E455C8A.4030104@udel.edu>
Message-ID: <4E456020.2020904@netwok.org>

Hi,

>> summary:
>>    Use real word in English text (i.e. not code)
> I agree that 'arg' for 'argument' is email/twitter-speak, not proper 
> document prose.

>> -   :synopsis: Command-line option and argument-parsing library.
>> +   :synopsis: Command-line option and argument parsing library.
> However, 'argument-parsing' could/should be left hyphenated as a 
> compound adjective for the same reason 'command-line' is.

With all due respect to the fact that you're a native speaker and I'm
not, here I disagree because I parse the sentence in this way (using
parens to group things by precedence, if you want):

(((command-line (option and argument)) parsing) library)

To paraphrase, it's a library to parse options and arguments from the
command line, not a library to parse arguments and (missing verb-ing)
options from the command line.  (I'm not sure I'm clear.)

> An arg you missed
Yes, I looked for all instances of args but not arg.  Will do.

Regards

From rdmurray at bitdance.com  Fri Aug 12 19:34:49 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Fri, 12 Aug 2011 13:34:49 -0400
Subject: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the
	/usr/bin/python2 symlink upstream)
In-Reply-To: <20110812121923.16216dd1@resist.wooz.org>
References: <CANaWP3zBo8cNWNHN=jxx_m3tUBk3k+vn+LYgqB+yimdTrzVxwA@mail.gmail.com>
	<4E43E8D0.40201@netwok.org>
	<20110811150022.02A192505A7@webabinitio.net>
	<4E43F156.8040008@netwok.org>
	<CADiSq7eun0LfBD7VAqSRvjns3wpbZrCg4o6cN=X34Vu5T1UoTA@mail.gmail.com>
	<20110812121923.16216dd1@resist.wooz.org>
Message-ID: <20110812173449.8C9E22505A7@webabinitio.net>

On Fri, 12 Aug 2011 12:19:23 -0400, Barry Warsaw <barry at python.org> wrote:
> On Aug 12, 2011, at 01:10 PM, Nick Coghlan wrote:
> >1. Accept the reality of that situation, and propose a mechanism that
> >minimises the impact of the resulting ambiguity on end users of Python
> >by allowing developers to be explicit about their target language.
> >This is the approach advocated in PEP 394.
> >
> >2. Tell the Arch developers (and anyone else inclined to point the
> >python name at python3) that they're wrong, and the python symlink
> >should, now and forever, always refer to a version of Python 2.x.
> 
> FWIW, although I generally support the PEP, I also think that distros
> themselves have a responsibility to ensure their #! lines are correct, for
> scripts they install.  Meaning, if it requires rewriting the #! line on OS
> package install, so be it.

True, but I think that is orthogonal to the purposes of the PEP, which
is about supporting writing of system independent scripts that are *not*
provided by the distribution (or installed via packaging).  And PEP 397
aims to extend that to Windows, as well.

--
R. David Murray           http://www.bitdance.com

From barry at python.org  Fri Aug 12 19:37:14 2011
From: barry at python.org (Barry Warsaw)
Date: Fri, 12 Aug 2011 13:37:14 -0400
Subject: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the
 /usr/bin/python2 symlink upstream)
In-Reply-To: <20110812173449.8C9E22505A7@webabinitio.net>
References: <CANaWP3zBo8cNWNHN=jxx_m3tUBk3k+vn+LYgqB+yimdTrzVxwA@mail.gmail.com>
	<4E43E8D0.40201@netwok.org>
	<20110811150022.02A192505A7@webabinitio.net>
	<4E43F156.8040008@netwok.org>
	<CADiSq7eun0LfBD7VAqSRvjns3wpbZrCg4o6cN=X34Vu5T1UoTA@mail.gmail.com>
	<20110812121923.16216dd1@resist.wooz.org>
	<20110812173449.8C9E22505A7@webabinitio.net>
Message-ID: <20110812133714.4a56ee98@resist.wooz.org>

On Aug 12, 2011, at 01:34 PM, R. David Murray wrote:

>True, but I think that is orthogonal to the purposes of the PEP, which
>is about supporting writing of system independent scripts that are *not*
>provided by the distribution (or installed via packaging).  And PEP 397
>aims to extend that to Windows, as well.

Yep, agreed.  It probably should also inform #! transformations that pysetup
could do.

-Barry

From pje at telecommunity.com  Fri Aug 12 20:33:47 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 12 Aug 2011 14:33:47 -0400
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
In-Reply-To: <0DA48AAD-78EE-496E-BF20-023B7A0868FD@twistedmatrix.com>
References: <4E43E9A6.7020608@netwok.org>
	<20110811113952.2e257351@resist.wooz.org>
	<41282FF3-4DAF-4996-B745-E0BEA477FB01@twistedmatrix.com>
	<20110812152512.112A53A406B@sparrow.telecommunity.com>
	<0DA48AAD-78EE-496E-BF20-023B7A0868FD@twistedmatrix.com>
Message-ID: <20110812183406.7060D3A406B@sparrow.telecommunity.com>

At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
>Upon further reflection, PEP 402 _will_ make dealing with namespace 
>packages from this code considerably easier: we won't need to do AST 
>analysis to look for a __path__ attribute or anything gross like 
>that to improve correctness; we can just look in various directories on 
>sys.path and accurately predict what __path__ will be synthesized to be.

The flip side of that is that you can't always know whether a 
directory is a virtual package without deep inspection: one 
consequence of PEP 402 is that any directory that contains a Python 
module (of whatever type), however deeply nested, will be a valid 
package name.  So, you can't rule out that a given directory *might* 
be a package, without walking its entire reachable subtree.  (Within 
the subset of directory names that are valid Python identifiers, of course.)

However, you *can* quickly tell that a directory *might* be a package 
or is *probably* one: if it contains modules, or is the same name as 
an already-discovered module, it's a pretty safe bet that you can 
flag it as such.
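
A rough sketch of that quick check (illustrative only, not code from
the PEP):

    import os

    def might_be_package(path):
        # Cheap positive signal only: a directory directly containing
        # something module-shaped is probably a (virtual) package.
        # Ruling a directory *out* still requires walking its subtree.
        for name in os.listdir(path):
            if name.endswith(('.py', '.pyc', '.so')):
                return True
        return False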

In any case, you probably should *not* do the building of a virtual 
path yourself; the protocols and APIs added by PEP 402 should allow 
you to simply ask for the path to be constructed on your 
behalf.  Otherwise, you are going to be back in the same business of 
second-guessing arbitrary importer backends again!

(E.g. note that PEP 402 does not say virtual package subpaths must be 
filesystem or zipfile subdirectories of their parents - an importer 
could just as easily allow you to treat subdirectories named 
'twisted.python' as part of a virtual package with that name!)

Anyway, pkgutil defines some extra methods that importers can 
implement to support module-walking, and part of the PEP 402 
implementation should be to make this support virtual packages as well.


>This code still needs to support Python 2.4, but I will make a note 
>of this for future reference.

A suggestion: just take the pkgutil code and bundle it for Python 2.4 
as something._pkgutil.  There's very little about it that's 2.5+ 
specific, at least when I wrote the bits that do the module walking.

Of course, the main disadvantage of pkgutil for your purposes is that 
it currently requires packages to be imported in order to walk their 
child modules.  (IIRC, it does *not*, however, require them to be 
imported in order to discover their existence.)


>In that case, I guess it's a good thing; these bugs should be dealt 
>with.  Thanks for pointing them out.  My opinion of PEP 402 has been 
>completely reversed - although I'd still like to see a section about 
>the module system from a library/tools author point of view rather 
>than a time-traveling perl user's narrative :).

LOL.

If you will propose the wording you'd like to see, I'll be happy to 
check it for any current-and-or-future incorrect assumptions.  ;-)


From fdrake at acm.org  Fri Aug 12 20:42:24 2011
From: fdrake at acm.org (Fred Drake)
Date: Fri, 12 Aug 2011 14:42:24 -0400
Subject: [Python-Dev] [Python-checkins] cpython (3.2): Use real word in
 English text (i.e. not code)
In-Reply-To: <4E456020.2020904@netwok.org>
References: <E1QruDF-00087v-KO@dinsdale.python.org> <4E455C8A.4030104@udel.edu>
	<4E456020.2020904@netwok.org>
Message-ID: <CAFT4OTFHZZmOdbbFMKuusW7no8gcoyx53VuZpXFR8c7EHTO=mQ@mail.gmail.com>

I think either

    Command-line option- and argument-parsing library.

or

    Command-line option and argument parsing library.

would be acceptable.


  -Fred

-- 
Fred L. Drake, Jr.    <fdrake at acm.org>
"A person who won't read has no advantage over one who can't read."
   --Samuel Langhorne Clemens

From iacobcatalin at gmail.com  Fri Aug 12 20:56:19 2011
From: iacobcatalin at gmail.com (Catalin Iacob)
Date: Fri, 12 Aug 2011 20:56:19 +0200
Subject: [Python-Dev] Review request issue 12178
Message-ID: <CAHg_5gp8tR7f8pi_Z2pbuYL_k3oW1-FD-uLbozD=xS8_-q7UEw@mail.gmail.com>

Could a core developer please review the patch I proposed for issue
12178 "csv writer doesn't escape escapechar"?

Thanks!

From sturla at molden.no  Fri Aug 12 20:59:42 2011
From: sturla at molden.no (Sturla Molden)
Date: Fri, 12 Aug 2011 20:59:42 +0200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <4E01DE69-C873-48AE-B810-F1E467CBF792@masklinn.net>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no>
	<4E01DE69-C873-48AE-B810-F1E467CBF792@masklinn.net>
Message-ID: <4E45781E.2040608@molden.no>

On 12.08.2011 18:51, Xavier Morel wrote:
> * Erlang uses "erlang processes", which are very cheap preempted 
> *processes* (no shared memory). There have always been tens to 
> thousands to millions of erlang processes per interpreter [...] 
> (because -smp generates resource contention within the interpreter, 
> going back to pre-SMP by setting the number of schedulers per node 
> to 1 can yield increased overall performance)

Technically, one can make threads behave like processes if they don't 
share memory pages (though they will still share address space). Erlang's 
use of 'process' instead of 'thread' does not mean an Erlang process has 
to be implemented as an OS process. With one interpreter per thread, and 
a malloc that does not let threads share memory pages (one heap per 
thread), Python could do the same.

On Windows, there is an API function called HeapAlloc, which lets us 
allocate memory from a dedicated heap. The common use case is to prevent 
threads from sharing memory, thus behaving like light-weight processes 
(except address space is shared). On Unix, it is more common to use 
fork() to create new processes instead, as processes are more 
light-weight than on Windows.
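
(A rough, Windows-only ctypes sketch of the idea -- untested, and the
per-thread bookkeeping is omitted:

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL('kernel32')
    kernel32.HeapCreate.restype = wintypes.HANDLE
    kernel32.HeapCreate.argtypes = [wintypes.DWORD,
                                    ctypes.c_size_t, ctypes.c_size_t]
    kernel32.HeapAlloc.restype = ctypes.c_void_p
    kernel32.HeapAlloc.argtypes = [wintypes.HANDLE, wintypes.DWORD,
                                   ctypes.c_size_t]

    heap = kernel32.HeapCreate(0, 0, 0)        # private, growable heap
    block = kernel32.HeapAlloc(heap, 0, 4096)  # 4 KiB from that heap only

Each thread would create its own heap and allocate from it exclusively.)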

Sturla








From sturla at molden.no  Fri Aug 12 20:36:37 2011
From: sturla at molden.no (Sturla Molden)
Date: Fri, 12 Aug 2011 20:36:37 +0200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <3F137782-3643-4077-92F7-519C55B921CC@stranden.com>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no> <j23dhn$9it$1@dough.gmane.org>
	<20110812174226.0cd068b1@pitrou.net>
	<3F137782-3643-4077-92F7-519C55B921CC@stranden.com>
Message-ID: <4E4572B5.4070109@molden.no>

On 12.08.2011 18:57, Rene Nejsum wrote:
> My two Danish kroner on GIL issues...
>
> I think I understand the background and need for GIL. Without it 
> Python programs would have been cluttered with lock/synchronized 
> statements and C-extensions would be harder to write. Thanks to Sturla 
> Molden for his explanation earlier in this thread.

It doesn't seem I managed to explain it :(

Yes, C extensions would be cluttered with synchronization statements, 
and that is annoying. But that was not my point at all!

Even with fine-grained locking in place, a system using reference 
counting will not scale on an multi-processor computer. Cache-lines 
containing reference counts will become incoherent between the 
processors, causing traffic jam on the memory bus.

The technical term in parallel computing literature is "false sharing".


> However, the GIL is also from a time when single-threaded programs 
> running on single-core CPUs were the common case.
>
> On a new MacBook Pro I have 8 cores and would expect my multithreaded 
> Python program to run significantly faster than on a one-core CPU.
>
> Instead the program slows down to a much worse performance than on a 
> one-core CPU.

A multi-threaded program can be slower on a multi-processor computer as 
well, if it suffers from extensive "false sharing" (which Python 
programs nearly always do).

That is, instead of doing useful work, the processors are stepping on 
each other's toes. So they spend the bulk of the time synchronizing cache 
lines with RAM instead of computing.

On a computer with a single processor, there cannot be any false 
sharing. So even without a GIL, a multi-threaded program can often run 
faster on a single-processor computer. That might seem counter-intuitive 
at first. I have seen this "inverse scaling" blamed on the GIL many times, 
but it's dead wrong.

Multi-threading is hard to get right, because the programmer must ensure 
that processors don't access the same cache lines. This is one of the 
reasons why numerical programs based on MPI (multiple processes and IPC) 
are likely to perform better than numerical programs based on OpenMP 
(multiple threads and shared memory).

As for Python, it means that it is easier to make a program based on 
multiprocessing scale well on a multi-processor computer, than a program 
based on threading and releasing the GIL. And that has nothing to do 
with the GIL! Still, I'd estimate 99% of Python programmers would blame 
it on the GIL. It has to do with what shared memory does if cache lines 
are shared. Intuition about what affects the performance of a 
multi-threaded program is very often wrong. If one needs parallel 
computing, multiple processes is much more likely to scale correctly. 
Threads are better reserved for things like non-blocking I/O.
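
(To make that concrete, a minimal sketch of the multiprocessing style --
separate address spaces, so no shared reference counts or cache lines:

    from multiprocessing import Pool

    def work(n):
        # CPU-bound; each call runs in a separate worker process.
        return sum(i * i for i in range(n))

    if __name__ == '__main__':
        pool = Pool(4)
        print(sum(pool.map(work, [10 ** 6] * 8)))
        pool.close()
        pool.join()

The same loop run from threads would contend on the GIL and on shared
cache lines instead of scaling.)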

The problem with the GIL is merely what people think it does -- not what 
it actually does. It is so easy to blame a performance issue on the GIL, 
when it is actually the use of threads and shared memory per se that is 
the problem.

Sturla

From aaron at agoragames.com  Fri Aug 12 21:17:59 2011
From: aaron at agoragames.com (Aaron Westendorf)
Date: Fri, 12 Aug 2011 15:17:59 -0400
Subject: [Python-Dev] GIL removal question
In-Reply-To: <4E45781E.2040608@molden.no>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no>
	<4E01DE69-C873-48AE-B810-F1E467CBF792@masklinn.net>
	<4E45781E.2040608@molden.no>
Message-ID: <CAHiEoM5CebEk4cJfxDcoYCOvuvFOH99tCgT_hdF0zdUJeBtbRw@mail.gmail.com>

Even in the Erlang model, the aforementioned issues of bus contention put a
cap on the number of threads you can run in any given application assuming
there's any amount of cross-thread synchronization. I wrote a blog post on
this subject with respect to my experience in tuning RabbitMQ on NUMA
architectures.

http://blog.agoragames.com/blog/2011/06/24/of-penguins-rabbits-and-buses/

It should be noted that Erlang processes are not the same as OS processes.
They are more akin to green threads, scheduled on N number of legit OS
threads which are in turn run on C number of cores. The end effect is the
same though, as the data is effectively shared across NUMA nodes, which runs
into basic physical constraints.

I used to think the GIL was a major bottleneck, and though I'm not fond of
it, my recent experience has highlighted that *any* application which uses
shared memory will have significant bus contention when scaling across all
cores. The best course of action is shared-nothing MPI style, but in 64bit
land, that can mean significant wasted address space.

-Aaron


On Fri, Aug 12, 2011 at 2:59 PM, Sturla Molden <sturla at molden.no> wrote:

> On 12.08.2011 18:51, Xavier Morel wrote:
>
>> * Erlang uses "erlang processes", which are very cheap preempted
>> *processes* (no shared memory). There have always been tens to thousands to
>> millions of erlang processes per interpreter [...] (because -smp generates
>> resource contention within the interpreter, going back to pre-SMP by setting
>> the number of schedulers per node to 1 can yield increased overall performance)
>>
>
> Technically, one can make threads behave like processes if they don't share
> memory pages (though they will still share address space). Erlang's use of
> 'process' instead of 'thread' does not mean an Erlang process has to be
> implemented as an OS process. With one interpreter per thread, and a malloc
> that does not let threads share memory pages (one heap per thread), Python
> could do the same.
>
> On Windows, there is an API function called HeapAlloc, which lets us
> allocate memory from a dedicated heap. The common use case is to prevent
> threads from sharing memory, thus behaving like light-weight processes
> (except address space is shared). On Unix, it is more common to use fork()
> to create new processes instead, as processes are more light-weight than on
> Windows.
>
> Sturla
>
>

From a.badger at gmail.com  Fri Aug 12 21:22:17 2011
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 12 Aug 2011 12:22:17 -0700
Subject: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the
 /usr/bin/python2 symlink upstream)
In-Reply-To: <20110812121923.16216dd1@resist.wooz.org>
References: <CANaWP3zBo8cNWNHN=jxx_m3tUBk3k+vn+LYgqB+yimdTrzVxwA@mail.gmail.com>
	<4E43E8D0.40201@netwok.org>
	<20110811150022.02A192505A7@webabinitio.net>
	<4E43F156.8040008@netwok.org>
	<CADiSq7eun0LfBD7VAqSRvjns3wpbZrCg4o6cN=X34Vu5T1UoTA@mail.gmail.com>
	<20110812121923.16216dd1@resist.wooz.org>
Message-ID: <20110812192217.GR5771@unaka.lan>

On Fri, Aug 12, 2011 at 12:19:23PM -0400, Barry Warsaw wrote:
> On Aug 12, 2011, at 01:10 PM, Nick Coghlan wrote:
> 
> >1. Accept the reality of that situation, and propose a mechanism that
> >minimises the impact of the resulting ambiguity on end users of Python
> >by allowing developers to be explicit about their target language.
> >This is the approach advocated in PEP 394.
> >
> >2. Tell the Arch developers (and anyone else inclined to point the
> >python name at python3) that they're wrong, and the python symlink
> >should, now and forever, always refer to a version of Python 2.x.
> 
> FWIW, although I generally support the PEP, I also think that distros
> themselves have a responsibility to ensure their #! lines are correct, for
> scripts they install.  Meaning, if it requires rewriting the #! line on OS
> package install, so be it.
> 
+1 with the one caveat... it's nice to upstream fixes.  If there's a simple
thing like python == python-2 and python3 == python-3 everywhere, this is
possible.  If there's something like python2 == python-2 and python-3 ==
python3 everywhere, this is also possible.  The problem is that the latter
is not the case (python from python.org itself doesn't produce a python2
symlink on install) and historically the former was the case, but since
python-dev rejected the notion that python == python-2 that is no longer true.

As long as it's just Arch, there's still time to go with #2.  #1 is not
a complete solution (especially because /usr/bin/python2 will never exist on
some historical systems [not ones I run though, so someone else will need to
beat that horse :-)]) but is better than where we are now where there is no
guidance on what's right and wrong at all.

-Toshio

From catch-all at masklinn.net  Fri Aug 12 22:17:22 2011
From: catch-all at masklinn.net (Xavier Morel)
Date: Fri, 12 Aug 2011 22:17:22 +0200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <4E45781E.2040608@molden.no>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no>
	<4E01DE69-C873-48AE-B810-F1E467CBF792@masklinn.net>
	<4E45781E.2040608@molden.no>
Message-ID: <C2DBF894-C2C0-4400-9E5F-319AB6C835F4@masklinn.net>


On 2011-08-12, at 20:59 , Sturla Molden wrote:

> On 12.08.2011 18:51, Xavier Morel wrote:
>> * Erlang uses "erlang processes", which are very cheap preempted *processes* (no shared memory). There have always been tens to thousands to millions of erlang processes per interpreter [...] (because -smp generates resource contention within the interpreter, going back to pre-SMP by setting the number of schedulers per node to 1 can yield increased overall performance)
> 
> Technically, one can make threads behave like processes if they don't share memory pages (though they will still share address space). Erlang's use of 'process' instead of 'thread' does not mean an Erlang process has to be implemented as an OS process.
Of course not. I did not write anything implying that.

> With one interpreter per thread, and a malloc that does not let threads share memory pages (one heap per thread), Python could do the same.
Again, my point is that Erlang does not work "with one interpreter per thread". Which was your claim.


From glyph at twistedmatrix.com  Fri Aug 12 23:03:47 2011
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Fri, 12 Aug 2011 17:03:47 -0400
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
In-Reply-To: <20110812183406.7060D3A406B@sparrow.telecommunity.com>
References: <4E43E9A6.7020608@netwok.org>
	<20110811113952.2e257351@resist.wooz.org>
	<41282FF3-4DAF-4996-B745-E0BEA477FB01@twistedmatrix.com>
	<20110812152512.112A53A406B@sparrow.telecommunity.com>
	<0DA48AAD-78EE-496E-BF20-023B7A0868FD@twistedmatrix.com>
	<20110812183406.7060D3A406B@sparrow.telecommunity.com>
Message-ID: <08D6CFE0-3039-4DAF-944F-0F0DEFFFD87A@twistedmatrix.com>


On Aug 12, 2011, at 2:33 PM, P.J. Eby wrote:

> At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
>> Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute or anything gross like that to improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be.
> 
> The flip side of that is that you can't always know whether a directory is a virtual package without deep inspection: one consequence of PEP 402 is that any directory that contains a Python module (of whatever type), however deeply nested, will be a valid package name.  So, you can't rule out that a given directory *might* be a package, without walking its entire reachable subtree.  (Within the subset of directory names that are valid Python identifiers, of course.)

Are there any rules about passing invalid identifiers to __import__ though, or is that just less likely? :)

> However, you *can* quickly tell that a directory *might* be a package or is *probably* one: if it contains modules, or is the same name as an already-discovered module, it's a pretty safe bet that you can flag it as such.

I still like the idea of a 'marker' file.  It would be great if there were a new marker like "__package__.py".  I say this more for the benefit of users looking at a directory on their filesystem and trying to understand whether this is a package or not than I do for my own programmatic tools though; it's already hard enough to understand the package-ness of a part of your filesystem and its interactions with PYTHONPATH; making directories mysteriously and automatically become packages depending on context will worsen that situation, I think.

I also have this not-terribly-well-defined idea that it would be handy for different providers of the _contents_ of namespace packages to provide their own instrumentation to be made aware that they've been added to the __path__ of a particular package.  This may be a solution in search of a problem, but I imagine that each __package__.py would be executed in the same module namespace.  This would allow namespace packages to do things like set up compatibility aliases, lazy imports, plugin registrations, etc, as they currently do with __init__.py.  Perhaps it would be better to define its relationship to the package-module namespace in a more sensible way than "execute all over each other in no particular order".

Also, if I had my druthers, Python would raise an exception if someone added a directory marked as a package to sys.path, to refuse to import things from it, and when a submodule was run as a script, add the nearest directory not marked as a package to sys.path, rather than the script's directory itself.  The whole "__name__ is wrong because your current directory was wrong when you ran that command" thing is so confusing to explain that I hope we can eventually consign it to the dustbin of history.  But if you can't even reasonably guess whether a directory is supposed to be an entry on sys.path or a package, that's going to be really hard to do.

> In any case, you probably should *not* do the building of a virtual path yourself; the protocols and APIs added by PEP 402 should allow you to simply ask for the path to be constructed on your behalf.  Otherwise, you are going to be back in the same business of second-guessing arbitrary importer backends again!

What do you mean "building of a virtual path"?

> (E.g. note that PEP 402 does not say virtual package subpaths must be filesystem or zipfile subdirectories of their parents - an importer could just as easily allow you to treat subdirectories named 'twisted.python' as part of a virtual package with that name!)
> 
> Anyway, pkgutil defines some extra methods that importers can implement to support module-walking, and part of the PEP 402 implementation should be to make this support virtual packages as well.

The more that this can focus on module-walking without executing code, the happier I'll be :).

>> This code still needs to support Python 2.4, but I will make a note of this for future reference.
> 
> A suggestion: just take the pkgutil code and bundle it for Python 2.4 as something._pkgutil.  There's very little about it that's 2.5+ specific, at least when I wrote the bits that do the module walking.
> 
> Of course, the main disadvantage of pkgutil for your purposes is that it currently requires packages to be imported in order to walk their child modules.  (IIRC, it does *not*, however, require them to be imported in order to discover their existence.)

One of the stipulations of this code is that it might give different results when the modules are loaded and not.  So it's fine to inspect that first and then invoke pkgutil only in the 'loaded' case, with the knowledge that the not-loaded case may be incorrect in the face of certain configurations.

>> In that case, I guess it's a good thing; these bugs should be dealt with.  Thanks for pointing them out.  My opinion of PEP 402 has been completely reversed - although I'd still like to see a section about the module system from a library/tools author point of view rather than a time-traveling perl user's narrative :).
> 
> LOL.
> 
> If you will propose the wording you'd like to see, I'll be happy to check it for any current-and-or-future incorrect assumptions.  ;-)

If I can come up with anything I will definitely send it along.

-glyph

From guido at python.org  Fri Aug 12 23:38:25 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 12 Aug 2011 17:38:25 -0400
Subject: [Python-Dev] GIL removal question
In-Reply-To: <3F137782-3643-4077-92F7-519C55B921CC@stranden.com>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no> <j23dhn$9it$1@dough.gmane.org>
	<20110812174226.0cd068b1@pitrou.net>
	<3F137782-3643-4077-92F7-519C55B921CC@stranden.com>
Message-ID: <CAP7+vJK2+JgNsRxvUB+GfRQkgXC-ThTDYqPq1FYhPbrWAe05vg@mail.gmail.com>

On Fri, Aug 12, 2011 at 12:57 PM, Rene Nejsum <rene at stranden.com> wrote:
> I think I understand the background and need for GIL. Without it Python
> programs would have been cluttered with lock/synchronized statements and
> C-extensions would be harder to write.

No, sorry, the first half of this is incorrect: with or without the
GIL *Python* code would need the same amount of fine-grained locking.
(The part about C extensions is correct.) I am butting in because this
is a common misunderstanding that really needs to be squashed whenever
it is aired -- the GIL does *not* help Python code to synchronize. A
thread-switch can occur between any two bytecode opcodes. Without the
GIL, atomic operations (e.g. dict lookups that don't require
evaluation of __eq__ or __hash__ implemented in Python) are still
supposed to be atomic.
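
The classic demonstration, as a sketch:

    import threading

    counter = 0

    def bump(n):
        global counter
        for _ in range(n):
            counter += 1   # read-modify-write: a switch can land between
                           # the read and the write, GIL or no GIL

    threads = [threading.Thread(target=bump, args=(100000,))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)   # typically well below 400000 without a lock

The GIL keeps the interpreter's own state consistent; it does nothing
for the consistency of *your* state.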

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Fri Aug 12 23:42:39 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 12 Aug 2011 17:42:39 -0400
Subject: [Python-Dev] [PEPs] Rebooting PEP 394 (aka Support the
 /usr/bin/python2 symlink upstream)
In-Reply-To: <j21jmr$vd9$1@dough.gmane.org>
References: <CANaWP3zBo8cNWNHN=jxx_m3tUBk3k+vn+LYgqB+yimdTrzVxwA@mail.gmail.com>
	<4E43E8D0.40201@netwok.org> <j21jmr$vd9$1@dough.gmane.org>
Message-ID: <CAP7+vJKcbCH_Ch18B92iLiOC=Q4iEk+y1UGzL_Gj0G+Zbgf-5A@mail.gmail.com>

On Thu, Aug 11, 2011 at 6:05 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> There was no comparable transition. Python 2.0 was basically 1.6 renamed for
> a different distributor.

No, that's not true. If you compare the "what's new" sections there is
quite a large difference between 1.6 and 2.0, despite being released
simultaneously.

> I regard Python 2.2, which introduced new-style, as
> the beginning of Python 2 as something significantly different from Python
> 1.

Just compare:

http://www.python.org/download/releases/2.0/
http://www.python.org/download/releases/1.6/

No argument that 2.2 was a big jump for the type system -- but not for Unicode.

> I suppose one could also point to the earlier intro of unicode.

In 1.6. (But internally we called it the "contractual obligation
release", a Monty Python reference.)

> The new
> iterator protocol was also a major change. In any case, back compatibility
> was kept in all three respects (and others) until Python 3.

(I gotta go, but I don't think it was such a big deal -- it was very
carefully made backwards compatible.)

-- 
--Guido van Rossum (python.org/~guido)

From pje at telecommunity.com  Sat Aug 13 00:42:31 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 12 Aug 2011 18:42:31 -0400
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
In-Reply-To: <08D6CFE0-3039-4DAF-944F-0F0DEFFFD87A@twistedmatrix.com>
References: <4E43E9A6.7020608@netwok.org>
	<20110811113952.2e257351@resist.wooz.org>
	<41282FF3-4DAF-4996-B745-E0BEA477FB01@twistedmatrix.com>
	<20110812152512.112A53A406B@sparrow.telecommunity.com>
	<0DA48AAD-78EE-496E-BF20-023B7A0868FD@twistedmatrix.com>
	<20110812183406.7060D3A406B@sparrow.telecommunity.com>
	<08D6CFE0-3039-4DAF-944F-0F0DEFFFD87A@twistedmatrix.com>
Message-ID: <20110812224246.212FA3A406B@sparrow.telecommunity.com>

At 05:03 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
>Are there any rules about passing invalid identifiers to __import__ 
>though, or is that just less likely? :)

I suppose you have a point there.  ;-)


>I still like the idea of a 'marker' file.  It would be great if 
>there were a new marker like "__package__.py".

Having any required marker file makes separately-installable portions 
of a package impossible, since the marker file would then conflict at 
installation time.

The (semi-)competing proposal, PEP 382, is based on allowing each 
portion to have a differently-named marker; we came up with PEP 402 
as a way to get rid of the need for any marker files (not to mention 
the bikeshedding involved).


>What do you mean "building of a virtual path"?

Constructing the __path__-to-be of a not-yet-imported virtual 
package.  The PEP defines a protocol for constructing this, by asking 
the importer objects to provide __path__ entries, and it does not 
require anything to be imported.  So there's no reason to 
re-implement the algorithm yourself.


>The more that this can focus on module-walking without executing 
>code, the happier I'll be :).

Virtual packages actually improve on this situation, in that a 
virtual path can be computed without the need to import the 
package.  (Assuming a submodule or subpackage doesn't munge the 
__path__, of course.)


From rene at stranden.com  Sat Aug 13 00:51:40 2011
From: rene at stranden.com (Rene Nejsum)
Date: Sat, 13 Aug 2011 00:51:40 +0200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <CAP7+vJK2+JgNsRxvUB+GfRQkgXC-ThTDYqPq1FYhPbrWAe05vg@mail.gmail.com>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no> <j23dhn$9it$1@dough.gmane.org>
	<20110812174226.0cd068b1@pitrou.net>
	<3F137782-3643-4077-92F7-519C55B921CC@stranden.com>
	<CAP7+vJK2+JgNsRxvUB+GfRQkgXC-ThTDYqPq1FYhPbrWAe05vg@mail.gmail.com>
Message-ID: <92060770-873B-4F54-B1FC-DB2840464A30@stranden.com>

Thank you for the clarification, I should have been more precise...

On 12/08/2011, at 23.38, Guido van Rossum wrote:

> On Fri, Aug 12, 2011 at 12:57 PM, Rene Nejsum <rene at stranden.com> wrote:
>> I think I understand the background and need for GIL. Without it Python
>> programs would have been cluttered with lock/synchronized statements and
>> C-extensions would be harder to write.
> 
> No, sorry, the first half of this is incorrect: with or without the
> GIL *Python* code would need the same amount of fine-grained locking.
> (The part about C extensions is correct.) I am butting in because this
> is a common misunderstanding that really needs to be squashed whenever
> it is aired -- the GIL does *not* help Python code to synchronize. A
> thread-switch can occur between any two bytecode opcodes. Without the
> GIL, atomic operations (e.g. dict lookups that don't require
> evaluation of __eq__ or __hash__ implemented in Python) are still
> supposed to be atomic.
> 
> -- 
> --Guido van Rossum (python.org/~guido)


From tjreedy at udel.edu  Sat Aug 13 03:36:24 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 12 Aug 2011 21:36:24 -0400
Subject: [Python-Dev] [Python-checkins] cpython (3.2): Use real word in
 English text (i.e. not code)
In-Reply-To: <4E456020.2020904@netwok.org>
References: <E1QruDF-00087v-KO@dinsdale.python.org>
	<4E455C8A.4030104@udel.edu> <4E456020.2020904@netwok.org>
Message-ID: <j24kf3$c01$1@dough.gmane.org>

On 8/12/2011 1:17 PM, Éric Araujo wrote:

> With all due respect to the fact that you're a native speaker and I'm
> not, here I disagree because I parse the sentence in this way (using
> parens to group things by precedence, if you want):

You are right, I misparsed without considering the full context. You 
actually mean "Command-line option-and-argument-parsing library."
But multiple compound-noun adjectives are awkward and the above is ugly. 
Would "Command-line library for parsing options and arguments" fit?

-- 
Terry Jan Reedy



From ben+python at benfinney.id.au  Sat Aug 13 03:50:41 2011
From: ben+python at benfinney.id.au (Ben Finney)
Date: Sat, 13 Aug 2011 11:50:41 +1000
Subject: [Python-Dev] [Python-checkins] cpython (3.2): Use real word in
	English text (i.e. not code)
References: <E1QruDF-00087v-KO@dinsdale.python.org>
	<4E455C8A.4030104@udel.edu> <4E456020.2020904@netwok.org>
	<j24kf3$c01$1@dough.gmane.org>
Message-ID: <87bovuf5am.fsf@benfinney.id.au>

Terry Reedy <tjreedy at udel.edu> writes:

> But multiple compound-noun adjectives are awkward and the above is ugly.
> Would "Command-line library for parsing options and arguments" fit?

Better, but the binding is still wrong. The "command-line" should
instead be a modifier for "options and arguments".

So:

    Library for parsing command-line options and arguments

-- 
 \               "Please to bathe inside the tub." --hotel room, Japan |
  `\                                                                   |
_o__)                                                                  |
Ben Finney


From greg.ewing at canterbury.ac.nz  Sat Aug 13 02:31:52 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 13 Aug 2011 12:31:52 +1200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <4E45781E.2040608@molden.no>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no>
	<4E01DE69-C873-48AE-B810-F1E467CBF792@masklinn.net>
	<4E45781E.2040608@molden.no>
Message-ID: <4E45C5F8.6060107@canterbury.ac.nz>

Sturla Molden wrote:
> With one interpreter per thread, and 
> a malloc that does not let threads share memory pages (one heap per 
> thread), Python could do the same.

Wouldn't that be more or less equivalent to running
each thread in a separate process?

-- 
Greg

From stefan_ml at behnel.de  Sat Aug 13 08:12:10 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 13 Aug 2011 08:12:10 +0200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <CAP7+vJK2+JgNsRxvUB+GfRQkgXC-ThTDYqPq1FYhPbrWAe05vg@mail.gmail.com>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>	<4E44294F.5060005@molden.no>
	<j23dhn$9it$1@dough.gmane.org>	<20110812174226.0cd068b1@pitrou.net>	<3F137782-3643-4077-92F7-519C55B921CC@stranden.com>
	<CAP7+vJK2+JgNsRxvUB+GfRQkgXC-ThTDYqPq1FYhPbrWAe05vg@mail.gmail.com>
Message-ID: <j254jq$el6$1@dough.gmane.org>

Guido van Rossum, 12.08.2011 23:38:
> On Fri, Aug 12, 2011 at 12:57 PM, Rene Nejsum wrote:
>> I think I understand the background and need for GIL. Without it Python
>> programs would have been cluttered with lock/synchronized statements and
>> C-extensions would be harder to write.
>
> No, sorry, the first half of this is incorrect: with or without the
> GIL *Python* code would need the same amount of fine-grained locking.
> (The part about C extensions is correct.) I am butting in because this
> is a common misunderstanding that really needs to be squashed whenever
> it is aired -- the GIL does *not* help Python code to synchronize. A
> thread-switch can occur between any two bytecode opcodes. Without the
> GIL, atomic operations (e.g. dict lookups that don't require
> evaluation of __eq__ or __hash__ implemented in Python) are still
> supposed to be atomic.

And in this context, it's worth mentioning that even C code can be bitten
by the GIL being temporarily released when it calls back into the
interpreter. Only plain C code sequences are guaranteed to keep the GIL
held; that includes many (but not all) calls to the C-API.
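
A minimal illustration (a sketch, not from any real module; the names
are illustrative):

    #include <Python.h>

    static long counter = 0;

    static void
    update(PyObject *obj)
    {
        counter++;       /* plain C: no thread switch can happen here */
        Py_DECREF(obj);  /* may run a __del__ written in Python, so the */
                         /* GIL may be yielded and other threads may run */
    }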

Stefan


From g.brandl at gmx.net  Sat Aug 13 08:23:18 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 13 Aug 2011 08:23:18 +0200
Subject: [Python-Dev]
 =?windows-1252?q?cpython_=283=2E2=29=3A_Fix_find_com?=
 =?windows-1252?q?mand_in_makefile_=93funny=94_target?=
In-Reply-To: <E1QruDE-00087r-GV@dinsdale.python.org>
References: <E1QruDE-00087r-GV@dinsdale.python.org>
Message-ID: <j2558i$fs5$1@dough.gmane.org>

On 08/12/11 18:03, eric.araujo wrote:
> http://hg.python.org/cpython/rev/1b818f3639ef
> changeset:   71826:1b818f3639ef
> branch:      3.2
> parent:      71823:8032ea4c3619
> user:        Éric Araujo <merwok at netwok.org>
> date:        Wed Aug 10 02:01:32 2011 +0200
> summary:
>   Fix find command in makefile "funny" target
> 
> files:
>   Makefile.pre.in |  4 ++--
>   1 files changed, 2 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/Makefile.pre.in b/Makefile.pre.in
> --- a/Makefile.pre.in
> +++ b/Makefile.pre.in
> @@ -1283,7 +1283,7 @@
>  
>  # Find files with funny names
>  funny:
> -	find $(DISTDIRS) \
> +	find $(SUBDIRS) $(SUBDIRSTOO) \
>  		-name .svn -prune \
>  		-o -type d \
>  		-o -name '*.[chs]' \
> @@ -1313,7 +1313,7 @@
>  		-o -name .hgignore \
>  		-o -name .bzrignore \
>  		-o -name MANIFEST \
> -		-o -print
> +		-print

This actually broke the command; it only outputs "MANIFEST" now if present.
The previous version is correct; please revert this part.

Georg


From guido at python.org  Sat Aug 13 15:08:16 2011
From: guido at python.org (Guido van Rossum)
Date: Sat, 13 Aug 2011 09:08:16 -0400
Subject: [Python-Dev] GIL removal question
In-Reply-To: <j254jq$el6$1@dough.gmane.org>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no> <j23dhn$9it$1@dough.gmane.org>
	<20110812174226.0cd068b1@pitrou.net>
	<3F137782-3643-4077-92F7-519C55B921CC@stranden.com>
	<CAP7+vJK2+JgNsRxvUB+GfRQkgXC-ThTDYqPq1FYhPbrWAe05vg@mail.gmail.com>
	<j254jq$el6$1@dough.gmane.org>
Message-ID: <CAP7+vJLs3itkYASHdVhFVqHEpJuJEfWneUrp66jwhAT=VfXmFw@mail.gmail.com>

On Sat, Aug 13, 2011 at 2:12 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Guido van Rossum, 12.08.2011 23:38:
>>
>> On Fri, Aug 12, 2011 at 12:57 PM, Rene Nejsum wrote:
>>>
>>> I think I understand the background and need for GIL. Without it Python
>>> programs would have been cluttered with lock/synchronized statements and
>>> C-extensions would be harder to write.
>>
>> No, sorry, the first half of this is incorrect: with or without the
>> GIL *Python* code would need the same amount of fine-grained locking.
>> (The part about C extensions is correct.) I am butting in because this
>> is a common misunderstanding that really needs to be squashed whenever
>> it is aired -- the GIL does *not* help Python code to synchronize. A
>> thread-switch can occur between any two bytecode opcodes. Without the
>> GIL, atomic operations (e.g. dict lookups that don't require
>> evaluation of __eq__ or __hash__ implemented in Python) are still
>> supposed to be atomic.
>
> And in this context, it's worth mentioning that even C code can be bitten by
> the GIL being temporarily released when it calls back into the interpreter.
> Only plain C code sequences are guaranteed to keep the GIL held; that
> includes many (but not all) calls to the C-API.

And, though mostly off-topic, the worst problem with C code, calling
back into Python, and the GIL that I have seen (several times):
Suppose you are calling some complex C library that creates threads
itself, where those threads may also call back into Python. Here you
have to put a block around each Python callback that acquires the GIL
before and releases it after, since the new threads (created by C
code) start without the GIL acquired. I remember a truly nasty
incident where the latter was done, but the main thread did not
release the GIL since it was returning directly to Python (which would
of course release the GIL every so many opcodes so the callbacks would
run). But under certain conditions the block with the
acquire-release-GIL code around a Python callback was invoked in the
main thread (when a validation problem was detected early), and since
the main thread didn't release the GIL around the call into the C
code, it hung in a nasty spot. Add many layers of software, and a
hard-to-reproduce error condition that triggers this, and you have a
problem that's very hard to debug...

-- 
--Guido van Rossum (python.org/~guido)

From solipsis at pitrou.net  Sat Aug 13 17:43:46 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 13 Aug 2011 17:43:46 +0200
Subject: [Python-Dev] GIL removal question
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no> <j23dhn$9it$1@dough.gmane.org>
	<20110812174226.0cd068b1@pitrou.net>
	<3F137782-3643-4077-92F7-519C55B921CC@stranden.com>
	<CAP7+vJK2+JgNsRxvUB+GfRQkgXC-ThTDYqPq1FYhPbrWAe05vg@mail.gmail.com>
	<j254jq$el6$1@dough.gmane.org>
	<CAP7+vJLs3itkYASHdVhFVqHEpJuJEfWneUrp66jwhAT=VfXmFw@mail.gmail.com>
Message-ID: <20110813174346.1a034a0d@pitrou.net>

On Sat, 13 Aug 2011 09:08:16 -0400
Guido van Rossum <guido at python.org> wrote:
> 
> And, though mostly off-topic, the worst problem with C code, calling
> back into Python, and the GIL that I have seen (several times):
> Suppose you are calling some complex C library that creates threads
> itself, where those threads may also call back into Python. Here you
> have to put a block around each Python callback that acquires the GIL
> before and releases it after, since the new threads (created by C
> code) start without the GIL acquired. I remember a truly nasty
> incident where the latter was done, but the main thread did not
> release the GIL since it was returning directly to Python (which would
> of course release the GIL every so many opcodes so the callbacks would
> run). But under certain conditions the block with the
> acquire-release-GIL code around a Python callback was invoked in the
> main thread (when a validation problem was detected early), and since
> the main thread didn't release the GIL around the call into the C
> code, it hung in a nasty spot. Add many layers of software, and a
> hard-to-reproduce error condition that triggers this, and you have a
> problem that's very hard to debug...

These days we have PyGILState_Ensure():
http://docs.python.org/dev/c-api/init.html#PyGILState_Ensure

and even dedicated documentation:
http://docs.python.org/dev/c-api/init.html#non-python-created-threads

;)
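
For completeness, the core of the pattern from those docs (a sketch;
the function and callback names are illustrative):

    #include <Python.h>

    /* Called from a thread the C library created itself; such threads
       start without the GIL, so it must be acquired around Python work. */
    static void
    on_event(PyObject *callback)
    {
        PyGILState_STATE gstate = PyGILState_Ensure();  /* acquire GIL */
        PyObject *result = PyObject_CallObject(callback, NULL);
        if (result == NULL)
            PyErr_Print();
        Py_XDECREF(result);
        PyGILState_Release(gstate);                     /* release GIL */
    }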

Regards

Antoine.



From doug.hellmann at gmail.com  Sun Aug 14 01:08:40 2011
From: doug.hellmann at gmail.com (Doug Hellmann)
Date: Sat, 13 Aug 2011 19:08:40 -0400
Subject: [Python-Dev] Fwd: Mirroring Python repos to Bitbucket
References: <4E42DF4A.8010407@atlassian.com>
Message-ID: <CAB98777-1712-4FFB-91A5-3E0B5E400440@gmail.com>


Charles McLaughlin of Atlassian has set up mirrors of the Mercurial repositories hosted on python.org as part of the ongoing infrastructure improvement work. These mirrors will give us a public fail-over repository in the event that hg.python.org goes offline unexpectedly, and also provide features such as RSS feeds of changes for users interested in monitoring the repository passively.

Thank you, Charles for setting this up and Atlassian for hosting it!

Doug

Begin forwarded message:

> From: Charles McLaughlin <cmclaughlin at atlassian.com>
> Date: August 10, 2011 3:43:06 PM EDT
> To: Jesse Noller <jnoller at gmail.com>, Doug Hellmann <doug.hellmann at gmail.com>
> Subject: Mirroring Python repos to Bitbucket
> 
> Hey,
> 
> You guys expressed some interest in mirroring repos to Bitbucket a
> couple weeks ago.  I mentioned we mirror a few Python repos here:
> 
>  https://bitbucket.org/mirror/
> 
> But that doesn't cover everything from hg.python.org.  So, I wrote a
> little script that scrapes the Python HgWeb and mirrors everything to
> Bitbucket.  Here's the script in case you're curious:
> 
>  https://bitbucket.org/cmclaughlin/mirror-python-repos/
> 
> We're running the script hourly to keep the mirrors up to date.  The
> mirrored repos live under this URL:
> 
>  https://bitbucket.org/python_mirrors
> 
> A few people here have mentioned "python_mirrors" is a strange name.  I
> can change that if you'd like.  I don't have any better ideas though.
> Also, anyone can fork my script if they see any room for improvement :)
> 
> Please feel free to forward this email to mailing lists, etc to get the
> word out.
> 
> Regards,
> Charles


From solipsis at pitrou.net  Sun Aug 14 01:23:01 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 14 Aug 2011 01:23:01 +0200
Subject: [Python-Dev] Mirroring Python repos to Bitbucket
References: <4E42DF4A.8010407@atlassian.com>
	<CAB98777-1712-4FFB-91A5-3E0B5E400440@gmail.com>
Message-ID: <20110814012301.45c46c1e@pitrou.net>

On Sat, 13 Aug 2011 19:08:40 -0400
Doug Hellmann <doug.hellmann at gmail.com> wrote:
> 
> Charles McLaughlin of Atlassian has set up mirrors of the Mercurial repositories hosted on python.org as part of the ongoing infrastructure improvement work. These mirrors will give us a public fail-over repository in the event that hg.python.org goes offline unexpectedly, and also provide features such as RSS feeds of changes for users interested in monitoring the repository passively.

There is already an RSS feed at http://hg.python.org/cpython/rss-log
Another possibility is the gmane mirror of python-checkins, which has
its own RSS feed: http://rss.gmane.org/gmane.comp.python.cvs

Regards

Antoine.



From guido at python.org  Sun Aug 14 01:26:47 2011
From: guido at python.org (Guido van Rossum)
Date: Sat, 13 Aug 2011 16:26:47 -0700
Subject: [Python-Dev] GIL removal question
In-Reply-To: <20110813174346.1a034a0d@pitrou.net>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no> <j23dhn$9it$1@dough.gmane.org>
	<20110812174226.0cd068b1@pitrou.net>
	<3F137782-3643-4077-92F7-519C55B921CC@stranden.com>
	<CAP7+vJK2+JgNsRxvUB+GfRQkgXC-ThTDYqPq1FYhPbrWAe05vg@mail.gmail.com>
	<j254jq$el6$1@dough.gmane.org>
	<CAP7+vJLs3itkYASHdVhFVqHEpJuJEfWneUrp66jwhAT=VfXmFw@mail.gmail.com>
	<20110813174346.1a034a0d@pitrou.net>
Message-ID: <CAP7+vJJ7G3-5Xi7j0hOEHwpt_tnU6-6HV5o9P8C9YZp+aqmKyA@mail.gmail.com>

On Sat, Aug 13, 2011 at 8:43 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Sat, 13 Aug 2011 09:08:16 -0400
> Guido van Rossum <guido at python.org> wrote:
>>
>> And, though mostly off-topic, the worst problem with C code, calling
>> back into Python, and the GIL that I have seen (several times):
>> Suppose you are calling some complex C library that creates threads
>> itself, where those threads may also call back into Python. Here you
>> have to put a block around each Python callback that acquires the GIL
>> before and releases it after, since the new threads (created by C
>> code) start without the GIL acquired. I remember a truly nasty
>> incident where the latter was done, but the main thread did not
>> release the GIL since it was returning directly to Python (which would
>> of course release the GIL every so many opcodes so the callbacks would
>> run). But under certain conditions the block with the
>> acquire-release-GIL code around a Python callback was invoked in the
>> main thread (when a validation problem was detected early), and since
>> the main thread didn't release the GIL around the call into the C
>> code, it hung in a nasty spot. Add many layers of software, and a
>> hard-to-reproduce error condition that triggers this, and you have a
>> problem that's very hard to debug...
>
> These days we have PyGILState_Ensure():
> http://docs.python.org/dev/c-api/init.html#PyGILState_Ensure
>
> and even dedicated documentation:
> http://docs.python.org/dev/c-api/init.html#non-python-created-threads
>
> ;)

That is awesome!

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Sun Aug 14 02:08:20 2011
From: guido at python.org (Guido van Rossum)
Date: Sat, 13 Aug 2011 17:08:20 -0700
Subject: [Python-Dev] PEP 3154 - pickle protocol 4
In-Reply-To: <20110812125846.00a75cd1@pitrou.net>
References: <20110812125846.00a75cd1@pitrou.net>
Message-ID: <CAP7+vJJNS1YMmxhPwCjbAnNYThZZbB+Hsv-cqcAPK8CW_oebeQ@mail.gmail.com>

On Fri, Aug 12, 2011 at 3:58 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> This PEP is an attempt to foster a number of small incremental
> improvements in a future pickle protocol version. The PEP process is
> used in order to gather as many improvements as possible, because the
> introduction of a new protocol version should be a rare occurrence.

Thanks, this sounds like a good idea. That's not to say that I have
already approved the PEP. :-) But from skimming it I have no
objections except that it needs to be fleshed out.

-- 
--Guido van Rossum (python.org/~guido)

From doug.hellmann at gmail.com  Sun Aug 14 02:42:46 2011
From: doug.hellmann at gmail.com (Doug Hellmann)
Date: Sat, 13 Aug 2011 20:42:46 -0400
Subject: [Python-Dev] Mirroring Python repos to Bitbucket
In-Reply-To: <20110814012301.45c46c1e@pitrou.net>
References: <4E42DF4A.8010407@atlassian.com>
	<CAB98777-1712-4FFB-91A5-3E0B5E400440@gmail.com>
	<20110814012301.45c46c1e@pitrou.net>
Message-ID: <4FD4019A-F3EF-47AD-8C8E-6D9A9D8BF8A8@gmail.com>


On Aug 13, 2011, at 7:23 PM, Antoine Pitrou wrote:

> On Sat, 13 Aug 2011 19:08:40 -0400
> Doug Hellmann <doug.hellmann at gmail.com> wrote:
>> 
>> Charles McLaughlin of Atlassian has set up mirrors of the Mercurial repositories hosted on python.org as part of the ongoing infrastructure improvement work. These mirrors will give us a public fail-over repository in the event that hg.python.org goes offline unexpectedly, and also provide features such as RSS feeds of changes for users interested in monitoring the repository passively.
> 
> There is already an RSS feed at http://hg.python.org/cpython/rss-log
> Another possibility is the gmane mirror of python-checkins, which has
> its own RSS feed: http://rss.gmane.org/gmane.comp.python.cvs

Thanks for the tip, I didn't know about either of those.

Doug


From ncoghlan at gmail.com  Sun Aug 14 03:37:18 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 14 Aug 2011 11:37:18 +1000
Subject: [Python-Dev] GIL removal question
In-Reply-To: <CAP7+vJJ7G3-5Xi7j0hOEHwpt_tnU6-6HV5o9P8C9YZp+aqmKyA@mail.gmail.com>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no> <j23dhn$9it$1@dough.gmane.org>
	<20110812174226.0cd068b1@pitrou.net>
	<3F137782-3643-4077-92F7-519C55B921CC@stranden.com>
	<CAP7+vJK2+JgNsRxvUB+GfRQkgXC-ThTDYqPq1FYhPbrWAe05vg@mail.gmail.com>
	<j254jq$el6$1@dough.gmane.org>
	<CAP7+vJLs3itkYASHdVhFVqHEpJuJEfWneUrp66jwhAT=VfXmFw@mail.gmail.com>
	<20110813174346.1a034a0d@pitrou.net>
	<CAP7+vJJ7G3-5Xi7j0hOEHwpt_tnU6-6HV5o9P8C9YZp+aqmKyA@mail.gmail.com>
Message-ID: <CADiSq7fqOexNk2ES04e6Gp0MOpFt=QJBk1Sowp83O4ChQjo-2A@mail.gmail.com>

On Sun, Aug 14, 2011 at 9:26 AM, Guido van Rossum <guido at python.org> wrote:
>> These days we have PyGILState_Ensure():
>> http://docs.python.org/dev/c-api/init.html#PyGILState_Ensure
>>
>> and even dedicated documentation:
>> http://docs.python.org/dev/c-api/init.html#non-python-created-threads
>>
>> ;)
>
> That is awesome!

Although, if it's possible to arrange it, it's still better to do that
once and then use BEGIN/END_ALLOW_THREADS to avoid the overhead of
creating and destroying the temporary thread states:
http://blog.ccpgames.com/kristjan/2011/06/23/temporary-thread-state-overhead/

Still, it's far, far easier than it used to be to handle the GIL
correctly from non-Python created threads.
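
A rough sketch of that idiom (blocking_call() is a hypothetical C
function that never touches Python objects; the wrapper must be entered
holding the GIL):

    #include <Python.h>

    extern void blocking_call(void);

    void
    do_blocking_work(void)
    {
        Py_BEGIN_ALLOW_THREADS   /* save thread state and drop the GIL */
        blocking_call();
        Py_END_ALLOW_THREADS     /* reacquire the GIL, restore state */
    }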

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sun Aug 14 03:42:15 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 14 Aug 2011 11:42:15 +1000
Subject: [Python-Dev] Fwd: Mirroring Python repos to Bitbucket
In-Reply-To: <CAB98777-1712-4FFB-91A5-3E0B5E400440@gmail.com>
References: <4E42DF4A.8010407@atlassian.com>
	<CAB98777-1712-4FFB-91A5-3E0B5E400440@gmail.com>
Message-ID: <CADiSq7fC+bK_D6U4Mj=fcOhOe_NipPo5hmrP3zP8z6MoDZ48jQ@mail.gmail.com>

On Sun, Aug 14, 2011 at 9:08 AM, Doug Hellmann <doug.hellmann at gmail.com> wrote:
>
> Charles McLaughlin of Atlassian has set up mirrors of the Mercurial repositories hosted on python.org as part of the ongoing infrastructure improvement work. These mirrors will give us a public fail-over repository in the event that hg.python.org goes offline unexpectedly, and also provide features such as RSS feeds of changes for users interested in monitoring the repository passively.

The main advantage of those mirrors to my mind is that it makes it
easy for anyone to clone their own copy of the python.org repos
without having to upload the whole thing to bitbucket themselves. That
makes it easy for people to use a natural Mercurial workflow to
develop and collaborate on patches, even for components other than the
main CPython repo (e.g. the devguide or the benchmark suite).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sun Aug 14 03:44:53 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 14 Aug 2011 11:44:53 +1000
Subject: [Python-Dev] [Python-checkins] cpython: Monotonic,
	not monotonous
In-Reply-To: <E1QsO1b-00017N-QW@dinsdale.python.org>
References: <E1QsO1b-00017N-QW@dinsdale.python.org>
Message-ID: <CADiSq7fjg12q91vL58G9pG5KEUkihyf2bSuxnxfndrc=p9WLww@mail.gmail.com>

On Sun, Aug 14, 2011 at 9:53 AM, antoine.pitrou
<python-checkins at python.org> wrote:
> http://hg.python.org/cpython/rev/0273d0734593
> changeset:   71862:0273d0734593
> user:        Antoine Pitrou <solipsis at pitrou.net>
> date:        Sun Aug 14 01:51:52 2011 +0200
> summary:
>   Monotonic, not monotonous
>
> files:
>   Lib/test/pickletester.py |  2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)

I dunno, I reckon systematically testing pickles could get pretty
monotonous, too ;)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From sturla at molden.no  Sun Aug 14 03:53:15 2011
From: sturla at molden.no (Sturla Molden)
Date: Sun, 14 Aug 2011 03:53:15 +0200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <20110813174346.1a034a0d@pitrou.net>
References: <CAEmTpZGW1B0vaPytfLo_ivkU+tqiQpm2RLbzMf=LZqFBU-gALg@mail.gmail.com>
	<4E44294F.5060005@molden.no> <j23dhn$9it$1@dough.gmane.org>
	<20110812174226.0cd068b1@pitrou.net>
	<3F137782-3643-4077-92F7-519C55B921CC@stranden.com>
	<CAP7+vJK2+JgNsRxvUB+GfRQkgXC-ThTDYqPq1FYhPbrWAe05vg@mail.gmail.com>
	<j254jq$el6$1@dough.gmane.org>
	<CAP7+vJLs3itkYASHdVhFVqHEpJuJEfWneUrp66jwhAT=VfXmFw@mail.gmail.com>
	<20110813174346.1a034a0d@pitrou.net>
Message-ID: <4E472A8B.9040208@molden.no>

On 13.08.2011 17:43, Antoine Pitrou wrote:
> These days we have PyGILState_Ensure():
> http://docs.python.org/dev/c-api/init.html#PyGILState_Ensure
>
With the most recent Cython (0.15) we can just do:

    with gil:
        <suite>

to ensure holding the GIL.

And similarly, from a thread holding the GIL:

    with nogil:
        <suite>

to temporarily release it.

There is also some OpenMP support in Cython 0.15. OpenMP is much easier 
than messing around with threads manually (it moves all the hard parts 
of multithreading to the compiler). Now Cython almost makes it look 
Pythonic:

http://docs.cython.org/src/userguide/parallelism.html


Sturla



From ziade.tarek at gmail.com  Sun Aug 14 11:41:47 2011
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Sun, 14 Aug 2011 11:41:47 +0200
Subject: [Python-Dev] Packaging in Python 2 anyone ?
Message-ID: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>

Hi all,

I am lacking time right now to finish an important task before 3.2
final is out: we need to release "packaging" as a standalone release
under Python 2.x and Python 3.1, to gather as much feedback as we can
from more people.

Doing an automated conversion turned out to be a nightmare, and I was
about to go ahead and maintain a fork of the packaging package, with
the few modules that are needed (sysconfig, etc.) within a standalone
release.

I am looking for someone that has some free time and that is willing
to lead this work.

3.2 can go out without this work of course, but it would be *much*
better to have that feedback.

If you are interested, please let me know.

Cheers
Tarek

-- 
Tarek Ziadé | http://ziade.org

From martin at v.loewis.de  Sun Aug 14 18:31:50 2011
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 14 Aug 2011 18:31:50 +0200
Subject: [Python-Dev] Python 3.2.2rc1
Message-ID: <4E47F876.7010105@v.loewis.de>

On behalf of the Python development team, I'm happy to announce the
first release candidate of the Python 3.2.2 maintenance release (3.2.2rc1).

Python 3.2.2 fixes `a regression <http://bugs.python.org/issue12576>`_ in the
``urllib.request`` module that prevented opening many HTTP resources
correctly with Python 3.2.1.

Python 3.2 is a continuation of the efforts to improve and stabilize the
Python 3.x line.  Since the final release of Python 2.7, the 2.x line
will only receive bugfixes, and new features are developed for 3.x only.

Since PEP 3003, the Moratorium on Language Changes, is in effect, there
are no changes in Python's syntax and built-in types in Python 3.2.
Development efforts concentrated on the standard library and support for
porting code to Python 3.  Highlights are:

* numerous improvements to the unittest module
* PEP 3147, support for .pyc repository directories
* PEP 3149, support for version tagged dynamic libraries
* PEP 3148, a new futures library for concurrent programming
* PEP 384, a stable ABI for extension modules
* PEP 391, dictionary-based logging configuration
* an overhauled GIL implementation that reduces contention
* an extended email package that handles bytes messages
* a much improved ssl module with support for SSL contexts and certificate
  hostname matching
* a sysconfig module to access configuration information
* additions to the shutil module, among them archive file support
* many enhancements to configparser, among them mapping protocol support
* improvements to pdb, the Python debugger
* countless fixes regarding bytes/string issues; among them full support
  for a bytes environment (filenames, environment variables)
* many consistency and behavior fixes for numeric operations

For a more extensive list of changes in 3.2, see

    http://docs.python.org/3.2/whatsnew/3.2.html

To download Python 3.2.2 visit:

    http://www.python.org/download/releases/3.2.2/

Please consider trying Python 3.2.2rc1 with your code and reporting any bugs
you may notice to:

    http://bugs.python.org/


Enjoy!

--
Martin v. Löwis
(on behalf of the entire python-dev team and 3.2's contributors)

From ncoghlan at gmail.com  Mon Aug 15 01:20:44 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 15 Aug 2011 09:20:44 +1000
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
Message-ID: <CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>

On Sun, Aug 14, 2011 at 7:41 PM, Tarek Ziad? <ziade.tarek at gmail.com> wrote:
> Hi all,
>
> I am lacking of time right now to finish an important task before 3.2
> final is out:

If anyone else got at all confused by Tarek's email, s/3.x/3.x+1/ and
it will all make sense (the mentioned release numbers in the 3.x
series are all one lower than they should be - packaging is planned
for 3.3, but a standalone library will allow feedback to be gathered
from 2.x and 3.2 users before the API is 'locked in' for 3.3).

How far has packaging diverged from distutils2, though? Wasn't that
the planned venue for any backports in order to avoid name conflicts?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From tjreedy at udel.edu  Mon Aug 15 03:06:29 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 14 Aug 2011 21:06:29 -0400
Subject: [Python-Dev] Python 3.2.2rc1
In-Reply-To: <4E47F876.7010105@v.loewis.de>
References: <4E47F876.7010105@v.loewis.de>
Message-ID: <j29rf3$ano$1@dough.gmane.org>

On 8/14/2011 12:31 PM, "Martin v. Löwis" wrote:
> On behalf of the Python development team, I'm happy to announce the
> first release candidate of the Python 3.2.2 maintenance release (3.2.2rc1).
>
> Python 3.2.2 fixes `a regression <http://bugs.python.org/issue12576>`_ in the
> ``urllib.request`` module that prevented opening many HTTP resources
> correctly with Python 3.2.1.

It also has the fix for http://bugs.python.org/issue12540
as requested. Thank you.

-- 
Terry Jan Reedy



From brett at python.org  Mon Aug 15 04:34:38 2011
From: brett at python.org (Brett Cannon)
Date: Sun, 14 Aug 2011 19:34:38 -0700
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <20110811090242.1083782f@msiwind>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
Message-ID: <CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>

On Thu, Aug 11, 2011 at 00:02, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Thu, 11 Aug 2011 03:34:37 +0200,
> brian.curtin <python-checkins at python.org> wrote:
> > http://hg.python.org/cpython/rev/77a65b078852
> > changeset:   71809:77a65b078852
> > parent:      71803:1b4fae183da3
> > user:        Brian Curtin <brian at python.org>
> > date:        Wed Aug 10 20:05:21 2011 -0500
> > summary:
> >   Add Py_RETURN_NOTIMPLEMENTED macro. Fixes #12724.
>
>
> It would sound more useful to have a generic Py_RETURN() macro rather than
> some specific forms for each and every common object.
>

Since the macro is rather generic, sure, but the name should probably be
better since it doesn't necessarily convey the fact that an INCREF has
occurred. So maybe Py_INCREF_RETURN()?


>
> Regards
>
> Antoine.

From benjamin at python.org  Mon Aug 15 04:36:54 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Sun, 14 Aug 2011 21:36:54 -0500
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
Message-ID: <CAPZV6o9ECks6ypY5FifM97Pc+Coru322a0qcqPL2pY=a8cegBA@mail.gmail.com>

2011/8/14 Brett Cannon <brett at python.org>:
>
>
> On Thu, Aug 11, 2011 at 00:02, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>
>> On Thu, 11 Aug 2011 03:34:37 +0200,
>> brian.curtin <python-checkins at python.org> wrote:
>> > http://hg.python.org/cpython/rev/77a65b078852
>> > changeset:   71809:77a65b078852
>> > parent:      71803:1b4fae183da3
>> > user:        Brian Curtin <brian at python.org>
>> > date:        Wed Aug 10 20:05:21 2011 -0500
>> > summary:
>> >   Add Py_RETURN_NOTIMPLEMENTED macro. Fixes #12724.
>>
>>
>> It would sound more useful to have a generic Py_RETURN() macro rather than
>> some specific forms for each and every common object.
>
> Since the macro is rather generic, sure, but the name should probably be
> > better since it doesn't necessarily convey the fact that an INCREF has
> occurred. So maybe Py_INCREF_RETURN()?

That nearly nullifies the space saving. I think the fact that it's a
macro at all conveys that it does something else aside from "return
x;".


-- 
Regards,
Benjamin

From brett at python.org  Mon Aug 15 04:39:44 2011
From: brett at python.org (Brett Cannon)
Date: Sun, 14 Aug 2011 19:39:44 -0700
Subject: [Python-Dev] Backporting howto/pyporting to 2.7
In-Reply-To: <4E4559F8.7040507@netwok.org>
References: <4E4559F8.7040507@netwok.org>
Message-ID: <CAP1=2W5XvBqyTJLn6y3=TZzZ-0tFLBPvrFpPqjm2STCaeAVZaQ@mail.gmail.com>

Probably mostly the hassle of maintaining changes between the two versions, but
the doc will probably get more exposure that way. I say if you want to
spearhead the backport, go for it.

On Fri, Aug 12, 2011 at 09:51, Éric Araujo <merwok at netwok.org> wrote:

> Hi everyone,
>
> I think it would be useful to have the "Porting Python 2 Code to Python
> 3" HOWTO in the 2.7 docs, as I think that a lot of users consult the 2.7
> docs.  Is there any reason not to do it?
>
> Regards

From brett at python.org  Mon Aug 15 04:45:49 2011
From: brett at python.org (Brett Cannon)
Date: Sun, 14 Aug 2011 19:45:49 -0700
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <CAPZV6o9ECks6ypY5FifM97Pc+Coru322a0qcqPL2pY=a8cegBA@mail.gmail.com>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CAPZV6o9ECks6ypY5FifM97Pc+Coru322a0qcqPL2pY=a8cegBA@mail.gmail.com>
Message-ID: <CAP1=2W5baorVJ2NiyuiO9SSUWOivBk+zR=Utk-db_6YchCFmtQ@mail.gmail.com>

On Sun, Aug 14, 2011 at 19:36, Benjamin Peterson <benjamin at python.org>wrote:

> 2011/8/14 Brett Cannon <brett at python.org>:
> >
> >
> > On Thu, Aug 11, 2011 at 00:02, Antoine Pitrou <solipsis at pitrou.net>
> wrote:
> >>
> >> On Thu, 11 Aug 2011 03:34:37 +0200,
> >> brian.curtin <python-checkins at python.org> wrote:
> >> > http://hg.python.org/cpython/rev/77a65b078852
> >> > changeset:   71809:77a65b078852
> >> > parent:      71803:1b4fae183da3
> >> > user:        Brian Curtin <brian at python.org>
> >> > date:        Wed Aug 10 20:05:21 2011 -0500
> >> > summary:
> >> >   Add Py_RETURN_NOTIMPLEMENTED macro. Fixes #12724.
> >>
> >>
> >> It would sound more useful to have a generic Py_RETURN() macro rather
> than
> >> some specific forms for each and every common object.
> >
> > Since the macro is rather generic, sure, but the name should probably be
> > better since it doesn't necessarily convey the fact that an INCREF has
> > occurred. So maybe Py_INCREF_RETURN()?
>
>> That nearly nullifies the space saving. I think the fact that it's a
> macro at all conveys that it does something else aside from "return
> x;".
>

This is C code; space savings went out the window along with gc a long time
ago.

Yes, the fact that it's a macro differentiates the semantics enough that a
longer name is probably not needed.


>
>
> --
> Regards,
> Benjamin
>

From benjamin at python.org  Mon Aug 15 04:48:45 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Sun, 14 Aug 2011 21:48:45 -0500
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <CAP1=2W5baorVJ2NiyuiO9SSUWOivBk+zR=Utk-db_6YchCFmtQ@mail.gmail.com>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CAPZV6o9ECks6ypY5FifM97Pc+Coru322a0qcqPL2pY=a8cegBA@mail.gmail.com>
	<CAP1=2W5baorVJ2NiyuiO9SSUWOivBk+zR=Utk-db_6YchCFmtQ@mail.gmail.com>
Message-ID: <CAPZV6o-_jOrjhy9Gc-PNrxr69=rEu9y58A1_haVYh0YVoa++zA@mail.gmail.com>

2011/8/14 Brett Cannon <brett at python.org>:
>
>
> On Sun, Aug 14, 2011 at 19:36, Benjamin Peterson <benjamin at python.org>
> wrote:
>>
>> 2011/8/14 Brett Cannon <brett at python.org>:
>> >
>> >
>> > On Thu, Aug 11, 2011 at 00:02, Antoine Pitrou <solipsis at pitrou.net>
>> > wrote:
>> >>
>> >> On Thu, 11 Aug 2011 03:34:37 +0200,
>> >> brian.curtin <python-checkins at python.org> wrote:
>> >> > http://hg.python.org/cpython/rev/77a65b078852
>> >> > changeset:   71809:77a65b078852
>> >> > parent:      71803:1b4fae183da3
>> >> > user:        Brian Curtin <brian at python.org>
>> >> > date:        Wed Aug 10 20:05:21 2011 -0500
>> >> > summary:
>> >> >   Add Py_RETURN_NOTIMPLEMENTED macro. Fixes #12724.
>> >>
>> >>
>> >> It would sound more useful to have a generic Py_RETURN() macro rather
>> >> than
>> >> some specific forms for each and every common object.
>> >
>> > Since the macro is rather generic, sure, but the name should probably be
>> > better since it doesn't necessarily convey the fact that an INCREF has
>> > occurred. So maybe Py_INCREF_RETURN()?
>>
>> That nearly nullifies the space saving. I think the fact that it's a
>> macro at all conveys that it does something else aside from "return
>> x;".
>
> This is C code; space savings went out the window along with gc a long time
> ago.

I'm hanging on to it by a hair. :)



-- 
Regards,
Benjamin

From ncoghlan at gmail.com  Mon Aug 15 05:16:44 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 15 Aug 2011 13:16:44 +1000
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
Message-ID: <CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>

On Mon, Aug 15, 2011 at 12:34 PM, Brett Cannon <brett at python.org> wrote:
> On Thu, Aug 11, 2011 at 00:02, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> It would sound more useful to have a generic Py_RETURN() macro rather than
>> some specific forms for each and every common object.
>
> Since the macro is rather generic, sure, but the name should probably be
> better since it doesn't necessarily convene the fact that a INCREF has
> occurred. So maybe Py_INCREF_RETURN()?

Aside from None and NotImplemented, do we really do the straight
incref-and-return all that often?

While I was initially attracted to the idea of a generic macro, the
more I thought about it, the more it seemed like a magnet for
reference leak bugs.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From brett at python.org  Mon Aug 15 06:46:22 2011
From: brett at python.org (Brett Cannon)
Date: Sun, 14 Aug 2011 21:46:22 -0700
Subject: [Python-Dev] Latest draft of PEP 399 (Pure Python/C Accelerator
 Module Compatibility Requirements)
In-Reply-To: <CAP1=2W7wiE5t3dJxF-fFKOEXhykAdgtQphKvUYMjR2q5cKs9ZA@mail.gmail.com>
References: <CAP1=2W7wiE5t3dJxF-fFKOEXhykAdgtQphKvUYMjR2q5cKs9ZA@mail.gmail.com>
Message-ID: <CAP1=2W7uB3=D6wc_TP+B8U+zm2qsM-k_cwyZDr4V89hFL8_=Fw@mail.gmail.com>

Since the latest draft went four weeks w/o comment or complaint to address
the last round of issues, I am going to consider this PEP accepted (I don't
think we need a BDFAP since this is procedural and not a language feature or
stdlib addition, but if people disagree then Guido can assign someone).

On Sun, Jul 17, 2011 at 17:16, Brett Cannon <brett at python.org> wrote:

> While at a mini-PyPy sprint w/ Alex Gaynor of PyPy and Phil Jenvey of
> Jython, I decided to finally put the time in to update this PEP yet again.
>
> The biggest change is that the 100% branch coverage requirement has been
> replaced with "comprehensive" coverage from the tests. I think we are all
> enough grown-ups to not have to specify anything tighter than this. I also
> added a paragraph in the Details section about using the abstract C APIs
> (e.g., PyObject_GetItem) over type-specific ones (e.g., PyList_GetItem) in
> order to be more supportive of duck typing from the Python code. I figure
> the "be API compatible" assumes this, but mentioning it doesn't hurt (and
> should help make Raymond less angry =).
>
>
> PEP: 399
> Title: Pure Python/C Accelerator Module Compatibility Requirements
> Version: $Revision: 88219 $
> Last-Modified: $Date: 2011-01-27 13:47:00 -0800 (Thu, 27 Jan 2011) $
> Author: Brett Cannon <brett at python.org>
> Status: Draft
> Type: Informational
> Content-Type: text/x-rst
> Created: 04-Apr-2011
> Python-Version: 3.3
> Post-History: 04-Apr-2011, 12-Apr-2011, 17-Jul-2011
>
> Abstract
> ========
>
> The Python standard library under CPython contains various instances
> of modules implemented in both pure Python and C (either entirely or
> partially). This PEP requires that, in these instances, the
> C code *must* pass the test suite used for the pure Python code
> so as to act as much like a drop-in replacement as reasonably possible
> (C- and VM-specific tests are exempt). It is also required that new
> C-based modules lacking a pure Python equivalent implementation get
> special permission to be added to the standard library.
>
>
> Rationale
> =========
>
> Python has grown beyond the CPython virtual machine (VM). IronPython_,
> Jython_, and PyPy_ are all currently viable alternatives to the
> CPython VM. The VM ecosystem that has sprung up around the Python
> programming language has led to Python being used in many different
> areas where CPython cannot be used, e.g., Jython allowing Python to be
> used in Java applications.
>
> A problem all of the VMs other than CPython face is handling modules
> from the standard library that are implemented (to some extent) in C.
> Since other VMs do not typically support the entire `C API of CPython`_
> they are unable to use the code used to create the module. Often times
> this leads these other VMs to either re-implement the modules in pure
> Python or in the programming language used to implement the VM itself
> (e.g., in C# for IronPython). This duplication of effort between
> CPython, PyPy, Jython, and IronPython is extremely unfortunate as
> implementing a module *at least* in pure Python would help mitigate
> this duplicate effort.
>
> The purpose of this PEP is to minimize this duplicate effort by
> mandating that all new modules added to Python's standard library
> *must* have a pure Python implementation _unless_ special dispensation
> is given. This makes sure that a module in the stdlib is available to
> all VMs and not just to CPython (pre-existing modules that do not meet
> this requirement are exempt, although there is nothing preventing
> someone from adding in a pure Python implementation retroactively).
>
> Re-implementing parts (or all) of a module in C (in the case
> of CPython) is still allowed for performance reasons, but any such
> accelerated code must pass the same test suite (sans VM- or C-specific
> tests) to verify semantics and prevent divergence. To accomplish this,
> the test suite for the module must have comprehensive coverage of the
> pure Python implementation before the acceleration code may be added.
>
>
> Details
> =======
>
> Starting in Python 3.3, any modules added to the standard library must
> have a pure Python implementation. This rule can only be ignored if
> the Python development team grants a special exemption for the module.
> Typically the exemption will be granted only when a module wraps a
> specific C-based library (e.g., sqlite3_). In granting an exemption it
> will be recognized that the module will be considered exclusive to
> CPython and not part of Python's standard library that other VMs are
> expected to support. Usage of ``ctypes`` to provide an
> API for a C library will continue to be frowned upon as ``ctypes``
> lacks compiler guarantees that C code typically relies upon to prevent
> certain errors from occurring (e.g., API changes).
>
> Even though a pure Python implementation is mandated by this PEP, it
> does not preclude the use of a companion acceleration module. If an
> acceleration module is provided it is to be named the same as the
> module it is accelerating with an underscore attached as a prefix,
> e.g., ``_warnings`` for ``warnings``. The common pattern to access
> the accelerated code from the pure Python implementation is to import
> it with an ``import *``, e.g., ``from _warnings import *``. This is
> typically done at the end of the module to allow it to overwrite
> specific Python objects with their accelerated equivalents. This kind
> of import can also be done before the end of the module when needed,
> e.g., an accelerated base class is provided but is then subclassed by
> Python code. This PEP does not mandate that pre-existing modules in
> the stdlib that lack a pure Python equivalent gain such a module. But
> if people do volunteer to provide and maintain a pure Python
> equivalent (e.g., the PyPy team volunteering their pure Python
> implementation of the ``csv`` module and maintaining it) then such
> code will be accepted. In those instances the C version is considered
> the reference implementation in terms of expected semantics.
>
> Any new accelerated code must act as a drop-in replacement as close
> to the pure Python implementation as reasonable. Technical details of
> the VM providing the accelerated code are allowed to differ as
> necessary, e.g., a class being a ``type`` when implemented in C. To
> verify that the Python and equivalent C code operate as similarly as
> possible, both code bases must be tested using the same tests which
> apply to the pure Python code (tests specific to the C code or any VM
> do not fall under this requirement). The test suite is expected to
> be extensive in order to verify expected semantics.
>
> Acting as a drop-in replacement also dictates that no public API be
> provided in accelerated code that does not exist in the pure Python
> code.  Without this requirement people could accidentally come to rely
> on a detail in the accelerated code which is not made available to
> other VMs that use the pure Python implementation. To help verify
> that the contract of semantic equivalence is being met, a module must
> be tested both with and without its accelerated code as thoroughly as
> possible.
>
> As an example, to write tests which exercise both the pure Python and
> C accelerated versions of a module, a basic idiom can be followed::
>
>     import collections.abc
>     from test.support import import_fresh_module, run_unittest
>     import unittest
>
>     c_heapq = import_fresh_module('heapq', fresh=['_heapq'])
>     py_heapq = import_fresh_module('heapq', blocked=['_heapq'])
>
>
>     class ExampleTest(unittest.TestCase):
>
>         def test_heappop_exc_for_non_MutableSequence(self):
>             # Raise TypeError when heap is not a
>             # collections.abc.MutableSequence.
>             class Spam:
>                 """Test class lacking many ABC-required methods
>                 (e.g., pop())."""
>                 def __len__(self):
>                     return 0
>
>             heap = Spam()
>             self.assertFalse(isinstance(heap,
>                                 collections.abc.MutableSequence))
>             with self.assertRaises(TypeError):
>                 self.heapq.heappop(heap)
>
>
>     class AcceleratedExampleTest(ExampleTest):
>
>         """Test using the accelerated code."""
>
>         heapq = c_heapq
>
>
>     class PyExampleTest(ExampleTest):
>
>         """Test with just the pure Python code."""
>
>         heapq = py_heapq
>
>
>     def test_main():
>         run_unittest(AcceleratedExampleTest, PyExampleTest)
>
>
>     if __name__ == '__main__':
>         test_main()
>
>
> If this test were to provide extensive coverage for
> ``heapq.heappop()`` in the pure Python implementation then the
> accelerated C code would be allowed to be added to CPython's standard
> library. If it did not, then the test suite would need to be updated
> until proper coverage was provided before the accelerated C code
> could be added.
>
> To also help with compatibility, C code should use abstract APIs on
> objects to prevent accidental dependence on specific types. For
> instance, if a function accepts a sequence then the C code should
> default to using `PyObject_GetItem()` instead of something like
> `PyList_GetItem()`. C code is allowed to have a fast path if the
> proper `PyList_Check()` is used, but otherwise APIs should work with
> any object that duck types to the proper interface instead of a
> specific type.
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> .. _IronPython: http://ironpython.net/
> .. _Jython: http://www.jython.org/
> .. _PyPy: http://pypy.org/
> .. _C API of CPython: http://docs.python.org/py3k/c-api/index.html
> .. _sqlite3: http://docs.python.org/py3k/library/sqlite3.html
>
>

From benjamin at python.org  Mon Aug 15 06:50:19 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Sun, 14 Aug 2011 23:50:19 -0500
Subject: [Python-Dev] Latest draft of PEP 399 (Pure Python/C Accelerator
 Module Compatibility Requirements)
In-Reply-To: <CAP1=2W7uB3=D6wc_TP+B8U+zm2qsM-k_cwyZDr4V89hFL8_=Fw@mail.gmail.com>
References: <CAP1=2W7wiE5t3dJxF-fFKOEXhykAdgtQphKvUYMjR2q5cKs9ZA@mail.gmail.com>
	<CAP1=2W7uB3=D6wc_TP+B8U+zm2qsM-k_cwyZDr4V89hFL8_=Fw@mail.gmail.com>
Message-ID: <CAPZV6o-hbBCs=hOYhWqxPSTKKpAdk3jB_9SZQiHjsH+B6p2-Zw@mail.gmail.com>

2011/8/14 Brett Cannon <brett at python.org>:
>> proper `PyList_Check()` is used, but otherwise APIs should work with

To be terribly nitty, what is probably wanted is PyList_CheckExact.
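
To illustrate the difference with a sketch (assume seq is a PyObject *
and i a valid Py_ssize_t; error handling elided): PyList_Check() also
accepts list subclasses, which may override __getitem__, whereas
PyList_CheckExact() matches only a plain list, so only the latter
justifies skipping the general protocol:

    PyObject *item;
    if (PyList_CheckExact(seq)) {
        item = PyList_GET_ITEM(seq, i);     /* fast path, borrowed ref */
        Py_INCREF(item);
    }
    else {
        item = PySequence_GetItem(seq, i);  /* honors __getitem__; new ref */
    }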



-- 
Regards,
Benjamin

From ziade.tarek at gmail.com  Mon Aug 15 12:31:02 2011
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Mon, 15 Aug 2011 12:31:02 +0200
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
Message-ID: <CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>

On Mon, Aug 15, 2011 at 1:20 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Sun, Aug 14, 2011 at 7:41 PM, Tarek Ziad? <ziade.tarek at gmail.com> wrote:
>> Hi all,
>>
>> I am lacking of time right now to finish an important task before 3.2
>> final is out:
>
> If anyone else got at all confused by Tarek's email, s/3.x/3.x+1/ and
> it will all make sense (the mentioned release numbers in the 3.x
> series are all one lower than they should be - packaging is planned
> for 3.3, but a standalone library will allow feedback to be gathered
> from 2.x and 3.2 users before the API is 'locked in' for 3.3).

Oh yeah, sorry for the version mess-up :)

> How far has packaging diverged from distutils2, though? Wasn't that
> the planned venue for any backports in order to avoid name conflicts?

The plan is to provide, under earlier versions of Python, a standalone
project that does not use the "packaging" namespace, but the
"distutils2" namespace.

IOW, the task to do is:

1/ copy packaging and all its stdlib dependencies in a standalone project
2/ rename packaging to distutils2
3/ make it work under older 2.x and 3.x (2.x would be the priority)  <====
4/ release it, promote its usage
5/ consolidate the API with the feedback received

I realize it's by far the least interesting task to do in packaging,
but it's also one of the most important.

Cheers
Tarek

-- 
Tarek Ziadé | http://ziade.org

From solipsis at pitrou.net  Mon Aug 15 14:17:23 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 15 Aug 2011 14:17:23 +0200
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
 #12724.
In-Reply-To: <CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
Message-ID: <1313410643.3557.2.camel@localhost.localdomain>

On Monday 15 August 2011 at 13:16 +1000, Nick Coghlan wrote:
> On Mon, Aug 15, 2011 at 12:34 PM, Brett Cannon <brett at python.org> wrote:
> > On Thu, Aug 11, 2011 at 00:02, Antoine Pitrou <solipsis at pitrou.net> wrote:
> >> It would sound more useful to have a generic Py_RETURN() macro rather than
> >> some specific forms for each and every common object.
> >
> > Since the macro is rather generic, sure, but the name should probably be
> > better since it doesn't necessarily convey the fact that an INCREF has
> > occurred. So maybe Py_INCREF_RETURN()?
> 
> Aside from None and NotImplemented, do we really do the straight
> incref-and-return all that often?

AFAICT, often with True and False:

    x = (some condition) ? Py_True : Py_False;
    Py_INCREF(x);
    return x;

Regards

Antoine.



From ncoghlan at gmail.com  Mon Aug 15 14:35:08 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 15 Aug 2011 22:35:08 +1000
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <1313410643.3557.2.camel@localhost.localdomain>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
	<1313410643.3557.2.camel@localhost.localdomain>
Message-ID: <CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>

On Mon, Aug 15, 2011 at 10:17 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> AFAICT, often with True and False:
>
>    x = (some condition) ? Py_True : Py_False;
>    Py_INCREF(x);
>    return x;

And that's an idiom that works better with a Py_RETURN macro than it
would separate macros:

Py_RETURN(cond ? Py_True : Py_False);

OK, I'm persuaded that "Py_RETURN(Py_NotImplemented);" would be a
better way to handle this change: +1

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From raymond.hettinger at gmail.com  Mon Aug 15 14:46:12 2011
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Mon, 15 Aug 2011 05:46:12 -0700
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
	<1313410643.3557.2.camel@localhost.localdomain>
	<CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
Message-ID: <76C055DE-9433-41E9-A4B7-8ABDD433C29E@gmail.com>


On Aug 15, 2011, at 5:35 AM, Nick Coghlan wrote:

> On Mon, Aug 15, 2011 at 10:17 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> AFAICT, often with True and False:
>> 
>>    x = (some condition) ? Py_True : Py_False;
>>    Py_INCREF(x);
>>    return x;
> 
> And that's an idiom that works better with a Py_RETURN macro than it
> would separate macros:
> 
> Py_RETURN(cond ? Py_True : Py_False);
> 
> OK, I'm persuaded that "Py_RETURN(Py_NotImplemented);" would be a
> better way to handle this change: +1

I don't think that is worth it.
There is some value to keeping the API consistent with the style that has been used in the past.
So, I vote for Py_RETURN_NOTIMPLEMENTED.  There's no real need to factor this any further.
It's not hard and not important enough to introduce a new variation on return macros.
Adding another return style makes the C API harder to learn and remember.
If we were starting from scratch, Py_RETURN(obj) would make sense.
But we're not starting from scratch, so we should stick with the precedents.


Raymond


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110815/18199ced/attachment.html>

From solipsis at pitrou.net  Mon Aug 15 14:48:07 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 15 Aug 2011 14:48:07 +0200
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <76C055DE-9433-41E9-A4B7-8ABDD433C29E@gmail.com>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
	<1313410643.3557.2.camel@localhost.localdomain>
	<CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
	<76C055DE-9433-41E9-A4B7-8ABDD433C29E@gmail.com>
Message-ID: <20110815144807.2c607721@pitrou.net>

On Mon, 15 Aug 2011 05:46:12 -0700
Raymond Hettinger <raymond.hettinger at gmail.com> wrote:
> 
> I don't think that is worth it.
> There is some value to keeping the API consistent with the style that has been used in the past.
> So, I vote for Py_RETURN_NOTIMPLEMENTED.  There's no real need to factor this any further.
> It's not hard and not important enough to introduce a new variation on return macros.

Why is Py_RETURN_NOTIMPLEMENTED important at all?

Regards

Antoine.

From stefan_ml at behnel.de  Mon Aug 15 15:21:43 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 15 Aug 2011 15:21:43 +0200
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>	<20110811090242.1083782f@msiwind>	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>	<1313410643.3557.2.camel@localhost.localdomain>
	<CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
Message-ID: <j2b6h7$a72$1@dough.gmane.org>

Nick Coghlan, 15.08.2011 14:35:
> On Mon, Aug 15, 2011 at 10:17 PM, Antoine Pitrou<solipsis at pitrou.net>  wrote:
>> AFAICT, often with True and False:
>>
>>     x = (some condition) ? Py_True : Py_False;
>>     Py_INCREF(x);
>>     return x;
>
> And that's an idiom that works better with a Py_RETURN macro than it
> would separate macros:
>
> Py_RETURN(cond ? Py_True : Py_False);

And that would do what exactly? Duplicate the evaluation of the condition?

Stefan


From solipsis at pitrou.net  Mon Aug 15 15:29:48 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 15 Aug 2011 15:29:48 +0200
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
	<1313410643.3557.2.camel@localhost.localdomain>
	<CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
	<j2b6h7$a72$1@dough.gmane.org>
Message-ID: <20110815152948.67209fcf@pitrou.net>

On Mon, 15 Aug 2011 15:21:43 +0200
Stefan Behnel <stefan_ml at behnel.de> wrote:

> Nick Coghlan, 15.08.2011 14:35:
> > On Mon, Aug 15, 2011 at 10:17 PM, Antoine Pitrou<solipsis at pitrou.net>  wrote:
> >> AFAICT, often with True and False:
> >>
> >>     x = (some condition) ? Py_True : Py_False;
> >>     Py_INCREF(x);
> >>     return x;
> >
> > And that's an idiom that works better with a Py_RETURN macro than it
> > would separate macros:
> >
> > Py_RETURN(cond ? Py_True : Py_False);
> 
> And that would do what exactly? Duplicate the evaluation of the condition?

You don't need to.

#define Py_RETURN(x) do {    \
     PyObject *_tmp = (x);   /* evaluate x exactly once */ \
     Py_INCREF(_tmp);        \
     return _tmp;            \
} while (0)
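
Usage would then look like:

    Py_RETURN(cond ? Py_True : Py_False);

with the condition evaluated only once, into _tmp.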




From barry at python.org  Mon Aug 15 15:49:43 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 15 Aug 2011 09:49:43 -0400
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
 #12724.
In-Reply-To: <76C055DE-9433-41E9-A4B7-8ABDD433C29E@gmail.com>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
	<1313410643.3557.2.camel@localhost.localdomain>
	<CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
	<76C055DE-9433-41E9-A4B7-8ABDD433C29E@gmail.com>
Message-ID: <20110815094943.35f640b9@resist.wooz.org>

On Aug 15, 2011, at 05:46 AM, Raymond Hettinger wrote:

>I don't think that is worth it.  There is some value to keeping the API
>consistent with the style that has been used in the past.  So, I vote for
>Py_RETURN_NOTIMPLEMENTED.  There's no real need to factor this any further.
>It's not hard and not important enough to introduce a new variation on return
>macros.  Adding another return style makes the C API harder to learn and
>remember.  If we were starting from scratch, Py_RETURN(obj) would make
>sense.  But we're not starting from scratch, so we should stick with the
>precedents.

I can see the small value in the convenience, but I tend to agree with Raymond
here.  I think we have to be careful about not descending into macro
obfuscation world.

-Barry

From solipsis at pitrou.net  Mon Aug 15 15:59:00 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 15 Aug 2011 15:59:00 +0200
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
	<1313410643.3557.2.camel@localhost.localdomain>
	<CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
	<76C055DE-9433-41E9-A4B7-8ABDD433C29E@gmail.com>
	<20110815094943.35f640b9@resist.wooz.org>
Message-ID: <20110815155900.046f932b@pitrou.net>

On Mon, 15 Aug 2011 09:49:43 -0400
Barry Warsaw <barry at python.org> wrote:
> On Aug 15, 2011, at 05:46 AM, Raymond Hettinger wrote:
> 
> >I don't think that is worth it.  There is some value to keeping the API
> >consistent with the style that has been used in the past.  So, I vote for
> >Py_RETURN_NOTIMPLEMENTED.  There's no real need to factor this any further.
> >It's not hard and not important enough to introduce a new variation on return
> >macros.  Adding another return style makes the C API harder to learn and
> >remember.  If we were starting from scratch, Py_RETURN(obj) would make
> >sense.  But we're not starting from scratch, so we should stick with the
> >precedents.
> 
> I can see the small value in the convenience, but I tend to agree with Raymond
> here.  I think we have to be careful about not descending into macro
> obfuscation world.

How is Py_RETURN(Py_NotImplemented) more obfuscated than
Py_RETURN_NOTIMPLEMENTED ???



From petri at digip.org  Mon Aug 15 19:48:42 2011
From: petri at digip.org (Petri Lehtinen)
Date: Mon, 15 Aug 2011 20:48:42 +0300
Subject: [Python-Dev] Fwd: Mirroring Python repos to Bitbucket
In-Reply-To: <CAB98777-1712-4FFB-91A5-3E0B5E400440@gmail.com>
References: <4E42DF4A.8010407@atlassian.com>
	<CAB98777-1712-4FFB-91A5-3E0B5E400440@gmail.com>
Message-ID: <20110815174842.GA1598@ihaa>

Doug Hellmann wrote:
> 
> Charles McLaughlin of Atlassian has set up mirrors of the Mercurial
> repositories hosted on python.org as part of the ongoing
> infrastructure improvement work. These mirrors will give us a public
> fail-over repository in the event that hg.python.org goes offline
> unexpectedly, and also provide features such as RSS feeds of changes
> for users interested in monitoring the repository passively.

As a side note, for those preferring git there's also a very
unofficial git mirror at https://github.com/jonashaag/cpython. It uses
hg-git for converting and syncs once a day.

Petri

From tjreedy at udel.edu  Mon Aug 15 20:17:16 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 15 Aug 2011 14:17:16 -0400
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <20110815094943.35f640b9@resist.wooz.org>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
	<1313410643.3557.2.camel@localhost.localdomain>
	<CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
	<76C055DE-9433-41E9-A4B7-8ABDD433C29E@gmail.com>
	<20110815094943.35f640b9@resist.wooz.org>
Message-ID: <j2bnrs$1h4$1@dough.gmane.org>

On 8/15/2011 9:49 AM, Barry Warsaw wrote:
> On Aug 15, 2011, at 05:46 AM, Raymond Hettinger wrote:
>
>> I don't think that is worth it.  There is some value to keeping the API
>> consistent with the style that has been used in the past.  So, I vote for
>> Py_RETURN_NOTIMPLEMENTED.  There's no real need to factor this any further.
>> It's not hard and not important enough to introduce a new variation on return
>> macros.  Adding another return style makes the C API harder to learn and
>> remember.  If we were starting from scratch, Py_RETURN(obj) would make
>> sense.  But we're not starting from scratch, so we should stick with the
>> precedents.
>
> I can see the small value in the convenience, but I tend to agree with Raymond
> here.  I think we have to be careful about not descending into macro
> obfuscation world.

Coming fresh to the C-API, as I partly am, I would rather have exactly 1 
generally useful macro that increments the refcount of an object and 
returns it. To me, multiple special-case, seldom-used macros are a 
better example of 'macro obfuscation'.

-- 
Terry Jan Reedy


From alexander.belopolsky at gmail.com  Mon Aug 15 21:04:02 2011
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 15 Aug 2011 15:04:02 -0400
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <20110811090242.1083782f@msiwind>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
Message-ID: <CAP7h-xZwv+_iJxEvpjsb94BKyuHrrYyQOYt2Fnsy=iP3epXcgg@mail.gmail.com>

On Thu, Aug 11, 2011 at 3:02 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
..
>>   Add Py_RETURN_NOTIMPLEMENTED macro. Fixes #12724.
>
>
> It would seem more useful to have a generic Py_RETURN() macro rather than
> some specific forms for each and every common object.

Just my $0.02: I occasionally wish we had Py_RETURN_BOOL(1/0) instead
of Py_RETURN_TRUE/FALSE, but I feel the proposed Py_RETURN() is too
ambiguous and should be called Py_RETURN_SINGLETON() or
Py_RETURN_NEWREF().  The longer spelling, however, makes it less
attractive.  Overall, I am -1 on Py_RETURN().  Introducing a second
obvious way to spell Py_RETURN_NONE/TRUE/FALSE will clutter the API,
and novices may be misled into always using Py_RETURN(x) instead of
return x, attracting reference leaks.
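
That is, with a hypothetical incref-ing Py_RETURN(x):

    PyObject *res = PyLong_FromLong(42);  /* already a new reference */
    Py_RETURN(res);                       /* second incref: one reference leaked */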

From alexandre at peadrop.com  Mon Aug 15 21:56:14 2011
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 15 Aug 2011 12:56:14 -0700
Subject: [Python-Dev] PEP 3154 - pickle protocol 4
In-Reply-To: <20110812125846.00a75cd1@pitrou.net>
References: <20110812125846.00a75cd1@pitrou.net>
Message-ID: <CANcUUedTwQda2VAkQm9p5kpFJ0xX9Q9TaZqQG17aDfo=W5r-5w@mail.gmail.com>

On Fri, Aug 12, 2011 at 3:58 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

>
> Hello,
>
> This PEP is an attempt to foster a number of small incremental
> improvements in a future pickle protocol version. The PEP process is
> used in order to gather as many improvements as possible, because the
> introduction of a new protocol version should be a rare occurrence.
>
> Feel free to suggest any additions.
>
>
Your propositions all sound good to me. We will need to agree on the
details, but I believe these improvements to the current protocol will be
appreciated.

Also, one thing that keeps coming back is the need for pickling functions
and methods which are not part of the global namespace (e.g. issue 9276
<http://bugs.python.org/issue9276>). Support for this would likely help us
fix another related namespace issue (i.e., issue 3657
<http://bugs.python.org/issue3657>). Finally, we are currently missing
support for pickling classes with __new__ taking keyword-only arguments
(i.e. issue 4727 <http://bugs.python.org/issue4727>).

-- Alexandre
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110815/3006b135/attachment-0001.html>

From ncoghlan at gmail.com  Tue Aug 16 00:32:09 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 16 Aug 2011 08:32:09 +1000
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <20110815155900.046f932b@pitrou.net>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
	<1313410643.3557.2.camel@localhost.localdomain>
	<CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
	<76C055DE-9433-41E9-A4B7-8ABDD433C29E@gmail.com>
	<20110815094943.35f640b9@resist.wooz.org>
	<20110815155900.046f932b@pitrou.net>
Message-ID: <CADiSq7cVpgHd75Bgu0_fyR3zxh9fCRvFEhO6Eb=qkEuEw+cneQ@mail.gmail.com>

On Mon, Aug 15, 2011 at 11:59 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Mon, 15 Aug 2011 09:49:43 -0400
> Barry Warsaw <barry at python.org> wrote:
>> I can see the small value in the convenience, but I tend to agree with Raymond
>> here. I think we have to be careful about not descending into macro
>> obfuscation world.
>
> How is Py_RETURN(Py_NotImplemented) more obfuscated than
> Py_RETURN_NOTIMPLEMENTED ???

Indeed, this entire discussion was started by the extension of the
Py_RETURN_NONE idiom to also adopt Py_RETURN_NOTIMPLEMENTED.

If the idiom is to be extended at all, why stop there? Why not cover
the Py_RETURN_TRUE and Py_RETURN_FALSE cases as well?

Or, we can add exactly one new macro that covers all 3 cases, and
others besides. I haven't encountered any complaints about people
failing to understand the difference between "return Py_None;" and
"Py_RETURN_NONE;" and see no major reason why "return x;" vs
"Py_RETURN(x);" would be significantly more confusing.

Based on this thread, there are actually two options I'd be fine with:
1. Just revert it and leave Py_RETURN_NONE as a special snowflake
2. Properly generalise the incref-and-return idiom via a Py_RETURN macro

Incrementally increasing complexity by adding a second instance of the
dedicated macro approach is precisely what we *shouldn't* be doing.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Tue Aug 16 00:37:11 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 16 Aug 2011 08:37:11 +1000
Subject: [Python-Dev] PEP 3154 - pickle protocol 4
In-Reply-To: <CANcUUedTwQda2VAkQm9p5kpFJ0xX9Q9TaZqQG17aDfo=W5r-5w@mail.gmail.com>
References: <20110812125846.00a75cd1@pitrou.net>
	<CANcUUedTwQda2VAkQm9p5kpFJ0xX9Q9TaZqQG17aDfo=W5r-5w@mail.gmail.com>
Message-ID: <CADiSq7dHM9Uc97__JuugKA5LOc1Ei5rZ6HYCX7HRYj+j84wfGQ@mail.gmail.com>

On Tue, Aug 16, 2011 at 5:56 AM, Alexandre Vassalotti
<alexandre at peadrop.com> wrote:
>
> On Fri, Aug 12, 2011 at 3:58 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>
>> Hello,
>>
>> This PEP is an attempt to foster a number of small incremental
>> improvements in a future pickle protocol version. The PEP process is
>> used in order to gather as many improvements as possible, because the
>> introduction of a new protocol version should be a rare occurrence.
>>
>> Feel free to suggest any additions.
>>
>
> Your propositions sound all good to me. We will need to agree about the
> details, but I believe these improvements to the current protocol will be
> appreciated.
> Also, one thing keeps coming back is the need for pickling functions and
> methods which are not part of the global namespace (e.g. issue 9276).
> Support for this would likely help us fixing another related namespace issue
> (i.e.,?issue 3657). Finally, we currently missing support for pickling
> classes with __new__ taking keyword-only arguments (i.e.?issue 4727).

In the spirit of PEP 395 and python 3's pickle._compat_pickle, perhaps
it would be worth looking at a mechanism whereby a pickle could
specify "alternate class names" for included class instances in the
pickle itself?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From barry at python.org  Tue Aug 16 00:43:22 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 15 Aug 2011 18:43:22 -0400
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
 #12724.
In-Reply-To: <CADiSq7cVpgHd75Bgu0_fyR3zxh9fCRvFEhO6Eb=qkEuEw+cneQ@mail.gmail.com>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
	<1313410643.3557.2.camel@localhost.localdomain>
	<CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
	<76C055DE-9433-41E9-A4B7-8ABDD433C29E@gmail.com>
	<20110815094943.35f640b9@resist.wooz.org>
	<20110815155900.046f932b@pitrou.net>
	<CADiSq7cVpgHd75Bgu0_fyR3zxh9fCRvFEhO6Eb=qkEuEw+cneQ@mail.gmail.com>
Message-ID: <20110815184322.4a6df324@resist.wooz.org>

On Aug 16, 2011, at 08:32 AM, Nick Coghlan wrote:

>Indeed, this entire discussion was started by the extension of the
>Py_RETURN_NONE idiom to also adopt Py_RETURN_NOTIMPLEMENTED.
>
>If the idiom is to be extended at all, why stop there? Why not cover
>the Py_RETURN_TRUE and Py_RETURN_FALSE cases as well?
>
>Or, we can add exactly one new macro that covers all 3 cases, and
>others besides. I haven't encountered any complaints about people
>failing to understand the difference between "return Py_None;" and
>"Py_RETURN_NONE;" and see no major reason why "return x;" vs
>"Py_RETURN(x);" would be significantly more confusing.
>
>Based on this thread, there are actually two options I'd be fine with:
>1. Just revert it and leave Py_RETURN_NONE as a special snowflake
>2. Properly generalise the incref-and-return idiom via a Py_RETURN macro
>
>Incrementally increasing complexity by adding a second instance of the
>dedicated macro approach is precisely what we *shouldn't* be doing.

My problem with Py_RETURN(x) is that it's not clear that it also does an
incref, and without that, I think it's *more* confusing to use rather than
just writing it out explicitly, Py_RETURN_NONE's historic existence
notwithstanding.

So I'd opt for #1, unless we can agree on a better color for the bikeshed.

-Barry

From guido at python.org  Tue Aug 16 00:52:00 2011
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Aug 2011 15:52:00 -0700
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <20110815184322.4a6df324@resist.wooz.org>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
	<1313410643.3557.2.camel@localhost.localdomain>
	<CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
	<76C055DE-9433-41E9-A4B7-8ABDD433C29E@gmail.com>
	<20110815094943.35f640b9@resist.wooz.org>
	<20110815155900.046f932b@pitrou.net>
	<CADiSq7cVpgHd75Bgu0_fyR3zxh9fCRvFEhO6Eb=qkEuEw+cneQ@mail.gmail.com>
	<20110815184322.4a6df324@resist.wooz.org>
Message-ID: <CAP7+vJK1DDas8cPw1hwXDoy9zjATeEcXjT2PM7fOUUhWQkunAA@mail.gmail.com>

On Mon, Aug 15, 2011 at 3:43 PM, Barry Warsaw <barry at python.org> wrote:
> On Aug 16, 2011, at 08:32 AM, Nick Coghlan wrote:
>
>>Indeed, this entire discussion was started by the extension of the
>>Py_RETURN_NONE idiom to also adopt Py_RETURN_NOTIMPLEMENTED.
>>
>>If the idiom is to be extended at all, why stop there? Why not cover
>>the Py_RETURN_TRUE and Py_RETURN_FALSE cases as well?
>>
>>Or, we can add exactly one new macro that covers all 3 cases, and
>>others besides. I haven't encountered any complaints about people
>>failing to understand the difference between "return Py_None;" and
>>"Py_RETURN_NONE;" and see no major reason why "return x;" vs
>>"Py_RETURN(x);" would be significantly more confusing.
>>
>>Based on this thread, there are actually two options I'd be fine with:
>>1. Just revert it and leave Py_RETURN_NONE as a special snowflake
>>2. Properly generalise the incref-and-return idiom via a Py_RETURN macro
>>
>>Incrementally increasing complexity by adding a second instance of the
>>dedicated macro approach is precisely what we *shouldn't* be doing.
>
> My problem with Py_RETURN(x) is that it's not clear that it also does an
> incref, and without that, I think it's *more* confusing to use rather than
> just writing it out explicitly, Py_RETURN_NONE's historic existence
> notwithstanding.
>
> So I'd opt for #1, unless we can agree on a better color for the bikeshed.

I dunno; if it *didn't* do an INCREF it would be a pretty pointless
macro (just expanding to "return x") and I like reducing the clutter
of a very common idiom. So I favor #2.

-- 
--Guido van Rossum (python.org/~guido)

From ethan at stoneleaf.us  Tue Aug 16 01:13:50 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 15 Aug 2011 16:13:50 -0700
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
 #12724.
In-Reply-To: <20110815184322.4a6df324@resist.wooz.org>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>	<20110811090242.1083782f@msiwind>	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>	<1313410643.3557.2.camel@localhost.localdomain>	<CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>	<76C055DE-9433-41E9-A4B7-8ABDD433C29E@gmail.com>	<20110815094943.35f640b9@resist.wooz.org>	<20110815155900.046f932b@pitrou.net>	<CADiSq7cVpgHd75Bgu0_fyR3zxh9fCRvFEhO6Eb=qkEuEw+cneQ@mail.gmail.com>
	<20110815184322.4a6df324@resist.wooz.org>
Message-ID: <4E49A82E.7080105@stoneleaf.us>

Barry Warsaw wrote:
> On Aug 16, 2011, at 08:32 AM, Nick Coghlan wrote:
>> Based on this thread, there are actually two options I'd be fine with:
>> 1. Just revert it and leave Py_RETURN_NONE as a special snowflake
>> 2. Properly generalise the incref-and-return idiom via a Py_RETURN macro
>>
>> Incrementally increasing complexity by adding a second instance of the
>> dedicated macro approach is precisely what we *shouldn't* be doing.
> 
> My problem with Py_RETURN(x) is that it's not clear that it also does an
> incref, and without that, I think it's *more* confusing to use rather than
> just writing it out explicitly, Py_RETURN_NONE's historic existence
> notwithstanding.
> 
> So I'd opt for #1, unless we can agree on a better color for the bikeshed.

My apologies if this is just noise, but are there RETURN macros that 
don't do an INCREF?

~Ethan~

From ncoghlan at gmail.com  Tue Aug 16 01:39:09 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 16 Aug 2011 09:39:09 +1000
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <4E49A82E.7080105@stoneleaf.us>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
	<1313410643.3557.2.camel@localhost.localdomain>
	<CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
	<76C055DE-9433-41E9-A4B7-8ABDD433C29E@gmail.com>
	<20110815094943.35f640b9@resist.wooz.org>
	<20110815155900.046f932b@pitrou.net>
	<CADiSq7cVpgHd75Bgu0_fyR3zxh9fCRvFEhO6Eb=qkEuEw+cneQ@mail.gmail.com>
	<20110815184322.4a6df324@resist.wooz.org>
	<4E49A82E.7080105@stoneleaf.us>
Message-ID: <CADiSq7ceN=khXjs0X2iSvUkFX8B_dGOM2CSBXs3nSjohA4LLzA@mail.gmail.com>

On Tue, Aug 16, 2011 at 9:13 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
> Barry Warsaw wrote:
>> So I'd opt for #1, unless we can agree on a better color for the bikeshed.
>
> My apologies if this is just noise, but are there RETURN macros that don't
> do an INCREF?

No, Py_RETURN_NONE is the only previous example, and it was added to
simplify the very common idiom of:

    Py_INCREF(Py_None);
    return Py_None;

It was added originally because it helped to avoid *two* common bugs:

  return Py_None; # segfault waiting to happen

  return NULL; # Just plain wrong, but not picked up until tests are
run and hence irritating

I'd say NotImplemented is the second most common instance of that kind
of direct incref-and-return (since operator methods need to return it
to correctly support type coercion), although, as Antoine noted,
Py_True and Py_False would be up there as well.
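
For reference, the existing macro in Include/object.h boils down to:

    #define Py_RETURN_NONE return Py_INCREF(Py_None), Py_None

and the newly added one follows the same pattern:

    #define Py_RETURN_NOTIMPLEMENTED \
        return Py_INCREF(Py_NotImplemented), Py_NotImplemented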

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From guido at python.org  Tue Aug 16 01:52:43 2011
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Aug 2011 16:52:43 -0700
Subject: [Python-Dev] cpython: Add Py_RETURN_NOTIMPLEMENTED macro. Fixes
	#12724.
In-Reply-To: <CADiSq7ceN=khXjs0X2iSvUkFX8B_dGOM2CSBXs3nSjohA4LLzA@mail.gmail.com>
References: <E1QrKAP-0003jS-Q5@dinsdale.python.org>
	<20110811090242.1083782f@msiwind>
	<CAP1=2W5NSQ3kavvcUnui-SdyA2Tap7DaHDeipfpKrmEdFVDLeg@mail.gmail.com>
	<CADiSq7fQW=wWzA=Yvc7tMSEnq+X3XOif3oNcH0So5ZP-vTRY=g@mail.gmail.com>
	<1313410643.3557.2.camel@localhost.localdomain>
	<CADiSq7cut67dM3H_qJcbBi=eZSDTm4UVTm9Zt1LQ+wUHb_j9gA@mail.gmail.com>
	<76C055DE-9433-41E9-A4B7-8ABDD433C29E@gmail.com>
	<20110815094943.35f640b9@resist.wooz.org>
	<20110815155900.046f932b@pitrou.net>
	<CADiSq7cVpgHd75Bgu0_fyR3zxh9fCRvFEhO6Eb=qkEuEw+cneQ@mail.gmail.com>
	<20110815184322.4a6df324@resist.wooz.org>
	<4E49A82E.7080105@stoneleaf.us>
	<CADiSq7ceN=khXjs0X2iSvUkFX8B_dGOM2CSBXs3nSjohA4LLzA@mail.gmail.com>
Message-ID: <CAP7+vJL-bEu1_887oH-UKqNo=3m12QsEhubyf4u8EfKzQkeOkg@mail.gmail.com>

On Mon, Aug 15, 2011 at 4:39 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Tue, Aug 16, 2011 at 9:13 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
>> Barry Warsaw wrote:
>>> So I'd opt for #1, unless we can agree on a better color for the bikeshed.
>>
>> My apologies if this is just noise, but are there RETURN macros that don't
>> do an INCREF?
>
> No, Py_RETURN_NONE is the only previous example, and it was added to
> simplify the very common idiom of:
>
>    Py_INCREF(Py_None);
>    return Py_None;
>
> It was added originally because it helped to avoid *two* common bugs:
>
>  return Py_None; # segfault waiting to happen
>
>  return NULL; # Just plain wrong, but not picked up until tests are
> run and hence irritating
>
> I'd say NotImplemented is the second most common instance of that kind
> of direct incref-and-return (since operator methods need to return it
> to correctly support type coercion), although, as Antoine noted,
> Py_True and Py_False would be up there as well.

I betcha if you extend your search to "return <variable>" preceded by
"INCREF(variable)" you'll find a whole lot more examples. :-)

-- 
--Guido van Rossum (python.org/~guido)

From ncoghlan at gmail.com  Tue Aug 16 04:35:48 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 16 Aug 2011 12:35:48 +1000
Subject: [Python-Dev] [Python-checkins] peps: Add Alexandre's suggestions
In-Reply-To: <E1Qt8Tg-0006Fv-At@dinsdale.python.org>
References: <E1Qt8Tg-0006Fv-At@dinsdale.python.org>
Message-ID: <CADiSq7d3N5R-j1boLoTUn9WmF7MA122Do8+bGmj_6B52GJwThw@mail.gmail.com>

On Tue, Aug 16, 2011 at 11:30 AM, antoine.pitrou
<python-checkins at python.org> wrote:
> +Serializing "pseudo-global" objects
> +-----------------------------------
> +
> +Objects which are not module-global, but should be treated in a similar
> +fashion -- such as methods [4]_ or nested classes -- cannot currently be
> +pickled (or, rather, unpickled) because the pickle protocol does not
> > +correctly specify how to retrieve them.  One solution would be through the
> +adjunction of a ``__namespace__`` (or ``__qualname__``) to all class and
> +function objects, specifying the full "path" by which they can be retrieved.
> +For globals, this would generally be ``"{}.{}".format(obj.__module__, obj.__name__)``.
> +Then a new opcode can resolve that path and push the object on the stack,
> +similarly to the GLOBAL opcode.
> +

I think this is the part that ties in with the pickle-related aspects
for PEP 395 - using '__qualname__'  would be one way to align a
module's real name with where it should be retrieved from and where
its documentation lives (I like 'qualified name' as a term, too).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From solipsis at pitrou.net  Tue Aug 16 11:25:29 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 16 Aug 2011 11:25:29 +0200
Subject: [Python-Dev] [Python-checkins] peps: Add Alexandre's suggestions
References: <E1Qt8Tg-0006Fv-At@dinsdale.python.org>
	<CADiSq7d3N5R-j1boLoTUn9WmF7MA122Do8+bGmj_6B52GJwThw@mail.gmail.com>
Message-ID: <20110816112529.15fb6c69@pitrou.net>

On Tue, 16 Aug 2011 12:35:48 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Tue, Aug 16, 2011 at 11:30 AM, antoine.pitrou
> <python-checkins at python.org> wrote:
> > +Serializing "pseudo-global" objects
> > +-----------------------------------
> > +
> > +Objects which are not module-global, but should be treated in a similar
> > +fashion -- such as methods [4]_ or nested classes -- cannot currently be
> > +pickled (or, rather, unpickled) because the pickle protocol does not
>> > +correctly specify how to retrieve them.  One solution would be through the
> > +adjunction of a ``__namespace__`` (or ``__qualname__``) to all class and
> > +function objects, specifying the full "path" by which they can be retrieved.
> > +For globals, this would generally be ``"{}.{}".format(obj.__module__, obj.__name__)``.
> > +Then a new opcode can resolve that path and push the object on the stack,
> > +similarly to the GLOBAL opcode.
> > +
> 
> I think this is the part that ties in with the pickle-related aspects
> for PEP 395 - using '__qualname__'  would be one way to align a
> module's real name with where it should be retrieved from and where
> its documentation lives (I like 'qualified name' as a term, too).

Oops, I admit I hadn't read PEP 395.
PEP 395 focuses on module aliasing, while the suggestion above focuses
on the path of objects in modules. How can we reconcile the two? Do we
want __qualname__ to be a relative "path" inside the module?
(but then __qualname__ cannot specify its own module name).

Regards

Antoine.



From ncoghlan at gmail.com  Tue Aug 16 12:15:51 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 16 Aug 2011 20:15:51 +1000
Subject: [Python-Dev] [Python-checkins] peps: Add Alexandre's suggestions
In-Reply-To: <20110816112529.15fb6c69@pitrou.net>
References: <E1Qt8Tg-0006Fv-At@dinsdale.python.org>
	<CADiSq7d3N5R-j1boLoTUn9WmF7MA122Do8+bGmj_6B52GJwThw@mail.gmail.com>
	<20110816112529.15fb6c69@pitrou.net>
Message-ID: <CADiSq7c4PfjcLDS67Ti6c+zLk7NqygaWU_1KTkJ1B+JyNQRoEg@mail.gmail.com>

On Tue, Aug 16, 2011 at 7:25 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Tue, 16 Aug 2011 12:35:48 +1000
> Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On Tue, Aug 16, 2011 at 11:30 AM, antoine.pitrou
>> <python-checkins at python.org> wrote:
>> > +Serializing "pseudo-global" objects
>> > +-----------------------------------
>> > +
>> > +Objects which are not module-global, but should be treated in a similar
>> > +fashion -- such as methods [4]_ or nested classes -- cannot currently be
>> > +pickled (or, rather, unpickled) because the pickle protocol does not
>> > +correctly specify how to retrieve them.  One solution would be through the
>> > +adjunction of a ``__namespace__`` (or ``__qualname__``) to all class and
>> > +function objects, specifying the full "path" by which they can be retrieved.
>> > +For globals, this would generally be ``"{}.{}".format(obj.__module__, obj.__name__)``.
>> > +Then a new opcode can resolve that path and push the object on the stack,
>> > +similarly to the GLOBAL opcode.
>> > +
>>
>> I think this is the part that ties in with the pickle-related aspects
>> for PEP 395 - using '__qualname__' would be one way to align a
>> module's real name with where it should be retrieved from and where
>> its documentation lives (I like 'qualified name' as a term, too).
>
> Oops, I admit I hadn't read PEP 395.
> PEP 395 focuses on module aliasing, while the suggestion above focuses
> on the path of objects in modules. How can we reconcile the two? Do we
> want __qualname__ to be a relative "path" inside the module?
> (but then __qualname__ cannot specify its own module name).

I was more thinking that if pickle grew the ability to handle two
different names for objects, then PEP 395 could run off the same
feature without having to mess with sys.modules.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From solipsis at pitrou.net  Tue Aug 16 13:23:44 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 16 Aug 2011 13:23:44 +0200
Subject: [Python-Dev] peps: Add Alexandre's suggestions
In-Reply-To: <CADiSq7c4PfjcLDS67Ti6c+zLk7NqygaWU_1KTkJ1B+JyNQRoEg@mail.gmail.com>
References: <E1Qt8Tg-0006Fv-At@dinsdale.python.org>
	<CADiSq7d3N5R-j1boLoTUn9WmF7MA122Do8+bGmj_6B52GJwThw@mail.gmail.com>
	<20110816112529.15fb6c69@pitrou.net>
	<CADiSq7c4PfjcLDS67Ti6c+zLk7NqygaWU_1KTkJ1B+JyNQRoEg@mail.gmail.com>
Message-ID: <20110816132344.6d64aca7@pitrou.net>

On Tue, 16 Aug 2011 20:15:51 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> >
> > Oops, I admit I hadn't read PEP 395.
> > PEP 395 focuses on module aliasing, while the suggestion above focuses
> > on the path of objects in modules. How can we reconcile the two? Do we
> > want __qualname__ to be a relative "path" inside the module?
> > (but then __qualname__ cannot specify its own module name).
> 
> I was more thinking that if pickle grew the ability to handle two
> different names for objects, then PEP 395 could run off the same
> feature without having to mess with sys.modules.

But what happens if a module contains, say, a nested class with a
__qualname__ (assigned by the interpreter) of "module_name.A.B", and the
module later gets a __qualname__ (assigned by the user) of
"module_alias"?

Regards

Antoine.

From ncoghlan at gmail.com  Tue Aug 16 13:37:31 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 16 Aug 2011 21:37:31 +1000
Subject: [Python-Dev] [Python-checkins] peps: Add Alexandre's suggestions
In-Reply-To: <20110816132344.6d64aca7@pitrou.net>
References: <E1Qt8Tg-0006Fv-At@dinsdale.python.org>
	<CADiSq7d3N5R-j1boLoTUn9WmF7MA122Do8+bGmj_6B52GJwThw@mail.gmail.com>
	<20110816112529.15fb6c69@pitrou.net>
	<CADiSq7c4PfjcLDS67Ti6c+zLk7NqygaWU_1KTkJ1B+JyNQRoEg@mail.gmail.com>
	<20110816132344.6d64aca7@pitrou.net>
Message-ID: <CADiSq7f3s6WfDdvKdMWeSMeN3SsNxhUC0qqo02uicAFLpS=6wg@mail.gmail.com>

On Tue, Aug 16, 2011 at 9:23 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Tue, 16 Aug 2011 20:15:51 +1000
> Nick Coghlan <ncoghlan at gmail.com> wrote:
>> >
>> > Oops, I admit I hadn't read PEP 395.
>> > PEP 395 focuses on module aliasing, while the suggestion above focuses
>> > on the path of objects in modules. How can we reconcile the two? Do we
>> > want __qualname__ to be a relative "path" inside the module?
>> > (but then __qualname__ cannot specify its own module name).
>>
>> I was more thinking that if pickle grew the ability to handle two
>> different names for objects, then PEP 395 could run off the same
>> feature without having to mess with sys.modules.
>
> But what happens if a module contains, say, a nested class with a
> __qualname__ (assigned by the interpreter) of "module_name.A.B", and the
> module later gets a __qualname__ (assigned by the user) of
> "module_alias"?

Yeah, I don't think it works with PEP 395 in its current state. But
then, I'm not sure 395 will work at all in its current state -
definitely a work in progress, that one. However, I'll definitely keep
this aspect in mind next time I update it - even if they don't use the
same mechanism, they should at least be compatible proposals.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From chris at simplistix.co.uk  Wed Aug 17 00:58:24 2011
From: chris at simplistix.co.uk (Chris Withers)
Date: Tue, 16 Aug 2011 15:58:24 -0700
Subject: [Python-Dev] Sphinx version for Python 2.x docs
Message-ID: <4E4AF610.5040303@simplistix.co.uk>

Hi All,

Any chance the version of sphinx used to generate the docs on 
docs.python.org could be updated?

I'd love to take advantage of the "new format" intersphinx mapping:

http://sphinx.pocoo.org/ext/intersphinx.html#confval-intersphinx_mapping

...but since it looks like docs.python.org uses a version of sphinx 
that's too old for that, I can't link to:

:ref:`Foo <python:logrecord-attributes>`

...and have to link to:

`LogRecord attributes 
<http://docs.python.org/library/logging.html#logrecord-attributes>`__

instead :-S

cheers,

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
             - http://www.simplistix.co.uk

From sandro.tosi at gmail.com  Wed Aug 17 01:05:58 2011
From: sandro.tosi at gmail.com (Sandro Tosi)
Date: Wed, 17 Aug 2011 01:05:58 +0200
Subject: [Python-Dev] Sphinx version for Python 2.x docs
In-Reply-To: <4E4AF610.5040303@simplistix.co.uk>
References: <4E4AF610.5040303@simplistix.co.uk>
Message-ID: <CAPdtAj3yjZjhkXBk3kVE1nnLuLY4jwJ_sCouQ6LBgc-OENg1kA@mail.gmail.com>

Hello Chris,

On Wed, Aug 17, 2011 at 00:58, Chris Withers <chris at simplistix.co.uk> wrote:
> Hi All,
>
> Any chance the version of sphinx used to generate the docs on
> docs.python.org could be updated?

I think what's needed first is to run a pilot: take the current 2.7
doc, update sphinx and look at what breaks, and evaluate if it's
fixable in a reasonable amount of time, or it's just too much and so
on.

Currently no-one has done that yet: would you? :) That would help us
quite a lot.

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From chris at simplistix.co.uk  Wed Aug 17 04:08:47 2011
From: chris at simplistix.co.uk (Chris Withers)
Date: Tue, 16 Aug 2011 19:08:47 -0700
Subject: [Python-Dev] Sphinx version for Python 2.x docs
In-Reply-To: <CAPdtAj3yjZjhkXBk3kVE1nnLuLY4jwJ_sCouQ6LBgc-OENg1kA@mail.gmail.com>
References: <4E4AF610.5040303@simplistix.co.uk>
	<CAPdtAj3yjZjhkXBk3kVE1nnLuLY4jwJ_sCouQ6LBgc-OENg1kA@mail.gmail.com>
Message-ID: <4E4B22AF.3090805@simplistix.co.uk>

On 16/08/2011 16:05, Sandro Tosi wrote:
> Hello Chris,
>
> On Wed, Aug 17, 2011 at 00:58, Chris Withers<chris at simplistix.co.uk>  wrote:
>> Hi All,
>>
>> Any chance the version of sphinx used to generate the docs on
>> docs.python.org could be updated?
>
> I think what's needed first is to run a pilot: take the current 2.7
> doc,

Where does that live?
Where are the instructions for building the docs? (dependencies needed, etc)

cheers,

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
             - http://www.simplistix.co.uk

From ncoghlan at gmail.com  Wed Aug 17 05:59:07 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 17 Aug 2011 13:59:07 +1000
Subject: [Python-Dev] Sphinx version for Python 2.x docs
In-Reply-To: <4E4B22AF.3090805@simplistix.co.uk>
References: <4E4AF610.5040303@simplistix.co.uk>
	<CAPdtAj3yjZjhkXBk3kVE1nnLuLY4jwJ_sCouQ6LBgc-OENg1kA@mail.gmail.com>
	<4E4B22AF.3090805@simplistix.co.uk>
Message-ID: <CADiSq7d4d5yLJkqfoTx+dcEvn1QAvjpyMQnNNWLU0LTHgedMPw@mail.gmail.com>

On Wed, Aug 17, 2011 at 12:08 PM, Chris Withers <chris at simplistix.co.uk> wrote:
> On 16/08/2011 16:05, Sandro Tosi wrote:
>> I think what's needed first is to run a pilot: take the current 2.7
>> doc,
>
> Where does that live?
> Where are the instructions for building the docs? (dependencies needed, etc)

'make html' in the Doc directory of a CPython checkout ("hg clone
http://hg.python.org/cpython") usually does the trick.

See http://docs.python.org/dev/documenting/building.html for more
detail if the above doesn't work.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From sturla at molden.no  Wed Aug 17 17:22:22 2011
From: sturla at molden.no (Sturla Molden)
Date: Wed, 17 Aug 2011 17:22:22 +0200
Subject: [Python-Dev] GIL removal question
In-Reply-To: <CAP7+vJJ9XiAYF3c8+QvDLdgDJHRnXCTUv22tO17jRuU_cuPDsw@mail.gmail.com>
References: <mailman.75.1312970403.18995.python-dev@python.org>
	<6474188B-CC1C-4B33-9E6C-9D2ACFC637D9@dabeaz.com>
	<CADiSq7euK-=W9LrtEUqgJJJUjdHkX99ciJZEZ+pMduRAo90vfg@mail.gmail.com>
	<9E289F07-B0DA-47E6-B46C-22FAE17D4A0D@dabeaz.com>
	<CAP7+vJJ9XiAYF3c8+QvDLdgDJHRnXCTUv22tO17jRuU_cuPDsw@mail.gmail.com>
Message-ID: <4E4BDCAE.7070202@molden.no>

On 10.08.2011 13:43, Guido van Rossum wrote:
> They have a specific plan, based on Software Transactional Memory:
> http://morepypy.blogspot.com/2011/06/global-interpreter-lock-or-how-to-kill.html
>

Microsoft's experiment to use STM in .NET failed though. And Linux got 
rid of the BKL without STM.

There is a similar but simpler paradigm called "bulk synchronous
parallel" (BSP) which might work too. Threads work independently for a
certain amount of time on private objects (e.g. copy-on-write memory),
then enter a barrier; changes to global objects are synchronized and
the GC collects garbage, after which the worker threads leave the
barrier and the cycle repeats.

To communicate changes to shared objects between synchronization 
barriers, Python code must use explicit locks and flush statements. But 
for the C code in the interpreter, BSP should give the same atomicity 
for Python bytecodes as the GIL  (there is just one active thread inside 
the barrier).

BSP is much simpler to implement than STM because of the barrier 
synchronization. BSP also cannot deadlock or livelock. And because 
threads in BSP work with private memory, there will be no cache
thrashing (false sharing) from the reference counting GC.
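
A minimal sketch of the BSP cycle using POSIX barriers (plain C,
purely illustrative; nothing here touches the interpreter):

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define NSTEPS   3

    static pthread_barrier_t barrier;

    static void *worker(void *arg)
    {
        long id = (long)arg;
        for (int step = 0; step < NSTEPS; step++) {
            /* superstep: work on private data only */
            printf("thread %ld computing step %d\n", id, step);

            /* barrier 1: exactly one thread is designated "serial"
               and performs the synchronization phase alone */
            if (pthread_barrier_wait(&barrier)
                    == PTHREAD_BARRIER_SERIAL_THREAD)
                printf("[sync] merge shared objects, collect garbage\n");

            /* barrier 2: everyone waits for the sync phase to end
               before starting the next superstep */
            pthread_barrier_wait(&barrier);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t threads[NTHREADS];
        pthread_barrier_init(&barrier, NULL, NTHREADS);
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&threads[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(threads[i], NULL);
        pthread_barrier_destroy(&barrier);
        return 0;
    }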

Sturla





From vinay_sajip at yahoo.co.uk  Thu Aug 18 00:30:39 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Wed, 17 Aug 2011 22:30:39 +0000 (UTC)
Subject: [Python-Dev] Packaging in Python 2 anyone ?
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
Message-ID: <loom.20110818T002304-368@post.gmane.org>

Tarek Ziadé <ziade.tarek <at> gmail.com> writes:

> IOW, the task to do is:
> 
> 1/ copy packaging and all its stdlib dependencies in a standalone project
> 2/ rename packaging to distutils2
> 3/ make it work under older 2.x and 3.x (2.x would be the priority)  <====
> 4/ release it, promote its usage
> 5/ consolidate the API with the feedback received
> 
> I realize it's by far the least interesting task to do in packaging,
> but it's by far one of the most important.

Okay, I had a bit of spare time today, and here's as far as I've got:

Step 1 - done.
Step 2 - done.
Step 3 - On Python 2.6 most of the tests pass:

Ran 322 tests in 12.148s

FAILED (failures=3, errors=4, skipped=39)

See the detailed test results (for Linux) at https://gist.github.com/1152791

The code is at https://bitbucket.org/vinay.sajip/du2/

stdlib dependency code is either moved to util.py or test/support.py as
appropriate. You need to test in a virtualenv with unittest2 installed. No work
has been done on packaging the project.

I'm not sure if I'll have much more time to spend on this, but it's there in
case someone else can look at the remaining test failures, plus Steps 4 and 5;
hopefully I've broken the back of it :-)

Regards,

Vinay Sajip



From chrism at plope.com  Thu Aug 18 03:15:45 2011
From: chrism at plope.com (Chris McDonough)
Date: Wed, 17 Aug 2011 21:15:45 -0400
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <loom.20110818T002304-368@post.gmane.org>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
Message-ID: <1313630145.3775.0.camel@thinko>

I'll throw this out there... why is it going to have a different name on
python2 than on python3?

- C

On Wed, 2011-08-17 at 22:30 +0000, Vinay Sajip wrote:
> Tarek Ziadé <ziade.tarek <at> gmail.com> writes:
> 
> > IOW, the task to do is:
> > 
> > 1/ copy packaging and all its stdlib dependencies in a standalone project
> > 2/ rename packaging to distutils2
> > 3/ make it work under older 2.x and 3.x (2.x would be the priority)  <====
> > 4/ release it, promote its usage
> > 5/ consolidate the API with the feedback received
> > 
> > I realize it's by far the least interesting task to do in packaging,
> > but it's by far one of the most important.
> 
> Okay, I had a bit of spare time today, and here's as far as I've got:
> 
> Step 1 - done.
> Step 2 - done.
> Step 3 - On Python 2.6 most of the tests pass:
> 
> Ran 322 tests in 12.148s
> 
> FAILED (failures=3, errors=4, skipped=39)
> 
> See the detailed test results (for Linux) at https://gist.github.com/1152791
> 
> The code is at https://bitbucket.org/vinay.sajip/du2/
> 
> stdlib dependency code is either moved to util.py or test/support.py as
> appropriate. You need to test in a virtualenv with unittest2 installed. No work
> has been done on packaging the project.
> 
> I'm not sure if I'll have much more time to spend on this, but it's there in
> case someone else can look at the remaining test failures, plus Steps 4 and 5;
> hopefully I've broken the back of it :-)
> 
> Regards,
> 
> Vinay Sajip
> 
> 



From fdrake at acm.org  Thu Aug 18 04:15:53 2011
From: fdrake at acm.org (Fred Drake)
Date: Wed, 17 Aug 2011 22:15:53 -0400
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <1313630145.3775.0.camel@thinko>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
	<1313630145.3775.0.camel@thinko>
Message-ID: <CAFT4OTHa08OEmkA8U0id8rtN6vcSEdKifMub-75gGu=G6Oc4=A@mail.gmail.com>

On Wed, Aug 17, 2011 at 9:15 PM, Chris McDonough <chrism at plope.com> wrote:
> I'll throw this out there.. why is it going to have a different name on
> python2 than on python3?

So it can be a drop-in replacement for the existing distutils2, I'd expect.

"packaging" is new with Python3, and is the Guido-approved name.


  -Fred

-- 
Fred L. Drake, Jr.    <fdrake at acm.org>
"A person who won't read has no advantage over one who can't read."
   --Samuel Langhorne Clemens

From ncoghlan at gmail.com  Thu Aug 18 05:00:39 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 18 Aug 2011 13:00:39 +1000
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <CAFT4OTHa08OEmkA8U0id8rtN6vcSEdKifMub-75gGu=G6Oc4=A@mail.gmail.com>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
	<1313630145.3775.0.camel@thinko>
	<CAFT4OTHa08OEmkA8U0id8rtN6vcSEdKifMub-75gGu=G6Oc4=A@mail.gmail.com>
Message-ID: <CADiSq7e9hfypFZMGDtE8QNnZLG6sFr=FoXR2P62G_cdxcsmxzQ@mail.gmail.com>

On Thu, Aug 18, 2011 at 12:15 PM, Fred Drake <fdrake at acm.org> wrote:
> On Wed, Aug 17, 2011 at 9:15 PM, Chris McDonough <chrism at plope.com> wrote:
>> I'll throw this out there.. why is it going to have a different name on
>> python2 than on python3?
>
> So it can be a drop-in replacement for the existing distutils2, I'd expect.
>
> "packaging" is new with Python3, and is the Guido-approved name.

It's actually for the same reason that unittest changes are backported
under the unittest2 name - the distutils2 name can be used in the
future to get Python 3.4 packaging features in Python 3.3, but that
would be difficult if the backport shadowed the standard library name.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From fdrake at acm.org  Thu Aug 18 05:17:38 2011
From: fdrake at acm.org (Fred Drake)
Date: Wed, 17 Aug 2011 23:17:38 -0400
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <CADiSq7e9hfypFZMGDtE8QNnZLG6sFr=FoXR2P62G_cdxcsmxzQ@mail.gmail.com>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
	<1313630145.3775.0.camel@thinko>
	<CAFT4OTHa08OEmkA8U0id8rtN6vcSEdKifMub-75gGu=G6Oc4=A@mail.gmail.com>
	<CADiSq7e9hfypFZMGDtE8QNnZLG6sFr=FoXR2P62G_cdxcsmxzQ@mail.gmail.com>
Message-ID: <CAFT4OTFR53ED555KURPpMoHuxn-uiAti+OjcsB-PzTq0ocakVA@mail.gmail.com>

On Wed, Aug 17, 2011 at 11:00 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> It's actually for the same reason that unittest changes are backported
> under the unittest2 name - the distutils2 name can be used in the
> future to get Python 3.4 packaging features in Python 3.3, but that
> would be difficult if the backport shadowed the standard library name.

Ah, yes... the old "too bad we stuck it in the standard library" problem.

For some things, an easy lament, but for foundational packaging-related
things, it's hard to get around.


  -Fred

-- 
Fred L. Drake, Jr.    <fdrake at acm.org>
"A person who won't read has no advantage over one who can't read."
   --Samuel Langhorne Clemens

From ziade.tarek at gmail.com  Thu Aug 18 09:32:45 2011
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 18 Aug 2011 09:32:45 +0200
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <CAFT4OTFR53ED555KURPpMoHuxn-uiAti+OjcsB-PzTq0ocakVA@mail.gmail.com>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
	<1313630145.3775.0.camel@thinko>
	<CAFT4OTHa08OEmkA8U0id8rtN6vcSEdKifMub-75gGu=G6Oc4=A@mail.gmail.com>
	<CADiSq7e9hfypFZMGDtE8QNnZLG6sFr=FoXR2P62G_cdxcsmxzQ@mail.gmail.com>
	<CAFT4OTFR53ED555KURPpMoHuxn-uiAti+OjcsB-PzTq0ocakVA@mail.gmail.com>
Message-ID: <CAGSi+Q4uGW9Ej9DhgHDMkZzhs3eJv+7+uJrsZYDnhpq37BTcFA@mail.gmail.com>

On Thu, Aug 18, 2011 at 5:17 AM, Fred Drake <fdrake at acm.org> wrote:
> On Wed, Aug 17, 2011 at 11:00 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> It's actually for the same reason that unittest changes are backported
>> under the unittest2 name - the distutils2 name can be used in the
>> future to get Python 3.4 packaging features in Python 3.3, but that
>> would be difficult if the backport shadowed the standard library name.
>
> Ah, yes... the old "too bad we stuck it in the standard library" problem.
>
> For some things, an easy lament, but for foundational packaging-related
> things, it's hard to get around.

Yeah, exactly. And the good thing about packaging and distutils2 is
that for regular usage (packaging your project) you don't type any
code, you just define options in setup.cfg.  IOW there's no "import
packaging" or "import distutils2".


Cheers
Tarek

>
>
> ?-Fred
>
> --
> Fred L. Drake, Jr.    <fdrake at acm.org>
> "A person who won't read has no advantage over one who can't read."
>    --Samuel Langhorne Clemens
>



-- 
Tarek Ziadé | http://ziade.org

From ziade.tarek at gmail.com  Thu Aug 18 09:35:01 2011
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 18 Aug 2011 09:35:01 +0200
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <loom.20110818T002304-368@post.gmane.org>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
Message-ID: <CAGSi+Q5pR0_8GNAFCQc9XMWWAJ5XLOCJe29EAUooOCFYkTnMiw@mail.gmail.com>

On Thu, Aug 18, 2011 at 12:30 AM, Vinay Sajip <vinay_sajip at yahoo.co.uk> wrote:
...
> Okay, I had a bit of spare time today, and here's as far as I've got:

Awesome, thanks a lot !

>
> Step 1 - done.
> Step 2 - done.
> Step 3 - On Python 2.6 most of the tests pass:
>
> Ran 322 tests in 12.148s
>
> FAILED (failures=3, errors=4, skipped=39)
>
> See the detailed test results (for Linux) at https://gist.github.com/1152791
>
> The code is at https://bitbucket.org/vinay.sajip/du2/
>
> stdlib dependency code is either moved to util.py or test/support.py as
> appropriate. You need to test in a virtualenv with unittest2 installed. No work
> has been done on packaging the project.
>
> I'm not sure if I'll have much more time to spend on this, but it's there in
> case someone else can look at the remaining test failures, plus Steps 4 and 5;
> hopefully I've broken the back of it :-)

Thank you very much !

Ideally, if you could push this to hg.python.org/distutils2
(overwriting the existing stuff).

Cheers
Tarek

-- 
Tarek Ziadé | http://ziade.org

From vinay_sajip at yahoo.co.uk  Thu Aug 18 11:16:21 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Thu, 18 Aug 2011 09:16:21 +0000 (UTC)
Subject: [Python-Dev] Packaging in Python 2 anyone ?
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
	<CAGSi+Q5pR0_8GNAFCQc9XMWWAJ5XLOCJe29EAUooOCFYkTnMiw@mail.gmail.com>
Message-ID: <loom.20110818T105443-668@post.gmane.org>

Tarek Ziadé <ziade.tarek <at> gmail.com> writes:

> Ideally, if you could push this to hg.python.org/distutils2
> (overwriting the existing stuff).

Okay, done. I've overwritten existing files and added new ones, only
removing/renaming things like index -> pypi and mkcfg -> create. I haven't
touched existing code e.g. the top-level test scripts or the _backport
directory. The added test_distutils2.py is what I used to run the tests.

Regards,

Vinay Sajip




From solipsis at pitrou.net  Thu Aug 18 11:26:40 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 18 Aug 2011 11:26:40 +0200
Subject: [Python-Dev] Packaging in Python 2 anyone ?
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
	<CAGSi+Q5pR0_8GNAFCQc9XMWWAJ5XLOCJe29EAUooOCFYkTnMiw@mail.gmail.com>
	<loom.20110818T105443-668@post.gmane.org>
Message-ID: <20110818112640.3cfa1455@pitrou.net>

On Thu, 18 Aug 2011 09:16:21 +0000 (UTC)
Vinay Sajip <vinay_sajip at yahoo.co.uk> wrote:
> Tarek Ziadé <ziade.tarek <at> gmail.com> writes:
> 
> > Ideally, if you could push this to hg.python.org/distutils2
> > (overwriting the existing stuff).
> 
> Okay, done. I've overwritten existing files and added new ones, only
> removing/renaming things like index -> pypi and mkcfg -> create. I haven't
> touched existing code e.g. the top-level test scripts or the _backport
> directory. The added test_distutils2.py is what I used to run the tests.

Impressive work!
That said, I'm not sure it was the best moment to backport, since
test_packaging currently fails under Windows (I think Éric is supposed
to look at it).

Regards

Antoine.



From ziade.tarek at gmail.com  Thu Aug 18 11:37:32 2011
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 18 Aug 2011 11:37:32 +0200
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <20110818112640.3cfa1455@pitrou.net>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
	<CAGSi+Q5pR0_8GNAFCQc9XMWWAJ5XLOCJe29EAUooOCFYkTnMiw@mail.gmail.com>
	<loom.20110818T105443-668@post.gmane.org>
	<20110818112640.3cfa1455@pitrou.net>
Message-ID: <CAGSi+Q6=ffY0kXbkKi+0M1q3W2JX5JA5re+4Tqs6Px-tOCf0pA@mail.gmail.com>

On Thu, Aug 18, 2011 at 11:26 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Thu, 18 Aug 2011 09:16:21 +0000 (UTC)
> Vinay Sajip <vinay_sajip at yahoo.co.uk> wrote:
>> Tarek Ziadé <ziade.tarek <at> gmail.com> writes:
>>
>> > Ideally, if you could push this to hg.python.org/distutils2
>> > (overwriting the existing stuff).
>>
>> Okay, done. I've overwritten existing files and added new ones, only
>> removing/renaming things like index -> pypi and mkcfg -> create. I haven't
>> touched existing code e.g. the top-level test scripts or the _backport
>> directory. The added test_distutils2.py is what I used to run the tests.
>
> Impressive work!
> That said, I'm not sure it was the best moment to backport, since
> test_packaging currently fails under Windows (I think Éric is supposed
> to look at it).
>

Frankly, I think there's no best moment for this.

We'll need to backport everything we do in packaging/ in distutils2/
(Yeah, painful...)

> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/ziade.tarek%40gmail.com
>



-- 
Tarek Ziadé | http://ziade.org

From ziade.tarek at gmail.com  Thu Aug 18 11:38:12 2011
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 18 Aug 2011 11:38:12 +0200
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <loom.20110818T105443-668@post.gmane.org>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
	<CAGSi+Q5pR0_8GNAFCQc9XMWWAJ5XLOCJe29EAUooOCFYkTnMiw@mail.gmail.com>
	<loom.20110818T105443-668@post.gmane.org>
Message-ID: <CAGSi+Q6Dahwqe39hZqz1rC6piaQjj0WzAA9wxCFoCS-LKoZ2PQ@mail.gmail.com>

On Thu, Aug 18, 2011 at 11:16 AM, Vinay Sajip <vinay_sajip at yahoo.co.uk> wrote:
> Tarek Ziadé <ziade.tarek <at> gmail.com> writes:
>
>> Ideally, if you could push this to hg.python.org/distutils2
>> (overwriting the existing stuff).
>
> Okay, done. I've overwritten existing files and added new ones, only
> removing/renaming things like index -> pypi and mkcfg -> create. I haven't
> touched existing code e.g. the top-level test scripts or the _backport
> directory. The added test_distutils2.py is what I used to run the tests.


Thanks again

> Regards,
>
> Vinay Sajip
>
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/ziade.tarek%40gmail.com
>



-- 
Tarek Ziadé | http://ziade.org

From vinay_sajip at yahoo.co.uk  Thu Aug 18 18:19:12 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Thu, 18 Aug 2011 16:19:12 +0000 (UTC)
Subject: [Python-Dev] Packaging in Python 2 anyone ?
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
	<CAGSi+Q5pR0_8GNAFCQc9XMWWAJ5XLOCJe29EAUooOCFYkTnMiw@mail.gmail.com>
	<loom.20110818T105443-668@post.gmane.org>
	<20110818112640.3cfa1455@pitrou.net>
Message-ID: <loom.20110818T181642-969@post.gmane.org>

Antoine Pitrou <solipsis <at> pitrou.net> writes:

> That said, I'm not sure it was the best moment to backport, since
> test_packaging currently fails under Windows (I think Éric is supposed
> to look at it).

Plus, there are at least half a dozen issues which would need to be addressed in
packaging before final release, but they are not complete show-stoppers and
won't preclude 2.x users giving useful feedback.

Regards,

Vinay Sajip


From stefan at bytereef.org  Thu Aug 18 18:22:54 2011
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 18 Aug 2011 18:22:54 +0200
Subject: [Python-Dev] memoryview: "B", "c", "b" format specifiers
Message-ID: <20110818162254.GA18925@sleipnir.bytereef.org>

Hello,

during my work on PEP-3118 fixes I noticed that memoryview does not handle
the "B" format specifier according to the struct module documentation:


Here's what struct does:

>>> b = bytearray([1,2,3])
>>> struct.pack_into('B', b, 0, b'X')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: required argument is not an integer
>>> struct.pack_into('c', b, 0, b'X')
>>> b
bytearray(b'X\x02\x03')


Here's what memoryview does:

>>> b = bytearray([1,2,3])
>>> m = memoryview(b)
>>> m.format
'B'
>>> m[0] = b'X'
>>> m[0] = 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' does not support the buffer interface


So, memoryview does exactly the opposite of what is specified. It should
reject the bytes object but accept the integer.
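
That is, for a view whose format is 'B', I would expect something like
this (a sketch of the intended semantics, not actual output from the
current tree):

>>> b = bytearray([1, 2, 3])
>>> m = memoryview(b)
>>> m.format
'B'
>>> m[0] = 100     # integer: should be accepted
>>> m[0] = b'X'    # bytes: should be rejected
Traceback (most recent call last):
  ...
TypeError: ...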


I would like to fix this in the features/pep-3118 repository as follows:

  - memoryview should respect the format specifiers.

  - bytearray and friends should set the format specifier to "c"
    in their getbuffer() methods.

  - Introduce a new function PyMemoryView_FromBytes() that can be used
    instead of PyMemoryView_FromBuffer(). PyMemoryView_FromBuffer()
    is usually used in conjunction with PyBuffer_FillInfo(), which
    sets the format specifier to "B".


Are there any general objections to this?


Stefan Krah



From solipsis at pitrou.net  Thu Aug 18 18:40:40 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 18 Aug 2011 18:40:40 +0200
Subject: [Python-Dev] memoryview: "B", "c", "b" format specifiers
References: <20110818162254.GA18925@sleipnir.bytereef.org>
Message-ID: <20110818184040.26dcfcee@pitrou.net>

On Thu, 18 Aug 2011 18:22:54 +0200
Stefan Krah <stefan at bytereef.org> wrote:
> 
> So, memoryview does exactly the opposite of what is specified. It should
> reject the bytes object but accept the integer.

Well, memoryview is quite dumb right now. It ignores the format and
just considers its underlying memory a bytes sequence.

> I would like to fix this in the features/pep-3118 repository as follows:
> 
>   - memoryview should respect the format specifiers.
> 
>   - bytearray and friends should set the format specifier to "c"
>     in their getbuffer() methods.
> 
>   - Introduce a new function PyMemoryView_FromBytes() that can be used
>     instead of PyMemoryView_FromBuffer(). PyMemoryView_FromBuffer()
>     is usually used in conjunction with PyBuffer_FillInfo(), which
>     sets the format specifier to "B".

What would PyMemoryView_FromBytes() do? The name suggests it takes a
bytes object, but you can already use PyMemoryView_FromObject() for
that.

(I personally think the general bytes-as-sequence-of-ints behaviour is
a mistake, so I wouldn't care much about an additional C API to enforce
that behaviour :-))

Regards

Antoine.



From stefan at bytereef.org  Thu Aug 18 18:57:00 2011
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 18 Aug 2011 18:57:00 +0200
Subject: [Python-Dev] memoryview: "B", "c", "b" format specifiers
In-Reply-To: <20110818184040.26dcfcee@pitrou.net>
References: <20110818162254.GA18925@sleipnir.bytereef.org>
	<20110818184040.26dcfcee@pitrou.net>
Message-ID: <20110818165700.GA19118@sleipnir.bytereef.org>

Antoine Pitrou <solipsis at pitrou.net> wrote:
> > I would like to fix this in the features/pep-3118 repository as follows:
> > 
> >   - memoryview should respect the format specifiers.
> > 
> >   - bytearray and friends should set the format specifier to "c"
> >     in their getbuffer() methods.
> > 
> >   - Introduce a new function PyMemoryView_FromBytes() that can be used
> >     instead of PyMemoryView_FromBuffer(). PyMemoryView_FromBuffer()
> >     is usually used in conjunction with PyBuffer_FillInfo(), which
> >     sets the format specifier to "B".
> 
> What would PyMemoryView_FromBytes() do? The name suggests it takes a
> bytes object, but you can already use PyMemoryView_FromObject() for
> that.

Oh no, the name isn't quite right then. It should be a replacement
for the combination PyBuffer_FillInfo()/PyMemoryView_FromBuffer()
and it should temporarily wrap a C-string. Also, unlike that combination,
it would set the format specifier to "c". Perhaps this name is better:

PyObject * PyMemoryView_FromCString(char *s, Py_ssize_t size, int flags);

'flags' is just PyBUF_READ or PyBUF_WRITE.


In the Python source tree, it could completely replace PyBuffer_FillInfo()
and PyMemoryView_FromBuffer().


Stefan Krah




From fijall at gmail.com  Thu Aug 18 19:31:06 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 18 Aug 2011 19:31:06 +0200
Subject: [Python-Dev] PyPy 1.6 released
Message-ID: <CAK5idxQGp9W1G2pkUZb3swoegTwgxRs3XfuyEnw7_mXVewHq1A@mail.gmail.com>

========================
PyPy 1.6 - kickass panda
========================

We're pleased to announce the 1.6 release of PyPy. This release brings a lot
of bugfixes and performance improvements over 1.5, and improves support for
Windows 32bit and OS X 64bit. This version fully implements Python 2.7.1 and
has beta level support for loading CPython C extensions.  You can download it
here:

    http://pypy.org/download.html

What is PyPy?
=============

PyPy is a very compliant Python interpreter, almost a drop-in replacement for
CPython 2.7.1. It's fast (`pypy 1.5 and cpython 2.6.2`_ performance comparison)
due to its integrated tracing JIT compiler.

This release supports x86 machines running Linux 32/64 or Mac OS X.  Windows 32
is beta (it roughly works but a lot of small issues have not been fixed so
far).  Windows 64 is not yet supported.

The main topics of this release are speed and stability: on average on
our benchmark suite, PyPy 1.6 is between **20% and 30%** faster than PyPy 1.5,
which was already much faster than CPython on our set of benchmarks.

The speed improvements have been made possible by optimizing many of the
layers which compose PyPy.  In particular, we improved: the Garbage Collector,
the JIT warmup time, the optimizations performed by the JIT, the quality of
the generated machine code and the implementation of our Python interpreter.

.. _`pypy 1.5 and cpython 2.6.2`: http://speed.pypy.org


Highlights
==========

* Numerous performance improvements, overall giving considerable speedups:

  - better GC behavior when dealing with very large objects and arrays

  - **fast ctypes:** calls to ctypes functions are now seen and optimized
    by the JIT, and they are up to 60 times faster than PyPy 1.5 and 10 times
    faster than CPython

  - improved generators(1): simple generators are now inlined into the caller
    loop, making them up to 3.5 times faster than PyPy 1.5.

  - improved generators(2): thanks to other optimizations, even generators
    that are not inlined are between 10% and 20% faster than PyPy 1.5.

  - faster warmup time for the JIT

  - JIT support for single floats (e.g., for ``array('f')``)

  - optimized dictionaries: the internal representation of dictionaries is now
    dynamically selected depending on the type of stored objects, resulting in
    faster code and smaller memory footprint.  For example, there are special
    representations for dictionaries whose keys are all strings, or all
    integers. Other dictionaries are also smaller
    due to bugfixes.

* JitViewer: this is the first official release which includes the JitViewer,
  a web-based tool which helps you to see which parts of your Python code have
  been compiled by the JIT, down to the assembler. The `jitviewer`_ 0.1 has
  already been released and works well with PyPy 1.6.

* The CPython extension module API has been improved and now supports many
  more extensions. For information on which ones are supported, please refer to
  our `compatibility wiki`_.

* Multibyte encoding support: this was one of the last areas in which we were
  still behind CPython, but now we fully support multibyte encodings.

* Preliminary support for NumPy: this release includes a preview of a very
  fast NumPy module integrated with the PyPy JIT.  Unfortunately, this does
  not mean that you can expect to take an existing NumPy program and run it on
  PyPy, because the module is still unfinished and supports only some of the
  numpy API. However, barring some details, what works should be
  blazingly fast :-)

* Bugfixes: since the 1.5 release we fixed 53 bugs in our `bug tracker`_, not
  counting the numerous bugs that were found and reported through other
  channels than the bug tracker.

Cheers,

Hakan Ardo, Carl Friedrich Bolz, Laura Creighton, Antonio Cuni,
Maciej Fijalkowski, Amaury Forgeot d'Arc, Alex Gaynor,
Armin Rigo and the PyPy team

.. _`jitviewer`:
http://morepypy.blogspot.com/2011/08/visualization-of-jitted-code.html
.. _`bug tracker`: https://bugs.pypy.org
.. _`compatibility wiki`: https://bitbucket.org/pypy/compatibility/wiki/Home

From merwok at netwok.org  Thu Aug 18 20:02:45 2011
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Thu, 18 Aug 2011 20:02:45 +0200
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <loom.20110818T181642-969@post.gmane.org>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>	<loom.20110818T002304-368@post.gmane.org>	<CAGSi+Q5pR0_8GNAFCQc9XMWWAJ5XLOCJe29EAUooOCFYkTnMiw@mail.gmail.com>	<loom.20110818T105443-668@post.gmane.org>	<20110818112640.3cfa1455@pitrou.net>
	<loom.20110818T181642-969@post.gmane.org>
Message-ID: <4E4D53C5.8000608@netwok.org>

Le 18/08/2011 18:19, Vinay Sajip a écrit :
> Antoine Pitrou <solipsis <at> pitrou.net> writes:
>> That said, I'm not sure it was the best moment to backport, since
>> test_packaging currently fails under Windows (I think Éric is supposed
>> to look at it).

I will; any help is welcome, especially if you have a machine with the
same Windows version (see #12678). I caught Georg's message on
python-committers but could not do anything in time; I only have
Internet access at a public library, so I can't be as responsive as I
would like.

> Plus, there are at least half a dozen issues which would need to be addressed in
> packaging before final release, but they are not complete show-stoppers and
> won't preclude 2.x users giving useful feedback.

Yes, there are a few dozen bugs that need addressing before 1.0 (i.e.
Python 3.3), but there's time.  Alpha and beta releases of distutils2
would be useful.

Regards

From solipsis at pitrou.net  Thu Aug 18 20:19:16 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 18 Aug 2011 20:19:16 +0200
Subject: [Python-Dev] cpython (3.2): NUL -> NULL
References: <E1Qu4qW-0007gQ-V9@dinsdale.python.org>
Message-ID: <20110818201916.35218b7e@pitrou.net>

On Thu, 18 Aug 2011 17:49:28 +0200
benjamin.peterson <python-checkins at python.org> wrote:
> -        PyErr_SetString(PyExc_TypeError, "embedded NUL character");
> +        PyErr_SetString(PyExc_TypeError, "embedded NULL character");

Are you sure? IIRC, NUL is the short name of ASCII character 0
(while NULL would be the NULL pointer).

Regards

Antoine.



From eric at trueblade.com  Thu Aug 18 20:25:36 2011
From: eric at trueblade.com (Eric V. Smith)
Date: Thu, 18 Aug 2011 14:25:36 -0400
Subject: [Python-Dev] cpython (3.2): NUL -> NULL
In-Reply-To: <20110818201916.35218b7e@pitrou.net>
References: <E1Qu4qW-0007gQ-V9@dinsdale.python.org>
	<20110818201916.35218b7e@pitrou.net>
Message-ID: <4E4D5920.4080408@trueblade.com>

On 08/18/2011 02:19 PM, Antoine Pitrou wrote:
> On Thu, 18 Aug 2011 17:49:28 +0200
> benjamin.peterson <python-checkins at python.org> wrote:
>> -        PyErr_SetString(PyExc_TypeError, "embedded NUL character");
>> +        PyErr_SetString(PyExc_TypeError, "embedded NULL character");
> 
> Are you sure? IIRC, NUL is the short name of ASCII character 0
> (while NULL would be the NULL pointer).

That's my understanding, too.

Eric.

From merwok at netwok.org  Thu Aug 18 20:27:30 2011
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Thu, 18 Aug 2011 20:27:30 +0200
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <loom.20110818T002304-368@post.gmane.org>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
Message-ID: <4E4D5992.7070603@netwok.org>

Hi Tarek,

> Doing an automated conversion turned out to be a nightmare, and I was
> about to go ahead and maintain a fork of the packaging package, with
> the few modules that are needed (sysconfig, etc) within a standalone
> release.

Can you give us more info?  Do you have a repo somewhere, or notes?

A related question: what is the minimum 2.x version that we should
support?  2.6 would be a dream, thanks to bytes literal and all that,
but I?m sure it?s not realistic; 2.5 would be nice for the with
statement and hashlib, otherwise 2.4 is okay.

When I talked with Łukasz in private email about backports and 3to2, we
agreed that there were some serious bugs in 3to2 and we wanted to work
on patches.  I also wanted to make the command-line driver more
flexible, so that it would be easy to run a command to apply only
3.3→3.2 fixes, then another for 3.2→2.7, etc.

Maybe your problems were caused by the state of the packaging codebase.
 The conversion to 3.x was a little messy: in some cases there were
parallel code branches for 2.x and 3.x, in other cases 2to3 was run, and
in many cases the conversion had to be cleaned up (esp. bytes/str
madness).  Even now that the code runs and the tests pass, there may
still be things in need of a cleanup in the codebase, and maybe they
trip up 3to2.

> I am looking for someone that has some free time and that is willing
> to lead this work.

Well, free time is scarce with all these distutils bugs on my plate, but
I am definitely interested in heading the backport, as I stated earlier.
 I think the key point is to avoid doing the same work over and over
again, and I see a few ways of managing that.

The first way is to start with a 2.x-converted codebase (thanks Vinay!)
and manually port all cpython/packaging changesets to distutils2, like I
used to do.  This is just as annoying as backporting to 2.7, and just as
simple.

The second way is to work on a conversion tool instead of working on
changesets.  The idea is to make a robust tool based on 3to2 that copies
code and converts it.  This would not be the easiest way, as shown by
your experience, but surely the least cumbersome in the long term.

The third way is to use a new Mercurial repo converted from the cpython
repo, so that we can run "hg convert" again to pull new changesets.
Convert, test and commit.  The advantage is that it's not required to
port each changeset: the convert-merge dance can be done once a month,
or just for new releases.

The fourth way is hybrid: start from a 2.x-converted codebase, and each
month, make a diff for cpython/Lib/packaging and apply to distutils2.  I
fear that such diffs would be painful to apply, and consist mostly of
rejects.  With idea #3, we get to use a merge tool, which is much better.

After writing out these ideas, I think the first one is certainly the
simplest thing that could work with minimum pain.


Le 18/08/2011 00:30, Vinay Sajip a écrit :
> stdlib dependency code is either moved to util.py or test/support.py as
> appropriate.
We need sysconfig, shutil, tarfile, hashlib... Surely that's a lot to
put in util.py.

> I'm not sure if I'll have much more time to spend on this, but it's there in
> case someone else can look at the remaining test failures, plus Steps 4 and 5;
> hopefully I've broken the back of it :-)
I join my thanks to Tarek's, and volunteer to follow on :)

Regards

From benjamin at python.org  Thu Aug 18 20:41:15 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 18 Aug 2011 13:41:15 -0500
Subject: [Python-Dev] cpython (3.2): NUL -> NULL
In-Reply-To: <20110818201916.35218b7e@pitrou.net>
References: <E1Qu4qW-0007gQ-V9@dinsdale.python.org>
	<20110818201916.35218b7e@pitrou.net>
Message-ID: <CAPZV6o_48C11PAVT+VweezFP_DqD4voBiGaoGmPAW81nLZuKpQ@mail.gmail.com>

2011/8/18 Antoine Pitrou <solipsis at pitrou.net>:
> On Thu, 18 Aug 2011 17:49:28 +0200
> benjamin.peterson <python-checkins at python.org> wrote:
>> -        PyErr_SetString(PyExc_TypeError, "embedded NUL character");
>> +        PyErr_SetString(PyExc_TypeError, "embedded NULL character");
>
> Are you sure? IIRC, NUL is the short name of ASCII character 0
> (while NULL would be the NULL pointer).

NUL is the abbreviation of the "Null character".


-- 
Regards,
Benjamin

From stefan at bytereef.org  Thu Aug 18 20:51:21 2011
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 18 Aug 2011 20:51:21 +0200
Subject: [Python-Dev] cpython (3.2): NUL -> NULL
In-Reply-To: <20110818201916.35218b7e@pitrou.net>
References: <E1Qu4qW-0007gQ-V9@dinsdale.python.org>
	<20110818201916.35218b7e@pitrou.net>
Message-ID: <20110818185121.GA19783@sleipnir.bytereef.org>

Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Thu, 18 Aug 2011 17:49:28 +0200
> benjamin.peterson <python-checkins at python.org> wrote:
> > -        PyErr_SetString(PyExc_TypeError, "embedded NUL character");
> > +        PyErr_SetString(PyExc_TypeError, "embedded NULL character");
> 
> Are you sure? IIRC, NUL is the short name of ASCII character 0
> (while NULL would be the NULL pointer).

Yes, that's the traditional name. I was surprised that the C99 standard uses
"null character" in almost all cases. Example:

"The construction '\0' is commonly used to represent the null character."


So I think it should be either NUL or "null character" with the lower
case spelling.


Stefan Krah

 

From ziade.tarek at gmail.com  Thu Aug 18 20:59:00 2011
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 18 Aug 2011 20:59:00 +0200
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <4E4D5992.7070603@netwok.org>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
	<4E4D5992.7070603@netwok.org>
Message-ID: <CAGSi+Q42HpxfsVMcyLRQAQsVEBuGXcim-jkg17Dt3f15Rij3hQ@mail.gmail.com>

On Thu, Aug 18, 2011 at 8:27 PM, Éric Araujo <merwok at netwok.org> wrote:
> Hi Tarek,
>
>> Doing an automated conversion turned out to be a nightmare, and I was
>> about to go ahead and maintain a fork of the packaging package, with
>> the few modules that are needed (sysconfig, etc) within a standalone
>> release.
>
> Can you give us more info?  Do you have a repo somewhere, or notes?

I tried using relative imports, but that made the whole thing
complicated, and it did not work under older 2.x versions.
Then there are a lot of spots where the word 'packaging' is used for
things other than modules.

Then there are spots where we needed to change the bytes/str behavior
depending on the Python version, making everything complex to maintain.
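
(A minimal sketch of the kind of per-version shim I mean; the names
text_type/binary_type are made up for illustration:)

import sys

if sys.version_info[0] >= 3:
    text_type = str        # all text is str on 3.x
    binary_type = bytes
else:
    text_type = unicode    # 2.x
    binary_type = str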

I guess it's the combination of the three that made it too complex:
transparent renaming + 3to2 + 3.x-to-3.x.

>
> A related question: what is the minimum 2.x version that we should
> support?  2.6 would be a dream, thanks to bytes literals and all that,
> but I'm sure it's not realistic; 2.5 would be nice for the with
> statement and hashlib, otherwise 2.4 is okay.

2.5 sounds good. I am sold on dropping 2.4, frankly. Maybe we can drop
2.5 in a few months ;)

>
> When I talked with Łukasz in private email about backports and 3to2, we
> agreed that there were some serious bugs in 3to2 and we wanted to work
> on patches.  I also wanted to make the command-line driver more
> flexible, so that it would be easy to run a command to apply only
> 3.3→3.2 fixes, then another for 3.2→2.7, etc.
>
> Maybe your problems were caused by the state of the packaging codebase.
>  The conversion to 3.x was a little messy: in some cases there were
> parallel code branches for 2.x and 3.x, in other cases 2to3 was run, and
> in many cases the conversion had to be cleaned up (esp. bytes/str
> madness).  Even now that the code runs and the tests pass, there may
> still be things in need of a cleanup in the codebase, and maybe they
> trip up 3to2.

Frankly, I think that's not worth the effort. Keeping a clean, fully py3
codebase without worrying about making it 3to2-friendly makes all
contributors' lives easier, IMHO. The tradeoff is that we will have to
backport changes to distutils2. That's what was done for a while
between the Python trunk and the py3k branch, so I guess it's doable
-- if all packaging contributors agree to do this backport work.


>
>> I am looking for someone that has some free time and that is willing
>> to lead this work.
>
> Well, free time is scarce with all these distutils bugs on my plate, but
> I am definitely interested in heading the backport, as I stated earlier.
>  I think the key point is to avoid doing the same work over and over
> again, and I see a few ways of managing that.
>
> The first way is to start with a 2.x-converted codebase (thanks Vinay!)
> and manually port all cpython/packaging changesets to distutils2, like I
> used to do.  This is just as annoying as backporting to 2.7, and just as
> simple.
>
> The second way is to work on a conversion tool instead of working on
> changesets.  The idea is to make a robust tool based on 3to2 that copies
> code and converts it.  This would not be the easiest way, as shown by
> your experience, but surely the least cumbersome in the long term.
>
> The third way is to use a new Mercurial repo converted from the cpython
> repo, so that we can run "hg convert" again to pull new changesets.
> Convert, test and commit.  The advantage is that it's not required to
> port each changeset: the convert-merge dance can be done once a month,
> or just for new releases.
>
> The fourth way is hybrid: start from a 2.x-converted codebase, and each
> month, make a diff for cpython/Lib/packaging and apply to distutils2.  I
> fear that such diffs would be painful to apply, and consist mostly of
> rejects.  With idea #3, we get to use a merge tool, which is much better.
>
> After writing out these ideas, I think the first one is certainly the
> simplest thing that could work with minimum pain.

I think so too.  The automatic conversion sounded like a great thing,
but the nature of the project makes it too hard.

Cheers




-- 
Tarek Ziadé | http://ziade.org

From stefan at bytereef.org  Thu Aug 18 22:25:59 2011
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 18 Aug 2011 22:25:59 +0200
Subject: [Python-Dev] memoryview: "B", "c", "b" format specifiers
In-Reply-To: <20110818184040.26dcfcee@pitrou.net>
References: <20110818162254.GA18925@sleipnir.bytereef.org>
	<20110818184040.26dcfcee@pitrou.net>
Message-ID: <20110818202559.GA20296@sleipnir.bytereef.org>

Antoine Pitrou <solipsis at pitrou.net> wrote:
> (I personnaly think the general bytes-as-sequence-of-ints behaviour is
> a mistake, so I wouldn't care much about an additional C API to enforce
> that behaviour :-))

I don't want to abolish the "c" (bytes of length 1) format. :)

I think there are use cases for well defined arrays of small signed/unsigned
integers. Say you want to send a log-ngram array of unsigned chars over the
network. There shouldn't be a bytes object involved in that process. You
would pack the array with ints and unpack as ints.
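
For example, the kind of int-in/int-out round trip I mean, with struct:

>>> import struct
>>> payload = struct.pack('3B', 10, 200, 255)   # three unsigned bytes
>>> struct.unpack('3B', payload)
(10, 200, 255)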

Unless the struct module and PEP-3118 grow support for int8_t and uint8_t,
I think "b" and "B" should probably be restricted to integers.


Stefan Krah




From solipsis at pitrou.net  Thu Aug 18 22:53:06 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 18 Aug 2011 22:53:06 +0200
Subject: [Python-Dev] memoryview: "B", "c", "b" format specifiers
References: <20110818162254.GA18925@sleipnir.bytereef.org>
	<20110818184040.26dcfcee@pitrou.net>
	<20110818165700.GA19118@sleipnir.bytereef.org>
Message-ID: <20110818225306.76d11bfe@pitrou.net>

On Thu, 18 Aug 2011 18:57:00 +0200
Stefan Krah <stefan at bytereef.org> wrote:
> 
> Oh no, the name isn't quite right then. It should be a replacement
> for the combination PyBuffer_FillInfo()/PyMemoryView_FromBuffer()
> and it should temporarily wrap a C-string.

Ah, nice.

> PyObject * PyMemoryView_FromCString(char *s, Py_ssize_t size, int flags);

It's not really a C string, since it's not null-terminated.
PyMemoryView_FromMemory?

(that would mirror PyUnicode_FromUnicode, for example)

> 'flags' is just PyBUF_READ or PyBUF_WRITE.

Why do we have these in addition to PyBUF_WRITABLE already?

Regards

Antoine.



From vinay_sajip at yahoo.co.uk  Thu Aug 18 23:49:52 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Thu, 18 Aug 2011 21:49:52 +0000 (UTC)
Subject: [Python-Dev] Packaging in Python 2 anyone ?
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<loom.20110818T002304-368@post.gmane.org>
	<4E4D5992.7070603@netwok.org>
Message-ID: <loom.20110818T233749-283@post.gmane.org>

Éric Araujo <merwok <at> netwok.org> writes:

> Le 18/08/2011 00:30, Vinay Sajip a écrit :
> > stdlib dependency code is either moved to util.py or test/support.py as
> > appropriate.
> We need sysconfig, shutil, tarfile, hashlib... Surely that's a lot to
> put in util.py.

Well sysconfig.py/sysconfig.cfg have been copied as is. I've only copied over
specific things we need from shutil/functools/os, etc. so far to util.py. I
haven't looked at 2.4/2.5 support yet: things like hashlib would probably need
to be treated the same way Django handles this sort of backport of functionality.
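
(Roughly the usual fallback idiom -- a sketch, not necessarily exactly
what Django does:)

try:
    from hashlib import md5
except ImportError:        # Python < 2.5 has no hashlib
    from md5 import new as md5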

> I join my thanks to Tarek's, and volunteer to follow on :)

That's good news :-)

Regards,

Vinay Sajip


From stefan at bytereef.org  Fri Aug 19 00:30:46 2011
From: stefan at bytereef.org (Stefan Krah)
Date: Fri, 19 Aug 2011 00:30:46 +0200
Subject: [Python-Dev] memoryview: "B", "c", "b" format specifiers
In-Reply-To: <20110818225306.76d11bfe@pitrou.net>
References: <20110818162254.GA18925@sleipnir.bytereef.org>
	<20110818184040.26dcfcee@pitrou.net>
	<20110818165700.GA19118@sleipnir.bytereef.org>
	<20110818225306.76d11bfe@pitrou.net>
Message-ID: <20110818223046.GA20738@sleipnir.bytereef.org>

Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Thu, 18 Aug 2011 18:57:00 +0200
> Stefan Krah <stefan at bytereef.org> wrote:
> > 
> > Oh no, the name isn't quite right then. It should be a replacement
> > for the combination PyBuffer_FillInfo()/PyMemoryView_FromBuffer()
> > and it should temporarily wrap a C-string.
> 
> Ah, nice.
> 
> > PyObject * PyMemoryView_FromCString(char *s, Py_ssize_t size, int flags);
> 
> It's not really a C string, since it's not null-terminated.
> PyMemoryView_FromMemory?
> 
> (that would mirror PyUnicode_FromUnicode, for example)

I see, yes. PyMemoryView_FromStringAndSize()? No, too much typing. I prefer
PyMemoryView_FromMemory().


> > 'flags' is just PyBUF_READ or PyBUF_WRITE.
>
> Why do we have these in addition to PyBUF_WRITABLE already?

That's a bit involved; this is how I see it:

There are four buffer *request* flags that can be sent to a buffer provider
and that indicate the amount of complexity that a consumer can handle (in
decreasing order):

PyBUF_INDIRECT  -> suboffsets   (PIL-style)
PyBUF_STRIDES   -> strides      (Numpy-style)
PyBUF_ND        -> C-contiguous, but possibly multi-dimensional
PyBUF_SIMPLE    -> contiguous, one-dimensional, unsigned bytes

Each of those flags can be mixed freely with two additional flags:

PyBUF_WRITABLE
PyBUF_FORMAT

All other buffer request flags are simply combinations of those.
For example, if you use PyBUF_WRITABLE as the only flag, logically
it should be seen as PyBUF_WRITABLE|PyBUF_SIMPLE (this works since
PyBUF_SIMPLE is defined as 0).


PyBUF_READ and PyBUF_WRITE are so far only used for PyMemoryView_GetContiguous().
The PEP still has a flag named PyBUF_UPDATEIFCOPY, but that didn't make it
into object.h.

I thought it might be appropriate to use PyBUF_READ and PyBUF_WRITE
to underline the fact that you cannot send a fine-grained buffer
request to PyMemoryView_FromMemory()[1]. Also, PyBUF_READ is easier
to understand than PyBUF_SIMPLE.


But I'd be equally happy with PyBUF_SIMPLE/PyBUF_WRITABLE.


Stefan Krah

[1] The terminology might sound funny, but there is a function that
can act as a micro buffer provider:

int PyBuffer_FillInfo(Py_buffer *view, PyObject *obj, void *buf, Py_ssize_t len,
                      int readonly, int infoflags)

An exporter can use this function as a building block for a getbuffer()
method for unsigned bytes, since it reacts correctly to *all* possible
buffer requests in 'infoflags'.



From ncoghlan at gmail.com  Fri Aug 19 03:30:48 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 19 Aug 2011 11:30:48 +1000
Subject: [Python-Dev] cpython (3.2): NUL -> NULL
In-Reply-To: <20110818185121.GA19783@sleipnir.bytereef.org>
References: <E1Qu4qW-0007gQ-V9@dinsdale.python.org>
	<20110818201916.35218b7e@pitrou.net>
	<20110818185121.GA19783@sleipnir.bytereef.org>
Message-ID: <CADiSq7d=2DFosVRNqkkt1O9FmP50=bNoFYm0P=N83+XEDVs+mA@mail.gmail.com>

On Fri, Aug 19, 2011 at 4:51 AM, Stefan Krah <stefan at bytereef.org> wrote:
> So I think it should be either NUL or "null character" with the lower
> case spelling.

+1

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From p.f.moore at gmail.com  Fri Aug 19 17:35:29 2011
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 19 Aug 2011 16:35:29 +0100
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
Message-ID: <CACac1F-Vf1W1ZkmGi92ZOYwWdYM4Ex_1dELiP93mtxpOEGgKwA@mail.gmail.com>

On 15 August 2011 11:31, Tarek Ziadé <ziade.tarek at gmail.com> wrote:
> IOW, the task to do is:
>
> 1/ copy packaging and all its stdlib dependencies in a standalone project
> 2/ rename packaging to distutils2
> 3/ make it work under older 2.x and 3.x (2.x would be the priority)  <====
> 4/ release it, promote its usage
> 5/ consolidate the API with the feedback received

One thing that I, as a semi-interested bystander, would like to see is
sort of a component of 4. Namely, a document somewhere addressing the
question of why I, as a current user of distutils (setup.py, etc.),
should convert my project to use packaging/distutils2 - and what I'd
need to do to make the switch.

At the moment, I see no benefit to me in migrating. New projects, or
projects that already know that they want one or more of the benefits
that packaging/distutils2/setuptools bring, are a different matter.
It's projects with needs satisfied by distutils, and code invested in a
distutils-based solution, that could do with some persuasion.

I checked the docs, and "Distributing Python Modules" is for new
projects, and "What's new" basically says "we expect you to migrate"
but has no reasons or guidelines.

If someone borrows the time machine and makes this already available,
so much the better. Pointers would be appreciated!

Paul.

From merwok at netwok.org  Fri Aug 19 17:40:28 2011
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Fri, 19 Aug 2011 17:40:28 +0200
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <CACac1F-Vf1W1ZkmGi92ZOYwWdYM4Ex_1dELiP93mtxpOEGgKwA@mail.gmail.com>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<CACac1F-Vf1W1ZkmGi92ZOYwWdYM4Ex_1dELiP93mtxpOEGgKwA@mail.gmail.com>
Message-ID: <4E4E83EC.8040708@netwok.org>

> One thing that I, as a semi-interested bystander, would like to see is
> sort of a component of 4. Namely, a document somewhere addressing the
> question of why I, as a current user of distutils (setup.py, etc),
> should convert my project to use packaging/distutils2 - and what I'd
> need to do so.

I'm working on such a document, first in a doc set outside of the Python
docs, and, when it's ready, as an official HOWTO.  (I'll send the URL when
I finish and publish it.)

> I checked the docs, and "Distributing Python Modules" is for new
> projects,

That doc set is for distutils, unless you meant "Distributing Python
Projects", which is currently being heavily updated.

Regards

From p.f.moore at gmail.com  Fri Aug 19 17:56:37 2011
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 19 Aug 2011 16:56:37 +0100
Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <4E4E83EC.8040708@netwok.org>
References: <CAGSi+Q5DrhG=OvUmbhU=BTdvky1imsezrc3CcWcQj39LhY6UoA@mail.gmail.com>
	<CADiSq7dvekwBkb9Y2M6nBJrTfO0YOrf9y0yV6Z=uoMiCJbFS7Q@mail.gmail.com>
	<CAGSi+Q4u8ALpr2v3yWujadxFmwJ7jx=HhNBFC+pW-r-Vo7CVdQ@mail.gmail.com>
	<CACac1F-Vf1W1ZkmGi92ZOYwWdYM4Ex_1dELiP93mtxpOEGgKwA@mail.gmail.com>
	<4E4E83EC.8040708@netwok.org>
Message-ID: <CACac1F_N65wgDA1i-RnVv236kxOCpwdQvSgCuq3YxDyXGB+J_Q@mail.gmail.com>

On 19 August 2011 16:40, Éric Araujo <merwok at netwok.org> wrote:
>> One thing that I, as a semi-interested bystander, would like to see is
>> sort of a component of 4. Namely, a document somewhere addressing the
>> question of why I, as a current user of distutils (setup.py, etc),
>> should convert my project to use packaging/distutils2 - and what I'd
>> need to do so.
>
> I'm working on such a document, first in a doc set outside of the Python
> docs, and, when it's ready, as an official HOWTO.  (I'll send the URL when
> I finish and publish it.)

Nice :-) I'll try to provide some feedback when it's ready.

>> I checked the docs, and "Distributing Python Modules" is for new
>> projects,
>
> That doc set is for distutils, unless you meant "Distributing Python
> Projects", which is currently being heavily updated.

Sorry, I did indeed mean "... Projects" - I had looked at the Python
3.3 doc tree, and hadn't noticed that the name had changed.

Paul.

From status at bugs.python.org  Fri Aug 19 18:07:22 2011
From: status at bugs.python.org (Python tracker)
Date: Fri, 19 Aug 2011 18:07:22 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20110819160722.C3BE21CA8D@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2011-08-12 - 2011-08-19)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    2937 (+14)
  closed 21630 (+28)
  total  24567 (+42)

Open issues with patches: 1266 


Issues opened (31)
==================

#12409: Moving "Documenting Python" to Devguide
http://bugs.python.org/issue12409  reopened by eric.araujo

#12745: Python2 or Python3 page
http://bugs.python.org/issue12745  opened by JBernardo

#12746: normalization is affected by unicode width
http://bugs.python.org/issue12746  opened by benjamin.peterson

#12749: lib re cannot match non-BMP ranges (all versions, all builds)
http://bugs.python.org/issue12749  opened by tchrist

#12750: datetime.strftime('%s') should respect tzinfo
http://bugs.python.org/issue12750  opened by Daniel.O'Connor

#12753: \N{...} neglects formal aliases and named sequences from Unico
http://bugs.python.org/issue12753  opened by tchrist

#12754: Add alternative random number generators
http://bugs.python.org/issue12754  opened by rhettinger

#12757: undefined name in doctest.py
http://bugs.python.org/issue12757  opened by georg.brandl

#12758: time.time() returns local time instead of UTC
http://bugs.python.org/issue12758  opened by maksbotan

#12759: "(?P=)" input for Tools/scripts/redemo.py throw an exception
http://bugs.python.org/issue12759  opened by fredeom

#12760: Add create mode to open()
http://bugs.python.org/issue12760  opened by David.Townshend

#12761: Typo in Doc/license.rst
http://bugs.python.org/issue12761  opened by jwilk

#12762: EnvironmentError_str contributes to unportable code
http://bugs.python.org/issue12762  opened by jwilk

#12764: segfault in ctypes.Struct with bad _fields_
http://bugs.python.org/issue12764  opened by amaury.forgeotdarc

#12765: test_packaging failure under Snow Leopard
http://bugs.python.org/issue12765  opened by pitrou

#12767: document threading.Condition.notify
http://bugs.python.org/issue12767  opened by eli.bendersky

#12768: docstrings for the threading module
http://bugs.python.org/issue12768  opened by eli.bendersky

#12769: String with NUL characters truncated by ctypes when assigning 
http://bugs.python.org/issue12769  opened by Rafal.Dowgird

#12771: 2to3 -d adds extra whitespace
http://bugs.python.org/issue12771  opened by VPeric

#12772: fractional day attribute in datetime class
http://bugs.python.org/issue12772  opened by Miguel.de.Val.Borro

#12774: Warning -- multiprocessing.process._dangling was modified by t
http://bugs.python.org/issue12774  opened by ned.deily

#12775: immense performance problems related to the garbage collector
http://bugs.python.org/issue12775  opened by dsvensson

#12776: argparse: type conversion function should be called only once
http://bugs.python.org/issue12776  opened by arnau

#12777: Inconsistent use of VOLUME_NAME_* with GetFinalPathNameByHandl
http://bugs.python.org/issue12777  opened by pitrou

#12778: JSON-serializing a large container takes too much memory
http://bugs.python.org/issue12778  opened by pitrou

#12779: Update packaging documentation
http://bugs.python.org/issue12779  opened by eric.araujo

#12780: Clean up tests for pyc/pyo in __file__
http://bugs.python.org/issue12780  opened by eric.araujo

#12781: Mention SO_REUSEADDR near socket doc examples
http://bugs.python.org/issue12781  opened by sandro.tosi

#12782: Multiple context expressions do not support parentheses for co
http://bugs.python.org/issue12782  opened by Julian

#12783: test_posix failure on FreeBSD 6.4: test_get_and_set_scheduler_
http://bugs.python.org/issue12783  opened by neologix

#12785: list_distinfo_file is wrong
http://bugs.python.org/issue12785  opened by eric.araujo



Most recent 15 issues with no replies (15)
==========================================

#12785: list_distinfo_file is wrong
http://bugs.python.org/issue12785

#12783: test_posix failure on FreeBSD 6.4: test_get_and_set_scheduler_
http://bugs.python.org/issue12783

#12782: Multiple context expressions do not support parentheses for co
http://bugs.python.org/issue12782

#12776: argparse: type conversion function should be called only once
http://bugs.python.org/issue12776

#12772: fractional day attribute in datetime class
http://bugs.python.org/issue12772

#12771: 2to3 -d adds extra whitespace
http://bugs.python.org/issue12771

#12768: docstrings for the threading module
http://bugs.python.org/issue12768

#12759: "(?P=)" input for Tools/scripts/redemo.py throw an exception
http://bugs.python.org/issue12759

#12742: Add support for CESU-8 encoding
http://bugs.python.org/issue12742

#12739: read stuck with multithreading and simultaneous subprocess.Pop
http://bugs.python.org/issue12739

#12736: Request for python casemapping functions to use full not simpl
http://bugs.python.org/issue12736

#12735: request full Unicode collation support in std python library
http://bugs.python.org/issue12735

#12706: timeout sentinel in ftplib and poplib documentation
http://bugs.python.org/issue12706

#12684: profile does not dump stats on exception like cProfile does
http://bugs.python.org/issue12684

#12668: 3.2 What's New: it's integer->string, not the opposite
http://bugs.python.org/issue12668



Most recent 15 issues waiting for review (15)
=============================================

#12785: list_distinfo_file is wrong
http://bugs.python.org/issue12785

#12781: Mention SO_REUSEADDR near socket doc examples
http://bugs.python.org/issue12781

#12780: Clean up tests for pyc/pyo in __file__
http://bugs.python.org/issue12780

#12778: JSON-serializing a large container takes too much memory
http://bugs.python.org/issue12778

#12776: argparse: type conversion function should be called only once
http://bugs.python.org/issue12776

#12764: segfault in ctypes.Struct with bad _fields_
http://bugs.python.org/issue12764

#12761: Typo in Doc/license.rst
http://bugs.python.org/issue12761

#12760: Add create mode to open()
http://bugs.python.org/issue12760

#12740: Add struct.Struct.nmemb
http://bugs.python.org/issue12740

#12723: Provide an API in tkSimpleDialog for defining custom validatio
http://bugs.python.org/issue12723

#12720: Expose linux extended filesystem attributes
http://bugs.python.org/issue12720

#12708: multiprocessing.Pool is missing a starmap[_async]() method.
http://bugs.python.org/issue12708

#12691: tokenize.untokenize is broken
http://bugs.python.org/issue12691

#12684: profile does not dump stats on exception like cProfile does
http://bugs.python.org/issue12684

#12668: 3.2 What's New: it's integer->string, not the opposite
http://bugs.python.org/issue12668



Top 10 most discussed issues (10)
=================================

#12326: Linux 3: code should avoid using sys.platform == 'linux2'
http://bugs.python.org/issue12326  53 msgs

#10542: Py_UNICODE_NEXT and other macros for surrogates
http://bugs.python.org/issue10542  33 msgs

#12729: Python lib re cannot handle Unicode properly due to	narrow/wid
http://bugs.python.org/issue12729  32 msgs

#12760: Add create mode to open()
http://bugs.python.org/issue12760  13 msgs

#12740: Add struct.Struct.nmemb
http://bugs.python.org/issue12740  12 msgs

#12775: immense performance problems related to the garbage collector
http://bugs.python.org/issue12775  12 msgs

#12749: lib re cannot match non-BMP ranges (all versions, all builds)
http://bugs.python.org/issue12749  11 msgs

#12394: packaging: generate scripts from callable (dotted paths)
http://bugs.python.org/issue12394  10 msgs

#12750: datetime.strftime('%s') should respect tzinfo
http://bugs.python.org/issue12750   9 msgs

#8668: Packaging: add a 'develop' command
http://bugs.python.org/issue8668   8 msgs



Issues closed (25)
==================

#8617: Better document user site-packages in site module doc
http://bugs.python.org/issue8617  closed by eric.araujo

#9173: logger statement not guarded in shutil._make_tarball
http://bugs.python.org/issue9173  closed by eric.araujo

#10745: setup.py install --user option undocumented
http://bugs.python.org/issue10745  closed by eric.araujo

#12204: str.upper converts to title
http://bugs.python.org/issue12204  closed by ezio.melotti

#12256: Link isinstance/issubclass doc to abc module
http://bugs.python.org/issue12256  closed by eric.araujo

#12646: zlib.Decompress.decompress/flush do not raise any exceptions w
http://bugs.python.org/issue12646  closed by nadeem.vawda

#12650: Subprocess leaks fd upon kill()
http://bugs.python.org/issue12650  closed by neologix

#12672: Some problems in documentation extending/newtypes.html
http://bugs.python.org/issue12672  closed by eli.bendersky

#12711: Explain tracker components in devguide
http://bugs.python.org/issue12711  closed by ezio.melotti

#12721: Chaotic use of helper functions in test_shutil for reading and
http://bugs.python.org/issue12721  closed by eric.araujo

#12725: Docs: Odd phrase "floating seconds" in socket.html
http://bugs.python.org/issue12725  closed by ezio.melotti

#12730: Python's casemapping functions are incorrect for non-BMP chars
http://bugs.python.org/issue12730  closed by ezio.melotti

#12732: Can't portably use Unicode in Python identifiers
http://bugs.python.org/issue12732  closed by python-dev

#12744: inefficient pickling of long integers on 64-bit builds
http://bugs.python.org/issue12744  closed by pitrou

#12747: Move devguide into cpython repo
http://bugs.python.org/issue12747  closed by eric.snow

#12748: Problems using IDLE accelerators with OS X Dvorak - Qwerty ???
http://bugs.python.org/issue12748  closed by ned.deily

#12751: Use macros for surrogates in unicodeobject.c
http://bugs.python.org/issue12751  closed by benjamin.peterson

#12752: locale.normalize does not take unicode strings
http://bugs.python.org/issue12752  closed by barry

#12755: Service application crash in python25!PyObject_Malloc
http://bugs.python.org/issue12755  closed by haypo

#12756: datetime.datetime.utcnow should return a UTC timestamp
http://bugs.python.org/issue12756  closed by brett.cannon

#12763: test_posix failure on OpenIndiana
http://bugs.python.org/issue12763  closed by python-dev

#12766: strange interaction between __slots__ and class-level attribut
http://bugs.python.org/issue12766  closed by python-dev

#12770: Email problem on Windows XP SP3 32bits
http://bugs.python.org/issue12770  closed by brian.curtin

#12773: classes should have mutable docstrings
http://bugs.python.org/issue12773  closed by python-dev

#12784: Concatenation of strings returns the wrong string
http://bugs.python.org/issue12784  closed by haypo

From cs at zip.com.au  Sat Aug 20 03:14:19 2011
From: cs at zip.com.au (Cameron Simpson)
Date: Sat, 20 Aug 2011 11:14:19 +1000
Subject: [Python-Dev] cpython (3.2): NUL -> NULL
In-Reply-To: <20110818185121.GA19783@sleipnir.bytereef.org>
References: <20110818185121.GA19783@sleipnir.bytereef.org>
Message-ID: <20110820011419.GA8291@cskk.homeip.net>

On 18Aug2011 20:51, Stefan Krah <stefan at bytereef.org> wrote:
| Antoine Pitrou <solipsis at pitrou.net> wrote:
| > On Thu, 18 Aug 2011 17:49:28 +0200
| > benjamin.peterson <python-checkins at python.org> wrote:
| > > -        PyErr_SetString(PyExc_TypeError, "embedded NUL character");
| > > +        PyErr_SetString(PyExc_TypeError, "embedded NULL character");
| > 
| > Are you sure? IIRC, NUL is the short name of ASCII character 0
| > (while NULL would be the NULL pointer).
| 
| Yes, that's the traditional name. I was surprised that the C99 standard uses
| "null character" in almost all cases. Example:
| 
| "The construction '\0' is commonly used to represent the null character."
| 
| So I think it should be either NUL or "null character" with the lower
| case spelling.

+1 from me, too.
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

I like to keep an open mind, but not so open my brains fall out.
        - New York Times Chairman Arthur Sulzberger

From facundobatista at gmail.com  Sat Aug 20 12:58:13 2011
From: facundobatista at gmail.com (Facundo Batista)
Date: Sat, 20 Aug 2011 07:58:13 -0300
Subject: [Python-Dev] Strange message error in socket.sendto() exception
Message-ID: <CAM09pzQGS9GBg6ZOQg+9DmHgRxicV14kBGd4XGXbG0k4sQHK+A@mail.gmail.com>

Python 3.2 (r32:88445, Mar 25 2011, 19:28:28)
[GCC 4.5.2] on linux2
>>> import socket
>>> s = socket.socket()
>>> print(s.sendto.__doc__)
sendto(data[, flags], address) -> count
...
>>> s.sendto(b'data', ('localhost', 3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
socket.error: [Errno 32] Broken pipe

This is ok, I expected this. However, note what happens if I send unicode:

>>> s.sendto('data', ('localhost', 3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sendto() takes exactly 3 arguments (2 given)

An error about the number of arguments? What?

Furthermore, where does this message come from? I tried to find out, but the
only hint I got is that it could come from
"./Modules/_ctypes/_ctypes.c"... are we using ctypes to access socket
methods? It's strange, because "sendto" is defined in socketmodule.c.

Ideas? Thanks!

-- 
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From solipsis at pitrou.net  Sat Aug 20 13:08:11 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 20 Aug 2011 13:08:11 +0200
Subject: [Python-Dev] Strange message error in socket.sendto() exception
References: <CAM09pzQGS9GBg6ZOQg+9DmHgRxicV14kBGd4XGXbG0k4sQHK+A@mail.gmail.com>
Message-ID: <20110820130811.76f9118b@pitrou.net>

On Sat, 20 Aug 2011 07:58:13 -0300
Facundo Batista <facundobatista at gmail.com> wrote:
> 
> This is ok, I expected this. However, note what happens if I send unicode:
> 
> >>> s.sendto('data', ('localhost', 3))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: sendto() takes exactly 3 arguments (2 given)
> 
> An error regarding the argument quantity? what?

Here I get (3.2.2, 3.3):

>>> s.sendto('data', ('localhost', 3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface
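For the record, a minimal sketch of the call done the Python 3 way,
encoding the payload to bytes first (the UDP socket and discard port
below are illustrative choices, not taken from the original report):

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # In Python 3, str payloads must be encoded to bytes before sending.
    s.sendto('data'.encode('utf-8'), ('localhost', 9))
    s.close()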




From vinay_sajip at yahoo.co.uk  Sat Aug 20 14:00:55 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Sat, 20 Aug 2011 12:00:55 +0000 (UTC)
Subject: [Python-Dev] Strange message error in socket.sendto() exception
References: <CAM09pzQGS9GBg6ZOQg+9DmHgRxicV14kBGd4XGXbG0k4sQHK+A@mail.gmail.com>
Message-ID: <loom.20110820T135908-769@post.gmane.org>

Facundo Batista <facundobatista <at> gmail.com> writes:


> An error regarding the argument quantity? what?
>
> Ideas? Thanks!
> 

I think this is the same as http://bugs.python.org/issue5421

tl;dr : fixed in recent versions.

Regards,

Vinay Sajip




From p.f.moore at gmail.com  Sat Aug 20 22:52:03 2011
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 20 Aug 2011 21:52:03 +0100
Subject: [Python-Dev] Buildbot failures
Message-ID: <CACac1F8mZXMLtJ07D0AT1ZYbTtronobnsAa+CkoY1ftgLPvXFg@mail.gmail.com>

My buildbot seems to have been failing for a while (I've been away on
holiday) - http://www.python.org/dev/buildbot/buildslaves/moore-windows

The failures seem to generally be in distutils and/or packaging. I see
quite a lot of reds in the waterfall display at the moment, and I
can't see any particular issue with my buildbot, so before I go
digging further, can anyone confirm (or otherwise) if
distutils/packaging is currently generating known failures (and hence,
the alerts can be ignored? (I'd only be looking for
environment-related problems, I'm afraid - I don't have any
distutils/packaging expertise to bring to bear on genuine code
issues...)

Thanks,
Paul.

From merwok at netwok.org  Sun Aug 21 10:17:25 2011
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Sun, 21 Aug 2011 10:17:25 +0200
Subject: [Python-Dev] [Python-checkins] cpython (3.2): #5301: add
 image/vnd.microsoft.icon (.ico) MIME type
In-Reply-To: <E1Qutsa-0001pg-L5@dinsdale.python.org>
References: <E1Qutsa-0001pg-L5@dinsdale.python.org>
Message-ID: <4E50BF15.8020502@netwok.org>

Hi,

However small the commit was, I think it still was a feature request, so
I wonder if it was appropriate for the stable versions.

Regards

From merwok at netwok.org  Sun Aug 21 10:11:32 2011
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Sun, 21 Aug 2011 10:11:32 +0200
Subject: [Python-Dev] Buildbot failures
In-Reply-To: <CACac1F8mZXMLtJ07D0AT1ZYbTtronobnsAa+CkoY1ftgLPvXFg@mail.gmail.com>
References: <CACac1F8mZXMLtJ07D0AT1ZYbTtronobnsAa+CkoY1ftgLPvXFg@mail.gmail.com>
Message-ID: <4E50BDB4.6060809@netwok.org>

On 20/08/2011 22:52, Paul Moore wrote:
> My buildbot seems to have been failing for a while (I've been away on
> holiday) - http://www.python.org/dev/buildbot/buildslaves/moore-windows
> 
> The failures seem to generally be in distutils and/or packaging. I see
> quite a lot of reds in the waterfall display at the moment, and I
> can't see any particular issue with my buildbot, so before I go
> digging further, can anyone confirm (or otherwise) if
> distutils/packaging is currently generating known failures (and hence,
> the alerts can be ignored)?

Yes: http://bugs.python.org/issue12678

Regards

From sandro.tosi at gmail.com  Sun Aug 21 11:09:35 2011
From: sandro.tosi at gmail.com (Sandro Tosi)
Date: Sun, 21 Aug 2011 11:09:35 +0200
Subject: [Python-Dev] [Python-checkins] cpython (3.2): #5301: add
 image/vnd.microsoft.icon (.ico) MIME type
In-Reply-To: <4E50BF15.8020502@netwok.org>
References: <E1Qutsa-0001pg-L5@dinsdale.python.org>
	<4E50BF15.8020502@netwok.org>
Message-ID: <CAPdtAj2SJ-8zt4mptTjzLkhN0a7jPQLpuhM7=9ZPSyL5vLD0vQ@mail.gmail.com>

Hi,

On Sun, Aug 21, 2011 at 10:17, Éric Araujo <merwok at netwok.org> wrote:
> Hi,
>
> However small the commit was, I think it still was a feature request, so
> I wonder if it was appropriate for the stable versions.

I can see your point: the reason I committed it also on the stable
branches is that .ico files have been out there for a long time and
were not being recognized. I'd call it a bug.

Anyhow, if it was not appropriate, just tell me and I'll revert on 2.7
and 3.2. Thanks for your input!

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From tjreedy at udel.edu  Sun Aug 21 21:12:24 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 21 Aug 2011 15:12:24 -0400
Subject: [Python-Dev] [Python-checkins] cpython (3.2): #5301: add
 image/vnd.microsoft.icon (.ico) MIME type
In-Reply-To: <CAPdtAj2SJ-8zt4mptTjzLkhN0a7jPQLpuhM7=9ZPSyL5vLD0vQ@mail.gmail.com>
References: <E1Qutsa-0001pg-L5@dinsdale.python.org>
	<4E50BF15.8020502@netwok.org>
	<CAPdtAj2SJ-8zt4mptTjzLkhN0a7jPQLpuhM7=9ZPSyL5vLD0vQ@mail.gmail.com>
Message-ID: <j2rlbj$gea$1@dough.gmane.org>

On 8/21/2011 5:09 AM, Sandro Tosi wrote:

>> However small the commit was, I think it still was a feature request, so
>> I wonder if it was appropriate for the stable versions.

Good catch.

> I can see your point: the reason I committed it also on the stable
> branches is that .ico files have been out there for a long time and
> were not being recognized. I'd call it a bug.

But it is not (a behavior bug). Every feature request 'fixes' what its 
proposer considers to be a design bug or something.

> Anyhow, if it was not appropriate, just tell me and I'll revert on 2.7
> and 3.2 . Thanks for your input!

It is a new feature for the same reason 
http://bugs.python.org/issue10730 was. If that had not been added for 
3.2.0 (during the beta period, with Georg's permission), it would have 
waited for 3.3.

Our intent is that the initial CPython x.y.0 release 'freeze' the 
definition of Python x.y. Code that used the new feature in 3.2.3
would not work in 3.2.0, .1, or .2, making 3.2.3 define a slight variant.
People who want the latest version of a stdlib module should upgrade to
the latest release or even download from the repository. For mimetypes, 
the database can be explicitly augmented in the code and then the code 
would work in all 2.7 or 3.2 releases.
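For example, a minimal sketch of such explicit augmentation (add_type()
is the documented mimetypes API; the mapping itself is the one under
discussion in this thread):

    import mimetypes

    # Register the mapping explicitly so the code behaves the same on
    # every 2.7/3.2 point release, whatever the built-in table contains.
    mimetypes.add_type('image/vnd.microsoft.icon', '.ico')

    print(mimetypes.guess_type('favicon.ico'))
    # -> ('image/vnd.microsoft.icon', None)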

-- 
Terry Jan Reedy


From scott+python-dev at scottdial.com  Mon Aug 22 02:10:47 2011
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Sun, 21 Aug 2011 20:10:47 -0400
Subject: [Python-Dev] [Python-checkins] cpython (3.2): #5301: add
 image/vnd.microsoft.icon (.ico) MIME type
In-Reply-To: <j2rlbj$gea$1@dough.gmane.org>
References: <E1Qutsa-0001pg-L5@dinsdale.python.org>
	<4E50BF15.8020502@netwok.org>
	<CAPdtAj2SJ-8zt4mptTjzLkhN0a7jPQLpuhM7=9ZPSyL5vLD0vQ@mail.gmail.com>
	<j2rlbj$gea$1@dough.gmane.org>
Message-ID: <4E519E87.3000205@scottdial.com>

On 8/21/2011 3:12 PM, Terry Reedy wrote:
> On 8/21/2011 5:09 AM, Sandro Tosi wrote:
>> I can see your point: the reason I committed it also on the stable
>> branches is that .ico files have been out there for a long time and
>> were not being recognized. I'd call it a bug.
> 
> But it is not (a behavior bug). Every feature request 'fixes' what its
> proposer considers to be a design bug or something.

What's the feature added? That's a semantic game.

> 
>> Anyhow, if it was not appropriate, just tell me and I'll revert on 2.7
>> and 3.2 . Thanks for your input!
> 
> It is a new feature for the same reason
> http://bugs.python.org/issue10730 was. If that had not been added for
> 3.2.0 (during the beta period, with Georg's permission), it would have
> waited for 3.3.

ISTM that Issue #10730 was more contentious because image/svg+xml is *not* an
IANA-assigned mime-type, whereas image/vnd.microsoft.icon is and has
been since 2003; image/svg+xml didn't get approved until earlier
this month, AFAICT.

> Our intent is that the initial CPython x.y.0 release 'freeze' the
> definition of Python x.y. Code that used the new feature in 3.2.3
> would not work in 3.2.0, .1, or .2, making 3.2.3 define a slight variant.
> People who want the latest version of a stdlib module should upgrade to
> the latest release or even download from the repository. For mimetypes,
> the database can be explicitly augmented in the code and then the code
> would work in all 2.7 or 3.2 releases.

Doesn't that weaken your own argument that changing the list in
Lib/mimetypes.py doesn't violate the freeze? Considering that the
mime-types are automatically read from a variety of out-of-tree
locations? It's already the case that the list of mime-types recognized
by a given CPython x.y.z is inconsistent from platform-to-platform and
more importantly installation-to-installation (since /etc/mime.types
could be customized by a given distribution or modified by a local
administrator, and on Windows, the mime-types are scraped from the
registry).

On any reasonable system that I can get access to at the moment (Gentoo,
OS X, Win7), '.ico' is already associated with 'image/x-icon' via either
scraping /etc/mime.types or the registry. I think this issue
probably originates with CPython 2.6 on Windows, where there was no help
from the registry or external mime.types file.

Nevertheless, I am +0 for adding entries from the IANA list into stable
versions because I don't see how they could ever harm anyone. Any robust
program would need to be responsible and populate the mimetypes itself,
if it depended on them, otherwise, all bets are off about what types_map
contains from run-to-run of a program (because /etc/mime.types might
have changed).

-- 
Scott Dial
scott at scottdial.com

From stephen at xemacs.org  Mon Aug 22 03:51:42 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 22 Aug 2011 10:51:42 +0900
Subject: [Python-Dev] [Python-checkins] cpython (3.2): #5301: add
 image/vnd.microsoft.icon (.ico) MIME type
In-Reply-To: <4E519E87.3000205@scottdial.com>
References: <E1Qutsa-0001pg-L5@dinsdale.python.org>
	<4E50BF15.8020502@netwok.org>
	<CAPdtAj2SJ-8zt4mptTjzLkhN0a7jPQLpuhM7=9ZPSyL5vLD0vQ@mail.gmail.com>
	<j2rlbj$gea$1@dough.gmane.org> <4E519E87.3000205@scottdial.com>
Message-ID: <871uwekyc1.fsf@uwakimon.sk.tsukuba.ac.jp>

Scott Dial writes:
 > On 8/21/2011 3:12 PM, Terry Reedy wrote:
 > > On 8/21/2011 5:09 AM, Sandro Tosi wrote:
 > >> I can see your point: the reason I committed it also on the stable
 > >> branches is that .ico files have been out there for a long time and
 > >> were not being recognized. I'd call it a bug.
 > > 
 > > But it is not (a behavior bug). Every feature request 'fixes' what its
 > > proposer considers to be a design bug or something.
 > 
 > What's the feature added? That's a semantic game.

There's really only one way to fairly objectively resolve this:
"Behavior that varies from documented behavior is a bug."  Everything
else is a feature request, including requests for addition of as-yet
undocumented behavior that is quite exactly analogous to existing
behavior.

Of course you can also play games with the definition of
"documentation".  If the BDFL says that his Original Intent was that
behavior X be supported, I suppose that's Sufficiently Well-Documented
(and due to the time machine Always Has Been).  Or there may be a
blanket statement that "we will conform to the version of external
standard Y that is current / draft / whatever when x.y.0 is released,"
made by the maintainer of the module on python-dev in 1999.

What does the documentation say?

On a separate issue:

 > ISTM, that Issue #10730 was more contentious because it is *not* an
 > IANA-assigned mime-type, whereas image/vnd.microsoft.icon is and has
 > been since 2003.

Is it?  Maybe Microsoft has cleaned up their act, but my experience
with their IANA assignments is that there's no reliable behavior
documented by them -- the registration documents point at internal
Microsoft documents that change over time.  For example, they added
the EURO SYMBOL to several registered MIME charsets without updating
the IANA registrations.  I don't consider a registration that points
to an internal corporate document with variable content to be a
suitable specification for open source implementation, even if the
IANA can be brib^H^H^H^Hfooled into accepting a registration.

 > Nevertheless, I am +0 for adding entries from the IANA list into stable
 > versions because I don't see how they could ever harm anyone.

Features that you can't see how they could ever harm anyone are the
cracker's favorite back door.  Entries in the IANA list enable
arbitrarily complex behavior.

 > Any robust program would need to be responsible and populate the
 > mimetypes itself, if it depended on them, otherwise, all bets are
 > off about what types_map contains from run-to-run of a program
 > (because /etc/mime.types might have changed).

That's precisely why Python should not change this, flipped around.  A
site that carefully controls what's in mime.types should not have to
worry about Python changing types_map behind its back in a patch
release.

The right thing to do is to provide a module that allows the user to
request update of the databases automatically, and document how to do
it by hand for users who are differently abled net-wise.

From tjreedy at udel.edu  Mon Aug 22 04:01:40 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 21 Aug 2011 22:01:40 -0400
Subject: [Python-Dev] [Python-checkins] cpython (3.2): #5301: add
 image/vnd.microsoft.icon (.ico) MIME type
In-Reply-To: <4E519E87.3000205@scottdial.com>
References: <E1Qutsa-0001pg-L5@dinsdale.python.org>
	<4E50BF15.8020502@netwok.org>
	<CAPdtAj2SJ-8zt4mptTjzLkhN0a7jPQLpuhM7=9ZPSyL5vLD0vQ@mail.gmail.com>
	<j2rlbj$gea$1@dough.gmane.org> <4E519E87.3000205@scottdial.com>
Message-ID: <j2sdb0$mer$1@dough.gmane.org>

On 8/21/2011 8:10 PM, Scott Dial wrote:
> On 8/21/2011 3:12 PM, Terry Reedy wrote:

>> But it is not (a behavior bug). Every feature request 'fixes' what its
>> proposer considers to be a design bug or something.
>
> What's the feature added? That's a semantic game.

Please. It is an operational decision. I personally would be ok with 
doing away with bugfix-only releases and just releasing a new version 
with all patches every 6 months. It certainly would make issue 
management easier. But most people don't want such rapid change, even to 
the point of resisting fixes to design errors of 20 years ago. On the 
other hand, most people want their personal fix/feature included right 
away, in the next release. But if we do not include everything every 
release, we make decisions as to what to include or not.

>> It is a new feature for the same reason
>> http://bugs.python.org/issue10730 was. If that had not been added for
>> 3.2.0 (during the beta period, with Georg's permission), it would have
>> waited for 3.3.

In http://bugs.python.org/msg124332 from that issue, David Murray refers 
to "the policy stated in mimetypes". I could not find a policy 
explicitly stated in the doc, nor in a quick review of the code. But I
believe what he meant is "Include the most commonly used subset of 
registered extensions. Add more as requested with every x.y version." If 
it is really not in the doc, I wish it, or an agreed-on revision, were 
added. "Add more as requested with every x.y.x release." is the 
alternative that Sandro seems to have followed.

> ISTM, that Issue #10730 was more contentious because it is *not* an
> IANA-assigned mime-type, whereas image/vnd.microsoft.icon is and has
> been since 2003. Whereas image/svg+xml didn't get approved until earlier
> this month, AFAICT.

If we intended to include all registered mimetypes and this happened to 
be missing, that would be a bug. But there are scads of mimetypes, 
especially vendor-specific vnd types, that we do not include. Many
predate 2003 and are probably obsolete, and hence are not included.
There might be others that are used generally.

-- 
Terry Jan Reedy


From barry at python.org  Mon Aug 22 17:44:29 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 22 Aug 2011 11:44:29 -0400
Subject: [Python-Dev] Call for participants: Windows Python security experts
Message-ID: <20110822114429.4491b9f9@resist.wooz.org>

Hi folks,

The Python security team is a small group of core Python developers who triage
and respond to vulnerability reports sent to security at python.org.  We get all
kinds of reports, for which we try to provide guidance and feedback, review
patches, etc.  Python being as secure as it is, traffic is fairly low. :)

We have a dearth of Windows expertise on the team though, so I am putting out
a call for participants.  If you are an expert on Python for Windows operating
systems and can make judgments about the validity of security reports for the
platform, please contact us.  Core developers are preferred, but motivation
and available time is paramount.  You're welcome to apply even if you're not a
Windows expert, if you have the time and ability to help out in general.

If you're interested, you can reach the team at security at python.org.

Cheers,
-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110822/7271c22f/attachment.pgp>

From torsten.becker at gmail.com  Mon Aug 22 20:58:51 2011
From: torsten.becker at gmail.com (Torsten Becker)
Date: Mon, 22 Aug 2011 14:58:51 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
Message-ID: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>

Hello all,

I have implemented an initial version of PEP 393 -- "Flexible String
Representation" as part of my Google Summer of Code project.  My patch
is hosted as a repository on bitbucket [1] and I created a related
issue on the bug tracker [2].  I posted documentation for the current
state of the development in the wiki [3].

Current tests show a potential reduction of memory by about 20% and
CPU by 50% for a join micro benchmark.  Starting a new interpreter
still causes 3244 calls to create compatibility Py_UNICODE
representations, 263 strings are created using the old API while 62719
are created using the new API.  More measurements are on the wiki page
[3].

If there is interest, I would like to continue working on the patch
with the goal of getting it into Python 3.3.  Any and all feedback is
welcome.


Regards,
Torsten

[1]: http://www.python.org/dev/peps/pep-0393
[2]: http://bugs.python.org/issue12819
[3]: http://wiki.python.org/moin/SummerOfCode/2011/PEP393

From v+python at g.nevcal.com  Mon Aug 22 22:24:44 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 22 Aug 2011 13:24:44 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
Message-ID: <4E52BB0C.3010208@g.nevcal.com>

On 8/22/2011 11:58 AM, Torsten Becker wrote:
> Hello all,
>
> I have implemented an initial version of PEP 393 -- "Flexible String
> Representation" as part of my Google Summer of Code project.  My patch
> is hosted as a repository on bitbucket [1] and I created a related
> issue on the bug tracker [2].  I posted documentation for the current
> state of the development in the wiki [3].
>
> Current tests show a potential reduction of memory by about 20% and
> CPU by 50% for a join micro benchmark.  Starting a new interpreter
> still causes 3244 calls to create compatibility Py_UNICODE
> representations, 263 strings are created using the old API while 62719
> are created using the new API.  More measurements are on the wiki page
> [3].
>
> If there is interest, I would like to continue working on the patch
> with the goal of getting it into Python 3.3.  Any and all feedback is
> welcome.

Sounds like great progress!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110822/981f2933/attachment.html>

From solipsis at pitrou.net  Tue Aug 23 00:14:40 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Aug 2011 00:14:40 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
Message-ID: <20110823001440.433a0f1f@pitrou.net>


Hello,

On Mon, 22 Aug 2011 14:58:51 -0400
Torsten Becker <torsten.becker at gmail.com> wrote:
> 
> I have implemented an initial version of PEP 393 -- "Flexible String
> Representation" as part of my Google Summer of Code project.  My patch
> is hosted as a repository on bitbucket [1] and I created a related
> issue on the bug tracker [2].  I posted documentation for the current
> state of the development in the wiki [3].

A couple of minor comments:

- “The UTF-8 decoding fast path for ASCII only characters was removed
  and replaced with a memcpy if the entire string is ASCII.”
  The fast path would still be useful for mostly-ASCII strings, which
  are extremely common (unless UTF-8 has become a no-op?).

- You could trim the debug results from the benchmark results, this may
  make them more readable.

- You could try to run stringbench, which can be found at
  http://svn.python.org/projects/sandbox/trunk/stringbench (*)
  and there's iobench (the text mode benchmarks) in the Tools/iobench
  directory.

(*) (yes, apparently we forgot to convert this one to Mercurial)

Regards

Antoine.



From solipsis at pitrou.net  Tue Aug 23 00:15:43 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Aug 2011 00:15:43 +0200
Subject: [Python-Dev] cpython (3.2): #9200: The str.is* methods now work
 with strings that contain non-BMP
References: <E1QvYLU-00044u-9q@dinsdale.python.org>
Message-ID: <20110823001543.7064e8e3@pitrou.net>

On Mon, 22 Aug 2011 19:31:32 +0200
ezio.melotti <python-checkins at python.org> wrote:
> http://hg.python.org/cpython/rev/06b30c5bcc3d
> changeset:   72035:06b30c5bcc3d
> branch:      3.2
> parent:      72026:c8e73a89150e
> user:        Ezio Melotti <ezio.melotti at gmail.com>
> date:        Mon Aug 22 14:08:38 2011 +0300
> summary:
>   #9200: The str.is* methods now work with strings that contain non-BMP characters even in narrow Unicode builds.

That's a very cool improvement!
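For illustration, a quick check one could run on a narrow build with the
fix applied (U+10400 is just an arbitrary non-BMP letter chosen for this
example):

    # U+10400 DESERET CAPITAL LETTER LONG I lies outside the BMP, so a
    # narrow build stores it as a surrogate pair.
    ch = '\U00010400'
    print(ch.isalpha())  # True with the #9200 fix; False before it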

cheers

Antoine.



From sandro.tosi at gmail.com  Tue Aug 23 01:09:28 2011
From: sandro.tosi at gmail.com (Sandro Tosi)
Date: Tue, 23 Aug 2011 01:09:28 +0200
Subject: [Python-Dev] Sphinx version for Python 2.x docs
In-Reply-To: <4E4AF610.5040303@simplistix.co.uk>
References: <4E4AF610.5040303@simplistix.co.uk>
Message-ID: <CAPdtAj3wv2WCePdYM3qRcbRvLfzhAp2G1JpvRjd6-ttw2d1Q2A@mail.gmail.com>

Hi all,

> Any chance the version of sphinx used to generate the docs on
> docs.python.org could be updated?

I'd like to discuss this aspect, in particular for the implication it
has on http://bugs.python.org/issue12409 .

Personally, I do think there is value in having the same set of tools to
build the Python documentation of the currently active branches.
Currently, only 2.7 is different, since it still fetches (from
svn.python.org... can we fix this too? suggestions welcome!) sphinx
0.6.7 while 3.2/3.3 uses 1.0.7.

If you're worried about the time needed to convert the current 2.7 doc
to the new Sphinx format and all the related changes, I volunteer to do
the job (and/or collaborate with whoever is already on it), but what I
want to understand is whether it's an acceptable change.

I see Sphinx more as an internal build tool, so freezing it is like
saying "don't upgrade gcc" or so. Right now the delta is just the C
function definitions and some py-specific roles, but over the years it
will increase. Keeping the delta small, simplifying the forward-porting
of doc patches (e.g. not needing two versions between 2.7 and 3.x), and
having a common set of tools for doc building is worthwhile IMHO.

What do you think about it? and yes Georg, I'd like to hear your opinion too :)

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From stefan_ml at behnel.de  Tue Aug 23 09:02:54 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 23 Aug 2011 09:02:54 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
Message-ID: <j2vjau$u7l$1@dough.gmane.org>

Torsten Becker, 22.08.2011 20:58:
> I have implemented an initial version of PEP 393 -- "Flexible String
> Representation" as part of my Google Summer of Code project.  My patch
> is hosted as a repository on bitbucket [1] and I created a related
> issue on the bug tracker [2].  I posted documentation for the current
> state of the development in the wiki [3].

Very cool!

I've started fixing up Cython for it.

One thing I noticed: on platforms where wchar_t is signed, the comparison 
to "128U" in the Py_UNICODE_ISSPACE() macro may issue a warning when 
applied to a Py_UNICODE value (which it previously was officially defined 
on). For the sake of portability of existing code, this may be worth a 
work-around.

Personally, I wouldn't really mind getting this warning, given that it's 
better to use Py_UCS4 instead of Py_UNICODE. But it may turn out to be an 
annoyance for users, because their code that does this isn't actually 
broken in the new world.

And one thing that I find unfortunate is that we need a new (unexpected) 
_GET_LENGTH() next to the existing (and obvious) _GET_SIZE(), but I guess 
that's a somewhat small price to pay for backwards compatibility...

Stefan


From martin at v.loewis.de  Tue Aug 23 10:55:40 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 23 Aug 2011 10:55:40 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <20110823001440.433a0f1f@pitrou.net>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net>
Message-ID: <4E536B0C.8050008@v.loewis.de>

> - “The UTF-8 decoding fast path for ASCII only characters was removed
>   and replaced with a memcpy if the entire string is ASCII.”
>   The fast path would still be useful for mostly-ASCII strings, which
>   are extremely common (unless UTF-8 has become a no-op?).

Is it really extremely common to have strings that are mostly-ASCII but
not completely ASCII? I would agree that pure ASCII strings are
extremely common.

Regards,
Martin

From stefan_ml at behnel.de  Tue Aug 23 11:32:44 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 23 Aug 2011 11:32:44 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E536B0C.8050008@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>
Message-ID: <j2vs3s$j16$1@dough.gmane.org>

"Martin v. L?wis", 23.08.2011 10:55:
>> - “The UTF-8 decoding fast path for ASCII only characters was removed
>>    and replaced with a memcpy if the entire string is ASCII.”
>>    The fast path would still be useful for mostly-ASCII strings, which
>>    are extremely common (unless UTF-8 has become a no-op?).
>
> Is it really extremely common to have strings that are mostly-ASCII but
> not completely ASCII?

Maybe not as "extremely common" as pure ASCII strings, but at least for 
western European languages, "mostly ASCII" strings are very common indeed.

Stefan


From python-dev at masklinn.net  Tue Aug 23 11:46:12 2011
From: python-dev at masklinn.net (Xavier Morel)
Date: Tue, 23 Aug 2011 11:46:12 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E536B0C.8050008@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
Message-ID: <A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>

On 2011-08-23, at 10:55 , Martin v. Löwis wrote:
>> - “The UTF-8 decoding fast path for ASCII only characters was removed
>>  and replaced with a memcpy if the entire string is ASCII.”
>>  The fast path would still be useful for mostly-ASCII strings, which
>>  are extremely common (unless UTF-8 has become a no-op?).
> 
> Is it really extremely common to have strings that are mostly-ASCII but
> not completely ASCII? I would agree that pure ASCII strings are
> extremely common.
Mostly-ASCII is pretty common for western-European languages (French, for
instance, is probably 90 to 95% ASCII). It's also a risk in English, when
the writer "correctly" spells foreign words (résumé and the like).

From martin at v.loewis.de  Tue Aug 23 12:20:28 2011
From: martin at v.loewis.de (=?windows-1252?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 23 Aug 2011 12:20:28 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
Message-ID: <4E537EEC.1070602@v.loewis.de>

On 23.08.2011 11:46, Xavier Morel wrote:
> On 2011-08-23, at 10:55 , Martin v. L?wis wrote:
>>> - “The UTF-8 decoding fast path for ASCII only characters was removed
>>>  and replaced with a memcpy if the entire string is ASCII.”
>>>  The fast path would still be useful for mostly-ASCII strings, which
>>>  are extremely common (unless UTF-8 has become a no-op?).
>>
>> Is it really extremely common to have strings that are mostly-ASCII but
>> not completely ASCII? I would agree that pure ASCII strings are
>> extremely common.
> Mostly ascii is pretty common for western-european languages (French, for
> instance, is probably 90 to 95% ascii). It's also a risk in english, when
> the writer "correctly" spells foreign words (r?sum? and the like).

I know - I still question whether it is "extremely common" (so much as
to justify a special case). I.e. on what application with what dataset
would you gain what speedup, at the expense of what amount of extra
lines, and potential slow-down for other datasets?

For the record, the optimization in question is the one where it masks
a long word with 0x80808080L, to see whether it is completely
ASCII, and then copies four characters in an unrolled fashion. It stops
doing so when it sees a non-ASCII character, and returns to that mode
when it gets to the next aligned memory address that stores only ASCII
characters.
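As a rough illustration of that test in Python (the real code operates on
aligned C machine words; a 4-byte chunk stands in for one here):

    import struct

    def word_is_ascii(chunk):
        # A 4-byte word is pure ASCII iff no byte has its high bit set,
        # which the single mask 0x80808080 checks in one operation.
        (word,) = struct.unpack("=I", chunk)
        return word & 0x80808080 == 0

    print(word_is_ascii(b"abcd"))                      # True
    print(word_is_ascii("caf\xe9".encode("latin-1")))  # False: 0xE9 sets the high bit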

In the PEP 393 approach, if the string has a two-byte representation,
each character needs to widened to two bytes, and likewise for four
bytes. So three separate copies of the unrolled loop would be needed,
one for each target size.

Regards,
Martin


From solipsis at pitrou.net  Tue Aug 23 13:39:02 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Aug 2011 13:39:02 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E537EEC.1070602@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
Message-ID: <1314099542.3485.10.camel@localhost.localdomain>


> >> Is it really extremely common to have strings that are mostly-ASCII but
> >> not completely ASCII? I would agree that pure ASCII strings are
> >> extremely common.
> > Mostly ascii is pretty common for western-european languages (French, for
> > instance, is probably 90 to 95% ascii). It's also a risk in english, when
> > the writer "correctly" spells foreign words (r?sum? and the like).
> 
> I know - I still question whether it is "extremely common" (so much as
> to justify a special case).

Well, it's:
- all natural languages based on a variant of the latin alphabet
- but also, XML, JSON, HTML documents...
- and log files...
- in short, any kind of parsable format which is structurally ASCII but
can contain arbitrary unicode

So I would say *most* unicode data out there is mostly-ASCII, even when
it has Japanese characters in it. The rationale is that most unicode
data processed by computers is structured.
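To put a number on "mostly-ASCII", a throwaway sketch (the sample strings
are merely illustrative):

    def ascii_fraction(s):
        # Fraction of code points below U+0080.
        return sum(ord(c) < 0x80 for c in s) / len(s)

    print(ascii_fraction('r\u00e9sum\u00e9'))   # 0.67: mostly ASCII, not pure
    print(ascii_fraction('{"caf\u00e9": 1}'))   # 0.91: structurally-ASCII JSON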

This optimization was done when trying to improve the speed of text I/O.

> In the PEP 393 approach, if the string has a two-byte representation,
> each character needs to widened to two bytes, and likewise for four
> bytes. So three separate copies of the unrolled loop would be needed,
> one for each target size.

Do you have three copies of the UTF-8 decoder already, or do you use a
stringlib-like approach?

Regards

Antoine.



From martin at v.loewis.de  Tue Aug 23 13:51:58 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 23 Aug 2011 13:51:58 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <1314099542.3485.10.camel@localhost.localdomain>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
Message-ID: <4E53945E.1050102@v.loewis.de>

> This optimization was done when trying to improve the speed of text I/O.

So what speedup did it achieve, for the kind of data you talked about?

> > Do you have three copies of the UTF-8 decoder already, or do you use a
> stringlib-like approach?

It's a single implementation - see for yourself.

Regards,
Martin

From stefan_ml at behnel.de  Tue Aug 23 14:14:39 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 23 Aug 2011 14:14:39 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
Message-ID: <j305jf$e7d$1@dough.gmane.org>

Torsten Becker, 22.08.2011 20:58:
> I have implemented an initial version of PEP 393 -- "Flexible String
> Representation" as part of my Google Summer of Code project.  My patch
> is hosted as a repository on bitbucket [1] and I created a related
> issue on the bug tracker [2].  I posted documentation for the current
> state of the development in the wiki [3].

One thing that occurred to me regarding the object struct:

typedef struct {
     PyObject_HEAD
     Py_ssize_t length;       /* Number of code points in the string */
     void *str;               /* Canonical, smallest-form Unicode buffer */
     Py_hash_t hash;          /* Hash value; -1 if not set */
     int state;               /* != 0 if interned. In this case the two
                               * references from the dictionary to this
                               * object are *not* counted in ob_refcnt.
                               * See SSTATE_KIND_* for other bits */
     Py_ssize_t utf8_length;  /* Number of bytes in utf8, excluding the
                               * terminating \0. */
     char *utf8;              /* UTF-8 representation (null-terminated) */
     Py_ssize_t wstr_length;  /* Number of code points in wstr, possible
                               * surrogates count as two code points. */
     wchar_t *wstr;           /* wchar_t representation (null-terminated) */
} PyUnicodeObject;


Wouldn't the "normal" approach be to use a union for the str field? I.e.

     union str {
        unsigned char* latin1;
        Py_UCS2* ucs2;
        Py_UCS4* ucs4;
     }

Given that they're all pointers, all fields have the same size, but I find 
it more readable to write

     u.str.latin1

than

     ((const unsigned char*)u.str)

Plus, the three types would be given by the struct, rather than by a 
per-usage cast.

Has this been considered before? Was there a reason to decide against it?

Stefan


From solipsis at pitrou.net  Tue Aug 23 14:15:45 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Aug 2011 14:15:45 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E53945E.1050102@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
Message-ID: <1314101745.3485.18.camel@localhost.localdomain>

On Tuesday, 23 August 2011 at 13:51 +0200, "Martin v. Löwis" wrote:
> > This optimization was done when trying to improve the speed of text I/O.
> 
> So what speedup did it achieve, for the kind of data you talked about?

Since I don't have the number anymore, I've just saved the contents of
https://linuxfr.org/news/le-noyau-linux-est-disponible-en-version%C2%A030
as a "linuxfr.html" file and then did:

$ ./python -m timeit "with open('linuxfr.html', encoding='utf8') as f: f.read()"
1000 loops, best of 3: 859 usec per loop

After disabling the fast path, I ran the micro-benchmark again:

$ ./python -m timeit "with open('linuxfr.html', encoding='utf8') as f: f.read()"
1000 loops, best of 3: 1.09 msec per loop

so that's a 20% speedup.

> > Do you have three copies of the UTF-8 decoder already, or do you a use a
> > stringlib-like approach?
> 
> It's a single implementation - see for yourself.

So why would you need three separate implementations of the unrolled
loop? You already have a macro named WRITE_FLEXIBLE_OR_WSTR.

Even without taking into account the unrolled loop, I wonder how much
slower UTF-8 decoding becomes with that approach, by the way. Instead of
testing the "kind" variable at each loop iteration, using a
stringlib-like approach may be a better deal IMO.

Of course we would first need to have various benchmark numbers once the
current PEP 393 implementation is complete.

Regards

Antoine.



From martin at v.loewis.de  Tue Aug 23 15:06:25 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 23 Aug 2011 15:06:25 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <1314101745.3485.18.camel@localhost.localdomain>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
Message-ID: <4E53A5D1.2040808@v.loewis.de>

> So why would you need three separate implementations of the unrolled
> loop? You already have a macro named WRITE_FLEXIBLE_OR_WSTR.

Depending on where the speedup comes from in this optimization, it
may well be that the overhead of figuring out where to store the
result eats the gain from the fast test.

> Even without taking into account the unrolled loop, I wonder how much
> slower UTF-8 decoding becomes with that approach, by the way. 

In some cases, tests show that it gets faster, overall, compared to 3.2.
This is probably because strings take less memory, which means less
copying, more cache locality, etc.

Of course, it still may be possible to apply micro-optimizations to
the new implementation.

> Instead of
> testing the "kind" variable at each loop iteration, using a
> stringlib-like approach may be a better deal IMO.

Well, things have to be done in order:
1. the PEP needs to be approved
2. the performance bottlenecks need to be identified
3. optimizations should be applied.

I'm not sure what you mean by "stringlib-like" approach - if you are
talking about templating, I'd rather avoid this for maintainability
reasons, unless significant improvements can be demonstrated. Torsten
had a version that used macros for that, and it was a pain to debug.
So we put correctness and readability first.

Regards,
Martin

From martin at v.loewis.de  Tue Aug 23 15:17:46 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 23 Aug 2011 15:17:46 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j305jf$e7d$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j305jf$e7d$1@dough.gmane.org>
Message-ID: <4E53A87A.1070306@v.loewis.de>

> Has this been considered before? Was there a reason to decide against it?

I think we simply didn't consider it. An early version of the PEP used
the lower bits for the pointer to encode the kind, in which case it even
stopped being a pointer. Modules are not expected to access this
pointer except through the macros, so it may not matter that much.

OTOH, it's certainly not too late to change it.

Regards,
Martin

From solipsis at pitrou.net  Tue Aug 23 15:20:38 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Aug 2011 15:20:38 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E53A5D1.2040808@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
Message-ID: <1314105638.3485.23.camel@localhost.localdomain>


> Well, things have to be done in order:
> 1. the PEP needs to be approved
> 2. the performance bottlenecks need to be identified
> 3. optimizations should be applied.

Sure, but the whole point of the PEP is to improve performance (I am
dumping "memory consumption" in the "performance" bucket). That is, I
suppose it will get approval based on its demonstrated benefits.

> I'm not sure what you mean by "stringlib-like" approach - if you are
> talking about templating, I'd rather avoid this for maintainability
> reasons, unless significant improvements can be demonstrated. Torsten
> had a version that used macros for that, and it was a pain to debug.

The point of templating is precisely to avoid macros, so that the code
is natural to read and write and the compiler gives you the right line
number when it finds an error.

Regards

Antoine.



From victor.stinner at haypocalc.com  Tue Aug 23 15:21:20 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 23 Aug 2011 15:21:20 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E53A5D1.2040808@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
Message-ID: <4E53A950.30005@haypocalc.com>

On 23/08/2011 15:06, "Martin v. Löwis" wrote:
> Well, things have to be done in order:
> 1. the PEP needs to be approved
> 2. the performance bottlenecks need to be identified
> 3. optimizations should be applied.

I would not vote for the PEP if it slows down Python, especially if it's 
much slower. But Torsten says that it speeds up Python, which is 
surprising. I have to do my own benchmarks :-)

Victor

From stefan_ml at behnel.de  Tue Aug 23 16:02:54 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 23 Aug 2011 16:02:54 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E53A87A.1070306@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<j305jf$e7d$1@dough.gmane.org>
	<4E53A87A.1070306@v.loewis.de>
Message-ID: <j30buf$pe6$1@dough.gmane.org>

"Martin v. L?wis", 23.08.2011 15:17:
>> Has this been considered before? Was there a reason to decide against it?
>
> I think we simply didn't consider it. An early version of the PEP used
> the lower bits for the pointer to encode the kind, in which case it even
> stopped being a pointer. Modules are not expected to access this
> pointer except through the macros, so it may not matter that much.

The difference is that you *could* access them directly in a safe way, if 
it was a union.

So, for an efficient character loop, replicated for performance reasons or 
for character range handling reasons or whatever, you could just check the 
string kind and then jump to the loop implementation that handles that 
type, without using any further macros.

Stefan


From ncoghlan at gmail.com  Tue Aug 23 16:05:14 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 24 Aug 2011 00:05:14 +1000
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E53A87A.1070306@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j305jf$e7d$1@dough.gmane.org> <4E53A87A.1070306@v.loewis.de>
Message-ID: <CADiSq7c45HK-Mcr=KMY13ONbVeuMSGcx41B-aV=HN1qFRRF_eA@mail.gmail.com>

On Tue, Aug 23, 2011 at 11:17 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Has this been considered before? Was there a reason to decide against it?
>
> I think we simply didn't consider it. An early version of the PEP used
> the lower bits for the pointer to encode the kind, in which case it even
> stopped being a pointer. Modules are not expected to access this
> pointer except through the macros, so it may not matter that much.
>
> OTOH, it's certainly not too late to change it.

It would make the macro implementations a bit clearer, so +1 for the
union approach from me.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From solipsis at pitrou.net  Tue Aug 23 16:08:20 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Aug 2011 16:08:20 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j305jf$e7d$1@dough.gmane.org> <4E53A87A.1070306@v.loewis.de>
	<j30buf$pe6$1@dough.gmane.org>
Message-ID: <20110823160820.08754ffe@pitrou.net>

On Tue, 23 Aug 2011 16:02:54 +0200
Stefan Behnel <stefan_ml at behnel.de> wrote:
> "Martin v. L?wis", 23.08.2011 15:17:
> >> Has this been considered before? Was there a reason to decide against it?
> >
> > I think we simply didn't consider it. An early version of the PEP used
> > the lower bits for the pointer to encode the kind, in which case it even
> > stopped being a pointer. Modules are not expected to access this
> > pointer except through the macros, so it may not matter that much.
> 
> The difference is that you *could* access them directly in a safe way, if 
> it was a union.
> 
> So, for an efficient character loop, replicated for performance reasons or 
> for character range handling reasons or whatever, you could just check the 
> string kind and then jump to the loop implementation that handles that 
> type, without using any further macros.

Macros are useful to shield the abstraction from the implementation. If
you access the members directly, and the unicode object is represented
differently in some future version of Python (say e.g. with tagged
pointers), your code doesn't compile anymore.

Regards

Antoine.



From ncoghlan at gmail.com  Tue Aug 23 16:13:11 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 24 Aug 2011 00:13:11 +1000
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E53A950.30005@haypocalc.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
Message-ID: <CADiSq7enu9viwdT12SffxpVafgMCtcxqo+EVWUe3Wo+ByEWkbg@mail.gmail.com>

On Tue, Aug 23, 2011 at 11:21 PM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> On 23/08/2011 15:06, "Martin v. Löwis" wrote:
>>
>> Well, things have to be done in order:
>> 1. the PEP needs to be approved
>> 2. the performance bottlenecks need to be identified
>> 3. optimizations should be applied.
>
> I would not vote for the PEP if it slows down Python, especially if it's
> much slower. But Torsten says that it speeds up Python, which is surprising.
> I have to do my own benchmarks :-)

As Martin noted, cache misses hurt performance so much on modern
processors that making things use less memory overall can actually be
a speed optimisation as well. Guessing where the remaining bottlenecks
are is unlikely to be effective - profiling of the preliminary
implementation will be needed.

However, the idea that reducing the size of pure ASCII strings (which
include all the identifiers in most code) by a factor of 2 or 4 (or
so) results in a net speed increase definitely sounds plausible to me,
even for non-string processing code.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From stefan_ml at behnel.de  Tue Aug 23 17:18:13 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 23 Aug 2011 17:18:13 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <20110823160820.08754ffe@pitrou.net>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<j305jf$e7d$1@dough.gmane.org>
	<4E53A87A.1070306@v.loewis.de>	<j30buf$pe6$1@dough.gmane.org>
	<20110823160820.08754ffe@pitrou.net>
Message-ID: <j30gbl$ohi$1@dough.gmane.org>

Antoine Pitrou, 23.08.2011 16:08:
> On Tue, 23 Aug 2011 16:02:54 +0200
> Stefan Behnel wrote:
>> "Martin v. L?wis", 23.08.2011 15:17:
>>>> Has this been considered before? Was there a reason to decide against it?
>>>
>>> I think we simply didn't consider it. An early version of the PEP used
>>> the lower bits for the pointer to encode the kind, in which case it even
>>> stopped being a pointer. Modules are not expected to access this
>>> pointer except through the macros, so it may not matter that much.
>>
>> The difference is that you *could* access them directly in a safe way, if
>> it was a union.
>>
>> So, for an efficient character loop, replicated for performance reasons or
>> for character range handling reasons or whatever, you could just check the
>> string kind and then jump to the loop implementation that handles that
>> type, without using any further macros.
>
> Macros are useful to shield the abstraction from the implementation. If
> you access the members directly, and the unicode object is represented
> differently in some future version of Python (say e.g. with tagged
> pointers), your code doesn't compile anymore.

Even with tagged pointers, you could just provide a macro that unpacks the 
pointer to the buffer for a given string kind. I don't think there's much 
more to be done to keep up the abstraction. I don't see a reason to prevent 
users from accessing the memory buffer directly, especially not by 
(accidental, as I understand it) obfuscation through a void*.

Stefan


From martin at v.loewis.de  Tue Aug 23 18:12:32 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 23 Aug 2011 18:12:32 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j30gbl$ohi$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<j305jf$e7d$1@dough.gmane.org>	<4E53A87A.1070306@v.loewis.de>	<j30buf$pe6$1@dough.gmane.org>	<20110823160820.08754ffe@pitrou.net>
	<j30gbl$ohi$1@dough.gmane.org>
Message-ID: <4E53D170.1030404@v.loewis.de>

> Even with tagged pointers, you could just provide a macro that unpacks
> the pointer to the buffer for a given string kind.

These macros are indeed available.

> I don't think there's
> much more to be done to keep up the abstraction. I don't see a reason to
> prevent users from accessing the memory buffer directly, especially not
> by (accidental, as I understand it) obfuscation through a void*.

It's not about preventing them from accessing the representation. It's
an "internal public" structure just as all other object layouts (i.e.
feel free to use them, but expect them to change with the next release).

However, I still think that people rarely will:
- most code treats strings as opaque, just as any other PyObject*
- code that is aware of strings typically wants them in an encoded
  form, often UTF-8, or whatever the underlying C library expects.
- code that does need to look at individual characters should be fine
  with the accessor macros.

That said, I can readily believe that Cython would have a use for direct
access to the structure. I just wouldn't want people to rewrite their
code in four versions (three for the different 3.3 representations,
plus one for 3.2 and earlier).

Regards,
Martin

From nir at winpdb.org  Tue Aug 23 20:02:25 2011
From: nir at winpdb.org (Nir Aides)
Date: Tue, 23 Aug 2011 21:02:25 +0300
Subject: [Python-Dev] issue 6721 "Locks in python standard library should be
	sanitized on fork"
Message-ID: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>

Hi all,

Please consider this invitation to stick your head into an interesting
problem:
http://bugs.python.org/issue6721

Nir
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110823/ad1f62a7/attachment.html>

From solipsis at pitrou.net  Tue Aug 23 20:20:04 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Aug 2011 20:20:04 +0200
Subject: [Python-Dev] FileSystemError or FilesystemError?
Message-ID: <20110823202004.0bb63490@pitrou.net>


Hello,

When reviewing the PEP 3151 implementation (*), Ezio commented that
"FileSystemError" looks a bit strange and that "FilesystemError" would
be a better spelling. What is your opinion?

(*) http://bugs.python.org/issue12555

Thank you

Antoine.



From sandro.tosi at gmail.com  Tue Aug 23 20:30:50 2011
From: sandro.tosi at gmail.com (Sandro Tosi)
Date: Tue, 23 Aug 2011 20:30:50 +0200
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <20110823202004.0bb63490@pitrou.net>
References: <20110823202004.0bb63490@pitrou.net>
Message-ID: <CAPdtAj1iLWVWG5Adcbxfbr0mmxpUu26O2UkPXRksyiVvpT0gRw@mail.gmail.com>

On Tue, Aug 23, 2011 at 20:20, Antoine Pitrou <solipsis at pitrou.net> wrote:
> When reviewing the PEP 3151 implementation (*), Ezio commented that
> "FileSystemError" looks a bit strange and that "FilesystemError" would
> be a better spelling. What is your opinion?

FilesystemError.

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From rosslagerwall at gmail.com  Tue Aug 23 20:39:36 2011
From: rosslagerwall at gmail.com (Ross Lagerwall)
Date: Tue, 23 Aug 2011 20:39:36 +0200
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <20110823202004.0bb63490@pitrou.net>
References: <20110823202004.0bb63490@pitrou.net>
Message-ID: <1314124776.1538.2.camel@hobo>

> When reviewing the PEP 3151 implementation (*), Ezio commented that
> "FileSystemError" looks a bit strange and that "FilesystemError" would
> be a better spelling. What is your opinion?

I don't think it really matters since both "file system" and
"filesystem" appear to be in common usage.

I would say +1 to "FileSystemError" -- i.e. take file system as two
words.

Cheers
Ross


From cf.natali at gmail.com  Tue Aug 23 20:43:25 2011
From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Tue, 23 Aug 2011 20:43:25 +0200
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
Message-ID: <CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>

2011/8/23, Nir Aides <nir at winpdb.org>:
> Hi all,

Hello Nir,

> Please consider this invitation to stick your head into an interesting
> problem:
> http://bugs.python.org/issue6721

Just for the record, I'm now in favor of the atfork mechanism. It
won't solve the problem for I/O locks, but it'll at least make room
for a clean and cross-library way to set up atfork handlers. I just
skimmed over it, but it seemed Gregory's atfork module could be a good
starting point.
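
For illustration, a minimal POSIX-only sketch of what such a
cross-library registration API could look like (the names atfork() and
fork_with_handlers() are invented here, not taken from Gregory's
module):

    import os

    _handlers = []  # (prepare, parent, child) triples

    def atfork(prepare=None, parent=None, child=None):
        # pthread_atfork()-style registration, usable by any library
        _handlers.append((prepare, parent, child))

    def fork_with_handlers():
        for prepare, _, _ in reversed(_handlers):
            if prepare:
                prepare()           # e.g. acquire locks before forking
        pid = os.fork()
        for _, parent, child in _handlers:
            handler = child if pid == 0 else parent
            if handler:
                handler()           # e.g. re-create or release locks
        return pid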

cf

From nadeem.vawda at gmail.com  Tue Aug 23 20:43:45 2011
From: nadeem.vawda at gmail.com (Nadeem Vawda)
Date: Tue, 23 Aug 2011 20:43:45 +0200
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <1314124776.1538.2.camel@hobo>
References: <20110823202004.0bb63490@pitrou.net> <1314124776.1538.2.camel@hobo>
Message-ID: <CANF4RMnn1Jrvd5Z877JTH4mxyV0K5RfCBRsTDdWDGB+Ps1h0VQ@mail.gmail.com>

On Tue, Aug 23, 2011 at 8:39 PM, Ross Lagerwall <rosslagerwall at gmail.com> wrote:
>> When reviewing the PEP 3151 implementation (*), Ezio commented that
>> "FileSystemError" looks a bit strange and that "FilesystemError" would
>> be a better spelling. What is your opinion?

I think "FilesystemError" looks nicer, but it's not something I'd lose
sleep over either way.

Cheers,
Nadeem

From brian.curtin at gmail.com  Tue Aug 23 20:46:09 2011
From: brian.curtin at gmail.com (Brian Curtin)
Date: Tue, 23 Aug 2011 13:46:09 -0500
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <20110823202004.0bb63490@pitrou.net>
References: <20110823202004.0bb63490@pitrou.net>
Message-ID: <CAD+XWwrkdjD9T0Pa62YAfJTTmCkqiPOE0XLqkjGiW6jg_hAChQ@mail.gmail.com>

On Tue, Aug 23, 2011 at 13:20, Antoine Pitrou <solipsis at pitrou.net> wrote:

>
> Hello,
>
> When reviewing the PEP 3151 implementation (*), Ezio commented that
> "FileSystemError" looks a bit strange and that "FilesystemError" would
> be a better spelling. What is your opinion?
>
> (*) http://bugs.python.org/issue12555
>
> Thank you
>
> Antoine.


I don't care all that much but I'm reminded of the .NET FileSystemWatcher
class, so put me down for +0.5 on FileSystemError.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110823/24432eea/attachment.html>

From barry at python.org  Tue Aug 23 20:46:40 2011
From: barry at python.org (Barry Warsaw)
Date: Tue, 23 Aug 2011 14:46:40 -0400
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <1314124776.1538.2.camel@hobo>
References: <20110823202004.0bb63490@pitrou.net> <1314124776.1538.2.camel@hobo>
Message-ID: <20110823144640.3aad9853@resist.wooz.org>

On Aug 23, 2011, at 08:39 PM, Ross Lagerwall wrote:

>> When reviewing the PEP 3151 implementation (*), Ezio commented that
>> "FileSystemError" looks a bit strange and that "FilesystemError" would
>> be a better spelling. What is your opinion?
>
>I don't think it really matters since both "file system" and
>"filesystem" appear to be in common usage.
>
>I would say +1 to "FileSystemError" -- i.e. take file system as two
>words.

My online dictionaries prefer "file system" to be two words, so for me,
FileSystemError is preferred.

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110823/d4806dc7/attachment.pgp>

From solipsis at pitrou.net  Tue Aug 23 20:51:47 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Aug 2011 20:51:47 +0200
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
Message-ID: <20110823205147.3349eaa8@pitrou.net>

On Tue, 23 Aug 2011 20:43:25 +0200
Charles-François Natali <cf.natali at gmail.com> wrote:
> > Please consider this invitation to stick your head into an interesting
> > problem:
> > http://bugs.python.org/issue6721
> 
> Just for the record, I'm now in favor of the atfork mechanism. It
> won't solve the problem for I/O locks, but it'll at least make room
> for a clean and cross-library way to setup atfork handlers. I just
> skimmed over it, but it seemed Gregory's atfork module could be a good
> starting point.

Well, I would consider the I/O locks the most glaring problem. Right
now, your program can freeze if you happen to do a fork() while e.g.
the stderr lock is taken by another thread (which is quite common when
debugging).
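
For illustration, a minimal POSIX-only sketch of that freeze, with an
ordinary threading.Lock standing in for the interpreter's internal
stderr lock:

    import os, threading, time

    lock = threading.Lock()

    def holder():
        with lock:              # another thread takes the lock...
            time.sleep(2)

    threading.Thread(target=holder).start()
    time.sleep(0.1)             # ...and still holds it when we fork
    pid = os.fork()
    if pid == 0:
        # the child inherits the held lock but not the holding thread,
        # so a plain acquire() would block forever; a timeout shows it
        print("child got lock:", lock.acquire(timeout=1))   # False
        os._exit(0)
    os.waitpid(pid, 0)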

Regards

Antoine.



From ethan at stoneleaf.us  Tue Aug 23 21:11:54 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 23 Aug 2011 12:11:54 -0700
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <20110823202004.0bb63490@pitrou.net>
References: <20110823202004.0bb63490@pitrou.net>
Message-ID: <4E53FB7A.5030506@stoneleaf.us>

Antoine Pitrou wrote:
> Hello,
> 
> When reviewing the PEP 3151 implementation (*), Ezio commented that
> "FileSystemError" looks a bit strange and that "FilesystemError" would
> be a better spelling. What is your opinion?

FileSystemError

From stefan-usenet at bytereef.org  Tue Aug 23 21:06:05 2011
From: stefan-usenet at bytereef.org (Stefan Krah)
Date: Tue, 23 Aug 2011 21:06:05 +0200
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <20110823144640.3aad9853@resist.wooz.org>
References: <20110823202004.0bb63490@pitrou.net> <1314124776.1538.2.camel@hobo>
	<20110823144640.3aad9853@resist.wooz.org>
Message-ID: <20110823190605.GA16790@sleipnir.bytereef.org>

Barry Warsaw <barry at python.org> wrote:
> My online dictionaries prefer "file system" to be two words, so for me,
> FileSystemError is preferred.

+1


Stefan Krah




From steve at pearwood.info  Tue Aug 23 21:19:23 2011
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 24 Aug 2011 05:19:23 +1000
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <20110823202004.0bb63490@pitrou.net>
References: <20110823202004.0bb63490@pitrou.net>
Message-ID: <4E53FD3B.7000705@pearwood.info>

Antoine Pitrou wrote:
> Hello,
> 
> When reviewing the PEP 3151 implementation (*), Ezio commented that
> "FileSystemError" looks a bit strange and that "FilesystemError" would
> be a better spelling. What is your opinion?

It's a file system (two words), not filesystem (not in any dictionary or 
spell checker I've ever used).

(Nor do we write filingsystem, governmentsystem, politicalsystem or 
schoolsystem. This is English, not German.)



-- 
Steven


From neologix at free.fr  Tue Aug 23 22:07:22 2011
From: neologix at free.fr (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Tue, 23 Aug 2011 22:07:22 +0200
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <20110823205147.3349eaa8@pitrou.net>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
Message-ID: <CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>

2011/8/23 Antoine Pitrou <solipsis at pitrou.net>:
> Well, I would consider the I/O locks the most glaring problem. Right
> now, your program can freeze if you happen to do a fork() while e.g.
> the stderr lock is taken by another thread (which is quite common when
> debugging).

Indeed.
To solve this, a similar mechanism could be used: after fork(), in the
child process:
- just reset each I/O lock (destroy/re-create the lock) if we can
guarantee that the file object is in a consistent state (i.e. that all
the invariants hold). That's the approach I used in my initial patch.
- call a fileobject method which resets the I/O lock and sets the file
object to a consistent state (in other words, an atfork handler)
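
A rough sketch of the second option, with a hypothetical ForkSafeFile
class standing in for the real file object:

    import threading

    class ForkSafeFile:
        def __init__(self, raw):
            self.raw = raw
            self._lock = threading.Lock()

        def _after_fork(self):
            # child-side atfork handler: drop the inherited (and
            # possibly held) lock and restore the object's invariants
            self._lock = threading.Lock()
            # ...reset or discard internal buffers here as needed...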

From vinay_sajip at yahoo.co.uk  Tue Aug 23 22:11:21 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Tue, 23 Aug 2011 20:11:21 +0000 (UTC)
Subject: [Python-Dev] FileSystemError or FilesystemError?
References: <20110823202004.0bb63490@pitrou.net>
Message-ID: <loom.20110823T220956-380@post.gmane.org>

Antoine Pitrou <solipsis <at> pitrou.net> writes:

> When reviewing the PEP 3151 implementation (*), Ezio commented that
> "FileSystemError" looks a bit strange and that "FilesystemError" would
> be a better spelling. What is your opinion?

+1 for FileSystemError as I, like others, don't regard "filesystem" as a proper
word.

Regards,

Vinay Sajip


From _ at lvh.cc  Tue Aug 23 22:17:34 2011
From: _ at lvh.cc (Laurens Van Houtven)
Date: Tue, 23 Aug 2011 22:17:34 +0200
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <20110823144640.3aad9853@resist.wooz.org>
References: <20110823202004.0bb63490@pitrou.net> <1314124776.1538.2.camel@hobo>
	<20110823144640.3aad9853@resist.wooz.org>
Message-ID: <CAE_Hg6YfEA09rxXrud=va-dWh7acKKtUPyoC4ZKUmmd-zffOFg@mail.gmail.com>

On Tue, Aug 23, 2011 at 8:46 PM, Barry Warsaw <barry at python.org> wrote:

> On Aug 23, 2011, at 08:39 PM, Ross Lagerwall wrote:
>
> >> When reviewing the PEP 3151 implementation (*), Ezio commented that
> >> "FileSystemError" looks a bit strange and that "FilesystemError" would
> >> be a better spelling. What is your opinion?
> >
> >I don't think it really matters since both "file system" and
> >"filesystem" appear to be in common usage.
> >
> >I would say +1 to "FileSystemError" -- i.e. take file system as two
> >words.
>
> My online dictionaries prefer "file system" to be two words, so for me,
> FileSystemError is preferred.
>
> -Barry
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/_%40lvh.cc
>
>
+1

-- 
cheers
lvh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110823/cbd2b74f/attachment.html>

From solipsis at pitrou.net  Tue Aug 23 22:29:22 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Aug 2011 22:29:22 +0200
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
Message-ID: <1314131362.3485.36.camel@localhost.localdomain>

On Tuesday 23 August 2011 at 22:07 +0200, Charles-François Natali wrote:
> 2011/8/23 Antoine Pitrou <solipsis at pitrou.net>:
> > Well, I would consider the I/O locks the most glaring problem. Right
> > now, your program can freeze if you happen to do a fork() while e.g.
> > the stderr lock is taken by another thread (which is quite common when
> > debugging).
> 
> Indeed.
> To solve this, a similar mechanism could be used: after fork(), in the
> child process:
> - just reset each I/O lock (destroy/re-create the lock) if we can
> guarantee that the file object is in a consistent state (i.e. that all
> the invariants hold). That's the approach I used in my initial patch.

For I/O locks I think that would work.
There could also be a process-wide "fork lock" to serialize locks and
other operations, if we want 100% guaranteed consistency of I/O objects
across forks.

> - call a fileobject method which resets the I/O lock and sets the file
> object to a consistent state (in other words, an atfork handler)

I fear that the complication with atfork handlers is that you have to
manage their lifecycle as well (i.e., when an IO object is destroyed,
you have to unregister the handler).

Regards

Antoine.



From barry at python.org  Tue Aug 23 23:03:57 2011
From: barry at python.org (Barry Warsaw)
Date: Tue, 23 Aug 2011 17:03:57 -0400
Subject: [Python-Dev] PEP 3151 from the BDFOP
Message-ID: <20110823170357.3b3ab2fc@resist.wooz.org>

I am sending this review as the BDFOP for PEP 3151.  I've read the PEP and
reviewed the python-dev discussion via Gmane.  I have not reviewed the hg
branch where Antoine has implemented it.

I'm not quite ready to pronounce, but I do have some questions and comments.
First off, thanks to Antoine for taking this issue on, and for his well
written and well reasoned PEP.  There's definitely a problem here and I think
Python will be better off for having addressed it.  I, for one, will be very
happy when I can eliminate the majority of `import errno`s from my code. ;)

One guiding principle for me is that we should keep the abstraction as thin as
possible.  In particular, I'm concerned about mapping multiple errnos into a
single Error.  For example, both EPIPE and ESHUTDOWN map to BrokenPipeError,
and both EACCES and EPERM to PermissionError.  I think we should resist this, so that
one errno maps to exactly one Error.  Where grouping is desired, Python
already has mechanisms to deal with that, e.g. superclasses and multiple
inheritance.  Therefore, I think it would be better to have

+ FileSystemPermissionError
  + AccessError (EACCES)
  + PermissionError (EPERM)
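
Spelled out as code, a minimal sketch of that one-errno-per-class
grouping could look like this (class names as proposed above; whether
the base should be IOError or OSError is discussed further below):

    class FileSystemPermissionError(IOError):
        """Common base for permission-related errors."""

    class AccessError(FileSystemPermissionError):
        """One errno, one class: EACCES."""

    class PermissionError(FileSystemPermissionError):
        """One errno, one class: EPERM."""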

Yes, it makes the hierarchy deeper, and means you have to come up with a few
more names, but I think it will also make it easier for the programmer to use
and debug.  Also, some of the artificial hierarchy introduced in the PEP may
not be necessary (e.g. the faux superclass FileSystemPermissionError above).
This might lead to the elimination of FileSystemError as some have suggested
(I too question its utility).

Similarly, I think it would be helpful to have the errno name (e.g. ENOENT) in
the error message string.  That way, it won't get in the way for most code,
but would be usefully printed out for uncaught exceptions.

A second guiding principle should be that careful code that works in Python
3.2 must continue to work in Python 3.3 once PEP 3151 is accepted, but also
for Python 2 code ported straight to Python 3.3.  Given the PEP's emphasis on
"useful compatibility", I think this will be the case.  Do be prepared for
complaints about compatibility for careless code though - there's a ton of
that out in the wild, and people will always complain with their "working"
code breaks due to an upgrade.  Be *very* explicit about this in the release
notes and NEWS file, and put your asbestos underoos on.  On the plus side,
there's not so much Python 3 code to break :).  Also, do clearly explain any
required migration strategy for existing code, probably in this PEP.

Have you considered the impact of this PEP on other Python implementations?
My hazy memory of Jython tells me that errnos don't really leak into Java and
thus Jython much, but what about PyPy and IronPython?  E.g. step 1's
deprecation strategy seems pretty CPython-centric.

As for step 1 (coalescing the errors).  This makes sense and I'm generally
agreeable, but I'm wondering whether it's best to re-use IOError for this
rather than introduce a new exception.  Not that I can think of a good name
for that.  I'm just not totally convinced that existing code when upgrading to
Python 3.3 won't introduce silent failures.  If an existing error is to be
re-used for this, I'm torn on whether IOError or OSError is a better choice.
Popularity aside, OSError *feels* more right.

What is the impact of the PEP on tools such as 2to3 and 3to2?

Just to be clear, am I right that (on POSIX systems at least) IOError and its
subclasses will always have an errno attribute still?  And that anything
raising an exception (e.g. via PyErr_SetFromErrno) other than the new ones
will raise IOError?

I also think that rather than transforming the exception when raised from Python,
i.e. via __new__ hackery, perhaps it should be a ValueError in its own right
to raise IOError with an errno represented by one of the subclasses.  Chained
exceptions would mean that the original exception needn't get lost.

I surveyed some of my own code and observed (as others have) that EISDIR and
ENOTDIR are pretty rare.  I found more examples of ECHILD and ESRCH than the
former two.  How'd you like to add those two to make your BDFOP happy? :)

What follows are some crazier ideas.  I'm just throwing them out there, not
necessarily suggesting they should go into the PEP.

The new syntax (e.g. if clause on except) is certainly appealing at first
glance, and might be of more general use for Python, but I agree with the
problems as stated in the PEP.  However, there might be a few things that
*can* be done to make even the uncommon cases easier. E.g.

What if all the errno symbolic names were mapped as attributes on IOError?
The only advantage of that would be to eliminate the need to import errno, or
for the ugly `e.errno == errno.ENOENT` stuff.  That would then be rewritten as
`e.errno == IOError.ENOENT`.  A mild savings to be sure, but still.

How dumb/useless/unworkable would it be to add an __future__ to switch from
the old hierarchy to the new one?  Probably pretty. ;)

What about an api that applications/libraries could use to add additional
exceptions based on other errnos they cared about?  This could be consulted in
PyErr_SetFromErrno() and raised instead of IOError.  Okay, yeah, that's
probably pretty dumb too.

Anyway, that's all I have.  I certainly feel like this PEP is pretty close to
being accepted.  Good work!

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110823/583f70bf/attachment.pgp>

From victor.stinner at haypocalc.com  Wed Aug 24 00:27:37 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 24 Aug 2011 00:27:37 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <20110823001440.433a0f1f@pitrou.net>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net>
Message-ID: <201108240027.37788.victor.stinner@haypocalc.com>

On Tuesday 23 August 2011 at 00:14:40, Antoine Pitrou wrote:
> Hello,
> 
> On Mon, 22 Aug 2011 14:58:51 -0400
> 
> Torsten Becker <torsten.becker at gmail.com> wrote:
> > I have implemented an initial version of PEP 393 -- "Flexible String
> > Representation" as part of my Google Summer of Code project.  My patch
> > is hosted as a repository on bitbucket [1] and I created a related
> > issue on the bug tracker [2].  I posted documentation for the current
> > state of the development in the wiki [3].
> 
> A couple of minor comments:
> 
> - "The UTF-8 decoding fast path for ASCII only characters was removed
>   and replaced with a memcpy if the entire string is ASCII."
>   The fast path would still be useful for mostly-ASCII strings, which
>   are extremely common (unless UTF-8 has become a no-op?).

I posted a patch to re-add it:
http://bugs.python.org/issue12819#msg142867

Victor


From victor.stinner at haypocalc.com  Wed Aug 24 00:38:00 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 24 Aug 2011 00:38:00 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <20110823001440.433a0f1f@pitrou.net>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net>
Message-ID: <201108240038.00801.victor.stinner@haypocalc.com>

On Tuesday 23 August 2011 at 00:14:40, Antoine Pitrou wrote:
> - You could try to run stringbench, which can be found at
>   http://svn.python.org/projects/sandbox/trunk/stringbench (*)
>   and there's iobench (the text mode benchmarks) in the Tools/iobench
>   directory.

Some raw numbers.

stringbench:
"147.07 203.07 72.4 TOTAL" for the PEP 393
"146.81 140.39 104.6 TOTAL" for default
=> PEP is 45% slower

run test_unicode 50 times:
0m19.487s for PEP
0m17.187s for default
=> PEP is 13% slower

time ./python -m test -j4 ("real" time):
3m16.886s (334 tests) for the PEP
3m21.984s (335 tests) for default
... default has 1 more test!

Only 13% slower on test_unicode is *good*. There is still a lot of code using 
the legacy API in unicode.c, so it can get much better.

stringbench only shows the overhead of the conversion from compact unicode to 
Py_UNICODE* (wchar_t*). stringlib does still use the legacy API.

Victor


From victor.stinner at haypocalc.com  Wed Aug 24 00:46:16 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 24 Aug 2011 00:46:16 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
Message-ID: <201108240046.16058.victor.stinner@haypocalc.com>

On Monday 22 August 2011 at 20:58:51, Torsten Becker wrote:
> [1]: http://www.python.org/dev/peps/pep-0393

state:
lowest 2 bits (mask 0x03) - interned-state (SSTATE_*) as in 3.2
next 2 bits (mask 0x0C) - form of str:
00 => reserved
01 => 1 byte (Latin-1)
10 => 2 byte (UCS-2)
11 => 4 byte (UCS-4);
next bit (mask 0x10): 1 if str memory follows PyUnicodeObject
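
Read as a quick Python sketch (masks as above; the helper name is
invented here):

    SSTATE_MASK  = 0x03  # interned-state, as in 3.2
    KIND_MASK    = 0x0C  # 01 Latin-1, 10 UCS-2, 11 UCS-4, 00 reserved
    COMPACT_MASK = 0x10  # str memory follows PyUnicodeObject

    def decode_state(state):
        interned = state & SSTATE_MASK
        kind = (state & KIND_MASK) >> 2
        compact = bool(state & COMPACT_MASK)
        return interned, kind, compact

    # Latin-1, not interned, data inline after the object:
    assert decode_state(0x04 | 0x10) == (0, 1, True)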

kind=0 is used and public; it's PyUnicode_WCHAR_KIND. Is it still necessary?
It looks to be only used in PyUnicode_DecodeUnicodeEscape().

Victor


From victor.stinner at haypocalc.com  Wed Aug 24 00:56:48 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 24 Aug 2011 00:56:48 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <201108240046.16058.victor.stinner@haypocalc.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<201108240046.16058.victor.stinner@haypocalc.com>
Message-ID: <201108240056.48170.victor.stinner@haypocalc.com>

On Wednesday 24 August 2011 at 00:46:16, Victor Stinner wrote:
> On Monday 22 August 2011 at 20:58:51, Torsten Becker wrote:
> > [1]: http://www.python.org/dev/peps/pep-0393
> 
> state:
> lowest 2 bits (mask 0x03) - interned-state (SSTATE_*) as in 3.2
> next 2 bits (mask 0x0C) - form of str:
> 00 => reserved
> 01 => 1 byte (Latin-1)
> 10 => 2 byte (UCS-2)
> 11 => 4 byte (UCS-4);
> next bit (mask 0x10): 1 if str memory follows PyUnicodeObject
> 
> kind=0 is used and public, it's PyUnicode_WCHAR_KIND. Is it still
> necessary? It looks to be only used in PyUnicode_DecodeUnicodeEscape().

If it can be removed, it would be nice to have kind in [0; 2] instead of kind 
in [1; 3], to be able to have a list (of 3 items) => callback or label. I 
suppose that compilers prefer a switch with all cases defined, with 0 as the 
first item and contiguous values. We may need an enum.

Victor


From tjreedy at udel.edu  Wed Aug 24 01:04:10 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 23 Aug 2011 19:04:10 -0400
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <CAD+XWwrkdjD9T0Pa62YAfJTTmCkqiPOE0XLqkjGiW6jg_hAChQ@mail.gmail.com>
References: <20110823202004.0bb63490@pitrou.net>
	<CAD+XWwrkdjD9T0Pa62YAfJTTmCkqiPOE0XLqkjGiW6jg_hAChQ@mail.gmail.com>
Message-ID: <j31bm9$coq$1@dough.gmane.org>

On 8/23/2011 2:46 PM, Brian Curtin wrote:

> I don't care all that much but I'm reminded of the .NET
> FileSystemWatcher class, so put me down for +0.5 on FileSystemError.

For other reasons, I am at least +0.5 for FileSystemError also.


-- 
Terry Jan Reedy


From solipsis at pitrou.net  Wed Aug 24 01:57:56 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 24 Aug 2011 01:57:56 +0200
Subject: [Python-Dev] PEP 3151 from the BDFOP
References: <20110823170357.3b3ab2fc@resist.wooz.org>
Message-ID: <20110824015756.51cdceac@pitrou.net>


Hi,

> One guiding principle for me is that we should keep the abstraction as thin as
> possible.  In particular, I'm concerned about mapping multiple errnos into a
> single Error.  For example, both EPIPE and ESHUTDOWN map to BrokenPipeError,
> and both EACCES and EPERM to PermissionError.  I think we should resist this, so that
> one errno maps to exactly one Error.  Where grouping is desired, Python
> already has mechanisms to deal with that, e.g. superclasses and multiple
> inheritance.  Therefore, I think it would be better to have
> 
> + FileSystemPermissionError
>   + AccessError (EACCES)
>   + PermissionError (EPERM)

I'm not sure that's a good idea:

- EPERM is not only about filesystem permissions, see for example
  http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_timedwait.html

- EACCES and EPERM as a low-level distinction makes sense, but from the
  Python programmer's high-level point of view, the AccessError /
  PermissionError distinction does not seem to convey any useful
  meaning.
  (or perhaps that's just my bad understanding of English)

- the "errno" attribute is still there (and still displayed - see below)
  for people who know their system calls and want to inspect the
  original error code

> Also, some of the artificial hierarchy introduced in the PEP may
> not be necessary (e.g. the faux superclass FileSystemPermissionError above).
> This might lead to the elimination of FileSystemError as some have suggested
> (I too question its utility).

Yes, FileSystemError might be removed. I thought that it would be
useful, in some library routines, to catch all filesystem-related
errors indistinctly, but it's not a complete catchall actually (for
example, AccessError is outside of the FileSystemError subtree).

> Similarly, I think it would be helpful to have the errno name (e.g. ENOENT) in
> the error message string.  That way, it won't get in the way for most code,
> but would be usefully printed out for uncaught exceptions.

Agreed, but I think that's a feature request quite orthogonal to the
PEP. The errno *number* is still printed as it was before:

>>> open("foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'foo'

(see e.g. http://bugs.python.org/issue12762)

> A second guiding principle should be that careful code that works in Python
> 3.2 must continue to work in Python 3.3 once PEP 3151 is accepted, but also
> for Python 2 code ported straight to Python 3.3.

I don't think porting straight to 3.3 would make a difference, especially now
that the idea of deprecating old exception names has been abandoned.

> Do be prepared for
> complaints about compatibility for careless code though - there's a ton of
> that out in the wild, and people will always complain with their "working"
> code breaks due to an upgrade.  Be *very* explicit about this in the release
> notes and NEWS file, and put your asbestos underoos on.

I'll take care about that :)

> Have you considered the impact of this PEP on other Python implementations?
> My hazy memory of Jython tells me that errnos don't really leak into Java and
> thus Jython much, but what about PyPy and IronPython?  E.g. step 1's
> deprecation strategy seems pretty CPython-centric.

Alternative implementations already have to implement errno codes in
one way or another if they want to have a chance of running existing
code. So I don't think the PEP makes much of a difference for them.
But their implementors can give their opinion on this.

> As for step 1 (coalescing the errors).  This makes sense and I'm generally
> agreeable, but I'm wondering whether it's best to re-use IOError for this
> rather than introduce a new exception.  Not that I can think of a good name
> for that.  I'm just not totally convinced that existing code when upgrading to
> Python 3.3 won't introduce silent failures.  If an existing error is to be
> re-used for this, I'm torn on whether IOError or OSError is a better choice.
> Popularity aside, OSError *feels* more right.

I don't have any personal preference. Previous discussions seemed to
indicate people preferred IOError. But changing the implementation to
OSError would be simple. I agree OSError feels slightly more right, as
in more generic.

> What is the impact of the PEP on tools such as 2to3 and 3to2?

I'd say none for 2to3. For 3to2 I'm not sure. Obviously if you write
code taking advantage of new features, it will be difficult to
back-port to 2.x. But that's not specific to PEP 3151. Python 3.2
already has a lot of such stuff:
http://docs.python.org/py3k/whatsnew/3.2.html

> Just to be clear, am I right that (on POSIX systems at least) IOError and its
> subclasses will always have an errno attribute still?

Yes!

> And that anything
> raising an exception (e.g. via PyErr_SetFromErrno) other than the new ones
> will raise IOError?

I'm not sure I understand the question precisely. The errno mapping
mechanism is implemented in IOError.__new__, but it gets called only if
the class is exactly IOError, not a subclass:

>>> IOError(errno.EPERM, "foo")
PermissionError(1, 'foo')
>>> class MyIOError(IOError): pass
... 
>>> MyIOError(errno.EPERM, "foo")
MyIOError(1, 'foo')

Using IOError.__new__ is the easiest way to ensure that all code
raising IO errors takes advantage of the errno mapping. Otherwise you
may get APIs raising the proper subclasses, and other APIs always
raising base IOError (it doesn't happen often, but some Python
library code raises an IOError with an explicit errno).
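
A rough pure-Python sketch of that behaviour (names invented here; the
real mapping is implemented in C and covers many more errnos):

    import errno

    class MappedIOError(IOError):
        _map = {}  # errno -> subclass

        def __new__(cls, *args):
            # remap only when the base class itself is instantiated,
            # never an explicit subclass
            if cls is MappedIOError and args and args[0] in cls._map:
                cls = cls._map[args[0]]
            return IOError.__new__(cls, *args)

    class MappedPermissionError(MappedIOError):
        pass

    MappedIOError._map[errno.EPERM] = MappedPermissionError

    print(repr(MappedIOError(errno.EPERM, "foo")))  # MappedPermissionError(1, 'foo')
    print(repr(MappedPermissionError(2, "bar")))    # subclass left untouched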

> I also think that rather than transforming exception when raised from Python,
> i.e. via __new__ hackery, perhaps it should be a ValueError in its own right
> to raise IOError with an error represented by one of the subclasses.

That would make it harder to keep compatibility while adding new
subclasses in future Python versions. Imagine a lot of people lobby for
a dedicated EBADF subclass and obtain it, then IOError(EBADF, "some
message") would suddenly raise a ValueError. Or do I misunderstand your
proposal?

> I surveyed some of my own code and observed (as others have) that EISDIR and
> ENOTDIR are pretty rare.

Yes, I think they are common in shutil-like code.

> I found more examples of ECHILD and ESRCH than the
> former two.  How'd you like to add those two to make your BDFOP happy? :)

Wow, I didn't know ESRCH.
How would you call the respective exceptions?
- ChildProcessError for ECHILD?
- ProcessLookupError for ESRCH?

> What if all the errno symbolic names were mapped as attributes on IOError?
> The only advantage of that would be to eliminate the need to import errno, or
> for the ugly `e.errno == errno.ENOENT` stuff.  That would then be rewritten as
> `e.errno == IOError.ENOENT`.  A mild savings to be sure, but still.

Hmm, I guess that's explorable as an orthogonal idea.
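
For what it's worth, a toy version of the idea (a subclass stands in,
since attributes cannot be set on the built-in IOError type itself):

    import errno

    class ErrnoIOError(IOError):
        pass

    # expose every symbolic errno name as a class attribute
    for _name in errno.errorcode.values():
        setattr(ErrnoIOError, _name, getattr(errno, _name))

    e = ErrnoIOError(errno.ENOENT, "no such file")
    # the catching code no longer needs to import errno:
    print(e.errno == ErrnoIOError.ENOENT)   # True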

> How dumb/useless/unworkable would it be to add an __future__ to switch from
> the old hierarchy to the new one?  Probably pretty. ;)

Well, the hierarchy is built-in, since it's about standard exceptions.
Also, you usually get the exception from some library API, so a
__future__ in your own module would not achieve much.

> What about an api that applications/libraries could use to add additional
> exceptions based on other errnos they cared about?  This could be consulted in
> PyErr_SetFromErrno() and raised instead of IOError.  Okay, yeah, that's
> probably pretty dumb too.

The problem is that behaviour becomes inconsistent across libraries.
I'm not sure that's very helpful to the user.

Regards

Antoine.



From tjreedy at udel.edu  Wed Aug 24 02:46:00 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 23 Aug 2011 20:46:00 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E537EEC.1070602@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
Message-ID: <j31hl7$dp5$1@dough.gmane.org>

On 8/23/2011 6:20 AM, "Martin v. Löwis" wrote:
> On 23.08.2011 11:46, Xavier Morel wrote:
>> Mostly ASCII is pretty common for western-European languages (French, for
>> instance, is probably 90 to 95% ASCII). It's also a risk in English, when
>> the writer "correctly" spells foreign words (résumé and the like).
>
> I know - I still question whether it is "extremely common" (so much as
> to justify a special case). I.e. on what application with what dataset
> would you gain what speedup, at the expense of what amount of extra
> lines, and potential slow-down for other datasets?
[snip]
> In the PEP 393 approach, if the string has a two-byte representation,
> each character needs to be widened to two bytes, and likewise for four
> bytes. So three separate copies of the unrolled loop would be needed,
> one for each target size.

I fully support the declared purpose of the PEP, which I understand to 
be to have a full, correct Unicode implementation on all new Python 
releases without paying unnecessary space (and consequent time) 
penalties. I think the erroneous length, iteration, indexing, and 
slicing for strings with non-BMP chars in narrow builds need to be 
fixed for future versions. I think we should at least consider 
alternatives to the PEP 393 solution of doubling or quadrupling space 
when even one char needs it.

In utf16.py, attached to http://bugs.python.org/issue12729
I propose for consideration a prototype of a different solution to the 
'mostly BMP chars, few non-BMP chars' case. Rather than expand every 
character from 2 bytes to 4, attach an array cpdex of character (i.e. code 
point, not code unit) indexes. Then for indexing and slicing, the 
correction is simple, simpler than I first expected:
   code-unit-index = char-index + bisect.bisect_left(cpdex, char_index)
where code-unit-index is the adjusted index into the full underlying 
double-byte array. This adds a time penalty of log2(len(cpdex)), but 
avoids most of the space penalty and the consequent time penalty of 
moving more bytes around and increasing cache misses.

I believe the same idea would work for utf8 and the mostly-ascii case. 
The main difference is that non-ascii chars have various byte sizes 
rather than the 1 extra double-byte of non-BMP chars in UCS2 builds. So 
the offset correction would not simply be the bisect-left return but 
would require another lookup
   byte-index = char-index + offsets[bisect-left(cpdex, char-index)]
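
The UTF-16 correction is easy to check with a toy example (illustrative
code, not taken from utf16.py itself):

    import bisect

    def unit_index(char_index, cpdex):
        # cpdex: sorted char indexes of the non-BMP characters; each one
        # before char_index costs one extra code unit (its low surrogate)
        return char_index + bisect.bisect_left(cpdex, char_index)

    # "a" + U+1D400 + "b" encodes to code units [a][high][low][b], so
    # cpdex == [1]
    assert unit_index(0, [1]) == 0   # 'a'
    assert unit_index(1, [1]) == 1   # start of the surrogate pair
    assert unit_index(2, [1]) == 3   # 'b'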

If possible, I would have the with-index-array versions be separate 
subtypes, as in utf16.py. I believe either index-array implementation 
might benefit from a subtype for single multi-unit chars, as a single 
non-ASCII or non-BMP char does not need an auxiliary [0] array and a 
senseless lookup therein but does need its length fixed at 1 instead of 
the number of base array units.

-- 
Terry Jan Reedy



From tjreedy at udel.edu  Wed Aug 24 02:46:06 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 23 Aug 2011 20:46:06 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E53A950.30005@haypocalc.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
Message-ID: <j31hlc$dp5$2@dough.gmane.org>

On 8/23/2011 9:21 AM, Victor Stinner wrote:
> On 23/08/2011 15:06, "Martin v. Löwis" wrote:
>> Well, things have to be done in order:
>> 1. the PEP needs to be approved
>> 2. the performance bottlenecks need to be identified
>> 3. optimizations should be applied.
>
> I would not vote for the PEP if it slows down Python, especially if it's
> much slower. But Torsten says that it speeds up Python, which is
> surprising. I have to do my own benchmarks :-)

The current UCS2 Unicode string implementation, by design, quickly gives 
WRONG answers for len(), iteration, indexing, and slicing if a string 
contains any non-BMP (surrogate pair) Unicode characters. That may have 
been excusable when there essentially were no such extended chars, and 
the few that existed were almost never used. But now there are many more, 
with more being added to each Unicode edition. They include cursive Math 
letters that are used in English documents today. The problem will 
slowly get worse and Python, at least on Windows, will become a language 
to avoid for dependable Unicode document processing. 3.x needs a proper 
Unicode implementation that works for all strings on all builds.

utf16.py, attached to http://bugs.python.org/issue12729
prototypes a different solution than the PEP for the above problems for 
the 'mostly BMP' case. I will discuss it in a different post.

-- 
Terry Jan Reedy



From torsten.becker at gmail.com  Wed Aug 24 04:35:32 2011
From: torsten.becker at gmail.com (Torsten Becker)
Date: Tue, 23 Aug 2011 22:35:32 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <20110823001440.433a0f1f@pitrou.net>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net>
Message-ID: <CAP_a28HDCRSb0vFBQHstXAcDbGCNCEpktNV5Lw+JMBYzD011=A@mail.gmail.com>

On Mon, Aug 22, 2011 at 18:14, Antoine Pitrou <solipsis at pitrou.net> wrote:
> - You could trim the debug results from the benchmark results, this may
>   make them more readable.

Good point, I removed them from the wiki page.

On Tue, Aug 23, 2011 at 18:38, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> On Tuesday 23 August 2011 at 00:14:40, Antoine Pitrou wrote:
>> - You could try to run stringbench, which can be found at
>>   http://svn.python.org/projects/sandbox/trunk/stringbench (*)
>>   and there's iobench (the text mode benchmarks) in the Tools/iobench
>>   directory.
>
> Some raw numbers.
> [...]

Thank you Victor for running stringbench, I did not get to it in time.


Regards,
Torsten

From ncoghlan at gmail.com  Wed Aug 24 04:31:12 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 24 Aug 2011 12:31:12 +1000
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <4E53FD3B.7000705@pearwood.info>
References: <20110823202004.0bb63490@pitrou.net>
	<4E53FD3B.7000705@pearwood.info>
Message-ID: <CADiSq7e=sWZpq3PRB77td4MuJDjUgpkk1DTGhnccAwJ5qVeGvA@mail.gmail.com>

On Wed, Aug 24, 2011 at 5:19 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> Antoine Pitrou wrote:
>>
>> Hello,
>>
>> When reviewing the PEP 3151 implementation (*), Ezio commented that
>> "FileSystemError" looks a bit strange and that "FilesystemError" would
>> be a better spelling. What is your opinion?
>
> It's a file system (two words), not filesystem (not in any dictionary or
> spell checker I've ever used).

I rarely find spell checkers to be useful sources of data on correct
spelling of technical jargon (and the computing usage of the term
'filesystem' definitely qualifies as jargon).

> (Nor do we write filingsystem, governmentsystem, politicalsystem or
> schoolsystem. This is English, not German.)

Personally, I think 'filesystem' is a portmanteau in the process of
coming into existence (as evidenced by usage like 'FHS' standing for
'Filesystem Hierarchy Standard'). However, the two word form is still
useful at times, particularly for disambiguation of acronyms (as
evidenced by usage like 'NFS' and 'GFS' for 'Network File System' and
'Google File System'). The Wikipedia article on the topic mixes and
matches the two forms, but overall does favour the two word form.

Since I tend to use the one word 'filesystem' form myself (ditto for
'filename'), I'm +1 for FilesystemError, but I'm only -0 for
FileSystemError (so I expect that will be the option chosen, given
other responses).

Regards,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From torsten.becker at gmail.com  Wed Aug 24 04:39:49 2011
From: torsten.becker at gmail.com (Torsten Becker)
Date: Tue, 23 Aug 2011 22:39:49 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <1314101745.3485.18.camel@localhost.localdomain>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
Message-ID: <CAP_a28EnYwZm266whtMJuc2nNYsLev+x9wog3d8zCtnAjsMJMw@mail.gmail.com>

On Tue, Aug 23, 2011 at 08:15, Antoine Pitrou <solipsis at pitrou.net> wrote:
> So why would you need three separate implementation of the unrolled
> loop? You already have a macro named WRITE_FLEXIBLE_OR_WSTR.

The WRITE_FLEXIBLE_OR_WSTR macro does a check for kind and then
writes.  Using this macro for the fast path would be inefficient: to
have a real fast path, you would need an outer if to check for kind and
then, in each condition body, the matching access to the string (1, 2,
or 4 bytes), and for each body also write 4 or 8 times (guarded by
#ifdef, depending on platform).

As all these cases bloated up the C code, we went for the simple
solution with the goal of profiling the code again afterwards to see
where the new performance bottlenecks would be.

> Even without taking into account the unrolled loop, I wonder how much
> slower UTF-8 decoding becomes with that approach, by the way. Instead of
> testing the "kind" variable at each loop iteration, using a
> stringlib-like approach may be a better deal IMO.

To me this feels like this would complicate the C source code and
decrease readability.  For each function you would need a wrapper
which does the kind checking logic and then, in a separate file, the
implementation of the function which then gets included three times
for each character width.


Regards,
Torsten

From torsten.becker at gmail.com  Wed Aug 24 04:41:59 2011
From: torsten.becker at gmail.com (Torsten Becker)
Date: Tue, 23 Aug 2011 22:41:59 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <201108240027.37788.victor.stinner@haypocalc.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net>
	<201108240027.37788.victor.stinner@haypocalc.com>
Message-ID: <CAP_a28GrYY_=VBSwg031bEwfivJdkxJ4yLTGTyJnGhWjoA234Q@mail.gmail.com>

On Tue, Aug 23, 2011 at 18:27, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> I posted a patch to re-add it:
> http://bugs.python.org/issue12819#msg142867

Thank you for the patch!  Note that this patch adds the fast path only
to the helper function which determines the length of the string and
the maximum character.  The decoding part is still without a fast path
for ASCII runs.


Regards,
Torsten

From ncoghlan at gmail.com  Wed Aug 24 04:42:58 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 24 Aug 2011 12:42:58 +1000
Subject: [Python-Dev] Planned PEP status changes
Message-ID: <CADiSq7e7nj+EV4remUejKJYZczCmrs27iekQv9y9TEnkwx4=SA@mail.gmail.com>

Unless I hear any objections, I plan to adjust the current PEP
statuses as follows some time this weekend:

Move from Accepted to Finished:

    389  argparse - New Command Line Parsing Module              Bethard
    391  Dictionary-Based Configuration For Logging              Sajip
    3108  Standard Library Reorganization                         Cannon
    3135  New Super
Spealman, Delaney, Ryan

Move from Accepted to Withdrawn (with a reference to Reid Kleckner's blog post)
    3146  Merging Unladen Swallow into CPython
Winter, Yasskin, Kleckner


The PEP 3118 enhanced buffer protocol has some ongoing semantic and
implementation issues still to be worked out, so I plan to leave that
at Accepted. Ditto for PEP 3121 (extension module finalisation), since
that doesn't play nicely with the current 'set everything to None'
approach to breaking cycles during module finalisation.

The other Accepted PEPs are either packaging standards related or
genuinely not implemented yet.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From torsten.becker at gmail.com  Wed Aug 24 04:41:20 2011
From: torsten.becker at gmail.com (Torsten Becker)
Date: Tue, 23 Aug 2011 22:41:20 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <20110823160820.08754ffe@pitrou.net>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j305jf$e7d$1@dough.gmane.org> <4E53A87A.1070306@v.loewis.de>
	<j30buf$pe6$1@dough.gmane.org> <20110823160820.08754ffe@pitrou.net>
Message-ID: <CAP_a28HW07E=A7A7oAoHymUY1XBKwLTRXeJNLaLoQMazEw=eVA@mail.gmail.com>

On Tue, Aug 23, 2011 at 10:08, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Macros are useful to shield the abstraction from the implementation. If
> you access the members directly, and the unicode object is represented
> differently in some future version of Python (say e.g. with tagged
> pointers), your code doesn't compile anymore.

I agree with Antoine: from the experience of porting C code from 3.2
to the PEP 393 unicode API, the additional encapsulation by macros
made it much easier to change the implementation of what is a field,
what is a field's actual name, and what needs to be calculated through
a function.

So, I would like to keep primary access as a macro but I see the point
that it would make the struct clearer to access and I would not mind
changing the struct to use a union.  But then most access currently is
through macros so I am not sure how much benefit the union would bring
as it mostly complicates the struct definition.

Also, common, now simple, checks for "unicode->str == NULL" would look
more ambiguous with a union ("unicode->str.latin1 == NULL").


Regards,
Torsten

From ncoghlan at gmail.com  Wed Aug 24 04:51:29 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 24 Aug 2011 12:51:29 +1000
Subject: [Python-Dev] PEP 3151 from the BDFOP
In-Reply-To: <20110824015756.51cdceac@pitrou.net>
References: <20110823170357.3b3ab2fc@resist.wooz.org>
	<20110824015756.51cdceac@pitrou.net>
Message-ID: <CADiSq7dB_6Vb-=tENjVtGD2RcvkmM6wp2dcg6V6FY5wUWfjf=Q@mail.gmail.com>

On Wed, Aug 24, 2011 at 9:57 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> I don't have any personal preference. Previous discussions seemed to
> indicate people preferred IOError. But changing the implementation to
> OSError would be simple. I agree OSError feels slightly more right, as
> in more generic.

IIRC, the preference for IOError was formed when we were going to
deprecate the 'legacy' names. Now that using the old names won't
trigger any kind of warning, +1 for using OSError as the official name
of the base class with IOError as a legacy alias.

>> And that anything
>> raising an exception (e.g. via PyErr_SetFromErrno) other than the new ones
>> will raise IOError?
>
> I'm not sure I understand the question precisely. The errno mapping
> mechanism is implemented in IOError.__new__, but it gets called only if
> the class is exactly IOError, not a subclass:
>
>>>> IOError(errno.EPERM, "foo")
> PermissionError(1, 'foo')
>>>> class MyIOError(IOError): pass
> ...
>>>> MyIOError(errno.EPERM, "foo")
> MyIOError(1, 'foo')
>
> Using IOError.__new__ is the easiest way to ensure that all code
> raising IO errors takes advantage of the errno mapping. Otherwise you
> may get APIs raising the proper subclasses, and other APIs always
> raising base IOError (it doesn't happen often, but some Python
> library code raises an IOError with an explicit errno).

It's also the natural place to put the errno->exception type mapping
so that existing code will raise the new errors without requiring
modification. We could spell it as a new class method ("from_errno" or
similar), but there isn't any ambiguity in doing it directly in
__new__, so a class method seems pointlessly inconvenient.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From torsten.becker at gmail.com  Wed Aug 24 04:56:57 2011
From: torsten.becker at gmail.com (Torsten Becker)
Date: Tue, 23 Aug 2011 22:56:57 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <201108240056.48170.victor.stinner@haypocalc.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<201108240046.16058.victor.stinner@haypocalc.com>
	<201108240056.48170.victor.stinner@haypocalc.com>
Message-ID: <CAP_a28G2tDKdwiDw6cdG-VDsqEGs+-WttgiXSr0iZ7wjmhPs4g@mail.gmail.com>

On Tue, Aug 23, 2011 at 18:56, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
>> kind=0 is used and public, it's PyUnicode_WCHAR_KIND. Is it still
>> necessary? It looks to be only used in PyUnicode_DecodeUnicodeEscape().
>
> If it can be removed, it would be nice to have kind in [0; 2] instead of kind
> in [1; 2], to be able to have a list (of 3 items) => callback or label.

It is also used in PyUnicode_DecodeUTF8Stateful() and there might be
some cases which I missed converting checks for 0 when I introduced
the macro.  The question was more if this should be written as 0 or as
a named constant.  I preferred the named constant for readability.

An alternative would be to have kind values be the same as the number
of bytes for the string representation so it would be 0 (wstr), 1
(1-byte), 2 (2-byte), or 4 (4-byte).

I think the value for wstr/uninitialized/reserved should not be
removed.  The wstr representation is still used in the error case in
the utf8 decoder because these strings can be resized. Also having one
designated value for "uninitialized" limits comparisons in the
affected functions to the kind value, otherwise they would need to
check the str field for NULL to determine in which buffer to write a
character.

> I suppose that compilers prefer a switch with all cases defined, 0 a first item
> and contiguous values. We may need an enum.

During the Summer of Code, Martin and I did an experiment with GCC and
it did not seem to produce a jump table as an optimization for three
cases but generated comparison instructions anyway.  I am not sure how
much we should optimize for potential compiler optimizations here.


Regards,
Torsten

From scott+python-dev at scottdial.com  Wed Aug 24 06:59:26 2011
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Wed, 24 Aug 2011 00:59:26 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <201108240038.00801.victor.stinner@haypocalc.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net>
	<201108240038.00801.victor.stinner@haypocalc.com>
Message-ID: <4E54852E.9000601@scottdial.com>

On 8/23/2011 6:38 PM, Victor Stinner wrote:
> On Tuesday 23 August 2011 at 00:14:40, Antoine Pitrou wrote:
>> - You could try to run stringbench, which can be found at
>>   http://svn.python.org/projects/sandbox/trunk/stringbench (*)
>>   and there's iobench (the text mode benchmarks) in the Tools/iobench
>>   directory.
> 
> Some raw numbers.
> 
> stringbench:
> "147.07 203.07 72.4 TOTAL" for the PEP 393
> "146.81 140.39 104.6 TOTAL" for default
> => PEP is 45% slower

I ran the same benchmark and couldn't make a distinction in performance
between them:

pep-393.txt
182.17  175.47  103.8   TOTAL
cpython.txt
183.26  177.97  103.0   TOTAL

pep-393-wide-unicode.txt
181.61  198.69  91.4    TOTAL
cpython-wide-unicode.txt
181.27  195.58  92.7    TOTAL

I ran it a couple times and have seen either default or pep-393 being up
to +/- 10 sec slower on the unicode tests. The results of the 8-bit
string tests seem to have less variance on my test machine.

> run test_unicode 50 times:
> 0m19.487s for PEP
> 0m17.187s for default
> => PEP is 13% slower

$ time ./python -m test `python -c 'print "test_unicode " * 50'`

pep-393-wide-unicode.txt
real    0m33.409s
cpython-wide-unicode.txt
real    0m33.489s

Nothing in it for me... except your system is obviously faster, in general.

-- 
Scott Dial
scott at scottdial.com

From ezio.melotti at gmail.com  Wed Aug 24 07:39:24 2011
From: ezio.melotti at gmail.com (Ezio Melotti)
Date: Wed, 24 Aug 2011 08:39:24 +0300
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <CADiSq7e=sWZpq3PRB77td4MuJDjUgpkk1DTGhnccAwJ5qVeGvA@mail.gmail.com>
References: <20110823202004.0bb63490@pitrou.net>
	<4E53FD3B.7000705@pearwood.info>
	<CADiSq7e=sWZpq3PRB77td4MuJDjUgpkk1DTGhnccAwJ5qVeGvA@mail.gmail.com>
Message-ID: <4E548E8C.8040701@gmail.com>

On 24/08/2011 5.31, Nick Coghlan wrote:
> On Wed, Aug 24, 2011 at 5:19 AM, Steven D'Aprano<steve at pearwood.info>  wrote:
>> (Nor do we write filingsystem, governmentsystem, politicalsystem or
>> schoolsystem. This is English, not German.)
> Personally, I think 'filesystem' is a portmanteau in the process of
> coming into existence (as evidenced by usage like 'FHS' standing for
> 'Filesystem Hierarchy Standard'). However, the two word form is still
> useful at times, particularly for disambiguation of acronyms (as
> evidenced by usage like 'NFS' and 'GFS' for 'Network File System' and
> 'Google File System'). The Wikipedia article on the topic mixes and
> matches the two forms, but overall does favour the two word form.
>
> Since I tend to use the one word 'filesystem' form myself (ditto for
> 'filename'), I'm +1 for FilesystemError, but I'm only -0 for
> FileSystemError (so I expect that will be the option chosen, given
> other responses).

This pretty much summarizes my thoughts.  I saw the wiki article using 
both and since I consider 'filesystem' a single word I was wondering if 
anyone else preferred FilesystemError.  I'm totally fine with 
FileSystemError too though, if most people prefer it.

Best Regards,
Ezio Melotti

>
> Regards,
> Nick.
>


From stefan_ml at behnel.de  Wed Aug 24 08:57:54 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 24 Aug 2011 08:57:54 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP_a28HW07E=A7A7oAoHymUY1XBKwLTRXeJNLaLoQMazEw=eVA@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<j305jf$e7d$1@dough.gmane.org>
	<4E53A87A.1070306@v.loewis.de>	<j30buf$pe6$1@dough.gmane.org>
	<20110823160820.08754ffe@pitrou.net>
	<CAP_a28HW07E=A7A7oAoHymUY1XBKwLTRXeJNLaLoQMazEw=eVA@mail.gmail.com>
Message-ID: <j327di$cr1$1@dough.gmane.org>

Torsten Becker, 24.08.2011 04:41:
> Also, common, now simple, checks for "unicode->str == NULL" would look
> more ambiguous with a union ("unicode->str.latin1 == NULL").

You could just add yet another field "any", i.e.

     union {
        unsigned char* latin1;
        Py_UCS2* ucs2;
        Py_UCS4* ucs4;
        void* any;
     } str;

That way, the above test becomes

     if (!unicode->str.any)

or

     if (unicode->str.any == NULL)

Or maybe even call it "initialised" to match the intended purpose:

     if (!unicode->str.initialised)

That being said, I don't mind "unicode->str.latin1 == NULL" either, given 
that it will (as mentioned by others) be hidden behind a macro most of the 
time anyway.

Stefan


From stephen at xemacs.org  Wed Aug 24 09:51:40 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 24 Aug 2011 16:51:40 +0900
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <CADiSq7e=sWZpq3PRB77td4MuJDjUgpkk1DTGhnccAwJ5qVeGvA@mail.gmail.com>
References: <20110823202004.0bb63490@pitrou.net>
	<4E53FD3B.7000705@pearwood.info>
	<CADiSq7e=sWZpq3PRB77td4MuJDjUgpkk1DTGhnccAwJ5qVeGvA@mail.gmail.com>
Message-ID: <87sjorb62b.fsf@uwakimon.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > Since I tend to use the one word 'filesystem' form myself (ditto for
 > 'filename'), I'm +1 for FilesystemError, but I'm only -0 for
 > FileSystemError (so I expect that will be the option chosen, given
 > other responses).

I slightly prefer FilesystemError because it parses unambiguously.
Cf. FileSystemError vs FileUserError.

From v+python at g.nevcal.com  Wed Aug 24 09:56:56 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 24 Aug 2011 00:56:56 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j31hl7$dp5$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de> <j31hl7$dp5$1@dough.gmane.org>
Message-ID: <4E54AEC8.7040702@g.nevcal.com>

On 8/23/2011 5:46 PM, Terry Reedy wrote:
> On 8/23/2011 6:20 AM, "Martin v. Löwis" wrote:
>> On 23.08.2011 11:46, Xavier Morel wrote:
>>> Mostly ASCII is pretty common for western-European languages 
>>> (French, for
>>> instance, is probably 90 to 95% ASCII). It's also a risk in English, 
>>> when
>>> the writer "correctly" spells foreign words (résumé and the like).
>>
>> I know - I still question whether it is "extremely common" (so much as
>> to justify a special case). I.e. on what application with what dataset
>> would you gain what speedup, at the expense of what amount of extra
>> lines, and potential slow-down for other datasets?
> [snip]
>> In the PEP 393 approach, if the string has a two-byte representation,
>> each character needs to widened to two bytes, and likewise for four
>> bytes. So three separate copies of the unrolled loop would be needed,
>> one for each target size.
>
> I fully support the declared purpose of the PEP, which I understand to 
> be to have a full,correct Unicode implementation on all new Python 
> releases without paying unnecessary space (and consequent time) 
> penalties. I think the erroneous length, iteration, indexing, and 
> slicing for strings with non-BMP chars in narrow builds needs to be 
> fixed for future versions. I think we should at least consider 
> alternatives to the PEP393 solution of double or quadrupling space if 
> needed for at least one char.
>
> In utf16.py, attached to http://bugs.python.org/issue12729
> I propose for consideration a prototype of different solution to the 
> 'mostly BMP chars, few non-BMP chars' case. Rather than expand every 
> character from 2 bytes to 4, attach an array cpdex of character (ie 
> code point, not code unit) indexes. Then for indexing and slicing, the 
> correction is simple, simpler than I first expected:
>   code-unit-index = char-index + bisect.bisect_left(cpdex, char_index)
> where code-unit-index is the adjusted index into the full underlying 
> double-byte array. This adds a time penalty of log2(len(cpdex)), but 
> avoids most of the space penalty and the consequent time penalty of 
> moving more bytes around and increasing cache misses.
>
> I believe the same idea would work for utf8 and the mostly-ascii case. 
> The main difference is that non-ascii chars have various byte sizes 
> rather than the 1 extra double-byte of non-BMP chars in UCS2 builds. 
> So the offset correction would not simply be the bisect-left return 
> but would require another lookup
>   byte-index = char-index + offsets[bisect-left(cpdex, char-index)]
>
> If possible, I would have the with-index-array versions be separate 
> subtypes, as in utf16.py. I believe either index-array implementation 
> might benefit from a subtype for single multi-unit chars, as a single 
> non-ASCII or non-BMP char does not need an auxiliary [0] array and a 
> senseless lookup therein but does need its length fixed at 1 instead 
> of the number of base array units.
>
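
For concreteness, here is a minimal Python sketch of the quoted cpdex 
scheme (a hypothetical illustration, not Terry's actual utf16.py; it 
assumes chr() and ord() cover the full code point range):

    import bisect

    class U16String:
        def __init__(self, text):
            self.units = []   # one entry per UTF-16 code unit
            self.cpdex = []   # code point indexes of non-BMP chars
            for i, ch in enumerate(text):
                cp = ord(ch)
                if cp > 0xFFFF:
                    self.cpdex.append(i)
                    cp -= 0x10000
                    self.units.append(0xD800 + (cp >> 10))    # high surrogate
                    self.units.append(0xDC00 + (cp & 0x3FF))  # low surrogate
                else:
                    self.units.append(cp)

        def __len__(self):
            # Code point count, not code unit count.
            return len(self.units) - len(self.cpdex)

        def __getitem__(self, char_index):
            # code-unit-index = char-index + bisect_left(cpdex, char-index)
            u = char_index + bisect.bisect_left(self.cpdex, char_index)
            unit = self.units[u]
            if 0xD800 <= unit < 0xDC00:   # high surrogate: recombine pair
                return chr(0x10000 + ((unit - 0xD800) << 10)
                           + (self.units[u + 1] - 0xDC00))
            return chr(unit)

As described in the quote, indexing pays only a log2(len(cpdex)) penalty 
while BMP characters keep costing two bytes each.
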
So am I correctly reading between the lines when, after reading this 
thread so far, and the complete issue discussion so far, I see a 
PEP 393 revision or replacement that has the following characteristics:

1) Narrow builds are dropped.  The conceptual idea of PEP 393 eliminates 
the need for narrow builds, as the internal string data structures 
adjust to the actuality of the data.  If you want a narrow build, just 
don't use code points > 65535.

2) There are more, or different, internal kinds of strings, which affect 
the processing patterns.  Here is an enumeration of the ones I can think 
of, as complete as possible, with recognition that benchmarking and 
clever algorithms may eliminate the need for some of them.

a) all ASCII
b) latin-1 (8-bit codepoints, the first 256 Unicode codepoints) This 
kind may not be able to support a "mostly" variation, and may be no more 
efficient than case a).  But it might also be popular in parts of Europe 
:)  And appropriate benchmarks may discover whether or not it has worth.
c) mostly ASCII (utf8) with clever indexing/caching to be efficient
d) UTF-8 with clever indexing/caching to be efficient
e) 16-bit codepoints
f) UTF-16 with clever indexing/caching to be efficient
g) 32-bit codepoints
h) UTF-32

When instantiating a str, a new parameter or subtype would restrict the 
implementation to using only a), b), d), f), and h) when fully 
conformant Unicode behavior is desired.  No lone surrogates, no out of 
range code points, no illegal codepoints. A default str would prefer a), 
b), c), e), and g) for efficiency and flexibility.

When manipulations outside of Unicode are necessary [Windows seems to 
use e) for example, suffering from the same sorts of backward 
compatibility problems as Python, in some ways], the default str type 
would permit them, using e) and g) kinds of representations.  Although 
the surrogate escape codec only uses prefix surrogates (or is it only 
suffix ones?) which would never match up, note that a conversion from 
16-bit codepoints to other formats may produce matches between the 
results of the surrogate escape codec, and other unchecked data 
introduced by the user/program.

A method should be provided to validate and promote a string from 
default, unchecked str type to the subtype or variation that enforces 
Unicode, if it qualifies; if it doesn't qualify, an exception would be 
raised by the method.  (This could generally be done in place if the 
value is bound to only a single variable, but would generate a copy and 
rebind the variable to the promoted copy if it is multiply referenced?)

Another parameter or subtype of the conformant str would add grapheme 
support, which has a different set of rules for the clever 
indexing/caching, but could be applied to any of a)*, c)*, d), f), or h).

* It is unnecessary to apply clever indexing/caching to a) and c) kinds 
of string internals, because there is a one-to-one mapping between 
bytes, codepoints, and graphemes in these ranges.  So plain array 
indexing can be used in the implementation of these kinds.

From victor.stinner at haypocalc.com  Wed Aug 24 10:10:17 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 24 Aug 2011 10:10:17 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP_a28HW07E=A7A7oAoHymUY1XBKwLTRXeJNLaLoQMazEw=eVA@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<j305jf$e7d$1@dough.gmane.org>
	<4E53A87A.1070306@v.loewis.de>	<j30buf$pe6$1@dough.gmane.org>
	<20110823160820.08754ffe@pitrou.net>
	<CAP_a28HW07E=A7A7oAoHymUY1XBKwLTRXeJNLaLoQMazEw=eVA@mail.gmail.com>
Message-ID: <4E54B1E9.8080404@haypocalc.com>

Le 24/08/2011 04:41, Torsten Becker a écrit :
> On Tue, Aug 23, 2011 at 10:08, Antoine Pitrou<solipsis at pitrou.net>  wrote:
>> Macros are useful to shield the abstraction from the implementation. If
>> you access the members directly, and the unicode object is represented
>> differently in some future version of Python (say e.g. with tagged
>> pointers), your code doesn't compile anymore.
>
> I agree with Antoine, from the experience of porting C code from 3.2
> to the PEP 393 unicode API, the additional encapsulation by macros
> made it much easier to change the implementation of what is a field,
> what is a field's actual name, and what needs to be calculated through
> a function.
>
> So, I would like to keep primary access as a macro but I see the point
> that it would make the struct clearer to access and I would not mind
> changing the struct to use a union.  But then most access currently is
> through macros so I am not sure how much benefit the union would bring
> as it mostly complicates the struct definition.

A union helps debugging in gdb: you don't have to cast manually to 
unsigned char*/Py_UCS2*/Py_UCS4*.

> Also, common, now simple, checks for "unicode->str == NULL" would look
> more ambiguous with a union ("unicode->str.latin1 == NULL").

We can rename "str" to something else, to "data" for example.

Victor

From victor.stinner at haypocalc.com  Wed Aug 24 10:11:50 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 24 Aug 2011 10:11:50 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E54852E.9000601@scottdial.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net>
	<201108240038.00801.victor.stinner@haypocalc.com>
	<4E54852E.9000601@scottdial.com>
Message-ID: <4E54B246.5020008@haypocalc.com>

Le 24/08/2011 06:59, Scott Dial a écrit :
> On 8/23/2011 6:38 PM, Victor Stinner wrote:
>> Le mardi 23 août 2011 00:14:40, Antoine Pitrou a écrit :
>>> - You could try to run stringbench, which can be found at
>>>    http://svn.python.org/projects/sandbox/trunk/stringbench (*)
>>>    and there's iobench (the text mode benchmarks) in the Tools/iobench
>>>    directory.
>>
>> Some raw numbers.
>>
>> stringbench:
>> "147.07 203.07 72.4 TOTAL" for the PEP 393
>> "146.81 140.39 104.6 TOTAL" for default
>> =>  PEP is 45% slower
>
> I ran the same benchmark and couldn't make a distinction in performance
> between them:

Hum, are you sure that you used the PEP 393? Make sure that you are 
using the pep-393 branch! I also started my benchmark on the wrong 
branch :-)

Victor

From victor.stinner at haypocalc.com  Wed Aug 24 10:17:58 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 24 Aug 2011 10:17:58 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP_a28GrYY_=VBSwg031bEwfivJdkxJ4yLTGTyJnGhWjoA234Q@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net>
	<201108240027.37788.victor.stinner@haypocalc.com>
	<CAP_a28GrYY_=VBSwg031bEwfivJdkxJ4yLTGTyJnGhWjoA234Q@mail.gmail.com>
Message-ID: <4E54B3B6.1020205@haypocalc.com>

Le 24/08/2011 04:41, Torsten Becker a écrit :
> On Tue, Aug 23, 2011 at 18:27, Victor Stinner
> <victor.stinner at haypocalc.com>  wrote:
>> I posted a patch to re-add it:
>> http://bugs.python.org/issue12819#msg142867
>
> Thank you for the patch!  Note that this patch adds the fast path only
> to the helper function which determines the length of the string and
> the maximum character.  The decoding part is still without a fast path
> for ASCII runs.

Ah? If utf8_max_char_size_and_has_errors() returns no error and 
maxchar=127: memcpy() is used. You mean that memcpy() is too slow? :-)

maxchar = utf8_max_char_size_and_has_errors(s, size, &unicode_size,
                                            &has_errors);
if (has_errors) {
    ...
}
else {
    unicode = (PyUnicodeObject *)PyUnicode_New(unicode_size, maxchar);
    if (!unicode) return NULL;
    /* When the string is ASCII only, just use memcpy and return. */
    if (maxchar < 128) {
        assert(unicode_size == size);
        Py_MEMCPY(PyUnicode_1BYTE_DATA(unicode), s, unicode_size);
        return (PyObject *)unicode;
    }
    ...
}

But yes, my patch only optimizes ASCII-only strings, not "mostly-ASCII" 
strings (e.g. 100 ASCII + 1 latin1 character). It can be optimized 
later. I didn't benchmark my patch.

Victor

From martin at v.loewis.de  Wed Aug 24 10:18:20 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 24 Aug 2011 10:18:20 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E54AEC8.7040702@g.nevcal.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>
	<j31hl7$dp5$1@dough.gmane.org> <4E54AEC8.7040702@g.nevcal.com>
Message-ID: <4E54B3CC.9040900@v.loewis.de>

> So am I correctly reading between the lines when, after reading this
> thread so far, and the complete issue discussion so far, I see a
> PEP 393 revision or replacement that has the following characteristics:
> 
> 1) Narrow builds are dropped.

PEP 393 already drops narrow builds.

> 2) There are more, or different, internal kinds of strings, which affect
> the processing patterns.

This is the basic idea of PEP 393.

> a) all ASCII
> b) latin-1 (8-bit codepoints, the first 256 Unicode codepoints) This
> kind may not be able to support a "mostly" variation, and may be no more
> efficient than case a).  But it might also be popular in parts of Europe

These two cases are already in PEP 393.

> c) mostly ASCII (utf8) with clever indexing/caching to be efficient
> d) UTF-8 with clever indexing/caching to be efficient

I see neither a need nor a means to consider these.

> e) 16-bit codepoints

These are in PEP 393.

> f) UTF-16 with clever indexing/caching to be efficient

Again, -1.

> g) 32-bit codepoints

This is in PEP 393.

> h) UTF-32

What's that, as opposed to g)?

I'm not open to revising PEP 393 in the direction of adding more
representations.

Regards,
Martin

From turnbull at sk.tsukuba.ac.jp  Wed Aug 24 10:22:37 2011
From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Wed, 24 Aug 2011 17:22:37 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j31hlc$dp5$2@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
Message-ID: <87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>

Terry Reedy writes:

 > The current UCS2 Unicode string implementation, by design, quickly gives 
 > WRONG answers for len(), iteration, indexing, and slicing if a string 
 > contains any non-BMP (surrogate pair) Unicode characters. That may have 
 > been excusable when there essentially were no such extended chars, and 
 > the few there were were almost never used.

Well, no, it gives the right answer according to the design.  unicode
objects do not contain character strings.  By design, they contain
code point strings.  Guido has made that absolutely clear on a number
of occasions.  And the reasons have very little to do with lack of
non-BMP characters to trip up the implementation.  Changing those
semantics should have been done before the release of Python 3.

It is not clear to me that it is a good idea to try to decide on "the"
correct implementation of Unicode strings in Python even today.  There
are a number of approaches that I can think of.

1.  The "too bad if you can't take a joke" approach: do nothing and
    recommend UTF-32 to those who want len() to DTRT.
2.  The "slope is slippery" approach: Implement UTF-16 objects as
    built-ins, and then try to fend off requests for correct treatment
    of unnormalized composed characters, normalization, compatibility
    substitutions, bidi, etc etc.
3.  The "are we not hackers?" approach: Implement a transform that
    maps characters that are not represented by a single code point
    into Unicode private space, and then see if anybody really needs
    more than 6400 non-BMP characters.  (Note that this would
    generalize to composed characters that don't have a one-code-point
    NFC form and similar non-standardized cases that nonstandard users
    might want handled.)
4.  The "42" approach: sadly, I can't think deeply enough to explain it.

There are probably others.

It's true that Python is going to need good libraries to provide
correct handling of Unicode strings (as opposed to unicode objects).
But it's not clear to me given the wide variety of implementations I
can imagine that there will be one best implementation, let alone
which ones are good and Pythonic, and which not so.


From victor.stinner at haypocalc.com  Wed Aug 24 10:27:21 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 24 Aug 2011 10:27:21 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP_a28G2tDKdwiDw6cdG-VDsqEGs+-WttgiXSr0iZ7wjmhPs4g@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<201108240046.16058.victor.stinner@haypocalc.com>
	<201108240056.48170.victor.stinner@haypocalc.com>
	<CAP_a28G2tDKdwiDw6cdG-VDsqEGs+-WttgiXSr0iZ7wjmhPs4g@mail.gmail.com>
Message-ID: <4E54B5E9.3070905@haypocalc.com>

Le 24/08/2011 04:56, Torsten Becker a écrit :
> On Tue, Aug 23, 2011 at 18:56, Victor Stinner
> <victor.stinner at haypocalc.com>  wrote:
>>> kind=0 is used and public, it's PyUnicode_WCHAR_KIND. Is it still
>>> necessary? It looks to be only used in PyUnicode_DecodeUnicodeEscape().
>>
>> If it can be removed, it would be nice to have kind in [0; 2] instead of kind
>> in [1; 2], to be able to have a list (of 3 items) =>  callback or label.
>
> It is also used in PyUnicode_DecodeUTF8Stateful() and there might be
> some cases which I missed converting checks for 0 when I introduced
> the macro.  The question was more if this should be written as 0 or as
> a named constant.  I preferred the named constant for readability.
>
> An alternative would be to have kind values be the same as the number
> of bytes for the string representation so it would be 0 (wstr), 1
> (1-byte), 2 (2-byte), or 4 (4-byte).

Please don't do that: it's more common to need contiguous arrays (for a 
jump table/callback list) than having to know the character size. You 
can use an array giving the character size: CHARACTER_SIZE[kind], which 
is the array {0, 1, 2, 4} (or maybe sizeof(wchar_t) instead of 0 ?).

> I think the value for wstr/uninitialized/reserved should not be
> removed.  The wstr representation is still used in the error case in
> the utf8 decoder because these strings can be resized.

In Python, you can resize an object if it has only one reference. Why is 
it not possible in your branch?

Oh, I missed the UTF-8 decoder because you wrote "kind = 0": please, use 
PyUnicode_WCHAR_KIND instead!

I don't like "reserved" value, especially if its value is 0, the first 
value. See Microsoft file formats: they waste a lot of space because 
most fields are reserved, and 10 years later, these fields are still 
unused. Can't we add the value 4 when we will need a new kind?

> Also having one
> designated value for "uninitialized" limits comparisons in the
> affected functions to the kind value, otherwise they would need to
> check the str field for NULL to determine in which buffer to write a
> character.

I have to read the code more carefully, I don't know this 
"uninitialized" state.

For kind=0: "wstr" means that str is NULL but wstr is set? I didn't 
understand that str can be NULL for an initialized string. I should read 
the PEP again :-)

>> I suppose that compilers prefer a switch with all cases defined, 0 as first item
>> and contiguous values. We may need an enum.
>
> During the Summer of Code, Martin and I did an experiment with GCC and
> it did not seem to produce a jump table as an optimization for three
> cases but generated comparison instructions anyway.

You mean with a switch with a case for each possible value? I don't 
think that GCC knows that all cases are defined if you don't use an enum.

> I am not sure how much we should optimize for potential compiler
 > optimizations here.

Oh, it was just a suggestion. Sure, it's not the best moment to care 
about micro-optimizations.

Victor

From cs at zip.com.au  Wed Aug 24 10:54:48 2011
From: cs at zip.com.au (Cameron Simpson)
Date: Wed, 24 Aug 2011 18:54:48 +1000
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <CADiSq7e=sWZpq3PRB77td4MuJDjUgpkk1DTGhnccAwJ5qVeGvA@mail.gmail.com>
References: <CADiSq7e=sWZpq3PRB77td4MuJDjUgpkk1DTGhnccAwJ5qVeGvA@mail.gmail.com>
Message-ID: <20110824085448.GA10991@cskk.homeip.net>

On 24Aug2011 12:31, Nick Coghlan <ncoghlan at gmail.com> wrote:
| On Wed, Aug 24, 2011 at 5:19 AM, Steven D'Aprano <steve at pearwood.info> wrote:
| > Antoine Pitrou wrote:
| >> When reviewing the PEP 3151 implementation (*), Ezio commented that
| >> "FileSystemError" looks a bit strange and that "FilesystemError" would
| >> be a better spelling. What is your opinion?
| >
| > It's a file system (two words), not filesystem (not in any dictionary or
| > spell checker I've ever used).
| 
| I rarely find spell checkers to be useful sources of data on correct
| spelling of technical jargon (and the computing usage of the term
| 'filesystem' definitely qualifies as jargon).
| 
| > (Nor do we write filingsystem, governmentsystem, politicalsystem or
| > schoolsystem. This is English, not German.)
| 
| Personally, I think 'filesystem' is a portmanteau in the process of
| coming into existence (as evidenced by usage like 'FHS' standing for
| 'Filesystem Hierarchy Standard'). However, the two word form is still
| useful at times, particularly for disambiguation of acronyms (as
| evidenced by usage like 'NFS' and 'GFS' for 'Network File System' and
| 'Google File System').

Funny, I thought NFS stood for Not a File System :-)

| Since I tend to use the one word 'filesystem' form myself (ditto for
| 'filename'), I'm +1 for FilesystemError, but I'm only -0 for
| FileSystemError (so I expect that will be the option chosen, given
| other responses).

I also use "filesystem" as a one word piece of jargon, but I am
persuaded by the language arguments. So I'm +1 for FileSystemError.

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Bolts get me through times of no courage better than courage gets me
through times of no bolts!
        - Eric Hirst <eric at u.washington.edu>

From v+python at g.nevcal.com  Wed Aug 24 11:22:58 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 24 Aug 2011 02:22:58 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E54B3CC.9040900@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>
	<j31hl7$dp5$1@dough.gmane.org> <4E54AEC8.7040702@g.nevcal.com>
	<4E54B3CC.9040900@v.loewis.de>
Message-ID: <4E54C2F2.2090606@g.nevcal.com>

On 8/24/2011 1:18 AM, "Martin v. Löwis" wrote:
>> So am I correctly reading between the lines when, after reading this
>> thread so far, and the complete issue discussion so far, I see a
>> PEP 393 revision or replacement that has the following characteristics:
>>
>> 1) Narrow builds are dropped.
> PEP 393 already drops narrow builds.

I'd forgotten that.

>
>> 2) There are more, or different, internal kinds of strings, which affect
>> the processing patterns.
> This is the basic idea of PEP 393.

Agreed.
>
>> a) all ASCII
>> b) latin-1 (8-bit codepoints, the first 256 Unicode codepoints) This
>> kind may not be able to support a "mostly" variation, and may be no more
>> efficient than case a).  But it might also be popular in parts of Europe
> These two cases are already in PEP 393.
Sure.  Wanted to enumerate all, rather than just add-ons.

>> c) mostly ASCII (utf8) with clever indexing/caching to be efficient
>> d) UTF-8 with clever indexing/caching to be efficient
> I see neither a need nor a means to consider these.

The discussion about "mostly ASCII" strings seems convincing that there 
could be a significant space savings if such were implemented.

>> e) 16-bit codepoints
> These are in PEP 393.
>
>> f) UTF-16 with clever indexing/caching to be efficient
> Again, -1.

This is probably the one I would pick as least likely to be useful if 
the rest were implemented.

>> g) 32-bit codepoints
> This is in PEP 393.
>
>> h) UTF-32
> What's that, as opposed to g)?

g) would permit codes greater than U+10FFFF and would permit the illegal 
codepoints and lone surrogates.  h) would be strict Unicode 
conformance.  Sorry that the 4 paragraphs of explanation that you didn't 
quote didn't make that clear.

> I'm not open to revising PEP 393 in the direction of adding more
> representations.
>
It's your PEP.

From scott+python-dev at scottdial.com  Wed Aug 24 11:25:18 2011
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Wed, 24 Aug 2011 05:25:18 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E54B246.5020008@haypocalc.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net>
	<201108240038.00801.victor.stinner@haypocalc.com>
	<4E54852E.9000601@scottdial.com> <4E54B246.5020008@haypocalc.com>
Message-ID: <4E54C37E.6040305@scottdial.com>

On 8/24/2011 4:11 AM, Victor Stinner wrote:
> Le 24/08/2011 06:59, Scott Dial a écrit :
>> On 8/23/2011 6:38 PM, Victor Stinner wrote:
>>> Le mardi 23 août 2011 00:14:40, Antoine Pitrou a écrit :
>>>> - You could try to run stringbench, which can be found at
>>>>    http://svn.python.org/projects/sandbox/trunk/stringbench (*)
>>>>    and there's iobench (the text mode benchmarks) in the Tools/iobench
>>>>    directory.
>>>
>>> Some raw numbers.
>>>
>>> stringbench:
>>> "147.07 203.07 72.4 TOTAL" for the PEP 393
>>> "146.81 140.39 104.6 TOTAL" for default
>>> =>  PEP is 45% slower
>>
>> I ran the same benchmark and couldn't make a distinction in performance
>> between them:
> 
> Hum, are you sure that you used the PEP 393? Make sure that you are
> using the pep-393 branch! I also started my benchmark on the wrong
> branch :-)

You are right. I used the "Get Source" link on bitbucket to save pulling
the whole clone, but the "Get Source" link seems to be whatever branch
has the latest revision (maybe?) even if you switch branches on the
webpage. To correct my previous post:

cpython.txt
183.26  177.97  103.0   TOTAL
cpython-wide-unicode.txt
181.27  195.58  92.7    TOTAL
pep-393.txt
181.40  270.34  67.1    TOTAL

And,

cpython.txt
real    0m32.493s
cpython-wide-unicode.txt
real    0m33.489s
pep-393.txt
real    0m36.206s

-- 
Scott Dial
scott at scottdial.com

From martin at v.loewis.de  Wed Aug 24 12:04:28 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 24 Aug 2011 12:04:28 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E54B3B6.1020205@haypocalc.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>	<201108240027.37788.victor.stinner@haypocalc.com>	<CAP_a28GrYY_=VBSwg031bEwfivJdkxJ4yLTGTyJnGhWjoA234Q@mail.gmail.com>
	<4E54B3B6.1020205@haypocalc.com>
Message-ID: <4E54CCAC.5080408@v.loewis.de>

Am 24.08.2011 10:17, schrieb Victor Stinner:
> Le 24/08/2011 04:41, Torsten Becker a écrit :
>> On Tue, Aug 23, 2011 at 18:27, Victor Stinner
>> <victor.stinner at haypocalc.com>  wrote:
>>> I posted a patch to re-add it:
>>> http://bugs.python.org/issue12819#msg142867
>>
>> Thank you for the patch!  Note that this patch adds the fast path only
>> to the helper function which determines the length of the string and
>> the maximum character.  The decoding part is still without a fast path
>> for ASCII runs.
> 
> Ah? If utf8_max_char_size_and_has_errors() returns no error and
> maxchar=127: memcpy() is used. You mean that memcpy() is too slow? :-)

No: the pure-ASCII case is already optimized with memcpy. It's the
mostly-ASCII case that is not optimized anymore in this PEP 393
implementation (the one with "ASCII runs" instead of "pure ASCII").

Regards,
Martin

From tjreedy at udel.edu  Wed Aug 24 12:06:39 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 24 Aug 2011 06:06:39 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <j32igg$hd7$1@dough.gmane.org>

On 8/24/2011 4:22 AM, Stephen J. Turnbull wrote:
> Terry Reedy writes:
>
>   >  The current UCS2 Unicode string implementation, by design, quickly gives
>   >  WRONG answers for len(), iteration, indexing, and slicing if a string
>   >  contains any non-BMP (surrogate pair) Unicode characters. That may have
>   >  been excusable when there essentially were no such extended chars, and
>   >  the few there were were almost never used.
>
> Well, no, it gives the right answer according to the design.  unicode
> objects do not contain character strings.

Excuse me for believing the fine 3.2 manual that says
"Strings contain Unicode characters." (And to a naive reader, that 
implies that string iteration and indexing should produce Unicode 
characters.)

>  By design, they contain code point strings.

For the purpose of my sentence, they are the same thing, in that code 
points correspond to characters, where 'character' includes ascii control 
'characters' and unicode analogs.
strings are NOT code point sequences. They are 2-byte code *unit* 
sequences. Single non-BMP code points are seen as 2 code units and hence 
given a length of 2, not 1. Strings iterate, index, and slice by 2-byte 
code units, not by code points.
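
A hypothetical narrow-build session makes the mismatch concrete (a wide 
build would answer 1 and '\U00010000' instead):

    >>> s = '\U00010000'   # one non-BMP code point
    >>> len(s)             # counted as 2 code units, not 1 character
    2
    >>> s[0]               # indexing yields a lone high surrogate
    '\ud800'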

Python floats try to follow the IEEE standard as interpreted for Python 
(Python has its software exceptions rather than signalling versus 
non-signalling hardware signals). Python decimals slavishly follow the 
IEEE decimal standard. Python narrow build unicode breaks the standard 
for non-BMP code points and consequently breaks the re module even when 
it works for wide builds. As sys.maxunicode more or less says, only the 
BMP subset is fully supported. Any narrow build string with even 1 
non-BMP char violates the standard.

> Guido has made that absolutely clear on a number
> of occasions.

It is not clear what you mean, but recently on python-ideas he has 
reiterated that he intends bytes and strings to be conceptually 
different. Bytes are computer-oriented binary arrays; strings are 
supposedly human-oriented character/codepoint arrays. Except they are 
not for non-BMP characters/codepoints. Narrow build unicode is 
effectively an array of two-byte binary units.

 > And the reasons have very little to do with lack of
> non-BMP characters to trip up the implementation.  Changing those
> semantics should have been done before the release of Python 3.

The documentation was changed at least a bit for 3.0, and anyway, as 
indicated above, it is easy (especially for new users) to read the docs 
in a way that makes the current behavior buggy. I agree that the 
implementation should have been changed already.

Currently, the meaning of Python code differs on narrow versus wide 
build, and in a way that few users would expect or want. PEP 393 
abolishes narrow builds as we now know them and changes semantics. I was 
answering a complaint about that change. If you do not like the PEP, fine.

My separate proposal in my other post is for an alternative 
implementation but with, I presume, pretty much the same visible changes.

> It is not clear to me that it is a good idea to try to decide on "the"
> correct implementation of Unicode strings in Python even today.

If the implementation is invisible to the Python user, as I believe it 
should be without special introspection, and mostly invisible in the 
C-API except for those who intentionally poke into the details, then the 
implementation can be changed as the consensus on best implementation 
changes.

> There are a number of approaches that I can think of.
>
> 1.  The "too bad if you can't take a joke" approach: do nothing and
>      recommend UTF-32 to those who want len() to DTRT.
> 2.  The "slope is slippery" approach: Implement UTF-16 objects as
>      built-ins, and then try to fend off requests for correct treatment
>      of unnormalized composed characters, normalization, compatibility
>      substitutions, bidi, etc etc.
> 3.  The "are we not hackers?" approach: Implement a transform that
>      maps characters that are not represented by a single code point
>      into Unicode private space, and then see if anybody really needs
>      more than 6400 non-BMP characters.  (Note that this would
>      generalize to composed characters that don't have a one-code-point
>      NFC form and similar non-standardized cases that nonstandard users
>      might want handled.)
> 4.  The "42" approach: sadly, I can't think deeply enough to explain it.
>
> There are probably others.
>
> It's true that Python is going to need good libraries to provide
> correct handling of Unicode strings (as opposed to unicode objects).

Given that 3.0 unicode (string) objects are defined as Unicode character 
strings, I do not see the opposition.

> But it's not clear to me given the wide variety of implementations I
> can imagine that there will be one best implementation, let alone
> which ones are good and Pythonic, and which not so.

-- 
Terry Jan Reedy


From martin at v.loewis.de  Wed Aug 24 12:27:12 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 24 Aug 2011 12:27:12 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E54B5E9.3070905@haypocalc.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<201108240046.16058.victor.stinner@haypocalc.com>	<201108240056.48170.victor.stinner@haypocalc.com>	<CAP_a28G2tDKdwiDw6cdG-VDsqEGs+-WttgiXSr0iZ7wjmhPs4g@mail.gmail.com>
	<4E54B5E9.3070905@haypocalc.com>
Message-ID: <4E54D200.5060200@v.loewis.de>

>> I think the value for wstr/uninitialized/reserved should not be
>> removed.  The wstr representation is still used in the error case in
>> the utf8 decoder because these strings can be resized.
> 
> In Python, you can resize an object if it has only one reference. Why is
> it not possible in your branch?

If you use the new API to create a string (knowing how many characters
you have, and what the maximum character is), the Unicode object is
allocated as a single memory block. It can then not be resized.

If you allocate in the old style (i.e. giving NULL as the data pointer,
and a length), it still creates a second memory block for the
Py_UNICODE[], and allows resizing. When you then call PyUnicode_Ready,
the object gets frozen.

> I don't like "reserved" value, especially if its value is 0, the first
> value. See Microsoft file formats: they waste a lot of space because
> most fields are reserved, and 10 years later, these fields are still
> unused. Can't we add the value 4 when we will need a new kind?

I don't get the analogy, or the relationship with the value 0.
"Reserving" the value 0 is entirely different from reserving a field.
In a field, it wastes space; the value 0 however fills the same space
as the values 1,2,3. It's just used to denote an object where the
str pointer is not filled out yet, i.e. which can still be resized.

>>> I suppose that compilers prefer a switch with all cases defined, 0 as
>>> first item
>>> and contiguous values. We may need an enum.
>>
>> During the Summer of Code, Martin and I did an experiment with GCC and
>> it did not seem to produce a jump table as an optimization for three
>> cases but generated comparison instructions anyway.
> 
> You mean with a switch with a case for each possible value? 

No, a computed jump on the assembler level. Consider this code

enum kind {null,ucs1,ucs2,ucs4};

void foo(void *d, enum kind k, int i, int v)
{
    switch(k){
        case ucs1:((unsigned char*)d)[i] = v;break;
        case ucs2:((unsigned short*)d)[i] = v;break;
        case ucs4:((unsigned int*)d)[i] = v;break;
    }
}

gcc 4.6.1 compiles this to

foo:
.LFB0:
        .cfi_startproc
        cmpl    $2, %esi
        je      .L4
        cmpl    $3, %esi
        je      .L5
        cmpl    $1, %esi
        je      .L7
        .p2align 4,,5
        rep
        ret
        .p2align 4,,10
        .p2align 3
.L7:
        movslq  %edx, %rdx
        movb    %cl, (%rdi,%rdx)
        ret
        .p2align 4,,10
        .p2align 3
.L5:
        movslq  %edx, %rdx
        movl    %ecx, (%rdi,%rdx,4)
        ret
        .p2align 4,,10
        .p2align 3
.L4:
        movslq  %edx, %rdx
        movw    %cx, (%rdi,%rdx,2)
        ret
        .cfi_endproc

As you can see, it generates a chain of compares, rather than an
indirect jump through a jump table.

Regards,
Martin

From eliben at gmail.com  Wed Aug 24 13:09:46 2011
From: eliben at gmail.com (Eli Bendersky)
Date: Wed, 24 Aug 2011 14:09:46 +0300
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <20110823202004.0bb63490@pitrou.net>
References: <20110823202004.0bb63490@pitrou.net>
Message-ID: <CAF-Rda8fmf02HsPdhRsbk+APnGXh=5cX_zybn1G39hgRpCP14Q@mail.gmail.com>

> When reviewing the PEP 3151 implementation (*), Ezio commented that
> "FileSystemError" looks a bit strange and that "FilesystemError" would
> be a better spelling. What is your opinion?
>
> (*) http://bugs.python.org/issue12555
>

+1 for FileSystemError

Eli

From ncoghlan at gmail.com  Wed Aug 24 14:50:34 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 24 Aug 2011 22:50:34 +1000
Subject: [Python-Dev] sendmsg/recvmsg on Mac OS X
Message-ID: <CADiSq7ehQpDPHztVRX=xsj7_k0mE4ZEBekVwX9iks6c-eezpmg@mail.gmail.com>

The buildbots are complaining about some of the tests for the new
socket.sendmsg/recvmsg added by issue #6560 for *nix platforms that
provide CMSG_LEN.

http://www.python.org/dev/buildbot/all/builders/AMD64%20Snow%20Leopard%202%203.x/builds/831/steps/test/logs/stdio

Before I start trying to figure this out without a Mac to test on, are
any of the devs that actually use Mac OS X seeing the failure in their
local builds?

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com  Wed Aug 24 15:06:23 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 24 Aug 2011 23:06:23 +1000
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j31hl7$dp5$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de> <j31hl7$dp5$1@dough.gmane.org>
Message-ID: <CADiSq7eGo2C-GTHcSbJWwtu3O19PkdsFNpMfY29u4g+e==0=PA@mail.gmail.com>

On Wed, Aug 24, 2011 at 10:46 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> In utf16.py, attached to http://bugs.python.org/issue12729
> I propose for consideration a prototype of different solution to the 'mostly
> BMP chars, few non-BMP chars' case. Rather than expand every character from
> 2 bytes to 4, attach an array cpdex of character (ie code point, not code
> unit) indexes. Then for indexing and slicing, the correction is simple,
> simpler than I first expected:
> ?code-unit-index = char-index + bisect.bisect_left(cpdex, char_index)
> where code-unit-index is the adjusted index into the full underlying
> double-byte array. This adds a time penalty of log2(len(cpdex)), but avoids
> most of the space penalty and the consequent time penalty of moving more
> bytes around and increasing cache misses.

Interesting idea, but putting on my C programmer hat, I say -1.

Non-uniform cell size = not a C array = standard C array manipulation
idioms don't work = pain (no matter how simple the index correction
happens to be).

The nice thing about PEP 393 is that it gives us the smallest storage
array that is both an ordinary C array and has sufficiently large
individual elements to handle every character in the string.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From neologix at free.fr  Wed Aug 24 15:31:50 2011
From: neologix at free.fr (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Wed, 24 Aug 2011 15:31:50 +0200
Subject: [Python-Dev] sendmsg/recvmsg on Mac OS X
In-Reply-To: <CADiSq7ehQpDPHztVRX=xsj7_k0mE4ZEBekVwX9iks6c-eezpmg@mail.gmail.com>
References: <CADiSq7ehQpDPHztVRX=xsj7_k0mE4ZEBekVwX9iks6c-eezpmg@mail.gmail.com>
Message-ID: <CAH_1eM1k6rsqQfNF3T1Sn0RcKwh_B_vDOzw0cpjMe2Z1YozjrA@mail.gmail.com>

> The buildbots are complaining about some of the tests for the new
> socket.sendmsg/recvmsg added by issue #6560 for *nix platforms that
> provide CMSG_LEN.

Looks like kernel bugs:
http://developer.apple.com/library/mac/#qa/qa1541/_index.html

"""
Yes. Mac OS X 10.5 fixes a number of kernel bugs related to descriptor passing
[...]
Avoid passing two or more descriptors back-to-back.
"""

We should probably add
@requires_mac_ver(10, 5)

for testFDPassSeparate and testFDPassSeparateMinSpace.
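
Something like this sketch, assuming test.support's requires_mac_ver 
helper (untested):

    from test.support import requires_mac_ver

    @requires_mac_ver(10, 5)
    def testFDPassSeparate(self):
        ...

    @requires_mac_ver(10, 5)
    def testFDPassSeparateMinSpace(self):
        ...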

As for InterruptedSendTimeoutTest and testInterruptedSendmsgTimeout,
it also looks like a kernel bug: the syscall should fail with EINTR
once the socket buffer is full. I guess one should skip those on OS X.

From stefan_ml at behnel.de  Wed Aug 24 18:00:42 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 24 Aug 2011 18:00:42 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CADiSq7eGo2C-GTHcSbJWwtu3O19PkdsFNpMfY29u4g+e==0=PA@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>
	<j31hl7$dp5$1@dough.gmane.org>
	<CADiSq7eGo2C-GTHcSbJWwtu3O19PkdsFNpMfY29u4g+e==0=PA@mail.gmail.com>
Message-ID: <j3377b$f21$1@dough.gmane.org>

Nick Coghlan, 24.08.2011 15:06:
> On Wed, Aug 24, 2011 at 10:46 AM, Terry Reedy wrote:
>> In utf16.py, attached to http://bugs.python.org/issue12729
>> I propose for consideration a prototype of different solution to the 'mostly
>> BMP chars, few non-BMP chars' case. Rather than expand every character from
>> 2 bytes to 4, attach an array cpdex of character (ie code point, not code
>> unit) indexes. Then for indexing and slicing, the correction is simple,
>> simpler than I first expected:
>>   code-unit-index = char-index + bisect.bisect_left(cpdex, char_index)
>> where code-unit-index is the adjusted index into the full underlying
>> double-byte array. This adds a time penalty of log2(len(cpdex)), but avoids
>> most of the space penalty and the consequent time penalty of moving more
>> bytes around and increasing cache misses.
>
> Interesting idea, but putting on my C programmer hat, I say -1.
>
> Non-uniform cell size = not a C array = standard C array manipulation
> idioms don't work = pain (no matter how simple the index correction
> happens to be).
>
> The nice thing about PEP 393 is that it gives us the smallest storage
> array that is both an ordinary C array and has sufficiently large
> individual elements to handle every character in the string.

+1

Stefan


From stephen at xemacs.org  Wed Aug 24 18:34:17 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 25 Aug 2011 01:34:17 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j32igg$hd7$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
Message-ID: <87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>

Terry Reedy writes:

 > Excuse me for believing the fine 3.2 manual that says
 > "Strings contain Unicode characters."

The manual is wrong, then, subject to a pronouncement to the contrary,
of course.  I was on your side of the fence when this was discussed,
pre-release.  I was wrong then.  My bet is that we are still wrong,
now.

 > For the purpose of my sentence, they are the same thing, in that code 
 > points correspond to characters,

Not in Unicode, they do not.  By definition, a small number of code
points (eg, U+FFFF) *never* did and *never* will correspond to
characters.  Since about Unicode 3.0, the same is true of surrogate
code points.  Some restrictions have been placed on what can be done
with composed characters, so even with the PEP (which gives us code
point arrays) we do not really get arrays of Unicode characters that
fully conform to the model.

 > strings are NOT code point sequences. They are 2-byte code *unit* 
 > sequences.

I stand corrected on Unicode terminology.  "Code unit" is what I meant,
and what I understand Guido to have defined unicode objects as arrays of.

 > Any narrow build string with even 1 non-BMP char violates the
 > standard.

Yup.  That's by design.

 > > Guido has made that absolutely clear on a number
 > > of occasions.
 > 
 > It is not clear what you mean, but recently on python-ideas he has 
 > reiterated that he intends bytes and strings to be conceptually 
 > different.

Sure.  Nevertheless, practicality beat purity long ago, and that
decision has never been rescinded AFAIK.

 > Bytes are computer-oriented binary arrays; strings are 
 > supposedly human-oriented character/codepoint arrays.

And indeed they are, in UCS-4 builds.  But they are *not* in Unicode!
Unicode violates the array model.  Specifically, in handling composing
characters, and in bidi, where arbitrary slicing of direction control
characters will result in garbled display.

The thing is, 90% of applications are not really going to care
about full conformance to the Unicode standard.  Of the remaining 10%,
90% are not going to need both huge strings *and* ABI interoperability
with C modules compiled for UCS-2, so UCS-4 is satisfactory.  Of the
remaining 1% of all applications, those that deal with huge strings
*and* need full Unicode conformance, well, they need efficiency too
almost by definition.  They probably are going to want something more
efficient than either the UTF-16 or the UTF-32 representation can
provide, and therefore will need trickier, possibly app-specific,
algorithms that probably do not belong in an initial implementation.

 >  > And the reasons have very little to do with lack of
 > > non-BMP characters to trip up the implementation.  Changing those
 > > semantics should have been done before the release of Python 3.
 > 
 > The documentation was changed at least a bit for 3.0, and anyway, as 
 > indicated above, it is easy (especially for new users) to read the docs 
 > in a way that makes the current behavior buggy. I agree that the 
 > implementation should have been changed already.

I don't.  I suspect Guido does not, even today.

 > Currently, the meaning of Python code differs on narrow versus wide
 > build, and in a way that few users would expect or want.

Let them become developers, then, and show us how to do it better.

 > PEP 393 abolishes narrow builds as we now know them and changes
 > semantics. I was answering a complaint about that change. If you do
 > not like the PEP, fine.

No, I do like the PEP.  However, it is only a step, a rather
conservative one in some ways, toward conformance to the Unicode
character model.  In particular, it does nothing to resolve the fact
that len() will give different answers for character count depending
on normalization, and that slicing and indexing will allow you to cut
characters in half (even in NFC, since not all composed characters
have fully composed forms).
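
A small illustration of the normalization point (not from the PEP):

    >>> import unicodedata
    >>> s = 'e\u0301'   # 'e' + COMBINING ACUTE ACCENT
    >>> len(s), len(unicodedata.normalize('NFC', s))
    (2, 1)
    >>> t = 'q\u0301'   # no precomposed form exists for this pair
    >>> len(unicodedata.normalize('NFC', t))
    2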

 > > It is not clear to me that it is a good idea to try to decide on "the"
 > > correct implementation of Unicode strings in Python even today.
 > 
 > If the implementation is invisible to the Python user, as I believe it 
 > should be without special introspection, and mostly invisible in the 
 > C-API except for those who intentionally poke into the details, then the 
 > implementation can be changed as the consensus on best implementation 
 > changes.

A naive implementation of UTF-16 will be quite visible in terms of
performance, I suspect, and performance-oriented applications will "go
behind the API's back" to get it.  We're already seeing that in the
people who insist that bytes are characters too, and string APIs
should work on them just as they do on (Unicode) strings.

 > > It's true that Python is going to need good libraries to provide
 > > correct handling of Unicode strings (as opposed to unicode objects).
 > 
 > Given that 3.0 unicode (string) objects are defined as Unicode character 
 > strings, I do not see the opposition.

I think they're not; I think they're defined as Unicode code unit
arrays, and that the documentation is in error.  If the documentation
is correct, then Python 3.0 was released about 5 years too early,
because correct handling of those objects as arrays of Unicode
characters has never been implemented or even discussed in terms of
proposed code that I know of.

Martin has long claimed that the fact that I/O is done in terms of
UTF-16 means that the internal representation is UTF-16, so I could be
wrong.  But when issues of slicing, len() values and so on have come
up in the past, Guido has always said "no, there will be no change in
semantics of builtins here".


From solipsis at pitrou.net  Wed Aug 24 18:38:46 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 24 Aug 2011 18:38:46 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20110824183846.2b392f77@pitrou.net>

On Thu, 25 Aug 2011 01:34:17 +0900
"Stephen J. Turnbull" <stephen at xemacs.org> wrote:
> 
> Martin has long claimed that the fact that I/O is done in terms of
> UTF-16 means that the internal representation is UTF-16

Which I/O?




From solipsis at pitrou.net  Wed Aug 24 18:49:27 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 24 Aug 2011 18:49:27 +0200
Subject: [Python-Dev] sendmsg/recvmsg on Mac OS X
References: <CADiSq7ehQpDPHztVRX=xsj7_k0mE4ZEBekVwX9iks6c-eezpmg@mail.gmail.com>
	<CAH_1eM1k6rsqQfNF3T1Sn0RcKwh_B_vDOzw0cpjMe2Z1YozjrA@mail.gmail.com>
Message-ID: <20110824184927.2697b0af@pitrou.net>

On Wed, 24 Aug 2011 15:31:50 +0200
Charles-François Natali <neologix at free.fr> wrote:
> > The buildbots are complaining about some of the tests for the new
> > socket.sendmsg/recvmsg added by issue #6560 for *nix platforms that
> > provide CMSG_LEN.
> 
> Looks like kernel bugs:
> http://developer.apple.com/library/mac/#qa/qa1541/_index.html
> 
> """
> Yes. Mac OS X 10.5 fixes a number of kernel bugs related to descriptor passing
> [...]
> Avoid passing two or more descriptors back-to-back.
> """

But Snow Leopard, where these failures occur, is OS X 10.6.

Antoine.



From riscutiavlad at gmail.com  Wed Aug 24 18:57:56 2011
From: riscutiavlad at gmail.com (Vlad Riscutia)
Date: Wed, 24 Aug 2011 09:57:56 -0700
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <CAF-Rda8fmf02HsPdhRsbk+APnGXh=5cX_zybn1G39hgRpCP14Q@mail.gmail.com>
References: <20110823202004.0bb63490@pitrou.net>
	<CAF-Rda8fmf02HsPdhRsbk+APnGXh=5cX_zybn1G39hgRpCP14Q@mail.gmail.com>
Message-ID: <CAJ-9HZ2OezdrySeUzbub1Ohy4XPb9SGOVKQuAtv_sng=sU2ciw@mail.gmail.com>

+1 for FileSystemError. I see myself misspelling it as FileSystemError if we
go with the alternate spelling. I probably won't be the only one.

Thank you,
Vlad

On Wed, Aug 24, 2011 at 4:09 AM, Eli Bendersky <eliben at gmail.com> wrote:

>
> When reviewing the PEP 3151 implementation (*), Ezio commented that
>> "FileSystemError" looks a bit strange and that "FilesystemError" would
>> be a better spelling. What is your opinion?
>>
>> (*) http://bugs.python.org/issue12555
>>
>
> +1 for FileSystemError
>
> Eli
>
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/riscutiavlad%40gmail.com
>
>

From stephen at xemacs.org  Wed Aug 24 19:15:48 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 25 Aug 2011 02:15:48 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <20110824183846.2b392f77@pitrou.net>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110824183846.2b392f77@pitrou.net>
Message-ID: <87mxeyzq63.fsf@uwakimon.sk.tsukuba.ac.jp>

Antoine Pitrou writes:
 > On Thu, 25 Aug 2011 01:34:17 +0900
 > "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
 > > 
 > > Martin has long claimed that the fact that I/O is done in terms of
 > > UTF-16 means that the internal representation is UTF-16
 > 
 > Which I/O?

Eg, display of characters in the interpreter.

From solipsis at pitrou.net  Wed Aug 24 19:16:29 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 24 Aug 2011 19:16:29 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87mxeyzq63.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110824183846.2b392f77@pitrou.net>
	<87mxeyzq63.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1314206189.3549.2.camel@localhost.localdomain>

Le jeudi 25 août 2011 à 02:15 +0900, Stephen J. Turnbull a écrit :
> Antoine Pitrou writes:
>  > On Thu, 25 Aug 2011 01:34:17 +0900
>  > "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
>  > > 
>  > > Martin has long claimed that the fact that I/O is done in terms of
>  > > UTF-16 means that the internal representation is UTF-16
>  > 
>  > Which I/O?
> 
> Eg, display of characters in the interpreter.

I don't know why you say it's "done in terms of UTF-16", then. Unicode
strings are simply encoded to whatever character set is detected as the
terminal's character set.
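
A quick way to check what was detected (the value varies by terminal):

    >>> import sys
    >>> sys.stdout.encoding
    'UTF-8'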

Regards

Antoine.



From victor.stinner at haypocalc.com  Wed Aug 24 19:45:27 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 24 Aug 2011 19:45:27 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j31hlc$dp5$2@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
Message-ID: <4E5538B7.8010709@haypocalc.com>

Le 24/08/2011 02:46, Terry Reedy a écrit :
> On 8/23/2011 9:21 AM, Victor Stinner wrote:
>> Le 23/08/2011 15:06, "Martin v. Löwis" a écrit :
>>> Well, things have to be done in order:
>>> 1. the PEP needs to be approved
>>> 2. the performance bottlenecks need to be identified
>>> 3. optimizations should be applied.
>>
>> I would not vote for the PEP if it slows down Python, especially if it's
>> much slower. But Torsten says that it speeds up Python, which is
>> surprising. I have to do my own benchmarks :-)
>
> The current UCS2 Unicode string implementation, by design, quickly gives
> WRONG answers for len(), iteration, indexing, and slicing if a string
> contains any non-BMP (surrogate pair) Unicode characters. That may have
> been excusable when there essentially were no such extended chars, and
> the few there were were almost never used. But now there are many more,
> with more being added to each Unicode edition. They include cursive Math
> letters that are used in English documents today. The problem will
> slowly get worse and Python, at least on Windows, will become a language
> to avoid for dependable Unicode document processing. 3.x needs a proper
> Unicode implementation that works for all strings on all builds.

I don't think that using UTF-16 with surrogate pairs is really a big 
problem. A lot of work has been done to hide this. For example, 
repr(chr(0x10ffff)) now displays '\U0010ffff' instead of two characters. 
Ezio fixed recently str.is*() methods in Python 3.2+.

For len(str): it's a known problem, but if you really care about the number 
of *characters* and not the number of UTF-16 units, it's easy to 
implement your own character_length() function. len(str) gives the number 
of UTF-16 units instead of the number of characters for a simple reason: 
it's faster: O(1), whereas character_length() is O(n).
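
A minimal sketch of such a helper on a narrow build, where indexing a
string yields UTF-16 code units (character_length is a hypothetical name):

def character_length(s):
    # O(n): count code points instead of UTF-16 code units by
    # counting each high+low surrogate pair as a single character.
    i = n = 0
    while i < len(s):
        if ('\ud800' <= s[i] <= '\udbff' and i + 1 < len(s)
                and '\udc00' <= s[i + 1] <= '\udfff'):
            i += 2   # surrogate pair: one non-BMP character
        else:
            i += 1   # ordinary BMP code unit
        n += 1
    return n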

> utf16.py, attached to http://bugs.python.org/issue12729
> prototypes a different solution than the PEP for the above problems for
> the 'mostly BMP' case. I will discuss it in a different post.

Yeah, you can work around UTF-16 limits using O(n) algorithms.

PEP-393 provides support of the full Unicode charset (U+0000-U+10FFFF) 
on all platforms with a small memory footprint and only O(1) functions.

Note: Java and the Qt library also use UTF-16 strings and have exactly 
the same "limitations" for str[n] and len(str).

Victor

From martin at v.loewis.de  Wed Aug 24 19:50:13 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 24 Aug 2011 19:50:13 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com>	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4E5539D5.60500@v.loewis.de>

>  > PEP 393 abolishes narrow builds as we now know them and changes
>  > semantics. I was answering a complaint about that change. If you do
>  > not like the PEP, fine.
> 
> No, I do like the PEP.  However, it is only a step, a rather
> conservative one in some ways, toward conformance to the Unicode
> character model.

I'd like to point out that the improved compatibility is only a side
effect, not the primary objective of the PEP. The primary objective
is the reduction in memory usage. (any changes in runtime are also
side effects, and it's not really clear yet whether you get speedups
or slowdowns on average, or no effect).

>  > Given that 3.0 unicode (string) objects are defined as Unicode character 
>  > strings, I do not see the opposition.
> 
> I think they're not, I think they're defined as Unicode code unit
> arrays, and that the documentation is in error.

That's just a description of the implementation, and not part of the
language, though. My understanding is that the "abstract Python language
definition" considers this aspect implementation-defined: PyPy,
Jython, IronPython etc. would be free to do things differently
(and I understand that there are plans to do PEP-393 style Unicode
 objects in PyPy).

> Martin has long claimed that the fact that I/O is done in terms of
> UTF-16 means that the internal representation is UTF-16, so I could be
> wrong.  But when issues of slicing, len() values and so on have come
> up in the past, Guido has always said "no, there will be no change in
> semantics of builtins here".

Not with these words, though. As I recall, it's rather like (still
with different words) "len() will stay O(1) forever, regardless of
any perceived incorrectness of this choice". An attempt to change
the builtins to introduce higher complexity for the sake of correctness
is what he rejects. I think PEP 393 balances this well, keeping
the O(1) operations in that complexity, while improving the cross-
platform "correctness" of these functions.

Regards,
Martin

From martin at v.loewis.de  Wed Aug 24 19:54:06 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 24 Aug 2011 19:54:06 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <1314206189.3549.2.camel@localhost.localdomain>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com>	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>	<20110824183846.2b392f77@pitrou.net>	<87mxeyzq63.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314206189.3549.2.camel@localhost.localdomain>
Message-ID: <4E553ABE.7@v.loewis.de>

>> Eg, display of characters in the interpreter.
> 
> I don't know why you say it's "done in terms of UTF-16", then. Unicode
> strings are simply encoded to whatever character set is detected as the
> terminal's character set.

I think what he means (and what I meant when I said something similar):
I/O will consider surrogate pairs in the representation when converting
to the output encoding. This is actually relevant only for UTF-8 (I
think), which converts surrogate pairs "correctly". This can be taken
as a proof that Python 3.2 is "UTF-16 aware" (in some places, but not in
others).

With Python's I/O architecture, it is of course not *actually* the I/O
which considers UTF-16, but the codec.
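
An illustrative session on a narrow build (assuming U+10400 is stored as
the surrogate pair U+D801 U+DC00):

>>> pair = '\ud801\udc00'   # narrow-build storage of U+10400
>>> pair.encode('utf-8')    # the codec joins the pair into one sequence
b'\xf0\x90\x90\x80'
>>> '\U00010400'.encode('utf-8')
b'\xf0\x90\x90\x80'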

Regards,
Martin

From victor.stinner at haypocalc.com  Wed Aug 24 20:00:45 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 24 Aug 2011 20:00:45 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E54C2F2.2090606@g.nevcal.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<j31hl7$dp5$1@dough.gmane.org>
	<4E54AEC8.7040702@g.nevcal.com>	<4E54B3CC.9040900@v.loewis.de>
	<4E54C2F2.2090606@g.nevcal.com>
Message-ID: <4E553C4D.4060104@haypocalc.com>

Le 24/08/2011 11:22, Glenn Linderman a écrit :
>>> c) mostly ASCII (utf8) with clever indexing/caching to be efficient
>>> d) UTF-8 with clever indexing/caching to be efficient
>> I see neither a need nor a means to consider these.
>
> The discussion about "mostly ASCII" strings seems convincing that there
> could be a significant space savings if such were implemented.

Antoine's optimization in the UTF-8 decoder has been removed. It doesn't 
change the memory footprint; it just makes creating the Unicode object slower.

When you decode an UTF-8 string:

  - "abc" string uses "latin1" (8 bits) units
  - "a?" string uses "latin1" (8 bits) units <= cool!
  - "a?" string uses UCS2 (16 bits) units
  - "a\U0010FFFF" string uses UCS4 (32 bits) units

Victor

From martin at v.loewis.de  Wed Aug 24 20:15:24 2011
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 24 Aug 2011 20:15:24 +0200
Subject: [Python-Dev] PEP 393 review
Message-ID: <4E553FBC.7080501@v.loewis.de>

Guido has agreed to eventually pronounce on PEP 393. Before that can
happen, I'd like to collect feedback on it. There have been a number
of voices supporting the PEP in principle, so I'm now interested in
comments in the following areas:

- objections in principle. I'll list them in the PEP.
- issues to be considered (unclarities, bugs, limitations, ...)
- conditions you would like to pose on the implementation before
  acceptance. I'll see which of these can be resolved, and list
  the ones that remain open.

Regards,
Martin

From solipsis at pitrou.net  Wed Aug 24 20:32:28 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 24 Aug 2011 20:32:28 +0200
Subject: [Python-Dev] PEP 393 review
References: <4E553FBC.7080501@v.loewis.de>
Message-ID: <20110824203228.3e00874d@pitrou.net>

On Wed, 24 Aug 2011 20:15:24 +0200
"Martin v. L?wis" <martin at v.loewis.de> wrote:
> - issues to be considered (unclarities, bugs, limitations, ...)

With this PEP, the unicode object overhead grows to 10 pointer-sized
words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine.
Does it have any adverse effects?

Are there any plans to make instantiation of small strings fast enough?
Or is it already as fast as it should be?

When interfacing with the Win32 "wide" APIs, what is the recommended
way to get the required LPCWSTR?

Will the format codes returning a Py_UNICODE pointer with
PyArg_ParseTuple be deprecated?

Do you think the wstr representation could be removed in some future
version of Python?

Is PyUnicode_Ready() necessary for all unicode objects, or only those
allocated through the legacy API?

"The Py_Unicode representation is not instantaneously available": you
mean the Py_UNICODE representation?

> - conditions you would like to pose on the implementation before
>   acceptance. I'll see which of these can be resolved, and list
>   the ones that remain open.

That it doesn't significantly slow down benchmarks such as stringbench
and iobench.

Regards

Antoine.



From nad at acm.org  Wed Aug 24 20:37:20 2011
From: nad at acm.org (Ned Deily)
Date: Wed, 24 Aug 2011 11:37:20 -0700
Subject: [Python-Dev] sendmsg/recvmsg on Mac OS X
References: <CADiSq7ehQpDPHztVRX=xsj7_k0mE4ZEBekVwX9iks6c-eezpmg@mail.gmail.com>
	<CAH_1eM1k6rsqQfNF3T1Sn0RcKwh_B_vDOzw0cpjMe2Z1YozjrA@mail.gmail.com>
	<20110824184927.2697b0af@pitrou.net>
Message-ID: <nad-0A7792.11372024082011@news.gmane.org>

In article <20110824184927.2697b0af at pitrou.net>,
 Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Wed, 24 Aug 2011 15:31:50 +0200
> Charles-François Natali <neologix at free.fr> wrote:
> > > The buildbots are complaining about some of tests for the new
> > > socket.sendmsg/recvmsg added by issue #6560 for *nix platforms that
> > > provide CMSG_LEN.
> > 
> > Looks like kernel bugs:
> > http://developer.apple.com/library/mac/#qa/qa1541/_index.html
> > 
> > """
> > Yes. Mac OS X 10.5 fixes a number of kernel bugs related to descriptor 
> > passing
> > [...]
> > Avoid passing two or more descriptors back-to-back.
> > """
> 
> But Snow Leopard, where these failures occur, is OS X 10.6.

But chances are the build is using the default 10.4 ABI.  Adding 
MACOSX_DEPLOYMENT_TARGET=10.6 as an env variable to ./configure may fix 
it.  There is an open issue to change configure to use better defaults 
for this.  (I'm right in the middle of reconfiguring my development 
systems so I can't test it myself immediately but I'll report back 
shortly.)

-- 
 Ned Deily,
 nad at acm.org


From tjreedy at udel.edu  Wed Aug 24 20:46:21 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 24 Aug 2011 14:46:21 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E5539D5.60500@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com>	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E5539D5.60500@v.loewis.de>
Message-ID: <j33gut$k6h$1@dough.gmane.org>

On 8/24/2011 1:50 PM, "Martin v. Löwis" wrote:

> I'd like to point out that the improved compatibility is only a side
> effect, not the primary objective of the PEP.

Then why does the Rationale start with "on systems only supporting 
UTF-16, users complain that non-BMP characters are not properly supported."?

A Windows user can only solve this problem by switching to *nix.

> The primary objective is the reduction in memory usage.

On average (perhaps). As I understand the PEP, for some strings, Windows 
users will see a doubling of memory usage. Statistically, that doubling 
is probably more likely in longer texts. Ascii-only Python code and 
other limited-to-ascii text will benefit. Typical English business 
documents will see no change as they often have proper non-ascii quotes 
and occasional accented characters, trademark symbols, and other things.

I think you have the objectives backwards. Adding memory is a lot easier 
than switching OSes.

-- 
Terry Jan Reedy



From solipsis at pitrou.net  Wed Aug 24 20:50:47 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 24 Aug 2011 20:50:47 +0200
Subject: [Python-Dev] sendmsg/recvmsg on Mac OS X
References: <CADiSq7ehQpDPHztVRX=xsj7_k0mE4ZEBekVwX9iks6c-eezpmg@mail.gmail.com>
	<CAH_1eM1k6rsqQfNF3T1Sn0RcKwh_B_vDOzw0cpjMe2Z1YozjrA@mail.gmail.com>
	<20110824184927.2697b0af@pitrou.net>
	<nad-0A7792.11372024082011@news.gmane.org>
Message-ID: <20110824205047.6be49525@pitrou.net>

On Wed, 24 Aug 2011 11:37:20 -0700
Ned Deily <nad at acm.org> wrote:

> In article <20110824184927.2697b0af at pitrou.net>,
>  Antoine Pitrou <solipsis at pitrou.net> wrote:
> > On Wed, 24 Aug 2011 15:31:50 +0200
> > Charles-François Natali <neologix at free.fr> wrote:
> > > > The buildbots are complaining about some of tests for the new
> > > > socket.sendmsg/recvmsg added by issue #6560 for *nix platforms that
> > > > provide CMSG_LEN.
> > > 
> > > Looks like kernel bugs:
> > > http://developer.apple.com/library/mac/#qa/qa1541/_index.html
> > > 
> > > """
> > > Yes. Mac OS X 10.5 fixes a number of kernel bugs related to descriptor 
> > > passing
> > > [...]
> > > Avoid passing two or more descriptors back-to-back.
> > > """
> > 
> > But Snow Leopard, where these failures occur, is OS X 10.6.
> 
> But chances are the build is using the default 10.4 ABI.  Adding 
> MACOSX_DEPLOYMENT_TARGET=10.6 as an env variable to ./configure may fix 
> it.

Does the ABI affect kernel bugs?

Regards

Antoine.



From v+python at g.nevcal.com  Wed Aug 24 20:52:51 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 24 Aug 2011 11:52:51 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j3377b$f21$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>
	<j31hl7$dp5$1@dough.gmane.org>
	<CADiSq7eGo2C-GTHcSbJWwtu3O19PkdsFNpMfY29u4g+e==0=PA@mail.gmail.com>
	<j3377b$f21$1@dough.gmane.org>
Message-ID: <4E554883.5020908@g.nevcal.com>

On 8/24/2011 9:00 AM, Stefan Behnel wrote:
> Nick Coghlan, 24.08.2011 15:06:
>> On Wed, Aug 24, 2011 at 10:46 AM, Terry Reedy wrote:
>>> In utf16.py, attached to http://bugs.python.org/issue12729
>>> I propose for consideration a prototype of a different solution to the 
>>> 'mostly
>>> BMP chars, few non-BMP chars' case. Rather than expand every 
>>> character from
>>> 2 bytes to 4, attach an array cpdex of character (ie code point, not 
>>> code
>>> unit) indexes. Then for indexing and slicing, the correction is simple,
>>> simpler than I first expected:
>>>   code-unit-index = char-index + bisect.bisect_left(cpdex, char_index)
>>> where code-unit-index is the adjusted index into the full underlying
>>> double-byte array. This adds a time penalty of log2(len(cpdex)), but 
>>> avoids
>>> most of the space penalty and the consequent time penalty of moving 
>>> more
>>> bytes around and increasing cache misses.
>>
>> Interesting idea, but putting on my C programmer hat, I say -1.
>>
>> Non-uniform cell size = not a C array = standard C array manipulation
>> idioms don't work = pain (no matter how simple the index correction
>> happens to be).
>>
>> The nice thing about PEP 383 is that it gives us the smallest storage
>> array that is both an ordinary C array and has sufficiently large
>> individual elements to handle every character in the string.
>
> +1 

Yes, this sounds like a nice benefit, but the problem is it is false.  
The correct statement would be:

The nice thing about PEP 383 is that it gives us the smallest storage
array that is both an ordinary C array and has sufficiently large
individual elements to handle every Unicode codepoint in the string.

As Tom eloquently describes in the referenced issue (is Tom ever 
non-eloquent?), not all characters can be represented in a single codepoint.

It seems there are three concepts in Unicode, code units, codepoints, 
and characters, none of which are equivalent (and the first of which 
varies according to the encoding).  It also seems (to me) that Unicode 
has failed in its original premise, of being an easy way to handle "big 
char" for "all languages" with fixed size elements, but it is not clear 
that its original premise is achievable regardless of the size of "big 
char", when mixed directionality is desired, and it seems that support 
of some single languages requires mixed directionality, not to mention 
mixed language support.

Given the required variability of character size in all presently 
Unicode defined encodings, I tend to agree with Tom that UTF-8, together 
with some technique of translating character index to code unit offset, 
may provide the best overall space utilization, and adequate CPU 
efficiency.  On the other hand, there are large subsets of applications 
that simply do not require support for bidirectional text or composed 
characters, and for those that do not, it remains to be seen if the 
price to be paid for supporting those features is too high a price for 
such applications. So far, we don't have implementations to benchmark to 
figure that out!

What does this mean for Python?  Well, if Python is willing to limit its 
support for applications to the subset for which the "big char" solution 
is sufficient, then PEP 393 provides a way to do that, which looks to be 
pretty effective for reducing memory consumption for those applications 
that use short strings most of which can be classified by content into 
the 1 byte or 2 byte representations.  Applications that support long 
strings are more likely to be bitten by the occasional "outlier" character 
that is longer than the average character, doubling or quadrupling the 
space needed to represent such strings, and eliminating a significant 
portion of the space savings the PEP is providing for other 
applications.  Benchmarks may or may not fully reflect the actual 
requirements of all applications, so conclusions based on benchmarking 
can easily be blind-sided by the realities of other applications, unless 
the benchmarks are carefully constructed.

It is possible that the ideas in PEP 393, with its support for multiple 
underlying representations, could be the basis for some more complex 
representations that would better support characters rather than only 
supporting code points, but Martin has stated he is not open to 
additional representations, so the PEP itself cannot be that basis 
(although with care which may or may not be taken in the implementation 
of the PEP, the implementation may still provide that basis).

From cf.natali at gmail.com  Wed Aug 24 21:02:59 2011
From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Wed, 24 Aug 2011 21:02:59 +0200
Subject: [Python-Dev] sendmsg/recvmsg on Mac OS X
In-Reply-To: <20110824184927.2697b0af@pitrou.net>
References: <CADiSq7ehQpDPHztVRX=xsj7_k0mE4ZEBekVwX9iks6c-eezpmg@mail.gmail.com>
	<CAH_1eM1k6rsqQfNF3T1Sn0RcKwh_B_vDOzw0cpjMe2Z1YozjrA@mail.gmail.com>
	<20110824184927.2697b0af@pitrou.net>
Message-ID: <CAH_1eM30T-8g9UBdPRuMkSL_YiScLPuiFFz32Z4W0y1PcJj4pA@mail.gmail.com>

> But Snow Leopard, where these failures occur, is OS X 10.6.

*sighs*
It still looks like a kernel/libc bug to me: AFAICT, both the code and
the tests are correct.
And apparently, there are still issues pertaining to FD passing on
10.5 (and maybe later, I couldn't find a public access to their bug
tracker):
http://lists.apple.com/archives/Darwin-dev/2008/Feb/msg00033.html

Anyway, if someone with a recent OS X release could run test_socket,
it would probably help. Follow ups to http://bugs.python.org/issue6560

From guido at python.org  Wed Aug 24 21:34:34 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 24 Aug 2011 12:34:34 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E554883.5020908@g.nevcal.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de> <j31hl7$dp5$1@dough.gmane.org>
	<CADiSq7eGo2C-GTHcSbJWwtu3O19PkdsFNpMfY29u4g+e==0=PA@mail.gmail.com>
	<j3377b$f21$1@dough.gmane.org> <4E554883.5020908@g.nevcal.com>
Message-ID: <CAP7+vJ+oyyyUXgZUmk0i1AAm-btB_uew--9AOGptuuQPoERvXg@mail.gmail.com>

On Wed, Aug 24, 2011 at 11:52 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:
> On 8/24/2011 9:00 AM, Stefan Behnel wrote:
>
> Nick Coghlan, 24.08.2011 15:06:
>
> On Wed, Aug 24, 2011 at 10:46 AM, Terry Reedy wrote:
>
> In utf16.py, attached to http://bugs.python.org/issue12729
> I propose for consideration a prototype of a different solution to the 'mostly
> BMP chars, few non-BMP chars' case. Rather than expand every character from
> 2 bytes to 4, attach an array cpdex of character (ie code point, not code
> unit) indexes. Then for indexing and slicing, the correction is simple,
> simpler than I first expected:
>   code-unit-index = char-index + bisect.bisect_left(cpdex, char_index)
> where code-unit-index is the adjusted index into the full underlying
> double-byte array. This adds a time penalty of log2(len(cpdex)), but avoids
> most of the space penalty and the consequent time penalty of moving more
> bytes around and increasing cache misses.
>
> Interesting idea, but putting on my C programmer hat, I say -1.
>
> Non-uniform cell size = not a C array = standard C array manipulation
> idioms don't work = pain (no matter how simple the index correction
> happens to be).
>
> The nice thing about PEP 383 is that it gives us the smallest storage
> array that is both an ordinary C array and has sufficiently large
> individual elements to handle every character in the string.
>
> +1
>
> Yes, this sounds like a nice benefit, but the problem is it is false.  The
> correct statement would be:
>
>   The nice thing about PEP 383 is that it gives us the smallest storage
>   array that is both an ordinary C array and has sufficiently large
>   individual elements to handle every Unicode codepoint in the string.

(PEP 393, I presume. :-)

> As Tom eloquently describes in the referenced issue (is Tom ever
> non-eloquent?), not all characters can be represented in a single codepoint.

But this is also besides the point (except insofar where we have to
remind ourselves not to confuse the two in docs).

> It seems there are three concepts in Unicode, code units, codepoints, and
> characters, none of which are equivalent (and the first of which varies
> according to the encoding). It also seems (to me) that Unicode has failed
> in its original premise, of being an easy way to handle "big char" for "all
> languages" with fixed size elements, but it is not clear that its original
> premise is achievable regardless of the size of "big char", when mixed
> directionality is desired, and it seems that support of some single
> languages requires mixed directionality, not to mention mixed language
> support.

I see nothing wrong with having the language's fundamental data types
(i.e., the unicode object, and even the re module) to be defined in
terms of codepoints, not characters, and I see nothing wrong with
len() returning the number of codepoints (as long as it is advertised
as such). After all UTF-8 also defines an encoding for a sequence of
code points. Characters that require two or more codepoints are not
represented special in UTF-8 -- they are represented as two or more
encoded codepoints. The added requirement that UTF-8 must only be used
to represent valid characters is just that -- it doesn't affect how
strings are encoded, just what is considered valid at a higher level.
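
For example (an illustrative session):

>>> s = 'e\u0301'      # one user-perceived character: e + combining acute
>>> len(s)             # len() counts code points
2
>>> s.encode('utf-8')  # two code points, each encoded independently
b'e\xcc\x81'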

> Given the required variability of character size in all presently Unicode
> defined encodings, I tend to agree with Tom that UTF-8, together with some
> technique of translating character index to code unit offset, may provide
> the best overall space utilization, and adequate CPU efficiency.

There is no doubt that UTF-8 is the most space efficient. I just don't
think it is worth giving up O(1) indexing of codepoints -- it would
change programmers' expectations too much.

OTOH I am sold on getting rid of the added complexities of "narrow
builds" where not even all codepoints can be represented without using
surrogate pairs (i.e. two code units per codepoint) and indexing uses
code units instead of codepoints. I think this is an area where PEP
393 has a huge advantage: users can get rid of their exceptions for
narrow builds.

> On the
> other hand, there are large subsets of applications that simply do not
> require support for bidirectional text or composed characters, and for those
> that do not, it remains to be seen if the price to be paid for supporting
> those features is too high a price for such applications. So far, we don't
> have implementations to benchmark to figure that out!

I think you are saying that many apps can ignore the distinction
between codepoints and characters. Given the complexity of bidi
rendering and normalization (which will always remain an issue) I
agree; this is much less likely to be a burden than the narrow-build
issues with code units vs. codepoints.

What should the stdlib do? It should try to skirt the issue where it
can (using the garbage-in-garbage-out principle) and advertise what it
supports where there is a difference. I don't see why all the stdlib
should be made aware of multi-codepoint-characters and other bidi
requirements, but it should be clear to the user who has such
requirements which stdlib operations they can safely use.

> What does this mean for Python?  Well, if Python is willing to limit its
> support for applications to the subset for which the "big char" solution
> is sufficient, then PEP 393 provides a way to do that, which looks to be pretty
> effective for reducing memory consumption for those applications that use
> short strings most of which can be classified by content into the 1 byte or
> 2 byte representations.? Applications that support long strings are more
> likely to bitten by the occasional "outlier" character that is longer than
> the average character, doubling or quadrupling the space needed to represent
> such strings, and eliminating a significant portion of the space savings the
> PEP is providing for other applications.

This seems more of an intuition than a fact. I could easily imagine
the facts being that even for large strings, usually either there are
no outliers, or there is a significant number of outliers. (E.g. Tom
Christiansen's OSCON preso falls in the latter category :-).

As long as it *works* I don't really mind that there are some extreme
cases that are slow. You'll always have that.

> Benchmarks may or may not fully
> reflect the actual requirements of all applications, so conclusions based on
> benchmarking can easily be blind-sided by the realities of other applications,
> unless the benchmarks are carefully constructed.

Yeah, it's a learning process.

> It is possible that the ideas in PEP 393, with its support for multiple
> underlying representations, could be the basis for some more complex
> representations that would better support characters rather than only
> supporting code points, but Martin has stated he is not open to additional
> representations, so the PEP itself cannot be that basis (although with care
> which may or may not be taken in the implementation of the PEP, the
> implementation may still provide that basis).

There is always the possibility of representations that are defined
purely by userland code and can only be manipulated by that specific
code. But expecting C extensions to support new representations that
haven't been defined yet sounds like a bad idea.

-- 
--Guido van Rossum (python.org/~guido)

From tjreedy at udel.edu  Wed Aug 24 21:55:09 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 24 Aug 2011 15:55:09 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <j33kvu$f9d$1@dough.gmane.org>

On 8/24/2011 12:34 PM, Stephen J. Turnbull wrote:
> Terry Reedy writes:
>
>   >  Excuse me for believing the fine 3.2 manual that says
>   >  "Strings contain Unicode characters."
>
> The manual is wrong, then, subject to a pronouncement to the contrary,

Please suggest a re-wording then, as it is a bug for doc and behavior to 
disagree.

>   >  For the purpose of my sentence, the same thing in that code points
>   >  correspond to characters,
>
> Not in Unicode, they do not.  By definition, a small number of code
> points (eg, U+FFFF) *never* did and *never* will correspond to
> characters.

On computers, characters are represented by code points. What about the 
other way around? http://www.unicode.org/glossary/#C says
code point:
1) i in range(0x110000) <broad definition>
2) "A value, or position, for a character" <narrow definition>
(To muddy the waters more, 'character' has multiple definitions also.)
You are using 1), I am using 2) ;-(.

>   >  Any narrow build string with even 1 non-BMP char violates the
>   >  standard.
>
> Yup.  That's by design.
[...]
> Sure.  Nevertheless, practicality beat purity long ago, and that
> decision has never been rescinded AFAIK.

I think you have it backwards. I see the current situation as the purity 
of the C code beating the practicality for the user of getting right 
answers.

> The thing is, that 90% of applications are not really going to care
> about full conformance to the Unicode standard.

I remember when Intel argued that 99% of applications were not going to 
be affected when the math coprocessor in its then new chips occasionally 
gave 'non-standard' answers with certain divisors.

>   >  Currently, the meaning of Python code differs on narrow versus wide
>   >  build, and in a way that few users would expect or want.
>
> Let them become developers, then, and show us how to do it better.

I posted a proposal with a link to a prototype implementation in Python. 
It pretty well solves the problem of narrow builds acting different from 
wide builds with respect to the basic operations of len(), iterations, 
indexing, and slicing.

> No, I do like the PEP.  However, it is only a step, a rather
> conservative one in some ways, toward conformance to the Unicode
> character model.  In particular, it does nothing to resolve the fact
> that len() will give different answers for character count depending
> on normalization, and that slicing and indexing will allow you to cut
> characters in half (even in NFC, since not all composed characters
> have fully composed forms).

I believe my scheme could be extended to solve that also. It would 
require more pre-processing and more knowledge than I currently have of 
normalization. I have the impression that the grapheme problem goes 
further than just normalization.

-- 
Terry Jan Reedy


From nad at acm.org  Wed Aug 24 22:15:27 2011
From: nad at acm.org (Ned Deily)
Date: Wed, 24 Aug 2011 13:15:27 -0700
Subject: [Python-Dev] sendmsg/recvmsg on Mac OS X
References: <CADiSq7ehQpDPHztVRX=xsj7_k0mE4ZEBekVwX9iks6c-eezpmg@mail.gmail.com>
	<CAH_1eM1k6rsqQfNF3T1Sn0RcKwh_B_vDOzw0cpjMe2Z1YozjrA@mail.gmail.com>
	<20110824184927.2697b0af@pitrou.net>
	<CAH_1eM30T-8g9UBdPRuMkSL_YiScLPuiFFz32Z4W0y1PcJj4pA@mail.gmail.com>
Message-ID: <nad-EFF427.13152724082011@news.gmane.org>

In article 
<CAH_1eM30T-8g9UBdPRuMkSL_YiScLPuiFFz32Z4W0y1PcJj4pA at mail.gmail.com>,
 Charles-Francois Natali <cf.natali at gmail.com> wrote:
> > But Snow Leopard, where these failures occur, is OS X 10.6.
> 
> *sighs*
> It still looks like a kernel/libc bug to me: AFAICT, both the code and
> the tests are correct.
> And apparently, there are still issues pertaining to FD passing on
> 10.5 (and maybe later, I couldn't find a public access to their bug
> tracker):
> http://lists.apple.com/archives/Darwin-dev/2008/Feb/msg00033.html
> 
> Anyway, if someone with a recent OS X release could run test_socket,
> it would probably help. Follow ups to http://bugs.python.org/issue6560

I was able to do a quick test on 10.7 Lion and the 8 test failures still 
occur regardless of deployment target.  Sorry, I don't have time to 
further investigate.

-- 
 Ned Deily,
 nad at acm.org


From nad at acm.org  Wed Aug 24 22:18:20 2011
From: nad at acm.org (Ned Deily)
Date: Wed, 24 Aug 2011 13:18:20 -0700
Subject: [Python-Dev] sendmsg/recvmsg on Mac OS X
References: <CADiSq7ehQpDPHztVRX=xsj7_k0mE4ZEBekVwX9iks6c-eezpmg@mail.gmail.com>
	<CAH_1eM1k6rsqQfNF3T1Sn0RcKwh_B_vDOzw0cpjMe2Z1YozjrA@mail.gmail.com>
	<20110824184927.2697b0af@pitrou.net>
	<nad-0A7792.11372024082011@news.gmane.org>
	<20110824205047.6be49525@pitrou.net>
Message-ID: <nad-3B0A4C.13182024082011@news.gmane.org>

In article <20110824205047.6be49525 at pitrou.net>,
 Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Wed, 24 Aug 2011 11:37:20 -0700
> Ned Deily <nad at acm.org> wrote:
> > In article <20110824184927.2697b0af at pitrou.net>,
> >  Antoine Pitrou <solipsis at pitrou.net> wrote:
> > > On Wed, 24 Aug 2011 15:31:50 +0200
> > > Charles-François Natali <neologix at free.fr> wrote:
> > > > > The buildbots are complaining about some of tests for the new
> > > > > socket.sendmsg/recvmsg added by issue #6560 for *nix platforms that
> > > > > provide CMSG_LEN.
> > > > 
> > > > Looks like kernel bugs:
> > > > http://developer.apple.com/library/mac/#qa/qa1541/_index.html
> > > > 
> > > > """
> > > > Yes. Mac OS X 10.5 fixes a number of kernel bugs related to descriptor 
> > > > passing
> > > > [...]
> > > > Avoid passing two or more descriptors back-to-back.
> > > > """
> > > 
> > > But Snow Leopard, where these failures occur, is OS X 10.6.
> > 
> > But chances are the build is using the default 10.4 ABI.  Adding 
> > MACOSX_DEPLOYMENT_TARGET=10.6 as an env variable to ./configure may fix 
> > it.
> 
> Does the ABI affect kernel bugs?

If it's more of a "libc" sort of bug (i.e. somewhere below the app 
layer), it could.  But, unfortunately, that doesn't seem to be the case 
here.

-- 
 Ned Deily,
 nad at acm.org


From tjreedy at udel.edu  Wed Aug 24 22:37:21 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 24 Aug 2011 16:37:21 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E5538B7.8010709@haypocalc.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<4E5538B7.8010709@haypocalc.com>
Message-ID: <j33nf2$v69$1@dough.gmane.org>

On 8/24/2011 1:45 PM, Victor Stinner wrote:
> Le 24/08/2011 02:46, Terry Reedy a écrit :

> I don't think that using UTF-16 with surrogate pairs is really a big
> problem. A lot of work has been done to hide this. For example,
> repr(chr(0x10ffff)) now displays '\U0010ffff' instead of two characters.
> Ezio fixed recently str.is*() methods in Python 3.2+.

I greatly appreciate that he did. The lower(), upper(), and title() 
methods apparently are not fixed yet, as the corresponding new tests are 
currently skipped for narrow builds.

> For len(str): its a known problem, but if you really care of the number
> of *character* and not the number of UTF-16 units, it's easy to
> implement your own character_length() function. len(str) gives the
> UTF-16 units instead of the number of character for a simple reason:
> it's faster: O(1), whereas character_length() is O(n).

It is O(1) after a one-time O(n) preprocessing, which is the same time 
order as creating the string in the first place.

Anyway, I think the most important deficiency is with iteration:

 >>> from unicodedata import name
 >>> name('\U0001043c')
'DESERET SMALL LETTER DEE'
 >>> for c in 'abc\U0001043c':
	print(name(c))
	
LATIN SMALL LETTER A
LATIN SMALL LETTER B
LATIN SMALL LETTER C
Traceback (most recent call last):
   File "<pyshell#9>", line 2, in <module>
     print(name(c))
ValueError: no such name

This would work on wide builds but does not here (win7) because narrow 
build iteration produces a naked non-character surrogate code unit that 
has no specific entry in the Unicode Character Database.

I believe that most new people who read "Strings contain Unicode 
characters." would expect string iteration to always produce the Unicode 
characters that they put in the string. The extra time per char needed 
to produce the surrogate pair that represents the character entered is 
O(1).

>> utf16.py, attached to http://bugs.python.org/issue12729
>> prototypes a different solution than the PEP for the above problems for
>> the 'mostly BMP' case. I will discuss it in a different post.
>
> Yeah, you can workaround UTF-16 limits using O(n) algorithms.

I presented O(log(number of non-BMP chars)) algorithms for indexing and 
slicing. For the mostly BMP case, that is hugely better than O(n).
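
The heart of the indexing correction, restated as a sketch from the
utf16.py formula quoted earlier in the thread (cpdex holds the code point
indexes of the non-BMP characters, in increasing order):

import bisect

def code_unit_index(cpdex, char_index):
    # Each non-BMP character before char_index takes one extra UTF-16
    # code unit, and bisect gives that count in O(log len(cpdex)).
    return char_index + bisect.bisect_left(cpdex, char_index)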

> PEP-393 provides support of the full Unicode charset (U+0000-U+10FFFF)
> on all platforms with a small memory footprint and only O(1) functions.

For Windows users, I believe it will nearly double the memory footprint 
if there are any non-BMP chars. On my new machine, I should not mind 
that in exchange for correct behavior.

-- 
Terry Jan Reedy



From ethan at stoneleaf.us  Wed Aug 24 23:26:54 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 24 Aug 2011 14:26:54 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j33nf2$v69$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>	<4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>	<4E5538B7.8010709@haypocalc.com>
	<j33nf2$v69$1@dough.gmane.org>
Message-ID: <4E556C9E.7070605@stoneleaf.us>

Terry Reedy wrote:
>> PEP-393 provides support of the full Unicode charset (U+0000-U+10FFFF)
>> on all platforms with a small memory footprint and only O(1) functions.
> 
> For Windows users, I believe it will nearly double the memory footprint 
> if there are any non-BMP chars. On my new machine, I should not mind 
> that in exchange for correct behavior.
> 

+1

Heck, I wouldn't mind it on my /old/ machine in exchange for correct 
behavior!

~Ethan~

From victor.stinner at haypocalc.com  Wed Aug 24 23:10:32 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 24 Aug 2011 23:10:32 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E554883.5020908@g.nevcal.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j3377b$f21$1@dough.gmane.org> <4E554883.5020908@g.nevcal.com>
Message-ID: <201108242310.32446.victor.stinner@haypocalc.com>

Le mercredi 24 août 2011 20:52:51, Glenn Linderman a écrit :
> Given the required variability of character size in all presently
> Unicode defined encodings, I tend to agree with Tom that UTF-8, together
> with some technique of translating character index to code unit offset,
> may provide the best overall space utilization, and adequate CPU
> efficiency.

UTF-8 can use more space than latin1 or UCS2:
>>> text="abc"; len(text.encode("latin1")), len(text.encode("utf8"))
(3, 3)
>>> text="???"; len(text.encode("latin1")), len(text.encode("utf8"))
(3, 6)
>>> text="???"; len(text.encode("utf-16-le")), len(text.encode("utf8"))
(6, 9)
>>> text="??"; len(text.encode("utf-16-le")), len(text.encode("utf8"))
(4, 6)

UTF-8 uses less space than PEP 393 only if you have few non-ASCII characters 
(or few non-BMP characters).

About speed, I guess that O(n) (UTF-8 indexing) is slower than O(1) 
(PEP 393 indexing).
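
A naive sketch of what O(n) UTF-8 indexing means (utf8_index is a
hypothetical name):

def utf8_index(buf, char_index):
    # Walk the bytes, counting lead bytes (anything that is not a
    # 0b10xxxxxx continuation byte) until the wanted code point.
    count = -1
    for pos, byte in enumerate(buf):
        if byte & 0xC0 != 0x80:
            count += 1
            if count == char_index:
                return pos
    raise IndexError(char_index)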

> ...  Applications that support long
> strings are more likely to be bitten by the occasional "outlier" character
> that is longer than the average character, doubling or quadrupling the
> space needed to represent such strings, and eliminating a significant
> portion of the space savings the PEP is providing for other
> applications.

In these worst cases, PEP 393 is not worse than the current 
implementation: it uses just as much memory as Python in wide mode (the mode 
used on Linux and Mac OS X because wchar_t is 32 bits).  But it uses double 
that of Python in narrow mode (Windows).

I agree that UTF-8 is better in these corner cases, but I also bet that most 
Python programs will use less memory and will be faster with PEP 393. You 
can already try the pep-393 branch on your own programs.

> Benchmarks may or may not fully reflect the actual
> requirements of all applications, so conclusions based on benchmarking
> can easily be blind-sided by the realities of other applications, unless
> the benchmarks are carefully constructed.

I used stringbench and "./python -m test test_unicode". I plan to try iobench.

Which other benchmark tool should be used? Should we write a new one?

> It is possible that the ideas in PEP 393, with its support for multiple
> underlying representations, could be the basis for some more complex
> representations that would better support characters rather than only
> supporting code points, ...

I don't think that the *default* Unicode type is the best place for this. The 
base Unicode type has to be *very* efficient.

If you have unusual needs, write your own type. Maybe based on the base type?

Victor


From martin at v.loewis.de  Thu Aug 25 00:02:31 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 25 Aug 2011 00:02:31 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j33nf2$v69$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>	<4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>	<4E5538B7.8010709@haypocalc.com>
	<j33nf2$v69$1@dough.gmane.org>
Message-ID: <4E5574F7.1060605@v.loewis.de>

> For Windows users, I believe it will nearly double the memory footprint
> if there are any non-BMP chars. On my new machine, I should not mind
> that in exchange for correct behavior.

In addition, strings with non-BMP chars are much more rare than strings
with all Latin-1, for which memory usage halves on Windows.
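
An illustrative comparison on a PEP 393 build (sizes are approximate and
platform-dependent):

import sys

latin1 = 'a' * 1000        # stored with 1 byte per character
ucs2 = '\u20ac' * 1000     # Euro sign: needs 2 bytes per character
print(sys.getsizeof(latin1))  # roughly 1000 bytes plus the object header
print(sys.getsizeof(ucs2))    # roughly 2000 bytes plus the object header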

Regards,
Martin

From victor.stinner at haypocalc.com  Thu Aug 25 00:29:19 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 25 Aug 2011 00:29:19 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <20110824203228.3e00874d@pitrou.net>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
Message-ID: <201108250029.19506.victor.stinner@haypocalc.com>

> With this PEP, the unicode object overhead grows to 10 pointer-sized
> words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine.
> Does it have any adverse effects?

For pure ASCII, it might be possible to use a shorter struct:

typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    Py_hash_t hash;
    int state;
    Py_ssize_t wstr_length;
    wchar_t *wstr;
    /* no more utf8_length, utf8, str */
    /* followed by ascii data */
} _PyASCIIObject;
(-2 pointer -1 ssize_t: 56 bytes)

=> "a" is 58 bytes (with utf8 for free, without wchar_t)

For object allocated with the new API, we can use a shorter struct:

typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    Py_hash_t hash;
    int state;
    Py_ssize_t wstr_length;
    wchar_t *wstr;
    Py_ssize_t utf8_length;
    char *utf8;
    /* no more str pointer */
    /* followed by latin1/ucs2/ucs4 data */
} _PyNewUnicodeObject;
(-1 pointer: 72 bytes)

=> "?" is 74 bytes (without utf8 / wchar_t)

For the legacy API:

typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    Py_hash_t hash;
    int state;
    Py_ssize_t wstr_length;
    wchar_t *wstr;
    Py_ssize_t utf8_length;
    char *utf8;
    void *str;
} _PyLegacyUnicodeObject;
(same size: 80 bytes)

=> "a" is 80+2 (2 malloc) bytes (without utf8 / wchar_t)

The current struct:

typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    Py_UNICODE *str;
    Py_hash_t hash;
    int state;
    PyObject *defenc;
} PyUnicodeObject;

=> "a" is 56+2 (2 malloc) bytes (without utf8, with wchar_t if Py_UNICODE is 
wchar_t)

... but the code (maybe only the macros?) and debugging will be more complex.

> Will the format codes returning a Py_UNICODE pointer with
> PyArg_ParseTuple be deprecated?

Because Python 2.x is still dominant and it's already hard enough to port C 
modules, it's not the best moment to deprecate the legacy API (Py_UNICODE*).

> Do you think the wstr representation could be removed in some future
> version of Python?

Conversion to wchar_t* is common, especially on Windows. But I don't know if 
we *have to* cache the result. Is it cached by the way? Or is wstr only used 
when a string is created from Py_UNICODE?

Victor


From v+python at g.nevcal.com  Thu Aug 25 00:29:54 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 24 Aug 2011 15:29:54 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJ+oyyyUXgZUmk0i1AAm-btB_uew--9AOGptuuQPoERvXg@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de> <j31hl7$dp5$1@dough.gmane.org>
	<CADiSq7eGo2C-GTHcSbJWwtu3O19PkdsFNpMfY29u4g+e==0=PA@mail.gmail.com>
	<j3377b$f21$1@dough.gmane.org> <4E554883.5020908@g.nevcal.com>
	<CAP7+vJ+oyyyUXgZUmk0i1AAm-btB_uew--9AOGptuuQPoERvXg@mail.gmail.com>
Message-ID: <4E557B62.7030200@g.nevcal.com>

On 8/24/2011 12:34 PM, Guido van Rossum wrote:
> On Wed, Aug 24, 2011 at 11:52 AM, Glenn Linderman<v+python at g.nevcal.com>  wrote:
>> On 8/24/2011 9:00 AM, Stefan Behnel wrote:
>>
>> Nick Coghlan, 24.08.2011 15:06:
>>
>> On Wed, Aug 24, 2011 at 10:46 AM, Terry Reedy wrote:
>>
>> In utf16.py, attached to http://bugs.python.org/issue12729
>> I propose for consideration a prototype of a different solution to the 'mostly
>> BMP chars, few non-BMP chars' case. Rather than expand every character from
>> 2 bytes to 4, attach an array cpdex of character (ie code point, not code
>> unit) indexes. Then for indexing and slicing, the correction is simple,
>> simpler than I first expected:
>>    code-unit-index = char-index + bisect.bisect_left(cpdex, char_index)
>> where code-unit-index is the adjusted index into the full underlying
>> double-byte array. This adds a time penalty of log2(len(cpdex)), but avoids
>> most of the space penalty and the consequent time penalty of moving more
>> bytes around and increasing cache misses.
>>
>> Interesting idea, but putting on my C programmer hat, I say -1.
>>
>> Non-uniform cell size = not a C array = standard C array manipulation
>> idioms don't work = pain (no matter how simple the index correction
>> happens to be).
>>
>> The nice thing about PEP 383 is that it gives us the smallest storage
>> array that is both an ordinary C array and has sufficiently large
>> individual elements to handle every character in the string.
>>
>> +1
>>
>> Yes, this sounds like a nice benefit, but the problem is it is false.  The
>> correct statement would be:
>>
>>    The nice thing about PEP 383 is that it gives us the smallest storage
>>    array that is both an ordinary C array and has sufficiently large
>>    individual elements to handle every Unicode codepoint in the string.
> (PEP 393, I presume. :-)

This statement might yet be made true :)

>> As Tom eloquently describes in the referenced issue (is Tom ever
>> non-eloquent?), not all characters can be represented in a single codepoint.
> But this is also besides the point (except insofar where we have to
> remind ourselves not to confuse the two in docs).

In the docs, yes, and in programmer's minds (influenced by docs).

>> It seems there are three concepts in Unicode, code units, codepoints, and
>> characters, none of which are equivalent (and the first of which varies
>> according to the encoding). It also seems (to me) that Unicode has failed
>> in its original premise, of being an easy way to handle "big char" for "all
>> languages" with fixed size elements, but it is not clear that its original
>> premise is achievable regardless of the size of "big char", when mixed
>> directionality is desired, and it seems that support of some single
>> languages require mixed directionality, not to mention mixed language
>> support.
> I see nothing wrong with having the language's fundamental data types
> (i.e., the unicode object, and even the re module) to be defined in
> terms of codepoints, not characters, and I see nothing wrong with
> len() returning the number of codepoints (as long as it is advertised
> as such).

Me neither.

> After all UTF-8 also defines an encoding for a sequence of
> code points. Characters that require two or more codepoints are not
> represented special in UTF-8 -- they are represented as two or more
> encoded codepoints. The added requirement that UTF-8 must only be used
> to represent valid characters is just that -- it doesn't affect how
> strings are encoded, just what is considered valid at a higher level.

Yes, this is true.  In one sense, though, since UTF-8-supporting code 
already has to deal with variable length codepoint encoding, support for 
variable length character encoding seems like a minor extension, not 
upsetting any concept of fixed-width optimizations, because such cannot 
be used.

>> Given the required variability of character size in all presently Unicode
>> defined encodings, I tend to agree with Tom that UTF-8, together with some
>> technique of translating character index to code unit offset, may provide
>> the best overall space utilization, and adequate CPU efficiency.
> There is no doubt that UTF-8 is the most space efficient. I just don't
> think it is worth giving up O(1) indexing of codepoints -- it would
> change programmers' expectations too much.

Programmers that have to deal with bidi or composed characters shouldn't 
have such expectations, of course.   But there are many programmers who 
do not, or at least who think they do not, and they can retain their 
O(1) expectations, I suppose, until it bites them.

> OTOH I am sold on getting rid of the added complexities of "narrow
> builds" where not even all codepoints can be represented without using
> surrogate pairs (i.e. two code units per codepoint) and indexing uses
> code units instead of codepoints. I think this is an area where PEP
> 393 has a huge advantage: users can get rid of their exceptions for
> narrow builds.

Yep, the only justification for narrow builds is in interfacing to 
underlying broken OSes that happen to use that encoding... it might be 
slightly more efficient when doing API calls to such an OS.  But most 
interesting programs do much more than I/O.

>> On the
>> other hand, there are large subsets of applications that simply do not
>> require support for bidirectional text or composed characters, and for those
>> that do not, it remains to be seen if the price to be paid for supporting
>> those features is too high a price for such applications. So far, we don't
>> have implementations to benchmark to figure that out!
> I think you are saying that many apps can ignore the distinction
> between codepoints and characters. Given the complexity of bidi
> rendering and normalization (which will always remain an issue) I
> agree; this is much less likely to be a burden than the narrow-build
> issues with code units vs. codepoints.
>
> What should the stdlib do? It should try to skirt the issue where it
> can (using the garbage-in-garbage-out principle) and advertise what it
> supports where there is a difference. I don't see why all the stdlib
> should be made aware of multi-codepoint-characters and other bidi
> requirements, but it should be clear to the user who has such
> requirements which stdlib operations they can safely use.

It would seem helpful if the stdlib could have some support for 
efficient handling of Unicode characters in some representation.  It 
would help address the class of applications that does care.  Adding 
extra support for Unicode character handling sooner rather than later 
could be a performance boost to applications that do care about full 
character support, and I can only see the numbers of such applications 
increasing over time.  Such could be built as a subtype of str, perhaps, 
but if done in Python, there would likely be a significant performance 
hit when going from str to "unicodeCharacterStr".

>> What does this mean for Python?  Well, if Python is willing to limit its
>> support for applications to the subset for which the "big char" solution
>> is sufficient, then PEP 393 provides a way to do that, which looks to be pretty
>> effective for reducing memory consumption for those applications that use
>> short strings most of which can be classified by content into the 1 byte or
>> 2 byte representations.  Applications that support long strings are more
>> likely to be bitten by the occasional "outlier" character that is longer than
>> the average character, doubling or quadrupling the space needed to represent
>> such strings, and eliminating a significant portion of the space savings the
>> PEP is providing for other applications.
> This seems more of an intuition than a fact. I could easily imagine
> the facts being that even for large strings, usually either there are
> no outliers, or there is a significant number of outliers. (E.g. Tom
> Christiansen's OSCON preso falls in the latter category :-).
>
> As long as it *works* I don't really mind that there are some extreme
> cases that are slow. You'll always have that.

Yes, it is intuition, regarding memory consumption.  It is not at all 
clear how the "occasional outlier character" differs from your 
"significant number of outliers".  Tom's presentation certainly was 
regarding bodies of text which varied from ASCII to fully non-ASCII.

The memory characteristics of long string handling would certainly be 
non-intuitive: you can process a file of size N with a particular 
program, but you can't process a smaller file because it has a funny 
character in it, and suddenly you are out of space.
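
A minimal sketch of that effect as it would play out under a PEP 393 
build (sys.getsizeof is real; the exact byte counts are illustrative 
and vary by platform and version):

    import sys

    ascii_only  = 'a' * 1000                 # stays in the 1-byte representation
    one_outlier = 'a' * 999 + '\U00010000'   # one astral code point at the end

    print(sys.getsizeof(ascii_only))   # roughly 1 byte per character
    print(sys.getsizeof(one_outlier))  # roughly 4 bytes per character: the
                                       # single outlier widens the whole string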

>
>> Benchmarks may or may not fully
>> reflect the actual requirements of all applications, so conclusions based on
>> benchmarking can easily be blind-sided by the realities of other applications,
>> unless the benchmarks are carefully constructed.
> Yeah, it's a learning process.
>
>> It is possible that the ideas in PEP 393, with its support for multiple
>> underlying representations, could be the basis for some more complex
>> representations that would better support characters rather than only
>> supporting code points, but Martin has stated he is not open to additional
>> representations, so the PEP itself cannot be that basis (although with care
>> which may or may not be taken in the implementation of the PEP, the
>> implementation may still provide that basis).
> There is always the possibility of representations that are defined
> purely by userland code and can only be manipulated by that specific
> code. But expecting C extensions to support new representations that
> haven't been defined yet sounds like a bad idea.

While they can and should be prototyped in Python for functional 
correctness, I would rather expect such representations to be 
significantly slower in Python than in C.  But that is just intuition 
also.  The PEP makes a nice extension to str representations, but I'm 
not sure it picks the most useful ones: the cases it picks are well 
understood and in use, but they may not be the most effective ones 
(due to the strange memory-consumption characteristics that outliers 
can introduce).  My intuition says that a UTF-8 
representation (or Tom's/Perl's looser utf8) would be a handy 
representation to have.  But maybe it should be a different type than 
str... str8?  I suppose that is -ideas land.

From timothy.c.delaney at gmail.com  Thu Aug 25 00:33:33 2011
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Thu, 25 Aug 2011 08:33:33 +1000
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <201108242310.32446.victor.stinner@haypocalc.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j3377b$f21$1@dough.gmane.org> <4E554883.5020908@g.nevcal.com>
	<201108242310.32446.victor.stinner@haypocalc.com>
Message-ID: <CAN8CLgkxhJ8Hjn202CZ4R0i3b8Q2=CJOwVOOZNZK3PL4iEkKaw@mail.gmail.com>

On 25 August 2011 07:10, Victor Stinner <victor.stinner at haypocalc.com>wrote:

>
> I used stringbench and "./python -m test test_unicode". I plan to try
> iobench.
>
> Which other benchmark tool should be used? Should we write a new one?


I think that the PyPy benchmarks (or at least selected tests such as
slowspitfire) would probably exercise things quite well.

http://speed.pypy.org/about/

Tim Delaney

From guido at python.org  Thu Aug 25 01:28:53 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 24 Aug 2011 16:28:53 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E557B62.7030200@g.nevcal.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de> <j31hl7$dp5$1@dough.gmane.org>
	<CADiSq7eGo2C-GTHcSbJWwtu3O19PkdsFNpMfY29u4g+e==0=PA@mail.gmail.com>
	<j3377b$f21$1@dough.gmane.org> <4E554883.5020908@g.nevcal.com>
	<CAP7+vJ+oyyyUXgZUmk0i1AAm-btB_uew--9AOGptuuQPoERvXg@mail.gmail.com>
	<4E557B62.7030200@g.nevcal.com>
Message-ID: <CAP7+vJJZGvqdwZvUjDbPAEerj4N_jtyA4SDKcAoRX4OD+ze7kg@mail.gmail.com>

On Wed, Aug 24, 2011 at 3:29 PM, Glenn Linderman <v+python at g.nevcal.com> wrote:
> It would seem helpful if the stdlib could have some support for efficient
> handling of Unicode characters in some representation.  It would help
> address the class of applications that does care.

I claim that we have insufficient understanding of their needs to put
anything in the stdlib. Wait and see is a good strategy here.

> Adding extra support for
> Unicode character handling sooner rather than later could be a performance
> boost to applications that do care about full character support, and I can
> only see the number of such applications increasing over time.  Such support
> could be built as a subtype of str, perhaps, but if done in Python, there
> would likely be a significant performance hit when going from str to
> "unicodeCharacterStr".

Sounds like overengineering to me. The right time to add something to
the stdlib is when a large number of apps *currently* need something,
not when you expect that they might need it in the future. (There just
are too many possible futures to plan for them all. YAGNI rules.)

-- 
--Guido van Rossum (python.org/~guido)

From turnbull at sk.tsukuba.ac.jp  Thu Aug 25 02:11:42 2011
From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Thu, 25 Aug 2011 09:11:42 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <1314206189.3549.2.camel@localhost.localdomain>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110824183846.2b392f77@pitrou.net>
	<87mxeyzq63.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314206189.3549.2.camel@localhost.localdomain>
Message-ID: <87mxeyxsch.fsf@uwakimon.sk.tsukuba.ac.jp>

Antoine Pitrou writes:
 > On Thursday, 25 August 2011 at 02:15 +0900, Stephen J. Turnbull wrote:
 > > Antoine Pitrou writes:
 > >  > On Thu, 25 Aug 2011 01:34:17 +0900
 > >  > "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
 > >  > > 
 > >  > > Martin has long claimed that the fact that I/O is done in terms of
 > >  > > UTF-16 means that the internal representation is UTF-16
 > >  > 
 > >  > Which I/O?
 > > 
 > > Eg, display of characters in the interpreter.
 > 
 > I don't know why you say it's "done in terms of UTF-16", then. Unicode
 > strings are simply encoded to whatever character set is detected as the
 > terminal's character set.

But it's not "simple" at the level we're talking about!

Specifically, *in-memory* surrogates are properly respected when doing
the encoding, and therefore such I/O is not UCS-2 or "raw code units".
This treatment is different from sizing and indexing of unicodes,
where surrogates are not treated differently from other code points.



From turnbull at sk.tsukuba.ac.jp  Thu Aug 25 02:31:30 2011
From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Thu, 25 Aug 2011 09:31:30 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j33kvu$f9d$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
Message-ID: <87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>

Terry Reedy writes:

 > Please suggest a re-wording then, as it is a bug for doc and behavior to 
 > disagree.

    Strings contain Unicode code units, which for most purposes can be
    treated as Unicode characters.  However, even as "simple" an
    operation as "s1[0] == s2[0]" cannot be relied upon to give
    Unicode-conforming results.

The second sentence remains true under PEP 393.
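
A minimal sketch of the kind of failure meant here, using canonically 
equivalent NFC/NFD forms (unicodedata is in the stdlib):

    import unicodedata

    s1 = unicodedata.normalize('NFC', 'e\u0301')  # one code point: U+00E9
    s2 = unicodedata.normalize('NFD', 'e\u0301')  # two code points: U+0065 U+0301

    print(s1 == s2)        # False, although both render identically
    print(s1[0] == s2[0])  # False: U+00E9 vs U+0065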

 > >   >  For the purpose of my sentence, the same thing in that code points
 > >   >  correspond to characters,
 > >
 > > Not in Unicode, they do not.  By definition, a small number of code
 > > points (eg, U+FFFF) *never* did and *never* will correspond to
 > > characters.
 > 
 > On computers, characters are represented by code points. What about the 
 > other way around? http://www.unicode.org/glossary/#C says
 > code point:
 > 1) i in range(0x110000) <broad definition>
 > 2) "A value, or position, for a character" <narrow definition>
 > (To muddy the waters more, 'character' has multiple definitions also.)
 > You are using 1), I am using 2) ;-(.

No, you're not.  You are claiming an isomorphism, which Unicode goes
to great trouble to avoid.

 > I think you have it backwards. I see the current situation as the purity 
 > of the C code beating the practicality for the user of getting right 
 > answers.

Sophistry.  "Always getting the right answer" is purity.

 > > The thing is, that 90% of applications are not really going to care
 > > about full conformance to the Unicode standard.
 > 
 > I remember when Intel argued that 99% of applications were not going to 
 > be affected when the math coprocessor in its then new chips occasionally 
 > gave 'non-standard' answers with certain divisors.

In the case of Intel, the people who demanded standard answers did so
for efficiency reasons -- they needed the FPU to DTRT because
implementing FP in software was always going to be too slow.  CPython,
IMO, can afford to trade off because the implementation will
necessarily be in software, and can be added later as a Python or C module.

 > I believe my scheme could be extended to solve [conformance for
 > composing characters] also. It would require more pre-processing
 > and more knowledge than I currently have of normalization. I have
 > the impression that the grapheme problem goes further than just
 > normalization.

Yes and yes.  But now you're talking about database lookups for every
character (to determine if it's a composing character).  Efficiency of
a generic implementation isn't going to happen.

Anyway, in Martin's rephrasing of my (imperfect) memory of Guido's
pronouncement, "indexing is going to be O(1)".  And Nick's point about
non-uniform arrays is telling.  I have 20 years of experience with an
implementation of text as a non-uniform array which presents an array
API, and *everything* needs to be special-cased for efficiency, and
*any* small change can have show-stopping performance implications.

Python can probably do better than Emacs has done due to much better
leadership in this area, but I still think it's better to make full
conformance optional.

From stephen at xemacs.org  Thu Aug 25 02:36:14 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 25 Aug 2011 09:36:14 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJ+oyyyUXgZUmk0i1AAm-btB_uew--9AOGptuuQPoERvXg@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de> <j31hl7$dp5$1@dough.gmane.org>
	<CADiSq7eGo2C-GTHcSbJWwtu3O19PkdsFNpMfY29u4g+e==0=PA@mail.gmail.com>
	<j3377b$f21$1@dough.gmane.org> <4E554883.5020908@g.nevcal.com>
	<CAP7+vJ+oyyyUXgZUmk0i1AAm-btB_uew--9AOGptuuQPoERvXg@mail.gmail.com>
Message-ID: <87k4a2xr7l.fsf@uwakimon.sk.tsukuba.ac.jp>

Guido van Rossum writes:

 > I see nothing wrong with having the language's fundamental data types
 > (i.e., the unicode object, and even the re module) to be defined in
 > terms of codepoints, not characters, and I see nothing wrong with
 > len() returning the number of codepoints (as long as it is advertised
 > as such).

In fact, the Unicode Standard, Version 6, goes farther (to code units):

    2.7  Unicode Strings

    A Unicode string data type is simply an ordered sequence of code
    units. Thus a Unicode 8-bit string is an ordered sequence of 8-bit
    code units, a Unicode 16-bit string is an ordered sequence of
    16-bit code units, and a Unicode 32-bit string is an ordered
    sequence of 32-bit code units. 

    Depending on the programming environment, a Unicode string may or
    may not be required to be in the corresponding Unicode encoding
    form. For example, strings in Java, C#, or ECMAScript are Unicode
    16-bit strings, but are not necessarily well-formed UTF-16
    sequences.

(p. 32).


From guido at python.org  Thu Aug 25 04:29:39 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 24 Aug 2011 19:29:39 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>

On Wed, Aug 24, 2011 at 5:31 PM, Stephen J. Turnbull
<turnbull at sk.tsukuba.ac.jp> wrote:
> Terry Reedy writes:
>
>  > Please suggest a re-wording then, as it is a bug for doc and behavior to
>  > disagree.
>
>    Strings contain Unicode code units, which for most purposes can be
>    treated as Unicode characters.  However, even as "simple" an
>    operation as "s1[0] == s2[0]" cannot be relied upon to give
>    Unicode-conforming results.
>
> The second sentence remains true under PEP 393.

Really? If strings contain code units, that expression compares code
units. What is non-conforming about comparing two code points? They
are just integers.

Seriously, what does Unicode-conforming mean here? It would be better
to specify chapter and verse (e.g. is it a specific thing defined by
the dreaded TR18?)

>  > >   >  For the purpose of my sentence, the same thing in that code points
>  > >   >  correspond to characters,
>  > >
>  > > Not in Unicode, they do not.  By definition, a small number of code
>  > > points (eg, U+FFFF) *never* did and *never* will correspond to
>  > > characters.
>  >
>  > On computers, characters are represented by code points. What about the
>  > other way around? http://www.unicode.org/glossary/#C says
>  > code point:
>  > 1) i in range(0x110000) <broad definition>
>  > 2) "A value, or position, for a character" <narrow definition>
>  > (To muddy the waters more, 'character' has multiple definitions also.)
>  > You are using 1), I am using 2) ;-(.
>
> No, you're not.  You are claiming an isomorphism, which Unicode goes
> to great trouble to avoid.

I don't know that we will be able to educate our users to the point
where they will use code unit, code point, character, glyph, character
set, encoding, and other technical terms correctly. TBH even though
less than two hours ago I composed a reply in this thread, I've
already forgotten which is a code point and which is a code unit.

>  > I think you have it backwards. I see the current situation as the purity
>  > of the C code beating the practicality for the user of getting right
>  > answers.
>
> Sophistry.  "Always getting the right answer" is purity.

Eh? In most other areas Python is pretty careful not to promise to
"always get the right answer" since what is right is entirely in the
user's mind. We often go to great lengths of defining how things work
so as to set the right expectations. For example, variables in Python
work differently than in most other languages.

Now I am happy to admit that for many Unicode issues the level at
which we have currently defined things (code units, I think -- the
thingies that encodings are made of) is confusing, and it would be
better to switch to the others (code points, I think). But characters
are right out.

>  > > The thing is, that 90% of applications are not really going to care
>  > > about full conformance to the Unicode standard.
>  >
>  > I remember when Intel argued that 99% of applications were not going to
>  > be affected when the math coprocessor in its then new chips occasionally
>  > gave 'non-standard' answers with certain divisors.
>
> In the case of Intel, the people who demanded standard answers did so
> for efficiency reasons -- they needed the FPU to DTRT because
> implementing FP in software was always going to be too slow.  CPython,
> IMO, can afford to trade off because the implementation will
> necessarily be in software, and can be added later as a Python or C module.

It is not so easy to change expectations about O(1) vs. O(N) behavior
of indexing, however. IMO we shouldn't try, and hence we're stuck with
operations defined in terms of code thingies instead of (mostly
mythical) characters.

>  > I believe my scheme could be extended to solve [conformance for
>  > composing characters] also. It would require more pre-processing
>  > and more knowledge than I currently have of normalization. I have
>  > the impression that the grapheme problem goes further than just
>  > normalization.
>
> Yes and yes.  But now you're talking about database lookups for every
> character (to determine if it's a composing character).  Efficiency of
> a generic implementation isn't going to happen.

Let's take small steps. Do the evolutionary thing. Let's get things
right so users won't have to worry about code points vs. code units
any more. A conforming library for all things at the character level
can be developed later, once we understand things better at that level
(again, most developers don't even understand most of the subtleties,
so I claim we're not ready).

> Anyway, in Martin's rephrasing of my (imperfect) memory of Guido's
> pronouncement, "indexing is going to be O(1)".

I still think that. It would be too big of a cultural upheaval to change it.

>  And Nick's point about
> non-uniform arrays is telling.  I have 20 years of experience with an
> implementation of text as a non-uniform array which presents an array
> API, and *everything* needs to be special-cased for efficiency, and
> *any* small change can have show-stopping performance implications.
>
> Python can probably do better than Emacs has done due to much better
> leadership in this area, but I still think it's better to make full
> conformance optional.

This I agree with (though if you were referring to me with
"leadership" I consider myself woefully underinformed about Unicode
subtleties). I also suspect that Unicode "conformance" (however
defined) is more part of a political battle than an actual necessity.
I'd much rather have us fix Tom Christiansen's specific bugs than
chase the elusive "standard conforming".

(Hey, I feel a QOTW coming. "Standards? We don't need no stinkin'
standards." http://en.wikipedia.org/wiki/Stinking_badges :-)

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Thu Aug 25 04:33:51 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 24 Aug 2011 19:33:51 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87k4a2xr7l.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de> <j31hl7$dp5$1@dough.gmane.org>
	<CADiSq7eGo2C-GTHcSbJWwtu3O19PkdsFNpMfY29u4g+e==0=PA@mail.gmail.com>
	<j3377b$f21$1@dough.gmane.org> <4E554883.5020908@g.nevcal.com>
	<CAP7+vJ+oyyyUXgZUmk0i1AAm-btB_uew--9AOGptuuQPoERvXg@mail.gmail.com>
	<87k4a2xr7l.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CAP7+vJ+RWXZVyZaT0SOo-Kos4cUTkN4sLUs79h5qDtTNcqVsLA@mail.gmail.com>

On Wed, Aug 24, 2011 at 5:36 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Guido van Rossum writes:
>
>  > I see nothing wrong with having the language's fundamental data types
>  > (i.e., the unicode object, and even the re module) to be defined in
>  > terms of codepoints, not characters, and I see nothing wrong with
>  > len() returning the number of codepoints (as long as it is advertised
>  > as such).
>
> In fact, the Unicode Standard, Version 6, goes farther (to code units):
>
>    2.7  Unicode Strings
>
>    A Unicode string data type is simply an ordered sequence of code
>    units. Thus a Unicode 8-bit string is an ordered sequence of 8-bit
>    code units, a Unicode 16-bit string is an ordered sequence of
>    16-bit code units, and a Unicode 32-bit string is an ordered
>    sequence of 32-bit code units.
>
>    Depending on the programming environment, a Unicode string may or
>    may not be required to be in the corresponding Unicode encoding
>    form. For example, strings in Java, C#, or ECMAScript are Unicode
>    16-bit strings, but are not necessarily well-formed UTF-16
>    sequences.
>
> (p. 32).

I am assuming that that definition only applies to use of the term
"unicode string" within the standard and has no bearing on how
programming languages are allowed to use the term, as that would be
preposterous. (They can define what they mean by terms like
well-formed and conforming etc., and I won't try to go against that.
But limiting what can be called a unicode string feels like
unproductive coddling.)

-- 
--Guido van Rossum (python.org/~guido)

From ncoghlan at gmail.com  Thu Aug 25 04:47:20 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 25 Aug 2011 12:47:20 +1000
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
Message-ID: <CADiSq7d6p_-k6M20uiF8JFj++R0Gh2FvTcjDLioVnGcQuBqTag@mail.gmail.com>

On Thu, Aug 25, 2011 at 12:29 PM, Guido van Rossum <guido at python.org> wrote:
> Now I am happy to admit that for many Unicode issues the level at
> which we have currently defined things (code units, I think -- the
> thingies that encodings are made of) is confusing, and it would be
> better to switch to the others (code points, I think). But characters
> are right out.

Indeed, code points are the abstract concept and code units are the
specific byte sequences that are used for serialisation (FWIW, I'm
going to try to keep this straight in the future by remembering that
the Unicode character set is defined as abstract points on planes,
just like geometry).
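
A tiny sketch of the distinction, as seen from a wide (UCS-4) build 
(the character choice is arbitrary):

    ch = '\U0001D11E'                # MUSICAL SYMBOL G CLEF: one code point
    print(hex(ord(ch)))              # 0x1d11e -- a single code point
    units = ch.encode('utf-16-be')   # serialised form
    print(len(units) // 2)           # 2 -- a surrogate pair of 16-bit code units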

With narrow builds, code units can currently come into play
internally, but with PEP 393 everything internal will be working
directly with code points. Normalisation, combining characters and
bidi issues may still affect the correctness of unicode comparison and
slicing (and other text manipulation), but there are limits to how
much of the underlying complexity we can effectively hide without
being misleading.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From guido at python.org  Thu Aug 25 05:11:20 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 24 Aug 2011 20:11:20 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CADiSq7d6p_-k6M20uiF8JFj++R0Gh2FvTcjDLioVnGcQuBqTag@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<CADiSq7d6p_-k6M20uiF8JFj++R0Gh2FvTcjDLioVnGcQuBqTag@mail.gmail.com>
Message-ID: <CAP7+vJ+CUo=Tausdc_91gB+ec9owY8iZDGNth31G5_X0Du9uVg@mail.gmail.com>

On Wed, Aug 24, 2011 at 7:47 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Thu, Aug 25, 2011 at 12:29 PM, Guido van Rossum <guido at python.org> wrote:
>> Now I am happy to admit that for many Unicode issues the level at
>> which we have currently defined things (code units, I think -- the
>> thingies that encodings are made of) is confusing, and it would be
>> better to switch to the others (code points, I think). But characters
>> are right out.
>
> Indeed, code points are the abstract concept and code units are the
> specific byte sequences that are used for serialisation (FWIW, I'm
> going to try to keep this straight in the future by remembering that
> the Unicode character set is defined as abstract points on planes,
> just like geometry).

Hm, code points still look pretty concrete to me (integers in the
range 0 .. 2**21) and code units don't feel like byte sequences to me
(at least not UTF-16 code units -- in Python at least you can think of
them as integers in the range 0 .. 2**16).

> With narrow builds, code units can currently come into play
> internally, but with PEP 393 everything internal will be working
> directly with code points. Normalisation, combining characters and
> bidi issues may still affect the correctness of unicode comparison and
> slicing (and other text manipulation), but there are limits to how
> much of the underlying complexity we can effectively hide without
> being misleading.

Let's just define a Unicode string to be a sequence of code points and
let libraries deal with the rest. Ok, methods like lower() should
consider characters, but indexing/slicing should refer to code points.
Same for '=='; we can have a library that compares by applying (or
assuming?) certain normalizations. Tom C tells me that case-less
comparison cannot use a.lower() == b.lower(); fine, we can add that
operation to the library too. But this exceeds the scope of PEP 393,
right?
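
(A minimal sketch of why a.lower() == b.lower() is not enough; it uses 
str.casefold(), which did not exist yet at the time and only appeared 
later in 3.3, as a stand-in for the library operation meant here:)

    a = 'straße'
    b = 'STRASSE'
    print(a.lower() == b.lower())        # False: 'straße' != 'strasse'
    print(a.casefold() == b.casefold())  # True: folding maps 'ß' to 'ss'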

-- 
--Guido van Rossum (python.org/~guido)

From ncoghlan at gmail.com  Thu Aug 25 05:48:49 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 25 Aug 2011 13:48:49 +1000
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJ+CUo=Tausdc_91gB+ec9owY8iZDGNth31G5_X0Du9uVg@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<CADiSq7d6p_-k6M20uiF8JFj++R0Gh2FvTcjDLioVnGcQuBqTag@mail.gmail.com>
	<CAP7+vJ+CUo=Tausdc_91gB+ec9owY8iZDGNth31G5_X0Du9uVg@mail.gmail.com>
Message-ID: <CADiSq7fAn13rDb_cbzBh7VsEiC59v_eeMzynAsHgG3=ddCqe+Q@mail.gmail.com>

On Thu, Aug 25, 2011 at 1:11 PM, Guido van Rossum <guido at python.org> wrote:
>> With narrow builds, code units can currently come into play
>> internally, but with PEP 393 everything internal will be working
>> directly with code points. Normalisation, combining characters and
>> bidi issues may still affect the correctness of unicode comparison and
>> slicing (and other text manipulation), but there are limits to how
>> much of the underlying complexity we can effectively hide without
>> being misleading.
>
> Let's just define a Unicode string to be a sequence of code points and
> let libraries deal with the rest. Ok, methods like lower() should
> consider characters, but indexing/slicing should refer to code points.
> Same for '=='; we can have a library that compares by applying (or
> assuming?) certain normalizations. Tom C tells me that case-less
> comparison cannot use a.lower() == b.lower(); fine, we can add that
> operation to the library too. But this exceeds the scope of PEP 393,
> right?

Yep, I was agreeing with you on this point - I think you're right that
if we provide a solid code point based core Unicode type (perhaps with
some character based methods), then library support can fill the gap
between handling code points and handling characters.

In particular, a unicode character based string type would be
significantly easier to write in Python than it would be in C (after
skimming Tom's bug report at http://bugs.python.org/issue12729, I
better understand the motivation and desire for that kind of interface
and it sounds like Terry's prototype is along those lines). Once those
mappings are thrashed out outside the core, then there may be
something to incorporate directly around the 3.4 timeframe (or
potentially even in 3.3, since it should already be possible to
develop such a wrapper based on UCS4 builds of 3.2).

However, there may be an important distinction to be made on the
Python-the-language vs CPython-the-implementation front: is another
implementation (e.g. PyPy) *allowed* to implement character based
indexing instead of code point based for 2.x unicode/3.x str type? Or
is the code point indexing part of the language spec, and any
character based indexing needs to be provided via a separate type or
module?

Regards,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From stephen at xemacs.org  Thu Aug 25 06:12:17 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 25 Aug 2011 13:12:17 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
Message-ID: <87ty969ljy.fsf@uwakimon.sk.tsukuba.ac.jp>

Guido van Rossum writes:

 > On Wed, Aug 24, 2011 at 5:31 PM, Stephen J. Turnbull
 > <turnbull at sk.tsukuba.ac.jp> wrote:

 > >     Strings contain Unicode code units, which for most purposes can be
 > >     treated as Unicode characters.  However, even as "simple" an
 > >     operation as "s1[0] == s2[0]" cannot be relied upon to give
 > >     Unicode-conforming results.
 > >
 > > The second sentence remains true under PEP 393.
 > 
 > Really? If strings contain code units, that expression compares code
 > units.

That's true out of context, but in context it's "which for most
purposes can be treated as Unicode characters", and this is what Terry
is concerned with, as well.

 > What is non-conforming about comparing two code points?

Unicode conformance means treating characters correctly.  In
particular, s1 and s2 might be NFC and NFD forms of the same string
with a combining character at s2[1], or s1[1] and s2[1] might be a
non-combining character and a combining character respectively.

 > Seriously, what does Unicode-conforming mean here?

Chapter 3, all verses.  Here, specifically C6, p. 60.  One would have
to define the process executing "s1[0] == s2[0]" to be sure that even
in the cases cited in the previous paragraph non-conformance is
occurring, but one example of a process where that is non-conforming
(without additional code to check for trailing combining characters)
is in comparison of Vietnamese filenames generated on a Mac vs. those
generated on a Linux host.

 > > No, you're not.  You are claiming an isomorphism, which Unicode goes
 > > to great trouble to avoid.
 > 
 > I don't know that we will be able to educate our users to the point
 > where they will use code unit, code point, character, glyph, character
 > set, encoding, and other technical terms correctly.

Sure.  I got it wrong myself earlier.

I think that the right thing to do is to provide a conformant
implementation of Unicode text in the stdlib (a long run goal, see
below), and call that "Unicode", while we call strings "strings".

 > Now I am happy to admit that for many Unicode issues the level at
 > which we have currently defined things (code units, I think -- the
 > thingies that encodings are made of) is confusing, and it would be
 > better to switch to the others (code points, I think).

Yes, and AFAICT (I'm better at reading standards than I am at reading
the Python implementation) PEP 393 allows that.

 > But characters are right out.

+1

 > It is not so easy to change expectations about O(1) vs. O(N) behavior
 > of indexing however. IMO we shouldn't try and hence we're stuck with
 > operations defined in terms of code thingies instead of (mostly
 > mythical) characters.

Well, O(N) is not really the question.  It's really O(log N), as Terry
says.  Is that out, too?  I can verify that it's possible to do it in
practice in the long term.  In my experience with Emacs, even with 250
MB files, O(log N) mostly gives acceptable performance in an
interactive editor, as well as in many scripted textual applications.

The problems that I see are

(1) It's very easy to write algorithms that would be O(N) for a true
    array, but then become O(N log N) or worse (and the coefficient on
    the O(log N) algorithm is way higher to start).  I guess this
    would kill the idea, but.

(2) Maintenance is fragile; it's easy to break the necessary caches
    with feature additions and bug fixes.  (However, I don't think
    this would be as big a problem for Python, due to its more
    disciplined process, as it has been for XEmacs.)

You might think space for the caches would be a problem, but that has
turned out not to be the case for Emacsen.

 > Let's take small steps. Do the evolutionary thing. Let's get things
 > right so users won't have to worry about code points vs. code units
 > any more. A conforming library for all things at the character level
 > can be developed later, once we understand things better at that level
 > (again, most developers don't even understand most of the subtleties,
 > so I claim we're not ready).

I don't think anybody does.  That's one reason there's a new version
of Unicode every few years.

 > This I agree with (though if you were referring to me with
 > "leadership" I consider myself woefully underinformed about Unicode
 > subtleties).

<wink/>  MvL and MAL are not, however, and there are plenty of others
who make contributions -- in an orderly fashion.

 > I also suspect that Unicode "conformance" (however defined) is more
 > part of a political battle than an actual necessity.  I'd much
 > rather have us fix Tom Christiansen's specific bugs than chase the
 > elusive "standard conforming".

Well, I would advocate specifying which parts of the standard we
target and which not (for any given version).  The goal of full
"Chapter 3" conformance should be left up to a library on PyPI for the
nonce IMO.  I agree that fixing specific bugs should be given
precedence over "conformance chasing," but the implementation should
conform to the appropriate parts of the standard.

 > (Hey, I feel a QOTW coming. "Standards? We don't need no stinkin'
 > standards." http://en.wikipedia.org/wiki/Stinking_badges :-)

RMS beat you to that.  Not good company to be in, in this case: he
specifically disclaims the goal of portability to non-GNU-System
systems.


From stefan_ml at behnel.de  Thu Aug 25 06:46:50 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 25 Aug 2011 06:46:50 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <201108250029.19506.victor.stinner@haypocalc.com>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<201108250029.19506.victor.stinner@haypocalc.com>
Message-ID: <j34k3q$nk8$1@dough.gmane.org>

Victor Stinner, 25.08.2011 00:29:
>> With this PEP, the unicode object overhead grows to 10 pointer-sized
>> words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine.
>> Does it have any adverse effects?
>
> For pure ASCII, it might be possible to use a shorter struct:
>
> typedef struct {
>      PyObject_HEAD
>      Py_ssize_t length;
>      Py_hash_t hash;
>      int state;
>      Py_ssize_t wstr_length;
>      wchar_t *wstr;
>      /* no more utf8_length, utf8, str */
>      /* followed by ascii data */
> } _PyASCIIObject;
> (-2 pointer -1 ssize_t: 56 bytes)
>
> =>  "a" is 58 bytes (with utf8 for free, without wchar_t)
>
> For object allocated with the new API, we can use a shorter struct:
>
> typedef struct {
>      PyObject_HEAD
>      Py_ssize_t length;
>      Py_hash_t hash;
>      int state;
>      Py_ssize_t wstr_length;
>      wchar_t *wstr;
>      Py_ssize_t utf8_length;
>      char *utf8;
>      /* no more str pointer */
>      /* followed by latin1/ucs2/ucs4 data */
> } _PyNewUnicodeObject;
> (-1 pointer: 72 bytes)
>
> =>  "?" is 74 bytes (without utf8 / wchar_t)
>
> For the legacy API:
>
> typedef struct {
>      PyObject_HEAD
>      Py_ssize_t length;
>      Py_hash_t hash;
>      int state;
>      Py_ssize_t wstr_length;
>      wchar_t *wstr;
>      Py_ssize_t utf8_length;
>      char *utf8;
>      void *str;
> } _PyLegacyUnicodeObject;
> (same size: 80 bytes)
>
> =>  "a" is 80+2 (2 malloc) bytes (without utf8 / wchar_t)
>
> The current struct:
>
> typedef struct {
>      PyObject_HEAD
>      Py_ssize_t length;
>      Py_UNICODE *str;
>      Py_hash_t hash;
>      int state;
>      PyObject *defenc;
> } PyUnicodeObject;
>
> =>  "a" is 56+2 (2 malloc) bytes (without utf8, with wchar_t if Py_UNICODE is
> wchar_t)
>
> ... but the code (maybe only the macros?) and debuging will be more complex.

That's an interesting idea. However, it's not required to do this as part 
of the PEP 393 implementation. This can be added later on if the need 
evidently arises in general practice.

Also, there is always the possibility of simply interning very short strings 
in order to avoid their multiplication in memory. Long strings don't suffer 
from this as the data size quickly dominates. User code that works with a 
lot of short strings would likely do the same.

BTW, I would expect that many short strings either go away as quickly as 
they appeared (e.g. in a parser) or were brought in as literals and are 
therefore interned anyway. That's just one reason why I suggest waiting for 
proof of inefficiency in the real world (and, obviously, testing your own 
code with this as quickly as possible).
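
A minimal sketch of the interning idea (sys.intern is the stdlib entry 
point; the strings are arbitrary examples):

    import sys

    a = sys.intern('tag:' + 'name')  # built at runtime, then interned
    b = sys.intern('tag:name')
    print(a is b)                    # True: one shared object, not two copies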


>> Will the format codes returning a Py_UNICODE pointer with
>> PyArg_ParseTuple be deprecated?
>
> Because Python 2.x is still dominant and it's already hard enough to port C
> modules, it's not the best moment to deprecate the legacy API (Py_UNICODE*).

Well, it will be quite inefficient in future CPython versions, so I think 
if it's not officially deprecated at some point, it will deprecate itself 
for efficiency reasons. Better make it clear that it's worth investing in 
better performance here.


>> Do you think the wstr representation could be removed in some future
>> version of Python?
>
> Conversion to wchar_t* is common, especially on Windows.

That's an issue. However, I cannot say how common this really is in 
practice. Surely depends on the specific code, right? How common is it in 
core CPython?


> But I don't know if
> we *have to* cache the result. Is it cached by the way? Or is wstr only used
> when a string is created from Py_UNICODE?

If it's so common on Windows, maybe it should only be cached there?

Stefan


From stefan_ml at behnel.de  Thu Aug 25 07:09:28 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 25 Aug 2011 07:09:28 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E553FBC.7080501@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de>
Message-ID: <j34le8$tec$1@dough.gmane.org>

"Martin v. L?wis", 24.08.2011 20:15:
> Guido has agreed to eventually pronounce on PEP 393. Before that can
> happen, I'd like to collect feedback on it. There have been a number
> of voices supporting the PEP in principle

Absolutely.


> - conditions you would like to pose on the implementation before
>    acceptance. I'll see which of these can be resolved, and list
>    the ones that remain open.

Just repeating here that I'd like to see the buffer void* changed into a 
union of pointers that state the exact layout type. IMHO, that would 
clarify the implementation and make it clearer that it's correct to access 
the data buffer as a flat array. (Obviously, code that does that is subject 
to future changes, that's why there are macros.)
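
Something along these lines, say (the field names here are 
hypothetical, not taken from the PEP draft):

    /* A sketch of the suggested union over the data buffer: */
    typedef union {
        void    *any;     /* untyped view, as in the current draft */
        Py_UCS1 *latin1;  /* 1-byte representation */
        Py_UCS2 *ucs2;    /* 2-byte representation */
        Py_UCS4 *ucs4;    /* 4-byte representation */
    } PyUnicodeData;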

Stefan


From v+python at g.nevcal.com  Thu Aug 25 07:34:12 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 24 Aug 2011 22:34:12 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
Message-ID: <4E55DED4.1000803@g.nevcal.com>

On 8/24/2011 7:29 PM, Guido van Rossum wrote:
> (Hey, I feel a QOTW coming. "Standards? We don't need no stinkin'
> standards."http://en.wikipedia.org/wiki/Stinking_badges  :-)

Which deserves an appropriate follow-on misquote:

Guido says the Unicode standard stinks.

☺ <- and a Unicode smiley to go with it!

From stephen at xemacs.org  Thu Aug 25 07:58:10 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 25 Aug 2011 14:58:10 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CADiSq7fAn13rDb_cbzBh7VsEiC59v_eeMzynAsHgG3=ddCqe+Q@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<CADiSq7d6p_-k6M20uiF8JFj++R0Gh2FvTcjDLioVnGcQuBqTag@mail.gmail.com>
	<CAP7+vJ+CUo=Tausdc_91gB+ec9owY8iZDGNth31G5_X0Du9uVg@mail.gmail.com>
	<CADiSq7fAn13rDb_cbzBh7VsEiC59v_eeMzynAsHgG3=ddCqe+Q@mail.gmail.com>
Message-ID: <87sjoq9gnh.fsf@uwakimon.sk.tsukuba.ac.jp>

Nick Coghlan writes:
 > GvR writes:

 > > Let's just define a Unicode string to be a sequence of code points and
 > > let libraries deal with the rest. Ok, methods like lower() should
 > > consider characters, but indexing/slicing should refer to code points.
 > > Same for '=='; we can have a library that compares by applying (or
 > > assuming?) certain normalizations. Tom C tells me that case-less
 > > comparison cannot use a.lower() == b.lower(); fine, we can add that
 > > operation to the library too. But this exceeds the scope of PEP 393,
 > > right?
 > 
 > Yep, I was agreeing with you on this point - I think you're right that
 > if we provide a solid code point based core Unicode type (perhaps with
 > some character based methods), then library support can fill the gap
 > between handling code points and handling characters.

+1  I don't really see an alternative to this approach.  The
underlying array has to be exposed because there are too many
applications that can take advantage of it, and analysis of decomposed
characters requires it.

Making that array be an array of code points is a really good idea,
and Python already has that in the UCS-4 build.  PEP 393 is "just" a
space optimization that allows getting rid of the narrow build, with
all its wartiness.

 > something to incorporate directly around the 3.4 timeframe (or
 > potentially even in 3.3, since it should already be possible to
 > develop such a wrapper based on UCS4 builds of 3.2)

I agree that it's possible, but I estimate that it's not feasible for
3.3 because we don't yet know the requirements.  This one really needs
to ferment and mature in PyPI for a while because we just don't know
how far the scope of user needs is going to extend.  Bidi is a
mudball[1], confusable-character indexes sound like a cool idea for
the web and email (but is anybody really going to use them?), etc.

 > However, there may be an important distinction to be made on the
 > Python-the-language vs CPython-the-implementation front: is another
 > implementation (e.g. PyPy) *allowed* to implement character based
 > indexing instead of code point based for 2.x unicode/3.x str type? Or
 > is the code point indexing part of the language spec, and any
 > character based indexing needs to be provided via a separate type or
 > module?

+1 for language spec.  Remember, there are cases in Unicode where
you'd like to access base characters and the like.  So you need to be
able to get at individual code points in an NFD string.  You shouldn't
need to use different code for that in different implementations of
Python.

Footnotes: 
[1]  Sure, we can implement the UAX#9 bidi algorithm, but it's not
good enough by itself: something as simple as

    "File name (default {0}): ".format(name)

can produce disconcerting results if the whole resulting string is
treated by the UBA.  Specifically, using the usual convention of
uppercase letters being an RTL script, name = "ABCD" will result in
the prompt:

    File name (default :(DCBA _

(where _ denotes the position of the insertion cursor).  The Hebrew
speakers on emacs-devel agreed that an example using a real Hebrew
string didn't look right to them, either.

From jxo6948 at rit.edu  Thu Aug 25 09:12:56 2011
From: jxo6948 at rit.edu (John O'Connor)
Date: Thu, 25 Aug 2011 00:12:56 -0700
Subject: [Python-Dev] FileSystemError or FilesystemError?
In-Reply-To: <CAJ-9HZ2OezdrySeUzbub1Ohy4XPb9SGOVKQuAtv_sng=sU2ciw@mail.gmail.com>
References: <20110823202004.0bb63490@pitrou.net>
	<CAF-Rda8fmf02HsPdhRsbk+APnGXh=5cX_zybn1G39hgRpCP14Q@mail.gmail.com>
	<CAJ-9HZ2OezdrySeUzbub1Ohy4XPb9SGOVKQuAtv_sng=sU2ciw@mail.gmail.com>
Message-ID: <CABCbifVb5ADuqQz4cgwJYDEwkOBDy_oBMkQxqsj+1TSWcROZVA@mail.gmail.com>

+1 FileSystemError - for already stated reasons.

- John

From greg.ewing at canterbury.ac.nz  Thu Aug 25 05:34:27 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 25 Aug 2011 15:34:27 +1200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
Message-ID: <4E55C2C3.3060205@canterbury.ac.nz>

On 25/08/11 14:29, Guido van Rossum wrote:
> Let's get things
> right so users won't have to worry about code points vs. code units
> any more.

What about things like the surrogateescape codec that
deliberately use code units in non-standard ways? Will
tricks like that still be possible if the code-unit
level is hidden from the programmer?

-- 
Greg

From martin at v.loewis.de  Thu Aug 25 09:36:06 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 25 Aug 2011 09:36:06 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>	<4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>	<j33kvu$f9d$1@dough.gmane.org>	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
Message-ID: <4E55FB66.8030802@v.loewis.de>

>>    Strings contain Unicode code units, which for most purposes can be
>>    treated as Unicode characters.  However, even as "simple" an
>>    operation as "s1[0] == s2[0]" cannot be relied upon to give
>>    Unicode-conforming results.
>>
>> The second sentence remains true under PEP 393.
> 
> Really? If strings contain code units, that expression compares code
> units. What is non-conforming about comparing two code points? They
> are just integers.
> 
> Seriously, what does Unicode-conforming mean here?

I think he's referring to combining characters and normal forms. 2.12
starts with

"In cases involving two or more sequences considered to be equivalent,
the Unicode Standard does not prescribe one particular sequence as being
the  correct one; instead, each  sequence is merely equivalent to the
others"

That could be read to imply that the == operator should determine
whether two strings are equivalent. However, the Unicode standard
clearly leaves API design to the programming environment, and has
the notion of conformance only for processes. So saying that Python
is or is not unicode-conforming is, strictly speaking, meaningless.

The closest conformance requirement in that respect is C6

"A process shall not assume that the interpretations of two
canonical-equivalent character sequences are distinct."

However, that explicitly does *not* support the conformance statement
that Stephen made. They elaborate

"Ideally, an implementation would always interpret two
canonical-equivalent  character sequences identically. There are
practical circumstances under which  implementations may reasonably
distinguish them."

So practicality beats purity even in Unicode conformance: the
== operator of Python can reasonably treat equivalent strings
as unequal (and there is a good reason for that, indeed). Processes
should not expect that other applications make the same distinction,
so they need to cope if it matters to them. There are different ways
to do that (a sketch of the second follows the list):
- normalize all strings on input, and then use ==
- use a different comparison operation that always normalizes
  its input first
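
A minimal sketch of that second option (NFC as the canonical form is 
an arbitrary choice here):

    import unicodedata

    def canon_eq(s1, s2):
        """Compare two strings after normalizing both to NFC."""
        return (unicodedata.normalize('NFC', s1) ==
                unicodedata.normalize('NFC', s2))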

> This I agree with (though if you were referring to me with
> "leadership" I consider myself woefully underinformed about Unicode
> subtleties). I also suspect that Unicode "conformance" (however
> defined) is more part of a political battle than an actual necessity.

Fortunately, it's much better than that. Unicode has had very clear
conformance requirements for a long time, and they aren't hard
to meet.

Wrt. C6, Python could certainly improve, e.g. by caching whether
a string had been determined to be in normal form, so that applications
can more reasonably apply normalization to all strings they ever
want to compare.

Regards,
Martin

From martin at v.loewis.de  Thu Aug 25 09:45:48 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 25 Aug 2011 09:45:48 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87ty969ljy.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com>	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>	<j33kvu$f9d$1@dough.gmane.org>	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<87ty969ljy.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4E55FDAC.9010605@v.loewis.de>

>  > What is non-conforming about comparing two code points?
> 
> Unicode conformance means treating characters correctly.

Re-read the text. You are interpreting something that isn't there.


>  > Seriously, what does Unicode-conforming mean here?
> 
> Chapter 3, all verses.  Here, specifically C6, p. 60.  One would have
> to define the process executing "s1[0] == s2[0]" to be sure that even
> in the cases cited in the previous paragraph non-conformance is
> occurring

No, that's explicitly *not* what C6 says. Instead, it says that a
process that treats s1 and s2 differently shall not assume that others
will do the same, i.e. that it is ok to treat them the same even though
they have different code points. Treating them differently is also
conforming.

Regards,
Martin

From martin at v.loewis.de  Thu Aug 25 09:50:08 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 25 Aug 2011 09:50:08 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E55C2C3.3060205@canterbury.ac.nz>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com>	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>	<j33kvu$f9d$1@dough.gmane.org>	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<4E55C2C3.3060205@canterbury.ac.nz>
Message-ID: <4E55FEB0.6070706@v.loewis.de>

> What about things like the surrogateescape codec that
> deliberately use code units in non-standard ways? Will
> tricks like that still be possible if the code-unit
> level is hidden from the programmer?

Most certainly. In the PEP-393 representation, the surrogate
characters can readily be represented (and would imply at least
the two-byte form), but they will never take on their UTF-16
role (i.e. the UTF-8 codec won't try to combine surrogate
pairs), so they can be used for surrogateescape and other
functions. Of course, in strict error mode, codecs will
refuse to encode them (notice that surrogateescape is an error
handler, not a codec).

Regards,
Martin


From martin at v.loewis.de  Thu Aug 25 10:24:39 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 25 Aug 2011 10:24:39 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <20110824203228.3e00874d@pitrou.net>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
Message-ID: <4E5606C7.9000404@v.loewis.de>

> With this PEP, the unicode object overhead grows to 10 pointer-sized
> words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine.
> Does it have any adverse effects?

If I count correctly, it's only three *additional* words (compared to
3.2): four new ones, minus one that is removed. In addition, it drops
a memory block. Assuming a malloc overhead of two pointers per malloc
block, we get one additional pointer.

On a 32-bit machine with a 32-bit wchar_t, pure-ASCII strings of length
1 (+NUL) will take the same memory either way: 8 bytes for the
characters in 3.2, 2 bytes in 3.3 + extra pointer + padding. Strings
of 2 or more characters will take more space in 3.2.

On a 32-bit machine with a 16-bit wchar_t, pure-ASCII strings up
to 3 characters take the same space either way; space savings start at
four characters.

On a 64-bit machine with a 16-bit wchar_t, assuming a malloc minimum
block size of 16 bytes, pure-ASCII strings of up to 7 characters take
the same space. For 8 characters, 3.2 will need 32 bytes for the
characters, whereas 3.3 will only take 16 bytes (due to padding).

So: no, I can't see any adverse effects. Details depend on the
malloc implementation, though. A slight memory increase compared
to a narrow build may occur for strings that use non-Latin-1
characters, and a large increase for strings that use non-BMP
characters.

The real issue of memory consumption is the alternative representations,
if created. That applies for the default encoding in 3.2 as well as
the wchar_t and UTF-8 representations in 3.3.

> Are there any plans to make instantiation of small strings fast enough?
> Or is it already as fast as it should be?

I don't have any plans, and I don't see potential. Compared to 3.2, it
saves a malloc call, which may be quite an improvement. OTOH, it needs
to iterate over the characters twice, to find the largest character.
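
(In Python terms, the first of those passes amounts to something like
this sketch of the kind-selection logic; illustration only, not the
actual C code:

    def pick_kind(s):
        # Pass 1: find the widest code point to choose the representation.
        maxchar = max(map(ord, s)) if s else 0
        if maxchar < 0x100:
            return 1   # one byte per character (Latin-1 range)
        elif maxchar < 0x10000:
            return 2   # Py_UCS2
        else:
            return 4   # Py_UCS4
    # Pass 2 then copies the characters into a buffer of that width.
)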

If you are referring to the reuse of Unicode objects: that's currently
not done, and is difficult to do in the 3.2 way due to the various sizes
of characters. One idea might be to only reuse UCS1 strings, and then
keep a freelist for these based on the string length.
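
(A rough sketch of that freelist idea; the cutoffs are made up, and
the real thing would be C code holding string objects rather than
bytearrays:

    _MAX_CACHED_LEN = 16    # assumed cutoff for "small" strings
    _MAX_PER_BUCKET = 80    # assumed bound per length bucket
    _freelists = {}         # length -> list of reusable UCS1 buffers

    def alloc_ucs1(length):
        bucket = _freelists.get(length)
        return bucket.pop() if bucket else bytearray(length)

    def free_ucs1(buf):
        if len(buf) > _MAX_CACHED_LEN:
            return                       # too big to cache
        bucket = _freelists.setdefault(len(buf), [])
        if len(bucket) < _MAX_PER_BUCKET:
            bucket.append(buf)           # cache for reuse instead of freeing
)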

> When interfacing with the Win32 "wide" APIs, what is the recommended
> way to get the required LPCWSTR?

As before: PyUnicode_AsUnicode.

> Will the format codes returning a Py_UNICODE pointer with
> PyArg_ParseTuple be deprecated?

Not for 3.3, no.

> Do you think the wstr representation could be removed in some future
> version of Python?

Yes. This probably has to wait for Python 4, though.

> Is PyUnicode_Ready() necessary for all unicode objects, or only those
> allocated through the legacy API?

Only for the latter (although it doesn't hurt to apply it to all
of them).

> "The Py_Unicode representation is not instantaneously available": you
> mean the Py_UNICODE representation?

Thanks, fixed.

>> - conditions you would like to pose on the implementation before
>>   acceptance. I'll see which of these can be resolved, and list
>>   the ones that remain open.
> 
> That it doesn't significantly slow down benchmarks such as stringbench
> and iobench.

Can you please quantify "significantly"? Also, having a complete list
of benchmarks to perform prior to acceptance would be helpful.

Thanks,
Martin

From victor.stinner at haypocalc.com  Thu Aug 25 11:10:50 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 25 Aug 2011 11:10:50 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87ty969ljy.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com>	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>	<j33kvu$f9d$1@dough.gmane.org>	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<87ty969ljy.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4E56119A.8010900@haypocalc.com>

On 25/08/2011 06:12, Stephen J. Turnbull wrote:
>   >  Let's take small steps. Do the evolutionary thing. Let's get things
>   >  right so users won't have to worry about code points vs. code units
>   >  any more. A conforming library for all things at the character level
>   >  can be developed later, once we understand things better at that level
>   >  (again, most developers don't even understand most of the subtleties,
>   >  so I claim we're not ready).
>
> I don't think anybody does.  That's one reason there's a new version
> of Unicode every few years.

It took some weeks (months?) to write the PEP, and months to implement 
it. This PEP is only a minor change of the implementation of Unicode in 
Python. A larger change will take much more time (and maybe change/break 
the C and/or Python API a little bit more).

If you are able to implement your specification (a Unicode type with a 
"real" character API), please write a PEP and implement it. You may 
begin with a prototype in Python, and then rewrite it in C.

But I don't think that any core developer will do that for you. That's 
not how free software works. At least, I don't think that anyone will do 
that for free :-) (I bet that many developers would agree to implement 
it for money :-))

Victor

From victor.stinner at haypocalc.com  Thu Aug 25 11:17:06 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 25 Aug 2011 11:17:06 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <j34k3q$nk8$1@dough.gmane.org>
References: <4E553FBC.7080501@v.loewis.de>
	<20110824203228.3e00874d@pitrou.net>	<201108250029.19506.victor.stinner@haypocalc.com>
	<j34k3q$nk8$1@dough.gmane.org>
Message-ID: <4E561312.3030404@haypocalc.com>

On 25/08/2011 06:46, Stefan Behnel wrote:
>> Conversion to wchar_t* is common, especially on Windows.
>
> That's an issue. However, I cannot say how common this really is in
> practice. Surely depends on the specific code, right? How common is it
> in core CPython?

Almost all functions taking text as an argument on Windows expect wchar_t* 
strings (UTF-16). In Python, we pass a "Py_UNICODE*" 
(PyUnicode_AS_UNICODE or PyUnicode_AsUnicode) because Py_UNICODE is 
wchar_t on Windows.

Victor

From stephen at xemacs.org  Thu Aug 25 11:39:46 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 25 Aug 2011 18:39:46 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E55FDAC.9010605@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<87ty969ljy.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E55FDAC.9010605@v.loewis.de>
Message-ID: <87liuhakyl.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. L?wis" writes:

 > No, that's explicitly *not* what C6 says. Instead, it says that a
 > process that treats s1 and s2 differently shall not assume that others
 > will do the same, i.e. that it is ok to treat them the same even though
 > they have different code points. Treating them differently is also
 > conforming.

Then what requirement does C6 impose, in your opinion?  It sounds like
you don't think it imposes any, in practice.

Note that in the discussion of C6, the standard says,

- Ideally, an implementation would *always* interpret two
  canonical-equivalent sequences *identically*.  There are practical
  circumstances under which implementations may reasonably distinguish
  them.  (Emphasis mine.)

The examples given are things like "inspecting memory representation
structure" (which properly speaking is really outside of Unicode
conformance) and "ignoring collation behavior of combining sequences
outside the repertoire of a specified language."  That sounds like
"Special cases aren't special enough to break the rules. Although
practicality beats purity." to me.  Treating things differently is an
exceptional case, that requires sufficient justification.

My understanding is that if those strings are exchanged with
another process, then whether or not treating them differently is
allowed depends on whether the results will be output to another
process, and what the definition of our process is.  Sometimes it will
be allowed, but mostly it won't.  Take file names as an example.

If our process is working with an external process (the OS's file
system driver) whose definition includes the statement that "File
names are sequences of Unicode characters", then C6 says our process
must compare canonically equivalent sequences that it takes to be file
names as the same, whether or not they are in the same normalized
form, or normalized at all, because we can't assume the file system
will treat them as different.  If we do treat them as different, our
users will get very upset (eg, if we don't signal a duplicate file
name input by the user, and then the OS proceeds to overwrite an
existing file).
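
For concreteness, the two spellings of such a name (an illustration
only; how a particular OS actually stores them varies):

    >>> composed = 'caf\u00e9'      # e-acute as one code point (NFC)
    >>> decomposed = 'cafe\u0301'   # 'e' plus combining acute (NFD)
    >>> composed == decomposed
    False
    >>> import unicodedata
    >>> unicodedata.normalize('NFD', composed) == decomposed
    True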

Dually, having made the statement that file names are Unicode, C6 says
that the OS driver must return the same file given two canonically
equivalent strings that happen to have different code points in them,
because it may not assume that *we* will treat those strings as
different names of different files.

*Users* will certainly take the viewpoint that two strings that
display the same on their monitor should identify the same file when
they use them as file names.

Now, I'm *not* saying that Python's strings *should* conform to the
Unicode standard in this respect yet (or ever, for that matter; I'm
with Guido on that).  I'm simply saying that the current
implementation of strings, as improved by PEP 393, can not be said to
be conforming.

I would like to see something much more conformant done as a separate
library (the Python Components for Unicode, say), intended to support
users who need character-based behavior, Unicode-ly correct collation,
etc., more than efficiency.  Applications that need both will have to
make their own way at first, either by contributing improvements to
the library or by using application-specific algorithms.


From martin at v.loewis.de  Thu Aug 25 11:57:53 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 25 Aug 2011 11:57:53 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87liuhakyl.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>	<4E53A950.30005@haypocalc.com>	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>	<j33kvu$f9d$1@dough.gmane.org>	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>	<87ty969ljy.fsf@uwakimon.sk.tsukuba.ac.jp>	<4E55FDAC.9010605@v.loewis.de>
	<87liuhakyl.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4E561CA1.8020500@v.loewis.de>

On 25.08.2011 11:39, Stephen J. Turnbull wrote:
> "Martin v. Löwis" writes:
> 
>  > No, that's explicitly *not* what C6 says. Instead, it says that a
>  > process that treats s1 and s2 differently shall not assume that others
>  > will do the same, i.e. that it is ok to treat them the same even though
>  > they have different code points. Treating them differently is also
>  > conforming.
> 
> Then what requirement does C6 impose, in your opinion? 

In IETF terminology, it's a weak SHOULD requirement. Unless there are
reasons not to, equivalent strings should be treated the same. It's
a weak requirement because the reasons not to treat them as equivalent
are wide-spread.

> - Ideally, an implementation would *always* interpret two
>   canonical-equivalent sequences *identically*.  There are practical
>   circumstances under which implementations may reasonably distinguish
>   them.  (Emphasis mine.)

Ok, so let me put emphasis on *ideally*. They acknowledge that for
practical reasons, the equivalent strings may need to be
distinguished.

> The examples given are things like "inspecting memory representation
> structure" (which properly speaking is really outside of Unicode
> conformance) and "ignoring collation behavior of combining sequences
> outside the repertoire of a specified language."  That sounds like
> "Special cases aren't special enough to break the rules. Although
> practicality beats purity." to me.  Treating things differently is an
> exceptional case, that requires sufficient justification.

And the common justification is efficiency, along with the desire
to support the representation of unnormalized strings (otherwise,
normalizing everything up front would give an efficient implementation).

> If our process is working with an external process (the OS's file
> system driver) whose definition includes the statement that "File
> names are sequences of Unicode characters", then C6 says our process
> must compare canonically equivalent sequences that it takes to be file
> names as the same, whether or not they are in the same normalized
> form, or normalized at all, because we can't assume the file system
> will treat them as different.

It may well happen that this requirement is met in a plain Python
application. If the file system and GUI libraries always return
NFD strings, then the Python process *will* compare equivalent
sequences correctly (since it won't ever get any other
representations).

> *Users* will certainly take the viewpoint that two strings that
> display the same on their monitor should identify the same file when
> they use them as file names.

Yes, but that's the operating system's choice first of all.
Some operating systems do allow file names in a single directory
that are equivalent yet use different code points. Python then
needs to support this operating system, despite the permission of the
Unicode standard to ignore the difference.

> I'm simply saying that the current
> implementation of strings, as improved by PEP 393, can not be said to
> be conforming.

I continue to disagree. The Unicode standard deliberately allows
Python's behavior as conforming.

> I would like to see something much more conformant done as a separate
> library (the Python Components for Unicode, say), intended to support
> users who need character-based behavior, Unicode-ly correct collation,
> etc., more than efficiency.

Wrt. normalization, I think all that's needed is already there.
Applications just need to normalize all strings to a normal form of
their liking, and be done. That's easier than using a separate library
throughout the code base (let alone using yet another string type).

Regards,
Martin

From solipsis at pitrou.net  Thu Aug 25 13:27:34 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 25 Aug 2011 13:27:34 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E5606C7.9000404@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<4E5606C7.9000404@v.loewis.de>
Message-ID: <20110825132734.1c236d17@pitrou.net>


Hello,

On Thu, 25 Aug 2011 10:24:39 +0200
"Martin v. L?wis" <martin at v.loewis.de> wrote:
> 
> On a 32-bit machine with a 32-bit wchar_t, pure-ASCII strings of length
> 1 (+NUL) will take the same memory either way: 8 bytes for the
> characters in 3.2, 2 bytes in 3.3 + extra pointer + padding. Strings
> of 2 or more characters will take more space in 3.2.
> 
> On a 32-bit machine with a 16-bit wchar_t, pure-ASCII strings up
> to 3 characters take the same space either way; space savings start at
> four characters.
> 
> On a 64-bit machine with a 16-bit wchar_t, assuming a malloc minimum
> block size of 16 bytes, pure-ASCII strings of up to 7 characters take
> the same space. For 8 characters, 3.2 will need 32 bytes for the
> characters, whereas 3.3 will only take 16 bytes (due to padding).

That's very good. For future reference, could you add this information
to the PEP?

> >> - conditions you would like to pose on the implementation before
> >>   acceptance. I'll see which of these can be resolved, and list
> >>   the ones that remain open.
> > 
> > That it doesn't significantly slow down benchmarks such as stringbench
> > and iobench.
> 
> Can you please quantify "significantly"? Also, having a complete list
> of benchmarks to perform prior to acceptance would be helpful.

I would say no more than a 15% slowdown on each of the following
benchmarks:

- stringbench.py -u
  (http://svn.python.org/view/sandbox/trunk/stringbench/)
- iobench.py -t
  (in Tools/iobench/)
- the json_dump, json_load and regex_v8 tests from
  http://hg.python.org/benchmarks/

I believe these are representative of string-heavy operations.

Additionally, it would be nice if you could run at least some of the
test_bigmem tests, according to your system's available RAM.

Regards

Antoine.

From ncoghlan at gmail.com  Thu Aug 25 13:54:36 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 25 Aug 2011 21:54:36 +1000
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E561CA1.8020500@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<87ty969ljy.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E55FDAC.9010605@v.loewis.de>
	<87liuhakyl.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E561CA1.8020500@v.loewis.de>
Message-ID: <CADiSq7dR7gOyhaC=5PXOfddHVNNcP5QhMVUaGaSuY6F89+rV3g@mail.gmail.com>

On Thu, Aug 25, 2011 at 7:57 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> On 25.08.2011 11:39, Stephen J. Turnbull wrote:
>> I'm simply saying that the current
>> implementation of strings, as improved by PEP 393, can not be said to
>> be conforming.
>
> I continue to disagree. The Unicode standard deliberately allows
> Python's behavior as conforming.

I'd actually put it slightly differently: it seems to me that Python,
in and of itself, can neither conform to nor violate that part of the
standard, since conformance depends on how the *application* processes
the data.

However, we can make it harder or easier for applications to be
conformant. UCS2 builds make it harder, since some code points have to
be represented as code units internally. UCS4 builds and future PEP
393 builds (which should exhibit current UCS4 build semantics at the
Python layer) make it easier, since the internal representation
consistently uses code points, with code units only appearing as part
of the encoding and decoding process.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From stephen at xemacs.org  Thu Aug 25 13:58:30 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 25 Aug 2011 20:58:30 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E561CA1.8020500@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<87ty969ljy.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E55FDAC.9010605@v.loewis.de>
	<87liuhakyl.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E561CA1.8020500@v.loewis.de>
Message-ID: <87ipplaejd.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. L?wis" writes:

 > Am 25.08.2011 11:39, schrieb Stephen J. Turnbull:
 > > "Martin v. L?wis" writes:
 > > 
 > >  > No, that's explicitly *not* what C6 says. Instead, it says that a
 > >  > process that treats s1 and s2 differently shall not assume that others
 > >  > will do the same, i.e. that it is ok to treat them the same even though
 > >  > they have different code points. Treating them differently is also
 > >  > conforming.
 > > 
 > > Then what requirement does C6 impose, in your opinion? 
 > 
 > In IETF terminology, it's a weak SHOULD requirement. Unless there are
 > reasons not to, equivalent strings should be treated the same. It's
 > a weak requirement because the reasons not to treat them as equivalent
 > are wide-spread.

There are no "weak SHOULDs" and no "wide-spread reasons" in RFC 2119.
RFC 2119 specifies "particular circumstances" and "full implications"
that are "carefully weighed" before varying from SHOULD behavior.

IMHO the Unicode Standard intends a full RFC 2119 "SHOULD" here.

 > Yes, but that's the operating system's choice first of all.  Some
 > operating systems do allow file names in a single directory that
 > are equivalent yet use different code points. Python then needs to
 > support this operating system, despite the permission of the
 > Unicode standard to ignore the difference.

Sure, and that's one of several such reasons why I think the PEP's
implementation of unicodes as arrays of code points is an optimal
balance.  But the Unicode standard does not "permit" ignoring the
difference here, except in the sense that *the Unicode standard
doesn't apply at all* and therefore doesn't forbid it.  The OSes in
question are not conforming processes, and presumably don't claim to
be.

Because most of the processes Python interacts with won't be
conforming processes (not even the majority of textual applications,
for a while), Python does not need to be, and *should not* be, a
conforming Unicode process for most of what it does.  Not even for
much of its text processing.

Also, to the extent that Python is a general-purpose language, I see
nothing wrong and lots of good in having a non-conformant code point
array type as the platform for implementing conforming Unicode
library(ies).

But this is not user/developer-friendly at all:

 > Wrt. normalization, I think all that's needed is already there.
 > Applications just need to normalize all strings to a normal form of
 > their liking, and be done. That's easier than using a separate library
 > throughout the code base (let alone using yet another string type).

But many users have never heard of normalization.  And that's *just*
normalization.  There is a whole raft of other requirements for
conformance (collation, case, etc).

The point is that with such a library and string type, various aspects
of conformance to Unicode, as well as conformance to associated
standards (eg, the dreaded UTS #18 ;-) can be added to the library
over time, and most users (those who don't need to squeeze every ounce
of performance out of Python) can be blissfully unaware of what, if
anything, they're conforming to.  Just upgrade the library to get the
best Unicode support (in terms of conformance) that Python has to
offer.

But for the reasons you (and Guido and Nick and ...) give, it's not
reasonable to put all that into core Python, not anytime soon.  Not to
mention that as a work-in-progress, it can hardly be considered stable
enough for the stdlib.

That is what Terry Reedy is getting at, AIUI.  "Batteries included"
should mean as much Unicode conformance as we can reasonably provide
should be *conveniently* available.  The ideal (given the caveat about
efficiency) would be *one* import statement and a ConformingUnicode type
that acts "just like a string" in all ways, except that (1) it indexes
and counts on characters (preferably "grapheme clusters" :-), (2) does
collation, regexps, and the like conformant to the Unicode standard,
and (3) may be quite inefficient from the point of view of bit-
shoveling net applications and the like.

Of course most of (2) is going to take quite a while, but (1) and (3)
should not be that hard to accomplish (especially (3) ;-).
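
As a hint at how little code a first cut at (1) needs, here is a crude
approximation of grapheme clustering (a base character plus trailing
combining marks; the real rules in UAX #29 are considerably more
involved, so this is a sketch, not a conforming implementation):

    import unicodedata

    def grapheme_clusters(s):
        cluster = ''
        for ch in s:
            if cluster and unicodedata.combining(ch):
                cluster += ch          # attach combining mark to its base
            else:
                if cluster:
                    yield cluster
                cluster = ch
        if cluster:
            yield cluster

    # len('e\u0301') == 2, but the user sees one "character":
    assert len(list(grapheme_clusters('e\u0301'))) == 1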

 > > I'm simply saying that the current implementation of strings, as
 > > improved by PEP 393, can not be said to be conforming.
 > 
 > I continue to disagree. The Unicode standard deliberately allows
 > Python's behavior as conforming.

That's up to you.  I doubt very many users or application developers
will see it that way, though.  I think they would prefer that we be
conservative about what we call "conformant", and tell them precisely
what they need to do to get what they consider conformant behavior
from Python.  That's easier if we share definitions of conformant with
them.  And surely there would be great joy on the battlements if there
were a one-import way to spell "all the Unicode conformance you can
give me, please".

The problem with your legalistic approach, as I see it, is that if our
definition is looser than the users', all their surprises will be
unpleasant.  That's not good.

From ncoghlan at gmail.com  Thu Aug 25 13:59:31 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 25 Aug 2011 21:59:31 +1000
Subject: [Python-Dev] [Python-checkins] devguide: #12792: document the
 "type" field of the tracker.
In-Reply-To: <E1QvcKL-0000ia-Ep@dinsdale.python.org>
References: <E1QvcKL-0000ia-Ep@dinsdale.python.org>
Message-ID: <CADiSq7eYyJ3ktB+wN7ifYdHjUm7eK8xwEb=Bpzwi1+FWWJMznQ@mail.gmail.com>

On Tue, Aug 23, 2011 at 7:46 AM, ezio.melotti
<python-checkins at python.org> wrote:
> +security
> +    Issues that might have security implications.  If you think the issue
> +    should not be made public, please report it to security at python.org instead.

A link to http://www.python.org/news/security/ would be handy here,
since that has the GPG key to send encrypted messages to the security
list.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com  Thu Aug 25 14:01:00 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 25 Aug 2011 22:01:00 +1000
Subject: [Python-Dev] [Python-checkins] devguide: #12792: document the
 "type" field of the tracker.
In-Reply-To: <CADiSq7eYyJ3ktB+wN7ifYdHjUm7eK8xwEb=Bpzwi1+FWWJMznQ@mail.gmail.com>
References: <E1QvcKL-0000ia-Ep@dinsdale.python.org>
	<CADiSq7eYyJ3ktB+wN7ifYdHjUm7eK8xwEb=Bpzwi1+FWWJMznQ@mail.gmail.com>
Message-ID: <CADiSq7doXbk-frYvxzW9qjjCxQPvMjZkYea5+EM--JLSTrh-SA@mail.gmail.com>

On Thu, Aug 25, 2011 at 9:59 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> A link to http://www.python.org/news/security/ would be handy here,
> since that has the GPG key to send encrypted messages to the security
> list.

http://www.python.org/security/ is a better variant of the link,
though (it redirects to the security advisory page, but looks nicer)

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From facundobatista at gmail.com  Thu Aug 25 14:42:10 2011
From: facundobatista at gmail.com (Facundo Batista)
Date: Thu, 25 Aug 2011 09:42:10 -0300
Subject: [Python-Dev] DNS problem with ar.pycon.org
Message-ID: <CAM09pzR=MagBSEKGeWc8jR7qWRy6dxEY1AOt5926PhW3v__eDw@mail.gmail.com>

Sorry for the crossposting, but I don't know who admins the pycon.org site.

It seems that something happened to "ar.pycon.org"; it should point to
the same IP as "pycon.python.org.ar" (190.228.30.157).

Does somebody know who can fix it?

BTW, how do I update that page? We're having the third PyCon in
Argentina this year...

Thank you!

-- 

.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From torsten.becker at gmail.com  Thu Aug 25 20:12:04 2011
From: torsten.becker at gmail.com (Torsten Becker)
Date: Thu, 25 Aug 2011 14:12:04 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j327di$cr1$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j305jf$e7d$1@dough.gmane.org> <4E53A87A.1070306@v.loewis.de>
	<j30buf$pe6$1@dough.gmane.org> <20110823160820.08754ffe@pitrou.net>
	<CAP_a28HW07E=A7A7oAoHymUY1XBKwLTRXeJNLaLoQMazEw=eVA@mail.gmail.com>
	<j327di$cr1$1@dough.gmane.org>
Message-ID: <CAP_a28EQzBOy1R2RpHW-tTHGxTZP+HC-Ss_ypsAmD8Gf0PHZng@mail.gmail.com>

Okay, I am convinced. :)   If Martin does not object, I would change
the "void *str" field to

     union {
         void *any;
         unsigned char *latin1;
         Py_UCS2 *ucs2;
         Py_UCS4 *ucs4;
     } data;


Regards,
Torsten


On Wed, Aug 24, 2011 at 02:57, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Torsten Becker, 24.08.2011 04:41:
>>
>> Also, common, now simple, checks for "unicode->str == NULL" would look
>> more ambiguous with a union ("unicode->str.latin1 == NULL").
>
> You could just add yet another field "any", i.e.
>
>     union {
>         unsigned char* latin1;
>         Py_UCS2* ucs2;
>         Py_UCS4* ucs4;
>         void* any;
>     } str;
>
> That way, the above test becomes
>
>     if (!unicode->str.any)
>
> or
>
>     if (unicode->str.any == NULL)
>
> Or maybe even call it "initialised" to match the intended purpose:
>
>     if (!unicode->str.initialised)
>
> That being said, I don't mind "unicode->str.latin1 == NULL" either, given
> that it will (as mentioned by others) be hidden behind a macro most of the
> time anyway.
>
> Stefan

From stefan_ml at behnel.de  Thu Aug 25 20:47:25 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 25 Aug 2011 20:47:25 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E553FBC.7080501@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de>
Message-ID: <j365bt$p1o$1@dough.gmane.org>

"Martin v. L?wis", 24.08.2011 20:15:
> - issues to be considered (unclarities, bugs, limitations, ...)

A problem of the current implementation is the need for calling 
PyUnicode_(FAST_)READY(), and the fact that it can fail (e.g. due to 
insufficient memory). Basically, this means that even something as trivial 
as trying to get the length of a Unicode string can now result in an error.

I just noticed this when rewriting Cython's helper function that searches a 
unicode string for a (Py_UCS4) character. Previously, the entire function 
was safe, could never produce an error and therefore always returned a 
boolean result. In the new world, the caller of this function must check 
and propagate errors. This may not be a major issue in most cases, but it 
can have a non-trivial impact on user code, depending on how deep in a call 
chain this happens and on how much control the user has over the call chain 
(think of a C callback, for example).

Also, even in the case that there is no error, the potential need to build 
up the string on request means that the run time and memory requirements of 
an algorithm are less predictable now as they depend on the origin of the 
input and not just its Python level string content.

I would be happier with an implementation that avoided this by always 
instantiating the data buffer right from the start, instead of carrying 
only a Py_UNICODE buffer for old-style instances.

Stefan


From lukasz at langa.pl  Thu Aug 25 22:37:30 2011
From: lukasz at langa.pl (=?iso-8859-2?Q?=A3ukasz_Langa?=)
Date: Thu, 25 Aug 2011 22:37:30 +0200
Subject: [Python-Dev] Sphinx version for Python 2.x docs
In-Reply-To: <CAPdtAj3wv2WCePdYM3qRcbRvLfzhAp2G1JpvRjd6-ttw2d1Q2A@mail.gmail.com>
References: <4E4AF610.5040303@simplistix.co.uk>
	<CAPdtAj3wv2WCePdYM3qRcbRvLfzhAp2G1JpvRjd6-ttw2d1Q2A@mail.gmail.com>
Message-ID: <CE7B4BC9-F40C-420F-9CDC-1F23C9812EC4@langa.pl>


On 23 Aug 2011, at 01:09, Sandro Tosi wrote:

> What I want to understand if it's an acceptable change.
> 
> I see sphinx more as of an internal, building tool, so freezing it
> it's like saying "don't upgrade gcc" or so.

Normally I'd say it's natural for us to specify that for a legacy release we're using build tools in versions up to so-and-so. Plus, requiring changes in the repository additionally signals that this is indeed touching "frozen" code.

In the case of 2.7, though, it's our "LTS release", so I think if Georg agrees, I'm also in favor of the upgrade.

As for Sphinx using svn.python.org, the main issue is not altering the scripts to use Hg, it's the weight of the whole Sphinx repository that would have to be cloned for each distclean. By using SVN you're only downloading a specifically tagged source tree.

-- 
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.


Please consider the environment before printing out this e-mail.

From stefan_ml at behnel.de  Thu Aug 25 23:30:13 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 25 Aug 2011 23:30:13 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <j365bt$p1o$1@dough.gmane.org>
References: <4E553FBC.7080501@v.loewis.de> <j365bt$p1o$1@dough.gmane.org>
Message-ID: <j36et5$oa6$1@dough.gmane.org>

Stefan Behnel, 25.08.2011 20:47:
> "Martin v. L?wis", 24.08.2011 20:15:
>> - issues to be considered (unclarities, bugs, limitations, ...)
>
> A problem of the current implementation is the need for calling
> PyUnicode_(FAST_)READY(), and the fact that it can fail (e.g. due to
> insufficient memory). Basically, this means that even something as trivial
> as trying to get the length of a Unicode string can now result in an error.

Oh, and the same applies to PyUnicode_AS_UNICODE() now. I doubt that there 
is *any* code out there that expects this macro to ever return NULL. This 
means that the current implementation has actually broken the old API. Just 
allocate an "80% of your memory" long string using the new API and then 
call PyUnicode_AS_UNICODE() on it to see what I mean.

Sadly, a quick look at a couple of recent commits in the pep-393 branch 
suggested that it is not even always obvious to you as the authors which 
macros can be called safely and which cannot. I immediately spotted a bug 
in one of the updated core functions (unicode_repr, IIRC) where 
PyUnicode_GET_LENGTH() is called without a previous call to 
PyUnicode_FAST_READY().

I find it anything but obvious that calling PyUnicode_DATA() and 
PyUnicode_KIND() is safe as long as the return value is checked for 
errors, but calling PyUnicode_GET_LENGTH() is not safe unless there was a 
previous call to PyUnicode_Ready().


> I just noticed this when rewriting Cython's helper function that searches a
> unicode string for a (Py_UCS4) character. Previously, the entire function
> was safe, could never produce an error and therefore always returned a
> boolean result. In the new world, the caller of this function must check
> and propagate errors. This may not be a major issue in most cases, but it
> can have a non-trivial impact on user code, depending on how deep in a call
> chain this happens and on how much control the user has over the call chain
> (think of a C callback, for example).
>
> Also, even in the case that there is no error, the potential need to build
> up the string on request means that the run time and memory requirements of
> an algorithm are less predictable now as they depend on the origin of the
> input and not just its Python level string content.
>
> I would be happier with an implementation that avoided this by always
> instantiating the data buffer right from the start, instead of carrying
> only a Py_UNICODE buffer for old-style instances.

Stefan


From guido at python.org  Thu Aug 25 23:55:22 2011
From: guido at python.org (Guido van Rossum)
Date: Thu, 25 Aug 2011 14:55:22 -0700
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E5606C7.9000404@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<4E5606C7.9000404@v.loewis.de>
Message-ID: <CAP7+vJ+f3oN3RmZhCEntgXBccj4gE7ErdKwvL9hv4uaBgmL_qQ@mail.gmail.com>

On Thu, Aug 25, 2011 at 1:24 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> With this PEP, the unicode object overhead grows to 10 pointer-sized
>> words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine.
>> Does it have any adverse effects?
>
> If I count correctly, it's only three *additional* words (compared to
> 3.2): four new ones, minus one that is removed. In addition, it drops
> a memory block. Assuming a malloc overhead of two pointers per malloc
> block, we get one additional pointer.
[...]

But strings are allocated via PyObject_Malloc(), i.e. the custom
arena-based allocator -- isn't its overhead (for small objects) less
than 2 pointers per block?

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Fri Aug 26 00:29:34 2011
From: guido at python.org (Guido van Rossum)
Date: Thu, 25 Aug 2011 15:29:34 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP_a28HW07E=A7A7oAoHymUY1XBKwLTRXeJNLaLoQMazEw=eVA@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j305jf$e7d$1@dough.gmane.org> <4E53A87A.1070306@v.loewis.de>
	<j30buf$pe6$1@dough.gmane.org> <20110823160820.08754ffe@pitrou.net>
	<CAP_a28HW07E=A7A7oAoHymUY1XBKwLTRXeJNLaLoQMazEw=eVA@mail.gmail.com>
Message-ID: <CAP7+vJL-rBEtvXvK3qGt3+ijYzoFek36Nh-C-Edu64yDqo0B4A@mail.gmail.com>

On Tue, Aug 23, 2011 at 7:41 PM, Torsten Becker
<torsten.becker at gmail.com> wrote:
> On Tue, Aug 23, 2011 at 10:08, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Macros are useful to shield the abstraction from the implementation. If
>> you access the members directly, and the unicode object is represented
>> differently in some future version of Python (say e.g. with tagged
>> pointers), your code doesn't compile anymore.
>
> I agree with Antoine, from the experience of porting C code from 3.2
> to the PEP 393 unicode API, the additional encapsulation by macros
> made it much easier to change the implementation of what is a field,
> what is a field's actual name, and what needs to be calculated through
> a function.
>
> So, I would like to keep primary access as a macro but I see the point
> that it would make the struct clearer to access and I would not mind
> changing the struct to use a union.  But then most access currently is
> through macros so I am not sure how much benefit the union would bring
> as it mostly complicates the struct definition.

+1

> Also, common, now simple, checks for "unicode->str == NULL" would look
> more ambiguous with a union ("unicode->str.latin1 == NULL").

You could add an extra union field for that:

unicode->str.voidptr == NULL

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Fri Aug 26 00:44:38 2011
From: guido at python.org (Guido van Rossum)
Date: Thu, 25 Aug 2011 15:44:38 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CAP7+vJ+DRiJY1Eo_YVHRKUXvqW-NDjj2HyFDzozkRWnrXsOE2w@mail.gmail.com>

On Wed, Aug 24, 2011 at 1:22 AM, Stephen J. Turnbull
<turnbull at sk.tsukuba.ac.jp> wrote:
> Well, no, it gives the right answer according to the design.  unicode
> objects do not contain character strings.  By design, they contain
> code point strings.  Guido has made that absolutely clear on a number
> of occasions.

Actually, the situation is that in narrow builds, they contain code
units (which may have surrogates); in wide builds they contain code
points. I think this is the crux of Tom Christiansen's complaints about
narrow builds.

Here's proof that narrow builds contain code units, not code points
(i.e. use UTF-16, not UCS-2):

$ ./python
Python 2.7.2+ (2.7:498b03a55297, Aug 25 2011, 15:14:01)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
65535
>>> a = u'\U00012345'
>>> a
u'\U00012345'
>>> len(a)
2
>>>

It's pretty clear that the interpreter is surrogate-aware, which to me
indicates the use of UTF-16.

Now in the PEP 393 branch:

$ ./python
Python 3.3.0a0 (pep-393:c60556059719, Aug 25 2011, 15:31:05)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
1114111
>>> a = '\U00012345'
>>> a
'?'
>>> len(a)
1
>>>

And some proof that this branch does not care about surrogates:

>>> a = '\ud808'
>>> b = '\udf45'
>>> a
'\ud808'
>>> b
'\udf45'
>>> a + b
'\ud808\udf45'
>>> len(a+b)
2
>>>

However:

>>> a = '\ud808\udf45'
>>> a
'?'
>>> len(a)
1
>>>

Which to me merely shows it is smart when parsing string literals.

(I expect that regular 3.3 narrow builds behave similar to the 2.7
narrow build, and 3.3 wide builds behave similar to the pep-393 build;
I didn't have those lying around.)

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Fri Aug 26 00:54:03 2011
From: guido at python.org (Guido van Rossum)
Date: Thu, 25 Aug 2011 15:54:03 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j32igg$hd7$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
Message-ID: <CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>

On Wed, Aug 24, 2011 at 3:06 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> Excuse me for believing the fine 3.2 manual that says
> "Strings contain Unicode characters." (And to a naive reader, that implies
> that string iteration and indexing should produce Unicode characters.)

The naive reader also doesn't know the difference between characters,
code points and code units. It's the advanced, Unicode-aware reader
who is confused by this phrase in the docs. It should say code units;
or perhaps code units for narrow builds and code points for wide
builds. With PEP 393 we can unconditionally say code points, which is
much better. We should try to remove our use of "characters" -- or
else we should *define* our use of the term "characters" as "what the
Unicode standard calls code points".

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Fri Aug 26 01:10:02 2011
From: guido at python.org (Guido van Rossum)
Date: Thu, 25 Aug 2011 16:10:02 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E5538B7.8010709@haypocalc.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<4E5538B7.8010709@haypocalc.com>
Message-ID: <CAP7+vJJx5-5wU+BJ8-zR_sVXAGPqGyDTq_h4KKaRRDy7TU0HaA@mail.gmail.com>

[Apologies for sending out a long stream of pointed responses, written
before I have fully digested this entire mega-thread. I don't have the
patience today to collect them all into a single mega-response.]

On Wed, Aug 24, 2011 at 10:45 AM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> Note: Java and the Qt library also use UTF-16 strings and have exactly the
> same "limitations" for str[n] and len(str).

Which reminds me. The PEP does not say what other Python
implementations besides CPython should do. Presumably Jython and
IronPython will continue to use UTF-16, so presumably the language
reference will still have to document that strings contain code units
(not code points) and the objections Tom Christiansen raised against
this will remain true for those versions of Python. (I don't know
about PyPy, they can presumably decide when they start their Py3k
port.)

OTOH perhaps IronPython 3.3 and Jython 3.3 can use a similar approach
and we can lay the narrow build issues to rest? Can someone here speak
for them?

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Fri Aug 26 01:31:02 2011
From: guido at python.org (Guido van Rossum)
Date: Thu, 25 Aug 2011 16:31:02 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E5539D5.60500@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5539D5.60500@v.loewis.de>
Message-ID: <CAP7+vJL=grm-bRLFn3oGT_jpG_utLY4ArLJ0VGKxgO3Bbw=Fvg@mail.gmail.com>

On Wed, Aug 24, 2011 at 10:50 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Not with these words, though. As I recall, it's rather like (still
> with different words) "len() will stay O(1) forever, regardless of
> any perceived incorrectness of this choice".

And indexing/slicing will also be O(1).

> An attempt to change
> the builtins to introduce higher complexity for the sake of correctness
> is what he rejects. I think PEP 393 balances this well, keeping
> the O(1) operations in that complexity, while improving the cross-
> platform "correctness" of these functions.

+1, I am comfortable with the balance struck by the PEP.

-- 
--Guido van Rossum (python.org/~guido)

From ezio.melotti at gmail.com  Fri Aug 26 02:00:10 2011
From: ezio.melotti at gmail.com (Ezio Melotti)
Date: Fri, 26 Aug 2011 03:00:10 +0300
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j33nf2$v69$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org> <4E5538B7.8010709@haypocalc.com>
	<j33nf2$v69$1@dough.gmane.org>
Message-ID: <CACBhJdGp-uBm6nbavnU2yhUzoCum=Vz4SEs3on3Z2Jgr6XFaqg@mail.gmail.com>

On Wed, Aug 24, 2011 at 11:37 PM, Terry Reedy <tjreedy at udel.edu> wrote:

> On 8/24/2011 1:45 PM, Victor Stinner wrote:
>
>> On 24/08/2011 02:46, Terry Reedy wrote:
>>
>
>  I don't think that using UTF-16 with surrogate pairs is really a big
>> problem. A lot of work has been done to hide this. For example,
>> repr(chr(0x10ffff)) now displays '\U0010ffff' instead of two characters.
>> Ezio fixed recently str.is*() methods in Python 3.2+.
>>
>
> I greatly appreciate that he did. The * (lower,upper,title) methods
> apparently are not fixed yet as the corresponding new tests are currently
> skipped for narrow builds.


There are two reasons for this:
1) the str.is* methods get the string and return True/False, so it's enough
to iterate on the string, combine the surrogates, and check if the result
islower/upper/etc. (a sketch of this iteration follows after the list).
Methods like lower/upper/etc, afaiu, currently get only a copy of the
string, and modify that in place.  The current macros advance to the next
char during reading and writing, so it's not possible to use them to
read/write from/to the same string.  We could either change the macros to
not advance the pointer [0] (and do it manually in the other functions like
is*) or change the function to get the original string too.
2) I'm on vacation.
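
A rough sketch of the surrogate-combining iteration mentioned in 1)
(hypothetical Python helper, nothing like the actual C macros):

    def iter_code_points(s):
        """Yield integer code points, pairing up surrogates the way a
        narrow build has to."""
        i, n = 0, len(s)
        while i < n:
            cu = ord(s[i])
            if 0xD800 <= cu <= 0xDBFF and i + 1 < n:
                lo = ord(s[i + 1])
                if 0xDC00 <= lo <= 0xDFFF:
                    # Combine a high/low surrogate pair into one code point.
                    yield 0x10000 + ((cu - 0xD800) << 10) + (lo - 0xDC00)
                    i += 2
                    continue
            yield cu
            i += 1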

Best Regards,
Ezio Melotti

[0]: for lower/upper/title it should be possible to modify the string in
place, because these operations never convert a non-BMP char to a BMP one
(and vice versa), so if two surrogates are read, two surrogates will be
written after the transformation.  I'm not sure this will work with all the
methods though (e.g. str.translate).

From dinov at microsoft.com  Fri Aug 26 02:01:42 2011
From: dinov at microsoft.com (Dino Viehland)
Date: Fri, 26 Aug 2011 00:01:42 +0000
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJJx5-5wU+BJ8-zR_sVXAGPqGyDTq_h4KKaRRDy7TU0HaA@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org> <4E5538B7.8010709@haypocalc.com>
	<CAP7+vJJx5-5wU+BJ8-zR_sVXAGPqGyDTq_h4KKaRRDy7TU0HaA@mail.gmail.com>
Message-ID: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com>

Guido wrote:
> Which reminds me. The PEP does not say what other Python
> implementations besides CPython should do. presumably Jython and
> IronPython will continue to use UTF-16, so presumably the language
> reference will still have to document that strings contain code units (not code
> points) and the objections Tom Christiansen raised against this will remain
> true for those versions of Python. (I don't know about PyPy, they can
> presumably decide when they start their Py3k
> port.)
> 
> OTOH perhaps IronPython 3.3 and Jython 3.3 can use a similar approach and
> we can lay the narrow build issues to rest? Can someone here speak for
> them?

The biggest difficulty for IronPython here would be dealing w/ .NET interop.
We can certainly introduce either an IronPython specific string class which
is similar to CPython's PyUnicodeObject or we could have multiple distinct
.NET types (IronPython.Runtime.AsciiString, System.String, and 
IronPython.Runtime.Ucs4String) which all appear as the same type to Python. 

But when Python is calling a .NET API it's always going to return a System.String 
which is UTF-16.  If we had to check and convert all of those strings when they 
cross into Python it would be very bad for performance.  Presumably we could
have a 4th type of "interop" string which lazily computes this but if we start
wrapping .Net strings we could also get into object identity issues.

We could stop using System.String in IronPython all together and say when 
working w/ .NET strings you get the .NET behavior and when working w/ Python 
strings you get the Python behavior.  I'm not sure how weird and confusing that 
would be but conversion from an Ipy string to a .NET string could remain cheap if 
both were UTF-16, and conversions from .NET strings to Ipy strings would only 
happen if the user did so explicitly.  

But it's a huge change - it'll almost certainly touch every single source file in 
IronPython.  I would think we'd get 3.2 done first and then think about what to
do here.


From guido at python.org  Fri Aug 26 02:26:53 2011
From: guido at python.org (Guido van Rossum)
Date: Thu, 25 Aug 2011 17:26:53 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E55C2C3.3060205@canterbury.ac.nz>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<4E55C2C3.3060205@canterbury.ac.nz>
Message-ID: <CAP7+vJL+rP15bDRZxYGvnF=9O8xZ2AtCy3TaM3aBBTiHFEb8zQ@mail.gmail.com>

On Wed, Aug 24, 2011 at 8:34 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> What about things like the surrogateescape codec that
> deliberately use code units in non-standard ways? Will
> tricks like that still be possible if the code-unit
> level is hidden from the programmer?

I would think that it should still be possible to explicitly put
surrogates into a string, using the appropriate \uxxxx escape or
chr(i) or some such approach; the basic string operations IMO
shouldn't bother with checking for well-formed character sequences
(just as they shouldn't care about normal forms). But decoding bytes
from UTF-16 should not leave any surrogate pairs in, since
interpreting those is part of the decoding.
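
For illustration, a minimal sketch of that (Python 3 syntax, assuming
wide-build/PEP 393 semantics where strings hold code points; the
'surrogatepass' handler is the explicit opt-in for pushing such values
through the UTF-8 codec):

    s = '\ud800' + chr(0xDC00)   # two lone surrogates; no pairing is implied
    assert len(s) == 2           # stored as two separate code points
    s.encode('utf-8', 'surrogatepass')  # plain 'utf-8' raises here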

I'm not sure what should happen with UTF-8 when it (in flagrant
violation of the standard, I presume) contains two separately-encoded
surrogates forming a valid surrogate pair; probably whatever the UTF-8
codec does on a wide build today should be good enough. Similarly for
encoding to UTF-8 on a wide build if one managed to create a string
containing a surrogate pair. Basically, I'm for a
garbage-in-garbage-out approach (with separate library functions to
detect garbage if the app is worried about it).

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Fri Aug 26 02:34:44 2011
From: guido at python.org (Guido van Rossum)
Date: Thu, 25 Aug 2011 17:34:44 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87liuhakyl.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<87ty969ljy.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E55FDAC.9010605@v.loewis.de>
	<87liuhakyl.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CAP7+vJKPEvqN6ZrH3JqqFmJZ4uWY8KfUKEcvu9q3aY-B1_BjWw@mail.gmail.com>

On Thu, Aug 25, 2011 at 2:39 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> If our process is working with an external process (the OS's file
> system driver) whose definition includes the statement that "File
> names are sequences of Unicode characters",

Does any OS actually say that? Don't they usually say "in a specific
normal form" or "they're just bytes"?

> then C6 says our process
> must compare canonically equivalent sequences that it takes to be file
> names as the same, whether or not they are in the same normalized
> form, or normalized at all, because we can't assume the file system
> will treat them as different.  If we do treat them as different, our
> users will get very upset (eg, if we don't signal a duplicate file
> name input by the user, and then the OS proceeds to overwrite an
> existing file).

The solution here is to let the OS do the check, e.g. with
os.path.exists() or os.stat(). It would be wrong to write an app that
checked for file existence by doing naive lookups in os.listdir()
output.
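
For illustration, the two approaches side by side (a sketch; both
function names are made up):

    import os

    def exists_robust(path):
        # Let the OS apply its own normalization/equivalence rules.
        return os.path.exists(path)

    def exists_naive(dirname, name):
        # Wrong in general: compares raw code point sequences, so a name
        # stored in NFD on disk won't match the same name typed in NFC.
        return name in os.listdir(dirname)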

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Fri Aug 26 02:40:22 2011
From: guido at python.org (Guido van Rossum)
Date: Thu, 25 Aug 2011 17:40:22 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87ipplaejd.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<87ty969ljy.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E55FDAC.9010605@v.loewis.de>
	<87liuhakyl.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E561CA1.8020500@v.loewis.de>
	<87ipplaejd.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CAP7+vJKv3MFVo4ndUrZtxCpeCSoG5ASff8zYnCrAC+bdsGigQA@mail.gmail.com>

On Thu, Aug 25, 2011 at 4:58 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> The problem with your legalistic approach, as I see it, is that if our
> definition is looser than the users', all their surprises will be
> unpleasant.  That's not good.

I see no alternative to explicitly spelling out what all operations do
and let the user figure out whether that meets their needs. E.g. we
needn't say that the str type or its == operator conforms to the
Unicode standard. We just need to say that the string type is a
sequence of code points, that string operations don't do validation or
normalization, and that to do a comparison that takes the Unicode
std's definition of equivalence (or collation, etc.) into account you
must call a certain library method.
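
A sketch of what such a library call can look like today, using the
stdlib's unicodedata module (assuming canonical equivalence via NFC is
what the caller wants; the function name is illustrative):

    import unicodedata

    def canonically_equal(a, b):
        # Compare after normalizing both sides to the same normal form;
        # NFD would work equally well.
        return (unicodedata.normalize('NFC', a) ==
                unicodedata.normalize('NFC', b))

    canonically_equal('caf\u00e9', 'cafe\u0301')  # -> True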

-- 
--Guido van Rossum (python.org/~guido)

From ezio.melotti at gmail.com  Fri Aug 26 03:40:33 2011
From: ezio.melotti at gmail.com (Ezio Melotti)
Date: Fri, 26 Aug 2011 04:40:33 +0300
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
Message-ID: <CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>

On Fri, Aug 26, 2011 at 1:54 AM, Guido van Rossum <guido at python.org> wrote:

> On Wed, Aug 24, 2011 at 3:06 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> > Excuse me for believing the fine 3.2 manual that says
> > "Strings contain Unicode characters." (And to a naive reader, that
> implies
> > that string iteration and indexing should produce Unicode characters.)
>
> The naive reader also doesn't know the difference between characters,
> code points and code units. It's the advanced, Unicode-aware reader
> who is confused by this phrase in the docs. It should say code units;
> or perhaps code units for narrow builds and code points for wide
> builds.


For UTF-16/32 (i.e. narrow/wide), talking about "code units"[0] should be
correct.  Also note that:
  * for both, every "code unit" has a specific "codepoint" (including lone
surrogates), so it might be OK to talk about "codepoints" too, but
  * only in wide builds is every "codepoint" represented by a single
32-bit "code unit".  In narrow builds, non-BMP chars are represented by a
"code unit sequence" of two elements (i.e. a "surrogate pair").

Since "code unit" refers to the *minimal* bit combination, in UTF-8
characters that needs 2/3/4 bytes, are represented with a "code unit
sequence" made of 2/3/4 "code units" (so in UTF-8 "code units" and "code
points" overlaps only for the ASCII range).


> With PEP 393 we can unconditionally say code points, which is
> much better. We should try to remove our use of "characters" -- or
> else we should *define* our use of the term "characters" as "what the
> Unicode standard calls code points".
>

Character usually works fine, especially for naive readers.  Even
Unicode-aware readers often confuse the several terms, so using a
simple term and pointing to a more accurate description sounds like a better
idea to me.

Note that there's also another important term[1]:
"""
*Unicode Scalar Value*. Any Unicode *code point*
<http://unicode.org/glossary/#code_point> except high-surrogate and
low-surrogate code points. In other words, the ranges of integers 0 to
D7FF₁₆ and E000₁₆ to 10FFFF₁₆ inclusive.
"""
For example the UTF codecs produce sequences of "code units" (of 8, 16, 32
bits) that represent "scalar values"[2][3]:

Chapter 3 [4] says:
"""
3.9 Unicode Encoding Forms
The Unicode Standard supports three character encoding forms: UTF-32,
UTF-16, and UTF-8. Each encoding form maps the Unicode code points
U+0000..U+D7FF and U+E000..U+10FFFF to unique code unit sequences. [...]
 D76 Unicode scalar value: Any Unicode code point except high-surrogate and
low-surrogate code points.
     • As a result of this definition, the set of Unicode scalar values
consists of the ranges 0 to D7FF and E000 to 10FFFF, inclusive.
 D77 Code unit: The minimal bit combination that can represent a unit of
encoded text for processing or interchange.
[...]
 D79 A Unicode encoding form assigns each Unicode scalar value to a unique
code unit sequence.
"""

On the other hand, Python Unicode strings are not limited to scalar values,
because they can also contain lone surrogates.


I hope this helps clarify the terminology a bit and doesn't add more
confusion, but if we want to use the Unicode terms we should get them
right.  (Also note that I might have misunderstood something, even if I've
been careful with the terms and I double-checked and quoted the relevant
parts of the Unicode standard.)

Best Regards,
Ezio Melotti


[0]: From chapter 3 [4],
 D77 Code unit: The minimal bit combination that can represent a unit of
encoded text for processing or interchange.
   • Code units are particular units of computer storage. Other character
encoding standards typically use code units defined as 8-bit units, that is,
octets.
     The Unicode Standard uses 8-bit code units in the UTF-8 encoding form,
16-bit code units in the UTF-16 encoding form, and 32-bit code units in the
UTF-32 encoding form.
[1]: http://unicode.org/glossary/#unicode_scalar_value
[2]: Apparently Python 3 raises an error while encoding lone surrogates in
UTF-8, but it doesn't for UTF-16 and UTF-32.
From chapter 3 [4],
 D91: "Because surrogate code points are not Unicode scalar values, isolated
UTF-16 code units in the range 0xD800..0xDFFF are ill-formed."
 D92: "Because surrogate code points are not included in the set of Unicode
scalar values, UTF-32 code units in the range 0x0000D800..0x0000DFFF are
ill-formed."
I think this should be fixed.
[3]: Note that I'm talking about codecs used to encode/decode Unicode
strings to/from bytes here, it's perfectly fine for Python itself to
represent lone surrogates in its *internal* representations, regardless of
what encoding it's using.
[4]: Chapter 3: http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf
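
For the record, footnote [2] can be checked interactively (behaviour as
observed on the 3.1/3.2 builds discussed in this thread; a sketch, not a
spec, and later releases may reject the UTF-16 case too):

    >>> '\ud800'.encode('utf-8')     # rejected, as footnote [2] says
    UnicodeEncodeError: ...
    >>> '\ud800'.encode('utf-16')    # accepted here, which [2] calls a bug
    b'\xff\xfe\x00\xd8'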

From ijmorlan at uwaterloo.ca  Fri Aug 26 04:28:06 2011
From: ijmorlan at uwaterloo.ca (Isaac Morland)
Date: Thu, 25 Aug 2011 22:28:06 -0400 (EDT)
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJL+rP15bDRZxYGvnF=9O8xZ2AtCy3TaM3aBBTiHFEb8zQ@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<4E55C2C3.3060205@canterbury.ac.nz>
	<CAP7+vJL+rP15bDRZxYGvnF=9O8xZ2AtCy3TaM3aBBTiHFEb8zQ@mail.gmail.com>
Message-ID: <Pine.GSO.4.64.1108252119360.21993@core.cs.uwaterloo.ca>

On Thu, 25 Aug 2011, Guido van Rossum wrote:

> I'm not sure what should happen with UTF-8 when it (in flagrant
> violation of the standard, I presume) contains two separately-encoded
> surrogates forming a valid surrogate pair; probably whatever the UTF-8
> codec does on a wide build today should be good enough. Similarly for
> encoding to UTF-8 on a wide build if one managed to create a string
> containing a surrogate pair. Basically, I'm for a
> garbage-in-garbage-out approach (with separate library functions to
> detect garbage if the app is worried about it).

If it's called UTF-8, there is no decision to be taken as to decoder 
behaviour - any byte sequence not permitted by the Unicode standard must 
result in an error (although, of course, *how* the error is to be reported 
could legitimately be the subject of endless discussion).  There are 
security implications to violating the standard so this isn't just 
legalistic purity.

Hmmm, doesn't look good:

Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> '\xed\xb0\x80'.decode ('utf-8')
u'\udc00'
>>>

Incorrect!  Although this is a narrow build - I can't say what the wide 
build would do.

For reasons of practicality, it may be appropriate to provide easy access 
to a CESU-8 decoder in addition to the normal UTF-8 decoder, but it must 
not be called UTF-8.  Other variations may also find use if provided.

See UTF-8 RFC: http://www.ietf.org/rfc/rfc3629.txt

And CESU-8 technical report: http://www.unicode.org/reports/tr26/

Isaac Morland			CSCF Web Guru
DC 2554C, x36650		WWW Software Specialist

From guido at python.org  Fri Aug 26 04:52:09 2011
From: guido at python.org (Guido van Rossum)
Date: Thu, 25 Aug 2011 19:52:09 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
Message-ID: <CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>

On Thu, Aug 25, 2011 at 6:40 PM, Ezio Melotti <ezio.melotti at gmail.com> wrote:
> On Fri, Aug 26, 2011 at 1:54 AM, Guido van Rossum <guido at python.org> wrote:
>>
>> On Wed, Aug 24, 2011 at 3:06 AM, Terry Reedy <tjreedy at udel.edu> wrote:
>> > Excuse me for believing the fine 3.2 manual that says
>> > "Strings contain Unicode characters." (And to a naive reader, that
>> > implies
>> > that string iteration and indexing should produce Unicode characters.)
>>
>> The naive reader also doesn't know the difference between characters,
>> code points and code units. It's the advanced, Unicode-aware reader
>> who is confused by this phrase in the docs. It should say code units;
>> or perhaps code units for narrow builds and code points for wide
>> builds.
>
> For UTF-16/32 (i.e. narrow/wide), talking about "code units"[0] should be
> correct.  Also note that:
>   * for both, every "code unit" has a specific "codepoint" (including lone
> surrogates), so it might be OK to talk about "codepoints" too, but
>   * only in wide builds is every "codepoint" represented by a single
> 32-bit "code unit".  In narrow builds, non-BMP chars are represented by a
> "code unit sequence" of two elements (i.e. a "surrogate pair").

The more I think about it the more it seems to me that the biggest
problem is that in narrow builds it is ambiguous whether (unicode)
strings contain code units, i.e. are *encoded* code points, or whether
they contain (decoded) code points. In a sense this is repeating the
ambiguity of 8-bit strings in Python 2, which are sometimes assumed to
contain ASCII or Latin-1 (i.e., code points with a limited range) or
UTF-8 (i.e., code units).

I know that by now I am repeating myself, but I think it would be
really good if we could get rid of this ambiguity. PEP 393 seems the
best way forward, even if it doesn't directly address what to do for
IronPython or Jython, both of which have to deal with a pervasive
native string type that contains UTF-16.

IIUC, CPython on Windows will work just fine with PEP 393, even if it
means that there is a bit more translation between Python strings and
the OS native wchar_t[] type. I assume that the data volumes going
through the OS APIs is relatively constrained, since data actually
written to or read from a file will still be bytes, possibly run
through a codec (if it's a text file), and not go through one of the
wchar_t[] APIs -- the latter are used for things like filenames, which
are much smaller.

> Since "code unit" refers to the *minimal* bit combination, in UTF-8
> characters that need 2/3/4 bytes are represented with a "code unit
> sequence" made of 2/3/4 "code units" (so in UTF-8 "code units" and "code
> points" overlap only for the ASCII range).

Actually I think UTF-8 is best thought of as an encoding for code
points, not characters -- the subtle difference between these two
should be of no concern to the UTF-8 codec (unless it is a validating
codec).

>> With PEP 393 we can unconditionally say code points, which is
>> much better. We should try to remove our use of "characters" -- or
>> else we should *define* our use of the term "characters" as "what the
>> Unicode standard calls code points".
>
> Character usually works fine, especially for naive readers.  Even
> Unicode-aware readers often confuse the several terms, so using a
> simple term and pointing to a more accurate description sounds like a better
> idea to me.

We may well have no choice -- there is just too much documentation
that naively refers to characters while really referring to code units
or code points.

> Note that there's also another important term[1]:
> """
> Unicode Scalar Value. Any Unicode code point except high-surrogate and
> low-surrogate code points. In other words, the ranges of integers 0 to
> D7FF₁₆ and E000₁₆ to 10FFFF₁₆ inclusive.
> """

This seems to involve validation. I think all validation should be
sequestered to specific APIs (e.g. certain codecs) and the string type
should not care about it. Depending on what they are doing,
applications may have to be aware of many subtleties in order to
always avoid generating "invalid" (or not well-formed-- what's the
difference?) strings.
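
Such a sequestered check can be tiny (a sketch; the name is made up, and
lone surrogates are the only kind of "garbage" it looks for):

    def is_well_formed(s):
        # True if s contains only Unicode scalar values, i.e. no lone
        # surrogate code points.
        return not any(0xD800 <= ord(c) <= 0xDFFF for c in s)

    is_well_formed('\ud800')  # -> False
    is_well_formed('abc')     # -> True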

> For example the UTF codecs produce sequences of "code units" (of 8, 16, 32
> bits) that represent "scalar values"[2][3]:
>
> Chapter 3 [4] says:
> """
> 3.9 Unicode Encoding Forms
> The Unicode Standard supports three character encoding forms: UTF-32,
> UTF-16, and UTF-8. Each encoding form maps the Unicode code points
> U+0000..U+D7FF and U+E000..U+10FFFF to unique code unit sequences. [...]

I really don't mind whether our codecs actually make exceptions for
surrogates (lone or otherwise). The only requirement I care about is
that surrogate-free strings round-trip correctly. Again, apps that
want to conform to the requirements regarding surrogates can implement
their own validation, and certainly at some point we should offer a
validation library as part of the stdlib -- but it should be up to the
app whether and when to use it.
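
The round-trip requirement itself is easy to state as a quick check (a
Python 3 sketch, not a conformance test):

    s = ''.join(map(chr, (0x41, 0xE9, 0x20AC, 0x10000)))  # surrogate-free
    for codec in ('utf-8', 'utf-16', 'utf-32'):
        assert s.encode(codec).decode(codec) == s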

> D76 Unicode scalar value: Any Unicode code point except high-surrogate and
> low-surrogate code points.
>      • As a result of this definition, the set of Unicode scalar values
> consists of the ranges 0 to D7FF and E000 to 10FFFF, inclusive.
> D77 Code unit: The minimal bit combination that can represent a unit of
> encoded text for processing or interchange.
> [...]
> D79 A Unicode encoding form assigns each Unicode scalar value to a unique
> code unit sequence.
> """
>
> On the other hand, Python Unicode strings are not limited to scalar values,
> because they can also contain lone surrogates.

Right.

> I hope this helps clarify the terminology a bit and doesn't add more
> confusion, but if we want to use the Unicode terms we should get them
> right.  (Also note that I might have misunderstood something, even if I've
> been careful with the terms and I double-checked and quoted the relevant
> parts of the Unicode standard.)

I'm not more confused than I was, but I think we should reduce the
number of Unicode terms we care about rather than increase them. If we
only ever had to talk about code points and encoded byte sequences I'd
be happy -- although in practice we also need to acknowledge the
existence of characters that may be represented by multiple code
points, since islower(), lower() etc. may need these (and also the re
module). Other concepts we may have to at least acknowledge include
various normal forms, equivalence, and collation sequences (which are
language-dependent?). It would be lovely if someone wrote up an
informational PEP so that we don't all have to lug around a copy of
the Unicode standard.

> Best Regards,
> Ezio Melotti
>
>
> [0]: From chapter 3 [4],
> D77 Code unit: The minimal bit combination that can represent a unit of
> encoded text for processing or interchange.
>    • Code units are particular units of computer storage. Other character
> encoding standards typically use code units defined as 8-bit units, that is,
> octets.
>      The Unicode Standard uses 8-bit code units in the UTF-8 encoding form,
> 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the
> UTF-32 encoding form.
> [1]: http://unicode.org/glossary/#unicode_scalar_value
> [2]: Apparently Python 3 raises an error while encoding lone surrogates in
> UTF-8, but it doesn't for UTF-16 and UTF-32.
> From chapter 3 [4],
> D91: "Because surrogate code points are not Unicode scalar values, isolated
> UTF-16 code units in the range 0xD800..0xDFFF are ill-formed."
> D92: "Because surrogate code points are not included in the set of Unicode
> scalar values, UTF-32 code units in the range 0x0000D800..0x0000DFFF are
> ill-formed."
> I think this should be fixed.
> [3]: Note that I'm talking about codecs used to encode/decode Unicode
> strings to/from bytes here, it's perfectly fine for Python itself to
> represent lone surrogates in its *internal* representations, regardless of
> what encoding it's using.
> [4]: Chapter 3: http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Fri Aug 26 04:59:10 2011
From: guido at python.org (Guido van Rossum)
Date: Thu, 25 Aug 2011 19:59:10 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <Pine.GSO.4.64.1108252119360.21993@core.cs.uwaterloo.ca>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<4E55C2C3.3060205@canterbury.ac.nz>
	<CAP7+vJL+rP15bDRZxYGvnF=9O8xZ2AtCy3TaM3aBBTiHFEb8zQ@mail.gmail.com>
	<Pine.GSO.4.64.1108252119360.21993@core.cs.uwaterloo.ca>
Message-ID: <CAP7+vJLrMsFepdXooxF33UXbUaQyNWnwuv7gwD4HjOJtdC5mCw@mail.gmail.com>

On Thu, Aug 25, 2011 at 7:28 PM, Isaac Morland <ijmorlan at uwaterloo.ca> wrote:
> On Thu, 25 Aug 2011, Guido van Rossum wrote:
>
>> I'm not sure what should happen with UTF-8 when it (in flagrant
>> violation of the standard, I presume) contains two separately-encoded
>> surrogates forming a valid surrogate pair; probably whatever the UTF-8
>> codec does on a wide build today should be good enough. Similarly for
>> encoding to UTF-8 on a wide build if one managed to create a string
>> containing a surrogate pair. Basically, I'm for a
>> garbage-in-garbage-out approach (with separate library functions to
>> detect garbage if the app is worried about it).
>
> If it's called UTF-8, there is no decision to be taken as to decoder
> behaviour - any byte sequence not permitted by the Unicode standard must
> result in an error (although, of course, *how* the error is to be reported
> could legitimately be the subject of endless discussion). ?There are
> security implications to violating the standard so this isn't just
> legalistic purity.

You have a point. The security issues cannot be seen separately from all
the other issues. The folks inside Google who care about Unicode often
harp on this. So I stand corrected. I am fine with codecs treating
code points or code point sequences that the Unicode standard doesn't
like (e.g. lone surrogates) the same way as more severe errors in the
encoded bytes (lots of byte sequences already aren't valid UTF-8). I
just hope this doesn't require normal forms or other expensive
operations; I hope it's limited to rejecting invalid use of surrogates
or other values that are not valid code points (e.g. 0, or >= 2**21).

> Hmmm, doesn't look good:
>
> Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
> [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>>>>
>>>> '\xed\xb0\x80'.decode ('utf-8')
>
> u'\udc00'
>>>>
>
> Incorrect! ?Although this is a narrow build - I can't say what the wide
> build would do.
>
> For reasons of practicality, it may be appropriate to provide easy access to
> a CESU-8 decoder in addition to the normal UTF-8 decoder, but it must not be
> called UTF-8. ?Other variations may also find use if provided.
>
> See UTF-8 RFC: http://www.ietf.org/rfc/rfc3629.txt
>
> And CESU-8 technical report: http://www.unicode.org/reports/tr26/

Thanks for the links! I also like the term "supplemental character" (a
code point >= 2**16). And I note that they talk about characters where
we've just agreed that we should say code points...

-- 
--Guido van Rossum (python.org/~guido)

From andrew.pennebaker at gmail.com  Fri Aug 26 06:04:10 2011
From: andrew.pennebaker at gmail.com (Andrew Pennebaker)
Date: Fri, 26 Aug 2011 00:04:10 -0400
Subject: [Python-Dev] Windows installers and %PATH%
Message-ID: <CAHXt_SWg6ZwOQi54RPi+tEmJEcV-4x+A4wervPWyRYSyZ7h-Hw@mail.gmail.com>

Please have the Windows installers add the Python installation directory to
the PATH environment variable.

Many newbies dive in without knowing that they must manually add C:\PythonXY
to PATH. It's yak shaving, something perfectly automatable that should have
been done by the installers way back in Python 1.0.

Please also add PYTHONROOT\Scripts. It's where cool things like
easy_install.exe are stored. More yak shaving.

The only potential downside to this is upsetting users who manage multiple
python installations. It's not a problem: they already manually adjust PATH
to their liking.

Cheers,

Andrew Pennebaker
www.yellosoft.us

From jxo6948 at rit.edu  Fri Aug 26 06:07:20 2011
From: jxo6948 at rit.edu (John O'Connor)
Date: Thu, 25 Aug 2011 21:07:20 -0700
Subject: [Python-Dev] Windows installers and %PATH%
In-Reply-To: <CAHXt_SWg6ZwOQi54RPi+tEmJEcV-4x+A4wervPWyRYSyZ7h-Hw@mail.gmail.com>
References: <CAHXt_SWg6ZwOQi54RPi+tEmJEcV-4x+A4wervPWyRYSyZ7h-Hw@mail.gmail.com>
Message-ID: <CABCbifWeL-wVCCbaScQvcH4qM=rxkgP6T1+ek0vQBxPnn4EiGg@mail.gmail.com>

+ 0 for automatically adding to %PATH%

+ 1 for providing an option to the user during install

- John


On Thu, Aug 25, 2011 at 9:04 PM, Andrew Pennebaker <
andrew.pennebaker at gmail.com> wrote:

> Please have the Windows installers add the Python installation directory to
> the PATH environment variable.
>
> Many newbies dive in without knowing that they must manually add
> C:\PythonXY to PATH. It's yak shaving, something perfectly automatable that
> should have been done by the installers way back in Python 1.0.
>
> Please also add PYTHONROOT\Scripts. It's where cool things like
> easy_install.exe are stored. More yak shaving.
>
> The only potential downside to this is upsetting users who manage multiple
> python installations. It's not a problem: they already manually adjust PATH
> to their liking.
>
> Cheers,
>
> Andrew Pennebaker
> www.yellosoft.us
>

From stefan_ml at behnel.de  Fri Aug 26 06:35:26 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 26 Aug 2011 06:35:26 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <Pine.GSO.4.64.1108252119360.21993@core.cs.uwaterloo.ca>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>	<4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>	<j33kvu$f9d$1@dough.gmane.org>	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>	<4E55C2C3.3060205@canterbury.ac.nz>	<CAP7+vJL+rP15bDRZxYGvnF=9O8xZ2AtCy3TaM3aBBTiHFEb8zQ@mail.gmail.com>
	<Pine.GSO.4.64.1108252119360.21993@core.cs.uwaterloo.ca>
Message-ID: <j377qe$nif$1@dough.gmane.org>

Isaac Morland, 26.08.2011 04:28:
> On Thu, 25 Aug 2011, Guido van Rossum wrote:
>> I'm not sure what should happen with UTF-8 when it (in flagrant
>> violation of the standard, I presume) contains two separately-encoded
>> surrogates forming a valid surrogate pair; probably whatever the UTF-8
>> codec does on a wide build today should be good enough. Similarly for
>> encoding to UTF-8 on a wide build if one managed to create a string
>> containing a surrogate pair. Basically, I'm for a
>> garbage-in-garbage-out approach (with separate library functions to
>> detect garbage if the app is worried about it).
>
> If it's called UTF-8, there is no decision to be taken as to decoder
> behaviour - any byte sequence not permitted by the Unicode standard must
> result in an error (although, of course, *how* the error is to be reported
> could legitimately be the subject of endless discussion). There are
> security implications to violating the standard so this isn't just
> legalistic purity.
>
> Hmmm, doesn't look good:
>
> Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
> [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> '\xed\xb0\x80'.decode ('utf-8')
> u'\udc00'
> >>>
>
> Incorrect! Although this is a narrow build - I can't say what the wide
> build would do.

Works the same for me in a wide Py2.7 build, but gives me this in Py3:

Python 3.1.2 (r312:79147, Sep 27 2010, 09:57:50)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> b'\xed\xb0\x80'.decode ('utf-8')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: 
illegal encoding

Same for current Py3.3 and the PEP393 build (although both have a better 
exception message now: "UnicodeDecodeError: 'utf8' codec can't decode bytes 
in position 0-1: invalid continuation byte").

Stefan


From ncoghlan at gmail.com  Fri Aug 26 06:52:07 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 26 Aug 2011 14:52:07 +1000
Subject: [Python-Dev] Windows installers and %PATH%
In-Reply-To: <CAHXt_SWg6ZwOQi54RPi+tEmJEcV-4x+A4wervPWyRYSyZ7h-Hw@mail.gmail.com>
References: <CAHXt_SWg6ZwOQi54RPi+tEmJEcV-4x+A4wervPWyRYSyZ7h-Hw@mail.gmail.com>
Message-ID: <CADiSq7dtpc-oLoP3iUCNyc0NMQ_jUkKReUP_MqbR3XwnFZO2Gw@mail.gmail.com>

On Fri, Aug 26, 2011 at 2:04 PM, Andrew Pennebaker
<andrew.pennebaker at gmail.com> wrote:
> Please have the Windows installers add the Python installation directory to
> the PATH environment variable.

Please read PEP 397: Python Launcher for Windows.

Or at least do us the courtesy of acknowledging that if the issue was
as simple as "just munge the PATH", it would have been done long ago.
Windows is a developer hostile platform unless you completely buy into
the Microsoft toolchain, which is not an option for cross-platform
projects like Python.

It's well within Microsoft's capabilities to create and support a
POSIX compatibility layer that allows applications to look and feel
like native ones, but they choose not to, since they see
cross-platform development as a competitive threat to their desktop
dominance. There's a reason many open source projects don't offer
native support at all, instead instructing people to use Cygwin as a
compatibility layer.

It irks me greatly when people place the blame for this situation on
volunteer programmers giving them stuff for free instead of where it
belongs (i.e. on the multibillion dollar corporation deliberately
failing to implement a widely recognised OS interoperability
standard).

Regards,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From stefan_ml at behnel.de  Fri Aug 26 07:21:11 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 26 Aug 2011 07:21:11 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <j36et5$oa6$1@dough.gmane.org>
References: <4E553FBC.7080501@v.loewis.de> <j365bt$p1o$1@dough.gmane.org>
	<j36et5$oa6$1@dough.gmane.org>
Message-ID: <j37ag7$4oe$1@dough.gmane.org>

Stefan Behnel, 25.08.2011 23:30:
> Stefan Behnel, 25.08.2011 20:47:
>> "Martin v. L?wis", 24.08.2011 20:15:
>>> - issues to be considered (unclarities, bugs, limitations, ...)
>>
>> A problem of the current implementation is the need for calling
>> PyUnicode_(FAST_)READY(), and the fact that it can fail (e.g. due to
>> insufficient memory). Basically, this means that even something as trivial
>> as trying to get the length of a Unicode string can now result in an error.
>
> Oh, and the same applies to PyUnicode_AS_UNICODE() now. I doubt that there
> is *any* code out there that expects this macro to ever return NULL. This
> means that the current implementation has actually broken the old API. Just
> allocate an "80% of your memory" long string using the new API and then
> call PyUnicode_AS_UNICODE() on it to see what I mean.
>
> Sadly, a quick look at a couple of recent commits in the pep-393 branch
> suggested that it is not even always obvious to you as the authors which
> macros can be called safely and which cannot. I immediately spotted a bug
> in one of the updated core functions (unicode_repr, IIRC) where
> PyUnicode_GET_LENGTH() is called without a previous call to
> PyUnicode_FAST_READY().
>
> I find it everything but obvious that calling PyUnicode_DATA() and
> PyUnicode_KIND() is safe as long as the return value is being checked for
> errors, but calling PyUnicode_GET_LENGTH() is not safe unless there was a
> previous call to PyUnicode_Ready().

And, adding to my own mail yet another time, the current header file states 
this:

"""
/* String contains only wstr byte characters.  This is only possible
    when the string was created with a legacy API and PyUnicode_Ready()
    has not been called yet.  Note that PyUnicode_KIND() calls
    PyUnicode_FAST_READY() so PyUnicode_WCHAR_KIND is only possible as an
    initialized value not as a result of PyUnicode_KIND(). */
#define PyUnicode_WCHAR_KIND 0
"""

 From my understanding, this is incorrect. When I call PyUnicode_KIND() on 
an old style object and it fails to allocate the string buffer, I would 
expect that I actually get PyUnicode_WCHAR_KIND back as a result, as the 
SSTATE_KIND_* value in the "state" field has not been initialised yet at 
that point.

Stefan


From nir at winpdb.org  Fri Aug 26 09:18:15 2011
From: nir at winpdb.org (Nir Aides)
Date: Fri, 26 Aug 2011 10:18:15 +0300
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <1314131362.3485.36.camel@localhost.localdomain>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
Message-ID: <CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>

Another face of the discussion is about whether to deprecate the mixing of
the threading and processing modules and what to do about the
multiprocessing module which is implemented with worker threads.



On Tue, Aug 23, 2011 at 11:29 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Tuesday, 23 August 2011 at 22:07 +0200, Charles-François Natali wrote:
> > 2011/8/23 Antoine Pitrou <solipsis at pitrou.net>:
> > > Well, I would consider the I/O locks the most glaring problem. Right
> > > now, your program can freeze if you happen to do a fork() while e.g.
> > > the stderr lock is taken by another thread (which is quite common when
> > > debugging).
> >
> > Indeed.
> > To solve this, a similar mechanism could be used: after fork(), in the
> > child process:
> > - just reset each I/O lock (destroy/re-create the lock) if we can
> > guarantee that the file object is in a consistent state (i.e. that all
> > the invariants hold). That's the approach I used in my initial patch.
>
> For I/O locks I think that would work.
> There could also be a process-wide "fork lock" to serialize locks and
> other operations, if we want 100% guaranteed consistency of I/O objects
> across forks.
>
> > - call a fileobject method which resets the I/O lock and sets the file
> > object to a consistent state (in other word, an atfork handler)
>
> I fear that the complication with atfork handlers is that you have to
> manage their lifecycle as well (i.e., when an IO object is destroyed,
> you have to unregister the handler).
>
> Regards
>
> Antoine.
>
>
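
For concreteness, the reset-in-child idea quoted above can be sketched
with the os.register_at_fork() hook that later CPython versions (3.7+)
added; at the time of this discussion the equivalent would have needed
C-level pthread_atfork() support. The lock name and handler below are
purely illustrative:

    import os
    import threading

    _stderr_lock = threading.Lock()

    def _reset_locks_in_child():
        # Discard the (possibly held) parent lock and start fresh; the
        # underlying file object is assumed consistent at fork time.
        global _stderr_lock
        _stderr_lock = threading.Lock()

    os.register_at_fork(after_in_child=_reset_locks_in_child)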

From p.f.moore at gmail.com  Fri Aug 26 10:29:27 2011
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 26 Aug 2011 09:29:27 +0100
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
Message-ID: <CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>

On 26 August 2011 03:52, Guido van Rossum <guido at python.org> wrote:
> I know that by now I am repeating myself, but I think it would be
> really good if we could get rid of this ambiguity. PEP 393 seems the
> best way forward, even if it doesn't directly address what to do for
> IronPython or Jython, both of which have to deal with a pervasive
> native string type that contains UTF-16.

Hmm, I'm completely naive in this area, but from reading the thread,
would a possible approach be to say that Python (the language
definition) is defined in terms of code points (as we already do, even
if the wording might benefit from some clarification). Then, under PEP
393, and currently in wide builds, CPython conforms to that definition
(and retains the property of basic operations being O(1), which is not
in the language definition but is a user expectation and your
expressed requirement).

IronPython and Jython can retain UTF-16 as their native form if that
makes interop cleaner, but in doing so they need to ensure that basic
operations like indexing and len work in terms of code points, not
code units, if they are to conform. Presumably this will be easier
than moving to a UCS-4 representation, as they can defer to runtime
support routines via interop (which presumably get this right - or at
the very least can be blamed for any errors :-)) They lose the O(1)
guarantee, but that's easily defensible as a tradeoff to conform to
underlying runtime semantics.

Does this make sense, or have I completely misunderstood things?

Paul.

PS Thanks to all for the discussion in general, I'm learning a lot
about Unicode from all of this!

From mal at egenix.com  Fri Aug 26 10:54:09 2011
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 26 Aug 2011 10:54:09 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j377qe$nif$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>	<4E53A950.30005@haypocalc.com>	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>	<j33kvu$f9d$1@dough.gmane.org>	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>	<4E55C2C3.3060205@canterbury.ac.nz>	<CAP7+vJL+rP15bDRZxYGvnF=9O8xZ2AtCy3TaM3aBBTiHFEb8zQ@mail.gmail.com>	<Pine.GSO.4.64.1108252119360.21993@core.cs.uwaterloo.ca>
	<j377qe$nif$1@dough.gmane.org>
Message-ID: <4E575F31.5010709@egenix.com>

Stefan Behnel wrote:
> Isaac Morland, 26.08.2011 04:28:
>> On Thu, 25 Aug 2011, Guido van Rossum wrote:
>>> I'm not sure what should happen with UTF-8 when it (in flagrant
>>> violation of the standard, I presume) contains two separately-encoded
>>> surrogates forming a valid surrogate pair; probably whatever the UTF-8
>>> codec does on a wide build today should be good enough. Similarly for
>>> encoding to UTF-8 on a wide build if one managed to create a string
>>> containing a surrogate pair. Basically, I'm for a
>>> garbage-in-garbage-out approach (with separate library functions to
>>> detect garbage if the app is worried about it).
>>
>> If it's called UTF-8, there is no decision to be taken as to decoder
>> behaviour - any byte sequence not permitted by the Unicode standard must
>> result in an error (although, of course, *how* the error is to be
>> reported
>> could legitimately be the subject of endless discussion). There are
>> security implications to violating the standard so this isn't just
>> legalistic purity.
>>
>> Hmmm, doesn't look good:
>>
>> Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
>> [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> '\xed\xb0\x80'.decode ('utf-8')
>> u'\udc00'
>> >>>
>>
>> Incorrect! Although this is a narrow build - I can't say what the wide
>> build would do.
> 
> Works the same for me in a wide Py2.7 build, but gives me this in Py3:
> 
> Python 3.1.2 (r312:79147, Sep 27 2010, 09:57:50)
> [GCC 4.4.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> b'\xed\xb0\x80'.decode ('utf-8')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2:
> illegal encoding
> 
> Same for current Py3.3 and the PEP393 build (although both have a better
> exception message now: "UnicodeDecodeError: 'utf8' codec can't decode
> bytes in position 0-1: invalid continuation byte").

The reason for this is that the UTF-8 codec in Python 2.x
has never rejected lone surrogates and it was used to
store Unicode literals in pyc files (using marshal)
and also by pickle for transferring Unicode strings,
so we could not simply reject lone surrogates, since that
would have caused compatibility problems.

That change was made in Python 3.x by having a special
error handler surrogatepass which allows the UTF-8
codec to process lone surrogates as well.
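
In Python 3 that escape hatch looks like this (the exact error message
varies between versions):

    >>> '\ud800'.encode('utf-8')                   # strict by default
    UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' ...
    >>> '\ud800'.encode('utf-8', 'surrogatepass')  # explicit opt-in
    b'\xed\xa0\x80'
    >>> b'\xed\xa0\x80'.decode('utf-8', 'surrogatepass')
    '\ud800'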

BTW: I'd love to join the discussion about PEP 393, but
unfortunately I'm swamped with work, so these are just
a few comments...

What I'm missing in the discussion is statistics of the
effects of the patch (both memory and performance) and
the effect on 3rd party extensions.

I'm not convinced that the memory/speed tradeoff is worth the
breakage or whether the patch actually saves memory in real world
applications and I'm unsure whether the needed code changes to
the binary Python Unicode API can be done in a minor Python
release.

Note that in the worst case, a PEP 393 Unicode object will
store three versions of the same string, e.g. on Windows
with sizeof(wchar_t)==2: A UCS4 version in str,
a UTF-8 version in utf8 (this gets built whenever Python needs
a UTF-8 version of the object) and a wchar_t version in wstr
(which gets built whenever Python codecs or extensions need
Py_UNICODE or a wchar_t representation).
On all platforms, in the case where you store a Latin-1
non-ASCII string: str holds the Latin-1 string, utf8 the
UTF-8 version and wstr the 2- or 4-byte wchar_t version.
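
On a PEP 393 build the str buffer growth can at least be observed from
Python code (a sketch; the absolute numbers depend on platform, version
and object header overhead, so only the trend is meaningful):

    import sys
    for s in ('abc', 'ab\u00e9', 'ab\u20ac', 'ab\U00010000'):
        # Payload grows as 1, 1, 2 and 4 bytes per code point; the utf8
        # and wstr side buffers only appear once the legacy APIs are used.
        print(len(s), sys.getsizeof(s))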


* A note on terminology: Python stores Unicode as code points.

A Unicode "code point" refers to any value in the Unicode code
range which is 0 - 0x10FFFF. Lone surrogates, unassigned
and illegal code points are all still code points - this is
a detail people often forget. Various code points in Unicode
have special meanings and some are not allowed to be
used in encodings, but that does not rule them
out from being stored and processed as code points.

Code units are only used in encoded versions of Unicode, e.g.
UTF-8, -16, -32. Mixing code units and code points
can cause much confusion, so it's better to talk only
about code points when referring to Python Unicode objects,
since you only ever meet code units when looking at
the bytes output of the codecs.

This is important to know, since Python is not only meant
to process Unicode, but also to build Unicode strings, so
a careful distinction has to be made when considering what
is correct and what is not: codecs have to follow much
stricter rules than Python itself.
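
The difference is directly visible from Python code (an illustrative
transcript; which result you get depends on the build):

    # Narrow (UTF-16) build:
    >>> len(u'\U00010000')
    2                  # two code units, i.e. a surrogate pair
    # Wide (UCS-4) build, or any build under PEP 393:
    >>> len(u'\U00010000')
    1                  # one code point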


* A note on surrogates: These are just one particular problem
where you run into the situation where splitting a Unicode
string potentially breaks a combination of code points.
There are a few other types of code points that cause similar
problems, e.g. combining code points.

Simply going with UCS-4 does not solve the problem, since
even with UCS-4 storage, you can still have surrogates in your
Python Unicode string. As with many things, it is important
to be aware of the potential problem, but there's no
automatic fix to get rid of it. What we can do is make
the best of it and this has happened already in many areas,
e.g. codecs joining surrogates automatically, chr()
creating surrogates, etc.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 26 2011)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2011-10-04: PyCon DE 2011, Leipzig, Germany                39 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From ezio.melotti at gmail.com  Fri Aug 26 11:14:13 2011
From: ezio.melotti at gmail.com (Ezio Melotti)
Date: Fri, 26 Aug 2011 12:14:13 +0300
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJLrMsFepdXooxF33UXbUaQyNWnwuv7gwD4HjOJtdC5mCw@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<4E55C2C3.3060205@canterbury.ac.nz>
	<CAP7+vJL+rP15bDRZxYGvnF=9O8xZ2AtCy3TaM3aBBTiHFEb8zQ@mail.gmail.com>
	<Pine.GSO.4.64.1108252119360.21993@core.cs.uwaterloo.ca>
	<CAP7+vJLrMsFepdXooxF33UXbUaQyNWnwuv7gwD4HjOJtdC5mCw@mail.gmail.com>
Message-ID: <CACBhJdE_pCh+29R-Z4aiZscSm5n5yF2jtaKH64N30Xgx=H8nMA@mail.gmail.com>

On Fri, Aug 26, 2011 at 5:59 AM, Guido van Rossum <guido at python.org> wrote:

> On Thu, Aug 25, 2011 at 7:28 PM, Isaac Morland <ijmorlan at uwaterloo.ca>
> wrote:
> > On Thu, 25 Aug 2011, Guido van Rossum wrote:
> >
> >> I'm not sure what should happen with UTF-8 when it (in flagrant
> >> violation of the standard, I presume) contains two separately-encoded
> >> surrogates forming a valid surrogate pair; probably whatever the UTF-8
> >> codec does on a wide build today should be good enough.
>

Surrogates are used and valid only in UTF-16.
In UTF-8/32 they are invalid, even if they form a pair (see
http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf ).  Of course Python
can/should be able to represent them internally regardless of the build
type.

>>Similarly for
> >> encoding to UTF-8 on a wide build if one managed to create a string
> >> containing a surrogate pair. Basically, I'm for a
> >> garbage-in-garbage-out approach (with separate library functions to
> >> detect garbage if the app is worried about it).
> >
> > If it's called UTF-8, there is no decision to be taken as to decoder
> > behaviour - any byte sequence not permitted by the Unicode standard must
> > result in an error (although, of course, *how* the error is to be
> reported
> > could legitimately be the subject of endless discussion).
>

What do you mean?  We use the "strict" error handler by default and we can
specify other handlers already.


>  There are
> > security implications to violating the standard so this isn't just
> > legalistic purity.
>
> You have a point. The security issues cannot be seen separately from all
> the other issues. The folks inside Google who care about Unicode often
> harp on this. So I stand corrected. I am fine with codecs treating
> code points or code point sequences that the Unicode standard doesn't
> like (e.g. lone surrogates) the same way as more severe errors in the
> encoded bytes (lots of byte sequences already aren't valid UTF-8).


Codecs that use the official names should stick to the standards.  For
example s.encode('utf-32') should either produce a valid utf-32 byte string
or raise an error if 's' contains invalid characters (e.g. surrogates).
We can have other internal codecs that are based on the UTF-* encodings but
allow the representation of lone surrogates and even expose them if we want,
but they should have a different name (even 'utf-*-something' should be ok,
see http://bugs.python.org/issue12729#msg142053 from "Unicode says you can't
put surrogates or noncharacters in a UTF-anything stream.").


> I
> just hope this doesn't require normal forms or other expensive
> operations; I hope it's limited to rejecting invalid use of surrogates
> or other values that are not valid code points (e.g. 0, or >= 2**21).
>

I think there shouldn't be any normalization done automatically by the
codecs.


>
> > Hmmm, doesn't look good:
> >
> > Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
> > [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
> > Type "help", "copyright", "credits" or "license" for more information.
> >>>>
> >>>> '\xed\xb0\x80'.decode ('utf-8')
> >
> > u'\udc00'
> >>>>
> >
> > Incorrect!  Although this is a narrow build - I can't say what the wide
> > build would do.
>

The UTF-8 codec used to follow RFC 2279 and only recently has been updated
to RFC 3629 (see http://bugs.python.org/issue8271#msg107074 ).  On Python
2.x it still produces invalid UTF-8 because changing it is backward
incompatible.  In Python 2 UTF-8 can be used to encode every codepoint from
0 to 10FFFF, and it always works.  If we change it now it might start
raising errors for an operation that never raised them before (see
http://bugs.python.org/issue12729#msg142047 ).
Luckily this is fixed in Python 3.x.
I think there are more codepoints/byte sequences that should be rejected
while encoding/decoding though, in both UTF-8 and UTF-16/32, but I haven't
looked at them yet (I would be happy to fix these for 3.3 or even 2.7/3.2
(if applicable), so if you find mismatches with the Unicode standard and
report an issue, feel free to assign it to me).

Best Regards,
Ezio Melotti


>
> > For reasons of practicality, it may be appropriate to provide easy access
> to
> > a CESU-8 decoder in addition to the normal UTF-8 decoder, but it must not
> be
> > called UTF-8.  Other variations may also find use if provided.
> >
> > See UTF-8 RFC: http://www.ietf.org/rfc/rfc3629.txt
> >
> > And CESU-8 technical report: http://www.unicode.org/reports/tr26/
>
> Thanks for the links! I also like the term "supplemental character" (a
> code point >= 2**16). And I note that they talk about characters where
> we've just agreed that we should say code points...
>
> --
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110826/928797bf/attachment.html>

From martin at v.loewis.de  Fri Aug 26 11:29:55 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 26 Aug 2011 11:29:55 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com>	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
Message-ID: <4E576793.2010203@v.loewis.de>

> IronPython and Jython can retain UTF-16 as their native form if that
> makes interop cleaner, but in doing so they need to ensure that basic
> operations like indexing and len work in terms of code points, not
> code units, if they are to conform.

That means that they won't conform, period. There is no efficient
maintainable implementation strategy to achieve that property, and
it may well take years until somebody provides an efficient
unmaintainable implementation.

> Does this make sense, or have I completely misunderstood things?

You seem to assume it is ok for Jython/IronPython to provide indexing in
O(n). It is not.

However, non-conformance may not be that much of an issue. They do not
conform in many other aspects, either (such as not supporting Python 3,
for example, or not supporting the C API) that they may well chose to
ignore such a minor requirement if there was one. For BMP strings,
they conform fine, and it may well be that Jython either doesn't
have non-BMP strings, or doesn't care whether len() or indexing of its
non-BMP strings is "correct".

Regards,
Martin

From stefan_ml at behnel.de  Fri Aug 26 12:29:56 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 26 Aug 2011 12:29:56 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E576793.2010203@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>	<4E53A950.30005@haypocalc.com>	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de>
Message-ID: <j37sj4$e3l$1@dough.gmane.org>

"Martin v. L?wis", 26.08.2011 11:29:
> You seem to assume it is ok for Jython/IronPython to provide indexing in
> O(n). It is not.

I think we can leave this discussion aside. Jython and IronPython have 
their own platform specific constraints to which they need to adapt their 
implementation. For a Jython user, it means a lot to be able to efficiently 
pass strings (and other data) back and forth between Jython and other JVM 
code, and it's not hard to guess that the same is true for IronPython/.NET 
users. After all, the platform integration is the very *reason* for most 
users to select one of these implementations.

Besides, what if these implementations provided indexing in, say, O(log N) 
instead of O(1) or O(N), e.g. by building a tree index into each string? 
You could have an index that simply marks runs of surrogate pairs and BMP 
substrings, thus providing a likely-to-be-somewhat-compact index. That 
index would obviously have to be built, but so do the different string 
representations in post-PEP-393 CPython, especially on Windows, as I have 
learned.
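
A very rough sketch of that run-index idea, purely for illustration (the
function names are made up; a real implementation would live in C and
cache the index on the string object):

    import bisect

    def build_index(units):
        # units: a list of 16-bit code unit values (ints).
        # Record each maximal run of BMP chars or of surrogate pairs as
        # (first code point index, first code unit offset, is_pair_run).
        starts, runs = [], []
        cp = cu = 0
        while cu < len(units):
            pair = 0xD800 <= units[cu] < 0xDC00   # high surrogate opens a pair
            starts.append(cp)
            runs.append((cp, cu, pair))
            while cu < len(units) and (0xD800 <= units[cu] < 0xDC00) == pair:
                cu += 2 if pair else 1
                cp += 1
        return starts, runs

    def cp_to_cu(starts, runs, i):
        # Map code point index i to a code unit offset in O(log #runs).
        cp0, cu0, pair = runs[bisect.bisect_right(starts, i) - 1]
        return cu0 + (i - cp0) * (2 if pair else 1)

An index lookup then scales with the number of BMP/astral transitions in
the string rather than with its length.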

Would such a less severe violation of the strict O(1) rule still be "not 
ok"? I think this is not such a clear black-and-white issue. Both 
implementations have notably different performance characteristics than 
CPython in some more or less important areas, as does PyPy. At some point, 
the language compliance label has to account for that.

Stefan


From martin at v.loewis.de  Fri Aug 26 12:29:29 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 26 Aug 2011 12:29:29 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <CAP7+vJ+f3oN3RmZhCEntgXBccj4gE7ErdKwvL9hv4uaBgmL_qQ@mail.gmail.com>
References: <4E553FBC.7080501@v.loewis.de>
	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>
	<CAP7+vJ+f3oN3RmZhCEntgXBccj4gE7ErdKwvL9hv4uaBgmL_qQ@mail.gmail.com>
Message-ID: <4E577589.4030809@v.loewis.de>

> But strings are allocated via PyObject_Malloc(), i.e. the custom
> arena-based allocator -- isn't its overhead (for small objects) less
> than 2 pointers per block?

Ah, right, I missed that. Indeed, those have no header, and the only
overhead is the padding to a multiple of 8.

That shifts the picture; I hope the table below is correct,
assuming ASCII strings.
3.2: 7 pointers (adds 4 bytes padding on 32-bit systems)
393: 10 pointers

string | 32-bit pointer | 32-bit pointer | 64-bit pointer
size   | 16-bit wchar_t | 32-bit wchar_t | 32-bit wchar_t
       | 3.2     |  393 | 3.2    |  393  | 3.2    |  393  |
-----------------------------------------------------------
1      | 40      | 48   | 40     |  48   | 64     | 88    |
2      | 40      | 48   | 48     |  48   | 72     | 88    |
3      | 40      | 48   | 48     |  48   | 72     | 88    |
4      | 48      | 48   | 56     |  48   | 80     | 88    |
5      | 48      | 48   | 56     |  48   | 80     | 88    |
6      | 48      | 48   | 64     |  48   | 88     | 88    |
7      | 48      | 48   | 64     |  48   | 88     | 88    |
8      | 56      | 56   | 72     |  56   | 96     | 86    |

So 1-byte strings increase in size; very short strings increase
on 16-bit-wchar_t systems and 64-bit systems. Short strings
keep their size, and long strings save.

Regards,
Martin



From solipsis at pitrou.net  Fri Aug 26 12:51:30 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 26 Aug 2011 12:51:30 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org> <4E5538B7.8010709@haypocalc.com>
	<CAP7+vJJx5-5wU+BJ8-zR_sVXAGPqGyDTq_h4KKaRRDy7TU0HaA@mail.gmail.com>
	<6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com>
Message-ID: <20110826125130.591b142b@pitrou.net>


Why would PEP 393 apply to other implementations than CPython?

Regards

Antoine.



On Fri, 26 Aug 2011 00:01:42 +0000
Dino Viehland <dinov at microsoft.com> wrote:
> Guido wrote:
> > Which reminds me. The PEP does not say what other Python
> > implementations besides CPython should do. presumably Jython and
> > IronPython will continue to use UTF-16, so presumably the language
> > reference will still have to document that strings contain code units (not code
> > points) and the objections Tom Christiansen raised against this will remain
> > true for those versions of Python. (I don't know about PyPy, they can
> > presumably decide when they start their Py3k
> > port.)
> > 
> > OTOH perhaps IronPython 3.3 and Jython 3.3 can use a similar approach and
> > we can lay the narrow build issues to rest? Can someone here speak for
> > them?
> 
> The biggest difficulty for IronPython here would be dealing w/ .NET interop.
> We can certainly introduce either an IronPython specific string class which
> is similar to CPython's PyUnicodeObject or we could have multiple distinct
> .NET types (IronPython.Runtime.AsciiString, System.String, and 
> IronPython.Runtime.Ucs4String) which all appear as the same type to Python. 
> 
> But when Python is calling a .NET API it's always going to return a System.String 
> which is UTF-16.  If we had to check and convert all of those strings when they 
> cross into Python it would be very bad for performance.  Presumably we could
> have a 4th type of "interop" string which lazily computes this but if we start
> wrapping .Net strings we could also get into object identity issues.
> 
> We could stop using System.String in IronPython all together and say when 
> working w/ .NET strings you get the .NET behavior and when working w/ Python 
> strings you get the Python behavior.  I'm not sure how weird and confusing that 
> would be but conversion from an Ipy string to a .NET string could remain cheap if 
> both were UTF-16, and conversions from .NET strings to Ipy strings would only 
> happen if the user did so explicitly.  
> 
> But it's a huge change - it'll almost certainly touch every single source file in 
> IronPython.  I would think we'd get 3.2 done first and then think about what to
> do here.
> 



From stefan_ml at behnel.de  Fri Aug 26 13:08:54 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 26 Aug 2011 13:08:54 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <20110826125130.591b142b@pitrou.net>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com>	<j31hlc$dp5$2@dough.gmane.org>
	<4E5538B7.8010709@haypocalc.com>	<CAP7+vJJx5-5wU+BJ8-zR_sVXAGPqGyDTq_h4KKaRRDy7TU0HaA@mail.gmail.com>	<6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com>
	<20110826125130.591b142b@pitrou.net>
Message-ID: <j37us6$t6o$1@dough.gmane.org>

Antoine Pitrou, 26.08.2011 12:51:
> Why would PEP 393 apply to other implementations than CPython?

Not the PEP itself, just the implications of the result.

The question was whether the language specification in a post-PEP-393 world
can (and if so, should) be changed into requiring unicode objects to be defined
based on code points. Narrow builds, as well as Jython and IronPython, 
currently deviate from this as they use UTF-16 as their native string 
encoding, which, for one, prevents O(1) indexing into characters as well as 
a direct match between length and character count (minus combining 
characters etc.).
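
For example, on a narrow 3.2 build one observes:

    >>> s = '\U0001d400'   # MATHEMATICAL BOLD CAPITAL A, a non-BMP code point
    >>> len(s)             # counts UTF-16 code units, not code points
    2
    >>> s[0]               # indexing exposes a lone surrogate
    '\ud835'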

I think this discussion can safely be considered off-topic for this thread 
(which isn't exactly short enough to keep adding more topics to it).

Stefan


From solipsis at pitrou.net  Fri Aug 26 13:14:33 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 26 Aug 2011 13:14:33 +0200
Subject: [Python-Dev] Windows installers and %PATH%
References: <CAHXt_SWg6ZwOQi54RPi+tEmJEcV-4x+A4wervPWyRYSyZ7h-Hw@mail.gmail.com>
	<CADiSq7dtpc-oLoP3iUCNyc0NMQ_jUkKReUP_MqbR3XwnFZO2Gw@mail.gmail.com>
Message-ID: <20110826131433.7ab3680d@pitrou.net>

On Fri, 26 Aug 2011 14:52:07 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> Windows is a developer hostile platform unless you completely buy into
> the Microsoft toolchain, which is not an option for cross-platform
> projects like Python.

We already buy into the MS toolchain since we require Visual Studio (or
at least the command-line tools for building, but I suppose anyone doing
serious development on Windows would use the GUI). We also maintain the
project files by hand instead of using e.g. cmake.

> It's well within Microsoft's capabilities to create and support a
> POSIX compatibility layer that allows applications to look and feel
> like native ones

I have a hard time imagining how a POSIX compatibility layer would
make Windows apps feel more "native".
It's a matter of fact that Unix and Windows systems function
differently. I don't know how much of it can be completely hidden.

> the multibillion dollar corporation deliberately
> failing to implement a widely recognised OS interoperability
> standard

I wouldn't call POSIX an OS interoperability standard, but a Unix
interoperability standard. It exists because there is so much
fragmentation in the Unix world. I doubt MS was invited to the party
when POSIX specifications were designed.

Windows has its own standards, but since MS is basically the sole OS
vendor, they are free to dictate them :)

And when I look at the various "POSIX" systems we try to support there:
http://www.python.org/dev/buildbot/all/waterfall?category=3.x.stable&category=3.x.unstable
I have the feeling that perhaps we spend more time trying to work around
incompatibilities, special cases and various levels of (in)compliance
among POSIX systems, than implementing the Windows-specific code paths
of low-level functions (where the APIs are usually well-defined and
very stable).

Regards

Antoine.



From brian.curtin at gmail.com  Fri Aug 26 15:40:38 2011
From: brian.curtin at gmail.com (Brian Curtin)
Date: Fri, 26 Aug 2011 08:40:38 -0500
Subject: [Python-Dev] Windows installers and %PATH%
In-Reply-To: <CAHXt_SWg6ZwOQi54RPi+tEmJEcV-4x+A4wervPWyRYSyZ7h-Hw@mail.gmail.com>
References: <CAHXt_SWg6ZwOQi54RPi+tEmJEcV-4x+A4wervPWyRYSyZ7h-Hw@mail.gmail.com>
Message-ID: <CAD+XWwppdPqtMCm5vU9m=Tn-V42PAj5NPv0H96U=K8fkjNMKXg@mail.gmail.com>

On Thu, Aug 25, 2011 at 23:04, Andrew Pennebaker <
andrew.pennebaker at gmail.com> wrote:

> Please have the Windows installers add the Python installation directory to
> the PATH environment variable.


The http://bugs.python.org bug tracker is a better place for feature
requests like this, of which there have been several over the years. This
has become a hotter topic lately with several discussions around the
community, and a PEP to provide some similar functionality. I've talked with
several educators/trainers, and the lack of a Path installation is the
#1 thing that bites their newcomers, an issue that hits them
before they've even begun to learn.

Many newbies dive in without knowing that they must manually add C:\PythonXY
> to PATH. It's yak shaving, something perfectly automatable that should have
> been done by the installers way back in Python 1.0.
>
> Please also add PYTHONROOT\Scripts. It's where cool things like
> easy_install.exe are stored. More yak shaving.
>

A clean installation of Python includes no Scripts directory, so I'm not
sure we should be polluting the Path with yet-to-exist directories. An
approach could be to have packaging optionally add the scripts directory on
the installation of a third-party package.

The only potential downside to this is upsetting users who manage multiple
> python installations. It's not a problem: they already manually adjust PATH
> to their liking.
>

"Users who manage multiple python installations" is probably a very, very
large number, so we have quite the audience to appease, and it actually is a
problem. We should not go halfway on this feature and say "if it doesn't
work perfectly, you're back to being on your own". I think the likely case
is that any path addition feature will read the path, then offer to replace
existing instances or append to the end.

I haven't yet done any work on this, but my todo list for 3.3 includes
adding some path related features to the installer.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110826/c5c79a50/attachment.html>

From jnoller at gmail.com  Fri Aug 26 16:05:09 2011
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 26 Aug 2011 10:05:09 -0400
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
	<CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
Message-ID: <CACQrdOks88HRc-eNDdMWF8paLHDx4xaBd5YB9A1BrhFeUkvd4g@mail.gmail.com>

On Fri, Aug 26, 2011 at 3:18 AM, Nir Aides <nir at winpdb.org> wrote:
> Another face of the discussion is about whether to deprecate the mixing of
> the threading and processing modules and what to do about the
> multiprocessing module which is implemented with worker threads.

There's a bug open - http://bugs.python.org/issue8713 which would
offer non-Windows users the ability to avoid using fork() entirely,
which would sidestep the problem outlined in the atfork() bug. Under
windows, which has no fork() mechanism, we create a subprocess and
then use pipes for intercommunication: nothing is inherited from the
parent process except the state passed into the child.

I think that "deprecating" the use of threads w/ multiprocessing - or
at least crippling it is the wrong answer. Multiprocessing needs the
helper threads it uses internally to manage queues, etc. Removing that
ability would require a near-total rewrite, which is just a
non-starter.

I'd rather examine bug 8713 more closely, and offer this option for
all users in 3.x and document the existing issues outlined in
http://bugs.python.org/issue6721 for 2.x - the proposals in that bug
are IMHO, out of bounds for a 2.x release.

In essence; the issue here is multiprocessing's use of fork on unix
without the following exec - which is what the windows implementation
essentially does using subprocess.
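
The hazard itself fits in a dozen lines; a minimal, deliberately
artificial sketch (POSIX only):

    import os, threading, time

    lock = threading.Lock()

    def worker():
        with lock:
            time.sleep(5)          # hold the lock across the fork

    threading.Thread(target=worker).start()
    time.sleep(0.5)                # let the worker grab the lock first

    if os.fork() == 0:
        # Child: only the forking thread is cloned, but the lock was
        # copied in its *held* state and its owner no longer exists here.
        lock.acquire()             # deadlocks forever
        os._exit(0)
    os.wait()

An exec after the fork (or a spawned subprocess, as on Windows) throws
away the copied lock state and sidesteps the whole problem.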

Adding the option to *not* fork changes the fundamental behavior on
unix systems - but I fundamentally feel that it's a saner, and more
consistent behavior for the module as a whole.

So, I'd ask that we not talk about tearing out the ability to use MP
and threads, or threads with MP - that would be crippling, and there's
existing code in the wild (including multiprocessing itself) that uses
this mix without issue - it's stripping out functionality for what is
a surprising and painful edge case that rarely directly affects users.

I would focus on the atfork() patch more directly, ignoring
multiprocessing in the discussion, and focusing on the merits of gps'
initial proposal and patch.

jesse

From guido at python.org  Fri Aug 26 16:55:39 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Aug 2011 07:55:39 -0700
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E577589.4030809@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<4E5606C7.9000404@v.loewis.de>
	<CAP7+vJ+f3oN3RmZhCEntgXBccj4gE7ErdKwvL9hv4uaBgmL_qQ@mail.gmail.com>
	<4E577589.4030809@v.loewis.de>
Message-ID: <CAP7+vJ+fYXLcqG_rHkNtg5yuZKJTj2nMxFaLczr+zqeKUBtY8A@mail.gmail.com>

It would be nice if someone wrote a test to roughly verify these
numbers, e.g. by allocating lots of strings of a certain size and
measuring the process size before and after (being careful to adjust
for the list or other data structure required to keep those objects
alive).
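
Something along these lines, perhaps (a rough sketch assuming Linux; it
reads VmRSS from /proc/self/status, so the numbers are only approximate):

    def rss_kb():
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])

    N, SIZE = 1000000, 7
    keep = [None] * N           # preallocate so list growth stays out
    before = rss_kb()           # of the measured delta
    for i in range(N):
        keep[i] = 'x' * SIZE + str(i % 10)   # force N distinct objects
    after = rss_kb()
    print((after - before) * 1024.0 / N, 'bytes/string (approx.)')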

--Guido

On Fri, Aug 26, 2011 at 3:29 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> But strings are allocated via PyObject_Malloc(), i.e. the custom
>> arena-based allocator -- isn't its overhead (for small objects) less
>> than 2 pointers per block?
>
> Ah, right, I missed that. Indeed, those have no header, and the only
> overhead is the padding to a multiple of 8.
>
> That shifts the picture; I hope the table below is correct,
> assuming ASCII strings.
> 3.2: 7 pointers (adds 4 bytes padding on 32-bit systems)
> 393: 10 pointers
>
> string | 32-bit pointer | 32-bit pointer | 64-bit pointer
> size   | 16-bit wchar_t | 32-bit wchar_t | 32-bit wchar_t
>        | 3.2     |  393 | 3.2    |  393  | 3.2    |  393  |
> -----------------------------------------------------------
> 1      | 40      | 48   | 40     |  48   | 64     | 88    |
> 2      | 40      | 48   | 48     |  48   | 72     | 88    |
> 3      | 40      | 48   | 48     |  48   | 72     | 88    |
> 4      | 48      | 48   | 56     |  48   | 80     | 88    |
> 5      | 48      | 48   | 56     |  48   | 80     | 88    |
> 6      | 48      | 48   | 64     |  48   | 88     | 88    |
> 7      | 48      | 48   | 64     |  48   | 88     | 88    |
> 8      | 56      | 56   | 72     |  56   | 96     | 86    |
>
> So 1-byte strings increase in size; very short strings increase
> on 16-bit-wchar_t systems and 64-bit systems. Short strings
> keep their size, and long strings save.
>
> Regards,
> Martin
>
>
>



-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Fri Aug 26 16:56:05 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Aug 2011 07:56:05 -0700
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <CAP7+vJ+fYXLcqG_rHkNtg5yuZKJTj2nMxFaLczr+zqeKUBtY8A@mail.gmail.com>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<4E5606C7.9000404@v.loewis.de>
	<CAP7+vJ+f3oN3RmZhCEntgXBccj4gE7ErdKwvL9hv4uaBgmL_qQ@mail.gmail.com>
	<4E577589.4030809@v.loewis.de>
	<CAP7+vJ+fYXLcqG_rHkNtg5yuZKJTj2nMxFaLczr+zqeKUBtY8A@mail.gmail.com>
Message-ID: <CAP7+vJJD1hS=s5XREUh3F92=P=9PXJZSy1F00foZe+CuEpY5-g@mail.gmail.com>

Also, please add the table (and the reasoning that led to it) to the PEP.

On Fri, Aug 26, 2011 at 7:55 AM, Guido van Rossum <guido at python.org> wrote:
> It would be nice if someone wrote a test to roughly verify these
> numbers, e.g. by allocating lots of strings of a certain size and
> measuring the process size before and after (being careful to adjust
> for the list or other data structure required to keep those objects
> alive).
>
> --Guido
>
> On Fri, Aug 26, 2011 at 3:29 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>> But strings are allocated via PyObject_Malloc(), i.e. the custom
>>> arena-based allocator -- isn't its overhead (for small objects) less
>>> than 2 pointers per block?
>>
>> Ah, right, I missed that. Indeed, those have no header, and the only
>> overhead is the padding to a multiple of 8.
>>
>> That shifts the picture; I hope the table below is correct,
>> assuming ASCII strings.
>> 3.2: 7 pointers (adds 4 bytes padding on 32-bit systems)
>> 393: 10 pointers
>>
>> string | 32-bit pointer | 32-bit pointer | 64-bit pointer
>> size   | 16-bit wchar_t | 32-bit wchar_t | 32-bit wchar_t
>>        | 3.2     |  393 | 3.2    |  393  | 3.2    |  393  |
>> -----------------------------------------------------------
>> 1      | 40      | 48   | 40     |  48   | 64     | 88    |
>> 2      | 40      | 48   | 48     |  48   | 72     | 88    |
>> 3      | 40      | 48   | 48     |  48   | 72     | 88    |
>> 4      | 48      | 48   | 56     |  48   | 80     | 88    |
>> 5      | 48      | 48   | 56     |  48   | 80     | 88    |
>> 6      | 48      | 48   | 64     |  48   | 88     | 88    |
>> 7      | 48      | 48   | 64     |  48   | 88     | 88    |
>> 8      | 56      | 56   | 72     |  56   | 96     | 86    |
>>
>> So 1-byte strings increase in size; very short strings increase
>> on 16-bit-wchar_t systems and 64-bit systems. Short strings
>> keep their size, and long strings save.
>>
>> Regards,
>> Martin
>>
>>
>>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>



-- 
--Guido van Rossum (python.org/~guido)

From stefan_ml at behnel.de  Fri Aug 26 17:55:05 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 26 Aug 2011 17:55:05 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <j36et5$oa6$1@dough.gmane.org>
References: <4E553FBC.7080501@v.loewis.de> <j365bt$p1o$1@dough.gmane.org>
	<j36et5$oa6$1@dough.gmane.org>
Message-ID: <j38fkp$kj7$1@dough.gmane.org>

Stefan Behnel, 25.08.2011 23:30:
> Sadly, a quick look at a couple of recent commits in the pep-393 branch
> suggested that it is not even always obvious to you as the authors which
> macros can be called safely and which cannot. I immediately spotted a bug
> in one of the updated core functions (unicode_repr, IIRC) where
> PyUnicode_GET_LENGTH() is called without a previous call to
> PyUnicode_FAST_READY().

Here is another example from unicodeobject.c, commit 56aaa17fc05e:

+    switch(PyUnicode_KIND(string)) {
+    case PyUnicode_1BYTE_KIND:
+        list = ucs1lib_splitlines(
+            (PyObject*) string, PyUnicode_1BYTE_DATA(string),
+            PyUnicode_GET_LENGTH(string), keepends);
+        break;
+    case PyUnicode_2BYTE_KIND:
+        list = ucs2lib_splitlines(
+            (PyObject*) string, PyUnicode_2BYTE_DATA(string),
+            PyUnicode_GET_LENGTH(string), keepends);
+        break;
+    case PyUnicode_4BYTE_KIND:
+        list = ucs4lib_splitlines(
+            (PyObject*) string, PyUnicode_4BYTE_DATA(string),
+            PyUnicode_GET_LENGTH(string), keepends);
+        break;
+    default:
+        assert(0);
+        list = 0;
+    }

The assert(0) at the end will hit when the system is running out of memory 
while working on a wchar string.

Stefan


From solipsis at pitrou.net  Fri Aug 26 17:53:36 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 26 Aug 2011 17:53:36 +0200
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
	<CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
	<CACQrdOks88HRc-eNDdMWF8paLHDx4xaBd5YB9A1BrhFeUkvd4g@mail.gmail.com>
Message-ID: <20110826175336.3af6be57@pitrou.net>


Hi,

> I think that "deprecating" the use of threads w/ multiprocessing - or
> at least crippling it is the wrong answer. Multiprocessing needs the
> helper threads it uses internally to manage queues, etc. Removing that
> ability would require a near-total rewrite, which is just a
> non-starter.

I agree that this wouldn't actually benefit anyone.
Besides, I don't think it's even possible to avoid threads in
multiprocessing, given the various constraints. We would have to force
the user to run their main thread in an event loop, and that would be
twisted (tm).

> I would focus on the atfork() patch more directly, ignoring
> multiprocessing in the discussion, and focusing on the merits of gps'
> initial proposal and patch.

I think this could also be combined with Charles-François' patch.

Regards

Antoine.



From status at bugs.python.org  Fri Aug 26 18:07:20 2011
From: status at bugs.python.org (Python tracker)
Date: Fri, 26 Aug 2011 18:07:20 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20110826160720.D4C731CA8A@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2011-08-19 - 2011-08-26)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    2963 (+26)
  closed 21665 (+35)
  total  24628 (+61)

Open issues with patches: 1288 


Issues opened (44)
==================

#12326: Linux 3: code should avoid using sys.platform == 'linux2'
http://bugs.python.org/issue12326  reopened by georg.brandl

#12788: test_email fails with -R
http://bugs.python.org/issue12788  opened by pitrou

#12790: doctest.testmod does not run tests in functools.partial functi
http://bugs.python.org/issue12790  opened by stevenjd

#12793: allow filters in os.walk
http://bugs.python.org/issue12793  opened by Jacek.Pliszka

#12795: Remove the major version from sys.platform
http://bugs.python.org/issue12795  opened by haypo

#12797: io.FileIO and io.open should support openat
http://bugs.python.org/issue12797  opened by pitrou

#12798: Update mimetypes documentation
http://bugs.python.org/issue12798  opened by sandro.tosi

#12800: 'tarfile.StreamError: seeking backwards is not allowed' when e
http://bugs.python.org/issue12800  opened by adunand

#12801: C realpath not used by os.path.realpath
http://bugs.python.org/issue12801  opened by pitrou

#12802: Windows error code 267 should be mapped to ENOTDIR, not EINVAL
http://bugs.python.org/issue12802  opened by pitrou

#12805: Optimizations for bytes.join() et. al
http://bugs.python.org/issue12805  opened by jcon

#12806: argparse: Hybrid help text formatter
http://bugs.python.org/issue12806  opened by GraylinKim

#12807: Optimizations for {bytearray,bytes,unicode}.strip()
http://bugs.python.org/issue12807  opened by jcon

#12808: Coverage of codecs.py
http://bugs.python.org/issue12808  opened by tleeuwenburg

#12809: Missing new setsockopts in Linux (eg: IP_TRANSPARENT)
http://bugs.python.org/issue12809  opened by micolous

#12812: libffi does not build with clang on amd64
http://bugs.python.org/issue12812  opened by shenki

#12813: uuid4 is not tested if a uuid4 system routine isn't present
http://bugs.python.org/issue12813  opened by anacrolix

#12814: Possible intermittent bug in test_array
http://bugs.python.org/issue12814  opened by ncoghlan

#12815: Coverage of smtpd.py
http://bugs.python.org/issue12815  opened by tleeuwenburg

#12816: smtpd uses library outside of the standard libraries
http://bugs.python.org/issue12816  opened by tleeuwenburg

#12817: test_multiprocessing: io.BytesIO() requires bytearray buffers
http://bugs.python.org/issue12817  opened by skrah

#12818: email.utils.formataddr incorrectly quotes parens inside quoted
http://bugs.python.org/issue12818  opened by r.david.murray

#12819: PEP 393 - Flexible Unicode String Representation
http://bugs.python.org/issue12819  opened by torsten.becker

#12820: Tests for Lib/xml/dom/minicompat.py
http://bugs.python.org/issue12820  opened by John.Chandler

#12822: NewGIL should use CLOCK_MONOTONIC if possible.
http://bugs.python.org/issue12822  opened by naoki

#12823: Broken link in "SSL wrapper for socket objects" document
http://bugs.python.org/issue12823  opened by iworm

#12825: Missing and incorrect link to a command line option.
http://bugs.python.org/issue12825  opened by Kyle.Simpson

#12828: xml.dom.minicompat is not documented
http://bugs.python.org/issue12828  opened by sandro.tosi

#12829: pyexpat segmentation fault caused by multiple calls to Parse()
http://bugs.python.org/issue12829  opened by dhgutteridge

#12830: --install-data doesn't effect resources destination
http://bugs.python.org/issue12830  opened by trevor

#12832: The documentation for the print function should explain/point 
http://bugs.python.org/issue12832  opened by r.david.murray

#12833: raw_input misbehaves when readline is imported
http://bugs.python.org/issue12833  opened by idank

#12834: memoryview.tobytes() incorrect for non-contiguous arrays
http://bugs.python.org/issue12834  opened by skrah

#12835: Missing SSLSocket.sendmsg() wrapper allows programs to send un
http://bugs.python.org/issue12835  opened by baikie

#12836: cast() creates circular reference in original object
http://bugs.python.org/issue12836  opened by bgilbert

#12837: Patch for issue #12810 removed a valid check on socket ancilla
http://bugs.python.org/issue12837  opened by baikie

#12839: zlibmodule cannot handle Z_VERSION_ERROR zlib error
http://bugs.python.org/issue12839  opened by rmtew

#12840: "maintainer" value clear the "author" value when register
http://bugs.python.org/issue12840  opened by keul

#12841: Incorrect tarfile.py extraction
http://bugs.python.org/issue12841  opened by seblu

#12842: Docs: first parameter of tp_richcompare() always has the corre
http://bugs.python.org/issue12842  opened by skrah

#12843: file object read* methods in append mode overflows
http://bugs.python.org/issue12843  opened by Otacon.Karurosu

#12844: Support more than 255 arguments
http://bugs.python.org/issue12844  opened by andersk

#12845: PEP-3118: C-contiguity with zero strides
http://bugs.python.org/issue12845  opened by skrah

#12846: unicodedata.normalize turkish letter problem
http://bugs.python.org/issue12846  opened by fizymania



Most recent 15 issues with no replies (15)
==========================================

#12845: PEP-3118: C-contiguity with zero strides
http://bugs.python.org/issue12845

#12842: Docs: first parameter of tp_richcompare() always has the corre
http://bugs.python.org/issue12842

#12836: cast() creates circular reference in original object
http://bugs.python.org/issue12836

#12815: Coverage of smtpd.py
http://bugs.python.org/issue12815

#12814: Possible intermittent bug in test_array
http://bugs.python.org/issue12814

#12813: uuid4 is not tested if a uuid4 system routine isn't present
http://bugs.python.org/issue12813

#12812: libffi does not build with clang on amd64
http://bugs.python.org/issue12812

#12809: Missing new setsockopts in Linux (eg: IP_TRANSPARENT)
http://bugs.python.org/issue12809

#12805: Optimizations for bytes.join() et. al
http://bugs.python.org/issue12805

#12800: 'tarfile.StreamError: seeking backwards is not allowed' when e
http://bugs.python.org/issue12800

#12790: doctest.testmod does not run tests in functools.partial functi
http://bugs.python.org/issue12790

#12788: test_email fails with -R
http://bugs.python.org/issue12788

#12771: 2to3 -d adds extra whitespace
http://bugs.python.org/issue12771

#12742: Add support for CESU-8 encoding
http://bugs.python.org/issue12742

#12739: read stuck with multithreading and simultaneous subprocess.Pop
http://bugs.python.org/issue12739



Most recent 15 issues waiting for review (15)
=============================================

#12842: Docs: first parameter of tp_richcompare() always has the corre
http://bugs.python.org/issue12842

#12841: Incorrect tarfile.py extraction
http://bugs.python.org/issue12841

#12839: zlibmodule cannot handle Z_VERSION_ERROR zlib error
http://bugs.python.org/issue12839

#12837: Patch for issue #12810 removed a valid check on socket ancilla
http://bugs.python.org/issue12837

#12835: Missing SSLSocket.sendmsg() wrapper allows programs to send un
http://bugs.python.org/issue12835

#12832: The documentation for the print function should explain/point 
http://bugs.python.org/issue12832

#12822: NewGIL should use CLOCK_MONOTONIC if possible.
http://bugs.python.org/issue12822

#12820: Tests for Lib/xml/dom/minicompat.py
http://bugs.python.org/issue12820

#12819: PEP 393 - Flexible Unicode String Representation
http://bugs.python.org/issue12819

#12818: email.utils.formataddr incorrectly quotes parens inside quoted
http://bugs.python.org/issue12818

#12817: test_multiprocessing: io.BytesIO() requires bytearray buffers
http://bugs.python.org/issue12817

#12816: smtpd uses library outside of the standard libraries
http://bugs.python.org/issue12816

#12815: Coverage of smtpd.py
http://bugs.python.org/issue12815

#12813: uuid4 is not tested if a uuid4 system routine isn't present
http://bugs.python.org/issue12813

#12809: Missing new setsockopts in Linux (eg: IP_TRANSPARENT)
http://bugs.python.org/issue12809



Top 10 most discussed issues (10)
=================================

#12326: Linux 3: code should avoid using sys.platform == 'linux2'
http://bugs.python.org/issue12326  30 msgs

#12678: test_packaging and test_distutils failures under Windows
http://bugs.python.org/issue12678  27 msgs

#12713: argparse: allow abbreviation of sub commands by users
http://bugs.python.org/issue12713  13 msgs

#12795: Remove the major version from sys.platform
http://bugs.python.org/issue12795  12 msgs

#5231: Change format of a memoryview
http://bugs.python.org/issue5231   9 msgs

#11564: pickle not 64-bit ready
http://bugs.python.org/issue11564   9 msgs

#12801: C realpath not used by os.path.realpath
http://bugs.python.org/issue12801   9 msgs

#12808: Coverage of codecs.py
http://bugs.python.org/issue12808   8 msgs

#5113: 2.5.4.3 / test_posix failing on HPUX systems
http://bugs.python.org/issue5113   7 msgs

#12760: Add create mode to open()
http://bugs.python.org/issue12760   7 msgs



Issues closed (34)
==================

#4106: multiprocessing occasionally spits out exception during shutdo
http://bugs.python.org/issue4106  closed by pitrou

#5301: add mimetype for image/vnd.microsoft.icon (patch)
http://bugs.python.org/issue5301  closed by sandro.tosi

#6484: No unit test for mailcap module
http://bugs.python.org/issue6484  closed by python-dev

#6560: socket sendmsg(), recvmsg() methods
http://bugs.python.org/issue6560  closed by python-dev

#9200: Make the str.is* methods work with non-BMP chars on narrow bui
http://bugs.python.org/issue9200  closed by ezio.melotti

#11657: multiprocessing_{send,recv}fd fail with fds > 256
http://bugs.python.org/issue11657  closed by pitrou

#12191: Add shutil.chown to allow to use user and group name (and not 
http://bugs.python.org/issue12191  closed by sandro.tosi

#12213: BufferedRandom: issues with interlaced read-write
http://bugs.python.org/issue12213  closed by pitrou

#12461: it's not clear how the shutil.copystat() should work on symlin
http://bugs.python.org/issue12461  closed by eric.araujo

#12656: test.test_asyncore: add tests for AF_INET6 and AF_UNIX sockets
http://bugs.python.org/issue12656  closed by neologix

#12682: Meaning of 'accepted' resolution as documented in devguide
http://bugs.python.org/issue12682  closed by ezio.melotti

#12745: Python2 or Python3 page
http://bugs.python.org/issue12745  closed by terry.reedy

#12772: fractional day attribute in datetime class
http://bugs.python.org/issue12772  closed by belopolsky

#12775: immense performance problems related to the garbage collector
http://bugs.python.org/issue12775  closed by terry.reedy

#12778: JSON-serializing a large container takes too much memory
http://bugs.python.org/issue12778  closed by pitrou

#12783: test_posix failure on FreeBSD 6.4: test_get_and_set_scheduler_
http://bugs.python.org/issue12783  closed by neologix

#12786: subprocess wait() hangs when stdin is closed
http://bugs.python.org/issue12786  closed by neologix

#12787: xmlrpc.client documentation (MultiCall Objects) points to a br
http://bugs.python.org/issue12787  closed by sandro.tosi

#12789: re.Scanner doesn't support more than 2 groups on regex
http://bugs.python.org/issue12789  closed by angelonuffer

#12791: reference cycle with exception state not broken by generator.c
http://bugs.python.org/issue12791  closed by pitrou

#12792: Document the "type" field of the tracker in the devguide
http://bugs.python.org/issue12792  closed by ezio.melotti

#12794: platform: add a function to get the system version as tuple
http://bugs.python.org/issue12794  closed by haypo

#12796: total_ordering goes into infinite recursion when NotImplemente
http://bugs.python.org/issue12796  closed by ncoghlan

#12799: realpath not resolving symbolic links under Windows
http://bugs.python.org/issue12799  closed by haypo

#12803: SSLContext.load_cert_chain() should accept a password argument
http://bugs.python.org/issue12803  closed by pitrou

#12804: make test should not enable the urlfetch resource
http://bugs.python.org/issue12804  closed by nadeem.vawda

#12810: Remove check for negative unsigned value in socketmodule.c
http://bugs.python.org/issue12810  closed by neologix

#12811: Tabnanny doesn't close its tokenize files properly
http://bugs.python.org/issue12811  closed by ncoghlan

#12821: test_fcntl failed on OpenBSD 5.x
http://bugs.python.org/issue12821  closed by neologix

#12824: Make the write_file() helper function in test_shutil return th
http://bugs.python.org/issue12824  closed by hynek

#12826: module _socket failed to build on OpenBSD
http://bugs.python.org/issue12826  closed by python-dev

#12827: OS-specific location in Lib/tempfile.py for OpenBSD
http://bugs.python.org/issue12827  closed by neologix

#12831: 2to3 and integer division
http://bugs.python.org/issue12831  closed by mark.dickinson

#12838: FAQ/Programming typo: range[3] is used
http://bugs.python.org/issue12838  closed by python-dev

From brett at python.org  Fri Aug 26 18:35:12 2011
From: brett at python.org (Brett Cannon)
Date: Fri, 26 Aug 2011 09:35:12 -0700
Subject: [Python-Dev] Planned PEP status changes
In-Reply-To: <CADiSq7e7nj+EV4remUejKJYZczCmrs27iekQv9y9TEnkwx4=SA@mail.gmail.com>
References: <CADiSq7e7nj+EV4remUejKJYZczCmrs27iekQv9y9TEnkwx4=SA@mail.gmail.com>
Message-ID: <CAP1=2W5uHA9UbVEiMGib51xTqT8h4dVoi3BCgZfX8AV5h-ev=A@mail.gmail.com>

On Tue, Aug 23, 2011 at 19:42, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Unless I hear any objections, I plan to adjust the current PEP
> statuses as follows some time this weekend:
>
> Move from Accepted to Finished:
>
>    389   argparse - New Command Line Parsing Module             Bethard
>    391   Dictionary-Based Configuration For Logging             Sajip
>    3108  Standard Library Reorganization                        Cannon

<sigh> I had always hoped to get profile/cProfile taken care of, but
obviously that just didn't ever happen. So no objection, just a slight
sting from the reminder of why the PEP was left open.

-Brett

>    3135  New Super
> Spealman, Delaney, Ryan
>
> Move from Accepted to Withdrawn (with a reference to Reid Kleckner's blog post)
>    3146  Merging Unladen Swallow into CPython
> Winter, Yasskin, Kleckner
>
>
> The PEP 3118 enhanced buffer protocol has some ongoing semantic and
> implementation issues still to be worked out, so I plan to leave that
> at Accepted. Ditto for PEP 3121 (extension module finalisation), since
> that doesn't play nicely with the current 'set everything to None'
> approach to breaking cycles during module finalisation.
>
> The other Accepted PEPs are either packaging standards related or
> genuinely not implemented yet.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org
>

From guido at python.org  Fri Aug 26 18:51:00 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Aug 2011 09:51:00 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E576793.2010203@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de>
Message-ID: <CAP7+vJKBmuCFpRcL2SyqWBfcrX4yTYieEJqm7t9aP7HoAQMR3A@mail.gmail.com>

On Fri, Aug 26, 2011 at 2:29 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> IronPython and Jython can retain UTF-16 as their native form if that
>> makes interop cleaner, but in doing so they need to ensure that basic
>> operations like indexing and len work in terms of code points, not
>> code units, if they are to conform.
>
> That means that they won't conform, period. There is no efficient
> maintainable implementation strategy to achieve that property, and
> it may well take years until somebody provides an efficient
> unmaintainable implementation.
>
>> Does this make sense, or have I completely misunderstood things?
>
> You seem to assume it is ok for Jython/IronPython to provide indexing in
> O(n). It is not.

Indeed.

> However, non-conformance may not be that much of an issue. They do not
> conform in many other aspects, either (such as not supporting Python 3,
> for example, or not supporting the C API) that they may well chose to
> ignore such a minor requirement if there was one. For BMP strings,
> they conform fine, and it may well be that Jython either doesn't
> have non-BMP strings, or doesn't care whether len() or indexing of its
> non-BMP strings is "correct".

I think this is fine. I had been hoping that all Python
implementations claiming compatibility with version 3.3 of the
language reference would be free of worries about surrogates, but it
simply doesn't make sense.

And yes, I'm well aware that PEP 393 is only for CPython. It's just
that I had hoped that it would get rid of some of Tom C's specific
complaints for all Python implementations; but it really seems
impossible to do so.

One consequence may be that the standard library, to the extent it is
shared by other implementations, may still have to worry about
surrogates and other issues inherent in narrow builds or other
16-bit-based string types. We'll cross that bridge when we get to it.

-- 
--Guido van Rossum (python.org/~guido)

From martin at v.loewis.de  Fri Aug 26 18:56:07 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 26 Aug 2011 18:56:07 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <j38fkp$kj7$1@dough.gmane.org>
References: <4E553FBC.7080501@v.loewis.de>
	<j365bt$p1o$1@dough.gmane.org>	<j36et5$oa6$1@dough.gmane.org>
	<j38fkp$kj7$1@dough.gmane.org>
Message-ID: <4E57D027.10204@v.loewis.de>

On 26.08.2011 17:55, Stefan Behnel wrote:
> Stefan Behnel, 25.08.2011 23:30:
>> Sadly, a quick look at a couple of recent commits in the pep-393 branch
>> suggested that it is not even always obvious to you as the authors which
>> macros can be called safely and which cannot. I immediately spotted a bug
>> in one of the updated core functions (unicode_repr, IIRC) where
>> PyUnicode_GET_LENGTH() is called without a previous call to
>> PyUnicode_FAST_READY().
> 
> Here is another example from unicodeobject.c, commit 56aaa17fc05e:
> 
> +    switch(PyUnicode_KIND(string)) {
> +    case PyUnicode_1BYTE_KIND:
> +        list = ucs1lib_splitlines(
> +            (PyObject*) string, PyUnicode_1BYTE_DATA(string),
> +            PyUnicode_GET_LENGTH(string), keepends);
> +        break;
> +    case PyUnicode_2BYTE_KIND:
> +        list = ucs2lib_splitlines(
> +            (PyObject*) string, PyUnicode_2BYTE_DATA(string),
> +            PyUnicode_GET_LENGTH(string), keepends);
> +        break;
> +    case PyUnicode_4BYTE_KIND:
> +        list = ucs4lib_splitlines(
> +            (PyObject*) string, PyUnicode_4BYTE_DATA(string),
> +            PyUnicode_GET_LENGTH(string), keepends);
> +        break;
> +    default:
> +        assert(0);
> +        list = 0;
> +    }
> 
> The assert(0) at the end will hit when the system is running out of
> memory while working on a wchar string.

No, that should not happen: it should never get to this point.

I agree with your observation that something should be done about error
handling, and will update the PEP shortly. I propose that
PyUnicode_Ready should be explicitly called on input where raising an
exception is feasible. In contexts where it is not feasible (such
as reading a character, or reading the length or the kind), failing to
ready the string should cause a fatal error.

What do you think?

Regards,
Martin

From guido at python.org  Fri Aug 26 19:02:46 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Aug 2011 10:02:46 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j37sj4$e3l$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <j37sj4$e3l$1@dough.gmane.org>
Message-ID: <CAP7+vJKn2NdSmTQUZZkdqvEMzDY8ttxaTchZeiuTsaOxSYUC=Q@mail.gmail.com>

On Fri, Aug 26, 2011 at 3:29 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> "Martin v. L?wis", 26.08.2011 11:29:
>>
>> You seem to assume it is ok for Jython/IronPython to provide indexing in
>> O(n). It is not.
>
> I think we can leave this discussion aside.

(And yet, you keep arguing. :-)

> Jython and IronPython have their
> own platform specific constraints to which they need to adapt their
> implementation. For a Jython user, it means a lot to be able to efficiently
> pass strings (and other data) back and forth between Jython and other JVM
> code, and it's not hard to guess that the same is true for IronPython/.NET
> users. After all, the platform integration is the very *reason* for most
> users to select one of these implementations.

Right.

> Besides, what if these implementations provided indexing in, say, O(log N)
> instead of O(1) or O(N), e.g. by building a tree index into each string? You
> could have an index that simply marks runs of surrogate pairs and BMP
> substrings, thus providing a likely-to-be-somewhat-compact index. That index
> would obviously have to be built, but so do the different string
> representations in post-PEP-393 CPython, especially on Windows, as I have
> learned.

Eek. No, please. Those platforms' native string types have length and
slicing operations that are O(1) and work in terms of 16-bit code
points. Python should use those. It would be awful if Java and Python
code doing the same manipulations on the same string would come to
different conclusions because Python tried to paper over surrogates.

I dug up some evidence for Java, at least:

http://download.oracle.com/javase/1.5.0/docs/api/java/lang/CharSequence.html#length%28%29

"""
length

int length()

    Returns the length of this character sequence. The length is the
number of 16-bit chars in the sequence.

    Returns:
        the number of chars in this sequence
"""

This is quite explicit about counting 16-bit code units. I've found
similar info about .NET, which defines "char" as a 16-bit quantity and
string length in terms of the number of "char" items.

> Would such a less severe violation of the strict O(1) rule still be "not
> ok"? I think this is not such a clear black-and-white issue. Both
> implementations have notably different performance characteristics than
> CPython in some more or less important areas, as does PyPy. At some point,
> the language compliance label has to account for that.

Since you had to ask, I have to declare that, indeed, non-O(1)
behavior would not be okay for those platforms.

All in all, I don't think we should legislate Python strings to be
able to support 21-bit code points using O(1) indexing. PEP 393 makes
this possible for CPython, and it's been said that PyPy can follow
suit. But it'll be a "quality-of-implementation" issue, not built into
the language spec.

-- 
--Guido van Rossum (python.org/~guido)

From p.f.moore at gmail.com  Fri Aug 26 19:13:42 2011
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 26 Aug 2011 18:13:42 +0100
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJKn2NdSmTQUZZkdqvEMzDY8ttxaTchZeiuTsaOxSYUC=Q@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <j37sj4$e3l$1@dough.gmane.org>
	<CAP7+vJKn2NdSmTQUZZkdqvEMzDY8ttxaTchZeiuTsaOxSYUC=Q@mail.gmail.com>
Message-ID: <CACac1F-CPeTWk=nax2YiJ8r-Z4E6BttaOi=n5_b8H79traL2Gg@mail.gmail.com>

On 26 August 2011 17:51, Guido van Rossum <guido at python.org> wrote:
> On Fri, Aug 26, 2011 at 2:29 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:

(Regarding my comments on code point semantics)

>> You seem to assume it is ok for Jython/IronPython to provide indexing in
>> O(n). It is not.
>
> Indeed.


On 26 August 2011 18:02, Guido van Rossum <guido at python.org> wrote:

> Eek. No, please. Those platforms' native string types have length and
> slicing operations that are O(1) and work in terms of 16-bit code
> points. Python should use those. It would be awful if Java and Python
> code doing the same manipulations on the same string would come to
> different conclusions because Python tried to paper over surrogates.

*That* is actually the erroneous assumption I had made - that the Java
and .NET native string type had code point semantics (i.e., took
surrogates into account). As that isn't the case, my comments aren't
valid - and I agree that having common semantics (and hence exposing
surrogates) is too important to lose.

On the other hand, that pretty much establishes that whatever PEP 393
achieves in terms of allowing all builds of CPython to offer code
point semantics, the language definition can't mandate it.

Thanks for the clarification.
Paul.

From andrew.pennebaker at gmail.com  Fri Aug 26 19:18:55 2011
From: andrew.pennebaker at gmail.com (Andrew Pennebaker)
Date: Fri, 26 Aug 2011 13:18:55 -0400
Subject: [Python-Dev] Windows installers and %PATH%
In-Reply-To: <CAD+XWwppdPqtMCm5vU9m=Tn-V42PAj5NPv0H96U=K8fkjNMKXg@mail.gmail.com>
References: <CAHXt_SWg6ZwOQi54RPi+tEmJEcV-4x+A4wervPWyRYSyZ7h-Hw@mail.gmail.com>
	<CAD+XWwppdPqtMCm5vU9m=Tn-V42PAj5NPv0H96U=K8fkjNMKXg@mail.gmail.com>
Message-ID: <CAHXt_SUo9HkJQsWFVxcqmqLPdLRiTMVfqaUbjcDO2QzxYyBcaw@mail.gmail.com>

I see that the Ruby 1.9 stable Windows installer has a checkbox to add the
Ruby binaries to PATH. That would be excellent for Python.

Also, there's no need to "buy in" to the Windows toolchain just to edit
PATH. Installer software includes functionality for editing environment
variables, and in any case Python has built in environment variable editing,
even for Windows.

Cheers,

Andrew Pennebaker
www.yellosoft.us

On Fri, Aug 26, 2011 at 9:40 AM, Brian Curtin <brian.curtin at gmail.com>wrote:

> On Thu, Aug 25, 2011 at 23:04, Andrew Pennebaker <
> andrew.pennebaker at gmail.com> wrote:
>
>> Please have the Windows installers add the Python installation directory
>> to the PATH environment variable.
>
>
> The http://bugs.python.org bug tracker is a better place for feature
> requests like this, of which there have been several over the years. This
> has become a hotter topic lately with several discussions around the
> community, and a PEP to provide some similar functionality. I've talked with
> several educators/trainers around and the lack of a Path installation is the
> #1 thing that bites their newcomers, and it's an issue that bites them
> before they've even begun to learn.
>
> Many newbies dive in without knowing that they must manually add
>> C:\PythonXY to PATH. It's yak shaving, something perfectly automatable that
>> should have been done by the installers way back in Python 1.0.
>>
>> Please also add PYTHONROOT\Scripts. It's where cool things like
>> easy_install.exe are stored. More yak shaving.
>>
>
> A clean installation of Python includes no Scripts directory, so I'm not
> sure we should be polluting the Path with yet-to-exist directories. An
> approach could be to have packaging optionally add the scripts directory on
> the installation of a third-party package.
>
> The only potential downside to this is upsetting users who manage multiple
>> python installations. It's not a problem: they already manually adjust PATH
>> to their liking.
>>
>
> "Users who manage multiple python installations" is probably a very, very
> large number, so we have quite the audience to appease, and it actually is a
> problem. We should not go halfway on this feature and say "if it doesn't
> work perfectly, you're back to being on your own". I think the likely case
> is that any path addition feature will read the path, then offer to replace
> existing instances or append to the end.
>
> I haven't yet done any work on this, but my todo list for 3.3 includes
> adding some path related features to the installer.
>
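
For a rough illustration of the "replace existing instances or append"
behaviour described above, a minimal sketch (the helper name, signature and
the C:\Python prefix check are my own assumptions, not installer code):

    import os

    def add_to_path(path_value, new_dir, old_prefix="C:\\Python"):
        """Drop old Python entries from a PATH string, append new_dir."""
        entries = [e for e in path_value.split(os.pathsep) if e]
        # remove any previous Python installation directories
        entries = [e for e in entries if not e.startswith(old_prefix)]
        entries.append(new_dir)
        return os.pathsep.join(entries)
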
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110826/8df4001c/attachment-0001.html>

From andrew.pennebaker at gmail.com  Fri Aug 26 19:21:18 2011
From: andrew.pennebaker at gmail.com (Andrew Pennebaker)
Date: Fri, 26 Aug 2011 13:21:18 -0400
Subject: [Python-Dev] Windows installers and %PATH%
In-Reply-To: <CAD+XWwppdPqtMCm5vU9m=Tn-V42PAj5NPv0H96U=K8fkjNMKXg@mail.gmail.com>
References: <CAHXt_SWg6ZwOQi54RPi+tEmJEcV-4x+A4wervPWyRYSyZ7h-Hw@mail.gmail.com>
	<CAD+XWwppdPqtMCm5vU9m=Tn-V42PAj5NPv0H96U=K8fkjNMKXg@mail.gmail.com>
Message-ID: <CAHXt_SXokbHd0YUA0Be2HuXWun6FF=T=7wDgsCSNLYAJ8+NLUg@mail.gmail.com>

I mentioned PYTHONROOT\Scripts because of the distribute package, which adds
PYTHONROOT\Scripts\easy_install.exe.

My mistake if \Scripts is created by distribute and not Python. Then my beef
is with distribute for not adding its binaries to PATH--how else would I use
easy_install if not in a terminal?

Cheers,

Andrew Pennebaker
www.yellosoft.us

On Fri, Aug 26, 2011 at 9:40 AM, Brian Curtin <brian.curtin at gmail.com> wrote:

> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110826/bab3ee12/attachment.html>

From guido at python.org  Fri Aug 26 19:26:38 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Aug 2011 10:26:38 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CACac1F-CPeTWk=nax2YiJ8r-Z4E6BttaOi=n5_b8H79traL2Gg@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <j37sj4$e3l$1@dough.gmane.org>
	<CAP7+vJKn2NdSmTQUZZkdqvEMzDY8ttxaTchZeiuTsaOxSYUC=Q@mail.gmail.com>
	<CACac1F-CPeTWk=nax2YiJ8r-Z4E6BttaOi=n5_b8H79traL2Gg@mail.gmail.com>
Message-ID: <CAP7+vJKneUas6gpA4ROCW9dPkSDajmUna7sKKs9gSEFtsFoO1A@mail.gmail.com>

On Fri, Aug 26, 2011 at 10:13 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 26 August 2011 18:02, Guido van Rossum <guido at python.org> wrote:
>
>> Eek. No, please. Those platforms' native string types have length and
>> slicing operations that are O(1) and work in terms of 16-bit code
>> points. Python should use those. It would be awful if Java and Python
>> code doing the same manipulations on the same string would come to
>> different conclusions because Python tried to paper over surrogates.
>
> *That* is actually the erroneous assumption I had made - that the Java
> and .NET native string type had code point semantics (i.e., took
> surrogates into account). As that isn't the case, my comments aren't
> valid - and I agree that having common semantics (and hence exposing
> surrogates) is too important to lose.

Those platforms probably *also* have libraries of operations to
support writing apps that conform to the Unicode standard. But those
apps will have to be aware of the difference between the "naive"
length of a string and the number of code points of characters in it.
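
A quick illustration of the difference (a sketch; the output shown is what a
narrow, UTF-16 build of CPython gives today, while a wide build would report
a length of 1):

    >>> s = '\U0001D11E'   # MUSICAL SYMBOL G CLEF, outside the BMP
    >>> len(s)             # the "naive" length counts code units
    2
    >>> s[0]               # indexing exposes the high surrogate
    '\ud834'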

> On the other hand, that pretty much establishes that whatever PEP 393
> achieves in terms of allowing all builds of CPython to offer code
> point semantics, the language definition can't mandate it.

The most severe consequence to me seems that the stdlib (which is
reused by those other platforms) cannot assume CPython's ideal world
-- even if specific apps sometimes can.

-- 
--Guido van Rossum (python.org/~guido)

From brian.curtin at gmail.com  Fri Aug 26 19:30:49 2011
From: brian.curtin at gmail.com (Brian Curtin)
Date: Fri, 26 Aug 2011 12:30:49 -0500
Subject: [Python-Dev] Windows installers and %PATH%
In-Reply-To: <CAHXt_SUo9HkJQsWFVxcqmqLPdLRiTMVfqaUbjcDO2QzxYyBcaw@mail.gmail.com>
References: <CAHXt_SWg6ZwOQi54RPi+tEmJEcV-4x+A4wervPWyRYSyZ7h-Hw@mail.gmail.com>
	<CAD+XWwppdPqtMCm5vU9m=Tn-V42PAj5NPv0H96U=K8fkjNMKXg@mail.gmail.com>
	<CAHXt_SUo9HkJQsWFVxcqmqLPdLRiTMVfqaUbjcDO2QzxYyBcaw@mail.gmail.com>
Message-ID: <CAD+XWwofdqg7n-h1oYNwa6q1GRXCNMebPne092_TV10q8_whdg@mail.gmail.com>

On Fri, Aug 26, 2011 at 12:18, Andrew Pennebaker <
andrew.pennebaker at gmail.com> wrote:

> Also, there's no need to "buy in" to the Windows toolchain just to edit
> PATH. Installer software includes functionality for editing environment
> variables, and in any case Python has built in environment variable editing,
> even for Windows.
>

The built-in environment variable support, e.g., os.getenv/putenv/environ,
isn't helpful here as it does not modify the global environment. It modifies
the current process and usually subprocesses. The proper way to apply
environment variable changes to the entire system is via the registry and
broadcasting a setting change message.
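
A minimal sketch of that approach for the per-user PATH (the constants are
the documented Win32 values; error handling is omitted and the function name
is made up):

    import ctypes
    import winreg

    def set_user_env(name, value):
        # write the value under HKCU\Environment
        key = winreg.OpenKey(winreg.HKEY_CURRENT_USER, "Environment",
                             0, winreg.KEY_SET_VALUE)
        winreg.SetValueEx(key, name, 0, winreg.REG_EXPAND_SZ, value)
        winreg.CloseKey(key)
        # broadcast WM_SETTINGCHANGE so running apps pick up the change
        HWND_BROADCAST, WM_SETTINGCHANGE = 0xFFFF, 0x001A
        SMTO_ABORTIFHUNG = 0x0002
        ctypes.windll.user32.SendMessageTimeoutW(
            HWND_BROADCAST, WM_SETTINGCHANGE, 0, "Environment",
            SMTO_ABORTIFHUNG, 5000, None)
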
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110826/50a8d447/attachment.html>

From stefan_ml at behnel.de  Fri Aug 26 20:14:17 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 26 Aug 2011 20:14:17 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJKn2NdSmTQUZZkdqvEMzDY8ttxaTchZeiuTsaOxSYUC=Q@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>	<4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>	<4E576793.2010203@v.loewis.de>
	<j37sj4$e3l$1@dough.gmane.org>
	<CAP7+vJKn2NdSmTQUZZkdqvEMzDY8ttxaTchZeiuTsaOxSYUC=Q@mail.gmail.com>
Message-ID: <j38npp$bpk$1@dough.gmane.org>

Guido van Rossum, 26.08.2011 19:02:
> On Fri, Aug 26, 2011 at 3:29 AM, Stefan Behnel wrote:
>> Besides, what if these implementations provided indexing in, say, O(log N)
>> instead of O(1) or O(N), e.g. by building a tree index into each string? You
>> could have an index that simply marks runs of surrogate pairs and BMP
>> substrings, thus providing a likely-to-be-somewhat-compact index. That index
>> would obviously have to be built, but so do the different string
>> representations in post-PEP-393 CPython, especially on Windows, as I have
>> learned.
>
> Eek. No, please.

I was mostly just confabulating. My main point was that this isn't a 
black-and-white thing - O(1) xor O(N) - and thus is orthogonal to the PEP. 
You can achieve compliant/acceptable behaviour at the code point level, the 
performance guarantees level or the platform integration level - choose any 
two. CPython is just lucky that there isn't really a platform integration 
level to take into account (if we leave the Windows environment aside for a 
moment).


> Those platforms' native string types have length and
> slicing operations that are O(1) and work in terms of 16-bit code
> points. Python should use those. It would be awful if Java and Python
> code doing the same manipulations on the same string would come to
> different conclusions because Python tried to paper over surrogates.

I fully agree.


>> Would such a less severe violation of the strict O(1) rule still be "not
>> ok"? I think this is not such a clear black-and-white issue. Both
>> implementations have notably different performance characteristics than
>> CPython in some more or less important areas, as does PyPy. At some point,
>> the language compliance label has to account for that.
>
> Since you had to ask, I have to declare that, indeed, non-O(1)
> behavior would not be okay for those platforms.

I take it that you say that because you want strings to perform in the 
'normal' platform specific way here (i.e. like Java/.NET strings), and not 
so much because you want to require the exact same (performance) 
characteristics across Python implementations. So your choice is platform 
integration over code points, leaving the identical performance as a 
side-effect of the platform integration.


> All in all, I don't think we should legislate Python strings to be
> able to support 21-bit code points using O(1) indexing. PEP 393 makes
> this possible for CPython, and it's been said that PyPy can follow
> suit. But it'll be a "quality-of-implementation" issue, not built into
> the language spec.

Makes sense to me. Most likely, Unicode heavy Python code will have to take 
platform specifics into account anyway, so there are limits as to what is 
suitable for a language spec.

Stefan


From stefan_ml at behnel.de  Fri Aug 26 20:28:43 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 26 Aug 2011 20:28:43 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E57D027.10204@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de>	<j365bt$p1o$1@dough.gmane.org>	<j36et5$oa6$1@dough.gmane.org>	<j38fkp$kj7$1@dough.gmane.org>
	<4E57D027.10204@v.loewis.de>
Message-ID: <j38oks$h4u$1@dough.gmane.org>

"Martin v. L?wis", 26.08.2011 18:56:
> I agree with your observation that something should be done about error
> handling, and will update the PEP shortly. I propose that
> PyUnicode_Ready should be explicitly called on input where raising an
> exception is feasible. In contexts where it is not feasible (such
> as reading a character, or reading the length or the kind), failing to
> ready the string should cause a fatal error.

I consider this an increase in complexity. It will then no longer be enough 
to access the data; the user will first have to figure out a suitable place 
in the code to make sure the data is actually there. They may forget to do 
so because it happens to work in all test cases, or they may trigger a huge 
amount of overhead that copies and 'recodes' the string data by executing 
one of the macros that does it automatically.

For the specific case of Cython, I would guess that I could just add 
another special case that reads the data from the Py_UNICODE buffer and 
combines surrogates as needed, but that will only work in some cases 
(specifically not for indexing). And outside of Cython, most normal user 
code won't do that.

My gut feeling leans towards a KISS approach. If you go the route to 
require an explicit point for triggering PyUnicode_Ready() calls, why not 
just go all the way and make it completely explicit in *all* cases? I.e. 
remove all implicit calls from the macros and make it part of the new API 
semantics that users *must* call PyUnicode_FAST_READY() before doing 
anything with a new string data layout. Much fewer surprises.

Note that there isn't currently an official macro way to figure out that 
the flexible string layout has not been initialised yet, i.e. that wstr is 
set but str is not. If the implicit PyUnicode_Ready() calls get removed, 
PyUnicode_KIND() could take that place by simply returning WSTR_KIND.

That being said, the main problem I currently see is that basically all 
existing code needs to be updated in order to handle these errors. 
Otherwise, it would be possible to trigger crashes by properly forging a 
string and passing it into an unprepared C library to let it run into a 
NULL pointer return value of PyUnicode_AS_UNICODE().

Stefan


From stefan_ml at behnel.de  Fri Aug 26 21:58:52 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 26 Aug 2011 21:58:52 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <j38oks$h4u$1@dough.gmane.org>
References: <4E553FBC.7080501@v.loewis.de>	<j365bt$p1o$1@dough.gmane.org>	<j36et5$oa6$1@dough.gmane.org>	<j38fkp$kj7$1@dough.gmane.org>	<4E57D027.10204@v.loewis.de>
	<j38oks$h4u$1@dough.gmane.org>
Message-ID: <j38tts$ipe$1@dough.gmane.org>

Stefan Behnel, 26.08.2011 20:28:
> "Martin v. L?wis", 26.08.2011 18:56:
>> I agree with your observation that somebody should be done about error
>> handling, and will update the PEP shortly. I propose that
>> PyUnicode_Ready should be explicitly called on input where raising an
>> exception is feasible. In contexts where it is not feasible (such
>> as reading a character, or reading the length or the kind), failing to
>> ready the string should cause a fatal error.
>[...]
> My gut feeling leans towards a KISS approach. If you go the route to
> require an explicit point for triggering PyUnicode_Ready() calls, why not
> just go all the way and make it completely explicit in *all* cases? I.e.
> remove all implicit calls from the macros and make it part of the new API
> semantics that users *must* call PyUnicode_FAST_READY() before doing
> anything with a new string data layout. Much fewer surprises.
>
> Note that there isn't currently an official macro way to figure out that
> the flexible string layout has not been initialised yet, i.e. that wstr is
> set but str is not. If the implicit PyUnicode_Ready() calls get removed,
> PyUnicode_KIND() could take that place by simply returning WSTR_KIND.

Here's a patch that updates only the header file, to make it clear what I mean.

Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: simplified-pep-393-api.patch
Type: text/x-patch
Size: 4637 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110826/2cc927c1/attachment.bin>

From guido at python.org  Fri Aug 26 23:45:17 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Aug 2011 14:45:17 -0700
Subject: [Python-Dev] Should we move to replace re with regex?
Message-ID: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>

I just made a pass of all the Unicode-related bugs filed by Tom
Christiansen, and found that in several, the response was "this is
fixed in the regex module [by Matthew Barnett]". I started replying
that I thought that we should fix the bugs in the re module (i.e.,
really in _sre.c) but on second thought I wonder if maybe regex is
mature enough to replace re in Python 3.3. It would mean that we won't
fix any of these bugs in earlier Python versions, but I could live
with that.

However, I don't know much about regex -- how compatible is it, how
fast is it (including extreme cases where the backtracking goes
crazy), how bug-free is it, and so on. Plus, how much work would it be
to actually incorporate it into CPython as a complete drop-in
replacement of the re package (such that nobody needs to change their
imports or the flags they pass to the re module).
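
For experimentation, the mechanical part of such a swap is at least easy to
try today. A sketch, assuming the regex module from PyPI is installed (note
that modules which already did "import re" keep their reference to the old
module):

    import sys
    import regex

    sys.modules['re'] = regex   # later "import re" statements get regex

    import re
    assert re is regex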

We'd also probably have to train some core developers to be familiar
enough with the code to maintain and evolve it -- I assume we can't
just volunteer Matthew to do so forever... :-)

What's the alternative? Is adding the requested bug fixes and new
features to _sre.c really that hard?

-- 
--Guido van Rossum (python.org/~guido)

From victor.stinner at haypocalc.com  Fri Aug 26 23:37:42 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 26 Aug 2011 23:37:42 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<CAP7+vJJx5-5wU+BJ8-zR_sVXAGPqGyDTq_h4KKaRRDy7TU0HaA@mail.gmail.com>
	<6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com>
Message-ID: <201108262337.42349.victor.stinner@haypocalc.com>

On Friday 26 August 2011 02:01:42, Dino Viehland wrote:
> The biggest difficulty for IronPython here would be dealing w/ .NET
> interop. We can certainly introduce either an IronPython specific string
> class which is similar to CPython's PyUnicodeObject or we could have
> multiple distinct .NET types (IronPython.Runtime.AsciiString,
> System.String, and
> IronPython.Runtime.Ucs4String) which all appear as the same type to Python.
> 
> But when Python is calling a .NET API it's always going to return a
> System.String which is UTF-16.  If we had to check and convert all of
> those strings when they cross into Python it would be very bad for
> performance.  Presumably we could have a 4th type of "interop" string
> which lazily computes this but if we start wrapping .Net strings we could
> also get into object identity issues.

Python 3 encodes all Unicode strings to the OS encoding (and the result is 
decoded) for all syscalls and calls to libraries: to the locale encoding on 
UNIX, to UTF-16 on Windows. Currently, on Windows, Py_UNICODE is wchar_t, 
which is 16 bits, so Py_UNICODE* is already a UTF-16 string.

I don't know if the overhead of PEP 393 (encoding to UTF-16 on Windows) for 
these calls is important or not. But on UNIX, pure ASCII strings don't have to 
be encoded anymore if the locale encoding is UTF-8 or ASCII.

IronPython can wait to see how CPython+PEP 393 handles these problems, and how 
much slower it is.

> But it's a huge change - it'll almost certainly touch every single source
> file in IronPython.

With PEP 393, it's transparent: PyUnicode_AS_UNICODE encodes the 
string to UTF-16 (allocating memory, etc.), except that applications should now 
check if an error occurred (check for NULL).

> I would think we'd get 3.2 done first and then think
> about what to do here.

I don't think that IronPython needs to support non-BMP characters without 
using surrogates. Bug reports about non-BMP characters usually don't have use 
cases, but just want to make Python perfect. There is no need to hurry.

PEP 393 tries to reduce the memory footprint. The effect on non-BMP characters 
is just a *nice* side effect. Or was the PEP designed to solve narrow build 
issues?

Victor


From guido at python.org  Sat Aug 27 00:00:07 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Aug 2011 15:00:07 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <201108262337.42349.victor.stinner@haypocalc.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<CAP7+vJJx5-5wU+BJ8-zR_sVXAGPqGyDTq_h4KKaRRDy7TU0HaA@mail.gmail.com>
	<6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com>
	<201108262337.42349.victor.stinner@haypocalc.com>
Message-ID: <CAP7+vJKdsXca+tKQtZc-L=eYzEmfHT7mVC43snLnKWRE0mmF-A@mail.gmail.com>

I have a different question about IronPython and Jython now. Do their
regular expression libraries support Unicode better than CPython's?
E.g. does "." match a surrogate pair? Tom C suggests that Java's regex
libraries get this and many other details right despite Java's use of
UTF-16 to represent strings. So hopefully Jython's re library is built
on top of Java's?

PS. Is there a better contact for Jython?

-- 
--Guido van Rossum (python.org/~guido)

From mal at egenix.com  Sat Aug 27 00:09:10 2011
From: mal at egenix.com (M.-A. Lemburg)
Date: Sat, 27 Aug 2011 00:09:10 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
Message-ID: <4E581986.3000709@egenix.com>

Guido van Rossum wrote:
> I just made a pass of all the Unicode-related bugs filed by Tom
> Christiansen, and found that in several, the response was "this is
> fixed in the regex module [by Matthew Barnett]". I started replying
> that I thought that we should fix the bugs in the re module (i.e.,
> really in _sre.c) but on second thought I wonder if maybe regex is
> mature enough to replace re in Python 3.3. It would mean that we won't
> fix any of these bugs in earlier Python versions, but I could live
> with that.
> 
> However, I don't know much about regex -- how compatible is it, how
> fast is it (including extreme cases where the backtracking goes
> crazy), how bug-free is it, and so on. Plus, how much work would it be
> to actually incorporate it into CPython as a complete drop-in
> replacement of the re package (such that nobody needs to change their
> imports or the flags they pass to the re module).
> 
> We'd also probably have to train some core developers to be familiar
> enough with the code to maintain and evolve it -- I assume we can't
> just volunteer Matthew to do so forever... :-)
> 
> What's the alternative? Is adding the requested bug fixes and new
> features to _sre.c really that hard?

Why not simply add the new lib, see whether it works out and
then decide which path to follow.

We've done that with the old regex lib. It took a few years
and releases to have people port their applications to the
then new re module and syntax, but in the end it worked.

With a new regex library there are likely going to be quite
a few subtle differences between re and regex - even if it's
just doing things in a more Unicode compatible way.

I don't think anyone can actually list all the differences given
the complex nature of regular expressions, so people will
likely need a few years and releases to get used to it before
a switch can be made.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 27 2011)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2011-10-04: PyCon DE 2011, Leipzig, Germany                38 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From guido at python.org  Sat Aug 27 00:18:35 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Aug 2011 15:18:35 -0700
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <4E581986.3000709@egenix.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E581986.3000709@egenix.com>
Message-ID: <CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>

On Fri, Aug 26, 2011 at 3:09 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> Guido van Rossum wrote:
>> I just made a pass of all the Unicode-related bugs filed by Tom
>> Christiansen, and found that in several, the response was "this is
>> fixed in the regex module [by Matthew Barnett]". I started replying
>> that I thought that we should fix the bugs in the re module (i.e.,
>> really in _sre.c) but on second thought I wonder if maybe regex is
>> mature enough to replace re in Python 3.3. It would mean that we won't
>> fix any of these bugs in earlier Python versions, but I could live
>> with that.
>>
>> However, I don't know much about regex -- how compatible is it, how
>> fast is it (including extreme cases where the backtracking goes
>> crazy), how bug-free is it, and so on. Plus, how much work would it be
>> to actually incorporate it into CPython as a complete drop-in
>> replacement of the re package (such that nobody needs to change their
>> imports or the flags they pass to the re module).
>>
>> We'd also probably have to train some core developers to be familiar
>> enough with the code to maintain and evolve it -- I assume we can't
>> just volunteer Matthew to do so forever... :-)
>>
>> What's the alternative? Is adding the requested bug fixes and new
>> features to _sre.c really that hard?
>
> Why not simply add the new lib, see whether it works out and
> then decide which path to follow.
>
> We've done that with the old regex lib. It took a few years
> and releases to have people port their applications to the
> then new re module and syntax, but in the end it worked.
>
> With a new regex library there are likely going to be quite
> a few subtle differences between re and regex - even if it's
> just doing things in a more Unicode compatible way.
>
> I don't think anyone can actually list all the differences given
> the complex nature of regular expressions, so people will
> likely need a few years and releases to get used to it before
> a switch can be made.

I can't say I liked how that transition was handled last time around.
I really don't want to have to tell people "Oh, that bug is fixed but
you have to use regex instead of re" and then a few years later have
to tell them "Oh, we're deprecating regex, you should just use re".

I'm really hoping someone has more actual technical understanding of
re vs. regex and can give us some facts about the differences, rather
than, frankly, FUD.

-- 
--Guido van Rossum (python.org/~guido)

From solipsis at pitrou.net  Sat Aug 27 00:33:59 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Aug 2011 00:33:59 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E581986.3000709@egenix.com>
	<CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>
Message-ID: <20110827003359.43416085@pitrou.net>

On Fri, 26 Aug 2011 15:18:35 -0700
Guido van Rossum <guido at python.org> wrote:
> 
> I can't say I liked how that transition was handled last time around.
> I really don't want to have to tell people "Oh, that bug is fixed but
> you have to use regex instead of re" and then a few years later have
> to tell them "Oh, we're deprecating regex, you should just use re".
> 
> I'm really hoping someone has more actual technical understanding of
> re vs. regex and can give us some facts about the differences, rather
> than, frankly, FUD.

The best way would be to contact the author, Matthew Barnett, or to ask
on the tracker on http://bugs.python.org/issue2636. He has been quite
willing to answer such questions in the past, AFAIR.

Regards

Antoine.



From guido at python.org  Sat Aug 27 00:47:21 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Aug 2011 15:47:21 -0700
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <20110827003359.43416085@pitrou.net>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E581986.3000709@egenix.com>
	<CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>
	<20110827003359.43416085@pitrou.net>
Message-ID: <CAP7+vJ+TytdPCyrSdARrSsEDYk1otsNBEgAa7_bj7bXM9HFeUQ@mail.gmail.com>

On Fri, Aug 26, 2011 at 3:33 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Fri, 26 Aug 2011 15:18:35 -0700
> Guido van Rossum <guido at python.org> wrote:
>>
>> I can't say I liked how that transition was handled last time around.
>> I really don't want to have to tell people "Oh, that bug is fixed but
>> you have to use regex instead of re" and then a few years later have
>> to tell them "Oh, we're deprecating regex, you should just use re".
>>
>> I'm really hoping someone has more actual technical understanding of
>> re vs. regex and can give us some facts about the differences, rather
>> than, frankly, FUD.
>
> The best way would be to contact the author, Matthew Barnett,

I had added him to the beginning of this thread but someone took him off.

> or to ask
> on the tracker on http://bugs.python.org/issue2636. He has been quite
> willing to answer such questions in the past, AFAIR.

So, that issue is about something called "regexp". AFAIK Matthew
(MRAB) wrote something called "regex"
(http://pypi.python.org/pypi/regex). Are they two different things???

-- 
--Guido van Rossum (python.org/~guido)

From drsalists at gmail.com  Sat Aug 27 00:48:42 2011
From: drsalists at gmail.com (Dan Stromberg)
Date: Fri, 26 Aug 2011 15:48:42 -0700
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
Message-ID: <CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>

On Fri, Aug 26, 2011 at 2:45 PM, Guido van Rossum <guido at python.org> wrote:

> ...but on second thought I wonder if maybe regex is
> mature enough to replace re in Python 3.3.
>

I agree that the move from regex to re was kind of painful.

It seems someone should merge the unit tests for re and regex, and apply the
merged result to each for the sake of comparison.  There might also be a
need to expand the merged result to include new things.

Then there probably should be a from __future__ import for a while.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110826/14bb4256/attachment.html>

From martin at v.loewis.de  Sat Aug 27 00:54:42 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 27 Aug 2011 00:54:42 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
Message-ID: <4E582432.2080301@v.loewis.de>

> However, I don't know much about regex

The problem really is: nobody does (except for Matthew Barnett
probably). This means that this contribution might be stuck
"forever": somebody would have to review the module, identify
issues, approve it, and take the blame if something breaks.
That takes considerable time and has a considerable risk, for
little expected glory - so nobody has volunteered to
mentor/manage integration of that code.

I believe most core contributors (who have run into this code)
consider it worthwhile, but are just too scared to take action.

Among us, some are more "regex gurus" than others; you know
who you are. I guess the PSF would pay for the review, if that
is what it would take.

Regards,
Martin

From guido at python.org  Sat Aug 27 00:57:26 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Aug 2011 15:57:26 -0700
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <4E582432.2080301@v.loewis.de>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E582432.2080301@v.loewis.de>
Message-ID: <CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>

On Fri, Aug 26, 2011 at 3:54 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> However, I don't know much about regex
>
> The problem really is: nobody does (except for Matthew Barnett
> probably). This means that this contribution might be stuck
> "forever": somebody would have to review the module, identify
> issues, approve it, and take the blame if something breaks.
> That takes considerable time and has a considerable risk, for
> little expected glory - so nobody has volunteered to
> mentor/manage integration of that code.
>
> I believe most core contributors (who have run into this code)
> consider it worthwhile, but are just too scared to take action.
>
> Among us, some are more "regex gurus" than others; you know
> who you are. I guess the PSF would pay for the review, if that
> is what it would take.

Makes sense. I noticed Ezio seems quite in favor of regex. Maybe he knows more?

-- 
--Guido van Rossum (python.org/~guido)

From mal at egenix.com  Sat Aug 27 01:00:31 2011
From: mal at egenix.com (M.-A. Lemburg)
Date: Sat, 27 Aug 2011 01:00:31 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>	<4E581986.3000709@egenix.com>
	<CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>
Message-ID: <4E58258F.9050204@egenix.com>

Guido van Rossum wrote:
> On Fri, Aug 26, 2011 at 3:09 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> Guido van Rossum wrote:
>>> I just made a pass of all the Unicode-related bugs filed by Tom
>>> Christiansen, and found that in several, the response was "this is
>>> fixed in the regex module [by Matthew Barnett]". I started replying
>>> that I thought that we should fix the bugs in the re module (i.e.,
>>> really in _sre.c) but on second thought I wonder if maybe regex is
>>> mature enough to replace re in Python 3.3. It would mean that we won't
>>> fix any of these bugs in earlier Python versions, but I could live
>>> with that.
>>>
>>> However, I don't know much about regex -- how compatible is it, how
>>> fast is it (including extreme cases where the backtracking goes
>>> crazy), how bug-free is it, and so on. Plus, how much work would it be
>>> to actually incorporate it into CPython as a complete drop-in
>>> replacement of the re package (such that nobody needs to change their
>>> imports or the flags they pass to the re module).
>>>
>>> We'd also probably have to train some core developers to be familiar
>>> enough with the code to maintain and evolve it -- I assume we can't
>>> just volunteer Matthew to do so forever... :-)
>>>
>>> What's the alternative? Is adding the requested bug fixes and new
>>> features to _sre.c really that hard?
>>
>> Why not simply add the new lib, see whether it works out and
>> then decide which path to follow.
>>
>> We've done that with the old regex lib. It took a few years
>> and releases to have people port their applications to the
>> then new re module and syntax, but in the end it worked.
>>
>> With a new regex library there are likely going to be quite
>> a few subtle differences between re and regex - even if it's
>> just doing things in a more Unicode compatible way.
>>
>> I don't think anyone can actually list all the differences given
>> the complex nature of regular expressions, so people will
>> likely need a few years and releases to get used to it before
>> a switch can be made.
> 
> I can't say I liked how that transition was handled last time around.
> I really don't want to have to tell people "Oh, that bug is fixed but
> you have to use regex instead of re" and then a few years later have
> to tell them "Oh, we're deprecating regex, you should just use re".

No, you tell them: "If you want Unicode 6 semantics, use regex,
if you're fine with Unicode 2.0/3.0 semantics, use re". After all,
it's not like re suddenly stopped working :-)

> I'm really hoping someone has more actual technical understanding of
> re vs. regex and can give us some facts about the differences, rather
> than, frankly, FUD.

The good part is that it's based on the re code, the FUD comes
from the fact that the new lib is 380kB larger than the old one
and that's not even counting the generated 500kB of lookup
tables.

If no one steps up to do a review or analysis, I think the
only practical way to test the lib is to give it a prominent
chance to prove itself.

The other aspect is maintenance.

Perhaps we could have a summer of code student do a review and
analysis to get familiar with the code and then have at least
two developers know the code well enough to support it for
a while.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 27 2011)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2011-10-04: PyCon DE 2011, Leipzig, Germany                38 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From python at mrabarnett.plus.com  Sat Aug 27 01:21:03 2011
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 27 Aug 2011 00:21:03 +0100
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <8653.1314400096@chthon>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E581986.3000709@egenix.com>
	<CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>
	<4E58258F.9050204@egenix.com> <8653.1314400096@chthon>
Message-ID: <4E582A5F.7060804@mrabarnett.plus.com>

On 27/08/2011 00:08, Tom Christiansen wrote:
> "M.-A. Lemburg"<mal at egenix.com>  wrote
>     on Sat, 27 Aug 2011 01:00:31 +0200:
>
>> The good part is that it's based on the re code, the FUD comes
>> from the fact that the new lib is 380kB larger than the old one
>> and that's not even counting the generated 500kB of lookup
>> tables.
>
> Well, you have to put the property tables somewhere, somehow.
> There are various schemes for demand loading them as needed,
> but I don't know whether those are used.
>
FYI, the .pyd for Python v3.2 is 227KB, about half of which is property
tables.

From guido at python.org  Sat Aug 27 01:29:02 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Aug 2011 16:29:02 -0700
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <4E582A5F.7060804@mrabarnett.plus.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E581986.3000709@egenix.com>
	<CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>
	<4E58258F.9050204@egenix.com> <8653.1314400096@chthon>
	<4E582A5F.7060804@mrabarnett.plus.com>
Message-ID: <CAP7+vJJscVCb5_qxM3FtMgOmXYCWxt2DLdOavixiBHRS5rUrLA@mail.gmail.com>

On Fri, Aug 26, 2011 at 4:21 PM, MRAB <python at mrabarnett.plus.com> wrote:
> On 27/08/2011 00:08, Tom Christiansen wrote:
>>
>> "M.-A. Lemburg"<mal at egenix.com> ?wrote
>> ? ?on Sat, 27 Aug 2011 01:00:31 +0200:
>>
>>> The good part is that it's based on the re code, the FUD comes
>>> from the fact that the new lib is 380kB larger than the old one
>>> and that's not even counting the generated 500kB of lookup
>>> tables.
>>
>> Well, you have to put the property tables somewhere, somehow.
>> There are various schemes for demand loading them as needed,
>> but I don't know whether those are used.
>>
> FYI, the .pyd for Python v3.2 is 227KB, about half of which is property
> tables.

I wouldn't hold the size of the generated tables against you. :-)

-- 
--Guido van Rossum (python.org/~guido)

From tchrist at perl.com  Sat Aug 27 01:08:16 2011
From: tchrist at perl.com (Tom Christiansen)
Date: Fri, 26 Aug 2011 17:08:16 -0600
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <4E58258F.9050204@egenix.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E581986.3000709@egenix.com>
	<CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>
	<4E58258F.9050204@egenix.com>
Message-ID: <8653.1314400096@chthon>

"M.-A. Lemburg" <mal at egenix.com> wrote
   on Sat, 27 Aug 2011 01:00:31 +0200: 

> The good part is that it's based on the re code, the FUD comes
> from the fact that the new lib is 380kB larger than the old one
> and that's not even counting the generated 500kB of lookup
> tables.

Well, you have to put the property tables somewhere, somehow.
There are various schemes for demand loading them as needed,
but I don't know whether those are used.

--tom

From tjreedy at udel.edu  Sat Aug 27 00:57:37 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 26 Aug 2011 18:57:37 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E576793.2010203@v.loewis.de>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de>
Message-ID: <4E5824E1.9010101@udel.edu>



On 8/26/2011 5:29 AM, "Martin v. Löwis" wrote:
>> IronPython and Jython can retain UTF-16 as their native form if that
>> makes interop cleaner, but in doing so they need to ensure that basic
>> operations like indexing and len work in terms of code points, not
>> code units, if they are to conform.

My impression is that a UTF-16 implementation, to be properly called 
such, must do len and [] in terms of code points, which is why Python's 
narrow builds are called UCS-2 and not UTF-16.

> That means that they won't conform, period. There is no efficient
> maintainable implementation strategy to achieve that property,

Given that both 'efficient' and 'maintainable' are relative terms, that 
is your pessimistic opinion, not really a fact.

> it may take well years until somebody provides an efficient
> unmaintainable implementation.
>
>> Does this make sense, or have I completely misunderstood things?
>
> You seem to assume it is ok for Jython/IronPython to provide indexing in
> O(n). It is not.

Why do you keep saying that O(n) is the alternative? I have already 
given a simple solution that is O(log k), where k is the number of 
non-BMP characters/codepoints/surrogate_pairs if there are any, and O(1) 
otherwise (for all BMP chars). It uses O(k) space. I think that is 
pretty efficient. I suspect that is the most time-efficient possible 
without using at least as much space as a UCS-4 solution. The fact that 
you and others do not want this for CPython should not preclude other 
implementations that are more tied to UTF-16 from exploring the idea.
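
To make the idea concrete, a sketch (my own illustration, not code from any
implementation): record the code point index at which each surrogate pair
starts in a sorted auxiliary list, and correct every lookup with a binary
search over that list.

    import bisect

    class UTF16String:
        """Code point len() and [] over UTF-16 code units, O(log k)."""

        def __init__(self, units):
            self.units = list(units)   # 16-bit code units as ints
            self.pairs = []            # code point index of each pair
            cp = i = 0
            while i < len(self.units):
                if 0xD800 <= self.units[i] < 0xDC00:  # high surrogate
                    self.pairs.append(cp)
                    i += 2
                else:
                    i += 1
                cp += 1
            self.cp_len = cp

        def __len__(self):             # O(1), in code points
            return self.cp_len

        def __getitem__(self, cp):     # O(log k)
            # pairs starting before cp each add one extra code unit
            unit = cp + bisect.bisect_left(self.pairs, cp)
            hi = self.units[unit]
            if 0xD800 <= hi < 0xDC00:
                lo = self.units[unit + 1]
                return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
            return hi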

Maintainability partly depends on whether all-codepoint support is built 
in or bolted on to a BMP-only implementation burdened with back 
compatibility for a code unit API. Maintainability is probably harder 
with a separate UTF-32 type, which CPython has but which I gather Jython 
and IronPython do not. It might or might not be easier if there were a 
separate internal character type containing a 32-bit code point value, 
so that iteration and indexing (and single char slicing) always 
returned the same type of object regardless of whether the character was 
in the BMP or not. This certainly would help all the unicode database 
functions.

Tom Christiansen appears to have said that Perl uses or will use UTF-8 
plus auxiliary arrays. If so, we will find out if they can maintain it.

---
Terry Jan Reedy


From solipsis at pitrou.net  Sat Aug 27 02:03:21 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Aug 2011 02:03:21 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAP7+vJ+TytdPCyrSdARrSsEDYk1otsNBEgAa7_bj7bXM9HFeUQ@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E581986.3000709@egenix.com>
	<CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>
	<20110827003359.43416085@pitrou.net>
	<CAP7+vJ+TytdPCyrSdARrSsEDYk1otsNBEgAa7_bj7bXM9HFeUQ@mail.gmail.com>
Message-ID: <20110827020321.64061f94@pitrou.net>

On Fri, 26 Aug 2011 15:47:21 -0700
Guido van Rossum <guido at python.org> wrote:
> > The best way would be to contact the author, Matthew Barnett,
> 
> I had added him to the beginning of this thread but someone took him off.
> 
> > or to ask
> > on the tracker on http://bugs.python.org/issue2636. He has been quite
> > willing to answer such questions in the past, AFAIR.
> 
> So, that issue is about something called "regexp". AFAIK Matthew
> (MRAB) wrote something called "regex"
> (http://pypi.python.org/pypi/regex). Are they two different things???

No, it's the same.  The source is at
https://code.google.com/p/mrab-regex-hg/, btw.

Regards

Antoine.

From solipsis at pitrou.net  Sat Aug 27 02:06:35 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Aug 2011 02:06:35 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E581986.3000709@egenix.com>
	<CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>
	<4E58258F.9050204@egenix.com>
Message-ID: <20110827020635.272d75bd@pitrou.net>

On Sat, 27 Aug 2011 01:00:31 +0200
"M.-A. Lemburg" <mal at egenix.com> wrote:
> > 
> > I can't say I liked how that transition was handled last time around.
> > I really don't want to have to tell people "Oh, that bug is fixed but
> > you have to use regex instead of re" and then a few years later have
> > to tell them "Oh, we're deprecating regex, you should just use re".
> 
> No, you tell them: "If you want Unicode 6 semantics, use regex,
> if you're fine with Unicode 2.0/3.0 semantics, use re". After all,
> it's not like re suddenly stopped working :-)

It has a whole lot of new features in addition to better unicode
support. See for yourself:
https://code.google.com/p/mrab-regex-hg/wiki/GeneralDetails

> Perhaps we could have a summer of code student do a review and
> analysis to get familiar with the code and then have at least
> two developers know the code well enough to support it for
> a while.

I'm not sure a GSoC student would be the best candidate to do a review
matching our expectations.

Regards

Antoine.



From solipsis at pitrou.net  Sat Aug 27 02:08:35 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Aug 2011 02:08:35 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
Message-ID: <20110827020835.08a2a492@pitrou.net>

On Fri, 26 Aug 2011 15:48:42 -0700
Dan Stromberg <drsalists at gmail.com> wrote:
> 
> Then there probably should be a from __future__ import for a while.

If you are willing to use a "from __future__ import", why not simply

    import regex as re

? We're not Perl, we don't have built-in syntactic support for regular
expressions.

Regards

Antoine.



From greg.ewing at canterbury.ac.nz  Sat Aug 27 02:17:18 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 27 Aug 2011 12:17:18 +1200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
Message-ID: <4E58378E.2090609@canterbury.ac.nz>

Paul Moore wrote:

> IronPython and Jython can retain UTF-16 as their native form if that
> makes interop cleaner, but in doing so they need to ensure that basic
> operations like indexing and len work in terms of code points, not
> code units, if they are to conform. ... They lose the O(1)
> guarantee, but that's easily defensible as a tradeoff to conform to
> underlying runtime semantics.

I would only agree as long as it wasn't too much worse
than O(1). O(log n) might be all right, but O(n) would be
unacceptable, I think.

-- 
Greg

From drsalists at gmail.com  Sat Aug 27 02:25:56 2011
From: drsalists at gmail.com (Dan Stromberg)
Date: Fri, 26 Aug 2011 17:25:56 -0700
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <20110827020835.08a2a492@pitrou.net>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
Message-ID: <CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>

On Fri, Aug 26, 2011 at 5:08 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Fri, 26 Aug 2011 15:48:42 -0700
> Dan Stromberg <drsalists at gmail.com> wrote:
> >
> > Then there probably should be a from __future__ import for a while.
>
> If you are willing to use a "from __future__ import", why not simply
>
>    import regex as re
>
> ? We're not Perl, we don't have built-in syntactic support for regular
> expressions.
>
> Regards
>

If you add regex as "import regex", and the new regex module doesn't work
out, regex might be harder to get rid of.  from __future__ import is an
established way of trying something for a while to see if it's going to
work.

EG: "from __future__ import re", where re is really the new module.

But whatever.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110826/bbe0007e/attachment.html>

From solipsis at pitrou.net  Sat Aug 27 02:23:31 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Aug 2011 02:23:31 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E58378E.2090609@canterbury.ac.nz>
Message-ID: <20110827022331.0d99a22c@pitrou.net>

On Sat, 27 Aug 2011 12:17:18 +1200
Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Paul Moore wrote:
> 
> > IronPython and Jython can retain UTF-16 as their native form if that
> > makes interop cleaner, but in doing so they need to ensure that basic
> > operations like indexing and len work in terms of code points, not
> > code units, if they are to conform. ... They lose the O(1)
> > guarantee, but that's easily defensible as a tradeoff to conform to
> > underlying runtime semantics.
> 
> I would only agree as long as it wasn't too much worse
> than O(1). O(log n) might be all right, but O(n) would be
> unacceptable, I think.

It also depends a lot on *actual* measured performance. As someone
mentioned in the tracker, the index you use on a string usually comes
from a previous string operation (like a search), perhaps with a small
offset. So a caching scheme may actually give very good results with a
rather small overhead (you could cache, say, the 4 most recent indices
and choose the nearest when an indexing operation is done; with utf-8,
scanning backward and forward is equally simple).
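
Roughly, a sketch of just the caching idea (names made up; real code would
tune the cache size and also handle slicing):

    def _is_start(byte):
        # a UTF-8 continuation byte looks like 0b10xxxxxx
        return (byte & 0xC0) != 0x80

    class UTF8Index:
        def __init__(self, data):
            self.data = data        # bytes holding valid UTF-8
            self.cache = [(0, 0)]   # (code point index, byte offset)

        def byte_offset(self, cp):
            # start from the cached position nearest the target index
            idx, off = min(self.cache, key=lambda p: abs(p[0] - cp))
            step = 1 if cp >= idx else -1
            while idx != cp:
                off += step
                while 0 < off < len(self.data) and \
                        not _is_start(self.data[off]):
                    off += step
                idx += step
            self.cache = [(cp, off)] + self.cache[:3]  # 4 most recent
            return off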

Regards

Antoine.



From greg.ewing at canterbury.ac.nz  Sat Aug 27 02:34:48 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 27 Aug 2011 12:34:48 +1200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E575F31.5010709@egenix.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<87sjoqzs3a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j33kvu$f9d$1@dough.gmane.org>
	<87liuixrfh.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKz=u0ZNn6iymFe29pOtYAQArZMzZcLNBB8rUW+BJpA4A@mail.gmail.com>
	<4E55C2C3.3060205@canterbury.ac.nz>
	<CAP7+vJL+rP15bDRZxYGvnF=9O8xZ2AtCy3TaM3aBBTiHFEb8zQ@mail.gmail.com>
	<Pine.GSO.4.64.1108252119360.21993@core.cs.uwaterloo.ca>
	<j377qe$nif$1@dough.gmane.org> <4E575F31.5010709@egenix.com>
Message-ID: <4E583BA8.5080406@canterbury.ac.nz>

M.-A. Lemburg wrote:
> Simply going with UCS-4 does not solve the problem, since
> even with UCS-4 storage, you can still have surrogates in your
> Python Unicode string.

Yes, but in that case, you presumably *intend* them to
be treated as separate indexing units. If you didn't,
there would be no need to use surrogates in the first
place.

-- 
Greg

From guido at python.org  Sat Aug 27 02:42:32 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Aug 2011 17:42:32 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E5824E1.9010101@udel.edu>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com> <j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
Message-ID: <CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>

On Fri, Aug 26, 2011 at 3:57 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>
>
> On 8/26/2011 5:29 AM, "Martin v. Löwis" wrote:
>>>
>>> IronPython and Jython can retain UTF-16 as their native form if that
>>> makes interop cleaner, but in doing so they need to ensure that basic
>>> operations like indexing and len work in terms of code points, not
>>> code units, if they are to conform.
>
> My impression is that a UTF-16 implementation, to be properly called such,
> must do len and [] in terms of code points, which is why Python's narrow
> builds are called UCS-2 and not UTF-16.

I don't think anyone else has that impression. Please cite chapter and
verse if you really think this is important. IIUC, UCS-2 does not
allow surrogate pairs, whereas Python (and Java, and .NET, and
Windows) 16-bit strings all do support surrogate pairs. And they all
have a len or length function that counts code units, not code points.
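
For example, the difference is directly visible on a narrow build (an
illustrative session; a wide build would print 1):

    >>> s = '\U00010000'   # one code point just outside the BMP
    >>> len(s)             # narrow build: len counts UTF-16 code units
    2
    >>> s[0], s[1]         # the two halves of the surrogate pair
    ('\ud800', '\udc00')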

>> That means that they won't conform, period. There is no efficient
>> maintainable implementation strategy to achieve that property,
>
> Given that both 'efficient' and 'maintainable' are relative terms, that is
> your pessimistic opinion, not really a fact.
>
>> it may take well years until somebody provides an efficient
>> unmaintainable implementation.
>>
>>> Does this make sense, or have I completely misunderstood things?
>>
>> You seem to assume it is ok for Jython/IronPython to provide indexing in
>> O(n). It is not.
>
> Why do you keep saying that O(n) is the alternative? I have already given a
> simple solution that is O(log k), where k is the number of non-BMP
> characters/codepoints/surrogate_pairs if there are any, and O(1) otherwise
> (for all BMP chars). It uses O(k) space. I think that is pretty efficient. I
> suspect that is the most time efficient possible without using at least as
> much space as a UCS-4 solution. The fact that you and others do not want this
> for CPython should not preclude other implementations that are more tied to
> UTF-16 from exploring the idea.
>
> Maintainability partly depends on whether all-codepoint support is built in
> or bolted on to a BMP-only implementation burdened with back compatibility
> for a code unit API. Maintainability is probably harder with a separate
> UTF-32 type, which CPython has but which I gather Jython and IronPython do
> not. It might or might not be easier if there were a separate internal
> character type containing a 32 bit code point value, so that iteration and
> indexing (and single char slicing) always returned the same type of object
> regardless of whether the character was in the BMP or not. This certainly
> would help all the unicode database functions.
>
> Tom Christiansen appears to have said that Perl is or will use UTF-8 plus
> auxiliary arrays. If so, we will find out if they can maintain it.

Their API style is completely different from ours. What Perl can
maintain has little bearing on what Python can.

-- 
--Guido van Rossum (python.org/~guido)
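
For reference, a minimal sketch of the auxiliary-array idea Terry
describes in the quoted text (illustrative only; the function name and
the precomputed list are assumptions):

    import bisect

    def codeunit_index(nonbmp, cp_index):
        # nonbmp: sorted code point indexes of the string's non-BMP
        # characters, i.e. the O(k) auxiliary array. Each non-BMP
        # character strictly before cp_index occupies one extra UTF-16
        # code unit, so the translation is a single O(log k) bisect.
        return cp_index + bisect.bisect_left(nonbmp, cp_index)

    # For 'a\U00010000b' the auxiliary array is [1]:
    # codeunit_index([1], 0) == 0;  codeunit_index([1], 2) == 3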

From ben+python at benfinney.id.au  Sat Aug 27 03:22:58 2011
From: ben+python at benfinney.id.au (Ben Finney)
Date: Sat, 27 Aug 2011 11:22:58 +1000
Subject: [Python-Dev] Should we move to replace re with regex?
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E581986.3000709@egenix.com>
	<CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>
	<4E58258F.9050204@egenix.com>
Message-ID: <87mxevvea5.fsf@benfinney.id.au>

"M.-A. Lemburg" <mal at egenix.com> writes:

> Guido van Rossum wrote:

> > I really don't want to have to tell people "Oh, that bug is fixed
> > but you have to use regex instead of re" and then a few years later
> > have to tell them "Oh, we're deprecating regex, you should just use
> > re".
>
> No, you tell them: "If you want Unicode 6 semantics, use regex, if
> you're fine with Unicode 2.0/3.0 semantics, use re".

What do we say, then, to those who are unaware of the different
semantics between those versions of Unicode, and want regular expressions
to “just work” in Python?

To which document can we direct them to understand what semantics they
want?

> After all, it's not like re suddenly stopped working :-)

For some value of “working”, that is. The trick is to know whether that
value is what one wants.

-- 
 \        “The fact of your own existence is the most astonishing fact |
  `\    you'll ever have to confront. Don't dare ever see your life as |
_o__)    boring, monotonous, or joyless.” —Richard Dawkins, 2010-03-10 |
Ben Finney


From ezio.melotti at gmail.com  Sat Aug 27 03:37:21 2011
From: ezio.melotti at gmail.com (Ezio Melotti)
Date: Sat, 27 Aug 2011 04:37:21 +0300
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E582432.2080301@v.loewis.de>
	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>
Message-ID: <CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>

On Sat, Aug 27, 2011 at 1:57 AM, Guido van Rossum <guido at python.org> wrote:

> On Fri, Aug 26, 2011 at 3:54 PM, "Martin v. Löwis" <martin at v.loewis.de>
> wrote:
> > [...]
> > Among us, some are more "regex gurus" than others; you know
> > who you are. I guess the PSF would pay for the review, if that
> > is what it would take.
>
> Makes sense. I noticed Ezio seems quite in favor of regex. Maybe he knows
> more?
>

Matthew has always been responsive on the tracker, usually fixing reported
bugs in a matter of days, and I think he's willing to keep doing so once the
regex module is included.  Even if I haven't yet tried the module myself
(I'm planning to do it though), it seems quite popular out there (the
download number on PyPI apparently gets reset for each new release, so I
don't know the exact total), and apparently people are already using it as a
replacement for re.

I'm not sure it's worth doing an extensive review of the code; a better
approach might be to require extensive test coverage  (and a review of
tests).  If the code seems well written, commented, documented (I think
proper rst documentation is still missing), and tested (both with unittest
and out in the wild), and Matthew is willing to maintain it, I think we can
include it.  We will get familiar with the code once we start contributing
to it and fixing bugs, as already happens with most of the other modules.

See also the "New regex module for 3.2?" thread (
http://mail.python.org/pipermail/python-dev/2010-July/101606.html ).

Best Regards,
Ezio Melotti


>
> --
> --Guido van Rossum (python.org/~guido <http://python.org/%7Eguido>)
>

From steve at pearwood.info  Sat Aug 27 03:54:26 2011
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 27 Aug 2011 11:54:26 +1000
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <87mxevvea5.fsf@benfinney.id.au>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>	<4E581986.3000709@egenix.com>	<CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>	<4E58258F.9050204@egenix.com>
	<87mxevvea5.fsf@benfinney.id.au>
Message-ID: <4E584E52.1080606@pearwood.info>

Ben Finney wrote:
> "M.-A. Lemburg" <mal at egenix.com> writes:

>> No, you tell them: "If you want Unicode 6 semantics, use regex, if
>> you're fine with Unicode 2.0/3.0 semantics, use re".
> 
> What do we say, then, to those who are unaware of the different
> semantics between those versions of Unicode, and want regular expressions
> to “just work” in Python?
> 
> To which document can we direct them to understand what semantics they
> want?

Presumably, like all modules, both the re and the regex module will have 
their own individual pages in the library reference. As the newcomer, 
regex's page should include a discussion of the differences between the two. This 
can then be quietly dropped once re becomes formally deprecated.

(Assuming that the std lib keeps re and regex in parallel for a few 
releases, which is not a given.)

However, I note that last time, the old regex module was just documented 
as obsolete with little detailed discussion of the differences:

http://docs.python.org/release/1.5/lib/node69.html#SECTION005300000000000000000


-- 
Steven

From solipsis at pitrou.net  Sat Aug 27 03:56:02 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Aug 2011 03:56:02 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E582432.2080301@v.loewis.de>
	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>
	<CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>
Message-ID: <20110827035602.557f772f@pitrou.net>

On Sat, 27 Aug 2011 04:37:21 +0300
Ezio Melotti <ezio.melotti at gmail.com> wrote:
> 
> I'm not sure it's worth doing an extensive review of the code; a better
> approach might be to require extensive test coverage  (and a review of
> tests).  If the code seems well written, commented, documented (I think
> proper rst documentation is still missing),

Isn't this precisely what a review is supposed to assess?

> We will get familiar with the code once we start contributing
> to it and fixing bugs, as already happens with most of the other modules.

I'm not sure it's a good idea for a module with more than 10000 lines
of C code (and 4000 lines of pure Python code). This is several times
the size of multiprocessing. The C code looks very cleanly written, but
it's still a big chunk of algorithmically sophisticated code.

Another "interesting" question is whether it's easy to port to the PEP
393 string representation, if it gets accepted.

Regards

Antoine.



From solipsis at pitrou.net  Sat Aug 27 03:59:16 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Aug 2011 03:59:16 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
Message-ID: <20110827035916.583c3d81@pitrou.net>

On Fri, 26 Aug 2011 17:25:56 -0700
Dan Stromberg <drsalists at gmail.com> wrote:
> On Fri, Aug 26, 2011 at 5:08 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> 
> > On Fri, 26 Aug 2011 15:48:42 -0700
> > Dan Stromberg <drsalists at gmail.com> wrote:
> > >
> > > Then there probably should be a from __future__ import for a while.
> >
> > If you are willing to use a "from __future__ import", why not simply
> >
> >    import regex as re
> >
> > ? We're not Perl, we don't have built-in syntactic support for regular
> > expressions.
> >
> > Regards
> >
> 
> If you add regex as "import regex", and the new regex module doesn't work
> out, regex might be harder to get rid of.  from __future__ import is an
> established way of trying something for a while to see if it's going to
> work.

That's an interesting idea. This way, integrating the new module would
be a less risky move, since if it gives us too many problems, we could
back out our decision in the next feature release.

Regards

Antoine.

From ben+python at benfinney.id.au  Sat Aug 27 05:15:18 2011
From: ben+python at benfinney.id.au (Ben Finney)
Date: Sat, 27 Aug 2011 13:15:18 +1000
Subject: [Python-Dev] Should we move to replace re with regex?
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E581986.3000709@egenix.com>
	<CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>
	<4E58258F.9050204@egenix.com> <87mxevvea5.fsf@benfinney.id.au>
	<4E584E52.1080606@pearwood.info>
Message-ID: <87fwknv92x.fsf@benfinney.id.au>

Steven D'Aprano <steve at pearwood.info> writes:

> Ben Finney wrote:
> > "M.-A. Lemburg" <mal at egenix.com> writes:
>
> >> No, you tell them: "If you want Unicode 6 semantics, use regex, if
> >> you're fine with Unicode 2.0/3.0 semantics, use re".
> >
> > What do we say, then, to those who are unaware of the different
> > semantics between those versions of Unicode, and want regular expressions
> > to “just work” in Python?
> >
> > To which document can we direct them to understand what semantics they
> > want?
>
> Presumably, like all modules, both the re and the regex module will
> have their own individual pages in the library reference.

My question is directed more to M-A Lemburg's passage above, and its
implicit assumption that the user understands the changes between
“Unicode 2.0/3.0 semantics” and “Unicode 6 semantics”, and how their own
needs relate to those semantics.

For programmers who know they want to follow Unicode conventions in
Python, but don't know the distinction M-A Lemburg is drawing, to which
document does he recommend we direct them?

“The Unicode specification document in its various versions” isn't a
feasible answer.

-- 
 \     “Computers are useless. They can only give you answers.” —Pablo |
  `\                                                           Picasso |
_o__)                                                                  |
Ben Finney


From steve at pearwood.info  Sat Aug 27 05:31:03 2011
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 27 Aug 2011 13:31:03 +1000
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <87fwknv92x.fsf@benfinney.id.au>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>	<4E581986.3000709@egenix.com>	<CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>	<4E58258F.9050204@egenix.com>
	<87mxevvea5.fsf@benfinney.id.au>	<4E584E52.1080606@pearwood.info>
	<87fwknv92x.fsf@benfinney.id.au>
Message-ID: <4E5864F7.2010106@pearwood.info>

Ben Finney wrote:
> Steven D'Aprano <steve at pearwood.info> writes:
> 
>> Ben Finney wrote:
>>> "M.-A. Lemburg" <mal at egenix.com> writes:
>>>> No, you tell them: "If you want Unicode 6 semantics, use regex, if
>>>> you're fine with Unicode 2.0/3.0 semantics, use re".
>>> What do we say, then, to those who are unaware of the different
>>> semantics between those versions of Unicode, and want regular expressions
>>> to “just work” in Python?
>>>
>>> To which document can we direct them to understand what semantics they
>>> want?
>> Presumably, like all modules, both the re and the regex module will
>> have their own individual pages in the library reference.
> 
> My question is directed more to M-A Lemburg's passage above, and its
> implicit assumption that the user understands the changes between
> “Unicode 2.0/3.0 semantics” and “Unicode 6 semantics”, and how their own
> needs relate to those semantics.
> 
> For programmers who know they want to follow Unicode conventions in
> Python, but don't know the distinction M-A Lemburg is drawing, to which
> document does he recommend we direct them?


I can only repeat my answer: the docs for the new regex module should 
include a discussion of the differences. If that requires summarising 
the differences that M-A Lemburg refers to, then so be it.


> “The Unicode specification document in its various versions” isn't a
> feasible answer.

Presumably the Unicode spec will be the canonical source, but I agree 
that we should not expect people to read that in order to make a 
decision between re and regex.


-- 
Steven

From steve at pearwood.info  Sat Aug 27 05:47:34 2011
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 27 Aug 2011 13:47:34 +1000
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <20110827035916.583c3d81@pitrou.net>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>	<20110827020835.08a2a492@pitrou.net>	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net>
Message-ID: <4E5868D6.8090203@pearwood.info>

Antoine Pitrou wrote:
> On Fri, 26 Aug 2011 17:25:56 -0700
> Dan Stromberg <drsalists at gmail.com> wrote:
[...]
>> If you add regex as "import regex", and the new regex module doesn't work
>> out, regex might be harder to get rid of.  from __future__ import is an
>> established way of trying something for a while to see if it's going to
>> work.
> 
> That's an interesting idea. This way, integrating the new module would
> be a less risky move, since if it gives us too many problems, we could
> back out our decision in the next feature release.

I'm not sure that's correct. If there are differences in either the 
interface or the behaviour between the new regex and re, then reverting 
will be a pain regardless of whether you have:

from __future__ import re
re.compile(...)

or

import regex
regex.compile(...)


Either way, if the new regex library goes away, code will break, and 
fixing it may not be easy. It's not likely to be so easy that merely 
deleting the "from __future__ ..." line will do it, but if it is that 
easy, then using "import re as regex" will be just as easy.

Have there been any __future__ features that were added provisionally? I 
can't think of any. That's not what __future__ is for, at least 
according to PEP 236.

http://www.python.org/dev/peps/pep-0236/

I can't think of any __future__ feature that could be easily reverted 
once people start relying on it. Either syntax would break, or behaviour 
would change.

The PEP even explicitly states that __future__ should not be used for 
changes which are backward compatible:

     Note that there is no need to involve the future_statement machinery
     in new features unless they can break existing code; fully backward-
     compatible additions can-- and should --be introduced without a
     corresponding future_statement.


I wasn't around for the move from 1.4 regex to 1.5 re, so I don't know 
what was done poorly last time. But I can't see why we should treat 
regular expressions so differently from (say) argparse and optparse.

from __future__ import optparse

No. Just... no.




-- 
Steven


From tjreedy at udel.edu  Sat Aug 27 05:51:30 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 26 Aug 2011 23:51:30 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
Message-ID: <4E5869C2.2040008@udel.edu>



On 8/26/2011 8:42 PM, Guido van Rossum wrote:
> On Fri, Aug 26, 2011 at 3:57 PM, Terry Reedy<tjreedy at udel.edu>  wrote:

>> My impression is that a UTF-16 implementation, to be properly called such,
>> must do len and [] in terms of code points, which is why Python's narrow
>> builds are called UCS-2 and not UTF-16.
>
> I don't think anyone else has that impression. Please cite chapter and
> verse if you really think this is important. IIUC, UCS-2 does not
> allow surrogate pairs, whereas Python (and Java, and .NET, and
> Windows) 16-bit strings all do support surrogate pairs. And they all

For that reason, I think UTF-16 is a better term than UCS-2 for narrow 
builds (whether or not the above impression is true).
But Marc Lemburg disagrees.
http://mail.python.org/pipermail/python-dev/2010-November/105751.html
The 2.7 docs still refer to ucs2 builds, as is his wish.

---
Terry Jan Reedy

From g.brandl at gmx.net  Sat Aug 27 07:47:35 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 27 Aug 2011 07:47:35 +0200
Subject: [Python-Dev] Sphinx version for Python 2.x docs
In-Reply-To: <CAPdtAj3wv2WCePdYM3qRcbRvLfzhAp2G1JpvRjd6-ttw2d1Q2A@mail.gmail.com>
References: <4E4AF610.5040303@simplistix.co.uk>
	<CAPdtAj3wv2WCePdYM3qRcbRvLfzhAp2G1JpvRjd6-ttw2d1Q2A@mail.gmail.com>
Message-ID: <j3a0dn$pas$1@dough.gmane.org>

On 23.08.2011 01:09, Sandro Tosi wrote:
> Hi all,
> 
>> Any chance the version of sphinx used to generate the docs on
>> docs.python.org could be updated?
> 
> I'd like to discuss this aspect, in particular for the implication it
> has on http://bugs.python.org/issue12409 .
> 
> Personally, I do think it has a value to have the same set of tools to
> build the Python documentation of the currently active branches.
> Currently, only 2.7 is different, since it still fetches (from
> svn.python.org... can we fix this too? suggestions welcome!) sphinx
> 0.6.7 while 3.2/3.3 uses 1.0.7.
> 
> If you're worried about the time needed to convert the actual 2.7 doc
> to new sphinx format and all the related changes, I volunteer to do
> the job (and/or collaborate with whom is already on it), but what I
> want to understand if it's an acceptable change.
> 
> I see sphinx more as an internal build tool, so freezing it
> is like saying "don't upgrade gcc" or so. Now the delta is just the
> C function definitions and some py-specific roles, but over the
> years it will increase. Keeping it small, simplifying the forward-port
> of doc patches (not needing to have 2 versions between 2.7 and 3.x,
> for example) and having a common set of tools for doc building is worthwhile IMHO.
> 
> What do you think about it? and yes Georg, I'd like to hear your opinion too :)

One of the main reasons for keeping Sphinx compatibility to 0.6.x was to
enable distributions (like Debian) to build the docs for the Python they ship
with the version of Sphinx that they ship.

This should now be fine with 1.0.x, so since you are ready to do the work of
converting the 2.7 Doc sources, it will be accepted.  The argument of easier
backports is a very good one.

The issue of using svn to download the tools is orthogonal; for this I would
agree to just packaging up a tarball or zipfile that is then downloaded using a
small Python script (should be properly cross-platform then).  Cloning the
original repositories is a) not useful, b) depends on availability of at least
two additional servers (remember docutils) and c) requires hg and svn.
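
Something along these lines would do (a sketch; the URL and target
directory are placeholders, not real infrastructure):

    import io, urllib.request, zipfile

    TOOLS_URL = 'http://www.python.org/ftp/doc-tools.zip'  # placeholder
    payload = urllib.request.urlopen(TOOLS_URL).read()
    # unpack the pinned doc-building tools next to the doc sources
    zipfile.ZipFile(io.BytesIO(payload)).extractall('Doc/tools')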

Georg


From raymond.hettinger at gmail.com  Sat Aug 27 07:58:10 2011
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Fri, 26 Aug 2011 22:58:10 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E5869C2.2040008@udel.edu>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
Message-ID: <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>


On Aug 26, 2011, at 8:51 PM, Terry Reedy wrote:

> 
> 
> On 8/26/2011 8:42 PM, Guido van Rossum wrote:
>> On Fri, Aug 26, 2011 at 3:57 PM, Terry Reedy<tjreedy at udel.edu>  wrote:
> 
>>> My impression is that a UTF-16 implementation, to be properly called such,
>>> must do len and [] in terms of code points, which is why Python's narrow
>>> builds are called UCS-2 and not UTF-16.
>> 
>> I don't think anyone else has that impression. Please cite chapter and
>> verse if you really think this is important. IIUC, UCS-2 does not
>> allow surrogate pairs, whereas Python (and Java, and .NET, and
>> Windows) 16-bit strings all do support surrogate pairs. And they all
> 
> For that reason, I think UTF-16 is a better term than UCS-2 for narrow builds (whether or not the above impression is true).

I agree.  It's weird to call something UCS-2 if code points above 65535 are representable.
The naming convention for codecs is that the UTF prefix is used for lossless encodings that cover the entire range of Unicode.

"The first amendment to the original edition of the UCS defined UTF-16, an extension of UCS-2, to represent code points outside the BMP."

Raymond


From drsalists at gmail.com  Sat Aug 27 08:01:21 2011
From: drsalists at gmail.com (Dan Stromberg)
Date: Fri, 26 Aug 2011 23:01:21 -0700
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <4E5868D6.8090203@pearwood.info>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net>
	<4E5868D6.8090203@pearwood.info>
Message-ID: <CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>

On Fri, Aug 26, 2011 at 8:47 PM, Steven D'Aprano <steve at pearwood.info>wrote:

> Antoine Pitrou wrote:
>
>> On Fri, 26 Aug 2011 17:25:56 -0700
>> Dan Stromberg <drsalists at gmail.com> wrote:
>>
>>> If you add regex as "import regex", and the new regex module doesn't work
>>> out, regex might be harder to get rid of.  from __future__ import is an
>>> established way of trying something for a while to see if it's going to
>>> work.
>>
>> That's an interesting idea. This way, integrating the new module would
>> be a less risky move, since if it gives us too many problems, we could
>> back out our decision in the next feature release.
>>
>
> I'm not sure that's correct. If there are differences in either the
> interface or the behaviour between the new regex and re, then reverting will
> be a pain regardless of whether you have:
>
> from __future__ import re
> re.compile(...)
>
> or
>
> import regex
> regex.compile(...)
>
>
> Either way, if the new regex library goes away, code will break, and fixing
> it may not be easy.


You're talking technically, which is important, but that wasn't what I was
suggesting would be helped.

Politically, and from a marketing standpoint, it's easier to withdraw a
feature you've given with a "Play with this, see if it works for you"
warning.

> Have there been any __future__ features that were added provisionally?

I can't either, but ISTR hearing that from __future__ import was started
with such an intent.  Irrespective, it's hard to import something from
"future" without at least suspecting that you're on the bleeding edge.

From martin at v.loewis.de  Sat Aug 27 08:02:31 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 27 Aug 2011 08:02:31 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>	<4E582432.2080301@v.loewis.de>	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>
	<CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>
Message-ID: <4E588877.3080204@v.loewis.de>

> I'm not sure it's worth doing an extensive review of the code; a better
> approach might be to require extensive test coverage  (and a review of
> tests).

I think it's worth it. It's really bad if only one developer fully
understands the regex implementation.

Regards,
Martin

From tjreedy at udel.edu  Sat Aug 27 08:25:17 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 27 Aug 2011 02:25:17 -0400
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <20110827022331.0d99a22c@pitrou.net>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E58378E.2090609@canterbury.ac.nz>
	<20110827022331.0d99a22c@pitrou.net>
Message-ID: <j3a2li$3ie$1@dough.gmane.org>

On 8/26/2011 8:23 PM, Antoine Pitrou wrote:

>> I would only agree as long as it wasn't too much worse
>> than O(1). O(log n) might be all right, but O(n) would be
>> unacceptable, I think.
>
> It also depends a lot on *actual* measured performance

Amen. Some regard O(n*n) sorts to be, by definition, 'worse' than 
O(n*logn). I even read that in an otherwise good book by a university 
professor. Fortunately for Python users, Tim Peters ignored that 
'wisdom', coded the best O(n*n) sort he could, and then *measured* to 
find out what was better for what types and lengths of arrays. So now we 
have a list.sort that sometimes beats the pure O(n log n) quicksort of C 
libraries.

-- 
Terry Jan Reedy


From martin at v.loewis.de  Sat Aug 27 08:31:44 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 27 Aug 2011 08:31:44 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>	<20110827020835.08a2a492@pitrou.net>	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>	<20110827035916.583c3d81@pitrou.net>	<4E5868D6.8090203@pearwood.info>
	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
Message-ID: <4E588F50.1040903@v.loewis.de>

> I can't either, but ISTR hearing that from __future__ import was started
> with such an intent. 

No, not at all. The original intention was to enable features that would
definitely be added, just not right now. Tim Peters always
objected to claims that future imports were talking about provisional
features.

> Politically, and from a marketing standpoint, it's easier to withdraw
> a feature you've given with a "Play with this, see if it works for
> you" warning.

We don't want to add features to Python that we may have to withdraw.
If there is doubt whether they should be added, they shouldn't be added.
If they do get added, we have to live with it (until, say, Python 4,
where bad features can be removed again).

Regards,
Martin

From tjreedy at udel.edu  Sat Aug 27 08:33:44 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 27 Aug 2011 02:33:44 -0400
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <20110827035602.557f772f@pitrou.net>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E582432.2080301@v.loewis.de>
	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>
	<CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>
	<20110827035602.557f772f@pitrou.net>
Message-ID: <j3a35d$3ie$2@dough.gmane.org>

On 8/26/2011 9:56 PM, Antoine Pitrou wrote:

> Another "interesting" question is whether it's easy to port to the PEP
> 393 string representation, if it gets accepted.

Will the re module need porting also?

-- 
Terry Jan Reedy


From martin at v.loewis.de  Sat Aug 27 09:18:14 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 27 Aug 2011 09:18:14 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <j3a35d$3ie$2@dough.gmane.org>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>	<4E582432.2080301@v.loewis.de>	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>	<CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>	<20110827035602.557f772f@pitrou.net>
	<j3a35d$3ie$2@dough.gmane.org>
Message-ID: <4E589A36.80109@v.loewis.de>

On 27.08.2011 08:33, Terry Reedy wrote:
> On 8/26/2011 9:56 PM, Antoine Pitrou wrote:
> 
>> Another "interesting" question is whether it's easy to port to the PEP
>> 393 string representation, if it gets accepted.
> 
> Will the re module need porting also?

That's a quality-of-implementation issue (in both cases). In principle,
the modules should continue to work unmodified, and indeed SRE does.
However, the module will then match on Py_UNICODE, which may be
expensive to produce, and may not meet your expectations of surrogate
pair handling.

So realistically, the module should be ported, which has the challenge
that matching needs to operate on three different representations. The
modules already support two representations (unsigned char and
Py_UNICODE), but they switch on type, not on state.

Regards,
Martin

From steve at pearwood.info  Sat Aug 27 09:40:24 2011
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 27 Aug 2011 17:40:24 +1000
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <j3a2li$3ie$1@dough.gmane.org>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>
	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>
	<4E53A950.30005@haypocalc.com>	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>	<4E58378E.2090609@canterbury.ac.nz>	<20110827022331.0d99a22c@pitrou.net>
	<j3a2li$3ie$1@dough.gmane.org>
Message-ID: <4E589F68.60301@pearwood.info>

Terry Reedy wrote:
> On 8/26/2011 8:23 PM, Antoine Pitrou wrote:
> 
>>> I would only agree as long as it wasn't too much worse
>>> than O(1). O(log n) might be all right, but O(n) would be
>>> unacceptable, I think.
>>
>> It also depends a lot on *actual* measured performance
> 
> Amen. Some regard O(n*n) sorts to be, by definition, 'worse' than 
> O(n*logn). I even read that in an otherwise good book by a university 
> professor. Fortunately for Python users, Tim Peters ignored that 
> 'wisdom', coded the best O(n*n) sort he could, and then *measured* to 
> find out what was better for what types and lengths of arrays. So not we 
> have a list.sort that sometimes beats the pure O(nlog) quicksort of C 
> libraries.

A nice story, but Quicksort's worst case is O(n*n) too.

http://en.wikipedia.org/wiki/Quicksort

timsort is O(n) in the best case (all items already in order).

You are right though about Tim Peters doing extensive measurements:

http://bugs.python.org/file4451/timsort.txt

If you haven't read the whole thing, do so. I am in awe -- not just 
because he came up with the algorithm, but because of the discipline Tim 
demonstrated in such detailed testing. A far cry from a couple of timeit 
runs on short-ish lists.
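
The best case is easy to see with a quick measurement (illustrative;
the absolute numbers are machine-dependent):

    import random, timeit

    ordered = list(range(100000))
    shuffled = ordered[:]
    random.shuffle(shuffled)
    # already-sorted input hits timsort's O(n) best case
    print(timeit.timeit(lambda: sorted(ordered), number=10))
    print(timeit.timeit(lambda: sorted(shuffled), number=10))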



-- 
Steven


From martin at v.loewis.de  Sat Aug 27 09:59:03 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 27 Aug 2011 09:59:03 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E589F68.60301@pearwood.info>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>	<20110823001440.433a0f1f@pitrou.net>	<4E536B0C.8050008@v.loewis.de>	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>	<4E537EEC.1070602@v.loewis.de>	<1314099542.3485.10.camel@localhost.localdomain>	<4E53945E.1050102@v.loewis.de>	<1314101745.3485.18.camel@localhost.localdomain>	<4E53A5D1.2040808@v.loewis.de>	<4E53A950.30005@haypocalc.com>	<j31hlc$dp5$2@dough.gmane.org>	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>	<j32igg$hd7$1@dough.gmane.org>	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>	<4E58378E.2090609@canterbury.ac.nz>	<20110827022331.0d99a22c@pitrou.net>	<j3a2li$3ie$1@dough.gmane.org>
	<4E589F68.60301@pearwood.info>
Message-ID: <4E58A3C7.6050301@v.loewis.de>

On 27.08.2011 09:40, Steven D'Aprano wrote:
> Terry Reedy wrote:
>> On 8/26/2011 8:23 PM, Antoine Pitrou wrote:
>>
>>>> I would only agree as long as it wasn't too much worse
>>>> than O(1). O(log n) might be all right, but O(n) would be
>>>> unacceptable, I think.
>>>
>>> It also depends a lot on *actual* measured performance
>>
>> Amen. Some regard O(n*n) sorts to be, by definition, 'worse' than
>> O(n*logn). I even read that in an otherwise good book by a university
>> professor. Fortunately for Python users, Tim Peters ignored that
>> 'wisdom', coded the best O(n*n) sort he could, and then *measured* to
>> find out what was better for what types and lengths of arrays. So now
>> we have a list.sort that sometimes beats the pure O(n log n) quicksort of
>> C libraries.
> 
> A nice story, but Quicksort's worst case is O(n*n) too.

In addition, timsort is O(n log n), which also makes it a real good
O(n*n) sort :-)

Regards,
Martin

From ncoghlan at gmail.com  Sat Aug 27 10:02:49 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 27 Aug 2011 18:02:49 +1000
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net>
	<4E5868D6.8090203@pearwood.info>
	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
Message-ID: <CADiSq7di9zsTRnyD8tqG_=i4nqdzF7E1kfXZM6s2yUKoQXKnZg@mail.gmail.com>

On Sat, Aug 27, 2011 at 4:01 PM, Dan Stromberg <drsalists at gmail.com> wrote:
> You're talking technically, which is important, but wasn't what I was
> suggesting would be helped.
>
> Politically, and from a marketing standpoint, it's easier to withdraw a
> feature you've given with a "Play with this, see if it works for you"
> warning.

The standard library isn't for playing. "pip install regex" is for
playing. If we aren't sure we want to make the transition, then it
doesn't go in.

However, to my mind, reviewing and incorporating regex is a far more
feasible model than trying to enhance the existing re module with a
comparable feature set. At the moment, there's already an obvious way
to get enhanced regex support in Python: install regex and use it
instead of the standard library's re module. That's enough to pretty
much kill any motivation anyone might have to make major changes to re
itself.

We're at least getting one thing right this time that we got wrong
with multiprocessing, though - we're much, much further out from the
3.3 release than we were from the 2.6 release when multiprocessing was
added to the standard library :)

The next step needed is for someone to volunteer to write and champion
a PEP that:
- articulates the deficiencies in the current re module (the regex
docs already cover some of this, as do Tom Christiansen's notes on the
issue tracker)
- explains why upgrading re in place is not feasible (e.g. noting that
the availability of regex really limits the desire for anyone to
reinvent that particular wheel, so even things that are theoretically
possible may be highly unlikely in practice)
- proposes a transition plan (personally, I'd be fine with an optparse
-> argparse style transition where re remains around indefinitely to
support legacy code, but new users are pointed towards regex. But
depending on compatibility details, merging the two APIs in the
existing re namespace may also be feasible)
- proposes a maintenance strategy (I don't know how much Matthew has
written regarding internal design details, but that kind of thing
could really help. Matthew agreeing to continue maintenance as part of
the standard library would also help a great deal, but wouldn't be
enough on its own - while it's good for modules to have active
maintainers to make the final call on associated design decisions, it's
potentially problematic when other core developers don't understand
what the code is doing well enough to fix bugs in it)
- confirms that the regex test suite can be incorporated cleanly into
the standard library regression test suite (the difficulty of this was
something that was underestimated for the inclusion of
multiprocessing. Test suite integration is also the final sticking
point holding up the PEP 380 'yield from' patch, although that's close
to being resolved following the PyConAU sprints)
- document tests conducted (e.g. micro-benchmark results, fusil results)

PEP 371 (addition of multiprocessing), PEP 389 (addition of argparse)
and Jesse's reflections on the way multiprocessing was added
(http://jessenoller.com/2009/01/28/multiprocessing-in-hindsight/) are
well worth reading for anyone considering stepping up to write a PEP.
That last also highlights why even Matthew's support, however capably
he has handled maintenance of regex as an independent project,
wouldn't be enough - we had Richard Oudkerk's support and agreement to
continue maintenance as the original author of multiprocessing, but he
became unavailable early in the integration process. If Jesse hadn't
been able to take up most of that slack, the likely result would have
been reversion of the changes and removal of multiprocessing from the
2.6 release.

Writing PEPs can be quite a frustrating experience (since a lot of
feedback will be negative as people try to poke holes in the idea to
see if it stands up to close scrutiny), but it's also really
satisfying and rewarding if they end up getting accepted and
incorporated :)

>> Have there been any __future__ features that were added provisionally?
>
> I can't either, but ISTR hearing that from __future__ import was started
> with such an intent.  Irrespective, it's hard to import something from
> "future" without at least suspecting that you're on the bleeding edge.

No, we make an explicit guarantee that future imports will never go
away once they've been added. They may become redundant, but they
won't break. There's no provision in the future mechanism for changes
that are added and then later removed (see
http://docs.python.org/dev/library/__future__).

They're strictly for cases where backwards incompatibilities (usually,
but not always, new keywords) may break existing code.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ask at celeryproject.org  Sat Aug 27 11:59:16 2011
From: ask at celeryproject.org (Ask Solem)
Date: Sat, 27 Aug 2011 10:59:16 +0100
Subject: [Python-Dev] issue 6721 "Locks in python standard library
	should be sanitized on fork"
In-Reply-To: <20110826175336.3af6be57@pitrou.net>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
	<CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
	<CACQrdOks88HRc-eNDdMWF8paLHDx4xaBd5YB9A1BrhFeUkvd4g@mail.gmail.com>
	<20110826175336.3af6be57@pitrou.net>
Message-ID: <A659EA9B-6AC4-414D-B2C5-262792C0ADA0@celeryproject.org>


On 26 Aug 2011, at 16:53, Antoine Pitrou wrote:

> 
> Hi,
> 
>> I think that "deprecating" the use of threads w/ multiprocessing - or
>> at least crippling it is the wrong answer. Multiprocessing needs the
>> helper threads it uses internally to manage queues, etc. Removing that
>> ability would require a near-total rewrite, which is just a
>> non-starter.
> 
> I agree that this wouldn't actually benefit anyone.
> Besides, I don't think it's even possible to avoid threads in
> multiprocessing, given the various constraints. We would have to force
> the user to run their main thread in an event loop, and that would be
> twisted (tm).
> 
>> I would focus on the atfork() patch more directly, ignoring
>> multiprocessing in the discussion, and focusing on the merits of gps'
>> initial proposal and patch.
> 
> I think this could also be combined with Charles-François' patch.
> 
> Regards



Have to agree with Jesse and Antoine here.

Celery (celeryproject.org) uses multiprocessing, is widely used in production,
and is regarded as stable software that has been known to run for months at a time,
restarted only for software upgrades.

I have been investigating an issue for some time that I'm pretty sure is caused
by this.  It occurs only rarely, so rarely that I have not had any actual bug reports
about it; it's just something I have experienced during extensive testing.
The tone of the discussion on the bug tracker makes me think that I have
been very lucky :-)

Using the fork+exec approach seems like a much more realistic solution
than rewriting multiprocessing.Pool and Manager to not use threads. In fact
this is something I have been considering as a fix for the suspected
issue for some time.
It does have implications that are annoying for sure, but we are already
used to this on the Windows platform (it could help portability even).
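
A bare-bones sketch of the fork+exec idea on POSIX (illustrative; the
helper and the command line are made up, not multiprocessing code):

    import os, sys

    def spawn_worker(module, func):
        pid = os.fork()
        if pid == 0:
            # child: exec a fresh interpreter, so no locks or helper
            # threads are inherited from the parent process
            os.execv(sys.executable,
                     [sys.executable, '-c',
                      'import {0}; {0}.{1}()'.format(module, func)])
        return pid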

-- 
Ask Solem
twitter.com/asksol | +44 (0)7713357179


From solipsis at pitrou.net  Sat Aug 27 12:09:29 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Aug 2011 12:09:29 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E582432.2080301@v.loewis.de>
	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>
	<CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>
	<20110827035602.557f772f@pitrou.net> <j3a35d$3ie$2@dough.gmane.org>
	<4E589A36.80109@v.loewis.de>
Message-ID: <20110827120929.2600c3e9@pitrou.net>

On Sat, 27 Aug 2011 09:18:14 +0200
"Martin v. L?wis" <martin at v.loewis.de> wrote:
> Am 27.08.2011 08:33, schrieb Terry Reedy:
> > On 8/26/2011 9:56 PM, Antoine Pitrou wrote:
> > 
> >> Another "interesting" question is whether it's easy to port to the PEP
> >> 393 string representation, if it gets accepted.
> > 
> > Will the re module need porting also?
> 
> That's a quality-of-implementation issue (in both cases). In principle,
> the modules should continue to work unmodified, and indeed SRE does.
> However, the module will then match on Py_UNICODE, which may be
> expensive to produce, and may not meet your expectations of surrogate
> pair handling.
> 
> So realistically, the module should be ported, which has the challenge
> that matching needs to operate on three different representations. The
> modules already support two representations (unsigned char and
> Py_UNICODE), but probably switching on type, not on state.

From what I've seen, re generates two different sets of functions at
compile-time (with a stringlib-like approach), while regex has a
run-time flag to choose between the two representations (where,
interestingly, the two code paths are explicitly spelled, almost
duplicates of each other).
Matthew, please correct me if I'm wrong.

Regards

Antoine.



From solipsis at pitrou.net  Sat Aug 27 12:10:12 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Aug 2011 12:10:12 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E582432.2080301@v.loewis.de>
	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>
	<CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>
	<4E588877.3080204@v.loewis.de>
Message-ID: <20110827121012.37b39947@pitrou.net>

On Sat, 27 Aug 2011 08:02:31 +0200
"Martin v. L?wis" <martin at v.loewis.de> wrote:
> > I'm not sure it's worth doing an extensive review of the code, a better
> > approach might be to require extensive test coverage  (and a review of
> > tests).
> 
> I think it's worth it. It's really bad if only one developer fully
> understands the regex implementation.

Could such a review be the topic of an informational PEP?

Regards

Antoine.



From arigo at tunes.org  Sat Aug 27 12:45:05 2011
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 27 Aug 2011 12:45:05 +0200
Subject: [Python-Dev] Software Transactional Memory for Python
Message-ID: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>

Hi all,

About multithreading models: I recently made an observation which
might be obvious to some, but not to me, and as far as I know not to
most of us either.  I think that it's worth being pointed out :-)

http://mail.python.org/pipermail/pypy-dev/2011-August/008153.html


A bientôt,

Armin.

From exarkun at twistedmatrix.com  Sat Aug 27 13:11:29 2011
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Sat, 27 Aug 2011 11:11:29 -0000
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
Message-ID: <20110827111129.1808.139401277.divmod.xquotient.9@localhost.localdomain>

On 26 Aug, 09:45 pm, guido at python.org wrote:
>I just made a pass of all the Unicode-related bugs filed by Tom
>Christiansen, and found that in several, the response was "this is
>fixed in the regex module [by Matthew Barnett]". I started replying
>that I thought that we should fix the bugs in the re module (i.e.,
>really in _sre.c) but on second thought I wonder if maybe regex is
>mature enough to replace re in Python 3.3. It would mean that we won't
>fix any of these bugs in earlier Python versions, but I could live
>with that.
>
>However, I don't know much about regex -- how compatible is it, how
>fast is it (including extreme cases where the backtracking goes
>crazy), how bug-free is it, and so on. Plus, how much work would it be
>to actually incorporate it into CPython as a complete drop-in
>replacement of the re package (such that nobody needs to change their
>imports or the flags they pass to the re module).
>
>We'd also probably have to train some core developers to be familiar
>enough with the code to maintain and evolve it -- I assume we can't
>just volunteer Matthew to do so forever... :-)
>
>What's the alternative? Is adding the requested bug fixes and new
>features to _sre.c really that hard?

What about other Python implementations (ie, PEP 399)?  For this to be 
seriously considered, shouldn't there also be a pure Python 
implementation of the functionality?

Jean-Paul

From ncoghlan at gmail.com  Sat Aug 27 14:40:35 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 27 Aug 2011 22:40:35 +1000
Subject: [Python-Dev] Software Transactional Memory for Python
In-Reply-To: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
References: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
Message-ID: <CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>

On Sat, Aug 27, 2011 at 8:45 PM, Armin Rigo <arigo at tunes.org> wrote:
> Hi all,
>
> About multithreading models: I recently made an observation which
> might be obvious to some, but not to me, and as far as I know not to
> most of us either.  I think that it's worth being pointed out :-)
>
> http://mail.python.org/pipermail/pypy-dev/2011-August/008153.html

Having a context manager to say "don't release the GIL" for a bit
could actually be really nice (e.g. for implementing builtin-style
method semantics for data types written in Python).

However, two immediate questions come to mind:

1. How does the patch interact with C code that explicitly releases
the GIL? (e.g. IO commands inside a "with atomic:" block)
2. Whether or not Jython and IronPython could implement something like
that, since they're free threaded with fine-grained locks. If they
can't then I don't see how we could justify making it part of the
standard library.

Interesting idea, though :)

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From arigo at tunes.org  Sat Aug 27 15:08:36 2011
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 27 Aug 2011 15:08:36 +0200
Subject: [Python-Dev] Software Transactional Memory for Python
In-Reply-To: <CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>
References: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
	<CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>
Message-ID: <CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>

Hi Nick,

On Sat, Aug 27, 2011 at 2:40 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> 1. How does the patch interact with C code that explicitly releases
> the GIL? (e.g. IO commands inside a "with atomic:" block)

As implemented, any code in a "with atomic" is prevented from
explicitly releasing and reacquiring the GIL: the GIL remains acquired
until the end of the "with" block.  In other words
Py_BEGIN_ALLOW_THREADS has no effect in a "with" block.  This gives
semantics that, in a full multi-core STM world, would be implementable
by saying that if, in the middle of a transaction, you need to do I/O,
then from this point onwards the transaction is not allowed to abort
any more.  Such "inevitable" transactions are already supported e.g.
by RSTM, the C++ framework I used to prototype a C version
(https://bitbucket.org/arigo/arigo/raw/default/hack/stm/c ).
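
To make the intended semantics concrete, here is a minimal sketch that
emulates "with atomic" with an ordinary lock (the emulation is mine, not
the patch's: it only restrains threads that also use the context manager,
whereas the real patch stops all threads by simply keeping the GIL):

    import threading

    _global_lock = threading.RLock()   # stand-in for "keep the GIL held"

    class _Atomic:
        # Emulation only: the real "with atomic" would keep the GIL
        # acquired, so no other Python thread could run at all.
        def __enter__(self):
            _global_lock.acquire()
        def __exit__(self, *exc_info):
            _global_lock.release()

    atomic = _Atomic()

    counter = 0

    def add_one():
        global counter
        with atomic:
            # Under the patch, this read-modify-write could not be
            # interleaved with any other thread.
            counter += 1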

> 2. Whether or not Jython and IronPython could implement something like
> that, since they're free threaded with fine-grained locks. If they
> can't then I don't see how we could justify making it part of the
> standard library.

Yes, I can imagine some solutions.  I am no Jython or IronPython
expert, but let us assume that they have a way to check synchronously
for external events from time to time (i.e. if there is some
equivalent to sys.setcheckinterval()).  If they do, then all you need
is the right synchronization: the thread that wants to start a "with
atomic" has to wait until all other threads are paused in the external
check code.  (Again, like CPython's, this is not a properly multi-core
STM-ish solution, but it would give the right semantics.  (And if it
turns out that STM is successful in the future, Java will grow more
direct support for it <wink>))


A bientôt,

Armin.

From nadeem.vawda at gmail.com  Sat Aug 27 15:47:45 2011
From: nadeem.vawda at gmail.com (Nadeem Vawda)
Date: Sat, 27 Aug 2011 15:47:45 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
Message-ID: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>

Hello all,

I'd like to propose the addition of a new module in Python 3.3. The 'lzma'
module will provide support for compression and decompression using the LZMA
algorithm, and the .xz and .lzma file formats. The matter has already been
discussed on the tracker <http://bugs.python.org/issue6715>, where there seems
to be a consensus that this is a desirable feature. What are your thoughts?

The proposed module's API will be very similar to that of the bz2 module;
the only differences will be additional keyword arguments to some functions,
for specifying container formats and detailed compressor options.

The implementation will also be similar to bz2 - basic compressor and
decompressor classes written in C, with convenience functions and a file
interface implemented on top of those in Python.
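
To give a rough idea of the intended shape, here is a hypothetical usage
sketch modelled on bz2's API (the names and signatures below are
extrapolated from the description above, not final):

    import lzma

    data = b"example data" * 1000

    # One-shot convenience functions:
    blob = lzma.compress(data)
    assert lzma.decompress(blob) == data

    # Incremental compression:
    comp = lzma.LZMACompressor()
    blob = comp.compress(data) + comp.flush()

    # File interface; a keyword argument would select .xz vs .lzma:
    with lzma.LZMAFile("example.xz", "wb") as f:
        f.write(data)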

I've already done some work on the C parts of the module; I'll push that to my
sandbox <http://hg.python.org/sandbox/nvawda/> in the next day or two.

Cheers,
Nadeem

From rosslagerwall at gmail.com  Sat Aug 27 16:36:50 2011
From: rosslagerwall at gmail.com (Ross Lagerwall)
Date: Sat, 27 Aug 2011 16:36:50 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
Message-ID: <1314455810.11891.7.camel@hobo>

> I'd like to propose the addition of a new module in Python 3.3. The 'lzma'
> module will provide support for compression and decompression using the LZMA
> algorithm, and the .xz and .lzma file formats. The matter has already been
> discussed on the tracker <http://bugs.python.org/issue6715>, where there seems
> to be a consensus that this is a desirable feature. What are your thoughts?
> 
> The proposed module's API will be very similar to that of the bz2 module;
> the only differences will be additional keyword arguments to some functions,
> for specifying container formats and detailed compressor options.

+1 for adding and +1 for keeping a similar interface.

Cheers
Ross


From solipsis at pitrou.net  Sat Aug 27 16:47:17 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Aug 2011 16:47:17 +0200
Subject: [Python-Dev] Software Transactional Memory for Python
References: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
	<CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>
	<CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>
Message-ID: <20110827164717.15dbf64c@pitrou.net>

On Sat, 27 Aug 2011 15:08:36 +0200
Armin Rigo <arigo at tunes.org> wrote:
> Hi Nick,
> 
> On Sat, Aug 27, 2011 at 2:40 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > 1. How does the patch interact with C code that explicitly releases
> > the GIL? (e.g. IO commands inside a "with atomic:" block)
> 
> As implemented, any code in a "with atomic" is prevented from
> explicitly releasing and reacquiring the GIL: the GIL remains acquired
> until the end of the "with" block.  In other words
> Py_BEGIN_ALLOW_THREADS has no effect in a "with" block.

You then risk deadlocks. Say:
- thread A is inside a "with atomic" and calls a library function which
  tries to take lock L
- thread B has already taken lock L and is currently executing an I/O
  function with GIL released
- thread B then waits for the GIL (and hence depends on thread A going
  forward), while thread A waits for lock L (and hence depends on
  thread B going forward)

Lock L could simply be the lock used by the file object  (a
Buffered{Reader,Writer,Random}) which thread B is reading or writing
from.
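
A self-contained sketch of that interleaving, with an ordinary lock
standing in for the GIL (daemon threads and join timeouts keep the
script from hanging forever):

    import threading, time

    gil = threading.Lock()   # stands in for the GIL
    L = threading.Lock()     # e.g. a BufferedWriter's internal lock

    def thread_a():
        with gil:            # "with atomic": the GIL stays held...
            with L:          # ...while A waits for the lock B holds
                pass

    def thread_b():
        with L:              # B holds the file lock,
            time.sleep(0.2)  # does its I/O with the GIL released,
            with gil:        # then must reacquire the GIL to go on
                pass

    a = threading.Thread(target=thread_a)
    b = threading.Thread(target=thread_b)
    a.daemon = b.daemon = True
    b.start(); time.sleep(0.1); a.start()
    a.join(timeout=2); b.join(timeout=2)
    if a.is_alive() and b.is_alive():
        print("deadlock: A holds the GIL and wants L; "
              "B holds L and wants the GIL")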

Regards

Antoine.



From martin at v.loewis.de  Sat Aug 27 16:50:02 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 27 Aug 2011 16:50:02 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
Message-ID: <4E59041A.7040100@v.loewis.de>

> The implementation will also be similar to bz2 - basic compressor and
> decompressor classes written in C, with convenience functions and a file
> interface implemented on top of those in Python.

When I reviewed lzma, I found that this approach might not be
appropriate. lzma has many more options and aspects that allow tuning
and selection, and a Python LZMA library should provide the same feature
set as the underlying C library.

So I would propose that a very thin C layer is created around the C
library that focuses on the actual algorithms, and that any higher
layers (in particular file formats) are done in Python.

Regards,
Martin

From nadeem.vawda at gmail.com  Sat Aug 27 16:59:21 2011
From: nadeem.vawda at gmail.com (Nadeem Vawda)
Date: Sat, 27 Aug 2011 16:59:21 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <4E59041A.7040100@v.loewis.de>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
Message-ID: <CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>

On Sat, Aug 27, 2011 at 4:50 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> The implementation will also be similar to bz2 - basic compressor and
>> decompressor classes written in C, with convenience functions and a file
>> interface implemented on top of those in Python.
>
> When I reviewed lzma, I found that this approach might not be
> appropriate. lzma has many more options and aspects that allow tuning
> and selection, and a Python LZMA library should provide the same feature
> set as the underlying C library.
>
> So I would propose that a very thin C layer is created around the C
> library that focuses on the actual algorithms, and that any higher
> layers (in particular file formats) are done in Python.

I probably shouldn't have used the word "basic" here - these classes expose all
the features of the underlying library. I was rather trying to underscore that
the rest of the module is implemented in terms of these two classes.

As for file formats, these are handled by liblzma itself; the extension module
just selects which compressor/decompressor initializer function to use depending
on the value of the "format" argument. Our code won't contain anything along the
lines of GzipFile; all of that work is done by the underlying C library. Rather,
the LZMAFile class will be like BZ2File - just a simple filter that passes the
read/written data through a LZMACompressor or LZMADecompressor as appropriate.

Cheers,
Nadeem

From martin at v.loewis.de  Sat Aug 27 17:15:09 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 27 Aug 2011 17:15:09 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
Message-ID: <4E5909FD.7060809@v.loewis.de>

> As for file formats, these are handled by liblzma itself; the extension module
> just selects which compressor/decompressor initializer function to use depending
> on the value of the "format" argument. Our code won't contain anything along the
> lines of GzipFile; all of that work is done by the underlying C library. Rather,
> the LZMAFile class will be like BZ2File - just a simple filter that passes the
> read/written data through a LZMACompressor or LZMADecompressor as appropriate.

This is exactly what I worry about. I think adding file I/O to bz2 was a
mistake, as this doesn't integrate with Python's IO library (it used
to, but after dropping stdio, they became incompatible). Indeed, for
Python 3.2, BZ2File has been removed from the C module and lifted to
Python.

IOW, the _lzma C module must not do any I/O, either directly or
indirectly (through liblzma). The approach of gzip.py (doing IO
and file formats in pure Python) is exactly right.

Regards,
Martin

From ncoghlan at gmail.com  Sat Aug 27 17:36:50 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Aug 2011 01:36:50 +1000
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <4E5909FD.7060809@v.loewis.de>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
Message-ID: <CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>

On Sun, Aug 28, 2011 at 1:15 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> This is exactly what I worry about. I think adding file I/O to bz2 was a
> mistake, as this doesn't integrate with Python's IO library (it used
> to, but after dropping stdio, they became incompatible). Indeed, for
> Python 3.2, BZ2File has been removed from the C module and lifted to
> Python.
>
> IOW, the _lzma C module must not do any I/O, either directly or
> indirectly (through liblzma). The approach of gzip.py (doing IO
> and file formats in pure Python) is exactly right.

PEP 399 also comes into play - we need a pure Python version for PyPy
et al (or a plausible story for why an exception should be granted).

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From nadeem.vawda at gmail.com  Sat Aug 27 17:37:52 2011
From: nadeem.vawda at gmail.com (Nadeem Vawda)
Date: Sat, 27 Aug 2011 17:37:52 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <4E5909FD.7060809@v.loewis.de>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
Message-ID: <CANF4RMmX=0jbk2xaJe=zt_vhXSBMZUg-JB5YWv7D-FbNYv5Ynw@mail.gmail.com>

On Sat, Aug 27, 2011 at 5:15 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> As for file formats, these are handled by liblzma itself; the extension module
>> just selects which compressor/decompressor initializer function to use depending
>> on the value of the "format" argument. Our code won't contain anything along the
>> lines of GzipFile; all of that work is done by the underlying C library. Rather,
>> the LZMAFile class will be like BZ2File - just a simple filter that passes the
>> read/written data through a LZMACompressor or LZMADecompressor as appropriate.
>
> This is exactly what I worry about. I think adding file I/O to bz2 was a
> mistake, as this doesn't integrate with Python's IO library (it used
> to, but after dropping stdio, they became incompatible). Indeed, for
> Python 3.2, BZ2File has been removed from the C module and lifted to
> Python.
>
> IOW, the _lzma C module must not do any I/O, either directly or
> indirectly (through liblzma). The approach of gzip.py (doing IO
> and file formats in pure Python) is exactly right.

It is not my intention for the _lzma C module to do I/O - that will be done by
the LZMAFile class, which will be written in Python. My comparison with bz2 was
in reference to the state of the module after it was rewritten for issue 5863.

Saying "anything along the lines of GzipFile" was a bad choice of wording; what
I meant is that the LZMAFile class won't handle the problem of picking apart the
.xz and .lzma container formats. That is handled by liblzma (operating entirely
on in-memory buffers). LZMAFile itself will do _only_ I/O, in a similar
fashion to the BZ2File class (as of changeset 2cb07a46f4b5, to avoid
ambiguity ;) ).
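
Stripped to its essence, the write path is just this pattern (a simplified
sketch with made-up names; the real class will also handle reading,
seeking and error cases):

    class _CompressedWriter:
        # The file object is a thin filter: everything written is pushed
        # through a compressor object before it reaches the real file.
        def __init__(self, fileobj, compressor):
            self._fp = fileobj        # an ordinary binary file object
            self._comp = compressor   # e.g. an LZMACompressor instance

        def write(self, data):
            self._fp.write(self._comp.compress(data))

        def close(self):
            self._fp.write(self._comp.flush())   # drain pending output
            self._fp.close()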

Cheers,
Nadeem

From martin at v.loewis.de  Sat Aug 27 17:42:50 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 27 Aug 2011 17:42:50 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CANF4RMmX=0jbk2xaJe=zt_vhXSBMZUg-JB5YWv7D-FbNYv5Ynw@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<4E59041A.7040100@v.loewis.de>	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>	<4E5909FD.7060809@v.loewis.de>
	<CANF4RMmX=0jbk2xaJe=zt_vhXSBMZUg-JB5YWv7D-FbNYv5Ynw@mail.gmail.com>
Message-ID: <4E59107A.4010001@v.loewis.de>

> It is not my intention for the _lzma C module to do I/O - that will be done by
> the LZMAFile class, which will be written in Python. My comparison with bz2 was
> in reference to the state of the module after it was rewritten for issue 5863.

Ok. I'll defer my judgement then until actual code is to review.

Not sure whether you already have this: supporting the tarfile module
would be nice.

Regards,
Martin

From solipsis at pitrou.net  Sat Aug 27 17:40:57 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Aug 2011 17:40:57 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
Message-ID: <20110827174057.6c4b619e@pitrou.net>

On Sun, 28 Aug 2011 01:36:50 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Sun, Aug 28, 2011 at 1:15 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > This is exactly what I worry about. I think adding file I/O to bz2 was a
> > mistake, as this doesn't integrate with Python's IO library (it used
> > to, but after dropping stdio, they became incompatible). Indeed, for
> > Python 3.2, BZ2File has been removed from the C module and lifted to
> > Python.
> >
> > IOW, the _lzma C module must not do any I/O, either directly or
> > indirectly (through liblzma). The approach of gzip.py (doing IO
> > and file formats in pure Python) is exactly right.
> 
> PEP 399 also comes into play - we need a pure Python version for PyPy
> et al (or a plausible story for why an exception should be granted).

The plausible story being that we basically wrap an existing library?
I don't think PyPy et al have pure Python versions of the zlib or
OpenSSL, do they?

If we start taking PEP 399 conformance to such levels, we might as well
stop developing CPython.

cheers

Antoine.



From ncoghlan at gmail.com  Sat Aug 27 17:52:51 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Aug 2011 01:52:51 +1000
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <20110827174057.6c4b619e@pitrou.net>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
Message-ID: <CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>

On Sun, Aug 28, 2011 at 1:40 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Sun, 28 Aug 2011 01:36:50 +1000
> Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On Sun, Aug 28, 2011 at 1:15 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> > This is exactly what I worry about. I think adding file I/O to bz2 was a
>> > mistake, as this doesn't integrate with Python's IO library (it used
>> > to, but after dropping stdio, they became incompatible). Indeed, for
>> > Python 3.2, BZ2File has been removed from the C module and lifted to
>> > Python.
>> >
>> > IOW, the _lzma C module must not do any I/O, either directly or
>> > indirectly (through liblzma). The approach of gzip.py (doing IO
>> > and file formats in pure Python) is exactly right.
>>
>> PEP 399 also comes into play - we need a pure Python version for PyPy
>> et al (or a plausible story for why an exception should be granted).
>
> The plausible story being that we basically wrap an existing library?
> I don't think PyPy et al have pure Python versions of the zlib or
> OpenSSL, do they?
>
> If we start taking PEP 399 conformance to such levels, we might as well
> stop developing CPython.

It's acceptable for the Python version to use ctypes in the case of
wrapping an existing library, but the Python version should still
exist.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From nadeem.vawda at gmail.com  Sat Aug 27 17:58:11 2011
From: nadeem.vawda at gmail.com (Nadeem Vawda)
Date: Sat, 27 Aug 2011 17:58:11 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
Message-ID: <CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>

On Sat, Aug 27, 2011 at 5:42 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Not sure whether you already have this: supporting the tarfile module
> would be nice.

Yes, got that - issue 5689. Also of interest is issue 5411 - adding .xz
support to distutils. But I think that these are separate projects that
should wait until the lzma module is finalized.

On Sat, Aug 27, 2011 at 5:40 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Sun, 28 Aug 2011 01:36:50 +1000
> Nick Coghlan <ncoghlan at gmail.com> wrote:
>> PEP 399 also comes into play - we need a pure Python version for PyPy
>> et al (or a plausible story for why an exception should be granted).
>
> The plausible story being that we basically wrap an existing library?
> I don't think PyPy et al have pure Python versions of the zlib or
> OpenSSL, do they?
>
> If we start taking PEP 399 conformance to such levels, we might as well
> stop developing CPython.

Indeed, PEP 399 specifically notes that exemptions can be granted for
modules that wrap external C libraries.

On Sat, Aug 27, 2011 at 5:52 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> It's acceptable for the Python version to use ctypes in the case of
> wrapping an existing library, but the Python version should still
> exist.

I'm not too sure about that - PEP 399 explicitly says that using ctypes is
frowned upon, and doesn't mention anywhere that it should be used in this
sort of situation.

Cheers,
Nadeem

From ncoghlan at gmail.com  Sat Aug 27 18:04:39 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Aug 2011 02:04:39 +1000
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
Message-ID: <CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>

On Sun, Aug 28, 2011 at 1:58 AM, Nadeem Vawda <nadeem.vawda at gmail.com> wrote:
> On Sat, Aug 27, 2011 at 5:52 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> It's acceptable for the Python version to use ctypes in the case of
>> wrapping an existing library, but the Python version should still
>> exist.
>
> I'm not too sure about that - PEP 399 explicitly says that using ctypes is
> frowned upon, and doesn't mention anywhere that it should be used in this
> sort of situation.

Note to self: do not comment on python-dev at 2 am, as one's ability
to read PEPs correctly apparently suffers :)

Consider my comment withdrawn, you're quite right that PEP 399
actually says this is precisely the case where an exemption is a
reasonable idea. Although I believe it's likely that PyPy will wrap it
with ctypes anyway :)

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From arigo at tunes.org  Sat Aug 27 18:14:10 2011
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 27 Aug 2011 18:14:10 +0200
Subject: [Python-Dev] Software Transactional Memory for Python
In-Reply-To: <CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>
References: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
	<CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>
	<CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>
Message-ID: <CAMSv6X3DRGuA+L-3yZ9Ozo_bj7QpTn+qfONrK2b=mCSJKA5jiQ@mail.gmail.com>

Hi Antoine,

> You then risk deadlocks. Say:
> (...)

Yes, it is indeed not a solution that co-operates transparently and
deadlock-freely with regular locks.  You risk the same kind of
deadlocks as you would when using only locks.  The reason is similar
to threads that try to acquire two locks in succession.  In your
example:

> - thread A is inside a "with atomic" and calls a library function which
>   tries to take lock L

This is basically dangerous, because it corresponds to taking lock
"GIL" and lock L, in that order, whereas the thread B takes lock L and
plays around with lock "GIL" in the opposite order.  I think a
reasonable solution to avoid deadlocks is simply not to use explicit
locks inside "with atomic" blocks.

Generally speaking, it can be regarded as wrong to do any action that
causes an unbounded wait in a "with atomic" block, but the solution I
chose to implement in my patch is to still allow such actions, because
it doesn't make much sense to say that "print" or "pdb.set_trace()" are
forbidden.


A bientôt,

Armin.

From guido at python.org  Sat Aug 27 18:19:31 2011
From: guido at python.org (Guido van Rossum)
Date: Sat, 27 Aug 2011 09:19:31 -0700
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net> <4E5868D6.8090203@pearwood.info>
	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
Message-ID: <CAP7+vJKGmOXU2eV7qZwbT2CzL04mihNnwDuuHso-AYokFzLdOQ@mail.gmail.com>

On Fri, Aug 26, 2011 at 11:01 PM, Dan Stromberg <drsalists at gmail.com> wrote:
[Steven]
>> Have there been any __future__ features that were added provisionally?
>
> I can't either, but ISTR hearing that from __future__ import was started
> with such an intent.  Irrespective, it's hard to import something from
> "future" without at least suspecting that you're on the bleeding edge.

No, this was not the intent of __future__. The intent is that a
feature is desirable but also backwards incompatible (e.g. introduces
a new keyword) so that for 1 (sometimes more) releases we require the
users to use the __future__ import.

There was never any intent to use __future__ for experimental
features. If we want that maybe we could have from __experimental__
import <whatever>.

-- 
--Guido van Rossum (python.org/~guido)

From drsalists at gmail.com  Sat Aug 27 18:48:16 2011
From: drsalists at gmail.com (Dan Stromberg)
Date: Sat, 27 Aug 2011 09:48:16 -0700
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAP7+vJKGmOXU2eV7qZwbT2CzL04mihNnwDuuHso-AYokFzLdOQ@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net>
	<4E5868D6.8090203@pearwood.info>
	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
	<CAP7+vJKGmOXU2eV7qZwbT2CzL04mihNnwDuuHso-AYokFzLdOQ@mail.gmail.com>
Message-ID: <CAGGBd_rs5ugc-o_XqQnPk_8TOBYFFjy29XKY7DUfpd5OmbX1Lg@mail.gmail.com>

On Sat, Aug 27, 2011 at 9:19 AM, Guido van Rossum <guido at python.org> wrote:

> On Fri, Aug 26, 2011 at 11:01 PM, Dan Stromberg <drsalists at gmail.com>
> wrote:
> [Steven]
> >> Have there been any __future__ features that were added provisionally?
> >
> > I can't either, but ISTR hearing that from __future__ import was started
> > with such an intent.  Irrespective, it's hard to import something from
> > "future" without at least suspecting that you're on the bleeding edge.
>
> No, this was not the intent of __future__. The intent is that a
> feature is desirable but also backwards incompatible (e.g. introduces
> a new keyword) so that for 1 (sometimes more) releases we require the
> users to use the __future__ import.
>
> There was never any intent to use __future__ for experimental
> features. If we want that maybe we could have from __experimental__
> import <whatever>.
>
> OK.  So what -is- the purpose of from __future__ import?

From solipsis at pitrou.net  Sat Aug 27 18:50:40 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Aug 2011 18:50:40 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
Message-ID: <20110827185040.5cb3064a@pitrou.net>

On Sun, 28 Aug 2011 01:52:51 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> >
> > The plausible story being that we basically wrap an existing library?
> > I don't think PyPy et al have pure Python versions of the zlib or
> > OpenSSL, do they?
> >
> > If we start taking PEP 399 conformance to such levels, we might as well
> > stop developing CPython.
> 
> It's acceptable for the Python version to use ctypes in the case of
> wrapping an existing library, but the Python version should still
> exist.

I think you're taking this too seriously. Our extension modules (_bz2,
_ssl...) are *already* optional even on CPython. If the library or its
development headers are not available on the system, building these
extensions is simply skipped, and the test suite passes nonetheless.
The only libraries required for passing the tests are basically the
libc and zlib.

Regards

Antoine.

From brian.curtin at gmail.com  Sat Aug 27 18:53:13 2011
From: brian.curtin at gmail.com (Brian Curtin)
Date: Sat, 27 Aug 2011 11:53:13 -0500
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAGGBd_rs5ugc-o_XqQnPk_8TOBYFFjy29XKY7DUfpd5OmbX1Lg@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net>
	<4E5868D6.8090203@pearwood.info>
	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
	<CAP7+vJKGmOXU2eV7qZwbT2CzL04mihNnwDuuHso-AYokFzLdOQ@mail.gmail.com>
	<CAGGBd_rs5ugc-o_XqQnPk_8TOBYFFjy29XKY7DUfpd5OmbX1Lg@mail.gmail.com>
Message-ID: <CAD+XWwrkcH7HWYfxYGaJPu0ixvg_Jii=8=v+7q1XEj36jb0Tgw@mail.gmail.com>

On Sat, Aug 27, 2011 at 11:48, Dan Stromberg <drsalists at gmail.com> wrote:
>
> No, this was not the intent of __future__. The intent is that a
>> feature is desirable but also backwards incompatible (e.g. introduces
>> a new keyword) so that for 1 (sometimes more) releases we require the
>> users to use the __future__ import.
>>
>> There was never any intent to use __future__ for experimental
>> features. If we want that maybe we could have from __experimental__
>> import <whatever>.
>>
>> OK.  So what -is- the purpose of from __future__ import?
>

It's in the first paragraph.

From martin at v.loewis.de  Sat Aug 27 19:07:47 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 27 Aug 2011 19:07:47 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<4E59041A.7040100@v.loewis.de>	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>	<4E5909FD.7060809@v.loewis.de>	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
Message-ID: <4E592463.8020305@v.loewis.de>

>>> PEP 399 also comes into play - we need a pure Python version for PyPy
>>> et al (or a plausible story for why an exception should be granted).

No, we don't. We can grant an exception, which I'm very willing to do.
The PEP lists wrapping a specific C-based library as a plausible reason.

> It's acceptable for the Python version to use ctypes

Hmm. To me, *that's* unacceptable. In the specific case, having a
pure-Python implementation would be acceptable to me, but I'm skeptical
that anybody is willing to produce one.

Regards,
Martin

From neologix at free.fr  Sat Aug 27 19:11:18 2011
From: neologix at free.fr (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Sat, 27 Aug 2011 19:11:18 +0200
Subject: [Python-Dev] Software Transactional Memory for Python
In-Reply-To: <CAMSv6X3DRGuA+L-3yZ9Ozo_bj7QpTn+qfONrK2b=mCSJKA5jiQ@mail.gmail.com>
References: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
	<CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>
	<CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>
	<CAMSv6X3DRGuA+L-3yZ9Ozo_bj7QpTn+qfONrK2b=mCSJKA5jiQ@mail.gmail.com>
Message-ID: <CAH_1eM2Pj_AWrs_aKjgPsRtgvupM=FC9d7QMa9wZTwkG0dFe5Q@mail.gmail.com>

Hi Armin,

> This is basically dangerous, because it corresponds to taking lock
> "GIL" and lock L, in that order, whereas the thread B takes lock L and
> plays around with lock "GIL" in the opposite order.  I think a
> reasonable solution to avoid deadlocks is simply not to use explicit
> locks inside "with atomic" blocks.

The problem is that many locks are actually acquired implicitly.
For example, `print` to a buffered stream will acquire the fileobject's mutex.
Also, even if the code inside the "with atomic" block doesn't directly
or indirectly acquire a lock, there's still the possibility of
asynchronous code that acquires locks being executed in the middle of
this block: for example, signal handlers are run on behalf of the main
thread from the main eval loop and in certain other places, and the GC
might kick in at any time.

> Generally speaking, it can be regarded as wrong to do any action that
> causes an unbounded wait in a "with atomic" block,

Indeed.

cf

From martin at v.loewis.de  Sat Aug 27 19:11:58 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 27 Aug 2011 19:11:58 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <20110827121012.37b39947@pitrou.net>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>	<4E582432.2080301@v.loewis.de>	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>	<CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>	<4E588877.3080204@v.loewis.de>
	<20110827121012.37b39947@pitrou.net>
Message-ID: <4E59255E.6000905@v.loewis.de>

On 27.08.2011 12:10, Antoine Pitrou wrote:
> On Sat, 27 Aug 2011 08:02:31 +0200
> "Martin v. L?wis" <martin at v.loewis.de> wrote:
>>> I'm not sure it's worth doing an extensive review of the code, a better
>>> approach might be to require extensive test coverage  (and a review of
>>> tests).
>>
>> I think it's worth. It's really bad if only one developer fully
>> understands the regex implementation.
> 
> Could such a review be the topic of an informational PEP?

Well, the reviewer would also have to dive into the code details,
e.g. through Rietveld. Of course, referencing the Rietveld issue in
the PEP might be appropriate.

A PEP should IMO only cover end-user aspects of the new re module.
Code organization is typically not in the PEP. To give a specific
example: you mentioned that there is (near) code duplication in
MRAB's module. As a reviewer, I would discuss whether this can be
eliminated - but not in the PEP.

Regards,
Martin

From solipsis at pitrou.net  Sat Aug 27 19:36:01 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Aug 2011 19:36:01 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<20110827185040.5cb3064a@pitrou.net>
Message-ID: <20110827193601.60582ee5@pitrou.net>

On Sat, 27 Aug 2011 18:50:40 +0200
Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Sun, 28 Aug 2011 01:52:51 +1000
> Nick Coghlan <ncoghlan at gmail.com> wrote:
> > >
> > > The plausible story being that we basically wrap an existing library?
> > > I don't think PyPy et al have pure Python versions of the zlib or
> > > OpenSSL, do they?
> > >
> > > If we start taking PEP 399 conformance to such levels, we might as well
> > > stop developing CPython.
> > 
> > It's acceptable for the Python version to use ctypes in the case of
> > wrapping an existing library, but the Python version should still
> > exist.
> 
> I think you're taking this too seriously. Our extension modules (_bz2,
> _ssl...) are *already* optional even on CPython. If the library or its
> development headers are not available on the system, building these
> extensions is simply skipped, and the test suite passes nonetheless.
> The only libraries required for passing the tests are basically the
> libc and zlib.

...and, apparently, pyexpat...



From drsalists at gmail.com  Sat Aug 27 20:20:10 2011
From: drsalists at gmail.com (Dan Stromberg)
Date: Sat, 27 Aug 2011 11:20:10 -0700
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAD+XWwrkcH7HWYfxYGaJPu0ixvg_Jii=8=v+7q1XEj36jb0Tgw@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net>
	<4E5868D6.8090203@pearwood.info>
	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
	<CAP7+vJKGmOXU2eV7qZwbT2CzL04mihNnwDuuHso-AYokFzLdOQ@mail.gmail.com>
	<CAGGBd_rs5ugc-o_XqQnPk_8TOBYFFjy29XKY7DUfpd5OmbX1Lg@mail.gmail.com>
	<CAD+XWwrkcH7HWYfxYGaJPu0ixvg_Jii=8=v+7q1XEj36jb0Tgw@mail.gmail.com>
Message-ID: <CAGGBd_qFwnfdmbUvz+tXpFqQxXV2e14b-89ZvFMgVD0LpimR7A@mail.gmail.com>

On Sat, Aug 27, 2011 at 9:53 AM, Brian Curtin <brian.curtin at gmail.com>wrote:

> On Sat, Aug 27, 2011 at 11:48, Dan Stromberg <drsalists at gmail.com> wrote:
>>
>> No, this was not the intent of __future__. The intent is that a
>>> feature is desirable but also backwards incompatible (e.g. introduces
>>> a new keyword) so that for 1 (sometimes more) releases we require the
>>> users to use the __future__ import.
>>>
>>> There was never any intent to use __future__ for experimental
>>> features. If we want that maybe we could have from __experimental__
>>> import <whatever>.
>>>
>>> OK.  So what -is- the purpose of from __future__ import?
>>
>
> It's in the first paragraph.
>

I disagree.  The first paragraph says this has something to do with new
keywords.  It doesn't appear to say what we expect users to -do- with it.
Both are important.

Is it "You'd better try this, because it's going in eventually.  If you
don't try it out before it becomes default behavior, you have no right to
complain"?

And if people do complain, what are python-dev's options?

From hsoft at hardcoded.net  Sat Aug 27 20:33:12 2011
From: hsoft at hardcoded.net (Virgil Dupras)
Date: Sat, 27 Aug 2011 14:33:12 -0400
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAGGBd_qFwnfdmbUvz+tXpFqQxXV2e14b-89ZvFMgVD0LpimR7A@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net>
	<4E5868D6.8090203@pearwood.info>
	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
	<CAP7+vJKGmOXU2eV7qZwbT2CzL04mihNnwDuuHso-AYokFzLdOQ@mail.gmail.com>
	<CAGGBd_rs5ugc-o_XqQnPk_8TOBYFFjy29XKY7DUfpd5OmbX1Lg@mail.gmail.com>
	<CAD+XWwrkcH7HWYfxYGaJPu0ixvg_Jii=8=v+7q1XEj36jb0Tgw@mail.gmail.com>
	<CAGGBd_qFwnfdmbUvz+tXpFqQxXV2e14b-89ZvFMgVD0LpimR7A@mail.gmail.com>
Message-ID: <C7E76C6D-81EB-4999-9A4A-ADE80CE36E83@hardcoded.net>

On 2011-08-27, at 2:20 PM, Dan Stromberg wrote:

> 
> On Sat, Aug 27, 2011 at 9:53 AM, Brian Curtin <brian.curtin at gmail.com> wrote:
> On Sat, Aug 27, 2011 at 11:48, Dan Stromberg <drsalists at gmail.com> wrote:
> No, this was not the intent of __future__. The intent is that a
> feature is desirable but also backwards incompatible (e.g. introduces
> a new keyword) so that for 1 (sometimes more) releases we require the
> users to use the __future__ import.
> 
> There was never any intent to use __future__ for experimental
> features. If we want that maybe we could have from __experimental__
> import <whatever>.
> 
> OK.  So what -is- the purpose of from __future__ import?
> 
> It's in the first paragraph. 
> 
> I disagree.  The first paragraph says this has something to do with new keywords.  It doesn't appear to say what we expect users to -do- with it.  Both are important.
> 
> Is it "You'd better try this, because it's going in eventually.  If you don't try it out before it becomes default behavior, you have no right to complain"?
> 
> And if people do complain, what are python-dev's options?
> 

__future__ imports have nothing to do with "trying stuff before it comes"; they have to do with backward compatibility. For example, "with_statement" was a __future__ import because introducing the "with" keyword would break any code using "with" as an identifier. I don't think that the goal of introducing "with" as a future import was "we're gonna see how it pans out, and decide if we really introduce it later".

__future__ means "It's coming, prepare your code".
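
For a concrete example: in Python 2.5 the statement below enabled the
"with" keyword one release before it became the default in 2.6:

    # Python 2.5: opt in to the forthcoming 'with' statement.
    from __future__ import with_statement

    with open("data.txt") as f:
        contents = f.read()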

From martin at v.loewis.de  Sat Aug 27 21:05:35 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 27 Aug 2011 21:05:35 +0200
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAGGBd_qFwnfdmbUvz+tXpFqQxXV2e14b-89ZvFMgVD0LpimR7A@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>	<20110827020835.08a2a492@pitrou.net>	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>	<20110827035916.583c3d81@pitrou.net>	<4E5868D6.8090203@pearwood.info>	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>	<CAP7+vJKGmOXU2eV7qZwbT2CzL04mihNnwDuuHso-AYokFzLdOQ@mail.gmail.com>	<CAGGBd_rs5ugc-o_XqQnPk_8TOBYFFjy29XKY7DUfpd5OmbX1Lg@mail.gmail.com>	<CAD+XWwrkcH7HWYfxYGaJPu0ixvg_Jii=8=v+7q1XEj36jb0Tgw@mail.gmail.com>
	<CAGGBd_qFwnfdmbUvz+tXpFqQxXV2e14b-89ZvFMgVD0LpimR7A@mail.gmail.com>
Message-ID: <4E593FFF.1030203@v.loewis.de>

> I disagree.  The first paragraph says this has something to do with new
> keywords.  It doesn't appear to say what we expect users to -do- with
> it.  Both are important.

Well, users can use the new features...

> Is it "You'd better try this, because it's going in eventually.  If you
> don't try it out before it becomes default behavior, you have no right
> to complain"?

No. It's "we have that feature which will be activated in a future
version. If you want to use it today, use the __future__ import. If
you don't want to use it (now or in the future), just don't."

> And if people do complain, what are python-dev's options?

That will depend on the complaint. If it's "I don't like the new
feature", then the obvious response is "don't use it, then".

Regards,
Martin

From tjreedy at udel.edu  Sat Aug 27 21:47:00 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 27 Aug 2011 15:47:00 -0400
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
Message-ID: <j3bhkr$s98$1@dough.gmane.org>

On 8/27/2011 9:47 AM, Nadeem Vawda wrote:

> I'd like to propose the addition of a new module in Python 3.3. The 'lzma'
> module will provide support for compression and decompression using the LZMA
> algorithm, and the .xz and .lzma file formats. The matter has already been
> discussed on the tracker<http://bugs.python.org/issue6715>, where there seems
> to be a consensus that this is a desirable feature. What are your thoughts?

As I read the discussion, the idea has been more or less accepted in 
principle. However, the current patch is not and needs changes.

> The proposed module's API will be very similar to that of the bz2 module;
> the only differences will be additional keyword arguments to some functions,
> for specifying container formats and detailed compressor options.

I believe Antoine suggested a PEP. It should summarize the salient 
points in the long tracker discussion into a coherent exposition and 
flesh out the details implied above. (Perhaps they are already in the 
proposed doc addition.)

> The implementation will also be similar to bz2 - basic compressor and
> decompressor classes written in C, with convenience functions and a file
> interface implemented on top of those in Python.

I would follow Martin's suggestions, including doing all i/o with the io 
module and the following:
"So I would propose that a very thin C layer is created around the C
library that focuses on the actual algorithms, and that any higher
layers (in particular file formats) are done in Python."

If we minimize the C code we add and maximize what is done in Python, 
that would maximize the ease of porting to other implementations. This 
would conform to the spirit of PEP 399.

-- 
Terry Jan Reedy


From steve at pearwood.info  Sat Aug 27 21:55:52 2011
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 28 Aug 2011 05:55:52 +1000
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAGGBd_qFwnfdmbUvz+tXpFqQxXV2e14b-89ZvFMgVD0LpimR7A@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>	<20110827020835.08a2a492@pitrou.net>	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>	<20110827035916.583c3d81@pitrou.net>	<4E5868D6.8090203@pearwood.info>	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>	<CAP7+vJKGmOXU2eV7qZwbT2CzL04mihNnwDuuHso-AYokFzLdOQ@mail.gmail.com>	<CAGGBd_rs5ugc-o_XqQnPk_8TOBYFFjy29XKY7DUfpd5OmbX1Lg@mail.gmail.com>	<CAD+XWwrkcH7HWYfxYGaJPu0ixvg_Jii=8=v+7q1XEj36jb0Tgw@mail.gmail.com>
	<CAGGBd_qFwnfdmbUvz+tXpFqQxXV2e14b-89ZvFMgVD0LpimR7A@mail.gmail.com>
Message-ID: <4E594BC8.4060802@pearwood.info>

Dan Stromberg wrote:
> On Sat, Aug 27, 2011 at 9:53 AM, Brian Curtin <brian.curtin at gmail.com>wrote:
> 
>> On Sat, Aug 27, 2011 at 11:48, Dan Stromberg <drsalists at gmail.com> wrote:
>>> No, this was not the intent of __future__. The intent is that a
>>>> feature is desirable but also backwards incompatible (e.g. introduces
>>>> a new keyword) so that for 1 (sometimes more) releases we require the
>>>> users to use the __future__ import.
>>>>
>>>> There was never any intent to use __future__ for experimental
>>>> features. If we want that maybe we could have from __experimental__
>>>> import <whatever>.
>>>>
>>>> OK.  So what -is- the purpose of from __future__ import?
>> It's in the first paragraph.
>>
> 
> I disagree.  The first paragraph says this has something to do with new
> keywords.  It doesn't appear to say what we expect users to -do- with it.
> Both are important.

Have you read the PEP? I found it very helpful.

http://www.python.org/dev/peps/pep-0236/

The motivation given in the first paragraph is pretty clear to me: 
__future__ is machinery added to Python to aid the transition when a 
backwards incompatible change is made.

Perhaps it needs a note stating explicitly that it is not for trying out 
new features which may or may not be added at a later date. That may 
help prevent confusion in the, er, future.


[...]
> And if people do complain, what are python-dev's options?

The PEP includes a question very similar to that:


   Q: Going back to the nested_scopes example, what if release 2.2
      comes along and I still haven't changed my code?  How can I keep
      the 2.1 behavior then?

   A: By continuing to use 2.1, and not moving to 2.2 until you do
      change your code.  The purpose of future_statement is to make
      life easier for people who keep current with the latest release
      in a timely fashion.  We don't hate you if you don't, but your
      problems are much harder to solve, and somebody with those
      problems will need to write a PEP addressing them.
      future_statement is aimed at a different audience.


To me, it's quite clear: once a feature change hits __future__, it is 
already part of the language. It may be an optional part for at least 
one release, but removing it again will require the same deprecation 
process as removing any other language feature (see PEP 5 for more details).



-- 
Steven


From digitalxero at gmail.com  Sat Aug 27 21:57:26 2011
From: digitalxero at gmail.com (Dj Gilcrease)
Date: Sat, 27 Aug 2011 15:57:26 -0400
Subject: [Python-Dev] Add from __experimental__ import bla [was: Should we
 move to replace re with regex?]
Message-ID: <CAMPUAFOFTjjxytsc=QJ0Eo6kV3KA1x917Kx+MC9tNeeCrDoy_Q@mail.gmail.com>

In the thread about replacing re with regex, someone mentioned adding
it to __future__, which isn't a great idea, as __future__ APIs are
already solidified; they just live there to give developers time to
adapt their code. The idea of an __experimental__ area is good for any
PEPs or stdlib additions that are somewhat controversial (the API isn't
agreed on, the code may take a while to integrate properly, the
developer wants some time to hash out any edge-case bugs or API
clarifications that may come up in large-scale testing, etc.).

__experimental__ should emit a warning on import that says anything in
here may change or be removed at any time and should not be used in
stable code.

__experimental__ features should behave the same as __future__ in that
they can add new keywords or semantics to the existing language.

__experimental__ features can move directly to the stdlib or builtins
if they do not add new keywords and/or are backwards compatible with
the feature they are replacing. Otherwise they move into __future__
for however many releases are deemed a reasonable time for developers
to adapt their code.
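
A minimal sketch of that import-time warning (entirely hypothetical, of
course -- no such package exists today):

    # Hypothetical __experimental__/__init__.py
    import warnings

    warnings.warn(
        "Anything in __experimental__ may change or be removed at any "
        "time and should not be used in stable code.",
        FutureWarning,
        stacklevel=2,
    )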

From drsalists at gmail.com  Sat Aug 27 21:58:39 2011
From: drsalists at gmail.com (Dan Stromberg)
Date: Sat, 27 Aug 2011 12:58:39 -0700
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
Message-ID: <CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>

On Sat, Aug 27, 2011 at 9:04 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Sun, Aug 28, 2011 at 1:58 AM, Nadeem Vawda <nadeem.vawda at gmail.com>
> wrote:
> > On Sat, Aug 27, 2011 at 5:52 PM, Nick Coghlan <ncoghlan at gmail.com>
> wrote:
> >> It's acceptable for the Python version to use ctypes in the case of
> >> wrapping an existing library, but the Python version should still
> >> exist.
> >
> > I'm not too sure about that - PEP 399 explicitly says that using ctypes
> is
> > frowned upon, and doesn't mention anywhere that it should be used in this
> > sort of situation.
>
> Note to self: do not comment on python-dev at 2 am, as one's ability
> to read PEPs correctly apparently suffers :)
>
> Consider my comment withdrawn, you're quite right that PEP 399
> actually says this is precisely the case where an exemption is a
> reasonable idea. Although I believe it's likely that PyPy will wrap it
> with ctypes anyway :)
>

I'd like to better understand why ctypes is (sometimes) frowned upon.

Is it the brittleness?  Tendency to segfault?

If yes, is there a way of making ctypes less brittle - say, by carefully
matching it against a specific version of a .so/.dll before starting to make
heavy use of said .so/.dll?

FWIW, I have a partial implementation of a module that does xz from Python
using ctypes.  It only does in-memory compression and decompression (not
stream compression or decompression to or from a file), because that was all
I needed for my current project, but it runs on CPython 2.x, CPython 3.x,
and PyPy.  I don't think it runs on Jython, but I've not looked at that
carefully - my code falls back on subprocess if ctypes doesn't appear to be
all there.

It's at http://stromberg.dnsalias.org/svn/xz_mod/trunk/xz_mod.py

From martin at v.loewis.de  Sat Aug 27 22:21:41 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 27 Aug 2011 22:21:41 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<4E59041A.7040100@v.loewis.de>	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>	<4E5909FD.7060809@v.loewis.de>	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>	<20110827174057.6c4b619e@pitrou.net>	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
Message-ID: <4E5951D5.5020200@v.loewis.de>

> I'd like to better understand why ctypes is (sometimes) frowned upon.
> 
> Is it the brittleness?  Tendency to segfault?

That, and Python should work completely if ctypes is not available.

> FWIW, I have a partial implementation of a module that does xz from
> Python using ctypes.

So does it work on Sparc/Solaris? On OpenBSD? On ARM-Linux? Does it
work if the xz library is installed into /opt/sfw/xz?

Regards,
Martin

From nadeem.vawda at gmail.com  Sat Aug 27 22:36:52 2011
From: nadeem.vawda at gmail.com (Nadeem Vawda)
Date: Sat, 27 Aug 2011 22:36:52 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
Message-ID: <CANF4RMmKLXbmE04NBi+s9n+-GtXhWS-1bv3TCDL8XzBL4NeVXQ@mail.gmail.com>

On Sat, Aug 27, 2011 at 9:47 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 8/27/2011 9:47 AM, Nadeem Vawda wrote:
>> I'd like to propose the addition of a new module in Python 3.3. The 'lzma'
>> module will provide support for compression and decompression using the
>> LZMA
>> algorithm, and the .xz and .lzma file formats. The matter has already been
>> discussed on the tracker<http://bugs.python.org/issue6715>, where there
>> seems
>> to be a consensus that this is a desirable feature. What are your
>> thoughts?
>
> As I read the discussion, the idea has been more or less accepted in
> principle. However, the current patch is not and needs changes.

Please note that the code I'm talking about is not the same as the patches by
Per Øyvind Karlsen that are attached to the tracker issue. I have been doing
a completely new implementation of the module, specifically to address the
concerns raised by Martin and Antoine.

(As for why I haven't posted my own changes yet - I'm currently an intern at
Google, and they want me to run my code by their open-source team before
releasing it into the wild. Sorry for the delay and the confusion.)


>> The proposed module's API will be very similar to that of the bz2 module;
>> the only differences will be additional keyword arguments to some
>> functions,
>> for specifying container formats and detailed compressor options.
>
> I believe Antoine suggested a PEP. It should summarize the salient points in
> the long tracker discussion into a coherent exposition and flesh out the
> details implied above. (Perhaps they are already in the proposed doc
> addition.)

I talked to Antoine about this on IRC; he didn't seem to think a PEP would be
necessary. But a summary of the discussion on the tracker issue might still
be a useful thing to have, given how long it's gotten.


>> The implementation will also be similar to bz2 - basic compressor and
>> decompressor classes written in C, with convenience functions and a file
>> interface implemented on top of those in Python.
>
> I would follow Martin's suggestions, including doing all i/o with the io
> module and the following:
> "So I would propose that a very thin C layer is created around the C
> library that focuses on the actual algorithms, and that any higher
> layers (in particular file formats) are done in Python."
>
> If we minimize the C code we add and maximize what is done in Python, that
> would maximize the ease of porting to other implementations. This would
> conform to the spirit of PEP 399.

As stated in my earlier response to Martin, I intend to do this. Aside from
I/O, though, there's not much that _can_ be done in Python - the rest is
basically just providing a thin wrapper for the C library.
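
(Roughly the shape being discussed: the file interface in pure Python,
layered over a thin C-level compressor type. A hedged sketch -- the
LZMACompressor name stands in for the assumed C extension type:)

    import io

    class LZMAFile(io.BufferedIOBase):
        """Write-only sketch; mirrors the bz2-style layering."""

        def __init__(self, filename, mode='wb'):
            self._fp = open(filename, mode)
            self._compressor = LZMACompressor()  # assumed C-level type

        def write(self, data):
            # Compression happens in the thin C wrapper; this layer
            # only handles the file I/O.
            self._fp.write(self._compressor.compress(data))
            return len(data)

        def close(self):
            self._fp.write(self._compressor.flush())
            self._fp.close()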


On Sat, Aug 27, 2011 at 9:58 PM, Dan Stromberg <drsalists at gmail.com> wrote:
> I'd like to better understand why ctypes is (sometimes) frowned upon.
>
> Is it the brittleness?  Tendency to segfault?

The problem (as I understand it) is that ABI changes in a library will
cause code that uses it via ctypes to break without warning. With an
extension module, you'll get a compile failure if you rely on things
that change in an incompatible way. With a ctypes wrapper, you just get
incorrect answers, or segfaults.
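
(A tiny demonstration of that failure mode, using libm rather than a
compression library -- an editorial sketch, not from the original mail:)

    from ctypes import CDLL, c_double
    from ctypes.util import find_library

    libm = CDLL(find_library('m'))
    # restype defaults to int, so the C double returned by sqrt() is
    # silently reinterpreted as an integer: no error, just garbage.
    print(libm.sqrt(c_double(4.0)))   # meaningless number

    libm.sqrt.restype = c_double
    print(libm.sqrt(c_double(4.0)))   # 2.0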


> If yes, is there a way of making ctypes less brittle - say, by
> carefully matching it against a specific version of a .so/.dll before
> starting to make heavy use of said .so/.dll?

This might be feasible for a specific application running in a controlled
environment, but it seems impractical for something as widely-used as the
stdlib. Having to include a whitelist of acceptable library versions would
be a substantial maintenance burden, and (compatible) new versions would
not work until the library whitelist gets updated.
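
(A sketch of what such a check might look like, assuming liblzma's
lzma_version_string() is used to identify the library; the accepted
versions are of course hypothetical:)

    import ctypes, ctypes.util

    ACCEPTED = set(['5.0.0', '5.0.3'])   # hypothetical whitelist

    lib = ctypes.CDLL(ctypes.util.find_library('lzma'))
    lib.lzma_version_string.restype = ctypes.c_char_p
    version = lib.lzma_version_string().decode('ascii')
    if version not in ACCEPTED:
        raise ImportError('untested liblzma version: ' + version)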


Cheers,
Nadeem

From drsalists at gmail.com  Sat Aug 27 22:41:07 2011
From: drsalists at gmail.com (Dan Stromberg)
Date: Sat, 27 Aug 2011 13:41:07 -0700
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <4E5951D5.5020200@v.loewis.de>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
Message-ID: <CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>

On Sat, Aug 27, 2011 at 1:21 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> > I'd like to better understand why ctypes is (sometimes) frowned upon.
> >
> > Is it the brittleness?  Tendency to segfault?
>
> That, and Python should work completely if ctypes is not available.
>

Which major platforms does ctypes not work on?

It seems like there should be some way of coming up with an xml file
describing the types of the various bits of data and formal arguments -
perhaps using gccxml or something like it.

> FWIW, I have a partial implementation of a module that does xz from
> > Python using ctypes.
>
> So does it work on Sparc/Solaris? On OpenBSD? On ARM-Linux? Does it
> work if the xz library is installed into /opt/sfw/xz?
>

So far, I've only tried it on a couple of Linuxes and Cygwin.  I intend to
try it on a large number of *ix variants in the future, including OS/X and
Haiku.  I doubt I'll test OpenBSD, but I'm likely to test on FreeBSD and
Dragonfly again.

With regard to /opt/sfw/xz, if ctypes.util.find_library(library) is smart
enough to look there, then yes, xz_mod should find libxz there.

On Cygwin, ctypes.util.find_library() wasn't smart enough to find a Cygwin
DLL, so I coded around that.  But it finds the library OK on the Linuxes
I've tried so far.
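
(The workaround is essentially this -- the Cygwin DLL name below is an
assumption for illustration, not necessarily the one xz_mod uses:)

    import ctypes.util

    def find_lzma():
        # find_library covers the Linux cases; Cygwin needs an
        # explicit DLL name as a fallback.
        name = ctypes.util.find_library('lzma')
        if name is None:
            name = 'cyglzma-5.dll'
        return name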

(This is part of a larger project, a backup program.  The backup program has
been tested on a large number of OS's, but I've not done another broad round
of testing yet since adding the ctypes+xz code)

From victor.stinner at haypocalc.com  Sat Aug 27 22:54:48 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sat, 27 Aug 2011 22:54:48 +0200
Subject: [Python-Dev] Add from __experimental__ import bla [was: Should
	we move to replace re with regex?]
In-Reply-To: <CAMPUAFOFTjjxytsc=QJ0Eo6kV3KA1x917Kx+MC9tNeeCrDoy_Q@mail.gmail.com>
References: <CAMPUAFOFTjjxytsc=QJ0Eo6kV3KA1x917Kx+MC9tNeeCrDoy_Q@mail.gmail.com>
Message-ID: <201108272254.48635.victor.stinner@haypocalc.com>

On Saturday 27 August 2011 at 21:57:26, Dj Gilcrease wrote:
> The idea of an __experimental__ area is good for any PEPs or
> stdlib additions that are somewhat controversial (the API isn't agreed on,
> code may take a while to integrate properly, the developer wants some time
> to hash out any edge-case bugs or API clarifications that may come up
> in large-scale testing, etc.).

__experimental__ already exists: it's the Python Package Index (PyPI)!

http://pypi.python.org/pypi

You can write Python extensions in C and distribute them on PyPI. I did
that when my patch to display the Python backtrace on a crash was "rejected" 
(not included in Python 3.2, just before the release). It was a great idea, 
because I had more time to change the API (read the history of the 
faulthandler module on PyPI: the API changed 5 times since the first public 
version on PyPI...) and the module is now available for Python 2.5 - 3.2, not 
only for Python 3.3.

Remember that the API of a module added to CPython is frozen. You will have to
wait something like 18 months, until the next CPython release, to change
anything (add a new function, remove an old/useless function, etc.).
Seriously, it's not a good idea to add a young module to Python before its
API is well defined and stable.

The Linux kernel has "staging" drivers. It's different because there is a new
release of the Linux kernel every two months (instead of 18 months for
CPython). The policy for the API is also different: the kernel has no stable
API, whereas the Python API cannot be changed in a bugfix release (x.y.Z).

http://www.kroah.com/log/linux/stable_api_nonsense.html
http://www.mjmwired.net/kernel/Documentation/stable_api_nonsense.txt

Victor


From exarkun at twistedmatrix.com  Sat Aug 27 23:02:23 2011
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Sat, 27 Aug 2011 21:02:23 -0000
Subject: [Python-Dev] Add from __experimental__ import bla [was: Should
	we	move to replace re with regex?]
In-Reply-To: <CAMPUAFOFTjjxytsc=QJ0Eo6kV3KA1x917Kx+MC9tNeeCrDoy_Q@mail.gmail.com>
References: <CAMPUAFOFTjjxytsc=QJ0Eo6kV3KA1x917Kx+MC9tNeeCrDoy_Q@mail.gmail.com>
Message-ID: <20110827210223.1808.46364677.divmod.xquotient.81@localhost.localdomain>

On 07:57 pm, digitalxero at gmail.com wrote:
>In the thread about replacing re with regex someone mentioned adding
>to __future__, which isn't a great idea as __future__ APIs are already
>solidified; they just live there to give developers time to adapt their
>code. The idea of an __experimental__ area is good for any PEPs or
>stdlib additions that are somewhat controversial (the API isn't agreed on,
>code may take a while to integrate properly, the developer wants some time
>to hash out any edge-case bugs or API clarifications that may come up
>in large-scale testing, etc.).
>
>__experimental__ should emit a warning on import that says anything in
>here may change or be removed at any time and should not be used in
>stable code.
>
>__experimental__ features should behave the same as __future__ in that
>they can add new keywords or semantics to the existing language.
>
>__experimental__ features can move directly to the stdlib or builtins
>if they do not add new keywords and/or are backwards compatible with
>the feature they are replacing. Otherwise they move into __future__
>for however many releases are deemed a reasonable time for developers
>to adapt their code.

Hi Dj,

As a developer of Python libraries and applications, I don't see how 
this would make my life easier.

A warning in a module docstring that a module may not be long-lived if 
it is not well received tells me just as much as a warning emitted at 
runtime.  And a warning emitted at runtime is likely to scare my users 
into thinking something is broken, leading to spurious or misleading bug 
reports.  There also does not appear to be general consensus that 
modules should be added to the stdlib if they are not widely used and 
demanded, so I don't know when a module would be added to 
__experimental__ in the first place.  The normal deprecation procedures 
(rarely used as they are) seem to cover this, anyway.

Adding a new namespace separate from __future__ also just gives me 
another thing to remember.  Was the feature added to __experimental__ or 
__future__?  Also, it seems even less common that language features are 
added on an experimental basis.  When a language feature (new syntax or 
semantics) goes in to the language, it is there for a long, long time.

If new features are added first to __experimental__ and then to 
__future__ or the non-__experimental__ stdlib namespace, then I just 
have to update all my code to keep using it.  So I'm guaranteed extra 
work whether the feature is successful and is adopted or if it fails and 
is later removed.  I'd rather not have to do the extra work in the 
success case, at least, which is what the existing 
add-it-and-then-maybe-(but-probably-not-)deprecate-it approach gives me.

Jean-Paul

From nadeem.vawda at gmail.com  Sat Aug 27 23:38:43 2011
From: nadeem.vawda at gmail.com (Nadeem Vawda)
Date: Sat, 27 Aug 2011 23:38:43 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
Message-ID: <CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>

On Sat, Aug 27, 2011 at 10:41 PM, Dan Stromberg <drsalists at gmail.com> wrote:
> It seems like there should be some way of coming up with an xml file
> describing the types of the various bits of data and formal arguments -
> perhaps using gccxml or something like it.

The problem is that you would need to do this check at runtime, every time
you load up the library - otherwise, what happens if the user upgrades
their installed copy of liblzma? And we can't expect users to have the
liblzma headers installed, so we'd have to try and figure out whether the
library was ABI-compatible from the shared object alone; I doubt that this
is even possible.

From drsalists at gmail.com  Sun Aug 28 00:14:15 2011
From: drsalists at gmail.com (Dan Stromberg)
Date: Sat, 27 Aug 2011 15:14:15 -0700
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
Message-ID: <CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>

On Sat, Aug 27, 2011 at 2:38 PM, Nadeem Vawda <nadeem.vawda at gmail.com> wrote:

> On Sat, Aug 27, 2011 at 10:41 PM, Dan Stromberg <drsalists at gmail.com>
> wrote:
> > It seems like there should be some way of coming up with an xml file
> > describing the types of the various bits of data and formal arguments -
> > perhaps using gccxml or something like it.
>
> The problem is that you would need to do this check at runtime, every time
> you load up the library - otherwise, what happens if the user upgrades
> their installed copy of liblzma? And we can't expect users to have the
> liblzma headers installed, so we'd have to try and figure out whether the
> library was ABI-compatible from the shared object alone; I doubt that this
> is even possible.
>

I was thinking about this as I was getting groceries a bit ago.

Why -can't- we expect the user to have liblzma headers installed?  Couldn't
it just be a dependency in the package management system?

BTW, gcc-xml seems to be only for C++ (?), but long ago, around the time
people were switching from K&R to ANSI C, there were programs like
"mkptypes" that could parse a .c/.h and output prototypes.  It seems we
could do something like this on module init.

IMO, we really, really need some common way of accessing C libraries that
works for all major Python variants.

From solipsis at pitrou.net  Sun Aug 28 00:26:42 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 28 Aug 2011 00:26:42 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
Message-ID: <20110828002642.4765fc89@pitrou.net>

On Sat, 27 Aug 2011 15:14:15 -0700
Dan Stromberg <drsalists at gmail.com> wrote:

> On Sat, Aug 27, 2011 at 2:38 PM, Nadeem Vawda <nadeem.vawda at gmail.com> wrote:
> 
> > On Sat, Aug 27, 2011 at 10:41 PM, Dan Stromberg <drsalists at gmail.com>
> > wrote:
> > > It seems like there should be some way of coming up with an xml file
> > > describing the types of the various bits of data and formal arguments -
> > > perhaps using gccxml or something like it.
> >
> > The problem is that you would need to do this check at runtime, every time
> > you load up the library - otherwise, what happens if the user upgrades
> > their installed copy of liblzma? And we can't expect users to have the
> > liblzma headers installed, so we'd have to try and figure out whether the
> > library was ABI-compatible from the shared object alone; I doubt that this
> > is even possible.
> >
> 
> I was thinking about this as I was getting groceries a bit ago.
> 
> Why -can't- we expect the user to have liblzma headers installed?  Couldn't
> it just be a dependency in the package management system?

Package managers, under Linux, often split development files (headers,
etc.) from runtime binaries.
Also, under Windows, most users don't have development stuff installed
at all.

Regards

Antoine.

From martin at v.loewis.de  Sun Aug 28 00:47:19 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 28 Aug 2011 00:47:19 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<4E59041A.7040100@v.loewis.de>	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>	<4E5909FD.7060809@v.loewis.de>	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>	<20110827174057.6c4b619e@pitrou.net>	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>	<4E5951D5.5020200@v.loewis.de>	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
Message-ID: <4E5973F7.30805@v.loewis.de>

> Why -can't- we expect the user to have liblzma headers installed? 
> Couldn't it just be a dependency in the package management system?

Please give it up. You just won't convince this list that ctypes
is a viable approach for the standard library.

Regards,
Martin

From drsalists at gmail.com  Sun Aug 28 01:19:01 2011
From: drsalists at gmail.com (Dan Stromberg)
Date: Sat, 27 Aug 2011 16:19:01 -0700
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <20110828002642.4765fc89@pitrou.net>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
Message-ID: <CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>

On Sat, Aug 27, 2011 at 3:26 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Sat, 27 Aug 2011 15:14:15 -0700
> Dan Stromberg <drsalists at gmail.com> wrote:
>
> > On Sat, Aug 27, 2011 at 2:38 PM, Nadeem Vawda <nadeem.vawda at gmail.com
> >wrote:
> >
> > > On Sat, Aug 27, 2011 at 10:41 PM, Dan Stromberg <drsalists at gmail.com>
> > > wrote:
> > > > It seems like there should be some way of coming up with an xml file
> > > > describing the types of the various bits of data and formal arguments
> -
> > > > perhaps using gccxml or something like it.
> > >
> > > The problem is that you would need to do this check at runtime, every
> time
> > > you load up the library - otherwise, what happens if the user upgrades
> > > their installed copy of liblzma? And we can't expect users to have the
> > > liblzma headers installed, so we'd have to try and figure out whether
> the
> > > library was ABI-compatible from the shared object alone; I doubt that
> this
> > > is even possible.
> > >
> >
> > I was thinking about this as I was getting groceries a bit ago.
> >
> > Why -can't- we expect the user to have liblzma headers installed?
>  Couldn't
> > it just be a dependency in the package management system?
>
> Package managers, under Linux, often split development files (headers,
> etc.) from runtime binaries.
>

Well, uhhhhh, yeah.  Not sure what your point is.
1) We could easily work with the dev / nondev distinction by taking a
dependency on the -dev version of whatever we need, instead of the nondev
version.
2) It's a rather arbitrary distinction that's being drawn between dev and
nondev today.  There's no particular reason why the line couldn't be drawn
somewhere else.


> Also, under Windows, most users don't have development stuff installed
> at all.
>
Yes...  But if the nature of "what development stuff is" were to change,
they'd have different stuff.

Also, we wouldn't have to parse the .h's every time a module is loaded - we
could have a timestamp file (or database) indicating when we last parsed a
given .h.
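
(A hedged sketch of that timestamp check -- the .ct cache file is
hypothetical:)

    import os

    def cache_is_fresh(header, cache):
        # Re-parse the header only when it is newer than the cached
        # prototype description.
        try:
            return os.path.getmtime(cache) >= os.path.getmtime(header)
        except OSError:
            return False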

Also, we could query the package management system for the version of lzma
that's currently installed on module init.

Also, we could include our own version of lzma.  Granted, this was a mess
when zlib needed to be patched, but even this one might be worth it for the
improved library unification across Python implementations.

From solipsis at pitrou.net  Sun Aug 28 01:27:05 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 28 Aug 2011 01:27:05 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
Message-ID: <20110828012705.523e51d4@pitrou.net>

On Sat, 27 Aug 2011 16:19:01 -0700
Dan Stromberg <drsalists at gmail.com> wrote:
> 2) It's a rather arbitrary distinction that's being drawn between dev and
> nondev today.  There's no particular reason why the line couldn't be drawn
> somewhere else.

Sure. Now please convince Linux distributions first, because this
particular subthread is going nowhere.

Regards

Antoine.

From greg.ewing at canterbury.ac.nz  Sun Aug 28 01:39:10 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 28 Aug 2011 11:39:10 +1200
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CADiSq7di9zsTRnyD8tqG_=i4nqdzF7E1kfXZM6s2yUKoQXKnZg@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net> <4E5868D6.8090203@pearwood.info>
	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
	<CADiSq7di9zsTRnyD8tqG_=i4nqdzF7E1kfXZM6s2yUKoQXKnZg@mail.gmail.com>
Message-ID: <4E59801E.7080406@canterbury.ac.nz>

Nick Coghlan wrote:

> The next step needed is for someone to volunteer to write and champion
> a PEP that:

Would it be feasible and desirable to modify regex so
that it *is* backwards-compatible with re, with a view
to making it a drop-in replacement at some point?

If not, the PEP should discuss this also.

-- 
Greg

From tjreedy at udel.edu  Sun Aug 28 02:48:02 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 27 Aug 2011 20:48:02 -0400
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <4E59801E.7080406@canterbury.ac.nz>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net>
	<4E5868D6.8090203@pearwood.info>
	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
	<CADiSq7di9zsTRnyD8tqG_=i4nqdzF7E1kfXZM6s2yUKoQXKnZg@mail.gmail.com>
	<4E59801E.7080406@canterbury.ac.nz>
Message-ID: <j3c399$vrd$1@dough.gmane.org>

On 8/27/2011 7:39 PM, Greg Ewing wrote:
> Nick Coghlan wrote:
>
>> The next step needed is for someone to volunteer to write and champion
>> a PEP that:
>
> Would it be feasible and desirable to modify regex so
> that it *is* backwards-compatible with re, with a view
> to making it a drop-in replacement at some point?
>
> If not, the PEP should discuss this also.

Many of the things regex does differently might be called either bug 
fixes or feature changes, depending on one's viewpoint. Regex should 
definitely not be 'bug-compatible'.

I think regex should be unicode-standard compliant as much as possible, 
and let the chips fall where they may. If so, it would be like the 
decimal module, which closely tracks the IEEE decimal standard, rather 
than the binary float standard. Regex is already much more compliant 
than re, as shown by Tom Christiansen. This is pretty obviously 
intentional on MB's part. It is also probably intentional that re *not* 
match today's Unicode TR18 specifications.

These are reasons why both Ezio and I suggested on the tracker adding 
regex without deleting re. (I personally would not mind just replacing 
re with regex, but then I have no legacy re code to break. So I am not 
suggesting that out of respect for those who do.)

-- 
Terry Jan Reedy


From drsalists at gmail.com  Sun Aug 28 03:28:20 2011
From: drsalists at gmail.com (Dan Stromberg)
Date: Sat, 27 Aug 2011 18:28:20 -0700
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <20110828012705.523e51d4@pitrou.net>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
Message-ID: <CAGGBd_rEqAiTxBKwW2K_Dn-027st3CfcowLHLmmL3=D68B8CVA@mail.gmail.com>

On Sat, Aug 27, 2011 at 4:27 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

>
> Sure. Now please convince Linux distributions first, because this
> particular subthread is going nowhere.
>

I hope you're not a solipsist.

Anyway, if the mere -discussion- of embracing a standard and safe way of
making C libraries callable from all the major Python implementations is
"going nowhere" before the discussion has even gotten started, I fear for
Python's future.

Repeat aloud to yourself: Python != CPython.  Python != CPython.  Python !=
CPython.

Has this topic been discussed to death?  If so, then say so.  It's rude to
try to kill the thread summarily before it gets started, sans discussion,
sans explanation, sans commentary on whether new additions to the topic have
surfaced or not.

From drsalists at gmail.com  Sun Aug 28 03:33:10 2011
From: drsalists at gmail.com (Dan Stromberg)
Date: Sat, 27 Aug 2011 18:33:10 -0700
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <20110828012705.523e51d4@pitrou.net>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
Message-ID: <CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>

On Sat, Aug 27, 2011 at 4:27 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Sat, 27 Aug 2011 16:19:01 -0700
> Dan Stromberg <drsalists at gmail.com> wrote:
> > 2) It's a rather arbitrary distinction that's being drawn between dev and
> > nondev today.  There's no particular reason why the line couldn't be
> drawn
> > somewhere else.
>
> Sure. Now please convince Linux distributions first, because this
> particular subthread is going nowhere.
>

Interesting.  You seem to want to throw an arbitrary barrier between Python,
the language, and accomplishing something important for said language.

Care to tell me why I'm wrong?  I'm all ears.

I'll note that you've deleted:

> 1) We could easily work with the dev / nondev distinction by
> taking a dependency on the -dev version of whatever we need,
> instead of the nondev version.

...which makes it more than apparent that we needn't convince Linux
distributors of #2, which you seem to prefer to focus on.

Why was it in your best interest to delete #1, without even commenting on
it?

From ezio.melotti at gmail.com  Sun Aug 28 05:19:15 2011
From: ezio.melotti at gmail.com (Ezio Melotti)
Date: Sun, 28 Aug 2011 06:19:15 +0300
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <j3c399$vrd$1@dough.gmane.org>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net>
	<4E5868D6.8090203@pearwood.info>
	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
	<CADiSq7di9zsTRnyD8tqG_=i4nqdzF7E1kfXZM6s2yUKoQXKnZg@mail.gmail.com>
	<4E59801E.7080406@canterbury.ac.nz> <j3c399$vrd$1@dough.gmane.org>
Message-ID: <CACBhJdFwT_MvxJmehsJmQG7GWRsmb6KQtQjZ2wwjLWv+cGCWVg@mail.gmail.com>

On Sun, Aug 28, 2011 at 3:48 AM, Terry Reedy <tjreedy at udel.edu> wrote:

>
> These are reasons why both Ezio and I suggested on the tracker adding regex
> without deleting re. (I personally would not mind just replacing re with
> regex, but then I have no legacy re code to break. So I am not suggesting
> that out of respect for those who do.)
>

I would actually prefer to replace re.

Before doing that we should make a list of all the differences between the
two modules (possibly in the PEP).  On the regex page on PyPI there's
already a list that can be used for this purpose [0].
For bug fixes it *shouldn't* be a problem if the behavior changes.  New
features shouldn't bring any backward-incompatible behavioral changes, and,
as far as I understand, Matthew introduced the NEW flag [1] to avoid
problems when they do.

I think re should be kept around only if there are too many
incompatibilities left and if they can't be fixed in regex.

Best Regards,
Ezio Melotti


[0]: http://pypi.python.org/pypi/regex/0.1.20110717
[1]: "The NEW flag turns on the new behaviour of this module, which can
differ from that of the 're' module, such as splitting on zero-width
matches, inline flags affecting only what follows, and being able to turn
inline flags off."
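
(For instance, splitting on a pattern that can match the empty string --
a sketch assuming the regex 0.1.x API quoted above; exact output may
vary by version. With re, the zero-width matches are ignored:)

    import re
    import regex

    re.split('x*', 'axbxc')                      # ['a', 'b', 'c']
    regex.split('x*', 'axbxc', flags=regex.NEW)  # ['', 'a', 'b', 'c', '']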

From guido at python.org  Sun Aug 28 05:54:13 2011
From: guido at python.org (Guido van Rossum)
Date: Sat, 27 Aug 2011 20:54:13 -0700
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <j3c399$vrd$1@dough.gmane.org>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net> <4E5868D6.8090203@pearwood.info>
	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
	<CADiSq7di9zsTRnyD8tqG_=i4nqdzF7E1kfXZM6s2yUKoQXKnZg@mail.gmail.com>
	<4E59801E.7080406@canterbury.ac.nz> <j3c399$vrd$1@dough.gmane.org>
Message-ID: <CAP7+vJL064xr4O88XXR1c8TVfQ2=ga0ZARyOqs4hpzg9OeUZ2w@mail.gmail.com>

On Sat, Aug 27, 2011 at 5:48 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> Many of the things regex does differently might be called either bug fixes
> or feature changes, depending on one's viewpoint. Regex should definitely
> not be 'bug-compatible'.

Well, as you said, it depends on one's viewpoint. If there's a bug in
the treatment of non-BMP character ranges, that's a bug, and fixing it
shouldn't break anybody's code (unless it was worth breaking :-). But
if there's a change that e.g. (hypothetical example) makes a different
choice about how empty matches are treated in some edge case, and the
old behavior was properly documented, that's a feature change, and I'd
rather introduce a flag to select the new behavior (or, if we have to,
a flag to preserve the old behavior, if the new behavior is really
considered much better and much more useful).

> I think regex should be unicode-standard compliant as much as possible, and
> let the chips fall where they may.

In most cases the Unicode improvements in regex are not where it is
incompatible; e.g. adding \X and named ranges are fine new additions
and IIUC the syntax was carefully designed not to introduce any
incompatibilities (within the limitations of \-escapes).

It's the many other "improvements" to the regex module that sometimes
make it incompatible. There's a comprehensive list here:
http://pypi.python.org/pypi/regex . Somebody should just go over it
and for each difference make a recommendation for whether to treat
this as a bugfix, a compatible new feature, or an incompatibility that
requires some kind of flag. (We could have a single flag for all
incompatibilities, or several flags.)

> If so, it would be like the decimal
> module, which closely tracks the IEEE decimal standard, rather than the
> binary float standard.

Well, I would hope that for each "major" Python version (i.e. 3.2,
3.3, 3.4, ...) we would pick a specific version of the Unicode
standard and declare our desire to be compliant with that Unicode
standard version, and not switch allegiances in some bugfix version
(e.g. 3.2.3, 3.3.1, ...).

> Regex is already much more compliant than re, as shown by Tom Christiansen.

Nobody disagrees with this or thinks it's a bad thing. :-)

> This is pretty obviously intentional on MB's part.

That's also clear.

> It is also probably intentional that re *not* match today's Unicode
> TR18 specifications.

That I'm not so sure of. I think it's more the case that TR18 evolved
and that the re modules didn't -- probably mostly because nobody had
the time and nobody was aware of the TR18 changes.

> These are reasons why both Ezio and I suggested on the tracker adding regex
> without deleting re. (I personally would not mind just replacing re with
> regex, but then I have no legacy re code to break. So I am not suggesting
> that out of respect for those who do.)

That option is definitely still on the table. At the very least a
thorough review of the stated differences between re and regex should
be done -- I trust that MR has been very thorough in his listing of
those differences. The issues regarding maintenance and stability of
MR's code can be solved in a number of ways -- if MR doesn't mind I
would certainly be willing to give him core committer access (though
I'd still recommend that he use his time primarily to train others in
maintaining this important code base).

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Sun Aug 28 05:57:21 2011
From: guido at python.org (Guido van Rossum)
Date: Sat, 27 Aug 2011 20:57:21 -0700
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
Message-ID: <CAP7+vJJs-hRQk8s3wwsM=DpO42ghE1Nq7cbO6Hq6+JLjVY8kJQ@mail.gmail.com>

On Sat, Aug 27, 2011 at 3:14 PM, Dan Stromberg <drsalists at gmail.com> wrote:
> IMO, we really, really need some common way of accessing C libraries that
> works for all major Python variants.

We have one. It's called writing an extension module.

ctypes is a crutch because it doesn't realistically have access to the
header files. It's a fine crutch for PyPy, which doesn't have much of
an alternative. It's also a fine crutch for people who need something
to run *now*. It's a horrible strategy for the standard library.

If you have a better proposal please do write it up. But so far you
are mostly exposing your ignorance and insisting dramatically that you
be educated.

-- 
--Guido van Rossum (python.org/~guido)

From ezio.melotti at gmail.com  Sun Aug 28 05:59:59 2011
From: ezio.melotti at gmail.com (Ezio Melotti)
Date: Sun, 28 Aug 2011 06:59:59 +0300
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <20110827035602.557f772f@pitrou.net>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E582432.2080301@v.loewis.de>
	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>
	<CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>
	<20110827035602.557f772f@pitrou.net>
Message-ID: <CACBhJdE+y4s2gGpZJzY_eBvqmYNBGApC8ORsYMHFO5w3EOwM8Q@mail.gmail.com>

On Sat, Aug 27, 2011 at 4:56 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Sat, 27 Aug 2011 04:37:21 +0300
> Ezio Melotti <ezio.melotti at gmail.com> wrote:
> >
> > I'm not sure it's worth doing an extensive review of the code, a better
> > approach might be to require extensive test coverage  (and a review of
> > tests).  If the code seems well written, commented, documented (I think
> > proper rst documentation is still missing),
>
> Isn't this precisely what a review is supposed to assess?
>

This can be done without actually knowing and understanding every single
function in the module (I got the impression that someone wants this kind of
review, correct me if I'm wrong).


>
> > We will get familiar with the code once we start contributing
> > to it and fixing bugs, as it already happens with most of the other
> modules.
>
> I'm not sure it's a good idea for a module with more than 10000 lines
> of C code (and 4000 lines of pure Python code). This is several times
> the size of multiprocessing. The C code looks very cleanly written, but
> it's still a big chunk of algorithmically sophisticated code.
>

Even unicodeobject.c is 10k+ lines of C code and I got familiar with (parts
of) it just by fixing bugs in specific functions.
I took a look at the regex code and it seems clear, with enough comments and
several small functions that are easy to follow and understand.
multiprocessing requires good knowledge of a number of concepts and
platform-specific issues that make it more difficult to understand and
maintain (but maybe regex-related concepts seem easier to me because I'm
already familiar with them).

I think it would be good to:
  1) have some document that explains the general design and main (internal)
functions of the module (e.g. a PEP);
  2) make a review on rietveld (possibly only of the diff with re, to limit
the review to the new code only), so that people can ask questions, discuss
and understand the code;
  3) possibly update the document/PEP with the outcome of the rietveld
review(s) and/or address the issues discussed (if any);
  4) add documentation for the module and the (public) functions in
Doc/library (this should be done anyway).

This will ensure that the general quality of the code is good, and when
someone actually has to work on the code, there's enough documentation to
make it possible.

Best Regards,
Ezio Melotti


>
> Another "interesting" question is whether it's easy to port to the PEP
> 393 string representation, if it gets accepted.
>
> Regards
>
> Antoine.
>
>

From guido at python.org  Sun Aug 28 06:28:17 2011
From: guido at python.org (Guido van Rossum)
Date: Sat, 27 Aug 2011 21:28:17 -0700
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CACBhJdE+y4s2gGpZJzY_eBvqmYNBGApC8ORsYMHFO5w3EOwM8Q@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E582432.2080301@v.loewis.de>
	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>
	<CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>
	<20110827035602.557f772f@pitrou.net>
	<CACBhJdE+y4s2gGpZJzY_eBvqmYNBGApC8ORsYMHFO5w3EOwM8Q@mail.gmail.com>
Message-ID: <CAP7+vJJ6mY2Qg0P91dozK-R73VzLdxzw9VdhA+nN+t8o5e+roQ@mail.gmail.com>

On Sat, Aug 27, 2011 at 8:59 PM, Ezio Melotti <ezio.melotti at gmail.com> wrote:
> On Sat, Aug 27, 2011 at 4:56 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>
>> On Sat, 27 Aug 2011 04:37:21 +0300
>> Ezio Melotti <ezio.melotti at gmail.com> wrote:
>> >
>> > I'm not sure it's worth doing an extensive review of the code, a better
>> > approach might be to require extensive test coverage (and a review of
>> > tests).  If the code seems well written, commented, documented (I think
>> > proper rst documentation is still missing),
>>
>> Isn't this precisely what a review is supposed to assess?
>
> This can be done without actually knowing and understanding every single
> function in the module (I got the impression that someone wants this kind of
> review, correct me if I'm wrong).

Wasn't me. I've long given up expecting to understand every line of
code in CPython. I'm happy if the code is written in a way that makes
it possible to read and understand it as the need arises.

>> > We will get familiar with the code once we start contributing
>> > to it and fixing bugs, as it already happens with most of the other
>> > modules.
>>
>> I'm not sure it's a good idea for a module with more than 10000 lines
>> of C code (and 4000 lines of pure Python code). This is several times
>> the size of multiprocessing. The C code looks very cleanly written, but
>> it's still a big chunk of algorithmically sophisticated code.
>
> Even unicodeobject.c is 10k+ lines of C code and I got familiar with (parts
> of) it just by fixing bugs in specific functions.
> I took a look at the regex code and it seems clear, with enough comments and
> several small functions that are easy to follow and understand.
> multiprocessing requires good knowledge of a number of concepts and
> platform-specific issues that make it more difficult to understand and
> maintain (but maybe regex-related concepts seem easier to me because I'm
> already familiar with them).

Are you volunteering? (Even if you don't want to be the only
maintainer, it still sounds like you'd be a good co-maintainer of the
regex module.)

> I think it would be good to:
>   1) have some document that explains the general design and main (internal)
> functions of the module (e.g. a PEP);

I don't think that such a document needs to be a PEP; PEPs are usually
intended where there is significant discussion expected, not just to
explain things. A README file or a Wiki page would be fine, as long as
it's sufficiently comprehensive.

>   2) make a review on rietveld (possibly only of the diff with re, to limit
> the review to the new code only), so that people can ask questions, discuss
> and understand the code;

That would be an interesting exercise indeed.

>   3) possibly update the document/PEP with the outcome of the rietveld
> review(s) and/or address the issues discussed (if any);

Yeah, of course.

>   4) add documentation for the module and the (public) functions in
> Doc/library (this should be done anyway).

Does regex have a significant public C interface? (_sre.c doesn't.)
Does it have a Python-level interface beyond what re.py offers (apart
from the obvious new flags and new regex syntax/semantics)?

> This will ensure that the general quality of the code is good, and when
> someone actually has to work on the code, there's enough documentation to
> make it possible.

That sounds like a good description of a process that could lead to
acceptance of regex as a re replacement.

>> Another "interesting" question is whether it's easy to port to the PEP
>> 393 string representation, if it gets accepted.

It's very likely that PEP 393 is accepted. So likely, in fact, that I
would recommend that you start porting regex to PEP 393 now. The
experience would benefit both your understanding of the regex module
and the quality of the PEP and its implementation.

I like what I hear here!

-- 
--Guido van Rossum (python.org/~guido)

From tjreedy at udel.edu  Sun Aug 28 06:58:35 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 28 Aug 2011 00:58:35 -0400
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
	support in 3.3)
In-Reply-To: <CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
Message-ID: <j3chv2$tvf$1@dough.gmane.org>

Dan, I once had more or less the same opinion/question as you with 
regard to ctypes, but I now see at least 3 problems.

1) It seems hard to write it correctly. There are currently 47 open 
ctypes issues, with 9 being feature requests, leaving 38 
behavior-related issues. Tom Heller has not been able to work on it 
since the beginning of 2010 and has formally withdrawn as maintainer. No 
one else that I know of has taken his place.

2) It is not trivial to use it correctly. I think it needs a SWIG-like 
companion script that can write at least first-pass ctypes code from the 
.h header files. Or maybe it could/should use header info at runtime 
(with the .h bundled with a module).

3) It seems to be slower than compiled C extension wrappers. That, at 
least, was the discovery of someone who re-wrote pygame using ctypes. 
(The hope was that using ctypes would aid porting to 3.x, but the time 
penalty was apparently too much for time-critical code.)

If you want to see more use of ctypes in the Python community (though 
not necessarily immediately in the stdlib), feel free to work on any one 
of these problems.

A fourth problem is that people capable of working on ctypes are also 
capable of writing C extensions, and most prefer that. Or some work on 
Cython, which is a third solution.

-- 
Terry Jan Reedy


From tjreedy at udel.edu  Sun Aug 28 07:27:39 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 28 Aug 2011 01:27:39 -0400
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAP7+vJL064xr4O88XXR1c8TVfQ2=ga0ZARyOqs4hpzg9OeUZ2w@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net>
	<4E5868D6.8090203@pearwood.info>
	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
	<CADiSq7di9zsTRnyD8tqG_=i4nqdzF7E1kfXZM6s2yUKoQXKnZg@mail.gmail.com>
	<4E59801E.7080406@canterbury.ac.nz> <j3c399$vrd$1@dough.gmane.org>
	<CAP7+vJL064xr4O88XXR1c8TVfQ2=ga0ZARyOqs4hpzg9OeUZ2w@mail.gmail.com>
Message-ID: <j3cjlh$5r8$1@dough.gmane.org>

On 8/27/2011 11:54 PM, Guido van Rossum wrote:

>> If so, it would be like the decimal
>> module, which closely tracks the IEEE decimal standard, rather than the
>> binary float standard.
>
> Well, I would hope that for each "major" Python version (i.e. 3.2,
> 3.3, 3.4, ...) we would pick a specific version of the Unicode
> standard and declare our desire to be compliant with that Unicode
> standard version, and not switch allegiances in some bugfix version
> (e.g. 3.2.3, 3.3.1, ...).

Definitely. The unicode version would have to be frozen with beta 1 if 
not before. (I am quite sure the decimal module also freezes the IEEE 
standard version *it* follows for each Python version.)

In my view, x.y is a version of the Python language while the x.y.z 
CPython releases are progressively better implementations of that one 
language, starting with x.y.0. This is the main reason I suggested that 
the first CPython release for the 3.3 language be called 3.3.0, as it 
now is. In this view, there is no question of an x.y.z+1 release 
changing the definition of the x.y language.

-- 
Terry Jan Reedy


From drsalists at gmail.com  Sun Aug 28 07:36:41 2011
From: drsalists at gmail.com (Dan Stromberg)
Date: Sat, 27 Aug 2011 22:36:41 -0700
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CAP7+vJJs-hRQk8s3wwsM=DpO42ghE1Nq7cbO6Hq6+JLjVY8kJQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<CAP7+vJJs-hRQk8s3wwsM=DpO42ghE1Nq7cbO6Hq6+JLjVY8kJQ@mail.gmail.com>
Message-ID: <CAGGBd_ovB_wm5yT7k4DGgaSUY7_rnRpuKEDPSJJu9GBhwjvezQ@mail.gmail.com>

On Sat, Aug 27, 2011 at 8:57 PM, Guido van Rossum <guido at python.org> wrote:

> On Sat, Aug 27, 2011 at 3:14 PM, Dan Stromberg <drsalists at gmail.com>
> wrote:
> > IMO, we really, really need some common way of accessing C libraries that
> > works for all major Python variants.
>
> We have one. It's called writing an extension module.
>

And yet Cext's are full of CPython-isms.

I've said in the past that Python has been lucky in that it had only a
single implementation for a long time, but still managed to escape becoming
too defined by the idiosyncrasies of that implementation - that's quite
impressive, and is probably our best indication that Python has had
leadership with foresight.  In the language proper, I'd say I still believe
this, but Cext's are sadly not a good example.


> ctypes is a crutch because it doesn't realistically have access to the
> header files.


Well, actually, header files are pretty easy to come by.  I bet you've
installed them yourself many times.  In fact, you've probably even
automatically brought some of them in via a package management system of one
form or another without getting your hands dirty.

As a thought experiment, imagine having a ctypes configuration system that
looks around a computer for .h's and .so's (etc) with even 25% of the effort
expended by GNU autoconf.  Instead of building the results into a bunch of
.o's, the results are saved in a .ct file or something.  If you build-in
some reasonable default locations to look in, plus the equivalent of some
-I's and -L's (and maybe -rpath's) as needed, you probably end up with a
pretty comparable system.

(typedef's might be a harder problem - that's particularly worth discussing,
IMO - your chance to nip this in the bud with a reasoned explanation why
they can't be handled well!)
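
A minimal sketch of the runtime half of that thought experiment (the
".ct" cache is the one proposed above; the file name, the helper and
its search logic are all hypothetical illustration, not an existing
tool):

    import ctypes, ctypes.util, json, os

    CACHE = os.path.expanduser("~/.libm.ct")   # hypothetical cache file

    def load_libm():
        # Reuse a previous "configure" run if we have one.
        if os.path.exists(CACHE):
            with open(CACHE) as f:
                path = json.load(f)["path"]
        else:
            # One-time scan of the default locations (the 25%-of-autoconf
            # part); equivalents of -I/-L/-rpath could be added here.
            path = ctypes.util.find_library("m")
            if path is None:
                raise OSError("libm not found")
            with open(CACHE, "w") as f:
                json.dump({"path": path}, f)
        return ctypes.CDLL(path)

    libm = load_libm()
    libm.cos.restype = ctypes.c_double
    libm.cos.argtypes = [ctypes.c_double]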

> It's a fine crutch for PyPy, which doesn't have much of
> an alternative.


Wait - a second ago I thought I was to believe that C extension modules were
the one true way of interfacing with C code across all major
implementations?  Are we perhaps saying that CPython is "the" major
implementation, and that we want it to stay that way?

I personally feel that PyPy has arrived as a major implementation.  The
backup program I've been writing in my spare time runs great on PyPy (and
the CPython's from 2.5.x, and pretty well on Jython).  And PyPy has been
maturing very rapidly ('just wish they'd do 3.x!).

> It's also a fine crutch for people who need something
> to run *now*. It's a horrible strategy for the standard library.
>

I guess I'm coming to see this as dogma.

If ctypes is augmented with type information and/or version information and
where to find things, wouldn't it become safe and convenient?  Or do you
have other concerns?

Make a list of things that can go wrong with ctypes modules.  Now make a
list of things that can go wrong with C extension modules.  Aren't they really
pretty similar - missing .so, .so in a weird place, and especially: .so with
a changed interface?  C really isn't a very safe language - not like
http://en.wikipedia.org/wiki/Turing_%28programming_language%29 or
something.  Perhaps it's a little easier to mess things up with ctypes today
(a recompile doesn't fix, or at least detect, as many problems), but isn't
it at least worth thinking about how that situation could be improved?

If you have a better proposal please do write it up. But so far you
> are mostly exposing your ignorance and insisting dramatically that you
> be educated.
>

I'm not sure why you're trying to avoid having a discussion.  I think it's
premature to dive into a proposal before getting other people's thoughts.
Frankly, 100 people tend to think better than one - at least, if the 100
people feel like they can talk.

I'm -not- convinced ctypes are the way forward.  I just want to talk about
it - for now.  ctypes have some significant advantages - if we can find a
way to eliminate and/or ameliorate their disadvantages, they might be quite
a bit nicer than Cext's.

From martin at v.loewis.de  Sun Aug 28 07:58:02 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 28 Aug 2011 07:58:02 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CAGGBd_ovB_wm5yT7k4DGgaSUY7_rnRpuKEDPSJJu9GBhwjvezQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<4E59041A.7040100@v.loewis.de>	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>	<4E5909FD.7060809@v.loewis.de>	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>	<20110827174057.6c4b619e@pitrou.net>	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>	<4E5951D5.5020200@v.loewis.de>	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>	<CAP7+vJJs-hRQk8s3wwsM=DpO42ghE1Nq7cbO6Hq6+JLjVY8kJQ@mail.gmail.com>
	<CAGGBd_ovB_wm5yT7k4DGgaSUY7_rnRpuKEDPSJJu9GBhwjvezQ@mail.gmail.com>
Message-ID: <4E59D8EA.4080306@v.loewis.de>

> I just want to talk about it - for now.

python-ideas is a better place to just talk than python-dev.

Regards,
Martin

From stefan_ml at behnel.de  Sun Aug 28 08:50:21 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 28 Aug 2011 08:50:21 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<4E59041A.7040100@v.loewis.de>	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>	<4E5909FD.7060809@v.loewis.de>	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>	<20110827174057.6c4b619e@pitrou.net>	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
Message-ID: <j3cofd$t8t$1@dough.gmane.org>

Dan Stromberg, 27.08.2011 21:58:
> On Sat, Aug 27, 2011 at 9:04 AM, Nick Coghlan wrote:
>> On Sun, Aug 28, 2011 at 1:58 AM, Nadeem Vawda wrote:
>>> On Sat, Aug 27, 2011 at 5:52 PM, Nick Coghlan wrote:
>>>> It's acceptable for the Python version to use ctypes in the case of
>>>> wrapping an existing library, but the Python version should still
>>>> exist.
>>>
>>> I'm not too sure about that - PEP 399 explicitly says that using ctypes
>>> is
>>> frowned upon, and doesn't mention anywhere that it should be used in this
>>> sort of situation.
>>
>> Note to self: do not comment on python-dev at 2 am, as one's ability
>> to read PEPs correctly apparently suffers :)
>>
>> Consider my comment withdrawn, you're quite right that PEP 399
>> actually says this is precisely the case where an exemption is a
>> reasonable idea. Although I believe it's likely that PyPy will wrap it
>> with ctypes anyway :)
>
> I'd like to better understand why ctypes is (sometimes) frowned upon.
>
> Is it the brittleness?  Tendency to segfault?

Maybe unwieldy code and slow execution on CPython?

Note that there's a ctypes backend for Cython being written as part of a 
GSoC, so it should eventually become possible to write C library wrappers 
in Cython and have it generate a ctypes version to run on PyPy. That, 
together with the IronPython backend that is on its way, would give you a 
way to write fast wrappers for at least three of the four major Python 
implementations, without sacrificing readability or speed in one of them.

Stefan


From ncoghlan at gmail.com  Sun Aug 28 09:57:24 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Aug 2011 17:57:24 +1000
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAP7+vJJ6mY2Qg0P91dozK-R73VzLdxzw9VdhA+nN+t8o5e+roQ@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E582432.2080301@v.loewis.de>
	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>
	<CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>
	<20110827035602.557f772f@pitrou.net>
	<CACBhJdE+y4s2gGpZJzY_eBvqmYNBGApC8ORsYMHFO5w3EOwM8Q@mail.gmail.com>
	<CAP7+vJJ6mY2Qg0P91dozK-R73VzLdxzw9VdhA+nN+t8o5e+roQ@mail.gmail.com>
Message-ID: <CADiSq7f9DvCH2U8AacEBtw=nmJEVjysMPRNQ2RgG=Tx_65vKrQ@mail.gmail.com>

On Sun, Aug 28, 2011 at 2:28 PM, Guido van Rossum <guido at python.org> wrote:
> On Sat, Aug 27, 2011 at 8:59 PM, Ezio Melotti <ezio.melotti at gmail.com> wrote:
>> I think it would be good to:
>> ? 1) have some document that explains the general design and main (internal)
>> functions of the module (e.g. a PEP);
>
> I don't think that such a document needs to be a PEP; PEPs are usually
> intended where there is significant discussion expected, not just to
> explain things. A README file or a Wiki page would be fine, as long as
> it's sufficiently comprehensive.

timsort.txt and dictnotes.txt may be useful precedents for the kind of
thing that is useful on that front. IIRC, the pymalloc stuff has a
massive embedded comment, which can also work.

Cheers,
Nick.

-- 
Nick Coghlan  |  ncoghlan at gmail.com  |  Brisbane, Australia

From hodgestar+pythondev at gmail.com  Sun Aug 28 13:40:43 2011
From: hodgestar+pythondev at gmail.com (Simon Cross)
Date: Sun, 28 Aug 2011 13:40:43 +0200
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
 support in 3.3)
In-Reply-To: <j3chv2$tvf$1@dough.gmane.org>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org>
Message-ID: <CAD5NRCGYveRWKHunLWa9CsKCW-8+=HA6v6xhui35ga3DbgxPhg@mail.gmail.com>

On Sun, Aug 28, 2011 at 6:58 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> 2) It is not trivial to use it correctly. I think it needs a SWIG-like
> companion script that can write at least first-pass ctypes code from the .h
> header files. Or maybe it could/should use header info at runtime (with the
> .h bundled with a module).

This is sort of already available:

-- http://starship.python.net/crew/theller/ctypes/old/codegen.html
-- http://svn.python.org/projects/ctypes/trunk/ctypeslib/

It just appears to have never made it into CPython. I've used it
successfully on a small project.
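
For reference, the "first pass" such a generator produces is plain
ctypes declarations. Hand-written, the equivalent for one real zlib
function looks roughly like this (a sketch only -- the generator's
actual output format may differ):

    import ctypes, ctypes.util

    # zlib.h declares: const char *zlibVersion(void);
    _zlib = ctypes.CDLL(ctypes.util.find_library("z"))
    _zlib.zlibVersion.restype = ctypes.c_char_p
    _zlib.zlibVersion.argtypes = []

    print(_zlib.zlibVersion())   # e.g. b'1.2.5'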

Schiavo
Simon

From guido at python.org  Sun Aug 28 18:43:33 2011
From: guido at python.org (Guido van Rossum)
Date: Sun, 28 Aug 2011 09:43:33 -0700
Subject: [Python-Dev] Software Transactional Memory for Python
In-Reply-To: <CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>
References: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
	<CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>
	<CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>
Message-ID: <CAP7+vJJY3UwBZdFnCjAP830-yRrcaWYc4gP2duk79Z9f_AkU2A@mail.gmail.com>

On Sat, Aug 27, 2011 at 6:08 AM, Armin Rigo <arigo at tunes.org> wrote:
> Hi Nick,
>
> On Sat, Aug 27, 2011 at 2:40 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> 1. How does the patch interact with C code that explicitly releases
>> the GIL? (e.g. IO commands inside a "with atomic:" block)
>
> As implemented, any code in a "with atomic" is prevented from
> explicitly releasing and reacquiring the GIL: the GIL remains acquired
> until the end of the "with" block.  In other words
> Py_BEGIN_ALLOW_THREADS has no effect in a "with" block.  This gives
> semantics that, in a full multi-core STM world, would be implementable
> by saying that if, in the middle of a transaction, you need to do I/O,
> then from this point onwards the transaction is not allowed to abort
> any more.  Such "inevitable" transactions are already supported e.g.
> by RSTM, the C++ framework I used to prototype a C version
> (https://bitbucket.org/arigo/arigo/raw/default/hack/stm/c ).
>
>> 2. Whether or not Jython and IronPython could implement something like
>> that, since they're free threaded with fine-grained locks. If they
>> can't then I don't see how we could justify making it part of the
>> standard library.
>
> Yes, I can imagine some solutions.  I am no Jython or IronPython
> expert, but let us assume that they have a way to check synchronously
> for external events from time to time (i.e. if there is some
> equivalent to sys.setcheckinterval()).  If they do, then all you need
> is the right synchronization: the thread that wants to start a "with
> atomic" has to wait until all other threads are paused in the external
> check code.  (Again, like CPython's, this is not a properly multi-core
> STM-ish solution, but it would give the right semantics.  (And if it
> turns out that STM is successful in the future, Java will grow more
> direct support for it <wink>))
>
>
> A bientôt,
>
> Armin.

This sounds like a very interesting idea to pursue, even if it's late,
and even if it's experimental, and even if it's possible to cause
deadlocks (no news there). I propose that we offer a C API in Python
3.3 as well as an extension module that offers the proposed decorator.
The C API could then be used to implement alternative APIs purely as
extension modules (e.g. would a deadlock-detecting API be possible?).

I don't think this needs a PEP, it's not a very pervasive change. We
can even document the API as experimental. But (if I may trust Armin's
reasoning) it's important to add support directly to CPython, as
currently it cannot be done as a pure extension module.
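
To make the shape of the proposed API concrete, usage might look
something like this (the module and names are hypothetical -- nothing
here is implemented yet):

    from atomic import atomic      # hypothetical extension module

    counter = 0

    def increment():
        global counter
        with atomic:               # no other thread runs inside the block
            counter += 1           # read-modify-write, safe without a lock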

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Sun Aug 28 18:53:00 2011
From: guido at python.org (Guido van Rossum)
Date: Sun, 28 Aug 2011 09:53:00 -0700
Subject: [Python-Dev]  Should we move to replace re with regex?
In-Reply-To: <CAP7+vJKgyHG+2wtcUguOofYniX2feiqyXaHkDLhoqH6m0c4MGA@mail.gmail.com>
References: <CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
	<20110827035916.583c3d81@pitrou.net> <4E5868D6.8090203@pearwood.info>
	<CAGGBd_pjYW-Z_Fw5Qs+n9J1UVCtD3DmJZaMx+rtuqYQbPSWjmg@mail.gmail.com>
	<CADiSq7di9zsTRnyD8tqG_=i4nqdzF7E1kfXZM6s2yUKoQXKnZg@mail.gmail.com>
	<4E59801E.7080406@canterbury.ac.nz> <j3c399$vrd$1@dough.gmane.org>
	<CAP7+vJL064xr4O88XXR1c8TVfQ2=ga0ZARyOqs4hpzg9OeUZ2w@mail.gmail.com>
	<20110828075246.GG99611@nexus.in-nomine.org>
	<CAP7+vJKgyHG+2wtcUguOofYniX2feiqyXaHkDLhoqH6m0c4MGA@mail.gmail.com>
Message-ID: <CAP7+vJ+HX1cf0HVH2kPJprcZukGD9=PU9ai7-NG3VULgR+0ibA@mail.gmail.com>

Someone asked me off-line what I wanted besides talk. Here's the list
I came up with:

You could try for instance volunteer to do a thorough code review of
the regex code, trying to think of ways to break it (e.g. bad syntax
or extreme use of nesting etc., or bad data). Or you could volunteer
to maintain it in the future. Or you could try to port it to PEP 393.
Or you could systematically go over the given list of differences
between re and regex and decide whether they are likely to be
backwards incompatibilities that will break existing code. Or you
could try to add some of the functionality requested by Tom C in one
of his several bugs.
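
(As one purely illustrative probe of the "extreme nesting" kind:)

    import regex   # the module under discussion; re behaves similarly

    # Nested quantifiers force exponential backtracking in a naive
    # engine; each extra 'a' roughly doubles the work on a failing match.
    regex.match(r'(a+)+b', 'a' * 20 + 'c')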

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Sun Aug 28 19:12:56 2011
From: guido at python.org (Guido van Rossum)
Date: Sun, 28 Aug 2011 10:12:56 -0700
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CAGGBd_ovB_wm5yT7k4DGgaSUY7_rnRpuKEDPSJJu9GBhwjvezQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<CAP7+vJJs-hRQk8s3wwsM=DpO42ghE1Nq7cbO6Hq6+JLjVY8kJQ@mail.gmail.com>
	<CAGGBd_ovB_wm5yT7k4DGgaSUY7_rnRpuKEDPSJJu9GBhwjvezQ@mail.gmail.com>
Message-ID: <CAP7+vJ+6sFecMFe7C3KTbvxvT4pm-454iOgjGvh_0sRCdp-zmg@mail.gmail.com>

On Sat, Aug 27, 2011 at 10:36 PM, Dan Stromberg <drsalists at gmail.com> wrote:
>
> On Sat, Aug 27, 2011 at 8:57 PM, Guido van Rossum <guido at python.org> wrote:
>>
>> On Sat, Aug 27, 2011 at 3:14 PM, Dan Stromberg <drsalists at gmail.com>
>> wrote:
>> > IMO, we really, really need some common way of accessing C libraries
>> > that
>> > works for all major Python variants.
>>
>> We have one. It's called writing an extension module.
>
> And yet Cext's are full of CPython-isms.

I have to apologize, I somehow misread your "all Python variants" as a
mixture of "all CPython versions" and "all platforms where CPython
runs".

While I have no desire to continue this discussion, you are most
welcome to do so.

-- 
--Guido van Rossum (python.org/~guido)

From martin at v.loewis.de  Sun Aug 28 20:13:03 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 28 Aug 2011 20:13:03 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <CAP7+vJJD1hS=s5XREUh3F92=P=9PXJZSy1F00foZe+CuEpY5-g@mail.gmail.com>
References: <4E553FBC.7080501@v.loewis.de>
	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>	<CAP7+vJ+f3oN3RmZhCEntgXBccj4gE7ErdKwvL9hv4uaBgmL_qQ@mail.gmail.com>	<4E577589.4030809@v.loewis.de>	<CAP7+vJ+fYXLcqG_rHkNtg5yuZKJTj2nMxFaLczr+zqeKUBtY8A@mail.gmail.com>
	<CAP7+vJJD1hS=s5XREUh3F92=P=9PXJZSy1F00foZe+CuEpY5-g@mail.gmail.com>
Message-ID: <4E5A852F.7040206@v.loewis.de>

Am 26.08.2011 16:56, schrieb Guido van Rossum:
> Also, please add the table (and the reasoning that led to it) to the PEP.

Done!

Martin

From stefan_ml at behnel.de  Sun Aug 28 20:23:35 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 28 Aug 2011 20:23:35 +0200
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
	support in 3.3)
In-Reply-To: <j3chv2$tvf$1@dough.gmane.org>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<4E5909FD.7060809@v.loewis.de>	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>	<20110827174057.6c4b619e@pitrou.net>	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>	<4E5951D5.5020200@v.loewis.de>	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>	<20110828002642.4765fc89@pitrou.net>	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>	<20110828012705.523e51d4@pitrou.net>	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org>
Message-ID: <j3e137$q0g$1@dough.gmane.org>

Hi,

sorry for hooking in here with my usual Cython bias and promotion. When the 
question comes up what a good FFI for Python should look like, it's an 
obvious reaction from my part to throw Cython into the game.

Terry Reedy, 28.08.2011 06:58:
> Dan, I once had the more or less the same opinion/question as you with
> regard to ctypes, but I now see at least 3 problems.
>
> 1) It seems hard to write it correctly. There are currently 47 open ctypes
> issues, with 9 being feature requests, leaving 38 behavior-related issues.
> Tom Heller has not been able to work on it since the beginning of 2010 and
> has formally withdrawn as maintainer. No one else that I know of has taken
> his place.

Cython has an active set of developers and a rather large and growing user 
base.

It certainly has lots of open issues in its bug tracker, but most of them 
are there because we *know* where the development needs to go, not so much 
because we don't know how to get there. After all, the semantics of Python 
and C/C++, between which Cython sits, are pretty much established.

Cython compiles to C code for CPython, (hopefully soon [1]) to 
Python+ctypes for PyPy and (mostly [2]) C++/CLI code for IronPython, which 
boils down to the same build time and runtime kind of dependencies that the 
supported Python runtimes have anyway. It does not add dependencies on any 
external libraries by itself, such as the libffi in CPython's ctypes 
implementation.

For the CPython backend, the generated code is very portable and is 
self-contained when compiled against the CPython runtime (plus, obviously, 
libraries that the user code explicitly uses). It generates efficient code 
for all existing CPython versions starting with Python 2.4, with several 
optimisations also for recent CPython versions (including the upcoming 3.3).


> 2) It is not trivial to use it correctly.

Cython is basically Python, so Python developers with some C or C++ 
knowledge tend to get along with it quickly.

I can't say yet how easy it is (or will be) to write code that is portable 
across independent Python implementations, but given that that field is 
still young, there's certainly a lot that can be done to aid this.


> I think it needs a SWIG-like
> companion script that can write at least first-pass ctypes code from the .h
> header files. Or maybe it could/should use header info at runtime (with the
> .h bundled with a module).

From my experience, this is a "nice to have" more than a requirement. It 
has been requested for Cython a couple of times, especially by new users, 
and there are a couple of scripts out there that do this to some extent. 
But the usual problem is that Cython users (and, similarly, ctypes users) 
do not want a 1:1 mapping of a library API to a Python API (there's SWIG 
for that), and you can't easily get more than a trivial mapping out of a 
script. But, yes, a one-shot generator for the necessary declarations would 
at least help in cases where the API to be wrapped is somewhat large.


> 3) It seems to be slower than compiled C extension wrappers. That, at
> least, was the discovery of someone who re-wrote pygame using ctypes. (The
> hope was that using ctypes would aid porting to 3.x, but the time penalty
> was apparently too much for time-critical code.)

Cython code can be as fast as C code, and in some cases, especially when 
developer time is limited, even faster than hand-written C extensions. It 
allows for a straightforward optimisation path from regular Python code 
down to the speed of C, and trivial interaction with C code itself, if the 
need arises.
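
For example (purely illustrative): the function below is plain Python
and compiles under Cython unchanged; the commented variant shows the
typical next step of adding C type declarations for C-level speed.

    def mean(values):
        total = 0.0
        for v in values:
            total += v
        return total / len(values)

    # Typed Cython variant (not valid plain Python, hence shown as
    # comments):
    # def mean(list values):
    #     cdef double total = 0
    #     cdef double v
    #     for v in values:          # v is unboxed to a C double
    #         total += v
    #     return total / len(values)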

Stefan


[1] The PyPy port of Cython is currently being written as a GSoC project.

[2] The IronPython port of Cython was written to facilitate a NumPy port to 
the .NET environment. It's currently not a complete port of all Cython 
features.



From solipsis at pitrou.net  Sun Aug 28 21:07:54 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 28 Aug 2011 21:07:54 +0200
Subject: [Python-Dev] peps: Add memory consumption table.
References: <E1Qxjr5-0001hg-A0@dinsdale.python.org>
Message-ID: <20110828210754.4bec2e92@pitrou.net>

On Sun, 28 Aug 2011 20:13:11 +0200
martin.v.loewis <python-checkins at python.org> wrote:
>  
> +Performance
> +-----------
> +
> +Performance of this patch must be considered for both memory
> +consumption and runtime efficiency. For memory consumption, the
> +expectation is that applications that have many large strings will see
> +a reduction in memory usage. For small strings, the effects depend on
> +the pointer size of the system, and the size of the Py_UNICODE/wchar_t
> +type. The following table demonstrates this for various small string
> +sizes and platforms.

The table is for ASCII-only strings, right? Perhaps that should be
mentioned somewhere.

Regards

Antoine.



From martin at v.loewis.de  Sun Aug 28 21:47:05 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 28 Aug 2011 21:47:05 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <20110825132734.1c236d17@pitrou.net>
References: <4E553FBC.7080501@v.loewis.de>
	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>
	<20110825132734.1c236d17@pitrou.net>
Message-ID: <4E5A9B39.8090009@v.loewis.de>

> I would say no more than a 15% slowdown on each of the following
> benchmarks:
> 
> - stringbench.py -u
>   (http://svn.python.org/view/sandbox/trunk/stringbench/)
> - iobench.py -t
>   (in Tools/iobench/)
> - the json_dump, json_load and regex_v8 tests from
>   http://hg.python.org/benchmarks/

I now have benchmark results for these; numbers are for revision
c10bcab2aac7, comparing to 1ea72da11724 (wide unicode), on 64-bit
Linux with gcc 4.6.1 running on Core i7 2.8GHz.

- stringbench gives 10% slowdown on total time; individual tests take
  between 78% and 220% of their previous time. The cost is typically
  not in performing
  the string operations themselves, but in the creation of the
  result strings. In PEP 393, a buffer must be scanned for the
  highest code point, which means that each byte must be inspected
  twice (a second time when the copying occurs).
- the iobench results are between 2% acceleration (seek operations),
  16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and
  37% for large sized reads (154 MB/s vs. 235 MB/s). The speed
  difference is probably in the UTF-8 decoder; I have already
  restored the "runs of ASCII" optimization and am out of ideas for
  further speedups. Again, having to scan the UTF-8 string twice
  is probably one cause of slowdown.
- the json and regex_v8 tests see a slowdown of below 1%.

The slowdown is larger when compared with a narrow Unicode build.
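
(For readers unfamiliar with that step: in pure-Python terms the scan
amounts to something like the sketch below. The real code is C inside
the unicode implementation; this is only an illustration.)

    def choose_kind(buf):
        # PEP 393 picks the narrowest representation that fits the
        # widest character, so every character is inspected once here
        # and a second time when the data is copied.
        maxchar = max(map(ord, buf)) if buf else 0
        if maxchar < 0x100:
            return 1     # 1 byte/char (latin-1 range)
        elif maxchar < 0x10000:
            return 2     # 2 bytes/char (BMP)
        else:
            return 4     # 4 bytes/char (full range)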

> Additionally, it would be nice if you could run at least some of the
> test_bigmem tests, according to your system's available RAM.

Running only StrTest with 4.5G allows me to run 2 tests
(test_encode_raw_unicode_escape and test_encode_utf7); this sees
a slowdown of 37% in Linux user time.

Regards,
Martin

From solipsis at pitrou.net  Sun Aug 28 22:01:06 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 28 Aug 2011 22:01:06 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E5A9B39.8090009@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de>
	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>
	<20110825132734.1c236d17@pitrou.net>  <4E5A9B39.8090009@v.loewis.de>
Message-ID: <1314561666.3656.3.camel@localhost.localdomain>


> - the iobench results are between 2% acceleration (seek operations),
>   16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and
>   37% for large sized reads (154 MB/s vs. 235 MB/s). The speed
>   difference is probably in the UTF-8 decoder; I have already
>   restored the "runs of ASCII" optimization and am out of ideas for
>   further speedups. Again, having to scan the UTF-8 string twice
>   is probably one cause of slowdown.

I don't think it's the UTF-8 decoder because I see an even larger
slowdown with simpler encodings (e.g. "-E latin1" or "-E utf-16le").

Thanks

Antoine.



From martin at v.loewis.de  Sun Aug 28 22:23:42 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 28 Aug 2011 22:23:42 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <1314561666.3656.3.camel@localhost.localdomain>
References: <4E553FBC.7080501@v.loewis.de>	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>	<20110825132734.1c236d17@pitrou.net>
	<4E5A9B39.8090009@v.loewis.de>
	<1314561666.3656.3.camel@localhost.localdomain>
Message-ID: <4E5AA3CE.50503@v.loewis.de>

Am 28.08.2011 22:01, schrieb Antoine Pitrou:
> 
>> - the iobench results are between 2% acceleration (seek operations),
>>   16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and
>>   37% for large sized reads (154 MB/s vs. 235 MB/s). The speed
>>   difference is probably in the UTF-8 decoder; I have already
>>   restored the "runs of ASCII" optimization and am out of ideas for
>>   further speedups. Again, having to scan the UTF-8 string twice
>>   is probably one cause of slowdown.
> 
> I don't think it's the UTF-8 decoder because I see an even larger
> slowdown with simpler encodings (e.g. "-E latin1" or "-E utf-16le").

But those aren't used in iobench, are they?

Regards,
Martin

From solipsis at pitrou.net  Sun Aug 28 22:27:20 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 28 Aug 2011 22:27:20 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E5AA3CE.50503@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de>
	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>
	<20110825132734.1c236d17@pitrou.net>  <4E5A9B39.8090009@v.loewis.de>
	<1314561666.3656.3.camel@localhost.localdomain>
	<4E5AA3CE.50503@v.loewis.de>
Message-ID: <1314563240.3656.6.camel@localhost.localdomain>

On Sunday, 28 August 2011 at 22:23 +0200, "Martin v. Löwis" wrote:
> Am 28.08.2011 22:01, schrieb Antoine Pitrou:
> > 
> >> - the iobench results are between 2% acceleration (seek operations),
> >>   16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and
> >>   37% for large sized reads (154 MB/s vs. 235 MB/s). The speed
> >>   difference is probably in the UTF-8 decoder; I have already
> >>   restored the "runs of ASCII" optimization and am out of ideas for
> >>   further speedups. Again, having to scan the UTF-8 string twice
> >>   is probably one cause of slowdown.
> > 
> > I don't think it's the UTF-8 decoder because I see an even larger
> > slowdown with simpler encodings (e.g. "-E latin1" or "-E utf-16le").
> 
> But those aren't used in iobench, are they?

I was not very clear, but you can change the encoding used in iobench by
using the "-E" command-line option (while UTF-8 is the default if you
don't specify anything).

For example:

$ ./python Tools/iobench/iobench.py -t -E latin1
Preparing files...
Text unit = one character (latin1-decoded)

** Text input **

[ 400KB ] read one unit at a time...                   5.17 MB/s
[ 400KB ] read 20 units at a time...                   77.6 MB/s
[ 400KB ] read one line at a time...                    209 MB/s
[ 400KB ] read 4096 units at a time...                  509 MB/s

[  20KB ] read whole contents at once...                885 MB/s
[ 400KB ] read whole contents at once...                730 MB/s
[  10MB ] read whole contents at once...                726 MB/s

(etc.)

Regards

Antoine.



From martin at v.loewis.de  Sun Aug 28 23:06:34 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 28 Aug 2011 23:06:34 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <1314561666.3656.3.camel@localhost.localdomain>
References: <4E553FBC.7080501@v.loewis.de>	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>	<20110825132734.1c236d17@pitrou.net>
	<4E5A9B39.8090009@v.loewis.de>
	<1314561666.3656.3.camel@localhost.localdomain>
Message-ID: <4E5AADDA.5090206@v.loewis.de>

Am 28.08.2011 22:01, schrieb Antoine Pitrou:
> 
>> - the iobench results are between 2% acceleration (seek operations),
>>   16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and
>>   37% for large sized reads (154 MB/s vs. 235 MB/s). The speed
>>   difference is probably in the UTF-8 decoder; I have already
>>   restored the "runs of ASCII" optimization and am out of ideas for
>>   further speedups. Again, having to scan the UTF-8 string twice
>>   is probably one cause of slowdown.
> 
> I don't think it's the UTF-8 decoder because I see an even larger
> slowdown with simpler encodings (e.g. "-E latin1" or "-E utf-16le").

Those haven't been ported to the new API, yet. Consider, for example,
d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test;
with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this
is a 25% speedup for PEP 393.

Regards,
Martin



From greg.ewing at canterbury.ac.nz  Mon Aug 29 00:24:04 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 29 Aug 2011 10:24:04 +1200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CAP7+vJJs-hRQk8s3wwsM=DpO42ghE1Nq7cbO6Hq6+JLjVY8kJQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<CAP7+vJJs-hRQk8s3wwsM=DpO42ghE1Nq7cbO6Hq6+JLjVY8kJQ@mail.gmail.com>
Message-ID: <4E5AC004.8030103@canterbury.ac.nz>

Guido van Rossum wrote:
> On Sat, Aug 27, 2011 at 3:14 PM, Dan Stromberg <drsalists at gmail.com> wrote:
> 
>>IMO, we really, really need some common way of accessing C libraries that
>>works for all major Python variants.
> 
> We have one. It's called writing an extension module.

I think Dan means some way of doing this without having
to hand-craft a different one for each Python implementation.

If we're really serious about the idea that "Python is not
CPython", this seems like a reasonable thing to want. Currently
the Python universe is very much centred around CPython, with
the other implementations perpetually in catch-up mode.

My suggestion on how to address this would be something akin
to Pyrex or Cython. I gather that there has been some work
recently on adding different back-ends to Cython to generate
code for different Python implementations.

-- 
Greg

From stephen at xemacs.org  Mon Aug 29 04:20:12 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 29 Aug 2011 11:20:12 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
Message-ID: <87liudklgj.fsf@uwakimon.sk.tsukuba.ac.jp>

Paul Moore writes:

 > IronPython and Jython can retain UTF-16 as their native form if that
 > makes interop cleaner, but in doing so they need to ensure that basic
 > operations like indexing and len work in terms of code points, not
 > code units, if they are to conform.

[...]

 > They lose the O(1) guarantee, but that's easily defensible as a
 > tradeoff to conform to underlying runtime semantics.

Unfortunately, I don't think it's all that easy to defend.  Absent PEP
393 or a restriction to the characters in the BMP, this is a very
expensive change, easily visible to interactive users, let alone
performance-hungry applications.

I personally do advocate the "array of code points" definition, but I
don't use IronPython or Jython so PEP 393 is as close to heaven as I
expect to get.  OTOH, I also use Emacsen with Mule, and I have to
admit that there is a perceptible performance hit in any large (>1 MB)
buffer containing non-ASCII characters vs. pure ASCII (the code unit
in Mule is 1 byte).  I expect that if IronPython and Jython really
want to retain native, code-unit-based representations, it's going to
be painful to conform to an "array of code points" specification.

There may need to be a compromise of the form "Implementations SHOULD
provide an implementation of str that is both O(1) in indexing and an
array of code points.  Code that is Unicode-ly correct in Python
implementing PEP 393 will need to be ported with some effort to
implementations that do not satisfy this requirement, perhaps using
different algorithms or extra libraries."
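
Concretely, for the first code point outside the BMP:

    s = '\U00010000'

    # narrow (UTF-16 code unit) build, e.g. CPython 3.2 on Windows:
    len(s)     # -> 2
    s[0]       # -> '\ud800' (a lone high surrogate)

    # wide (UCS-4) build, or PEP 393:
    len(s)     # -> 1
    s[0]       # -> '\U00010000'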

From guido at python.org  Mon Aug 29 04:27:16 2011
From: guido at python.org (Guido van Rossum)
Date: Sun, 28 Aug 2011 19:27:16 -0700
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
 support in 3.3)
In-Reply-To: <j3e137$q0g$1@dough.gmane.org>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org> <j3e137$q0g$1@dough.gmane.org>
Message-ID: <CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>

On Sun, Aug 28, 2011 at 11:23 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Hi,
>
> sorry for hooking in here with my usual Cython bias and promotion. When the
> question comes up what a good FFI for Python should look like, it's an
> obvious reaction from my part to throw Cython into the game.
>
> Terry Reedy, 28.08.2011 06:58:
>>
>> Dan, I once had the more or less the same opinion/question as you with
>> regard to ctypes, but I now see at least 3 problems.
>>
>> 1) It seems hard to write it correctly. There are currently 47 open ctypes
>> issues, with 9 being feature requests, leaving 38 behavior-related issues.
>> Tom Heller has not been able to work on it since the beginning of 2010 and
>> has formally withdrawn as maintainer. No one else that I know of has taken
>> his place.
>
> Cython has an active set of developers and a rather large and growing user
> base.
>
> It certainly has lots of open issues in its bug tracker, but most of them
> are there because we *know* where the development needs to go, not so much
> because we don't know how to get there. After all, the semantics of Python
> and C/C++, between which Cython sits, are pretty much established.
>
> Cython compiles to C code for CPython, (hopefully soon [1]) to Python+ctypes
> for PyPy and (mostly [2]) C++/CLI code for IronPython, which boils down to
> the same build time and runtime kind of dependencies that the supported
> Python runtimes have anyway. It does not add dependencies on any external
> libraries by itself, such as the libffi in CPython's ctypes implementation.
>
> For the CPython backend, the generated code is very portable and is
> self-contained when compiled against the CPython runtime (plus, obviously,
> libraries that the user code explicitly uses). It generates efficient code
> for all existing CPython versions starting with Python 2.4, with several
> optimisations also for recent CPython versions (including the upcoming 3.3).
>
>
>> 2) It is not trivial to use it correctly.
>
> Cython is basically Python, so Python developers with some C or C++
> knowledge tend to get along with it quickly.
>
> I can't say yet how easy it is (or will be) to write code that is portable
> across independent Python implementations, but given that that field is
> still young, there's certainly a lot that can be done to aid this.

Cython does sound attractive for cross-Python-implementation use. This
is exciting.

>> I think it needs a SWIG-like
>> companion script that can write at least first-pass ctypes code from the .h
>> header files. Or maybe it could/should use header info at runtime (with the
>> .h bundled with a module).
>
> From my experience, this is a "nice to have" more than a requirement. It has
> been requested for Cython a couple of times, especially by new users, and
> there are a couple of scripts out there that do this to some extent. But the
> usual problem is that Cython users (and, similarly, ctypes users) do not
> want a 1:1 mapping of a library API to a Python API (there's SWIG for that),
> and you can't easily get more than a trivial mapping out of a script. But,
> yes, a one-shot generator for the necessary declarations would at least help
> in cases where the API to be wrapped is somewhat large.

Hm, the main use that was proposed here for ctypes is to wrap existing
libraries (not to create nicer APIs, that can be done in pure Python
on top of this). In general, an existing library cannot be called
without access to its .h files -- there are probably struct and
constant definitions, platform-specific #ifdefs and #defines, and
other things in there that affect the linker-level calling conventions
for the functions in the library. (Just like Python's own .h files --
e.g. the extensive renaming of the Unicode APIs depending on
narrow/wide build) How does Cython deal with these? I wonder if for
this particular purpose SWIG isn't the better match. (If SWIG weren't
universally hated, even by its original author. :-)
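
To illustrate the point with ctypes (the struct here is purely
hypothetical): the declaration has to be duplicated by hand, and
nothing verifies it against the actual header.

    import ctypes

    # Suppose the (hypothetical) header says:
    #     typedef struct { int a; double b; } pair_t;
    # The Python side must repeat that layout field for field:
    class pair_t(ctypes.Structure):
        _fields_ = [("a", ctypes.c_int),
                    ("b", ctypes.c_double)]

    # If the real header orders the fields differently, or an #ifdef adds
    # one on some platform, every call using pair_t silently corrupts data.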

>> 3) It seems to be slower than compiled C extension wrappers. That, at
>> least, was the discovery of someone who re-wrote pygame using ctypes. (The
>> hope was that using ctypes would aid porting to 3.x, but the time penalty
>> was apparently too much for time-critical code.)
>
> Cython code can be as fast as C code, and in some cases, especially when
> developer time is limited, even faster than hand-written C extensions. It
> allows for a straightforward optimisation path from regular Python code
> down to the speed of C, and trivial interaction with C code itself, if the
> need arises.
>
> Stefan
>
>
> [1] The PyPy port of Cython is currently being written as a GSoC project.
>
> [2] The IronPython port of Cython was written to facilitate a NumPy port to
> the .NET environment. It's currently not a complete port of all Cython
> features.

-- 
--Guido van Rossum (python.org/~guido)

From stephen at xemacs.org  Mon Aug 29 04:48:43 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 29 Aug 2011 11:48:43 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
Message-ID: <87k49xkk50.fsf@uwakimon.sk.tsukuba.ac.jp>

Guido van Rossum writes:

 > I don't think anyone else has that impression. Please cite chapter and
 > verse if you really think this is important. IIUC, UCS-2 does not
 > allow surrogate pairs,

In the original definition of UCS-2 in draft ISO 10646 (1990),
everything in the BMP except for 0xFFFF and 0xFFFE was a character,
and there was no concept of "surrogate" at all.  Later in ISO 10646
(1993)[1], the Surrogate Area was carved out of the Private Area, but
UCS-2 implementations simply treat them as (single) characters with
special properties.  This was more or less backward compatible as all
corporate uses of the private area used the lower code points and
didn't conflict with the surrogates.  Finally (in 2000 or 2003) the
definition of UCS-2 in ISO 10646 was revised in a backward-
incompatible way to exclude surrogates entirely, ie, nowadays it is a
range-restricted version of UTF-16.

Footnotes: 
[1]  IIRC, strictly speaking this was done slightly later (1993 or
1994) in an official Amendment to ISO 10646; the Amendment was
incorporated into the standard in 2000.


From ncoghlan at gmail.com  Mon Aug 29 04:59:41 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 29 Aug 2011 12:59:41 +1000
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
 support in 3.3)
In-Reply-To: <CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org> <j3e137$q0g$1@dough.gmane.org>
	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
Message-ID: <CADiSq7eUsaeWOa__ph4SKcXaY3P0ndzgr6XHDhZcoVPrSJ8KiA@mail.gmail.com>

On Mon, Aug 29, 2011 at 12:27 PM, Guido van Rossum <guido at python.org> wrote:
> I wonder if for
> this particular purpose SWIG isn't the better match. (If SWIG weren't
> universally hated, even by its original author. :-)

SWIG is nice when you control the C/C++ side of the API as well and
can tweak it to be SWIG-friendly. I shudder at the idea of using it to
wrap arbitrary C++ code, though.

That said, the idea of using SWIG to emit Cython code rather than
C/API code may be one well worth exploring.

Cheers,
Nick.

-- 
Nick Coghlan  |  ncoghlan at gmail.com  |  Brisbane, Australia

From stephen at xemacs.org  Mon Aug 29 05:43:24 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 29 Aug 2011 12:43:24 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<20110823001440.433a0f1f@pitrou.net> <4E536B0C.8050008@v.loewis.de>
	<A20F306D-85ED-428D-A7C7-0889DD9B3DF0@masklinn.net>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
Message-ID: <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>

Raymond Hettinger writes:

 > The naming convention for codecs is that the UTF prefix is used for
 > lossless encodings that cover the entire range of Unicode.

Sure.  The operative word here is "codec", not "str", though.

 > "The first amendment to the original edition of the UCS defined
 > UTF-16, an extension of UCS-2, to represent code points outside the
 > BMP."

Since when can s[0] represent a code point outside the BMP, for s a
Unicode string in a narrow build?

Remember, the UCS-2/narrow vs. UCS-4/wide distinction is *not* about
what Python supports vs. the outside world.  It's about what the str/
unicode type is an array of.



From glyph at twistedmatrix.com  Mon Aug 29 06:46:35 2011
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Sun, 28 Aug 2011 21:46:35 -0700
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
	support in 3.3)
In-Reply-To: <CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org> <j3e137$q0g$1@dough.gmane.org>
	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
Message-ID: <9FA8683B-FB0A-4F46-878F-11B36F92A342@twistedmatrix.com>


On Aug 28, 2011, at 7:27 PM, Guido van Rossum wrote:

> In general, an existing library cannot be called
> without access to its .h files -- there are probably struct and
> constant definitions, platform-specific #ifdefs and #defines, and
> other things in there that affect the linker-level calling conventions
> for the functions in the library.

Unfortunately I don't know a lot about this, but I keep hearing about
something called "rffi" that PyPy uses to call C from RPython:
<http://readthedocs.org/docs/pypy/en/latest/rffi.html>.  This has some
shortcomings currently, most notably the fact that it needs those .h
files (and therefore a C compiler) at runtime, so it's currently a
non-starter for code distributed to users.  Not to mention the fact
that, as you can see, it's not terribly thoroughly documented.  But,
that "ExternalCompilationInfo" object looks very promising, since it
has fields like "includes", "libraries", etc.

Nevertheless it seems like it's a bit more type-safe than ctypes or
Cython, and it seems to me that it could cache some of that information
that it extracts from header files and store it for later when a
compiler might not be around.

Perhaps someone with more PyPy knowledge than I could explain whether
this is a realistic contender for other Python runtimes?


From mal at egenix.com  Mon Aug 29 10:00:56 2011
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 29 Aug 2011 10:00:56 +0200
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
 support in 3.3)
In-Reply-To: <CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<20110827174057.6c4b619e@pitrou.net>	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>	<4E5951D5.5020200@v.loewis.de>	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>	<20110828002642.4765fc89@pitrou.net>	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>	<20110828012705.523e51d4@pitrou.net>	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>	<j3chv2$tvf$1@dough.gmane.org>
	<j3e137$q0g$1@dough.gmane.org>
	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
Message-ID: <4E5B4738.30008@egenix.com>

Guido van Rossum wrote:
> On Sun, Aug 28, 2011 at 11:23 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> Hi,
>>
>> sorry for hooking in here with my usual Cython bias and promotion. When the
>> question comes up what a good FFI for Python should look like, it's an
>> obvious reaction from my part to throw Cython into the game.
>>
>> Terry Reedy, 28.08.2011 06:58:
>>>
>>> Dan, I once had the more or less the same opinion/question as you with
>>> regard to ctypes, but I now see at least 3 problems.
>>>
>>> 1) It seems hard to write it correctly. There are currently 47 open ctypes
>>> issues, with 9 being feature requests, leaving 38 behavior-related issues.
>>> Tom Heller has not been able to work on it since the beginning of 2010 and
>>> has formally withdrawn as maintainer. No one else that I know of has taken
>>> his place.
>>
>> Cython has an active set of developers and a rather large and growing user
>> base.
>>
>> It certainly has lots of open issues in its bug tracker, but most of them
>> are there because we *know* where the development needs to go, not so much
>> because we don't know how to get there. After all, the semantics of Python
>> and C/C++, between which Cython sits, are pretty much established.
>>
>> Cython compiles to C code for CPython, (hopefully soon [1]) to Python+ctypes
>> for PyPy and (mostly [2]) C++/CLI code for IronPython, which boils down to
>> the same build time and runtime kind of dependencies that the supported
>> Python runtimes have anyway. It does not add dependencies on any external
>> libraries by itself, such as the libffi in CPython's ctypes implementation.
>>
>> For the CPython backend, the generated code is very portable and is
>> self-contained when compiled against the CPython runtime (plus, obviously,
>> libraries that the user code explicitly uses). It generates efficient code
>> for all existing CPython versions starting with Python 2.4, with several
>> optimisations also for recent CPython versions (including the upcoming 3.3).
>>
>>
>>> 2) It is not trivial to use it correctly.
>>
>> Cython is basically Python, so Python developers with some C or C++
>> knowledge tend to get along with it quickly.
>>
>> I can't say yet how easy it is (or will be) to write code that is portable
>> across independent Python implementations, but given that that field is
>> still young, there's certainly a lot that can be done to aid this.
> 
> Cython does sound attractive for cross-Python-implementation use. This
> is exciting.
> 
>>> I think it needs a SWIG-like
>>> companion script that can write at least first-pass ctypes code from the .h
>>> header files. Or maybe it could/should use header info at runtime (with the
>>> .h bundled with a module).
>>
>> From my experience, this is a "nice to have" more than a requirement. It has
>> been requested for Cython a couple of times, especially by new users, and
>> there are a couple of scripts out there that do this to some extent. But the
>> usual problem is that Cython users (and, similarly, ctypes users) do not
>> want a 1:1 mapping of a library API to a Python API (there's SWIG for that),
>> and you can't easily get more than a trivial mapping out of a script. But,
>> yes, a one-shot generator for the necessary declarations would at least help
>> in cases where the API to be wrapped is somewhat large.
> 
> Hm, the main use that was proposed here for ctypes is to wrap existing
> libraries (not to create nicer APIs, that can be done in pure Python
> on top of this). In general, an existing library cannot be called
> without access to its .h files -- there are probably struct and
> constant definitions, platform-specific #ifdefs and #defines, and
> other things in there that affect the linker-level calling conventions
> for the functions in the library. (Just like Python's own .h files --
> e.g. the extensive renaming of the Unicode APIs depending on
> narrow/wide build) How does Cython deal with these? I wonder if for
> this particular purpose SWIG isn't the better match. (If SWIG weren't
> universally hated, even by its original author. :-)

SIP is an alternative to SWIG:

     http://www.riverbankcomputing.com/software/sip/intro
     http://pypi.python.org/pypi/SIP

and there are a few others as well:

     http://wiki.python.org/moin/IntegratingPythonWithOtherLanguages

>>> 3) It seems to be slower than compiled C extension wrappers. That, at
>>> least, was the discovery of someone who re-wrote pygame using ctypes. (The
>>> hope was that using ctypes would aid porting to 3.x, but the time penalty
>>> was apparently too much for time-critical code.)
>>
>> Cython code can be as fast as C code, and in some cases, especially when
>> developer time is limited, even faster than hand written C extensions. It
>> allows for a straightforward optimisation path from regular Python code
>> down to the speed of C, and trivial interaction with C code itself, if the
>> need arises.
>>
>> Stefan
>>
>>
>> [1] The PyPy port of Cython is currently being written as a GSoC project.
>>
>> [2] The IronPython port of Cython was written to facilitate a NumPy port to
>> the .NET environment. It's currently not a complete port of all Cython
>> features.
> 

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 29 2011)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2011-10-04: PyCon DE 2011, Leipzig, Germany                36 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From dirkjan at ochtman.nl  Mon Aug 29 11:03:04 2011
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Mon, 29 Aug 2011 11:03:04 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E5A9B39.8090009@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<4E5606C7.9000404@v.loewis.de> <20110825132734.1c236d17@pitrou.net>
	<4E5A9B39.8090009@v.loewis.de>
Message-ID: <CAKmKYaDrHRDf+xZYjf0HdQSiAjQ+209m3y5TLv0yLSFxXOO3Mw@mail.gmail.com>

On Sun, Aug 28, 2011 at 21:47, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> result strings. In PEP 393, a buffer must be scanned for the
> highest code point, which means that each byte must be inspected
> twice (a second time when the copying occurs).

This may be a silly question: are there things in place to optimize
this for the case where two strings are combined? E.g. highest
character in combined string is max(highest character in either of the
strings).
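
A tiny pure-Python sketch of that invariant (illustrative only; CPython
would of course track the maximum at the C level):

    def kind(s):
        # storage "kind" in the PEP 393 sense: bytes per code point
        m = max(map(ord, s)) if s else 0
        return 1 if m < 0x100 else 2 if m < 0x10000 else 4

    a, b = "abc", "\u20ac"
    # the kind of a concatenation is just the max of the operands'
    # kinds, so no second scan of the characters is needed
    assert kind(a + b) == max(kind(a), kind(b))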

Also, this PEP makes me wonder if there should be a way to distinguish
between language PEPs and (CPython) implementation PEPs, by adding a
tag or using the PEP number ranges somehow.

Cheers,

Dirkjan

From victor.stinner at haypocalc.com  Mon Aug 29 11:19:48 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 29 Aug 2011 11:19:48 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <CAKmKYaDrHRDf+xZYjf0HdQSiAjQ+209m3y5TLv0yLSFxXOO3Mw@mail.gmail.com>
References: <4E553FBC.7080501@v.loewis.de>
	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>
	<20110825132734.1c236d17@pitrou.net>	<4E5A9B39.8090009@v.loewis.de>
	<CAKmKYaDrHRDf+xZYjf0HdQSiAjQ+209m3y5TLv0yLSFxXOO3Mw@mail.gmail.com>
Message-ID: <4E5B59B4.9010207@haypocalc.com>

On 29/08/2011 11:03, Dirkjan Ochtman wrote:
> On Sun, Aug 28, 2011 at 21:47, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>   result strings. In PEP 393, a buffer must be scanned for the
>>   highest code point, which means that each byte must be inspected
>>   twice (a second time when the copying occurs).
>
> This may be a silly question: are there things in place to optimize
> this for the case where two strings are combined? E.g. highest
> character in combined string is max(highest character in either of the
> strings).

The "double-scan" issue is only for codec decoders.

If you combine two Unicode objects (a+b), you already know the highest 
code point and the kind of each string.

Victor

From victor.stinner at haypocalc.com  Mon Aug 29 10:52:52 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 29 Aug 2011 10:52:52 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E5AADDA.5090206@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de>	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>	<20110825132734.1c236d17@pitrou.net>	<4E5A9B39.8090009@v.loewis.de>	<1314561666.3656.3.camel@localhost.localdomain>
	<4E5AADDA.5090206@v.loewis.de>
Message-ID: <4E5B5364.9040100@haypocalc.com>

On 28/08/2011 23:06, "Martin v. Löwis" wrote:
> Am 28.08.2011 22:01, schrieb Antoine Pitrou:
>>
>>> - the iobench results are between 2% acceleration (seek operations),
>>>    16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and
>>>    37% for large sized reads (154 MB/s vs. 235 MB/s). The speed
>>>    difference is probably in the UTF-8 decoder; I have already
>>>    restored the "runs of ASCII" optimization and am out of ideas for
>>>    further speedups. Again, having to scan the UTF-8 string twice
>>>    is probably one cause of slowdown.
>>
>> I don't think it's the UTF-8 decoder because I see an even larger
>> slowdown with simpler encodings (e.g. "-E latin1" or "-E utf-16le").
>
> Those haven't been ported to the new API, yet. Consider, for example,
> d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test;
> with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this
> is a 25% speedup for PEP 393.

If I understand correctly, the performance now depends heavily on the
characters used? A pure ASCII string is faster than a string with characters
in the ISO-8859-1 charset? And is the same true for BMP vs. non-BMP
characters?

Do these benchmark tools use only ASCII characters, or also some 
ISO-8859-1 characters? Or, better, different Unicode ranges in different 
tests?
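
For instance, inputs exercising the different ranges could be as simple
as this (illustrative only):

    ascii_s  = "a" * 4096           # 1 byte per character under PEP 393
    latin1_s = "\xe9" * 4096        # U+00E9: still 1 byte per character
    bmp_s    = "\u20ac" * 4096      # U+20AC: 2 bytes per character
    astral_s = "\U0001F600" * 4096  # non-BMP: 4 bytes per character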

Victor

From arigo at tunes.org  Mon Aug 29 11:36:27 2011
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 29 Aug 2011 11:36:27 +0200
Subject: [Python-Dev] Software Transactional Memory for Python
In-Reply-To: <CAP7+vJJY3UwBZdFnCjAP830-yRrcaWYc4gP2duk79Z9f_AkU2A@mail.gmail.com>
References: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
	<CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>
	<CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>
	<CAP7+vJJY3UwBZdFnCjAP830-yRrcaWYc4gP2duk79Z9f_AkU2A@mail.gmail.com>
Message-ID: <CAMSv6X2uShFp8BN1r3Q8ESOROGcZj6xCO4CPn58Ova5VcL_Hdg@mail.gmail.com>

Hi Guido,

On Sun, Aug 28, 2011 at 6:43 PM, Guido van Rossum <guido at python.org> wrote:
> This sounds like a very interesting idea to pursue, even if it's late,
> and even if it's experimental, and even if it's possible to cause
> deadlocks (no news there). I propose that we offer a C API in Python
> 3.3 as well as an extension module that offers the proposed decorator.

Very good idea.  http://bugs.python.org/issue12850

The extension module, called 'stm' for now, is designed as an
independent 3rd-party extension module.  It should at this point not
be included in the stdlib; for one thing, it needs some more testing
than my quick one-page hacks, and we need to seriously look at the
deadlock issues mentioned here.  But the patch to ceval.c above looks
rather straightforward to me and could, if no subtle issue is found,
be included in the standard CPython.


A bientôt,

Armin.

From stefan_ml at behnel.de  Mon Aug 29 11:39:12 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 29 Aug 2011 11:39:12 +0200
Subject: [Python-Dev] Ctypes and the stdlib
In-Reply-To: <CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<20110827174057.6c4b619e@pitrou.net>	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>	<4E5951D5.5020200@v.loewis.de>	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>	<20110828002642.4765fc89@pitrou.net>	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>	<20110828012705.523e51d4@pitrou.net>	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>	<j3chv2$tvf$1@dough.gmane.org>
	<j3e137$q0g$1@dough.gmane.org>
	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
Message-ID: <j3fmo0$68f$1@dough.gmane.org>

Guido van Rossum, 29.08.2011 04:27:
> On Sun, Aug 28, 2011 at 11:23 AM, Stefan Behnel wrote:
>> Terry Reedy, 28.08.2011 06:58:
>>> I think it needs a SWIG-like
>>> companion script that can write at least first-pass ctypes code from the .h
>>> header files. Or maybe it could/should use header info at runtime (with the
>>> .h bundled with a module).
>>
>>  From my experience, this is a "nice to have" more than a requirement. It has
>> been requested for Cython a couple of times, especially by new users, and
>> there are a couple of scripts out there that do this to some extent. But the
>> usual problem is that Cython users (and, similarly, ctypes users) do not
>> want a 1:1 mapping of a library API to a Python API (there's SWIG for that),
>> and you can't easily get more than a trivial mapping out of a script. But,
>> yes, a one-shot generator for the necessary declarations would at least help
>> in cases where the API to be wrapped is somewhat large.
>
> Hm, the main use that was proposed here for ctypes is to wrap existing
> libraries (not to create nicer APIs, that can be done in pure Python
> on top of this).

The same applies to Cython, obviously. The main advantage of Cython over 
ctypes for this is that the Python-level wrapper code is also compiled into 
C, so whenever the need for a thicker wrapper arises in some part of the 
API, you don't lose any performance in intermediate layers.


> In general, an existing library cannot be called
> without access to its .h files -- there are probably struct and
> constant definitions, platform-specific #ifdefs and #defines, and
> other things in there that affect the linker-level calling conventions
> for the functions in the library. (Just like Python's own .h files --
> e.g. the extensive renaming of the Unicode APIs depending on
> narrow/wide build) How does Cython deal with these?

In the CPython backend, the header files are normally #included by the 
generated C code, so they are used at C compilation time.

Cython has its own view on the header files in separate declaration files 
(.pxd). Basically looks like this:

     # file "mymath.pxd"
     cdef extern from "aheader.h":
         double PI
         double E
         double abs(double x)

These declaration files usually only contain the parts of a header file 
that are used in the user code, either manually copied over or extracted by 
scripts (that's what I was referring to in my reply to Terry). The complete 
'real' content of the header file is then used by the C compiler at C 
compilation time.

The user code employs a "cimport" statement to import the declarations at 
Cython compilation time, e.g.

     # file "mymodule.pyx"
     cimport mymath
     print mymath.PI + mymath.E

would result in C code that #includes "aheader.h", adds the C constants 
"PI" and "E", converts the result to a Python float object and prints it 
out using the normal CPython machinery.

This means that declarations can be reused across modules, just like with 
header files. In fact, Cython actually ships with a couple of common 
declaration files, e.g. for parts of libc, NumPy or CPython's C-API.

I don't know that much about the IronPython backend, but from what I heard,
it uses basically the same build-time mechanisms and generates a thin C++
wrapper and a corresponding CLI part as a glue layer.

The ctypes backend for PyPy works differently in that it generates a Python
module from the .pxd files that contains the declarations as ctypes code. 
Then, the user code imports that normally at Python runtime. Obviously, 
this means that there are cases where the Cython-level declarations and 
thus the generated ctypes code will not match the ABI for a given target 
platform. So, in the worst case, there is a need to manually adapt the 
ctypes declarations in the Python module that was generated from the .pxd. 
Not worse than the current situation, though, and the rest of the Cython 
wrapper will compile into plain Python code that simply imports the 
declarations from the .pxd modules. But there's certainly room for 
improvements here.
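
As a rough illustration, the ctypes module generated for the "mymath.pxd"
example above might look something like this (hypothetical output, binding
libm's fabs just to have something concrete; the actual generator and
library lookup will differ):

    import ctypes, ctypes.util

    _lib = ctypes.CDLL(ctypes.util.find_library("m"))  # e.g. libm

    abs = _lib.fabs                   # double fabs(double)
    abs.argtypes = [ctypes.c_double]
    abs.restype = ctypes.c_double

    # C constants cannot be read through ctypes, so the generator
    # would have to inline their values:
    PI = 3.141592653589793
    E = 2.718281828459045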

Stefan


From p.f.moore at gmail.com  Mon Aug 29 12:37:23 2011
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 29 Aug 2011 11:37:23 +0100
Subject: [Python-Dev] Ctypes and the stdlib
In-Reply-To: <j3fmo0$68f$1@dough.gmane.org>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org> <j3e137$q0g$1@dough.gmane.org>
	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
	<j3fmo0$68f$1@dough.gmane.org>
Message-ID: <CACac1F-n-1z_BZ3Nr448KRp45KhTcMpp_2E68UemhszRRDAZjQ@mail.gmail.com>

On 29 August 2011 10:39, Stefan Behnel <stefan_ml at behnel.de> wrote:
> In the CPython backend, the header files are normally #included by the
> generated C code, so they are used at C compilation time.
>
> Cython has its own view on the header files in separate declaration files
> (.pxd). Basically looks like this:
>
>     # file "mymath.pxd"
>     cdef extern from "aheader.h":
>         double PI
>         double E
>         double abs(double x)
>
> These declaration files usually only contain the parts of a header file that
> are used in the user code, either manually copied over or extracted by
> scripts (that's what I was referring to in my reply to Terry). The complete
> 'real' content of the header file is then used by the C compiler at C
> compilation time.
>
> The user code employs a "cimport" statement to import the declarations at
> Cython compilation time, e.g.
>
>     # file "mymodule.pyx"
>     cimport mymath
>     print mymath.PI + mymath.E
>
> would result in C code that #includes "aheader.h", adds the C constants "PI"
> and "E", converts the result to a Python float object and prints it out
> using the normal CPython machinery.

One thing that would make it easier for me to understand the role of
Cython in this context would be to see a simple example of the type of
"thin wrapper" we're talking about here. The above code is nearly
this, but the pyx file executes "real code".

For example, how do I simply expose pi and abs from math.h? Based on
the above, I tried a pyx file containing just the code

    cdef extern from "math.h":
        double pi
        double abs(double x)

but the resulting module exported no symbols. What am I doing wrong?
Could you show a working example of writing such a wrapper?

This is probably a bit off-topic, but it seems to me that whenever
Cython comes up in these discussions, the implications of
Cython-as-an-implementation-of-python obscure the idea of simply using
Cython as a means of writing thin library wrappers.

Just to clarify - the above code (if it works) seems to me like a nice
simple means of writing wrappers. Something involving this in a pxd
file, plus a pyx file with a whole load of dummy

    def abs(x):
        return cimported_module.abs(x)

definitions, seems ok, but annoyingly clumsy. (Particularly for big APIs).

I've kept python-dev in this response, on the assumption that others
on the list might be glad of seeing a concrete example of using Cython
to build wrapper code. But anything further should probably be taken
off-list...

Thanks,
Paul.

PS This would also probably be a useful addition to the Cython wiki
and/or the manual. I searched both and found very little other than a
page on wrapping C++ classes (which is not very helpful for simple C
global functions and constants).

From ezio.melotti at gmail.com  Mon Aug 29 13:12:20 2011
From: ezio.melotti at gmail.com (Ezio Melotti)
Date: Mon, 29 Aug 2011 14:12:20 +0300
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAP7+vJJ6mY2Qg0P91dozK-R73VzLdxzw9VdhA+nN+t8o5e+roQ@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E582432.2080301@v.loewis.de>
	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>
	<CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>
	<20110827035602.557f772f@pitrou.net>
	<CACBhJdE+y4s2gGpZJzY_eBvqmYNBGApC8ORsYMHFO5w3EOwM8Q@mail.gmail.com>
	<CAP7+vJJ6mY2Qg0P91dozK-R73VzLdxzw9VdhA+nN+t8o5e+roQ@mail.gmail.com>
Message-ID: <CACBhJdFVUo5npxe3vqZiABMTH-57DACvc+ENZBXkma6HjjRnOA@mail.gmail.com>

On Sun, Aug 28, 2011 at 7:28 AM, Guido van Rossum <guido at python.org> wrote:

>
> Are you volunteering? (Even if you don't want to be the only
> maintainer, it still sounds like you'd be a good co-maintainer of the
> regex module.)
>

My name is listed in the experts index for 're' [0], and that should already
make me a "co-maintainer" of the module.


> [...]
>
> >   4) add documentation for the module and the (public) functions in
> > Doc/library (this should be done anyway).
>
> Does regex have a significant public C interface? (_sre.c doesn't.)
> Does it have a Python-level interface beyond what re.py offers (apart
> from the obvious new flags and new regex syntax/semantics)?
>

I don't think it does.
Explaining the new syntax/semantics is useful for developers (e.g. what \p
and \X are supposed to match), but also for users, so it's fine to have this
documented in Doc/library/re.rst (and I don't think it's necessary to
duplicate it in the README/PEP/Wiki).
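
For instance, with the regex package from PyPI (behaviour as documented
there; illustrative only):

    import regex
    regex.findall(r"\p{L}+", "abc def 123")   # -> ['abc', 'def']
    regex.findall(r"\X", "e\u0301x")          # -> ['e\u0301', 'x']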


>
> > This will ensure that the general quality of the code is good, and when
> > someone actually has to work on the code, there's enough documentation to
> > make it possible.
>
> That sounds like a good description of a process that could lead to
> acceptance of regex as a re replacement.
>
>
So if we want to get this done, I think we need Matthew for 1) (unless
someone else wants to do it and have him review the result).
If making a diff against the current re is doable and makes sense, we can use
the Rietveld instance on the bug tracker to do the review for 2).  The
same could be done with a diff that replaces the whole module, though.
3) will follow after 2), and 4) is not difficult and can be done when we
actually replace re (it's probably enough to reorganize a bit and convert
the page on PyPI to reST).

Best Regards,
Ezio Melotti

[0]: http://docs.python.org/devguide/experts.html#stdlib

From solipsis at pitrou.net  Mon Aug 29 14:14:40 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Aug 2011 14:14:40 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20110829141440.2e2178c6@pitrou.net>

On Mon, 29 Aug 2011 12:43:24 +0900
"Stephen J. Turnbull" <stephen at xemacs.org> wrote:
> 
> Since when can s[0] represent a code point outside the BMP, for s a
> Unicode string in a narrow build?
> 
> Remember, the UCS-2/narrow vs. UCS-4/wide distinction is *not* about
> what Python supports vs. the outside world.  It's about what the str/
> unicode type is an array of.

Why would that be?

Antoine.



From solipsis at pitrou.net  Mon Aug 29 14:20:15 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Aug 2011 14:20:15 +0200
Subject: [Python-Dev] Software Transactional Memory for Python
References: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
	<CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>
	<CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>
	<CAP7+vJJY3UwBZdFnCjAP830-yRrcaWYc4gP2duk79Z9f_AkU2A@mail.gmail.com>
Message-ID: <20110829142015.5eb247dc@pitrou.net>

On Sun, 28 Aug 2011 09:43:33 -0700
Guido van Rossum <guido at python.org> wrote:
> 
> This sounds like a very interesting idea to pursue, even if it's late,
> and even if it's experimental, and even if it's possible to cause
> deadlocks (no news there). I propose that we offer a C API in Python
> 3.3 as well as an extension module that offers the proposed decorator.
> The C API could then be used to implement alternative APIs purely as
> extension modules (e.g. would a deadlock-detecting API be possible?).

We could offer the C API without shipping an extension module ourselves.
I don't think we should provide (and maintain!) a Python API that helps
users put themselves in all kinds of nasty situations. There is enough
misunderstanding around the GIL and multithreading already.

Regards

Antoine.



From barry at python.org  Mon Aug 29 14:30:29 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 29 Aug 2011 08:30:29 -0400
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <CANF4RMmKLXbmE04NBi+s9n+-GtXhWS-1bv3TCDL8XzBL4NeVXQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<CANF4RMmKLXbmE04NBi+s9n+-GtXhWS-1bv3TCDL8XzBL4NeVXQ@mail.gmail.com>
Message-ID: <20110829083029.68faa57b@resist.wooz.org>

On Aug 27, 2011, at 10:36 PM, Nadeem Vawda wrote:

>I talked to Antoine about this on IRC; he didn't seem to think a PEP would be
>necessary. But a summary of the discussion on the tracker issue might still
>be a useful thing to have, given how long it's gotten.

I agree with Antoine - no PEP should be necessary.  A well-reviewed and tested
module should do it.

-Barry

From dave at dabeaz.com  Mon Aug 29 14:41:23 2011
From: dave at dabeaz.com (David Beazley)
Date: Mon, 29 Aug 2011 07:41:23 -0500
Subject: [Python-Dev] SWIG (was Re: Ctypes and the stdlib)
In-Reply-To: <mailman.2419.1314608606.27777.python-dev@python.org>
References: <mailman.2419.1314608606.27777.python-dev@python.org>
Message-ID: <A0128757-0E04-4D97-B27E-DA4AC500FFB1@dabeaz.com>

On Mon, Aug 29, 2011 at 12:27 PM, Guido van Rossum <guido at python.org> wrote:

> I wonder if for
> this particular purpose SWIG isn't the better match. (If SWIG weren't
> universally hated, even by its original author. :-)

Hate is probably a strong word, but as the author of Swig, let me chime in here ;-).   I think there are probably some lessons to be learned from Swig.

As Nick noted, Swig is best suited when you have control over both sides (C/C++ and Python) of whatever code you're working with.  In fact, the original motivation for  Swig was to give application programmers (scientists in my case), a means for automatically generating the Python bindings to their code.  However, there was one other important assumption--and that was the fact that all of your "real code" was going to be written in C/C++ and that the Python scripting interface was just an optional add-on (perhaps even just a throw-away thing).  Keep in mind, Swig was first created in 1995 and at that time, the use of Python (or any similar language) was a pretty radical idea in the sciences.  Moreover, there was a lot of legacy code that people just weren't going to abandon.  Thus, I always viewed Swig as a kind of transitional vehicle for getting people to use Python who might otherwise not even consider it.   Getting back to Nick's point though, to really use Swig effectively, it was always known that you might have to reorganize or refactor your C/C++ code to make it more Python friendly.  However, due to the automatic wrapper generation, you didn't have to do it all at once.  Basically your code could organically evolve and Swig would just keep up with whatever you were doing.  In my projects, we'd usually just tuck Swig away in some Makefile somewhere and forget about it.

One of the major complexities of Swig is the fact that it attempts to parse C/C++ header files.   This very notion is actually a dangerous trap waiting for anyone who wants to wander into it.  You might look at a header file and say, well how hard could it be to just grab a few definitions out of there?   I'll just write a few regexes or come up with some simple hack for recognizing function definitions or something.   Yes, you can do that, but you're immediately going to find that whatever approach you take starts to break down into horrible corner cases.   Swig started out like this and quickly turned into a quagmire of esoteric bug reports.  All sorts of problems with preprocessor macros, typedefs, missing headers, and other things.  For a while, I would get these bug reports that would go something like "I had this C++ class inside a namespace with an abstract method taking a typedef'd const reference to this smart pointer ..... and Swig broke."   Hell, I can't even understand the bug report let alone know how to fix it.  Almost all of these bugs were due to the fact that Swig started out as a hack and didn't really have any kind of solid conceptual foundation for how it should be put together.

If you flash forward a bit, from about 2001-2004 there was a very serious push to fix these kinds of issues.  Although it was not a complete rewrite of Swig, there were a huge number of changes to how it worked during this time.  Swig grew a fully compatible C++ preprocessor that fully supported macros.  A complete C++ type system was implemented including support for namespaces, templates, and even such things as template partial specialization.  Swig evolved into a multi-pass compiler that was doing all sorts of global analysis of the interface.   Just to give you an idea, Swig would do things such as automatically detect/wrap C++ smart pointers.  It could wrap overloaded C++ methods/functions.  Also, if you had a C++ class with virtual methods, it would only make one Python wrapper function and then reuse it across all wrapped subclasses. 

Under the covers of all of this, the implementation basically evolved into a sophisticated macro preprocessor coupled with a pattern matching engine built on top of the C++ type system.   For example, you could write patterns that matched specific C++ types (the much hated "typemap" feature) and you could write patterns that matched entire C++ declarations.  This whole pattern matching approach had huge power if you knew what you were doing.  For example, I had a graduate student working on adding "contracts" to Swig--something that was being funded by an NSF grant.   It was cool and mind boggling all at once.

In hindsight, however, I think the complexity of Swig has exceeded anyone's ability to fully understand it (including my own).  For example, to even make sense of what's happening, you have to have a pretty solid grasp of the C/C++ type system (easier said than done).   Couple that with all sorts of crazy pattern matching, low-level code fragments, and a ton of macro definitions, and your head will literally explode if you try to figure out what's happening.   So far as I know, recent versions of Swig have even combined all of this type-pattern matching with regular expressions.  I can't even fathom it.
 
Sadly, my involvement with Swig was an unfortunate casualty of my academic career biting the dust.  By 2005, I was so burned out from working on it and so sick of what I was doing, I quite literally put all of my computer stuff aside to go play in a band for a few years.   After a few years, I came back to programming (obviously), but not to keep working on the same stuff.   In particular, I will die quite happy if I never have to look at another line of C++ code ever again.  No, I would much rather fling my toddlers, ride my bike, play piano, or do just about anything than ever do that again.   Although I still subscribe to the Swig mailing lists and watch what's happening, I'm not active with it at the moment.

I've sometimes thought it might be interesting to create a Swig replacement purely in Python.  When I work on the PLY project, this is often what I think about.   In that project, I've actually built a number of the parsing tools that would be useful in creating such a thing.   The only catch is that when I start thinking along these lines, I usually reach a point where I say "nah, I'll just write the whole application in Python." 

Anyways, this is probably way more than anyone wants to know about Swig.   Getting back to the original topic of using it to make standard library modules, I just don't know.   I think you probably could have some success with an automatic code generator of some kind.  I'm just not sure it should take the Swig approach of parsing C++ headers.  I think you could do better.

Cheers,
Dave

P.S. By the way, if people want to know a lot more about Swig internals, they should check out the PyCon 2008 presentation I gave about it.  http://www.dabeaz.com/SwigMaster/


From greg at krypto.org  Mon Aug 29 14:51:42 2011
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 29 Aug 2011 05:51:42 -0700
Subject: [Python-Dev] Software Transactional Memory for Python
In-Reply-To: <20110829142015.5eb247dc@pitrou.net>
References: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
	<CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>
	<CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>
	<CAP7+vJJY3UwBZdFnCjAP830-yRrcaWYc4gP2duk79Z9f_AkU2A@mail.gmail.com>
	<20110829142015.5eb247dc@pitrou.net>
Message-ID: <CAGE7PNJupnf3t5gsAzFnow6dC_9uk4KeYoMqThQdapER9wi1yA@mail.gmail.com>

On Mon, Aug 29, 2011 at 5:20 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Sun, 28 Aug 2011 09:43:33 -0700
> Guido van Rossum <guido at python.org> wrote:
> >
> > This sounds like a very interesting idea to pursue, even if it's late,
> > and even if it's experimental, and even if it's possible to cause
> > deadlocks (no news there). I propose that we offer a C API in Python
> > 3.3 as well as an extension module that offers the proposed decorator.
> > The C API could then be used to implement alternative APIs purely as
> > extension modules (e.g. would a deadlock-detecting API be possible?).
>
> We could offer the C API without shipping an extension module ourselves.
> I don't think we should provide (and maintain!) a Python API that helps
> users put themselves in all kinds of nasty situations. There is enough
> misunderstanding around the GIL and multithreading already.
>

+1

From arigo at tunes.org  Mon Aug 29 14:57:12 2011
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 29 Aug 2011 14:57:12 +0200
Subject: [Python-Dev] Software Transactional Memory for Python
In-Reply-To: <CAH_1eM2Pj_AWrs_aKjgPsRtgvupM=FC9d7QMa9wZTwkG0dFe5Q@mail.gmail.com>
References: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
	<CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>
	<CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>
	<CAMSv6X3DRGuA+L-3yZ9Ozo_bj7QpTn+qfONrK2b=mCSJKA5jiQ@mail.gmail.com>
	<CAH_1eM2Pj_AWrs_aKjgPsRtgvupM=FC9d7QMa9wZTwkG0dFe5Q@mail.gmail.com>
Message-ID: <CAMSv6X2eYWVP_wgPjZtYFwRZAcDwF8a52g9Uvi1M8UkQVbqgAQ@mail.gmail.com>

Hi Charles-François,

2011/8/27 Charles-François Natali <neologix at free.fr>:
> The problem is that many locks are actually acquired implicitely.
> For example, `print` to a buffered stream will acquire the fileobject's mutex.

Indeed.  After looking more at the kind of locks used throughout the
stdlib, I notice that in many cases a lock is acquired by code in the
following simple pattern:

    Py_BEGIN_ALLOW_THREADS
    PyThread_acquire_lock(self->lock, 1);
    Py_END_ALLOW_THREADS

If one thread is waiting in the END_ALLOW_THREADS for another one to
release the GIL, but the other one is in a "with atomic" block and
tries to acquire the same lock, deadlock.  But the issue can be
resolved: the first thread in the above example needs to notice that
the other thread is in a "with atomic" block, and "be nice" and
release the lock again.  Then it waits until the "with atomic" block
finishes, and tries again from the start.

We could do this by putting the above pattern in its own function (which
makes some sense anyway, because the pattern is repeated left and
right, and is often complicated by an additional "if
(!PyThread_acquire_lock(self->lock, 0))" before); and then allowing
that function to be overridden by the external 'stm' module.
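
In pure-Python terms, the "be nice" dance could look roughly like this
(a sketch only: in_atomic_block() and the explicit 'gil' handle are
hypothetical stand-ins for the interpreter-level machinery; the real
helper would live in C):

    def acquire_lock_nicely(lock, gil, in_atomic_block):
        while True:
            gil.release()                # Py_BEGIN_ALLOW_THREADS
            lock.acquire()
            if not in_atomic_block():
                gil.acquire()            # Py_END_ALLOW_THREADS
                return
            # Another thread is inside a "with atomic" block: be nice,
            # give the lock back, wait for the block to finish, and
            # try again from the start.
            lock.release()
            while in_atomic_block():
                pass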

I suspect that I need to do a more thorough review of the stdlib to
make sure (at least more than now) that all potential deadlocking
places can be avoided with a similar refactoring.  All in all, it
seems that the patch to CPython itself will need to be more than just
the few lines in ceval.c --- but still very reasonable both in size and
in content.


A bientôt,

Armin.

From barry at python.org  Mon Aug 29 15:00:56 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 29 Aug 2011 09:00:56 -0400
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <4E59255E.6000905@v.loewis.de>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E582432.2080301@v.loewis.de>
	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>
	<CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>
	<4E588877.3080204@v.loewis.de> <20110827121012.37b39947@pitrou.net>
	<4E59255E.6000905@v.loewis.de>
Message-ID: <20110829090056.03f719ad@resist.wooz.org>

On Aug 27, 2011, at 07:11 PM, Martin v. Löwis wrote:

>A PEP should IMO only cover end-user aspects of the new re module.
>Code organization is typically not in the PEP. To give a specific
>example: you mentioned that there is (near) code duplication in
>MRAB's module. As a reviewer, I would discuss whether this can be
>eliminated - but not in the PEP.

+1

-Barry


From benjamin at python.org  Mon Aug 29 15:20:56 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 29 Aug 2011 09:20:56 -0400
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
 support in 3.3)
In-Reply-To: <9FA8683B-FB0A-4F46-878F-11B36F92A342@twistedmatrix.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org> <j3e137$q0g$1@dough.gmane.org>
	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
	<9FA8683B-FB0A-4F46-878F-11B36F92A342@twistedmatrix.com>
Message-ID: <CAPZV6o8GnJrk_q+9NfEEmN34z_0f==L92ZEGhfvn=AxwHSPT_Q@mail.gmail.com>

2011/8/29 Glyph Lefkowitz <glyph at twistedmatrix.com>:
>
> On Aug 28, 2011, at 7:27 PM, Guido van Rossum wrote:
>
> In general, an existing library cannot be called
> without access to its .h files -- there are probably struct and
> constant definitions, platform-specific #ifdefs and #defines, and
> other things in there that affect the linker-level calling conventions
> for the functions in the library.
>
> Unfortunately I don't know a lot about this, but I keep hearing about
> something called "rffi" that PyPy uses to call C from RPython:
> <http://readthedocs.org/docs/pypy/en/latest/rffi.html>.  This has some
> shortcomings currently, most notably the fact that it needs those .h files
> (and therefore a C compiler) at runtime

This is incorrect. rffi is actually quite like ctypes. The part you
are referring to is probably rffi_platform [1], which invokes the
compiler to determine constant values and struct offsets, or
ctypes_configure, which does need runtime headers [2].

[1] https://bitbucket.org/pypy/pypy/src/92e36ab4eb5e/pypy/rpython/tool/rffi_platform.py

[2] https://bitbucket.org/pypy/pypy/src/92e36ab4eb5e/ctypes_configure/

-- 
Regards,
Benjamin

From barry at python.org  Mon Aug 29 15:33:05 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 29 Aug 2011 09:33:05 -0400
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<CAGGBd_pR8Fi1bSvriH756thnAwm8Y1EB9tfuLevPWK+bDmz-2Q@mail.gmail.com>
	<20110827020835.08a2a492@pitrou.net>
	<CAGGBd_rNnB7+uTUqMOFfGiBZ=QN8hxMxft8fCAi+TbhYTDw4hw@mail.gmail.com>
Message-ID: <20110829093305.256a6e6b@resist.wooz.org>

On Aug 26, 2011, at 05:25 PM, Dan Stromberg wrote:

>from __future__ import is an established way of trying something for a while
>to see if it's going to work.

Actually, no.

The documentation says:

-----snip snip-----
__future__ is a real module, and serves three purposes:

* To avoid confusing existing tools that analyze import statements and expect
  to find the modules they're importing.
* To ensure that future statements run under releases prior to 2.1 at least
  yield runtime exceptions (the import of __future__ will fail, because there
  was no module of that name prior to 2.1).
* To document when incompatible changes were introduced, and when they will be
  -- or were -- made mandatory. This is a form of executable documentation, and
  can be inspected programmatically via importing __future__ and examining its
  contents.
-----snip snip-----
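
The third point is easy to see at the interpreter prompt:

    >>> import __future__
    >>> __future__.division
    _Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192)
    >>> __future__.division.getMandatoryRelease()
    (3, 0, 0, 'alpha', 0)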

So, really the __future__ module is a way to introduce accepted but
incompatible changes in a controlled way, through successive releases.  It's
never been used to introduce experimental features that might be removed if
they don't work out.

Cheers,
-Barry

From barry at python.org  Mon Aug 29 15:41:15 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 29 Aug 2011 09:41:15 -0400
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <87fwknv92x.fsf@benfinney.id.au>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E581986.3000709@egenix.com>
	<CAP7+vJ+wV8GS=O1ATMKXatCy4vDuo_Rt33uya+uC7pSP+owFhQ@mail.gmail.com>
	<4E58258F.9050204@egenix.com> <87mxevvea5.fsf@benfinney.id.au>
	<4E584E52.1080606@pearwood.info> <87fwknv92x.fsf@benfinney.id.au>
Message-ID: <20110829094115.4721b7f9@resist.wooz.org>

On Aug 27, 2011, at 01:15 PM, Ben Finney wrote:

>My question is directed more to M-A Lemburg's passage above, and its
>implicit assumption that the user understands the changes between
>'Unicode 2.0/3.0 semantics' and 'Unicode 6 semantics', and how their own
>needs relate to those semantics.

More likely, it'll be a choice between wanting Unicode 6 semantics, and "don't
care".  So the PEP could include some clues as to why you'd care to use regex
instead of re.

-Barry

From greg at krypto.org  Mon Aug 29 15:44:27 2011
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 29 Aug 2011 06:44:27 -0700
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <A659EA9B-6AC4-414D-B2C5-262792C0ADA0@celeryproject.org>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
	<CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
	<CACQrdOks88HRc-eNDdMWF8paLHDx4xaBd5YB9A1BrhFeUkvd4g@mail.gmail.com>
	<20110826175336.3af6be57@pitrou.net>
	<A659EA9B-6AC4-414D-B2C5-262792C0ADA0@celeryproject.org>
Message-ID: <CAGE7PN+4q-yenfTUfbpHhLta966ZesqbL2mv8VD4SMLmu1K0Gg@mail.gmail.com>

On Sat, Aug 27, 2011 at 2:59 AM, Ask Solem <ask at celeryproject.org> wrote:

>
> On 26 Aug 2011, at 16:53, Antoine Pitrou wrote:
>
> >
> > Hi,
> >
> >> I think that "deprecating" the use of threads w/ multiprocessing - or
> >> at least crippling it is the wrong answer. Multiprocessing needs the
> >> helper threads it uses internally to manage queues, etc. Removing that
> >> ability would require a near-total rewrite, which is just a
> >> non-starter.
> >
> > I agree that this wouldn't actually benefit anyone.
> > Besides, I don't think it's even possible to avoid threads in
> > multiprocessing, given the various constraints. We would have to force
> > the user to run their main thread in an event loop, and that would be
> > twisted (tm).
> >
> >> I would focus on the atfork() patch more directly, ignoring
> >> multiprocessing in the discussion, and focusing on the merits of gps'
> >> initial proposal and patch.
> >
> > I think this could also be combined with Charles-François' patch.
> >
> > Regards
>
>
>
> Have to agree with Jesse and Antoine here.
>
> Celery (celeryproject.org) uses multiprocessing, is widely used in
> production,
> and is regarded as stable software that has been known to run for months
> at a time
> only to be restarted for software upgrades.
>
> I have been investigating an issue for some time, that I'm pretty sure is
> caused
> by this.  It occurs only rarely, so rarely I have not had any actual bug
> reports
> about it, it's just something I have experienced during extensive testing.
> The tone of the discussion on the bug tracker makes me think that I have
> been very lucky :-)
>
> Using the fork+exec approach seems like a much more realistic solution
> than rewriting multiprocessing.Pool and Manager to not use threads. In fact
> this is something I have been considering as a fix for the suspected
> issue for some time.
> It does have implications that are annoying for sure, but we are already
> used to this on the Windows platform (it could help portability even).
>

+3 (agreed to Jesse, Antoine and Ask here).  The
http://bugs.python.org/issue8713 described "non-fork" implementation that
always uses subprocesses rather than plain forked processes is the right way
forward for multiprocessing.

-gps

From stefan_ml at behnel.de  Mon Aug 29 16:14:53 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 29 Aug 2011 16:14:53 +0200
Subject: [Python-Dev] Cython, ctypes and the stdlib
In-Reply-To: <CACac1F-n-1z_BZ3Nr448KRp45KhTcMpp_2E68UemhszRRDAZjQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>	<4E5951D5.5020200@v.loewis.de>	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>	<20110828002642.4765fc89@pitrou.net>	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>	<20110828012705.523e51d4@pitrou.net>	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>	<j3chv2$tvf$1@dough.gmane.org>
	<j3e137$q0g$1@dough.gmane.org>	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>	<j3fmo0$68f$1@dough.gmane.org>
	<CACac1F-n-1z_BZ3Nr448KRp45KhTcMpp_2E68UemhszRRDAZjQ@mail.gmail.com>
Message-ID: <j3g6st$kcb$1@dough.gmane.org>

Hi,

I agree that this is getting off-topic for this list. I'm answering here in
some detail to shed a bit of light on thin vs. thick wrappers, but please
move further usage-related questions to the cython-users mailing list.

Paul Moore, 29.08.2011 12:37:
> On 29 August 2011 10:39, Stefan Behnel wrote:
>> In the CPython backend, the header files are normally #included by the
>> generated C code, so they are used at C compilation time.
>>
>> Cython has its own view on the header files in separate declaration files
>> (.pxd). Basically looks like this:
>>
>>     # file "mymath.pxd"
>>     cdef extern from "aheader.h":
>>         double PI
>>         double E
>>         double abs(double x)
>>
>> These declaration files usually only contain the parts of a header file that
>> are used in the user code, either manually copied over or extracted by
>> scripts (that's what I was referring to in my reply to Terry). The complete
>> 'real' content of the header file is then used by the C compiler at C
>> compilation time.
>>
>> The user code employs a "cimport" statement to import the declarations at
>> Cython compilation time, e.g.
>>
>>     # file "mymodule.pyx"
>>     cimport mymath
>>     print mymath.PI + mymath.E
>>
>> would result in C code that #includes "aheader.h", adds the C constants "PI"
>> and "E", converts the result to a Python float object and prints it out
>> using the normal CPython machinery.
>
> One thing that would make it easier for me to understand the role of
> Cython in this context would be to see a simple example of the type of
> "thin wrapper" we're talking about here. The above code is nearly
> this, but the pyx file executes "real code".

Yes, that's the idea. If all you want is an exact, thin wrapper, you are 
better off with SWIG (well, assuming that performance is not important for 
you - Cython is a *lot* faster). But if you use it, or any other plain glue 
code generator, chances are that you will quickly learn that you do not 
actually want a thin wrapper. Instead, you want something that makes the 
external library easily and efficiently usable from Python code. Which 
means that the wrapper will be thin in some places and thick in others, 
sometimes very thick in selected places, and usually growing thicker over time.

You can do this by using a glue code generator and writing the rest in a 
Python wrapper on top of the thin glue code. It's just that Cython makes 
such a wrapper much more efficient (for CPython), be it in terms of CPU 
performance (fast Python interaction, overhead-free C interaction, native C 
data type support, various Python code optimisations), or in terms of 
parallelisation support (explicit GIL-free threading and OpenMP), or just 
general programmer efficiency, e.g. regarding automatic data conversion or 
ease and safety of manual C memory management.


> For example, how do I simply expose pi and abs from math.h? Based on
> the above, I tried a pyx file containing just the code
>
>      cdef extern from "math.h":
>          double pi
>          double abs(double x)
>
> but the resulting module exported no symbols.

Recent Cython versions have support for directly exporting C values (e.g. 
enum values) at the Python module level. However, the normal way is to 
explicitly implement the module API as you guessed, i.e.

     cimport mydecls   # assuming there is a mydecls.pxd

     PI = mydecls.PI
     def abs(x):
         return mydecls.abs(x)

Looks simple, right? Nothing interesting here, until you start putting 
actual code into it, as in this (totally contrived and untested, but much 
more correct) example:

     from libc cimport math

     cdef extern from *:
         # these are defined by the always included Python.h:
         long LONG_MAX, LONG_MIN

     def abs(x):
         if isinstance(x, float):    # -> C double
             return math.fabs(x)
         elif isinstance(x, int):    # -> may or may not be a C integer
             if LONG_MIN <= x <= LONG_MAX:
                 return <unsigned long> math.labs(x)
             else:
                 # either within "long long" or raise OverflowError
                 return <unsigned long long> math.llabs(x)
         else:
             # assume it can at least coerce to a C long,
             # or raise ValueError or OverflowError or whatever
             return <unsigned long> math.labs(x)

BTW, some simple templating/generics-like type merging support is
upcoming in a GSoC project to simplify this kind of type-specific code.


> This is probably a bit off-topic, but it seems to me that whenever
> Cython comes up in these discussions, the implications of
> Cython-as-an-implementation-of-python obscure the idea of simply using
> Cython as a means of writing thin library wrappers.

Cython is not a glue code generator, it's a full-fledged programming 
language. It's Python, with additional support for C data types. That makes 
it great for writing non-trivial wrappers between Python and C. It's not so 
great for the trivial cases, but luckily, those are rare. ;)


> I've kept python-dev in this response, on the assumption that others
> on the list might be glad of seeing a concrete example of using Cython
> to build wrapper code. But anything further should probably be taken
> off-list...

Agreed. The best place for asking about Cython usage is the cython-users 
mailing list.


> PS This would also probably be a useful addition to the Cython wiki
> and/or the manual. I searched both and found very little other than a
> page on wrapping C++ classes (which is not very helpful for simple C
> global functions and constants).

Hmm, ok, I guess that's because it's too simple (you actually guessed how 
it works) and a somewhat rare use case. In most cases, wrappers tend to use 
extension types, as presented here:

http://docs.cython.org/src/tutorial/clibraries.html

Stefan


From barry at python.org  Mon Aug 29 18:24:20 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 29 Aug 2011 12:24:20 -0400
Subject: [Python-Dev] PEP categories (was Re:  PEP 393 review)
In-Reply-To: <CAKmKYaDrHRDf+xZYjf0HdQSiAjQ+209m3y5TLv0yLSFxXOO3Mw@mail.gmail.com>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<4E5606C7.9000404@v.loewis.de> <20110825132734.1c236d17@pitrou.net>
	<4E5A9B39.8090009@v.loewis.de>
	<CAKmKYaDrHRDf+xZYjf0HdQSiAjQ+209m3y5TLv0yLSFxXOO3Mw@mail.gmail.com>
Message-ID: <20110829122420.2d342f9c@resist.wooz.org>

On Aug 29, 2011, at 11:03 AM, Dirkjan Ochtman wrote:

>Also, this PEP makes me wonder if there should be a way to distinguish
>between language PEPs and (CPython) implementation PEPs, by adding a
>tag or using the PEP number ranges somehow.

I've thought about this, and about a similar split between language changes
and stdlib changes (i.e. new modules such as regex).  Probably the best thing
to do would be to allocate some 1000-number ranges to the different
categories, like we did for the 3xxx Python 3k PEPs (now largely moot,
though).

-Barry


From neologix at free.fr  Mon Aug 29 18:24:29 2011
From: neologix at free.fr (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Mon, 29 Aug 2011 18:24:29 +0200
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <CAGE7PN+4q-yenfTUfbpHhLta966ZesqbL2mv8VD4SMLmu1K0Gg@mail.gmail.com>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
	<CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
	<CACQrdOks88HRc-eNDdMWF8paLHDx4xaBd5YB9A1BrhFeUkvd4g@mail.gmail.com>
	<20110826175336.3af6be57@pitrou.net>
	<A659EA9B-6AC4-414D-B2C5-262792C0ADA0@celeryproject.org>
	<CAGE7PN+4q-yenfTUfbpHhLta966ZesqbL2mv8VD4SMLmu1K0Gg@mail.gmail.com>
Message-ID: <CAH_1eM1jGaf1TirSLv6GyY8bKbUJnezU1+qBNANHKA5G=NfnNA@mail.gmail.com>

> +3 (agreed to Jesse, Antoine and Ask here).
> The http://bugs.python.org/issue8713 described "non-fork" implementation
> that always uses subprocesses rather than plain forked processes is the
> right way forward for multiprocessing.

I see two drawbacks:
- it will be slower, since the interpreter startup time is
non-negligible (well, normally you shouldn't spawn a new process for
every item, but it should be noted)
- it'll consume more memory, since we lose the COW advantage (even
though it's already limited by the fact that even treating a variable
read-only can trigger an incref, as was noted in a previous thread)

cf

From dirkjan at ochtman.nl  Mon Aug 29 18:38:23 2011
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Mon, 29 Aug 2011 18:38:23 +0200
Subject: [Python-Dev] PEP categories (was Re: PEP 393 review)
In-Reply-To: <20110829122420.2d342f9c@resist.wooz.org>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<4E5606C7.9000404@v.loewis.de> <20110825132734.1c236d17@pitrou.net>
	<4E5A9B39.8090009@v.loewis.de>
	<CAKmKYaDrHRDf+xZYjf0HdQSiAjQ+209m3y5TLv0yLSFxXOO3Mw@mail.gmail.com>
	<20110829122420.2d342f9c@resist.wooz.org>
Message-ID: <CAKmKYaBUdxystWO3mL2z-0m17GZqKRvy4kCbQQWE-pcCCOOAFw@mail.gmail.com>

On Mon, Aug 29, 2011 at 18:24, Barry Warsaw <barry at python.org> wrote:
>>Also, this PEP makes me wonder if there should be a way to distinguish
>>between language PEPs and (CPython) implementation PEPs, by adding a
>>tag or using the PEP number ranges somehow.
>
> I've thought about this, and about a similar split between language changes
> and stdlib changes (i.e. new modules such as regex).  Probably the best thing
> to do would be to allocate some 1000's to the different categories, like we
> did for the 3xxx Python 3k PEPs (now largely moot though).

Allocating 1000's seems sensible enough to me.

And yes, the division between recent 3xxx and non-3xxx PEPs seems quite arbitrary.

Cheers,

Dirkjan

P.S. Perhaps the index could list accepted and open PEPs before meta
and informational? And maybe reverse the order under some headings,
for example in the finished category...

From solipsis at pitrou.net  Mon Aug 29 18:40:37 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Aug 2011 18:40:37 +0200
Subject: [Python-Dev] PEP categories (was Re: PEP 393 review)
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<4E5606C7.9000404@v.loewis.de> <20110825132734.1c236d17@pitrou.net>
	<4E5A9B39.8090009@v.loewis.de>
	<CAKmKYaDrHRDf+xZYjf0HdQSiAjQ+209m3y5TLv0yLSFxXOO3Mw@mail.gmail.com>
	<20110829122420.2d342f9c@resist.wooz.org>
	<CAKmKYaBUdxystWO3mL2z-0m17GZqKRvy4kCbQQWE-pcCCOOAFw@mail.gmail.com>
Message-ID: <20110829184037.594359f0@pitrou.net>

On Mon, 29 Aug 2011 18:38:23 +0200
Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:
> On Mon, Aug 29, 2011 at 18:24, Barry Warsaw <barry at python.org> wrote:
> >>Also, this PEP makes me wonder if there should be a way to distinguish
> >>between language PEPs and (CPython) implementation PEPs, by adding a
> >>tag or using the PEP number ranges somehow.
> >
> > I've thought about this, and about a similar split between language changes
> > and stdlib changes (i.e. new modules such as regex).  Probably the best thing
> > to do would be to allocate some 1000's to the different categories, like we
> > did for the 3xxx Python 3k PEPs (now largely moot though).
> 
> Allocating 1000's seems sensible enough to me.
> 
> And yes, the division between recent 3xxx and non-3xxx PEPs seems quite arbitrary.

I like the 3k numbers myself :))




From stefan_ml at behnel.de  Mon Aug 29 18:55:00 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 29 Aug 2011 18:55:00 +0200
Subject: [Python-Dev] PEP categories (was Re:  PEP 393 review)
In-Reply-To: <20110829122420.2d342f9c@resist.wooz.org>
References: <4E553FBC.7080501@v.loewis.de>
	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>
	<20110825132734.1c236d17@pitrou.net>	<4E5A9B39.8090009@v.loewis.de>	<CAKmKYaDrHRDf+xZYjf0HdQSiAjQ+209m3y5TLv0yLSFxXOO3Mw@mail.gmail.com>
	<20110829122420.2d342f9c@resist.wooz.org>
Message-ID: <j3gg94$rg1$1@dough.gmane.org>

Barry Warsaw, 29.08.2011 18:24:
> On Aug 29, 2011, at 11:03 AM, Dirkjan Ochtman wrote:
>
>> Also, this PEP makes me wonder if there should be a way to distinguish
>> between language PEPs and (CPython) implementation PEPs, by adding a
>> tag or using the PEP number ranges somehow.
>
> I've thought about this, and about a similar split between language changes
> and stdlib changes (i.e. new modules such as regex).  Probably the best thing
> to do would be to allocate some 1000's to the different categories, like we
> did for the 3xxx Python 3k PEPs (now largely moot though).

These things tend to get somewhat clumsy over time, though. What about a 
stdlib change that only applies to CPython for some reason, e.g. because no 
other implementation currently has that module?

I think it's ok to make a coarse-grained distinction by numbers, but there 
should also be a way to tag PEPs textually.

Stefan


From jnoller at gmail.com  Mon Aug 29 19:03:53 2011
From: jnoller at gmail.com (Jesse Noller)
Date: Mon, 29 Aug 2011 13:03:53 -0400
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <CAH_1eM1jGaf1TirSLv6GyY8bKbUJnezU1+qBNANHKA5G=NfnNA@mail.gmail.com>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
	<CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
	<CACQrdOks88HRc-eNDdMWF8paLHDx4xaBd5YB9A1BrhFeUkvd4g@mail.gmail.com>
	<20110826175336.3af6be57@pitrou.net>
	<A659EA9B-6AC4-414D-B2C5-262792C0ADA0@celeryproject.org>
	<CAGE7PN+4q-yenfTUfbpHhLta966ZesqbL2mv8VD4SMLmu1K0Gg@mail.gmail.com>
	<CAH_1eM1jGaf1TirSLv6GyY8bKbUJnezU1+qBNANHKA5G=NfnNA@mail.gmail.com>
Message-ID: <CACQrdOni_HGkd7AJGSU4HtAbfTaYWGEZbK1tBNpKUMiJywRYxg@mail.gmail.com>

2011/8/29 Charles-François Natali <neologix at free.fr>:
>> +3 (agreed to Jesse, Antoine and Ask here).
>> The http://bugs.python.org/issue8713 described "non-fork" implementation
>> that always uses subprocesses rather than plain forked processes is the
>> right way forward for multiprocessing.
>
> I see two drawbacks:
> - it will be slower, since the interpreter startup time is
> non-negligible (well, normally you shouldn't spawn a new process for
> every item, but it should be noted)

Yes; but spawning and forking are both slow to begin with - it's
documented (I hope heavily enough) that you should spawn
multiprocessing children early, and keep them around instead of
constantly creating/destroying them.

> - it'll consume more memory, since we lose the COW advantage (even
> though it's already limited by the fact that even treating a variable
> read-only can trigger an incref, as was noted in a previous thread)
>
> cf

Yes, it would consume slightly more memory; but the benefit - making
it consistent across *all* platforms with the *same* restrictions -
gets us closer to the principle of least surprise.
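
A minimal sketch of the "spawn early and reuse" advice from above
(hypothetical worker function; the pool size is arbitrary):

    from multiprocessing import Pool

    def work(item):
        return item * item        # stand-in for real per-item work

    if __name__ == "__main__":
        pool = Pool(processes=4)              # pay the startup cost once
        print(pool.map(work, range(10)))      # items reuse the same workers
        pool.close()
        pool.join()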

From eliben at gmail.com  Mon Aug 29 19:14:13 2011
From: eliben at gmail.com (Eli Bendersky)
Date: Mon, 29 Aug 2011 20:14:13 +0300
Subject: [Python-Dev] SWIG (was Re: Ctypes and the stdlib)
In-Reply-To: <A0128757-0E04-4D97-B27E-DA4AC500FFB1@dabeaz.com>
References: <mailman.2419.1314608606.27777.python-dev@python.org>
	<A0128757-0E04-4D97-B27E-DA4AC500FFB1@dabeaz.com>
Message-ID: <CAF-Rda_16bGNCosOzFH0-tBOp6cTkKRPu7p+z1dQSiOrcL7NMQ@mail.gmail.com>

<snip>

> I've sometimes thought it might be interesting to create a Swig replacement
> purely in Python.  When I work on the PLY project, this is often what I
> think about.   In that project, I've actually built a number of the parsing
> tools that would be useful in creating such a thing.   The only catch is
> that when I start thinking along these lines, I usually reach a point where
> I say "nah, I'll just write the whole application in Python."
>
> Anyways, this is probably way more than anyone wants to know about Swig.
> Getting back to the original topic of using it to make standard library
> modules, I just don't know.   I think you probably could have some success
> with an automatic code generator of some kind.  I'm just not sure it should
> take the Swig approach of parsing C++ headers.  I think you could do better.
>
>
Dave,

Having written a full C99 parser (http://code.google.com/p/pycparser/) based
on your (excellent) PLY library, my impression is that the problem is with
the problem, not with the solution. Strange sentence, I know :-) What I mean
is that parsing C++ (even its headers) is inherently hard, which is why the
solutions tend to grow so complex. Even with the modest C99, clean and
simple solutions based on theoretical approaches (like PLY with its
generated LALR parsers) tend to run into walls [*]. C++ is an order of
magnitude harder.

If I went to implement something like SWIG today, I would almost surely base
my implementation on Clang (http://clang.llvm.org/). They have a full C++
parser (carefully hand-crafted, quite admirably keeping a relatively
comprehensible code-base for such a task) used in a real compiler front-end,
and a flexible library structure aimed at creating tools. There are also
Python bindings that make it possible to do most of the interesting
Python-interface-specific work in Python: parse the C++ headers using
Clang's existing parser into ASTs, then generate ctypes / extensions from
that, *in Python*.
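
As a rough illustration (not a worked-out design), the existing Clang
Python bindings can already walk the declarations in a C header;
"mylib.h" is a placeholder, and the exact bindings API may vary between
libclang versions:

    import clang.cindex

    index = clang.cindex.Index.create()
    tu = index.parse("mylib.h", args=["-x", "c"])

    for cur in tu.cursor.get_children():
        if cur.kind == clang.cindex.CursorKind.FUNCTION_DECL:
            args = [a.type.spelling for a in cur.get_arguments()]
            print(cur.result_type.spelling, cur.spelling, args)

Starting from an AST walk like this, a generator could emit ctypes
declarations or extension-module boilerplate.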

The community is also gladly accepting contributions. I've had some fixes
committed for the Python bindings and the C interfaces that tie them to
Clang, and got the impression from Clang's core devs that further
contributions will be most welcome. So whatever is missing from the Python
bindings can be easily added.

Eli

[*]
http://eli.thegreenplace.net/2011/05/02/the-context-sensitivity-of-c%E2%80%99s-grammar-revisited/

From solipsis at pitrou.net  Mon Aug 29 19:16:08 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Aug 2011 19:16:08 +0200
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
	<CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
	<CACQrdOks88HRc-eNDdMWF8paLHDx4xaBd5YB9A1BrhFeUkvd4g@mail.gmail.com>
	<20110826175336.3af6be57@pitrou.net>
	<A659EA9B-6AC4-414D-B2C5-262792C0ADA0@celeryproject.org>
	<CAGE7PN+4q-yenfTUfbpHhLta966ZesqbL2mv8VD4SMLmu1K0Gg@mail.gmail.com>
	<CAH_1eM1jGaf1TirSLv6GyY8bKbUJnezU1+qBNANHKA5G=NfnNA@mail.gmail.com>
	<CACQrdOni_HGkd7AJGSU4HtAbfTaYWGEZbK1tBNpKUMiJywRYxg@mail.gmail.com>
Message-ID: <20110829191608.7916da73@pitrou.net>

On Mon, 29 Aug 2011 13:03:53 -0400
Jesse Noller <jnoller at gmail.com> wrote:
> 2011/8/29 Charles-François Natali <neologix at free.fr>:
> >> +3 (agreed to Jesse, Antoine and Ask here).
> >> The http://bugs.python.org/issue8713 described "non-fork" implementation
> >> that always uses subprocesses rather than plain forked processes is the
> >> right way forward for multiprocessing.
> >
> > I see two drawbacks:
> > - it will be slower, since the interpreter startup time is
> > non-negligible (well, normally you shouldn't spawn a new process for
> > every item, but it should be noted)
> 
> Yes; but spawning and forking are both slow to begin with - it's
> documented (I hope heavily enough) that you should spawn
> multiprocessing children early, and keep them around instead of
> constantly creating/destroying them.

I think fork() is quite fast on modern systems (e.g. Linux). exec() is
certainly slow, though.

The third drawback is that you are limited to picklable objects when
specifying the arguments for your child process. This can be annoying
if, for example, you wanted to pass an OS resource.

Regards

Antoine.



From jnoller at gmail.com  Mon Aug 29 19:23:20 2011
From: jnoller at gmail.com (Jesse Noller)
Date: Mon, 29 Aug 2011 13:23:20 -0400
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <20110829191608.7916da73@pitrou.net>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
	<CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
	<CACQrdOks88HRc-eNDdMWF8paLHDx4xaBd5YB9A1BrhFeUkvd4g@mail.gmail.com>
	<20110826175336.3af6be57@pitrou.net>
	<A659EA9B-6AC4-414D-B2C5-262792C0ADA0@celeryproject.org>
	<CAGE7PN+4q-yenfTUfbpHhLta966ZesqbL2mv8VD4SMLmu1K0Gg@mail.gmail.com>
	<CAH_1eM1jGaf1TirSLv6GyY8bKbUJnezU1+qBNANHKA5G=NfnNA@mail.gmail.com>
	<CACQrdOni_HGkd7AJGSU4HtAbfTaYWGEZbK1tBNpKUMiJywRYxg@mail.gmail.com>
	<20110829191608.7916da73@pitrou.net>
Message-ID: <CACQrdOmfWKRENi=K46VdHTE3pcB09ciQoXf=DjskEnswz+XZ-A@mail.gmail.com>

On Mon, Aug 29, 2011 at 1:16 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Mon, 29 Aug 2011 13:03:53 -0400
> Jesse Noller <jnoller at gmail.com> wrote:
>> 2011/8/29 Charles-François Natali <neologix at free.fr>:
>> >> +3 (agreed to Jesse, Antoine and Ask here).
>> >> The http://bugs.python.org/issue8713 described "non-fork" implementation
>> >> that always uses subprocesses rather than plain forked processes is the
>> >> right way forward for multiprocessing.
>> >
>> > I see two drawbacks:
>> > - it will be slower, since the interpreter startup time is
>> > non-negligible (well, normally you shouldn't spawn a new process for
>> > every item, but it should be noted)
>>
>> Yes; but spawning and forking are both slow to begin with - it's
>> documented (I hope heavily enough) that you should spawn
>> multiprocessing children early, and keep them around instead of
>> constantly creating/destroying them.
>
> I think fork() is quite fast on modern systems (e.g. Linux). exec() is
> certainly slow, though.
>
> The third drawback is that you are limited to picklable objects when
> specifying the arguments for your child process. This can be annoying
> if, for example, you wanted to pass an OS resource.
>
> Regards
>
> Antoine.

Yes, it is annoying; but again - this makes it more consistent with
the windows implementation. I'd rather that restriction than the
"sanitization" of the ability to use threading and multiprocessing
alongside one another.

From solipsis at pitrou.net  Mon Aug 29 19:22:53 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Aug 2011 19:22:53 +0200
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <CACQrdOmfWKRENi=K46VdHTE3pcB09ciQoXf=DjskEnswz+XZ-A@mail.gmail.com>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
	<CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
	<CACQrdOks88HRc-eNDdMWF8paLHDx4xaBd5YB9A1BrhFeUkvd4g@mail.gmail.com>
	<20110826175336.3af6be57@pitrou.net>
	<A659EA9B-6AC4-414D-B2C5-262792C0ADA0@celeryproject.org>
	<CAGE7PN+4q-yenfTUfbpHhLta966ZesqbL2mv8VD4SMLmu1K0Gg@mail.gmail.com>
	<CAH_1eM1jGaf1TirSLv6GyY8bKbUJnezU1+qBNANHKA5G=NfnNA@mail.gmail.com>
	<CACQrdOni_HGkd7AJGSU4HtAbfTaYWGEZbK1tBNpKUMiJywRYxg@mail.gmail.com>
	<20110829191608.7916da73@pitrou.net>
	<CACQrdOmfWKRENi=K46VdHTE3pcB09ciQoXf=DjskEnswz+XZ-A@mail.gmail.com>
Message-ID: <1314638573.3551.14.camel@localhost.localdomain>

On Monday 29 August 2011 at 13:23 -0400, Jesse Noller wrote:
> 
> Yes, it is annoying; but again - this makes it more consistent with
> the windows implementation. I'd rather that restriction than the
> "sanitization" of the ability to use threading and multiprocessing
> alongside one another.

That sanitization is generally useful, though. For example if you want
to use any I/O after a fork().

Regards

Antoine.



From s.brunthaler at uci.edu  Mon Aug 29 19:35:14 2011
From: s.brunthaler at uci.edu (stefan brunthaler)
Date: Mon, 29 Aug 2011 10:35:14 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
Message-ID: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>

Hi,

pretty much a year ago I wrote about the optimizations I did for my
PhD thesis that target the Python 3 series interpreters. While I got
some replies, the discussion never really picked up and no final
explicit conclusion was reached. AFAICT, because of the following two
factors, my optimizations were not that interesting for inclusion with
the distribution at that time:
a) Unladen Swallow was targeting Python 3, too.
b) My prototype did not pass the regression tests.

As of November 2010 (IIRC), Google is not supporting work on US
anymore, and the project is stalled. (If I am wrong and there is still
activity and any plans with the corresponding PEP, please let me
know.) That is why I recently spent some time fixing issues so that I
can run the regression tests. There is still some work to be done, but
by and large it should be possible to complete all regression tests in
reasonable time (and with the infrastructure in place, enabling further
optimizations later on is not a problem at all.)

So, the two big issues aside, is there any interest in incorporating
these optimizations in Python 3?

Have a nice day,
--stefan

PS: It probably is unusual, but in a part of my home page I have
created a link to indicate interest (makes both counting and voting
easier, http://www.ics.uci.edu/~sbruntha/) There were also links
indicating interest in funding the work; I have disabled these, so as
not to upset anybody or give the impression of begging for money...

From jnoller at gmail.com  Mon Aug 29 19:42:02 2011
From: jnoller at gmail.com (Jesse Noller)
Date: Mon, 29 Aug 2011 13:42:02 -0400
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <1314638573.3551.14.camel@localhost.localdomain>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
	<CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
	<CACQrdOks88HRc-eNDdMWF8paLHDx4xaBd5YB9A1BrhFeUkvd4g@mail.gmail.com>
	<20110826175336.3af6be57@pitrou.net>
	<A659EA9B-6AC4-414D-B2C5-262792C0ADA0@celeryproject.org>
	<CAGE7PN+4q-yenfTUfbpHhLta966ZesqbL2mv8VD4SMLmu1K0Gg@mail.gmail.com>
	<CAH_1eM1jGaf1TirSLv6GyY8bKbUJnezU1+qBNANHKA5G=NfnNA@mail.gmail.com>
	<CACQrdOni_HGkd7AJGSU4HtAbfTaYWGEZbK1tBNpKUMiJywRYxg@mail.gmail.com>
	<20110829191608.7916da73@pitrou.net>
	<CACQrdOmfWKRENi=K46VdHTE3pcB09ciQoXf=DjskEnswz+XZ-A@mail.gmail.com>
	<1314638573.3551.14.camel@localhost.localdomain>
Message-ID: <CACQrdO=wfH5eNkSBUd0okyGnA69xV5zA9fx=Gify_Put69RvDg@mail.gmail.com>

On Mon, Aug 29, 2011 at 1:22 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Monday 29 August 2011 at 13:23 -0400, Jesse Noller wrote:
>>
>> Yes, it is annoying; but again - this makes it more consistent with
>> the windows implementation. I'd rather that restriction than the
>> "sanitization" of the ability to use threading and multiprocessing
>> alongside one another.
>
> That sanitization is generally useful, though. For example if you want
> to use any I/O after a fork().

Oh! I don't disagree; I'm just against the removal of the ability to
mix multiprocessing and threads, which multiprocessing does internally
and others do in everyday code.

The "proposed" removal of that functionality - using the two together
- would leave users in the dust, and not needed if we patch
http://bugs.python.org/issue8713 - which at it's core is just an
addition flag. We could document the risk(s) of using the fork()
mechanism which has to remain the default for some time.

The point is that the solution to http://bugs.python.org/issue6721
should not be intertwined with, or cause, a severe change in the
multiprocessing module (e.g. "rewriting from scratch"). I'm not
arguing that both bugs should not be fixed.

jesse

From benjamin at python.org  Mon Aug 29 20:10:12 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 29 Aug 2011 14:10:12 -0400
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
Message-ID: <CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>

2011/8/29 stefan brunthaler <s.brunthaler at uci.edu>:
> So, the two big issues aside, is there any interest in incorporating
> these optimizations in Python 3?

Perhaps there would be something to say given patches/overviews/specifics.


-- 
Regards,
Benjamin

From nir at winpdb.org  Mon Aug 29 20:29:11 2011
From: nir at winpdb.org (Nir Aides)
Date: Mon, 29 Aug 2011 21:29:11 +0300
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <20110829191608.7916da73@pitrou.net>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
	<CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
	<CACQrdOks88HRc-eNDdMWF8paLHDx4xaBd5YB9A1BrhFeUkvd4g@mail.gmail.com>
	<20110826175336.3af6be57@pitrou.net>
	<A659EA9B-6AC4-414D-B2C5-262792C0ADA0@celeryproject.org>
	<CAGE7PN+4q-yenfTUfbpHhLta966ZesqbL2mv8VD4SMLmu1K0Gg@mail.gmail.com>
	<CAH_1eM1jGaf1TirSLv6GyY8bKbUJnezU1+qBNANHKA5G=NfnNA@mail.gmail.com>
	<CACQrdOni_HGkd7AJGSU4HtAbfTaYWGEZbK1tBNpKUMiJywRYxg@mail.gmail.com>
	<20110829191608.7916da73@pitrou.net>
Message-ID: <CAEd-RNoptY358uLR0N8db5TRqZaw=HgbVD0dYrsDn8QfdUx+6A@mail.gmail.com>

On Mon, Aug 29, 2011 at 8:16 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
> On Mon, 29 Aug 2011 13:03:53 -0400 Jesse Noller <jnoller at gmail.com> wrote:
> >
> > Yes; but spawning and forking are both slow to begin with - it's
> > documented (I hope heavily enough) that you should spawn
> > multiprocessing children early, and keep them around instead of
> > constantly creating/destroying them.
>
> I think fork() is quite fast on modern systems (e.g. Linux). exec() is
> certainly slow, though.

On my system, the time it takes worker code to start is:

40 usec with thread.start_new_thread
240 usec with threading.Thread().start
450 usec with os.fork
1 ms with multiprocessing.Process.start
25 ms with subprocess.Popen to start a trivial script.

so os.fork has similar latency to threading.Thread().start, while
spawning is 100 times slower.
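
For reference, a rough sketch of how such numbers can be gathered
(POSIX-only because of os.fork; it times create+start+wait, which only
approximates the worker startup latency measured above):

    import os, time, threading, multiprocessing

    def noop():
        pass

    def time_thread():
        t0 = time.time()
        t = threading.Thread(target=noop)
        t.start(); t.join()
        return time.time() - t0

    def time_fork():
        t0 = time.time()
        pid = os.fork()
        if pid == 0:
            os._exit(0)                # child exits immediately
        os.waitpid(pid, 0)
        return time.time() - t0

    def time_process():
        t0 = time.time()
        p = multiprocessing.Process(target=noop)
        p.start(); p.join()
        return time.time() - t0

    if __name__ == "__main__":
        for name, f in [("thread", time_thread), ("fork", time_fork),
                        ("multiprocessing", time_process)]:
            print(name, min(f() for _ in range(20)))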

From s.brunthaler at uci.edu  Mon Aug 29 20:33:14 2011
From: s.brunthaler at uci.edu (stefan brunthaler)
Date: Mon, 29 Aug 2011 11:33:14 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
Message-ID: <CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>

> Perhaps there would be something to say given patches/overviews/specifics.
>
Currently I don't have patches, but for an overview and specifics, I
can provide the following:
* My optimizations basically rely on quickening to incorporate
run-time information (a toy sketch of the idea follows below).
* I use two separate instruction dispatch routines, and use profiling
to switch from the regular Python 3 dispatch routine to an optimized
one (the implementation is actually vice versa, but that is not
important now).
* The optimized dispatch routine has a changed instruction format
(word-sized instead of bytecodes) that allows for regular instruction
decoding (without the HAS_ARG-check) and inlining of some objects in
the instruction format on 64-bit architectures.
* I use inline-caching based on quickening (passes almost all
regression tests [302 out of 307]), eliminate reference count
operations using quickening (passes but has a memory leak), promote
frequently accessed local variables to their dedicated instructions
(passes), and cache LOAD_GLOBAL/LOAD_NAME objects in the instruction
encoding when possible (I am working on this right now.)
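
To give a flavor of the quickening idea from the first bullet, here is
a toy Python sketch of a dispatch loop that rewrites a generic
instruction into a type-specialized one after observing its operands
(illustrative only; the real work happens in C inside ceval.c):

    def op_add(frame, pos):
        # generic add: checks operand types, then quickens this slot
        b = frame["stack"].pop(); a = frame["stack"].pop()
        if isinstance(a, int) and isinstance(b, int):
            frame["code"][pos] = op_add_int    # in-place rewrite
        frame["stack"].append(a + b)

    def op_add_int(frame, pos):
        # specialized add: no type dispatch on later executions
        b = frame["stack"].pop(); a = frame["stack"].pop()
        frame["stack"].append(a + b)

    def op_const(value):
        def op(frame, pos):
            frame["stack"].append(value)
        return op

    def run(code, repeat=2):
        frame = {"stack": [], "code": code}
        for _ in range(repeat):       # re-running stands in for a hot loop
            frame["stack"] = []
            for pos in range(len(code)):
                frame["code"][pos](frame, pos)
        return frame["stack"]

    print(run([op_const(2), op_const(3), op_add]))  # [5]; 2nd pass ran op_add_int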

The changes I made can be summarized as:
* I changed some header files to accommodate additional information
(Python.h, ceval.h, code.h, frameobject.h, opcode.h, tupleobject.h)
* I changed mostly abstract.c to incorporate runtime-type feedback.
* All other changes target mostly ceval.c and all supplementary code
is in a sub-directory named "opt" and all generated files in a
sub-directory within that ("opt/gen").
* I have a code generator in place that takes care of generating all
the functions; it uses the Mako template system for creating C code
and does not necessarily need to be shipped with the interpreter
(though one can play around and experiment with it.)

So, all in all, the changes to the actual implementation are not that
big, and most of the code is generated (using sloccount,
opt has 1990 lines of C, and opt/gen has 8649 lines of C).

That's a quick summary, if there are any further or more in-depth
questions, let me know.

best,
--stefan

From cournape at gmail.com  Mon Aug 29 20:37:31 2011
From: cournape at gmail.com (David Cournapeau)
Date: Mon, 29 Aug 2011 20:37:31 +0200
Subject: [Python-Dev] SWIG (was Re: Ctypes and the stdlib)
In-Reply-To: <CAF-Rda_16bGNCosOzFH0-tBOp6cTkKRPu7p+z1dQSiOrcL7NMQ@mail.gmail.com>
References: <mailman.2419.1314608606.27777.python-dev@python.org>
	<A0128757-0E04-4D97-B27E-DA4AC500FFB1@dabeaz.com>
	<CAF-Rda_16bGNCosOzFH0-tBOp6cTkKRPu7p+z1dQSiOrcL7NMQ@mail.gmail.com>
Message-ID: <CAGY4rcV7aF=uAWqDMhX8QEnskWgXHbQXnWnJYwbh=muAeCD0jQ@mail.gmail.com>

On Mon, Aug 29, 2011 at 7:14 PM, Eli Bendersky <eliben at gmail.com> wrote:
> <snip>
>>
>> I've sometimes thought it might be interesting to create a Swig
>> replacement purely in Python.  When I work on the PLY project, this is often
>> what I think about.  In that project, I've actually built a number of the
>> parsing tools that would be useful in creating such a thing.  The only
>> catch is that when I start thinking along these lines, I usually reach a
>> point where I say "nah, I'll just write the whole application in Python."
>>
>> Anyways, this is probably way more than anyone wants to know about Swig.
>> Getting back to the original topic of using it to make standard library
>> modules, I just don't know.  I think you probably could have some success
>> with an automatic code generator of some kind.  I'm just not sure it should
>> take the Swig approach of parsing C++ headers.  I think you could do better.
>>
>
> Dave,
>
> Having written a full C99 parser (http://code.google.com/p/pycparser/) based
> on your (excellent) PLY library, my impression is that the problem is with
> the problem, not with the solution. Strange sentence, I know :-) What I mean
> is that parsing C++ (even its headers) is inherently hard, which is why the
> solutions tend to grow so complex. Even with the modest C99, clean and
> simple solutions based on theoretical approaches (like PLY with its
> generated LALR parsers) tend to run into walls [*]. C++ is an order of
> magnitude harder.
>
> If I went to implement something like SWIG today, I would almost surely base
> my implementation on Clang (http://clang.llvm.org/). They have a full C++
> parser (carefully hand-crafted, quite admirably keeping a relatively
> comprehensible code-base for such a task) used in a real compiler front-end,
> and a flexible library structure aimed at creating tools. There are also
> Python bindings that would allow to do most of the interesting
> Python-interface-specific work in Python - parse the C++ headers using
> Clang's existing parser into ASTs - then generate ctypes / extensions from
> that, *in Python*.
>
> The community is also gladly accepting contributions. I've had some fixes
> committed for the Python bindings and the C interfaces that tie them to
> Clang, and got the impression from Clang's core devs that further
> contributions will be most welcome. So whatever is missing from the Python
> bindings can be easily added.

Agreed, I know some people have looked in that direction in the
scientific python community (to generate .pxd for cython). I wrote one
of the hacks Stefan referred to (based on ctypeslib using gccxml), and
using clang makes so much more sense.

To go back to the initial issue, using cython to wrap C code makes a
lot of sense. In the scipy community, I believe there is broad
agreement that most code which would require C/C++ should be done
in cython instead (numpy and scipy already do so a bit). I personally
cannot see many situations where writing wrappers in C by hand works
better than cython (especially since cython handles python2/3
automatically for you).

cheers,

David

From barry at python.org  Mon Aug 29 20:59:02 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 29 Aug 2011 14:59:02 -0400
Subject: [Python-Dev] PEP categories (was Re:  PEP 393 review)
In-Reply-To: <j3gg94$rg1$1@dough.gmane.org>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<4E5606C7.9000404@v.loewis.de> <20110825132734.1c236d17@pitrou.net>
	<4E5A9B39.8090009@v.loewis.de>
	<CAKmKYaDrHRDf+xZYjf0HdQSiAjQ+209m3y5TLv0yLSFxXOO3Mw@mail.gmail.com>
	<20110829122420.2d342f9c@resist.wooz.org>
	<j3gg94$rg1$1@dough.gmane.org>
Message-ID: <20110829145902.4774d0fc@resist.wooz.org>

On Aug 29, 2011, at 06:55 PM, Stefan Behnel wrote:

>These things tend to get somewhat clumsy over time, though. What about a
>stdlib change that only applies to CPython for some reason, e.g. because no
>other implementation currently has that module?  I think it's ok to make a
>coarse-grained distinction by numbers, but there should also be a way to tag
>PEPs textually.

Yeah, the categories would be pretty coarse grained, and their orthogonality
would cause classification problems.  I suppose we could use some kind of
hashtag approach.  OTOH, I'm not entirely sure it's worth it either. ;)

I think we'd need a concrete proposal and someone willing to hack the PEP0
autogen tools.

-Barry


From barry at python.org  Mon Aug 29 21:00:07 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 29 Aug 2011 15:00:07 -0400
Subject: [Python-Dev] PEP categories (was Re: PEP 393 review)
In-Reply-To: <20110829184037.594359f0@pitrou.net>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<4E5606C7.9000404@v.loewis.de> <20110825132734.1c236d17@pitrou.net>
	<4E5A9B39.8090009@v.loewis.de>
	<CAKmKYaDrHRDf+xZYjf0HdQSiAjQ+209m3y5TLv0yLSFxXOO3Mw@mail.gmail.com>
	<20110829122420.2d342f9c@resist.wooz.org>
	<CAKmKYaBUdxystWO3mL2z-0m17GZqKRvy4kCbQQWE-pcCCOOAFw@mail.gmail.com>
	<20110829184037.594359f0@pitrou.net>
Message-ID: <20110829150007.72460089@resist.wooz.org>

On Aug 29, 2011, at 06:40 PM, Antoine Pitrou wrote:

>I like the 3k numbers myself :))

Me too. :) But I think we've pretty much abandoned that convention for any new
PEPs.  Well, until Guido announces Python 4k. :)

-Barry


From nadeem.vawda at gmail.com  Mon Aug 29 21:04:38 2011
From: nadeem.vawda at gmail.com (Nadeem Vawda)
Date: Mon, 29 Aug 2011 21:04:38 +0200
Subject: [Python-Dev] LZMA compression support in 3.3
In-Reply-To: <20110829083029.68faa57b@resist.wooz.org>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E59041A.7040100@v.loewis.de>
	<CANF4RMk=yRYQE8nx_3VLauiA2cKj64ZwCinGCR-rDAbiNSB+XA@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<CANF4RMmKLXbmE04NBi+s9n+-GtXhWS-1bv3TCDL8XzBL4NeVXQ@mail.gmail.com>
	<20110829083029.68faa57b@resist.wooz.org>
Message-ID: <CANF4RMmysZest=cYG0rbPBL1YB24hh8ttV5gnxAP-QMknzoFwA@mail.gmail.com>

I've updated the issue <http://bugs.python.org/issue6715> with a patch
containing my work so far - the LZMACompressor and LZMADecompressor classes,
along with some tests. These two classes should provide a fairly complete
interface to liblzma; it will be possible to implement LZMAFile on top of them,
entirely in Python. Note that the C code does no I/O; this will be handled by
LZMAFile.
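
For a feel of the intended interface, a hedged usage sketch (assuming
the patch lands as a module named lzma with incremental compress()/
decompress() methods as described above; details may change in review):

    import lzma

    comp = lzma.LZMACompressor()
    data = comp.compress(b"incremental ") + comp.compress(b"data")
    data += comp.flush()

    decomp = lzma.LZMADecompressor()
    assert decomp.decompress(data) == b"incremental data"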

Please take a look, and let me know what you think.

Cheers,
Nadeem

From martin at v.loewis.de  Mon Aug 29 21:20:53 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 29 Aug 2011 21:20:53 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <CAKmKYaDrHRDf+xZYjf0HdQSiAjQ+209m3y5TLv0yLSFxXOO3Mw@mail.gmail.com>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<4E5606C7.9000404@v.loewis.de> <20110825132734.1c236d17@pitrou.net>
	<4E5A9B39.8090009@v.loewis.de>
	<CAKmKYaDrHRDf+xZYjf0HdQSiAjQ+209m3y5TLv0yLSFxXOO3Mw@mail.gmail.com>
Message-ID: <4E5BE695.2070203@v.loewis.de>

On 29.08.2011 11:03, Dirkjan Ochtman wrote:
> On Sun, Aug 28, 2011 at 21:47, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>  result strings. In PEP 393, a buffer must be scanned for the
>>  highest code point, which means that each byte must be inspected
>>  twice (a second time when the copying occurs).
> 
> This may be a silly question: are there things in place to optimize
> this for the case where two strings are combined? E.g. highest
> character in combined string is max(highest character in either of the
> strings).

Unicode_Concat goes like this:

    maxchar = PyUnicode_MAX_CHAR_VALUE(u);
    if (PyUnicode_MAX_CHAR_VALUE(v) > maxchar)
        maxchar = PyUnicode_MAX_CHAR_VALUE(v);

    /* Concat the two Unicode strings */
    w = (PyUnicodeObject *) PyUnicode_New(
                            PyUnicode_GET_LENGTH(u) + PyUnicode_GET_LENGTH(v),
                            maxchar);
    if (w == NULL)
        goto onError;
    PyUnicode_CopyCharacters(w, 0, u, 0, PyUnicode_GET_LENGTH(u));
    PyUnicode_CopyCharacters(w, PyUnicode_GET_LENGTH(u), v, 0,
                             PyUnicode_GET_LENGTH(v));

> Also, this PEP makes me wonder if there should be a way to distinguish
> between language PEPs and (CPython) implementation PEPs, by adding a
> tag or using the PEP number ranges somehow.

Well, no. This would equally apply to every single patch, and is just
not feasible. Instead, alternative implementations typically target a
CPython version, and then find out what features they need to implement
to claim conformance.

Regards,
Martin

From tjreedy at udel.edu  Mon Aug 29 21:24:12 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 29 Aug 2011 15:24:12 -0400
Subject: [Python-Dev] Should we move to replace re with regex?
In-Reply-To: <20110829090056.03f719ad@resist.wooz.org>
References: <CAP7+vJL1WVfXd_DtktM+X3YEj6t=A9DMukKqFvgRk=9z=CHBOA@mail.gmail.com>
	<4E582432.2080301@v.loewis.de>
	<CAP7+vJK7Rahw9UWCqNVxgFOrqAxekXoOXSc3HzqKSsTqR3zD6w@mail.gmail.com>
	<CACBhJdFFbzZ065Vjnj432dNDbzvn_OS76Z87SyzObmYCGCczeQ@mail.gmail.com>
	<4E588877.3080204@v.loewis.de> <20110827121012.37b39947@pitrou.net>
	<4E59255E.6000905@v.loewis.de>
	<20110829090056.03f719ad@resist.wooz.org>
Message-ID: <j3gp25$n2r$1@dough.gmane.org>

On 8/29/2011 9:00 AM, Barry Warsaw wrote:
> On Aug 27, 2011, at 07:11 PM, Martin v. Löwis wrote:
>
>> A PEP should IMO only cover end-user aspects of the new re module.
>> Code organization is typically not in the PEP. To give a specific
>> example: you mentioned that there is (near) code duplication
>> MRAB's module. As a reviewer, I would discuss whether this can be
>> eliminated - but not in the PEP.
>
> +1

I think at this point we need a tracker issue to which can be attached 
such reviews, for safe-keeping, even if most discussion continues here.

-- 
Terry Jan Reedy



From ndbecker2 at gmail.com  Mon Aug 29 21:28:23 2011
From: ndbecker2 at gmail.com (Neal Becker)
Date: Mon, 29 Aug 2011 15:28:23 -0400
Subject: [Python-Dev] SWIG (was Re: Ctypes and the stdlib)
References: <mailman.2419.1314608606.27777.python-dev@python.org>
	<A0128757-0E04-4D97-B27E-DA4AC500FFB1@dabeaz.com>
	<CAF-Rda_16bGNCosOzFH0-tBOp6cTkKRPu7p+z1dQSiOrcL7NMQ@mail.gmail.com>
	<CAGY4rcV7aF=uAWqDMhX8QEnskWgXHbQXnWnJYwbh=muAeCD0jQ@mail.gmail.com>
Message-ID: <j3gp8o$sia$1@dough.gmane.org>

Then there is gccxml, although I'm not sure how active it is now.


From martin at v.loewis.de  Mon Aug 29 21:34:48 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Aug 2011 21:34:48 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E5B5364.9040100@haypocalc.com>
References: <4E553FBC.7080501@v.loewis.de>	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>	<20110825132734.1c236d17@pitrou.net>	<4E5A9B39.8090009@v.loewis.de>	<1314561666.3656.3.camel@localhost.localdomain>	<4E5AADDA.5090206@v.loewis.de>
	<4E5B5364.9040100@haypocalc.com>
Message-ID: <4E5BE9D8.5050309@v.loewis.de>

>> Those haven't been ported to the new API, yet. Consider, for example,
>> d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test;
>> with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this
>> is a 25% speedup for PEP 393.
> 
> If I understand correctly, the performance now highly depend on the used
> characters? A pure ASCII string is faster than a string with characters
> in the ISO-8859-1 charset?

How did you infer that from the above paragraph??? ASCII and Latin-1 are
mostly identical in terms of performance - the ASCII decoder should be
slightly slower than the Latin-1 decoder, since the ASCII decoder needs
to check for errors, whereas the Latin-1 decoder will never be
confronted with errors.

What matters is
a) is the codec already rewritten to use the new representation, or
   must it go through Py_UNICODE[] first, requiring a second copy
   to the canonical form?
b) what is the cost of finding out the highest character? - regardless
   of what the highest character turns out to be

> Is it also true for BMP characters vs non-BMP
> characters?

Well... If you are talking about the ASCII and Latin-1 codecs - neither
of these support most BMP characters, let alone non-BMP characters.
In general, non-BMP characters are more expensive to process since they
take more space.

> Do these benchmark tools use only ASCII characters, or also some
> ISO-8859-1 characters?

See for yourself. iobench uses Latin-1, including non-ASCII, but not
non-Latin-1.

> Or, better, different Unicode ranges in different tests?

That's why I asked for a list of benchmarks to perform. I cannot
run an infinite number of benchmarks prior to adoption of the PEP.

Regards,
Martin

From nir at winpdb.org  Mon Aug 29 21:41:27 2011
From: nir at winpdb.org (Nir Aides)
Date: Mon, 29 Aug 2011 22:41:27 +0300
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <CACQrdO=wfH5eNkSBUd0okyGnA69xV5zA9fx=Gify_Put69RvDg@mail.gmail.com>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<CAH_1eM00ZjntSdj6r99J1RLjVvdafNJ3wtMt-jST+ChJ+=nW=w@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
	<CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
	<CACQrdOks88HRc-eNDdMWF8paLHDx4xaBd5YB9A1BrhFeUkvd4g@mail.gmail.com>
	<20110826175336.3af6be57@pitrou.net>
	<A659EA9B-6AC4-414D-B2C5-262792C0ADA0@celeryproject.org>
	<CAGE7PN+4q-yenfTUfbpHhLta966ZesqbL2mv8VD4SMLmu1K0Gg@mail.gmail.com>
	<CAH_1eM1jGaf1TirSLv6GyY8bKbUJnezU1+qBNANHKA5G=NfnNA@mail.gmail.com>
	<CACQrdOni_HGkd7AJGSU4HtAbfTaYWGEZbK1tBNpKUMiJywRYxg@mail.gmail.com>
	<20110829191608.7916da73@pitrou.net>
	<CACQrdOmfWKRENi=K46VdHTE3pcB09ciQoXf=DjskEnswz+XZ-A@mail.gmail.com>
	<1314638573.3551.14.camel@localhost.localdomain>
	<CACQrdO=wfH5eNkSBUd0okyGnA69xV5zA9fx=Gify_Put69RvDg@mail.gmail.com>
Message-ID: <CAEd-RNqrAjWoW+V2NQb4rdYn52zCMPnEFQYG28pr_KhhCVfHqA@mail.gmail.com>

On Mon, Aug 29, 2011 at 8:42 PM, Jesse Noller <jnoller at gmail.com> wrote:
> On Mon, Aug 29, 2011 at 1:22 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>
>> That sanitization is generally useful, though. For example if you want
>> to use any I/O after a fork().
>
> Oh! I don't disagree; I'm just against the removal of the ability to
> mix multiprocessing and threads, which multiprocessing does internally
> and others do in everyday code.

I am not familiar with the python-dev definition for deprecation, but
when I used the word in the bug discussion I meant to advertise to
users that they should not mix threading and forking, since that mix is
and will remain broken by design; I did not mean removal or crippling
of functionality.

"When I use a word," Humpty Dumpty said, in rather a scornful tone,
"it means just what I choose it to mean - neither more nor less." -
Through the Looking-Glass

(btw, my tone is not scornful)

And there is no way around it - the mix in general is broken, with an
atfork mechanism or without it.
People can choose to keep doing it in their every day code at their
own risk, be it significantly high or insignificantly low.
But the documentation should explain the problem clearly.
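
To make the failure mode concrete, a minimal POSIX-only sketch (the
sleep durations are arbitrary): a lock held by another thread at fork()
time is copied into the child in the locked state, with no owning
thread left to release it:

    import os, threading, time

    lock = threading.Lock()

    def holder():
        with lock:
            time.sleep(2)          # hold the lock across the fork below

    threading.Thread(target=holder).start()
    time.sleep(0.1)                # make sure the lock is taken

    pid = os.fork()
    if pid == 0:
        # The owning thread does not exist in the child, so a plain
        # acquire() would block forever; the timeout exposes the state.
        print("child got lock:", lock.acquire(timeout=1))   # False
        os._exit(0)
    os.waitpid(pid, 0)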

As for the internal use of threads in the multiprocessing module I
proposed a potential way to "sanitize" those particular worker
threads:
http://bugs.python.org/issue6721#msg140402

If it makes sense and entails changes to internal multiprocessing
worker threads, those changes could be applied as bug fixes to Python
2.x and previous Python 3.x releases.

This does not contradict adding the spawn feature now and making it
the only possibility in the future. I agree that this is the "saner"
approach, but it is a new feature, not a bug fix.

Nir

From martin at v.loewis.de  Mon Aug 29 22:32:01 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Aug 2011 22:32:01 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <CAP7+vJ+fYXLcqG_rHkNtg5yuZKJTj2nMxFaLczr+zqeKUBtY8A@mail.gmail.com>
References: <4E553FBC.7080501@v.loewis.de>
	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>	<CAP7+vJ+f3oN3RmZhCEntgXBccj4gE7ErdKwvL9hv4uaBgmL_qQ@mail.gmail.com>	<4E577589.4030809@v.loewis.de>
	<CAP7+vJ+fYXLcqG_rHkNtg5yuZKJTj2nMxFaLczr+zqeKUBtY8A@mail.gmail.com>
Message-ID: <4E5BF741.50209@v.loewis.de>

tl;dr: PEP-393 reduces the memory usage for strings of a very small
Django app from 7.4MB to 4.4MB, all other objects taking about 1.9MB.

On 26.08.2011 16:55, Guido van Rossum wrote:
> It would be nice if someone wrote a test to roughly verify these
> numbers, e.g. by allocating lots of strings of a certain size and
> measuring the process size before and after (being careful to adjust
> for the list or other data structure required to keep those objects
> alive).

I have now written a Django application to measure the effect of PEP
393, using the debug mode (to find all strings), and sys.getsizeof:

https://bitbucket.org/t0rsten/pep-393/src/ad02e1b4cad9/pep393utils/djmemprof/count/views.py

The results for 3.3 and pep-393 are attached.

The Django app is small in every respect: trivial ORM, very few
objects (just for the sake of exercising the ORM at all),
no templating, short strings. The memory snapshot is taken in
the middle of a request.

The tests were run on a 64-bit Linux system with 32-bit Py_UNICODE.

The tally of strings by length confirms that both tests have indeed
comparable sets of objects (not surprising since it is identical Django
source code and the identical application). Most strings in this
benchmark are shorter than 16 characters, and a few have several
thousand characters. The tally of byte lengths shows that it's the
really long memory blocks that are gone with the PEP.

Digging into the internal representation, it's possible to estimate
"unaccounted" bytes. For PEP 393:

   bytes - 80*strings - (chars+strings) = 190053

This is the total of the wchar_t and UTF-8 representations for objects
that have them, plus any two-byte and four-byte strings accounted
incorrectly in the above formula. Unfortunately, for "default"

   bytes + 56*strings - 4*(chars+strings) = 0

as unicode.__sizeof__ doesn't account for the (separate) PyBytes
object that may carry the default encoding. So in practice, the 3.3
number should be somewhat larger.

In both cases, the app didn't account for internal fragmentation;
doing so would be possible by rounding up each string size to the next
multiple of 8 (given that it's all allocated through the object
allocator).

It should be possible to squeeze a little bit out of the 190kB,
by finding objects for which the wchar_t or UTF-8 representations
are created unnecessarily.

Regards,
Martin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 3k.txt
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110829/6c37b94c/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 393.txt
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110829/6c37b94c/attachment-0001.txt>

From martin at v.loewis.de  Mon Aug 29 22:43:35 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 29 Aug 2011 22:43:35 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
Message-ID: <4E5BF9F7.9020608@v.loewis.de>

> So, the two big issues aside, is there any interest in incorporating
> these optimizations in Python 3?

The question really is whether this is an all-or-nothing deal. If you
could identify smaller parts that can be applied independently, interest
would be higher.

Also, I'd be curious whether your techniques help or hinder a potential
integration of a JIT generator.

Regards,
Martin

From mal at egenix.com  Mon Aug 29 22:54:27 2011
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 29 Aug 2011 22:54:27 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E5BF741.50209@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de>	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>	<CAP7+vJ+f3oN3RmZhCEntgXBccj4gE7ErdKwvL9hv4uaBgmL_qQ@mail.gmail.com>	<4E577589.4030809@v.loewis.de>	<CAP7+vJ+fYXLcqG_rHkNtg5yuZKJTj2nMxFaLczr+zqeKUBtY8A@mail.gmail.com>
	<4E5BF741.50209@v.loewis.de>
Message-ID: <4E5BFC83.2020304@egenix.com>

"Martin v. L?wis" wrote:
> tl;dr: PEP-393 reduces the memory usage for strings of a very small
> Django app from 7.4MB to 4.4MB, all other objects taking about 1.9MB.
> 
> On 26.08.2011 16:55, Guido van Rossum wrote:
>> It would be nice if someone wrote a test to roughly verify these
>> numbers, e.g. by allocating lots of strings of a certain size and
>> measuring the process size before and after (being careful to adjust
>> for the list or other data structure required to keep those objects
>> alive).
> 
> I have now written a Django application to measure the effect of PEP
> 393, using the debug mode (to find all strings), and sys.getsizeof:
> 
> https://bitbucket.org/t0rsten/pep-393/src/ad02e1b4cad9/pep393utils/djmemprof/count/views.py
> 
> The results for 3.3 and pep-393 are attached.
> 
> The Django app is small in every respect: trivial ORM, very few
> objects (just for the sake of exercising the ORM at all),
> no templating, short strings. The memory snapshot is taken in
> the middle of a request.
> 
> The tests were run on a 64-bit Linux system with 32-bit Py_UNICODE.

For comparison, could you run the test of the unmodified
Python 3.3 on a 16-bit Py_UNICODE version as well ?

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 29 2011)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2011-10-04: PyCon DE 2011, Leipzig, Germany                36 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From solipsis at pitrou.net  Mon Aug 29 22:54:13 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Aug 2011 22:54:13 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E5BF741.50209@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<4E5606C7.9000404@v.loewis.de>
	<CAP7+vJ+f3oN3RmZhCEntgXBccj4gE7ErdKwvL9hv4uaBgmL_qQ@mail.gmail.com>
	<4E577589.4030809@v.loewis.de>
	<CAP7+vJ+fYXLcqG_rHkNtg5yuZKJTj2nMxFaLczr+zqeKUBtY8A@mail.gmail.com>
	<4E5BF741.50209@v.loewis.de>
Message-ID: <20110829225413.689d073c@pitrou.net>

On Mon, 29 Aug 2011 22:32:01 +0200
"Martin v. L?wis" <martin at v.loewis.de> wrote:
> I have now written a Django application to measure the effect of PEP
> 393, using the debug mode (to find all strings), and sys.getsizeof:
> 
> https://bitbucket.org/t0rsten/pep-393/src/ad02e1b4cad9/pep393utils/djmemprof/count/views.py
> 
> The results for 3.3 and pep-393 are attached.

This looks very nice. Is 3.3 a wide build? (how about a narrow build?)

(is it with your own port of Django to py3k, or is there an official
branch for it?)

Regards

Antoine.

From s.brunthaler at uci.edu  Mon Aug 29 23:05:20 2011
From: s.brunthaler at uci.edu (stefan brunthaler)
Date: Mon, 29 Aug 2011 14:05:20 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <4E5BF9F7.9020608@v.loewis.de>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<4E5BF9F7.9020608@v.loewis.de>
Message-ID: <CA+j1x0=T0wHary2800Yxju1GRcy5VkyxXtEatbY-AAXJrG1p4g@mail.gmail.com>

> The question really is whether this is an all-or-nothing deal. If you
> could identify smaller parts that can be applied independently, interest
> would be higher.
>
Well, it's not an all-or-nothing deal. In my current architecture, I
can selectively enable most of the optimizations as I see fit. The
only pre-requisite (in my implementation) is that I have two dispatch
loops with a changed instruction format. It is, however, not a
technical necessity, just the way I implemented it. Basically, you can
choose whatever you like best, and I could extract that part. I am
just offering to add all the things that I have done :)


> Also, I'd be curious whether your techniques help or hinder a potential
> integration of a JIT generator.
>
This is something I have previously frequently discussed with several
JIT people. IMHO, having my optimizations in-place also helps a JIT
compiler, since it can re-use the information I gathered to generate
more aggressively optimized native machine code right away (the inline
caches can be generated using the recorded type information, some
functions could be inlined with the guard statements subsumed, etc.)
Another benefit could be that the JIT compiler can spend more time
on generating code, because the interpreter is already faster (so in
some cases it would probably not make sense to include a
non-optimizing fast and simple JIT compiler).
There are others on the list, who probably can/want to comment on this, too.

That aside, I think that while having a JIT is an important goal, I
can very well imagine scenarios where the additional memory
consumption (for the generated native machine code) of a JIT for each
process (I assume that the native machine code caches are not shared)
hinders scalability. I have in fact no data to back this up, but I
think that would be an interesting trade off, say if I have 30% gain
in performance without substantial additional memory requirements on
my existing hardware, compared to higher achievable speedups that
require more machines, though.


Regards,
--stefan

From solipsis at pitrou.net  Mon Aug 29 23:14:20 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Aug 2011 23:14:20 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
Message-ID: <20110829231420.20c3516a@pitrou.net>

On Mon, 29 Aug 2011 11:33:14 -0700
stefan brunthaler <s.brunthaler at uci.edu> wrote:
> * The optimized dispatch routine has a changed instruction format
> (word-sized instead of bytecodes) that allows for regular instruction
> decoding (without the HAS_ARG-check) and inlining of some objects in
> the instruction format on 64-bit architectures.

Having a word-sized "bytecode" format would probably be acceptable in
itself, so if you want to submit a patch for that, go ahead.

Regards

Antoine.



From greg.ewing at canterbury.ac.nz  Mon Aug 29 23:17:24 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 30 Aug 2011 09:17:24 +1200
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
 support in 3.3)
In-Reply-To: <CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org> <j3e137$q0g$1@dough.gmane.org>
	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
Message-ID: <4E5C01E4.2050106@canterbury.ac.nz>

Guido van Rossum wrote:
> (Just like Python's own .h files --
> e.g. the extensive renaming of the Unicode APIs depending on
> narrow/wide build) How does Cython deal with these?

Pyrex/Cython deal with it by generating C code that includes
the relevant headers, so the C compiler expands all the
macros, interprets the struct declarations, etc. All you
need to do when writing the .pyx file is follow the same
API that you would if you were writing C code to use the
library.

-- 
Greg

From barry at python.org  Mon Aug 29 23:18:33 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 29 Aug 2011 17:18:33 -0400
Subject: [Python-Dev] PEP 3151 from the BDFOP
In-Reply-To: <20110824015756.51cdceac@pitrou.net>
References: <20110823170357.3b3ab2fc@resist.wooz.org>
	<20110824015756.51cdceac@pitrou.net>
Message-ID: <20110829171833.5e0cc40d@resist.wooz.org>

On Aug 24, 2011, at 01:57 AM, Antoine Pitrou wrote:

>> One guiding principle for me is that we should keep the abstraction as thin
>> as possible.  In particular, I'm concerned about mapping multiple errnos
>> into a single Error.  For example both EPIPE and ESHUTDOWN mapping to
>> BrokenPipeError, or EACCES or EPERM to PermissionError.  I think we should
>> resist this, so that one errno maps to exactly one Error.  Where grouping
>> is desired, Python already has mechanisms to deal with that,
>> e.g. superclasses and multiple inheritance.  Therefore, I think it would be
>> better to have
>> 
>> + FileSystemPermissionError
>>   + AccessError (EACCES)
>>   + PermissionError (EPERM)
>
>I'm not sure that's a good idea:

Was it the specific grouping under FileSystemPermissionError that you're
objecting to, or the "keep the abstraction thin" principle?  Let's say we
threw out the idea of FSPE superclass, would you still want to collapse EACCES
and EPERM into PermissionError, or would separate exceptions for each be okay?
It's still pretty easy to catch both in one except clause, and it won't be too
annoying if it's rare.
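
(For the record, assuming the split names, the one-clause catch is just a
tuple in the except; both classes here are hypothetical and defined inline
only so the sketch is self-contained:)

    class AccessError(OSError):       # hypothetical: would map EACCES
        pass

    class PermissionError(OSError):   # hypothetical: would map EPERM
        pass

    try:
        raise AccessError(13, 'Permission denied')
    except (AccessError, PermissionError):
        pass  # one clause handles both permission-style failures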

>Yes, FileSystemError might be removed. I thought that it would be
>useful, in some library routines, to catch all filesystem-related
>errors indistinctly, but it's not a complete catchall actually (for
>example, AccessError is outside of the FileSystemError subtree).

Reading your IRC message (sorry, I was afk) it sounds like you think
FileSystemError can be removed.  I like keeping the hierarchy flat.

>> Similarly, I think it would be helpful to have the errno name (e.g. ENOENT)
>> in the error message string.  That way, it won't get in the way for most
>> code, but would be usefully printed out for uncaught exceptions.
>
>Agreed, but I think that's a feature request quite orthogonal from the
>PEP. The errno *number* is still printed as it was before:
>
>>>> open("foo")
>Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>FileNotFoundError: [Errno 2] No such file or directory: 'foo'
>
>(see e.g. http://bugs.python.org/issue12762)

True, but since you're going to be creating a bunch of new exception classes,
it should be relatively painless to give them a better str.  Thanks for
pointing out that bug; I agree with it.

>> A second guiding principle should be that careful code that works in Python
>> 3.2 must continue to work in Python 3.3 once PEP 3151 is accepted, but also
>> for Python 2 code ported straight to Python 3.3.
>
>I don't think porting straight to 3.3 would make a difference, especially now
>that the idea of deprecating old exception names has been abandoned.

Cool.

>> Do be prepared for complaints about compatibility for careless code though
>> - there's a ton of that out in the wild, and people will always complain
>> when their "working" code breaks due to an upgrade.  Be *very* explicit
>> about this in the release notes and NEWS file, and put your asbestos
>> underoos on.
>
>I'll take care about that :)

:)

>> Have you considered the impact of this PEP on other Python implementations?
>> My hazy memory of Jython tells me that errnos don't really leak into Java
>> and thus Jython much, but what about PyPy and IronPython?  E.g. step 1's
>> deprecation strategy seems pretty CPython-centric.
>
>Alternative implementations already have to implement errno codes in a
>way or another if they want to have a chance of running existing code.
>So I don't think the PEP makes much of a difference for them.
>But their implementors can give their opinion on this.

Let's give them a little more time to chime in (hopefully, they are reading
this thread).  We needn't wait too long though.

>> As for step 1 (coalescing the errors).  This makes sense and I'm generally
>> agreeable, but I'm wondering whether it's best to re-use IOError for this
>> rather than introduce a new exception.  Not that I can think of a good name
>> for that.  I'm just not totally convinced that existing code when upgrading
>> to Python 3.3 won't introduce silent failures.  If an existing error is to
>> be re-used for this, I'm torn on whether IOError or OSError is a better
>> choice.  Popularity aside, OSError *feels* more right.
>
>I don't have any personal preference. Previous discussions seemed to
>indicate people preferred IOError. But changing the implementation to
>OSError would be simple. I agree OSError feels slightly more right, as
>in more generic.

Thanks for making this change in the PEP.

>> And that anything raising an exception (e.g. via PyErr_SetFromErrno) other
>> than the new ones will raise IOError?
>
>I'm not sure I understand the question precisely.

My question mostly was about raising OSError (as the current PEP states) with
an errno that does *not* map to one of the new exceptions.  In that case, I
don't think there's anything you could raise other than exactly OSError,
right?

>The errno mapping mechanism is implemented in IOError.__new__, but it gets
>called only if the class is exactly IOError, not a subclass:
>
>>>> IOError(errno.EPERM, "foo")
>PermissionError(1, 'foo')
>>>> class MyIOError(IOError): pass
>... 
>>>> MyIOError(errno.EPERM, "foo")
>MyIOError(1, 'foo')
>
>Using IOError.__new__ is the easiest way to ensure that all code
>raising IO errors takes advantage of the errno mapping. Otherwise you
>may get APIs raising the proper subclasses, and other APIs always
>raising base IOError (it doesn't happen often, but some Python
>library code raises an IOError with an explicit errno).
>
>> I also think that rather than transforming the exception when raised from
>> Python, i.e. via __new__ hackery, perhaps raising IOError with an errno
>> represented by one of the subclasses should be a ValueError in its own
>> right.
>
>That would make it harder to keep compatibility while adding new
>subclasses in future Python versions. Imagine a lot of people lobby for
>a dedicated EBADF subclass and obtain it, then IOError(EBADF, "some
>message") would suddenly raise a ValueError. Or do I misunderstand your
>proposal?

Somewhat.  FWIW, this is the part that I'm most uncomfortable with.

So, for raising OSError with an errno mapping to one of the subclasses, it
appears to break the "explicit is better than implicit" principle, and I think
it could lead to code that's hard to debug or understand.  You'll look at code that
raises OSError, but the exception that gets printed will be one of the
subclasses.  I'm afraid that if you don't know that this is happening, you're
going to think you're going crazy.

The other half is, let's say raising FileNotFoundError with the EEXIST errno.
I'm guessing that the __init__'s for the new OSError subclasses will not have
an `errno` attribute, so there's no way you can do that, but the PEP does not
discuss this.  It probably should.

>> I found more examples of ECHILD and ESRCH than the
>> former two.  How'd you like to add those two to make your BDFOP happy? :)
>
>Wow, I didn't know ESRCH.
>How would you call the respective exceptions?
>- ChildProcessError for ECHILD?

The Linux wait(2) manpage says:

       ECHILD (for wait()) The calling process does not have any unwaited-for
              children.

       ECHILD (for waitpid() or waitid()) The process specified by pid
              (waitpid()) or idtype and id (waitid()) does not exist or is
              not a child of the calling process.  (This can happen for
              one's own child if the action for SIGCHLD is set to SIG_IGN.
              See also the Linux Notes section about threads.)

>- ProcessLookupError for ESRCH?

The Linux kill(2) manpage says:

       ESRCH  The pid or process group does not exist.  Note that an existing
              process might be a zombie, a process which already committed
              termination, but has not yet been wait(2)ed for.

So in a sense, both are lookup errors, though I think it's going too far to
multiply inherit from LookupError.  Maybe ChildWaitError or ChildLookupError
for the former?  ProcessLookupError seems good to me.

>> What if all the errno symbolic names were mapped as attributes on IOError?
>> The only advantage of that would be to eliminate the need to import errno,
>> or for the ugly `e.errno == errno.ENOENT` stuff.  That would then be
>> rewritten as `e.errno == IOError.ENOENT`.  A mild savings to be sure, but
>> still.
>
>Hmm, I guess that's explorable as an orthogonal idea.

Cool.  How should we capture that?
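
For concreteness, a throwaway sketch of the idea (attributes can't be
added to the built-in IOError from pure Python, so this uses a stand-in
class; the real change would have to live in C):

    import errno

    class _IOErrorStandIn(Exception):
        # Stand-in for IOError, which cannot be monkey-patched.
        pass

    for number, name in errno.errorcode.items():
        setattr(_IOErrorStandIn, name, number)

    # Code would then read:  e.errno == IOError.ENOENT
    assert _IOErrorStandIn.ENOENT == errno.ENOENT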

>> How dumb/useless/unworkable would it be to add an __future__ to switch from
>> the old hierarchy to the new one?  Probably pretty. ;)
>
>Well, the hierarchy is built-in, since it's about standard exceptions.
>Also, you usually get the exception from some library API, so a
>__future__ in your own module would not achieve much.
>
>> What about an api that applications/libraries could use to add additional
>> exceptions based on other errnos they cared about?  This could be consulted in
>> PyErr_SetFromErrno() and raised instead of IOError.  Okay, yeah, that's
>> probably pretty dumb too.
>
>The problem is that behaviour becomes inconsistent across libraries.
>I'm not sure that's very helpful to the user.

Yeah, on further reflection, let's forget those last two ideas. ;)

Okay, so here's what's still outstanding for me:

* Should we eliminate FileSystemError? (probably "yes")
* Should we ensure one errno == one exception?
  - i.e. separate EACCES and EPERM
  - i.e. separate EPIPE and ESHUTDOWN
* Should the str of the new exception subclasses be improved (e.g. to include
  the symbolic name instead of the errno first)?
* Is the OSError.__new__() hackery a good idea?
* Should the PEP define the signature of the new exceptions (e.g. to prohibit
  passing in an incorrect errno to an OSError subclass)?
* Can we add ECHILD and ESRCH, and if so, what names should we use?
* Where can we capture the idea of putting the symbolic names on OSError class
  attributes, or is it a dumb idea that should be ditched?
* How long should we wait for other Python implementations to chime in?

Cheers,
-Barry

From barry at python.org  Mon Aug 29 23:21:05 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 29 Aug 2011 17:21:05 -0400
Subject: [Python-Dev] PEP 3151 from the BDFOP
In-Reply-To: <CADiSq7dB_6Vb-=tENjVtGD2RcvkmM6wp2dcg6V6FY5wUWfjf=Q@mail.gmail.com>
References: <20110823170357.3b3ab2fc@resist.wooz.org>
	<20110824015756.51cdceac@pitrou.net>
	<CADiSq7dB_6Vb-=tENjVtGD2RcvkmM6wp2dcg6V6FY5wUWfjf=Q@mail.gmail.com>
Message-ID: <20110829172105.6812cadd@resist.wooz.org>

On Aug 24, 2011, at 12:51 PM, Nick Coghlan wrote:

>On Wed, Aug 24, 2011 at 9:57 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Using IOError.__new__ is the easiest way to ensure that all code
>> raising IO errors takes advantage of the errno mapping. Otherwise you
>> may get APIs raising the proper subclasses, and other APIs always
>> raising base IOError (it doesn't happen often, but some Python
>> library code raises an IOError with an explicit errno).
>
>It's also the natural place to put the errno->exception type mapping
>so that existing code will raise the new errors without requiring
>modification. We could spell it as a new class method ("from_errno" or
>similar), but there isn't any ambiguity in doing it directly in
>__new__, so a class method seems pointlessly inconvenient.

As I mentioned, my main concern with this is the surprise factor for people
debugging and reading the code.  A class method would solve that, but looks
uglier and doesn't work with existing code.

-Barry


From guido at python.org  Mon Aug 29 23:20:49 2011
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Aug 2011 14:20:49 -0700
Subject: [Python-Dev] SWIG (was Re: Ctypes and the stdlib)
In-Reply-To: <A0128757-0E04-4D97-B27E-DA4AC500FFB1@dabeaz.com>
References: <mailman.2419.1314608606.27777.python-dev@python.org>
	<A0128757-0E04-4D97-B27E-DA4AC500FFB1@dabeaz.com>
Message-ID: <CAP7+vJ+K_CmQvu-m9nk7O9GZO4HvQ3+0ApjhLzcbg0yWm52iVg@mail.gmail.com>

Thanks for an insightful post, Dave! I took the liberty of mentioning
it on Google+:

https://plus.google.com/115212051037621986145/posts/NyEiLEfR6HF

(PS. Anyone wanting a G+ invite, go here:
https://plus.google.com/i/7w3niYersIA:8fxDrfW-6TA )

--Guido

On Mon, Aug 29, 2011 at 5:41 AM, David Beazley <dave at dabeaz.com> wrote:
> On Mon, Aug 29, 2011 at 12:27 PM, Guido van Rossum <guido at python.org> wrote:
>
>> I wonder if for
>> this particular purpose SWIG isn't the better match. (If SWIG weren't
>> universally hated, even by its original author. :-)
>
> Hate is probably a strong word, but as the author of Swig, let me chime
> in here ;-).  I think there are probably some lessons to be learned from
> Swig.
>
> As Nick noted, Swig is best suited when you have control over both sides
> (C/C++ and Python) of whatever code you're working with.  In fact, the
> original motivation for Swig was to give application programmers
> (scientists in my case) a means for automatically generating the Python
> bindings to their code.  However, there was one other important
> assumption--and that was the fact that all of your "real code" was going
> to be written in C/C++ and that the Python scripting interface was just
> an optional add-on (perhaps even just a throw-away thing).  Keep in
> mind, Swig was first created in 1995, and at that time the use of Python
> (or any similar language) was a pretty radical idea in the sciences.
> Moreover, there was a lot of legacy code that people just weren't going
> to abandon.  Thus, I always viewed Swig as a kind of transitional
> vehicle for getting people to use Python who might otherwise not even
> consider it.  Getting back to Nick's point though, to really use Swig
> effectively, it was always known that you might have to reorganize or
> refactor your C/C++ code to make it more Python friendly.  However, due
> to the automatic wrapper generation, you didn't have to do it all at
> once.  Basically your code could organically evolve and Swig would just
> keep up with whatever you were doing.  In my projects, we'd usually just
> tuck Swig away in some Makefile somewhere and forget about it.
>
> One of the major complexities of Swig is the fact that it attempts to
> parse C/C++ header files.  This very notion is actually a dangerous trap
> waiting for anyone who wants to wander into it.  You might look at a
> header file and say, well, how hard could it be to just grab a few
> definitions out of there?  I'll just write a few regexes or come up with
> some simple hack for recognizing function definitions or something.
> Yes, you can do that, but you're immediately going to find that whatever
> approach you take starts to break down into horrible corner cases.  Swig
> started out like this and quickly turned into a quagmire of esoteric bug
> reports.  All sorts of problems with preprocessor macros, typedefs,
> missing headers, and other things.  For a while, I would get these bug
> reports that would go something like "I had this C++ class inside a
> namespace with an abstract method taking a typedef'd const reference to
> this smart pointer ..... and Swig broke."  Hell, I can't even understand
> the bug report, let alone know how to fix it.  Almost all of these bugs
> were due to the fact that Swig started out as a hack and didn't really
> have any kind of solid conceptual foundation for how it should be put
> together.
>
> If you flash forward a bit, from about 2001-2004 there was a very
> serious push to fix these kinds of issues.  Although it was not a
> complete rewrite of Swig, there were a huge number of changes to how it
> worked during this time.  Swig grew a fully compatible C++ preprocessor
> that fully supported macros.  A complete C++ type system was
> implemented, including support for namespaces, templates, and even such
> things as template partial specialization.  Swig evolved into a
> multi-pass compiler that was doing all sorts of global analysis of the
> interface.  Just to give you an idea, Swig would do things such as
> automatically detect/wrap C++ smart pointers.  It could wrap overloaded
> C++ methods/functions.  Also, if you had a C++ class with virtual
> methods, it would only make one Python wrapper function and then reuse
> it across all wrapped subclasses.
>
> Under the covers of all of this, the implementation basically evolved
> into a sophisticated macro preprocessor coupled with a pattern matching
> engine built on top of the C++ type system.  For example, you could
> write patterns that matched specific C++ types (the much hated "typemap"
> feature) and you could write patterns that matched entire C++
> declarations.  This whole pattern matching approach had huge power if
> you knew what you were doing.  For example, I had a graduate student
> working on adding "contracts" to Swig--something that was being funded
> by an NSF grant.  It was cool and mind boggling all at once.
>
> In hindsight however, I think the complexity of Swig has exceeded
> anyone's ability to fully understand it (including my own).  For
> example, to even make sense of what's happening, you have to have a
> pretty solid grasp of the C/C++ type system (easier said than done).
> Couple that with all sorts of crazy pattern matching, low-level code
> fragments, and a ton of macro definitions, and your head will literally
> explode if you try to figure out what's happening.  So far as I know,
> recent versions of Swig have even combined all of this type-pattern
> matching with regular expressions.  I can't even fathom it.
>
> Sadly, my involvement with Swig was an unfortunate casualty of my
> academic career biting the dust.  By 2005, I was so burned out from
> working on it, and so sick of what I was doing, that I quite literally
> put all of my computer stuff aside to go play in a band for a few years.
> After a few years, I came back to programming (obviously), but not to
> keep working on the same stuff.  In particular, I will die quite happy
> if I never have to look at another line of C++ code ever again.  No, I
> would much rather fling my toddlers, ride my bike, play piano, or do
> just about anything than ever do that again.  Although I still subscribe
> to the Swig mailing lists and watch what's happening, I'm not active
> with it at the moment.
>
> I've sometimes thought it might be interesting to create a Swig
> replacement purely in Python.  When I work on the PLY project, this is
> often what I think about.  In that project, I've actually built a number
> of the parsing tools that would be useful in creating such a thing.  The
> only catch is that when I start thinking along these lines, I usually
> reach a point where I say "nah, I'll just write the whole application in
> Python."
>
> Anyways, this is probably way more than anyone wants to know about Swig.
> Getting back to the original topic of using it to make standard library
> modules, I just don't know.  I think you probably could have some
> success with an automatic code generator of some kind.  I'm just not
> sure it should take the Swig approach of parsing C++ headers.  I think
> you could do better.
>
> Cheers,
> Dave
>
> P.S. By the way, if people want to know a lot more about Swig internals,
> they should check out the PyCon 2008 presentation I gave about it:
> http://www.dabeaz.com/SwigMaster/



-- 
--Guido van Rossum (python.org/~guido)

From solipsis at pitrou.net  Mon Aug 29 23:39:33 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 29 Aug 2011 23:39:33 +0200
Subject: [Python-Dev] PEP 3151 from the BDFOP
References: <20110823170357.3b3ab2fc@resist.wooz.org>
	<20110824015756.51cdceac@pitrou.net>
	<20110829171833.5e0cc40d@resist.wooz.org>
Message-ID: <20110829233933.54c69a99@pitrou.net>

On Mon, 29 Aug 2011 17:18:33 -0400
Barry Warsaw <barry at python.org> wrote:
> On Aug 24, 2011, at 01:57 AM, Antoine Pitrou wrote:
> 
> >> One guiding principle for me is that we should keep the abstraction as thin
> >> as possible.  In particular, I'm concerned about mapping multiple errnos
> >> into a single Error.  For example both EPIPE and ESHUTDOWN mapping to
> >> BrokenPipeError, or EACCES or EPERM to PermissionError.  I think we should
> >> resist this, so that one errno maps to exactly one Error.  Where grouping
> >> is desired, Python already has mechanisms to deal with that,
> >> e.g. superclasses and multiple inheritance.  Therefore, I think it would be
> >> better to have
> >> 
> >> + FileSystemPermissionError
> >>   + AccessError (EACCES)
> >>   + PermissionError (EPERM)
> >
> >I'm not sure that's a good idea:
> 
> Was it the specific grouping under FileSystemPermissionError that you're
> objecting to, or the "keep the abstraction thin" principle?

The former. EPERM is generally returned for things which aren't
filesystem-related.
(although I also think separating EACCES and EPERM is of little value
*in practice*)

>  Let's say we
> threw out the idea of FSPE superclass, would you still want to collapse EACCES
> and EPERM into PermissionError, or would separate exceptions for each be okay?

I have a preference for the former, but am not against the latter. I
just think that, given AccessError and PermissionError, most users
won't know up front which one they should care about.

> It's still pretty easy to catch both in one except clause, and it won't be too
> annoying if it's rare.

Indeed.

> Reading your IRC message (sorry, I was afk) it sounds like you think
> FileSystemError can be removed.  I like keeping the hierarchy flat.

Ok. It can be reintroduced later on.
(the main reason why I think it can be removed is that EACCES in itself
is often tied to filesystem access rights; so the EACCES exception
class would have to be a subclass of FileSystemError, while the EPERM
one should not :-))

> >>>> open("foo")
> >Traceback (most recent call last):
> >  File "<stdin>", line 1, in <module>
> >FileNotFoundError: [Errno 2] No such file or directory: 'foo'
> >
> >(see e.g. http://bugs.python.org/issue12762)
> 
> True, but since you're going to be creating a bunch of new exception classes,
> it should be relatively painless to give them a better str.  Thanks for
> pointing out that bug; I agree with it.

Well, the str right now is exactly the same as OSError's.

> My question mostly was about raising OSError (as the current PEP states) with
> an errno that does *not* map to one of the new exceptions.  In that case, I
> don't think there's anything you could raise other than exactly OSError,
> right?

And indeed, that's what the implementation does :)

> So, for raising OSError with an errno mapping to one of the subclasses, it
> appears to break the "explicit is better than implicit" principle, and I think
> it could lead to hard-to-debug or understand code.  You'll look at code that
> raises OSError, but the exception that gets printed will be one of the
> subclasses.  I'm afraid that if you don't know that this is happening, you're
> going to think you're going crazy.

Except that it only happens if you use a recognized errno. For example
if you do:

    >>> OSError(errno.ENOENT, "not found")
    FileNotFoundError(2, 'not found')

Not if you just pass a message (or anything else, actually):

    >>> OSError("some message")
    OSError('some message',)

But if you pass an explicit errno, then the subclass doesn't appear
that surprising, does it?
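
(A pure-Python caricature of that behaviour, for readers following along
at home -- the real mapping lives in C and covers many more errnos:)

    import errno

    _ERRNO_MAP = {}   # abridged stand-in for the real C-level table

    class _OSErrorSketch(Exception):
        def __new__(cls, *args):
            # Remap only when instantiating the base class directly and
            # the first argument is a recognized errno; subclasses and
            # message-only calls pass through untouched.
            if cls is _OSErrorSketch and args and args[0] in _ERRNO_MAP:
                cls = _ERRNO_MAP[args[0]]
            return Exception.__new__(cls, *args)

    class _FileNotFoundSketch(_OSErrorSketch):
        pass

    _ERRNO_MAP[errno.ENOENT] = _FileNotFoundSketch

    assert type(_OSErrorSketch(errno.ENOENT, 'x')) is _FileNotFoundSketch
    assert type(_OSErrorSketch('some message')) is _OSErrorSketch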

> The other half is, let's say raising FileNotFoundError with the EEXIST errno.
> I'm guessing that the __init__'s for the new OSError subclasses will not have
> an `errno` attribute, so there's no way you can do that, but the PEP does not
> discuss this. 

Actually, the __new__ and the __init__ are exactly the same as
OSError's:

>>> e = FileNotFoundError("some message")
>>> e.errno
>>> e = FileNotFoundError(errno.ENOENT, "some message")
>>> e.errno
2

> >Wow, I didn't know ESRCH.
> >How would you call the respective exceptions?
> >- ChildProcessError for ECHILD?
>
[...]
> 
> >- ProcessLookupError for ESRCH?
> 
[...]
> 
> So in a sense, both are lookup errors, though I think it's going too far to
> multiply inherit from LookupError.  Maybe ChildWaitError or ChildLookupError
> for the former?  ProcessLookupError seems good to me.

Ok.

> >> What if all the errno symbolic names were mapped as attributes on IOError?
> >> The only advantage of that would be to eliminate the need to import errno,
> >> or for the ugly `e.errno == errno.ENOENT` stuff.  That would then be
> >> rewritten as `e.errno == IOError.ENOENT`.  A mild savings to be sure, but
> >> still.
> >
> >Hmm, I guess that's explorable as an orthogonal idea.
> 
> Cool.  How should we capture that?

A separate PEP perhaps, or more appropriately (IMHO) a tracker entry,
since it's just about enriching the attributes of an existing type.
I think it's a bit weird to define a whole lot of constants on a
built-in type, though.

> Okay, so here's what's still outstanding for me:
> 
> * Should we eliminate FileSystemError? (probably "yes")

Ok.

> * Should we ensure one errno == one exception?
>   - i.e. separate EACCES and EPERM
>   - i.e. separate EPIPE and ESHUTDOWN

I think that's unhelpful (or downright confusing: what is,
intuitively, the difference between an "AccessError" and a
"PermissionError"?) to most users, and users to which it is helpful
already know how to access the errno.
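
In other words, the rare code that truly needs the EACCES/EPERM
distinction can keep making it the way it always has (sketch only; the
path is illustrative):

    import errno
    import os

    def remove_dir(path):
        try:
            os.rmdir(path)
        except OSError as e:
            if e.errno == errno.EACCES:
                pass   # filesystem permissions forbid the operation
            elif e.errno == errno.EPERM:
                pass   # the operation itself is not permitted
            else:
                raise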

> * Should the str of the new exception subclasses be improved (e.g. to include
>   the symbolic name instead of the errno first)?

As I said, I think it's orthogonal, but I would +1 on including the
symbolic name instead of the integer.

> * Is the OSError.__new__() hackery a good idea?

I think it is, since it also takes care about Python code raising
OSErrors, but YMMV.

> * Should the PEP define the signature of the new exceptions (e.g. to prohibit
>   passing in an incorrect errno to an OSError subclass)?

The OSError constructor, pre-PEP, is very lax, and I took care to
keep it that way in the implementation. Apparently it's a feature that
helps with migrating old code.

> * Can we add ECHILD and ESRCH, and if so, what names should we use?

I think the suggested names are ok.

> * Where can we capture the idea of putting the symbolic names on OSError class
>   attributes, or is it a dumb idea that should be ditched?

I think it's a separate task altogether, although I'm in favour of it.

> * How long should we wait for other Python implementations to chime in?

A couple of weeks? I will soon leave on holiday until the end of
September anyway.

Regards

Antoine.



From victor.stinner at haypocalc.com  Mon Aug 29 23:57:36 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 29 Aug 2011 23:57:36 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
Message-ID: <201108292357.36628.victor.stinner@haypocalc.com>

On Monday, 29 August 2011 19:35:14, stefan brunthaler wrote:
> pretty much a year ago I wrote about the optimizations I did for my
> PhD thesis that target the Python 3 series interpreters

Does it speed up Python? :-) Could you provide numbers (benchmarks)?

Victor


From victor.stinner at haypocalc.com  Tue Aug 30 00:20:46 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 30 Aug 2011 00:20:46 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E5BE9D8.5050309@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de> <4E5B5364.9040100@haypocalc.com>
	<4E5BE9D8.5050309@v.loewis.de>
Message-ID: <201108300020.46140.victor.stinner@haypocalc.com>

On Monday, 29 August 2011 21:34:48, you wrote:
> >> Those haven't been ported to the new API, yet. Consider, for example,
> >> d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test;
> >> with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this
> >> is a 25% speedup for PEP 393.
> > 
> > If I understand correctly, the performance now highly depend on the used
> > characters? A pure ASCII string is faster than a string with characters
> > in the ISO-8859-1 charset?
> 
How did you infer that from the above paragraph? ASCII and Latin-1 are
> mostly identical in terms of performance - the ASCII decoder should be
> slightly slower than the Latin-1 decoder, since the ASCII decoder needs
> to check for errors, whereas the Latin-1 decoder will never be
> confronted with errors.

I don't compare ASCII and ISO-8859-1 decoders. I was asking if decoding b'abc' 
from ISO-8859-1 is faster than decoding b'ab\xff' from ISO-8859-1, and if yes: 
why?

Your patch replaces PyUnicode_New(size, 255) ...  memcpy() with
PyUnicode_FromUCS1(). I don't understand how that makes Python faster:
PyUnicode_FromUCS1() first scans the input string for the maximum code
point.

I suppose that the main difference is that the ISO-8859-1 encoded string is 
stored as the UTF-8 encoded string (shared pointer) if all characters of the 
string are ASCII characters. In this case, encoding the string to UTF-8
doesn't cost anything: we already have the result.

Am I correct?
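
(If that guess is right, it's easy to see why the sharing can only work
for pure-ASCII data:)

    data = b'abc'                      # pure ASCII input
    s = data.decode('iso-8859-1')
    assert s.encode('utf-8') == data   # ASCII is byte-for-byte its own UTF-8

    data = b'ab\xff'                   # a genuine Latin-1 character
    s = data.decode('iso-8859-1')
    assert s.encode('utf-8') == b'ab\xc3\xbf'   # U+00FF widens to two bytes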

Victor


From stefan at brunthaler.net  Tue Aug 30 00:23:10 2011
From: stefan at brunthaler.net (stefan brunthaler)
Date: Mon, 29 Aug 2011 15:23:10 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <201108292357.36628.victor.stinner@haypocalc.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<201108292357.36628.victor.stinner@haypocalc.com>
Message-ID: <CA+j1x0nDJUyyZ=7wmTi3jHZArwpXtuqfT9yLDVCV-vhj602WUw@mail.gmail.com>

> Does it speed up Python? :-) Could you provide numbers (benchmarks)?
>
Yes, it does ;)

The maximum overall speedup I achieved was by a factor of 2.42 on my
i7-920 for the spectralnorm benchmark of the computer language
benchmark game.

Others from the same set are:
  binarytrees: 1.9257 (1.9891)
  fannkuch: 1.6509 (1.7264)
  fasta: 1.5446 (1.7161)
  mandelbrot: 2.0040 (2.1847)
  nbody: 1.6165 (1.7602)
  spectralnorm: 2.2538 (2.4176)
  ---
  overall: 1.8213 (1.9382)

(The first number is the combination of all optimizations, the one in
parentheses is with my last optimization [Interpreter Instruction
Scheduling] enabled, too.)

For a comparative real-world benchmark I tested Martin von Loewis'
django port (there are not that many meaningful Python 3 real-world
benchmarks) and got a speedup of 1.3 (without IIS). That compares
reasonably well: Unladen Swallow got a speedup of 1.35 on this
benchmark. I just checked that pypy-c-latest on 64 bit reports 1.5
(the pypy-c-jit-latest figures seem to be either not working currently
or *really* fast...), but I cannot tell directly how that relates to
speedups (it just says "less is better" and I did not quickly find an
explanation).
Since I ran this benchmark last year, I have spent more time
investigating it and found that I could do better, but I would have to
guess as to how much. (An interesting aside though: on this benchmark,
the executable never grew to more than 5 MB of memory usage, exactly
like the vanilla Python 3 interpreter.)

hth,
--stefan

From meadori at gmail.com  Tue Aug 30 00:44:54 2011
From: meadori at gmail.com (Meador Inge)
Date: Mon, 29 Aug 2011 17:44:54 -0500
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
 support in 3.3)
In-Reply-To: <j3chv2$tvf$1@dough.gmane.org>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org>
Message-ID: <CAK1QoopNFDs=_O5A8XqVD6U=Yy6f+XyO5a6VaVXvzowywDRJ4A@mail.gmail.com>

On Sat, Aug 27, 2011 at 11:58 PM, Terry Reedy <tjreedy at udel.edu> wrote:

> Dan, I once had more or less the same opinion/question as you with
> regard to ctypes, but I now see at least 3 problems.
>
> 1) It seems hard to write it correctly. There are currently 47 open ctypes
> issues, with 9 being feature requests, leaving 38 behavior-related issues.
> Tom Heller has not been able to work on it since the beginning of 2010 and
> has formally withdrawn as maintainer. No one else that I know of has taken
> his place.

I am trying to work through getting these issues resolved.  The hard part so
far has been getting reviews and commits.  The following patches are awaiting
review (the patch for issue 11241 has been accepted, just not applied):

1. http://bugs.python.org/issue9041
2. http://bugs.python.org/issue9651
3. http://bugs.python.org/issue11241

I am more than happy to keep working through these issues, but I need some
help getting the patches actually applied since I don't have commit rights.

-- 
# Meador

From ncoghlan at gmail.com  Tue Aug 30 01:47:10 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 30 Aug 2011 09:47:10 +1000
Subject: [Python-Dev] PEP 3151 from the BDFOP
In-Reply-To: <20110829171833.5e0cc40d@resist.wooz.org>
References: <20110823170357.3b3ab2fc@resist.wooz.org>
	<20110824015756.51cdceac@pitrou.net>
	<20110829171833.5e0cc40d@resist.wooz.org>
Message-ID: <CADiSq7fN8eShVRnzng5hUe7JGDbcDq5J32uKEkpcgfF5F7aW2Q@mail.gmail.com>

On Tue, Aug 30, 2011 at 7:18 AM, Barry Warsaw <barry at python.org> wrote:
> Okay, so here's what's still outstanding for me:
>
> * Should we eliminate FileSystemError? (probably "yes")

I've also been persuaded that this isn't a generally meaningful
categorisation, so +1 for dropping it. ConnectionError is worth
keeping, though.

> * Should we ensure one errno == one exception?
>   - i.e. separate EACCES and EPERM
>   - i.e. separate EPIPE and ESHUTDOWN

I think the concept of a 1:1 mapping is a complete non-starter, since
"OSError" is always going to map to multiple errnos (i.e. everything
that hasn't been assigned to a specific subclass). Maintaining the
class categorisation down to a certain level for ease of separate
handling is worthwhile, but below that point it's better to let people
realise that they need to understand the subtleties of the different
errno values.

> * Should the str of the new exception subclasses be improved (e.g. to include
>   the symbolic name instead of the errno first)?

I'd say that's a distinct RFE on the tracker (since it applies
regardless of the acceptance or rejection of PEP 3151). Good idea in
principle, though.

> * Is the OSError.__new__() hackery a good idea?

I agree it's a little magical, but I also think the PEP becomes pretty
useless without it. If OSError.__new__ handles the mapping, then most
code (including C code) doesn't need to change - it will raise the new
subclasses automatically. If we demand that all exception *raising*
code be changed, then exception *catching* code will have a hard time
assuming that the new subclasses are going to be raised correctly
instead of a top level OSError.

To make that transition feasible, I think we *need* to make it as hard
as we can (if not impossible) to raise OSError instances with defined
errno values that *don't* conform to the new hierarchy so that 3.3+
exception catching code doesn't need to worry about things like ENOENT
being raised as OSError instead of FileNotFoundError. Only code that
also supports earlier versions should need to resort to inspecting the
errno values for the coarse distinctions that the PEP provides via the
new class hierarchy.

> * Should the PEP define the signature of the new exceptions (e.g. to prohibit
>   passing in an incorrect errno to an OSError subclass)?

Unfortunately, I think the variations in errno details across
platforms mean that being too restrictive in this space would cause
more problems than it solves.

So it may be wiser to technically allow people to do silly things like
"raise FileNotFoundError(errno.EPIPE)" with the admonition not to
actually do that because it is obscure and confusing. "Consenting
adults", etc.

> * Can we add ECHILD and ESRCH, and if so, what names should we use?

+1 for ChildProcessError and ProcessLookupError (as peer exceptions on
the tier directly below OSError)

> * Where can we capture the idea of putting the symbolic names on OSError class
>   attributes, or is it a dumb idea that should be ditched?

"Tracker RFE" for the former and "maybe" for the latter. With this
PEP, the need for direct inspection of errno values should be
significantly reduced in most code, so importing errno shouldn't be
necessary.

> * How long should we wait for other Python implementations to chime in?

"Until Antoine gets back from his holiday" sounds reasonable to me.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From guido at python.org  Tue Aug 30 01:48:11 2011
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Aug 2011 16:48:11 -0700
Subject: [Python-Dev] Ctypes and the stdlib
In-Reply-To: <j3fmo0$68f$1@dough.gmane.org>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org> <j3e137$q0g$1@dough.gmane.org>
	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
	<j3fmo0$68f$1@dough.gmane.org>
Message-ID: <CAP7+vJL+UoxgSV8s4haq0w+_AC__5iYC4sFusTDLE0BWNbhH1g@mail.gmail.com>

On Mon, Aug 29, 2011 at 2:39 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Guido van Rossum, 29.08.2011 04:27:
>> Hm, the main use that was proposed here for ctypes is to wrap existing
>> libraries (not to create nicer APIs, that can be done in pure Python
>> on top of this).
>
> The same applies to Cython, obviously. The main advantage of Cython over
> ctypes for this is that the Python-level wrapper code is also compiled into
> C, so whenever the need for a thicker wrapper arises in some part of the
> API, you don't lose any performance in intermediate layers.

Yes, this is a very nice advantage. The only advantage that I can
think of for ctypes is that it doesn't require a toolchain -- you can
just write the Python code and get going. With Cython you will always
have to invoke the Cython compiler. Another advantage may be that it
works *today* for PyPy -- I don't know the status of Cython for PyPy.

Also, (maybe this was answered before?), how well does Cython deal
with #include files (especially those you don't have control over,
like the ones typically required to use some lib<foo>.so safely on all
platforms)?

-- 
--Guido van Rossum (python.org/~guido)

From ncoghlan at gmail.com  Tue Aug 30 02:00:28 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 30 Aug 2011 10:00:28 +1000
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <20110829231420.20c3516a@pitrou.net>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
Message-ID: <CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>

On Tue, Aug 30, 2011 at 7:14 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Mon, 29 Aug 2011 11:33:14 -0700
> stefan brunthaler <s.brunthaler at uci.edu> wrote:
>> * The optimized dispatch routine has a changed instruction format
>> (word-sized instead of bytecodes) that allows for regular instruction
>> decoding (without the HAS_ARG-check) and inlining of some objects in
>> the instruction format on 64bit architectures.
>
> Having a word-sized "bytecode" format would probably be acceptable in
> itself, so if you want to submit a patch for that, go ahead.

Although any such patch should discuss how it compares with Cesare's
work on wpython.

Personally, I *like* CPython fitting into the "simple-and-portable"
niche in the Python interpreter space. Armin Rigo made the judgment
years ago that CPython was a poor platform for serious optimisation
when he stopped working on Psyco and started PyPy instead, and I think
the contrasting fates of PyPy and Unladen Swallow have borne out that
opinion. Significantly increasing the complexity of CPython for
speed-ups that are dwarfed by those available through PyPy seems like
a poor trade-off to me.

At a bare minimum, I don't think any significant changes should be
made under the "it will be faster" justification until the bulk of the
real-world benchmark suite used for speed.pypy.org is available for
Python 3. (Wasn't there a GSoC project about that?)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From guido at python.org  Tue Aug 30 02:02:26 2011
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Aug 2011 17:02:26 -0700
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
 support in 3.3)
In-Reply-To: <4E5C01E4.2050106@canterbury.ac.nz>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org> <j3e137$q0g$1@dough.gmane.org>
	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
	<4E5C01E4.2050106@canterbury.ac.nz>
Message-ID: <CAP7+vJJiXdSFCGhXJH1CVN3jwP=GqyUCqsvwSh6NJNWEayyzuA@mail.gmail.com>

On Mon, Aug 29, 2011 at 2:17 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>>
>> (Just like Python's own .h files --
>> e.g. the extensive renaming of the Unicode APIs depending on
>> narrow/wide build) How does Cython deal with these?
>
> Pyrex/Cython deal with it by generating C code that includes
> the relevant headers, so the C compiler expands all the
> macros, interprets the struct declarations, etc. All you
> need to do when writing the .pyx file is follow the same
> API that you would if you were writing C code to use the
> library.

Interesting. Then how does Pyrex/Cython typecheck your code at compile time?

-- 
--Guido van Rossum (python.org/~guido)

From stefan at brunthaler.net  Tue Aug 30 02:25:21 2011
From: stefan at brunthaler.net (stefan brunthaler)
Date: Mon, 29 Aug 2011 17:25:21 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
Message-ID: <CA+j1x0kz0MP5TrW_AB6jGRSBTBy8xs2KiEc9P8YgxoauwUG6gg@mail.gmail.com>

> Personally, I *like* CPython fitting into the "simple-and-portable"
> niche in the Python interpreter space. Armin Rigo made the judgment
> years ago that CPython was a poor platform for serious optimisation
> when he stopped working on Psyco and started PyPy instead, and I think
> the contrasting fates of PyPy and Unladen Swallow have borne out that
> opinion. Significantly increasing the complexity of CPython for
> speed-ups that are dwarfed by those available through PyPy seems like
> a poor trade-off to me.
>
I agree with the trade-off, but the nice thing is that CPython's
interpreter remains simple and portable with my optimizations. All of
these optimizations are purely interpretative, and the complexity of
CPython is not affected much. (For example, I have an inline-cached
version of BINARY_ADD that is called INCA_FLOAT_ADD [INCA being my
abbreviation for INline CAching]; you don't actually have to look at
its source code, since it is generated by my code generator, and you
can immediately tell what's going on by looking at instruction
traces.) So, the interpreter remains fully portable, and any
compatibility issues with C modules should not occur either.
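
(To give a feel for the technique, here is a deliberately toy Python
sketch of quickening with an inline type guard -- the names echo the ones
above, but none of this is the actual generated code:)

    class Deoptimize(Exception):
        """Raised when a specialized instruction's type guard fails."""

    def generic_add(left, right):
        # The unspecialized BINARY_ADD equivalent.
        return left + right

    def inca_float_add(left, right):
        # Specialized instruction: inline type guard, then the fast path.
        if type(left) is float and type(right) is float:
            return left + right
        raise Deoptimize

    def execute_add(cache, left, right):
        # Dispatch through the cache slot; quicken after observing a
        # float/float pair, and fall back if the guard ever fails.
        handler = cache.get('add', generic_add)
        try:
            result = handler(left, right)
        except Deoptimize:
            cache['add'] = generic_add
            return generic_add(left, right)
        if handler is generic_add and type(left) is float is type(right):
            cache['add'] = inca_float_add
        return result

    cache = {}
    execute_add(cache, 1.0, 2.0)   # first run observes floats and quickens
    execute_add(cache, 1.0, 2.0)   # now takes the specialized fast path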


> At a bare minimum, I don't think any significant changes should be
> made under the "it will be faster" justification until the bulk of the
> real-world benchmark suite used for speed.pypy.org is available for
> Python 3. (Wasn't there a GSoC project about that?)
>
Having more tests would surely be helpful. As already said, the most
real-world stuff I can do is Martin's django patch. (Some of the other
benchmarks are from the shootout, and I can [and did] run them, too:
binarytrees, fannkuch, fasta, mandelbrot, nbody and spectralnorm. I
also have the AI benchmark from Unladen Swallow, but no current
figures.)


Best,
--stefan

From tjreedy at udel.edu  Tue Aug 30 02:28:16 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 29 Aug 2011 20:28:16 -0400
Subject: [Python-Dev] issue 6721 "Locks in python standard library
 should be sanitized on fork"
In-Reply-To: <CAEd-RNqrAjWoW+V2NQb4rdYn52zCMPnEFQYG28pr_KhhCVfHqA@mail.gmail.com>
References: <CAEd-RNpGZxJ5D6VS365QFM9Bvh=z8TTT1WJZtTmye+Crri4xZg@mail.gmail.com>
	<20110823205147.3349eaa8@pitrou.net>
	<CAH_1eM1uxFnB7EHgkcXMTZ8KVLxNw1vJtHk-nOS4VN-kEePpFw@mail.gmail.com>
	<1314131362.3485.36.camel@localhost.localdomain>
	<CAEd-RNqeOFRyMGpRjxxyoBbj9hBD=JgzCc5PtoA9zQAoncomQQ@mail.gmail.com>
	<CACQrdOks88HRc-eNDdMWF8paLHDx4xaBd5YB9A1BrhFeUkvd4g@mail.gmail.com>
	<20110826175336.3af6be57@pitrou.net>
	<A659EA9B-6AC4-414D-B2C5-262792C0ADA0@celeryproject.org>
	<CAGE7PN+4q-yenfTUfbpHhLta966ZesqbL2mv8VD4SMLmu1K0Gg@mail.gmail.com>
	<CAH_1eM1jGaf1TirSLv6GyY8bKbUJnezU1+qBNANHKA5G=NfnNA@mail.gmail.com>
	<CACQrdOni_HGkd7AJGSU4HtAbfTaYWGEZbK1tBNpKUMiJywRYxg@mail.gmail.com>
	<20110829191608.7916da73@pitrou.net>
	<CACQrdOmfWKRENi=K46VdHTE3pcB09ciQoXf=DjskEnswz+XZ-A@mail.gmail.com>
	<1314638573.3551.14.camel@localhost.localdomain>
	<CACQrdO=wfH5eNkSBUd0okyGnA69xV5zA9fx=Gify_Put69RvDg@mail.gmail.com>
	<CAEd-RNqrAjWoW+V2NQb4rdYn52zCMPnEFQYG28pr_KhhCVfHqA@mail.gmail.com>
Message-ID: <j3hasa$ef2$1@dough.gmane.org>

On 8/29/2011 3:41 PM, Nir Aides wrote:

> I am not familiar with the python-dev definition for deprecation, but

It ranges from possible to planned eventual removal.

> when I used the word in the bug discussion I meant to advertize to
> users that they should not mix threading and forking since that mix is
> and will remain broken by design; I did not mean removal or crippling
> of functionality.

This would be a note or warning in the doc. You can suggest what and 
where to add something on an existing issue or a new one.

-- 
Terry Jan Reedy


From solipsis at pitrou.net  Tue Aug 30 02:55:10 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Aug 2011 02:55:10 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
Message-ID: <20110830025510.638b41d9@pitrou.net>

On Tue, 30 Aug 2011 10:00:28 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> >
> > Having a word-sized "bytecode" format would probably be acceptable in
> > itself, so if you want to submit a patch for that, go ahead.
> 
> Although any such patch should discuss how it compares with Cesare's
> work on wpython.
> Personally, I *like* CPython fitting into the "simple-and-portable"
> niche in the Python interpreter space.

Changing the bytecode width wouldn't make the interpreter more complex.

> Armin Rigo made the judgment
> years ago that CPython was a poor platform for serious optimisation
> when he stopped working on Psyco and started PyPy instead, and I think
> the contrasting fates of PyPy and Unladen Swallow have borne out that
> opinion.

Well, PyPy didn't show any significant achievements before they spent
*much* more time on it than the Unladen Swallow guys did. Whether or not
a good JIT is possible on top of CPython might remain a largely
unanswered question.

> Significantly increasing the complexity of CPython for
> speed-ups that are dwarfed by those available through PyPy seems like
> a poor trade-off to me.

Some years ago we were waiting for Unladen Swallow to improve itself
and be ported to Python 3. Now it seems we are waiting for PyPy to be
ported to Python 3. I'm not sure how "let's just wait" is a good
trade-off if someone proposes interesting patches (which, of course,
remains to be seen).

> At a bare minimum, I don't think any significant changes should be
> made under the "it will be faster" justification until the bulk of the
> real-world benchmark suite used for speed.pypy.org is available for
> Python 3. (Wasn't there a GSoC project about that?)

I'm not sure what the bulk is, but have you already taken a look at
http://hg.python.org/benchmarks/ ?

Regards

Antoine.

From greg at krypto.org  Tue Aug 30 04:38:31 2011
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 29 Aug 2011 19:38:31 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0=T0wHary2800Yxju1GRcy5VkyxXtEatbY-AAXJrG1p4g@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<4E5BF9F7.9020608@v.loewis.de>
	<CA+j1x0=T0wHary2800Yxju1GRcy5VkyxXtEatbY-AAXJrG1p4g@mail.gmail.com>
Message-ID: <CAGE7PN+7G6VYM03q4x3x=Tug81oUePOAUgaN8xuUr-6XxBnE0A@mail.gmail.com>

On Mon, Aug 29, 2011 at 2:05 PM, stefan brunthaler <s.brunthaler at uci.edu>wrote:

> > The question really is whether this is an all-or-nothing deal. If you
> > could identify smaller parts that can be applied independently, interest
> > would be higher.
> >
> Well, it's not an all-or-nothing deal. In my current architecture, I
> can selectively enable most of the optimizations as I see fit. The
> only pre-requisite (in my implementation) is that I have two dispatch
> loops with a changed instruction format. It is, however, not a
> technical necessity, just the way I implemented it. Basically, you can
> choose whatever you like best, and I could extract that part. I am
> just offering to add all the things that I have done :)
>
>
+1 from me on going forward with your performance improvements.  The more
you can break them down into individual smaller patch sets, the better, as
they can then be reviewed and applied as needed: a prerequisites patch, a
patch for the wide opcodes, etc.

For benchmarks, given that this is Python 3, just get as many useful ones
running as you can.

Some in this thread seemed to give the impression that CPython performance
is not something to care about. I disagree. I see CPython being the main
implementation of Python used in most places for a long time. Improving its
performance merely raises the bar to be met by other implementations if they
want to compete. That is a good thing!

-gps


> > Also, I'd be curious whether your techniques help or hinder a potential
> > integration of a JIT generator.
> >
> This is something I have previously frequently discussed with several
> JIT people. IMHO, having my optimizations in-place also helps a JIT
> compiler, since it can re-use the information I gathered to generate
> more aggressively optimized native machine code right away (the inline
> caches can be generated with the type information right away, some
> functions could be inlined with the guard statements subsumed, etc.)
> Another benefit could be that the JIT compiler can spend more time
> on generating code, because the interpreter is already faster (so in
> some cases it would probably not make sense to include a
> non-optimizing fast and simple JIT compiler).
> There are others on the list, who probably can/want to comment on this,
> too.
>
> That aside, I think that while having a JIT is an important goal, I
> can very well imagine scenarios where the additional memory
> consumption (for the generated native machine code) of a JIT for each
> process (I assume that the native machine code caches are not shared)
> hinders scalability. I have in fact no data to back this up, but I
> think that would be an interesting trade off, say if I have 30% gain
> in performance without substantial additional memory requirements on
> my existing hardware, compared to higher achievable speedups that
> require more machines, though.
>
>
> Regards,
> --stefan

From ncoghlan at gmail.com  Tue Aug 30 05:29:59 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 30 Aug 2011 13:29:59 +1000
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CAGE7PN+7G6VYM03q4x3x=Tug81oUePOAUgaN8xuUr-6XxBnE0A@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<4E5BF9F7.9020608@v.loewis.de>
	<CA+j1x0=T0wHary2800Yxju1GRcy5VkyxXtEatbY-AAXJrG1p4g@mail.gmail.com>
	<CAGE7PN+7G6VYM03q4x3x=Tug81oUePOAUgaN8xuUr-6XxBnE0A@mail.gmail.com>
Message-ID: <CADiSq7eJc9b56WDvCfG=Eqgt=HN==WRVCsjuJ8Ez=+4-0nXKQw@mail.gmail.com>

On Tue, Aug 30, 2011 at 12:38 PM, Gregory P. Smith <greg at krypto.org> wrote:
> Some in this thread seemed to give the impression that CPython performance
> is not something to care about. I disagree. I see CPython being the main
> implementation of Python used in most places for a long time. Improving its
> performance merely raises the bar to be met by other implementations if they
> want to compete. That is a good thing!

Not the impression I intended to give. I merely want to highlight that
we need to be careful that incremental increases in complexity are
justified with real, measured performance improvements. PyPy has set
the bar on how to do that - people that seriously want to make CPython
faster need to focus on getting speed.python.org sorted *first* (so we
know where we're starting) and *then* work on trying to improve
CPython's numbers relative to that starting point.

The PSF has the hardware to run the site, but, unless more has been
going in the background than I am aware of, is still lacking trusted
volunteers to do the following:
1. Getting codespeed up and running on the PSF hardware
2. Hooking it in to the CPython source control infrastructure
3. Getting a reasonable set of benchmarks running on 3.x (likely
starting with the already ported set in Mercurial, but eventually we
want the full suite that PyPy uses)
4. Once PyPy, Jython and IronPython offer 3.x compatible versions,
start including them as well (alternatively, offer 2.x performance
comparisons as well, although that's less interesting from a CPython
point of view since it can't be used to guide future CPython
optimisation efforts)

Anecdotal, non-reproducible performance figures are *not* the way to
go about serious optimisation efforts. Using a dedicated machine is
vulnerable to architecture-specific idiosyncrasies, but ad hoc testing
on other systems can still be used as a sanity check.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From greg.ewing at canterbury.ac.nz  Tue Aug 30 07:55:20 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 30 Aug 2011 17:55:20 +1200
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
 support in 3.3)
In-Reply-To: <CAP7+vJJiXdSFCGhXJH1CVN3jwP=GqyUCqsvwSh6NJNWEayyzuA@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org> <j3e137$q0g$1@dough.gmane.org>
	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
	<4E5C01E4.2050106@canterbury.ac.nz>
	<CAP7+vJJiXdSFCGhXJH1CVN3jwP=GqyUCqsvwSh6NJNWEayyzuA@mail.gmail.com>
Message-ID: <4E5C7B48.5080402@canterbury.ac.nz>

Guido van Rossum wrote:

> On Mon, Aug 29, 2011 at 2:17 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

>>All you
>>need to do when writing the .pyx file is follow the same
>>API that you would if you were writing C code to use the
>>library.
>
> Interesting. Then how does Pyrex/Cython typecheck your code at compile time?

You might be reading more into that statement than I meant.
You have to supply Pyrex/Cython versions of the C declarations,
either hand-written or generated by a tool. But you write them
based on the advertised C API -- you don't have to manually
expand macros, work out the low-level layout of structs, or
anything like that (as you often have to do when using ctypes).

-- 
Greg


From greg.ewing at canterbury.ac.nz  Tue Aug 30 07:57:28 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 30 Aug 2011 17:57:28 +1200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
Message-ID: <4E5C7BC8.6010302@canterbury.ac.nz>

Nick Coghlan wrote:

> Personally, I *like* CPython fitting into the "simple-and-portable"
> niche in the Python interpreter space.

Me, too! I like that I can read the CPython source and
understand what it's doing most of the time. Please don't
screw that up by attempting to perform heroic optimisations.

-- 
Greg

From martin at v.loewis.de  Tue Aug 30 08:20:46 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Aug 2011 08:20:46 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <201108300020.46140.victor.stinner@haypocalc.com>
References: <4E553FBC.7080501@v.loewis.de>
	<4E5B5364.9040100@haypocalc.com>	<4E5BE9D8.5050309@v.loewis.de>
	<201108300020.46140.victor.stinner@haypocalc.com>
Message-ID: <4E5C813E.9080301@v.loewis.de>

> I don't compare ASCII and ISO-8859-1 decoders. I was asking if decoding b'abc' 
> from ISO-8859-1 is faster than decoding b'ab\xff' from ISO-8859-1, and if yes: 
> why?

No, that makes no difference.

> 
> Your patch replaces PyUnicode_New(size, 255) ...  memcpy(), by 
> PyUnicode_FromUCS1().

You compared to the wrong revision. PyUnicode_New is already a PEP 393
function, and this version you have been comparing to is indeed faster
than the current version. However, it is also incorrect, as it fails
to compute the maxchar, and hence fails to detect pure-ASCII strings.

See below for the actual diff. It should be obvious why the 393 version
is faster: 3.3 currently needs to widen each char (to 16 or 32 bits).
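
As a rough illustration of what the maxchar computation buys at the
Python level, here is a quick sketch (assuming a PEP 393 build, where
sys.getsizeof reflects the chosen representation):

    import sys

    s1 = b'abc'.decode('latin-1')     # maxchar < 128: compact ASCII
    s2 = b'ab\xff'.decode('latin-1')  # maxchar 0xff: UCS1 (latin-1)
    # Both store one byte per character, but only the pure-ASCII
    # string gets the smaller ASCII representation:
    print(sys.getsizeof(s1), sys.getsizeof(s2))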

Regards,
Martin

@@ -5569,41 +5569,8 @@
                       Py_ssize_t size,
                       const char *errors)
 {
-    PyUnicodeObject *v;
-    Py_UNICODE *p;
-    const char *e, *unrolled_end;
-
     /* Latin-1 is equivalent to the first 256 ordinals in Unicode. */
-    if (size == 1) {
-        Py_UNICODE r = *(unsigned char*)s;
-        return PyUnicode_FromUnicode(&r, 1);
-    }
-
-    v = _PyUnicode_New(size);
-    if (v == NULL)
-        goto onError;
-    if (size == 0)
-        return (PyObject *)v;
-    p = PyUnicode_AS_UNICODE(v);
-    e = s + size;
-    /* Unrolling the copy makes it much faster by reducing the looping
-       overhead. This is similar to what many memcpy() implementations do. */
-    unrolled_end = e - 4;
-    while (s < unrolled_end) {
-        p[0] = (unsigned char) s[0];
-        p[1] = (unsigned char) s[1];
-        p[2] = (unsigned char) s[2];
-        p[3] = (unsigned char) s[3];
-        s += 4;
-        p += 4;
-    }
-    while (s < e)
-        *p++ = (unsigned char) *s++;
-    return (PyObject *)v;
-
-  onError:
-    Py_XDECREF(v);
-    return NULL;
+    return PyUnicode_FromUCS1((unsigned char*)s, size);
 }

 /* create or adjust a UnicodeEncodeError */

From eliben at gmail.com  Tue Aug 30 08:22:31 2011
From: eliben at gmail.com (Eli Bendersky)
Date: Tue, 30 Aug 2011 09:22:31 +0300
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <4E5C7BC8.6010302@canterbury.ac.nz>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<4E5C7BC8.6010302@canterbury.ac.nz>
Message-ID: <CAF-Rda-6mo2mz20ssD=gq+aU8jPQH6yuFsk=hBUN7SC7=y55eQ@mail.gmail.com>

On Tue, Aug 30, 2011 at 08:57, Greg Ewing <greg.ewing at canterbury.ac.nz>wrote:

> Nick Coghlan wrote:
>
>  Personally, I *like* CPython fitting into the "simple-and-portable"
>> niche in the Python interpreter space.
>>
>
> Me, too! I like that I can read the CPython source and
> understand what it's doing most of the time. Please don't
> screw that up by attempting to perform heroic optimisations.
>
> --
>

Following this argument to the extreme, the bytecode evaluation code of
CPython can be simplified quite a bit. Lose 2x performance but gain a lot of
readability. Does that sound like a good deal? I don't intend to sound
sarcastic, just show that IMHO this argument isn't a good one. I think that
even clever optimized code can be properly written and *documented* to make
the task of understanding it feasible. Personally, I'd love CPython to be a
bit faster and see no reason to give up optimization opportunities for the
sake of code readability.

Eli

From ncoghlan at gmail.com  Tue Aug 30 09:58:42 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 30 Aug 2011 17:58:42 +1000
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CAF-Rda-6mo2mz20ssD=gq+aU8jPQH6yuFsk=hBUN7SC7=y55eQ@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<4E5C7BC8.6010302@canterbury.ac.nz>
	<CAF-Rda-6mo2mz20ssD=gq+aU8jPQH6yuFsk=hBUN7SC7=y55eQ@mail.gmail.com>
Message-ID: <CADiSq7e524Vtmnv2rE7ztS2K-SrgUT3eZg46=Rg4nyOH804-dQ@mail.gmail.com>

On Tue, Aug 30, 2011 at 4:22 PM, Eli Bendersky <eliben at gmail.com> wrote:
> On Tue, Aug 30, 2011 at 08:57, Greg Ewing <greg.ewing at canterbury.ac.nz>
> wrote:
> Following this argument to the extreme, the bytecode evaluation code of
> CPython can be simplified quite a bit. Lose 2x performance but gain a lot of
> readability. Does that sound like a good deal? I don't intend to sound
> sarcastic, just show that IMHO this argument isn't a good one. I think that
> even clever optimized code can be properly written and *documented* to make
> the task of understanding it feasible. Personally, I'd love CPython to be a
> bit faster and see no reason to give up optimization opportunities for the
> sake of code readability.

Yeah, it's definitely a trade-off - the point I was trying to make is
that there *is* a trade-off being made between complexity and speed.

I think the computed-gotos stuff struck a nice balance - the macro-fu
involved means that you can still understand what the main eval loop
is *doing*, even if you don't know exactly what's hidden behind the
target macros. Ditto for the older opcode prediction feature and the
peephole optimiser - separation of concerns means that you can
understand the overall flow of events without needing to understand
every little detail.

This is where the request to extract individual orthogonal changes and
submit separate patches comes from - it makes it clear that the
independent changes *can* be separated cleanly, and aren't a giant
ball of incomprehensible mud. It's the difference between complex
(lots of moving parts, that can each be understood on their own and
are then composed into a meaningful whole) and complicated (massive
patches that don't work at all if any one component is delayed).

Eugene Toder's AST optimiser work that I still hope to get into 3.3
will have to undergo a similar process - the current patch covers a
bit too much ground and needs to be broken up into smaller steps
before we can seriously consider pushing it into the core.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From martin at v.loewis.de  Tue Aug 30 10:06:26 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Aug 2011 10:06:26 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <20110829225413.689d073c@pitrou.net>
References: <4E553FBC.7080501@v.loewis.de>
	<20110824203228.3e00874d@pitrou.net>	<4E5606C7.9000404@v.loewis.de>	<CAP7+vJ+f3oN3RmZhCEntgXBccj4gE7ErdKwvL9hv4uaBgmL_qQ@mail.gmail.com>	<4E577589.4030809@v.loewis.de>	<CAP7+vJ+fYXLcqG_rHkNtg5yuZKJTj2nMxFaLczr+zqeKUBtY8A@mail.gmail.com>	<4E5BF741.50209@v.loewis.de>
	<20110829225413.689d073c@pitrou.net>
Message-ID: <4E5C9A02.6080704@v.loewis.de>

> This looks very nice. Is 3.3 a wide build? (how about a narrow build?)

It's a wide build. For reference, I also attach 64-bit narrow build
results, and 32-bit results (wide, narrow, and PEP 393). Savings are
much smaller in narrow builds (larger on 32-bit systems than on
64-bit systems).

> (is it with your own port of Django to py3k, or is there an official
> branch for it?)

It's https://bitbucket.org/loewis/django-3k

Regards,
Martin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 3k-32-16.txt
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110830/15fd8a04/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 3k-32-32.txt
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110830/15fd8a04/attachment-0001.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 3k-64-16.txt
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110830/15fd8a04/attachment-0002.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 393-32.txt
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110830/15fd8a04/attachment-0003.txt>

From stefan_ml at behnel.de  Tue Aug 30 10:15:22 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 30 Aug 2011 10:15:22 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
Message-ID: <j3i66r$c6g$1@dough.gmane.org>

Nick Coghlan, 30.08.2011 02:00:
> On Tue, Aug 30, 2011 at 7:14 AM, Antoine Pitrou wrote:
>> On Mon, 29 Aug 2011 11:33:14 -0700 stefan brunthaler wrote:
>>> * The optimized dispatch routine has a changed instruction format
>>> (word-sized instead of bytecodes) that allows for regular instruction
>>> decoding (without the HAS_ARG check) and inlining of some objects in
>>> the instruction format on 64bit architectures.
>>
>> Having a word-sized "bytecode" format would probably be acceptable in
>> itself, so if you want to submit a patch for that, go ahead.
>
> Although any such patch should discuss how it compares with Cesare's
> work on wpython.
>
> Personally, I *like* CPython fitting into the "simple-and-portable"
> niche in the Python interpreter space. Armin Rigo made the judgment
> years ago that CPython was a poor platform for serious optimisation
> when he stopped working on Psyco and started PyPy instead, and I think
> the contrasting fates of PyPy and Unladen Swallow have borne out that
> opinion. Significantly increasing the complexity of CPython for
> speed-ups that are dwarfed by those available through PyPy seems like
> a poor trade-off to me.

If Stefan can cut his changes down into smaller feature chunks, thus
making their benefit reproducible and verifiable by others, it's well
worth reconsidering, one patch at a time, whether even a visible
increase in complexity is worth the improved performance. Even if
PyPy's performance tops
the improvements, it's worth remembering that that's also a very different 
kind of system than CPython, with different resource requirements and a 
different level of maturity, compatibility, portability, etc. There are 
many reasons to continue using CPython, not only in corners, and there are 
many people who would be happy about a faster CPython. Raising the bar has 
its virtues.

That being said, I also second Nick's reference to wpython. If CPython 
grows its byte code size anyway (which, as I understand, is one part of the 
proposed changes), it's worth looking at wpython first, given that it has 
been around and working for a while. The other proposed changes sound like 
at least some of them are independent from this one.

Stefan


From mark at hotpy.org  Tue Aug 30 10:31:12 2011
From: mark at hotpy.org (Mark Shannon)
Date: Tue, 30 Aug 2011 09:31:12 +0100
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
Message-ID: <4E5C9FD0.2040208@hotpy.org>

Nick Coghlan wrote:
> On Tue, Aug 30, 2011 at 7:14 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> On Mon, 29 Aug 2011 11:33:14 -0700
>> stefan brunthaler <s.brunthaler at uci.edu> wrote:
>>> * The optimized dispatch routine has a changed instruction format
>>> (word-sized instead of bytecodes) that allows for regular instruction
>>> decoding (without the HAS_ARG check) and inlining of some objects in
>>> the instruction format on 64bit architectures.
>> Having a word-sized "bytecode" format would probably be acceptable in
>> itself, so if you want to submit a patch for that, go ahead.
> 
> Although any such patch should discuss how it compares with Cesare's
> work on wpython.
> 
> Personally, I *like* CPython fitting into the "simple-and-portable"
> niche in the Python interpreter space.

CPython has a large number of micro-optimisations scattered all over
the code base. By removing these and adding large-scale optimisations,
like Stefan's, the code base *might* actually get smaller overall (and
thus simpler) *and* faster.
Of course, CPython must remain portable.

[snip]
> 
> At a bare minimum, I don't think any significant changes should be
> made under the "it will be faster" justification until the bulk of the
> real-world benchmark suite used for speed.pypy.org is available for
> Python 3. (Wasn't there a GSoC project about that?)

+1

Cheers,
Mark.


From mark at hotpy.org  Tue Aug 30 10:32:02 2011
From: mark at hotpy.org (Mark Shannon)
Date: Tue, 30 Aug 2011 09:32:02 +0100
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <4E5BF9F7.9020608@v.loewis.de>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<4E5BF9F7.9020608@v.loewis.de>
Message-ID: <4E5CA002.7010109@hotpy.org>

Martin v. Löwis wrote:
>> So, the two big issues aside, is there any interest in incorporating
>> these optimizations in Python 3?
> 
> The question really is whether this is an all-or-nothing deal. If you
> could identify smaller parts that can be applied independently, interest
> would be higher.
> 
> Also, I'd be curious whether your techniques help or hinder a potential
> integration of a JIT generator.

A JIT compiler is not a silver bullet; translation to machine code is
just one of many optimisations performed by PyPy.
A compiler merely removes interpretative overhead, at the cost of
significantly increased code size, whereas Stefan's work attacks both
interpreter overhead and some of the inefficiencies due to dynamic typing.

If Unladen Swallow achieved anything it was to demonstrate that a JIT
alone does not work well.

My (experimental) HotPy VM has similar base-line speed to CPython, yet
is able to outperform Unladen Swallow using interpreter-only optimisations.
(It goes even faster with the compiler turned on :) )

Cheers,
Mark.


> 
> Regards,
> Martin
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/mark%40hotpy.org



From martin at v.loewis.de  Tue Aug 30 10:40:16 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Aug 2011 10:40:16 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <20110830025510.638b41d9@pitrou.net>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>	<20110829231420.20c3516a@pitrou.net>	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net>
Message-ID: <4E5CA1F0.2070005@v.loewis.de>

>> Although any such patch should discuss how it compares with Cesare's
>> work on wpython.
>> Personally, I *like* CPython fitting into the "simple-and-portable"
>> niche in the Python interpreter space.
> 
> Changing the bytecode width wouldn't make the interpreter more complex.

No, but I think Stefan is proposing to add a *second* byte code format,
in addition to the one that remains there. That would certainly be an
increase in complexity.

> Some years ago we were waiting for Unladen Swallow to improve itself
> and be ported to Python 3. Now it seems we are waiting for PyPy to be
> ported to Python 3. I'm not sure how "let's just wait" is a good
> trade-off if someone proposes interesting patches (which, of course,
> remains to be seen).

I completely agree. Let's not put unmet preconditions to such projects.

For example, I still plan to write a JIT for Python at some point. This
may happen in two months, or in two years. I wouldn't try to stop
anybody from contributing improvements that may become obsolete with the
JIT. The only recent case where I *did* try to stop people is with
PEP-393, where I do believe that some of the changes that had been
made over the last year become redundant.

Regards,
Martin

From martin at v.loewis.de  Tue Aug 30 10:46:22 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Aug 2011 10:46:22 +0200
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
 support in 3.3)
In-Reply-To: <4E5C7B48.5080402@canterbury.ac.nz>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>	<4E5951D5.5020200@v.loewis.de>	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>	<20110828002642.4765fc89@pitrou.net>	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>	<20110828012705.523e51d4@pitrou.net>	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>	<j3chv2$tvf$1@dough.gmane.org>
	<j3e137$q0g$1@dough.gmane.org>	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>	<4E5C01E4.2050106@canterbury.ac.nz>	<CAP7+vJJiXdSFCGhXJH1CVN3jwP=GqyUCqsvwSh6NJNWEayyzuA@mail.gmail.com>
	<4E5C7B48.5080402@canterbury.ac.nz>
Message-ID: <4E5CA35E.8000509@v.loewis.de>

> You might be reading more into that statement than I meant.
> You have to supply Pyrex/Cython versions of the C declarations,
> either hand-written or generated by a tool. But you write them
> based on the advertised C API -- you don't have to manually
> expand macros, work out the low-level layout of structs, or
> anything like that (as you often have to do when using ctypes).

I can understand how that works when building a CPython extension.
But what about creating Jython/IronPython modules with Cython?
At what point do the header files get considered there?

Regards,
Martin

From stefan_ml at behnel.de  Tue Aug 30 10:57:22 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 30 Aug 2011 10:57:22 +0200
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
	support in 3.3)
In-Reply-To: <4E5CA35E.8000509@v.loewis.de>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>	<4E5951D5.5020200@v.loewis.de>	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>	<20110828002642.4765fc89@pitrou.net>	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>	<20110828012705.523e51d4@pitrou.net>	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>	<j3chv2$tvf$1@dough.gmane.org>	<j3e137$q0g$1@dough.gmane.org>	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>	<4E5C01E4.2050106@canterbury.ac.nz>	<CAP7+vJJiXdSFCGhXJH1CVN3jwP=GqyUCqsvwSh6NJNWEayyzuA@mail.gmail.com>	<4E5C7B48.5080402@canterbury.ac.nz>
	<4E5CA35E.8000509@v.loewis.de>
Message-ID: <j3i8li$net$1@dough.gmane.org>

"Martin v. L?wis", 30.08.2011 10:46:
>> You might be reading more into that statement than I meant.
>> You have to supply Pyrex/Cython versions of the C declarations,
>> either hand-written or generated by a tool. But you write them
>> based on the advertised C API -- you don't have to manually
>> expand macros, work out the low-level layout of structs, or
>> anything like that (as you often have to do when using ctypes).
>
> I can understand how that works when building a CPython extension.
> But what about creating Jython/IronPython modules with Cython?
> At what point do the header files get considered there?

I had written a bit about this here:

http://thread.gmane.org/gmane.comp.python.devel/126340/focus=126419

Stefan


From ncoghlan at gmail.com  Tue Aug 30 12:55:51 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 30 Aug 2011 20:55:51 +1000
Subject: [Python-Dev] Planned PEP status changes
In-Reply-To: <CAP1=2W5uHA9UbVEiMGib51xTqT8h4dVoi3BCgZfX8AV5h-ev=A@mail.gmail.com>
References: <CADiSq7e7nj+EV4remUejKJYZczCmrs27iekQv9y9TEnkwx4=SA@mail.gmail.com>
	<CAP1=2W5uHA9UbVEiMGib51xTqT8h4dVoi3BCgZfX8AV5h-ev=A@mail.gmail.com>
Message-ID: <CADiSq7dFjinWimP=rEB2e1O3sGVeGjLkrDz4oTDERzy+bwRNaQ@mail.gmail.com>

On Sat, Aug 27, 2011 at 2:35 AM, Brett Cannon <brett at python.org> wrote:
> On Tue, Aug 23, 2011 at 19:42, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Unless I hear any objections, I plan to adjust the current PEP
>> statuses as follows some time this weekend:
>>
>> Move from Accepted to Finished:
>>
>>    389  argparse - New Command Line Parsing Module              Bethard
>>    391  Dictionary-Based Configuration For Logging              Sajip
>>   3108  Standard Library Reorganization                         Cannon
>
> <sigh> I had always hoped to get profile/cProfile taken care of, but
> obviously that just didn't ever happen. So no objection, just a slight
> sting from the reminder of why the PEP was left open.

After starting to write a justification for marking the PEP as Final
despite the outstanding TODO items, I realised that didn't make a lot
of sense, so I left it at Accepted instead. So your call if you want
to say "not gonna happen" and close it out anyway.

I made the other 4 changes though (argparse, logging.dictConfig, new
super -> Final, Unladen Swallow -> Withdrawn).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From solipsis at pitrou.net  Tue Aug 30 13:33:23 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Aug 2011 13:33:23 +0200
Subject: [Python-Dev] PEP 393 review
In-Reply-To: <4E5C9A02.6080704@v.loewis.de>
References: <4E553FBC.7080501@v.loewis.de> <20110824203228.3e00874d@pitrou.net>
	<4E5606C7.9000404@v.loewis.de>
	<CAP7+vJ+f3oN3RmZhCEntgXBccj4gE7ErdKwvL9hv4uaBgmL_qQ@mail.gmail.com>
	<4E577589.4030809@v.loewis.de>
	<CAP7+vJ+fYXLcqG_rHkNtg5yuZKJTj2nMxFaLczr+zqeKUBtY8A@mail.gmail.com>
	<4E5BF741.50209@v.loewis.de> <20110829225413.689d073c@pitrou.net>
	<4E5C9A02.6080704@v.loewis.de>
Message-ID: <20110830133323.13842072@pitrou.net>


By the way, I don't know if you're working on it, but StringIO seems a
bit broken right now. test_memoryio crashes here:

test_newline_cr (test.test_memoryio.CStringIOTest) ... Fatal Python error: Segmentation fault

Current thread 0x00007f3f6353b700:
  File "/home/antoine/cpython/pep-393/Lib/test/test_memoryio.py", line 583 in test_newline_cr
  File "/home/antoine/cpython/pep-393/Lib/unittest/case.py", line 386 in _executeTestPart
  File "/home/antoine/cpython/pep-393/Lib/unittest/case.py", line 441 in run
  File "/home/antoine/cpython/pep-393/Lib/unittest/case.py", line 493 in __call__
  File "/home/antoine/cpython/pep-393/Lib/unittest/suite.py", line 105 in run
  File "/home/antoine/cpython/pep-393/Lib/unittest/suite.py", line 67 in __call__
  File "/home/antoine/cpython/pep-393/Lib/unittest/suite.py", line 105 in run
  File "/home/antoine/cpython/pep-393/Lib/unittest/suite.py", line 67 in __call__
  File "/home/antoine/cpython/pep-393/Lib/unittest/runner.py", line 168 in run
  File "/home/antoine/cpython/pep-393/Lib/test/support.py", line 1293 in _run_suite
  File "/home/antoine/cpython/pep-393/Lib/test/support.py", line 1327 in run_unittest
  File "/home/antoine/cpython/pep-393/Lib/test/test_memoryio.py", line 718 in test_main
  File "/home/antoine/cpython/pep-393/Lib/test/regrtest.py", line 1139 in runtest_inner
  File "/home/antoine/cpython/pep-393/Lib/test/regrtest.py", line 915 in runtest
  File "/home/antoine/cpython/pep-393/Lib/test/regrtest.py", line 707 in main
  File "/home/antoine/cpython/pep-393/Lib/test/__main__.py", line 13 in <module>
  File "/home/antoine/cpython/pep-393/Lib/runpy.py", line 73 in _run_code
  File "/home/antoine/cpython/pep-393/Lib/runpy.py", line 160 in _run_module_as_main
Segmentation fault (core dumped)


And here's an excerpt of the C stack:

#0  find_control_char (translated=0, universal=0, readnl=<value optimized out>, kind=4, start=0xa75cf4 "c", end=
    0xa75d00 "", consumed=0x7fffffffab38) at ./Modules/_io/textio.c:1617
#1  _PyIO_find_line_ending (translated=0, universal=0, readnl=<value optimized out>, kind=4, start=0xa75cf4 "c", end=
    0xa75d00 "", consumed=0x7fffffffab38) at ./Modules/_io/textio.c:1678
#2  0x00000000004ed3be in _stringio_readline (self=0x7ffff291a250) at ./Modules/_io/stringio.c:271
#3  stringio_iternext (self=0x7ffff291a250) at ./Modules/_io/stringio.c:322
#4  0x000000000052aa19 in listextend (self=0x7ffff2900ab8, b=<value optimized out>) at Objects/listobject.c:844
#5  0x000000000052afe8 in list_init (self=0x7ffff2900ab8, args=<value optimized out>, kw=<value optimized out>)
    at Objects/listobject.c:2312
#6  0x00000000004283c7 in type_call (type=<value optimized out>, args=(<_io.StringIO at remote 0x7ffff291a250>,), 
    kwds=0x0) at Objects/typeobject.c:692
#7  0x00000000004fdf17 in PyObject_Call (func=<type at remote 0x7f95c0>, arg=<value optimized out>, 
    kw=<value optimized out>) at Objects/abstract.c:2147


Regards

Antoine.

From solipsis at pitrou.net  Tue Aug 30 13:38:29 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Aug 2011 13:38:29 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<4E5BF9F7.9020608@v.loewis.de>
	<CA+j1x0=T0wHary2800Yxju1GRcy5VkyxXtEatbY-AAXJrG1p4g@mail.gmail.com>
	<CAGE7PN+7G6VYM03q4x3x=Tug81oUePOAUgaN8xuUr-6XxBnE0A@mail.gmail.com>
	<CADiSq7eJc9b56WDvCfG=Eqgt=HN==WRVCsjuJ8Ez=+4-0nXKQw@mail.gmail.com>
Message-ID: <20110830133829.099d7714@pitrou.net>

On Tue, 30 Aug 2011 13:29:59 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
> Anecdotal, non-reproducible performance figures are *not* the way to
> go about serious optimisation efforts.

What about anecdotal *and* reproducible performance figures? :)
I may be half-joking, but we already have a set of py3k-compatible
benchmarks and, besides, sometimes a timeit invocation gives a good
idea of whether an approach is fruitful or not.
While a permanent public reference with historical tracking of
performance figures is even better, let's not freeze everything until
it's ready.
(for example, do we need to wait for speed.python.org before PEP 393 is
accepted?)

Regards

Antoine.



From ncoghlan at gmail.com  Tue Aug 30 15:05:06 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 30 Aug 2011 23:05:06 +1000
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <20110830133829.099d7714@pitrou.net>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<4E5BF9F7.9020608@v.loewis.de>
	<CA+j1x0=T0wHary2800Yxju1GRcy5VkyxXtEatbY-AAXJrG1p4g@mail.gmail.com>
	<CAGE7PN+7G6VYM03q4x3x=Tug81oUePOAUgaN8xuUr-6XxBnE0A@mail.gmail.com>
	<CADiSq7eJc9b56WDvCfG=Eqgt=HN==WRVCsjuJ8Ez=+4-0nXKQw@mail.gmail.com>
	<20110830133829.099d7714@pitrou.net>
Message-ID: <CADiSq7fjBWXwJWUnO-B-pn=1v74GVmhU7-p4ac0GxiHtR78k5Q@mail.gmail.com>

On Tue, Aug 30, 2011 at 9:38 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Tue, 30 Aug 2011 13:29:59 +1000
> Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> Anecdotal, non-reproducible performance figures are *not* the way to
>> go about serious optimisation efforts.
>
> What about anecdotal *and* reproducible performance figures? :)
> I may be half-joking, but we already have a set of py3k-compatible
> benchmarks and, besides, sometimes a timeit invocation gives a good
> idea of whether an approach is fruitful or not.
> While a permanent public reference with historical tracking of
> performance figures is even better, let's not freeze everything until
> it's ready.
> (for example, do we need to wait for speed.python.org before PEP 393 is
> accepted?)

Yeah, I'd neglected the idea of just running perf.py for pre- and
post-patch performance comparisons. You're right that that can
generate sufficient info to make a well-informed decision.

I'd still really like it if some of the people advocating that we care
about CPython performance actually volunteered to spearhead the effort
to get speed.python.org up and running, though. As far as I know, the
hardware's spinning idly waiting to be given work to do :P

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From vinay_sajip at yahoo.co.uk  Tue Aug 30 15:09:19 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Tue, 30 Aug 2011 13:09:19 +0000 (UTC)
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
	support in 3.3)
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org>
	<CAK1QoopNFDs=_O5A8XqVD6U=Yy6f+XyO5a6VaVXvzowywDRJ4A@mail.gmail.com>
Message-ID: <loom.20110830T150652-118@post.gmane.org>

Meador Inge <meadori <at> gmail.com> writes:

 
> 1. http://bugs.python.org/issue9041

I raised a question about this patch (in the issue tracker).

> 2. http://bugs.python.org/issue9651
> 3. http://bugs.python.org/issue11241

I presume, since Amaury has commit rights, that he could commit these.

Regards,

Vinay Sajip


From riscutiavlad at gmail.com  Tue Aug 30 16:27:08 2011
From: riscutiavlad at gmail.com (Vlad Riscutia)
Date: Tue, 30 Aug 2011 07:27:08 -0700
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
 support in 3.3)
In-Reply-To: <loom.20110830T150652-118@post.gmane.org>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E5909FD.7060809@v.loewis.de>
	<CADiSq7dbsK8vntWHudLQ59QF4CwSWHKdTqaqFzLxjrmFkSa52A@mail.gmail.com>
	<20110827174057.6c4b619e@pitrou.net>
	<CADiSq7dZD_AZe4VCrqbNq+uN5gAtii0fEmzNReTc9fC=3Xeprg@mail.gmail.com>
	<CANF4RMmXG_9DSXzxihpWgV82HTiFtzPOETm4h7c1m=dsGEwyAg@mail.gmail.com>
	<CADiSq7e0yJXKDm6hJHSGs_-OB7EOfVDyNSivHt=wRXt_dDPvpg@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org>
	<CAK1QoopNFDs=_O5A8XqVD6U=Yy6f+XyO5a6VaVXvzowywDRJ4A@mail.gmail.com>
	<loom.20110830T150652-118@post.gmane.org>
Message-ID: <CAJ-9HZ1_sm3ZtuC=E=CSWZZwe_z-KuGXFukEEaDSSMH9b6Xz4A@mail.gmail.com>

I also have some patches that have been sitting on the tracker for some time:

http://bugs.python.org/issue12764
http://bugs.python.org/issue11835
http://bugs.python.org/issue12528 which also fixes
http://bugs.python.org/issue6069 and http://bugs.python.org/issue11920
http://bugs.python.org/issue6068 which also fixes
http://bugs.python.org/issue6493

Thank you,
Vlad

On Tue, Aug 30, 2011 at 6:09 AM, Vinay Sajip <vinay_sajip at yahoo.co.uk>wrote:

> Meador Inge <meadori <at> gmail.com> writes:
>
>
> > 1. http://bugs.python.org/issue9041
>
> I raised a question about this patch (in the issue tracker).
>
> > 2. http://bugs.python.org/issue9651
> > 3. http://bugs.python.org/issue11241
>
> I presume, since Amaury has commit rights, that he could commit these.
>
> Regards,
>
> Vinay Sajip
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/riscutiavlad%40gmail.com
>

From solipsis at pitrou.net  Tue Aug 30 17:20:16 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Aug 2011 17:20:16 +0200
Subject: [Python-Dev] cpython: Remove display options (--name,
 etc.) from the Distribution class.
References: <E1QyPCg-0003kj-0C@dinsdale.python.org>
Message-ID: <20110830172016.01999c5f@pitrou.net>

On Tue, 30 Aug 2011 16:22:14 +0200
eric.araujo <python-checkins at python.org> wrote:

> http://hg.python.org/cpython/rev/af0bcccb935b
> changeset:   72127:af0bcccb935b
> user:        Éric Araujo <merwok at netwok.org>
> date:        Tue Aug 30 00:55:02 2011 +0200
> summary:
>   Remove display options (--name, etc.) from the Distribution class.
> 
> These options were used to implement "setup.py --name",
> "setup.py --version", etc. which are now handled by the pysetup metadata
> action or direct parsing of the setup.cfg file.
> 
> As a side effect, the Distribution class no longer accepts a 'url' key
> in its *attrs* argument: it has to be 'home-page' to be recognized as a
> valid metadata field and passed down to the dist.metadata object.

I don't want to sound nitpicky, but it's the first time I see
"home-page" hyphenated. How about "homepage"?

Regards

Antoine.



From stefan at brunthaler.net  Tue Aug 30 17:27:13 2011
From: stefan at brunthaler.net (stefan brunthaler)
Date: Tue, 30 Aug 2011 08:27:13 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <4E5CA1F0.2070005@v.loewis.de>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
Message-ID: <CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>

>> Changing the bytecode width wouldn't make the interpreter more complex.
>
> No, but I think Stefan is proposing to add a *second* byte code format,
> in addition to the one that remains there. That would certainly be an
> increase in complexity.
>
Yes, indeed I have a more straightforward instruction format to allow
for more efficient decoding. Just going from bytecode size to
word-code size without changing the instruction format is going to
require 8 (or word-size) times more memory on a 64bit system. From an
optimization perspective, the irregular instruction format was the
biggest problem, because checking for HAS_ARG is always on the fast
path and mostly unpredictable. Hence, I chose to extend the
instruction format to word size, using the additional space so that
the upper half holds the argument and the lower half the actual
opcode. Decoding is more efficient, and *not* more complex. Using
profiling to indicate which code is hot, I don't waste too much
memory on encoding this regular instruction format.
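
As a rough sketch of such a regular word-code layout (hypothetical
field widths -- the actual split in my code may differ):

    OPCODE_BITS = 32
    OPCODE_MASK = (1 << OPCODE_BITS) - 1

    def encode(opcode, oparg=0):
        # argument in the upper half, opcode in the lower half
        return (oparg << OPCODE_BITS) | opcode

    def decode(word):
        # every instruction decodes the same way: no HAS_ARG check
        return word & OPCODE_MASK, word >> OPCODE_BITS

    word = encode(100, 7)            # e.g. an opcode with arg 7
    assert decode(word) == (100, 7)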


> For example, I still plan to write a JIT for Python at some point. This
> may happen in two months, or in two years. I wouldn't try to stop
> anybody from contributing improvements that may become obsolete with the
> JIT.
>
I would not necessarily argue that my optimizations would become
obsolete; if you still plan to write a JIT, it might make sense to
re-use what I've got rather than start from scratch. For example,
building a simple JIT compiler that just inlines the operation
implementations as templates to eliminate the interpretative overhead
(in a similar vein to Piumarta and Riccardi's 1998 paper) might be a
good start. Though I don't want to pre-influence your JIT design, I'm
just thinking out loud...

Regards,
--stefan

From phd at phdru.name  Tue Aug 30 17:34:58 2011
From: phd at phdru.name (Oleg Broytman)
Date: Tue, 30 Aug 2011 19:34:58 +0400
Subject: [Python-Dev] PyPI went down
In-Reply-To: <20110830153001.GA13312@iskra.aviel.ru>
References: <20110830153001.GA13312@iskra.aviel.ru>
Message-ID: <20110830153458.GB13312@iskra.aviel.ru>

On Tue, Aug 30, 2011 at 07:30:01PM +0400, Oleg Broytman wrote:
>    PyPI went down

   More information: ports 80 and 443 are open; the server performs the
SSL handshake but times out on HTTP requests (with or without SSL).

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From phd at phdru.name  Tue Aug 30 17:40:46 2011
From: phd at phdru.name (Oleg Broytman)
Date: Tue, 30 Aug 2011 19:40:46 +0400
Subject: [Python-Dev] PyPI went down
In-Reply-To: <20110830153458.GB13312@iskra.aviel.ru>
References: <20110830153001.GA13312@iskra.aviel.ru>
	<20110830153458.GB13312@iskra.aviel.ru>
Message-ID: <20110830154046.GC13312@iskra.aviel.ru>

It is back up. I am very sorry for the fuss.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From merwok at netwok.org  Tue Aug 30 17:34:02 2011
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Tue, 30 Aug 2011 17:34:02 +0200
Subject: [Python-Dev] cpython: Remove display options (--name,
 etc.) from the Distribution class.
In-Reply-To: <20110830172016.01999c5f@pitrou.net>
References: <E1QyPCg-0003kj-0C@dinsdale.python.org>
	<20110830172016.01999c5f@pitrou.net>
Message-ID: <4E5D02EA.2070800@netwok.org>

Hi,

On 30/08/2011 17:20, Antoine Pitrou wrote:
> On Tue, 30 Aug 2011 16:22:14 +0200
> eric.araujo <python-checkins at python.org> wrote:
>> As a side effect, the Distribution class no longer accepts a 'url' key
>> in its *attrs* argument: it has to be 'home-page' to be recognized as a
>> valid metadata field and passed down to the dist.metadata object.
> 
> I don't want to sound nitpicky, but it's the first time I see
> "home-page" hyphenized. How about "homepage"?

This value is defined in the accepted Metadata PEPs, which use home-page.

Regards

From phd at phdru.name  Tue Aug 30 17:30:01 2011
From: phd at phdru.name (Oleg Broytman)
Date: Tue, 30 Aug 2011 19:30:01 +0400
Subject: [Python-Dev] PyPI went down
Message-ID: <20110830153001.GA13312@iskra.aviel.ru>

Hello!

   I released the first package of two and PyPI went down while I was
preparing to release the second. I hope it wasn't me?

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From guido at python.org  Tue Aug 30 18:42:09 2011
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Aug 2011 09:42:09 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
Message-ID: <CAP7+vJ+QEcs7JAPRs60NsfqvGdypw_aadSQ9uNQ--=KOyL9xMw@mail.gmail.com>

Stefan, have you shared a pointer to your code yet? Is it open source?
It sounds like people are definitely interested and it would make
sense to let them experiment with your code and review it.

-- 
--Guido van Rossum (python.org/~guido)

From martin at v.loewis.de  Tue Aug 30 18:46:53 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 30 Aug 2011 18:46:53 +0200
Subject: [Python-Dev] PyPI went down
In-Reply-To: <20110830153001.GA13312@iskra.aviel.ru>
References: <20110830153001.GA13312@iskra.aviel.ru>
Message-ID: <4E5D13FD.1050107@v.loewis.de>

>    I released the first package of two and PyPI went down while I was
> preparing to release the second. I hope it wasn't me?

A few minutes ago, it was responding very slowly, and I found out that
Postgres was consuming all the time. I haven't put energy into
investigating what was causing this - apparently, somebody was throwing
odd queries at it. Restarting Apache reduced the load. If they continue
to do so, I'll investigate further.

Regards,
Martin

From martin at v.loewis.de  Tue Aug 30 18:49:15 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 30 Aug 2011 18:49:15 +0200
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
 support in 3.3)
In-Reply-To: <j3i8li$net$1@dough.gmane.org>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>	<4E5951D5.5020200@v.loewis.de>	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>	<20110828002642.4765fc89@pitrou.net>	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>	<20110828012705.523e51d4@pitrou.net>	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>	<j3chv2$tvf$1@dough.gmane.org>	<j3e137$q0g$1@dough.gmane.org>	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>	<4E5C01E4.2050106@canterbury.ac.nz>	<CAP7+vJJiXdSFCGhXJH1CVN3jwP=GqyUCqsvwSh6NJNWEayyzuA@mail.gmail.com>	<4E5C7B48.5080402@canterbury.ac.nz>	<4E5CA35E.8000509@v.loewis.de>
	<j3i8li$net$1@dough.gmane.org>
Message-ID: <4E5D148B.1060606@v.loewis.de>

>> I can understand how that works when building a CPython extension.
>> But what about creating Jython/IronPython modules with Cython?
>> At what point do the header files get considered there?
> 
> I had written a bit about this here:
> 
> http://thread.gmane.org/gmane.comp.python.devel/126340/focus=126419

I see. So there is potential for error there.

Regards,
Martin

From thomas at python.org  Tue Aug 30 18:55:47 2011
From: thomas at python.org (Thomas Wouters)
Date: Tue, 30 Aug 2011 18:55:47 +0200
Subject: [Python-Dev] PyPI went down
In-Reply-To: <4E5D13FD.1050107@v.loewis.de>
References: <20110830153001.GA13312@iskra.aviel.ru>
	<4E5D13FD.1050107@v.loewis.de>
Message-ID: <CAPdQG2q1gfF6-61zT+hParqhu0rs78Qya-tZ92aFiSEvdkE=Pw@mail.gmail.com>

On Tue, Aug 30, 2011 at 18:46, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> >    I released the first package of two and PyPI went down while I was
> > preparing to release the second. I hope it wasn't me?
>
> A few minutes ago, it was responding very slowly, and I found out that
> Postgres consumes all time. I haven't put energy into investigating what
> was causing this - apparently, somebody was throwing odd queries at it.
> Restarting Apache reduced the load. If they continue to do so, I
> investigate further.
>

Looks like the issue keeps popping up. It was slow to respond earlier
today, and I keep getting complaints about it (including just now).

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From guido at python.org  Tue Aug 30 19:05:12 2011
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Aug 2011 10:05:12 -0700
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
 support in 3.3)
In-Reply-To: <4E5D148B.1060606@v.loewis.de>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<CAGGBd_oeyt6AfqG_vDMJMtyajyhvgc9FOwJ6BuBOmdpFn5EpJw@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org> <j3e137$q0g$1@dough.gmane.org>
	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
	<4E5C01E4.2050106@canterbury.ac.nz>
	<CAP7+vJJiXdSFCGhXJH1CVN3jwP=GqyUCqsvwSh6NJNWEayyzuA@mail.gmail.com>
	<4E5C7B48.5080402@canterbury.ac.nz> <4E5CA35E.8000509@v.loewis.de>
	<j3i8li$net$1@dough.gmane.org> <4E5D148B.1060606@v.loewis.de>
Message-ID: <CAP7+vJJJHnqXh5Fk5B37NYA_n3JhLEo6B-etK0TZ-cFe543sog@mail.gmail.com>

On Tue, Aug 30, 2011 at 9:49 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>> I can understand how that works when building a CPython extension.
>>> But what about creating Jython/IronPython modules with Cython?
>>> At what point do the header files get considered there?
>>
>> I had written a bit about this here:
>>
>> http://thread.gmane.org/gmane.comp.python.devel/126340/focus=126419
>
> I see. So there is potential for error there.

To elaborate, with CPython it looks pretty solid, at least for
functions and constants (does it do structs?). You must manually
declare the name and signature of a function, and Pyrex/Cython emits C
code that includes the header and calls the function with the
appropriate types. If the signature you declare doesn't match what's
in the .h file you'll get a compiler error when the C code is
compiled. If (perhaps on some platforms) the function is really a
macro, the macro in the .h file will be invoked and the right thing
will happen. So far so good.

The problem lies with the PyPy backend -- there it generates ctypes
code, which means that the signature you declare to Cython/Pyrex must
match the *linker* level API, not the C compiler level API. Thus, if
in a system header a certain function is really a macro that invokes
another function with a permuted or augmented argument list, you'd
have to know what that macro does. I also don't see how this would
work for #defined constants: where does Cython/Pyrex get their value?
ctypes doesn't have their values.

So, for PyPy, a solution based on Cython/Pyrex has many of the same
downsides as one based on ctypes when it comes to complying with an
API defined by a .h file.
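
A minimal sketch of the constants problem (assumes a Unix libc; the
names are only illustrative):

    import ctypes, ctypes.util

    libc = ctypes.CDLL(ctypes.util.find_library("c"))

    # A real exported function can be bound at the linker level:
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    assert libc.strlen(b"spam") == 4

    # But a #defined constant never reaches the linker, so a ctypes
    # binding must duplicate its value by hand -- with no error if
    # the header ever changes:
    SEEK_SET = 0   # copied manually from stdio.h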

-- 
--Guido van Rossum (python.org/~guido)

From stephen at xemacs.org  Tue Aug 30 19:22:25 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 31 Aug 2011 02:22:25 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <20110829141440.2e2178c6@pitrou.net>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
Message-ID: <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>

Antoine Pitrou writes:
 > On Mon, 29 Aug 2011 12:43:24 +0900
 > "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
 > > 
 > > Since when can s[0] represent a code point outside the BMP, for s a
 > > Unicode string in a narrow build?
 > > 
 > > Remember, the UCS-2/narrow vs. UCS-4/wide distinction is *not* about
 > > what Python supports vs. the outside world.  It's about what the str/
 > > unicode type is an array of.
 > 
 > Why would that be?

Because what the outside world sees is produced by codecs, not by
str.  The outside world can't see whether you have narrow or wide
unless it uses indexing ... i.e., experiments to determine what the str
type is an array of.

The problem with a narrow build (whether for space efficiency in
CPython or for platform compatibility in Jython and IronPython) is not
that we have no UTF-16 codecs.  It's that array ops aren't UTF-16
conformant.
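
A concrete sketch (Python 3 syntax; a narrow build prints 2 and a lone
surrogate, a wide build prints 1 and the full character):

    s = "\U00010000"    # one code point just outside the BMP

    print(len(s))       # narrow build: 2 (a surrogate pair)
    print(repr(s[0]))   # narrow build: '\ud800' -- a lone surrogate,
                        # i.e. an array op that isn't UTF-16 conformant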

From martin at v.loewis.de  Tue Aug 30 19:17:35 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 30 Aug 2011 19:17:35 +0200
Subject: [Python-Dev] PyPI went down
In-Reply-To: <CAPdQG2q1gfF6-61zT+hParqhu0rs78Qya-tZ92aFiSEvdkE=Pw@mail.gmail.com>
References: <20110830153001.GA13312@iskra.aviel.ru>	<4E5D13FD.1050107@v.loewis.de>
	<CAPdQG2q1gfF6-61zT+hParqhu0rs78Qya-tZ92aFiSEvdkE=Pw@mail.gmail.com>
Message-ID: <4E5D1B2F.5060402@v.loewis.de>

> Looks like the issue keeps popping up. It was slow to respond earlier
> today, and I keep getting complaints about it (including now.)

Somebody is mirroring the site with wget. I have null-routed them.

Regards,
Martin

From solipsis at pitrou.net  Tue Aug 30 19:19:46 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Aug 2011 19:19:46 +0200
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1314724786.3554.1.camel@localhost.localdomain>


> The problem with a narrow build (whether for space efficiency in
> CPython or for platform compatibility in Jython and IronPython) is not
> that we have no UTF-16 codecs.  It's that array ops aren't UTF-16
> conformant.

Sorry, what is a conformant UTF-16 array op?

Thanks

Antoine.



From stefan at brunthaler.net  Tue Aug 30 19:23:34 2011
From: stefan at brunthaler.net (stefan brunthaler)
Date: Tue, 30 Aug 2011 10:23:34 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CAP7+vJ+QEcs7JAPRs60NsfqvGdypw_aadSQ9uNQ--=KOyL9xMw@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<CAP7+vJ+QEcs7JAPRs60NsfqvGdypw_aadSQ9uNQ--=KOyL9xMw@mail.gmail.com>
Message-ID: <CA+j1x0myco36U4f3vO6KbRXpLLY_K+PQABNyLR2=o7zt6+u5Xg@mail.gmail.com>

On Tue, Aug 30, 2011 at 09:42, Guido van Rossum <guido at python.org> wrote:
> Stefan, have you shared a pointer to your code yet? Is it open source?
>
I have no shared code repository, but could create one (is there any
pydev preferred provider?). I have all the copyrights on the code, and
I would like to open-source it.

> It sounds like people are definitely interested and it would make
> sense to let them experiment with your code and review it.
>
That sounds fine. I need to do some clean-up work (the code contains
most of my comments to remind me of issues) and currently does not pass
all regression tests. But if people want to take a look first to decide
if they want it, then that's good enough for me. (I just wanted to know
if there is substantial interest, so that it eventually pays off to
find and fix the remaining bugs.)

--stefan

From solipsis at pitrou.net  Tue Aug 30 19:38:06 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Aug 2011 19:38:06 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
Message-ID: <20110830193806.0d718a56@pitrou.net>

On Tue, 30 Aug 2011 08:27:13 -0700
stefan brunthaler <stefan at brunthaler.net> wrote:
> >> Changing the bytecode width wouldn't make the interpreter more complex.
> >
> > No, but I think Stefan is proposing to add a *second* byte code format,
> > in addition to the one that remains there. That would certainly be an
> > increase in complexity.
> >
> Yes, indeed I have a more straightforward instruction format to allow
> for more efficient decoding. Just going from bytecode size to
> word-code size without changing the instruction format is going to
> require 8 (or word-size) times more memory on a 64bit system.

Do you really need it to match a machine word? Or is, say, a 16-bit
format sufficient?

Regards

Antoine.

From stefan at brunthaler.net  Tue Aug 30 19:50:01 2011
From: stefan at brunthaler.net (stefan brunthaler)
Date: Tue, 30 Aug 2011 10:50:01 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <20110830193806.0d718a56@pitrou.net>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
Message-ID: <CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>

> Do you really need it to match a machine word? Or is, say, a 16-bit
> format sufficient?
>
Hm, technically no, but practically it makes more sense, as (at least
for x86 architectures) having opargs and opcodes in half-words can be
efficiently expressed in assembly. On 64bit architectures, I could
also inline data object references that fit into the 32bit upper half.
It turns out that most constant objects fit nicely into this, and I
have used this for a special cache region (again below 2^32) for
global objects, too. So, technically it's not necessary, but
practically it makes a lot of sense. (Most of these things work on
32bit systems, too. For architectures with a smaller size, we can
adapt or disable the optimizations.)
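
For illustration, one possible packing of a 16-bit opcode, a 16-bit
oparg, and a 32-bit inlined reference into a 64-bit word (the layout
here is just an assumption for the sketch):

    def pack(opcode, oparg, ref=0):
        # assumed layout: | 32-bit ref | 16-bit oparg | 16-bit opcode |
        assert opcode < (1 << 16) and oparg < (1 << 16)
        assert ref < (1 << 32)    # inlined references must fit below 2**32
        return (ref << 32) | (oparg << 16) | opcode

    def unpack(word):
        return word & 0xFFFF, (word >> 16) & 0xFFFF, word >> 32

    assert unpack(pack(90, 3, 0x1234)) == (90, 3, 0x1234)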

Cheers,
--stefan

From guido at python.org  Tue Aug 30 20:12:13 2011
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Aug 2011 11:12:13 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
Message-ID: <CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>

On Tue, Aug 30, 2011 at 10:50 AM, stefan brunthaler
<stefan at brunthaler.net> wrote:
>> Do you really need it to match a machine word? Or is, say, a 16-bit
>> format sufficient?
>>
> Hm, technically no, but practically it makes more sense, as (at least
> for x86 architectures) having opargs and opcodes in half-words can be
> efficiently expressed in assembly. On 64bit architectures, I could
> also inline data object references that fit into the 32bit upper half.
> It turns out that most constant objects fit nicely into this, and I
> have used this for a special cache region (again below 2^32) for
> global objects, too. So, technically it's not necessary, but
> practically it makes a lot of sense. (Most of these things work on
> 32bit systems, too. For architectures with a smaller size, we can
> adapt or disable the optimizations.)

Do I sense that the bytecode format is no longer platform-independent?
That will need a bit of discussion. I bet there are some things around
that depend on that.

-- 
--Guido van Rossum (python.org/~guido)

From tjreedy at udel.edu  Tue Aug 30 20:15:32 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 30 Aug 2011 14:15:32 -0400
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
	support in 3.3)
In-Reply-To: <CAP7+vJJJHnqXh5Fk5B37NYA_n3JhLEo6B-etK0TZ-cFe543sog@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>
	<4E5951D5.5020200@v.loewis.de>
	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>
	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>
	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>
	<20110828002642.4765fc89@pitrou.net>
	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>
	<20110828012705.523e51d4@pitrou.net>
	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>
	<j3chv2$tvf$1@dough.gmane.org> <j3e137$q0g$1@dough.gmane.org>
	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>
	<4E5C01E4.2050106@canterbury.ac.nz>
	<CAP7+vJJiXdSFCGhXJH1CVN3jwP=GqyUCqsvwSh6NJNWEayyzuA@mail.gmail.com>
	<4E5C7B48.5080402@canterbury.ac.nz>
	<4E5CA35E.8000509@v.loewis.de> <j3i8li$net$1@dough.gmane.org>
	<4E5D148B.1060606@v.loewis.de>
	<CAP7+vJJJHnqXh5Fk5B37NYA_n3JhLEo6B-etK0TZ-cFe543sog@mail.gmail.com>
Message-ID: <j3j9de$9hg$1@dough.gmane.org>

On 8/30/2011 1:05 PM, Guido van Rossum wrote:

>> I see. So there is potential for error there.
>
> To elaborate, with CPython it looks pretty solid, at least for
> functions and constants (does it do structs?). You must manually
> declare the name and signature of a function, and Pyrex/Cython emits C
> code that includes the header and calls the function with the
> appropriate types. If the signature you declare doesn't match what's
> in the .h file you'll get a compiler error when the C code is
> compiled. If (perhaps on some platforms) the function is really a
> macro, the macro in the .h file will be invoked and the right thing
> will happen. So far so good.
>
> The problem lies with the PyPy backend -- there it generates ctypes
> code, which means that the signature you declare to Cython/Pyrex must
> match the *linker* level API, not the C compiler level API. Thus, if
> in a system header a certain function is really a macro that invokes
> another function with a permuted or augmented argument list, you'd
> have to know what that macro does. I also don't see how this would
> work for #defined constants: where does Cython/Pyrex get their value?
> ctypes doesn't have their values.
>
> So, for PyPy, a solution based on Cython/Pyrex has many of the same
> downsides as one based on ctypes where it comes to complying with an
> API defined by a .h file.

Thank you for this elaboration. My earlier comment that ctypes seems to 
be hard to use was based on observation of posts to python-list 
presenting failed attempts (which have included somehow getting function 
signatures wrong) and a sense that ctypes was somehow bypassing the 
public compiler API to make a more direct access via some private api. 
You have explained and named that as the 'linker API', so I understand 
much better now.

Nothing like 'linker API' or 'signature' appears in the ctypes doc. All 
I could find about discovering specific function calling conventions is 
"To find out the correct calling convention you have to look into the C 
header file or the documentation for the function you want to call." 
Perhaps that should be elaborated to explain, as you did above, the need 
to trace macro definitions to find the actual calling convention, and the 
need to be aware that macro definitions can change to accommodate 
implementation-detail changes even as the surface calling conventions 
seem to remain the same.
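
A small ctypes sketch of the failure mode (the function name is
hypothetical; where an API entry exists only as a macro, there is no
symbol at the linker level for ctypes to bind to):

    import ctypes, ctypes.util

    libm = ctypes.CDLL(ctypes.util.find_library('m'))   # POSIX
    try:
        libm.hypothetical_macro_entry    # not a real libm symbol
    except AttributeError:
        # the compiler-level (header) API exists, but no exported
        # symbol does, so ctypes cannot call it
        print('no such exported symbol')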

-- 
Terry Jan Reedy


From stefan at brunthaler.net  Tue Aug 30 20:23:56 2011
From: stefan at brunthaler.net (stefan brunthaler)
Date: Tue, 30 Aug 2011 11:23:56 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
Message-ID: <CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>

> Do I sense that the bytecode format is no longer platform-independent?
> That will need a bit of discussion. I bet there are some things around
> that depend on that.
>
Hm, I haven't really thought about that in detail or for very long. I
ran it on PowerPC 970 and Intel Atom & i7 without problems (the latter
ones are a non-issue) and think that it can be made portable. I just
stuff argument and opcode into one word for regular instruction
decoding, like a RISC CPU, and I realize there might be little/big-endian
issues, but they surely can be handled with conditional compilation...

--stefan

From guido at python.org  Tue Aug 30 20:27:56 2011
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Aug 2011 11:27:56 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
Message-ID: <CAP7+vJ+OJAu115PtPAGMUByoqDAfcRcfk+Er1HAPEA13xYTxKw@mail.gmail.com>

On Tue, Aug 30, 2011 at 11:23 AM, stefan brunthaler
<stefan at brunthaler.net> wrote:
>> Do I sense that the bytecode format is no longer platform-independent?
>> That will need a bit of discussion. I bet there are some things around
>> that depend on that.
>>
> Hm, I haven't really thought about that in detail or for very long. I
> ran it on PowerPC 970 and Intel Atom & i7 without problems (the latter
> ones are a non-issue) and think that it can be made portable. I just
> stuff argument and opcode into one word for regular instruction
> decoding, like a RISC CPU, and I realize there might be little/big-endian
> issues, but they surely can be handled with conditional compilation...

Um, I'm sorry, but that reply sounds incredibly naive, like you're not
really sure what the on-disk format for .pyc files is or why it would
matter. You're not even answering the question, except indirectly --
it seems that you've never even thought about the possibility of
generating a .pyc file on one platform and copying it to a computer
using a different one.

-- 
--Guido van Rossum (python.org/~guido)

From stefan at brunthaler.net  Tue Aug 30 20:34:08 2011
From: stefan at brunthaler.net (stefan brunthaler)
Date: Tue, 30 Aug 2011 11:34:08 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CAP7+vJ+OJAu115PtPAGMUByoqDAfcRcfk+Er1HAPEA13xYTxKw@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
	<CAP7+vJ+OJAu115PtPAGMUByoqDAfcRcfk+Er1HAPEA13xYTxKw@mail.gmail.com>
Message-ID: <CA+j1x0m9Hv8tSb-83Q99xPFfswAckxX_Nob2GO-yc=ePX9ig-A@mail.gmail.com>

> Um, I'm sorry, but that reply sounds incredibly naive, like you're not
> really sure what the on-disk format for .pyc files is or why it would
> matter. You're not even answering the question, except indirectly --
> it seems that you've never even thought about the possibility of
> generating a .pyc file on one platform and copying it to a computer
> using a different one.
>
Well, it may sound incredibly naive, but the truth is: I am never
storing the optimized representation to disk, it's done purely at
runtime when profiling tells me it makes sense to make the switch.
Thus I circumvent many of the problems outlined by you. So I am
positive that a full-fledged change of the representation has many
more intricacies to it, but my approach is only tangentially
related...

--stefan

From tjreedy at udel.edu  Tue Aug 30 20:41:05 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 30 Aug 2011 14:41:05 -0400
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0myco36U4f3vO6KbRXpLLY_K+PQABNyLR2=o7zt6+u5Xg@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<CAP7+vJ+QEcs7JAPRs60NsfqvGdypw_aadSQ9uNQ--=KOyL9xMw@mail.gmail.com>
	<CA+j1x0myco36U4f3vO6KbRXpLLY_K+PQABNyLR2=o7zt6+u5Xg@mail.gmail.com>
Message-ID: <j3jatc$ki5$1@dough.gmane.org>

On 8/30/2011 1:23 PM, stefan brunthaler wrote:
> (I just wanted to know if there is substantial interest so that
 >  it eventually pays off to find and fix the remaining bugs)

It is the nature of our development process that there usually can be no 
guarantee of acceptance of future code. The rather early acceptance of 
Unladen Swallow was to me something of an anomaly. I also think it was 
something of a mistake insofar as it discouraged other efforts, like yours.

I think the answer you have gotten is that there is a) substantial 
interest and b) a willingness to consider a major change such as 
switching from bytecode to something else. There also seem to be two main 
concerns: 1) that the increase in complexity be 'less' than the increase 
in speed, and 2) that the changes be presented in small enough chunks 
that they can be reviewed.

Whether this is good enough for you to proceed is for you to decide.

-- 
Terry Jan Reedy


From guido at python.org  Tue Aug 30 20:43:35 2011
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Aug 2011 11:43:35 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0m9Hv8tSb-83Q99xPFfswAckxX_Nob2GO-yc=ePX9ig-A@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
	<CAP7+vJ+OJAu115PtPAGMUByoqDAfcRcfk+Er1HAPEA13xYTxKw@mail.gmail.com>
	<CA+j1x0m9Hv8tSb-83Q99xPFfswAckxX_Nob2GO-yc=ePX9ig-A@mail.gmail.com>
Message-ID: <CAP7+vJLB1f0A3WL4Q=GWXNrvXfwnoxdxBP2tHj5BkRTKihvxqg@mail.gmail.com>

On Tue, Aug 30, 2011 at 11:34 AM, stefan brunthaler
<stefan at brunthaler.net> wrote:
>> Um, I'm sorry, but that reply sounds incredibly naive, like you're not
>> really sure what the on-disk format for .pyc files is or why it would
>> matter. You're not even answering the question, except indirectly --
>> it seems that you've never even thought about the possibility of
>> generating a .pyc file on one platform and copying it to a computer
>> using a different one.

> Well, it may sound incredibly naive, but the truth is: I am never
> storing the optimized representation to disk, it's done purely at
> runtime when profiling tells me it makes sense to make the switch.
> Thus I circumvent many of the problems outlined by you. So I am
> positive that a full-fledged change of the representation has many
> more intricacies to it, but my approach is only tangentially
> related...

Ok, then there's something else you haven't told us. Are you saying
that the original (old) bytecode is still used (and hence written to
and read from .pyc files)?

-- 
--Guido van Rossum (python.org/~guido)

From g.brandl at gmx.net  Tue Aug 30 22:01:29 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 30 Aug 2011 22:01:29 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0m9Hv8tSb-83Q99xPFfswAckxX_Nob2GO-yc=ePX9ig-A@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
	<CAP7+vJ+OJAu115PtPAGMUByoqDAfcRcfk+Er1HAPEA13xYTxKw@mail.gmail.com>
	<CA+j1x0m9Hv8tSb-83Q99xPFfswAckxX_Nob2GO-yc=ePX9ig-A@mail.gmail.com>
Message-ID: <j3jfg9$kfl$1@dough.gmane.org>

Am 30.08.2011 20:34, schrieb stefan brunthaler:
>> Um, I'm sorry, but that reply sounds incredibly naive, like you're not
>> really sure what the on-disk format for .pyc files is or why it would
>> matter. You're not even answering the question, except indirectly --
>> it seems that you've never even thought about the possibility of
>> generating a .pyc file on one platform and copying it to a computer
>> using a different one.
>>
> Well, it may sound incredibly naive, but the truth is: I am never
> storing the optimized representation to disk, it's done purely at
> runtime when profiling tells me it makes sense to make the switch.
> Thus I circumvent many of the problems outlined by you. So I am
> positive that a full-fledged change of the representation has many
> more intricacies to it, but my approach is only tangentially
> related...

You know, instead of all these half-explanations, giving us access to
the code would shut us up much more effectively.  Don't worry about not
passing tests, this is what the official trunk does half of the time ;)

Georg


From stefan_ml at behnel.de  Tue Aug 30 22:03:12 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 30 Aug 2011 22:03:12 +0200
Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression
	support in 3.3)
In-Reply-To: <CAP7+vJJJHnqXh5Fk5B37NYA_n3JhLEo6B-etK0TZ-cFe543sog@mail.gmail.com>
References: <CANF4RMmhsG=5zBfQm4KScxa9k9rdcSW7shPDx-umiNGAYWJGLQ@mail.gmail.com>	<4E5951D5.5020200@v.loewis.de>	<CAGGBd_ppHSb3D0DSHiJzZDWYXLiv4kZJ91VubiR59E1Yj5bAGA@mail.gmail.com>	<CANF4RMn1OiL3G9Otnrdf7TVeS_OsggP6kJM4DTM=J_V+JLEWnQ@mail.gmail.com>	<CAGGBd_r=W85uCcruRqkE+YC7iSMY7d=hv8K9y=XCkxGaLVQL-Q@mail.gmail.com>	<20110828002642.4765fc89@pitrou.net>	<CAGGBd_q9qwreujr-_4ZUXyn0N8GMNWuwFixfNR_AP9ND-Hf+ZA@mail.gmail.com>	<20110828012705.523e51d4@pitrou.net>	<CAGGBd_rvwQRFxD9-0NCAzxA7ThansjOmAFjTOS613mgLWhxwLg@mail.gmail.com>	<j3chv2$tvf$1@dough.gmane.org>
	<j3e137$q0g$1@dough.gmane.org>	<CAP7+vJLT8sRuWhcbiCUBd0SXBpEzF4jCBVbQ5+yKQbDEUV68oQ@mail.gmail.com>	<4E5C01E4.2050106@canterbury.ac.nz>	<CAP7+vJJiXdSFCGhXJH1CVN3jwP=GqyUCqsvwSh6NJNWEayyzuA@mail.gmail.com>	<4E5C7B48.5080402@canterbury.ac.nz>
	<4E5CA35E.8000509@v.loewis.de>	<j3i8li$net$1@dough.gmane.org>
	<4E5D148B.1060606@v.loewis.de>
	<CAP7+vJJJHnqXh5Fk5B37NYA_n3JhLEo6B-etK0TZ-cFe543sog@mail.gmail.com>
Message-ID: <j3jfm0$ndu$1@dough.gmane.org>

Guido van Rossum, 30.08.2011 19:05:
> On Tue, Aug 30, 2011 at 9:49 AM, "Martin v. Löwis" wrote:
>>>> I can understand how that works when building a CPython extension.
>>>> But what about creating Jython/IronPython modules with Cython?
>>>> At what point get the header files considered there?
>>>
>>> I had written a bit about this here:
>>>
>>> http://thread.gmane.org/gmane.comp.python.devel/126340/focus=126419
>>
>> I see. So there is potential for error there.
>
> To elaborate, with CPython it looks pretty solid, at least for
> functions and constants (does it do structs?).

Sure. They even coerce from Python dicts and accept keyword arguments in 
Cython.


> You must manually
> declare the name and signature of a function, and Pyrex/Cython emits C
> code that includes the header and calls the function with the
> appropriate types. If the signature you declare doesn't match what's
> in the .h file you'll get a compiler error when the C code is
> compiled. If (perhaps on some platforms) the function is really a
> macro, the macro in the .h file will be invoked and the right thing
> will happen. So far so good.

Right.


> The problem lies with the PyPy backend -- there it generates ctypes
> code, which means that the signature you declare to Cython/Pyrex must
> match the *linker* level API, not the C compiler level API. Thus, if
> in a system header a certain function is really a macro that invokes
> another function with a permuted or augmented argument list, you'd
> have to know what that macro does. I also don't see how this would
> work for #defined constants: where does Cython/Pyrex get their value?
> ctypes doesn't have their values.
>
> So, for PyPy, a solution based on Cython/Pyrex has many of the same
> downsides as one based on ctypes where it comes to complying with an
> API defined by a .h file.

Right again. The declarations that Cython uses describe the API at the C or 
C++ level. They do not describe the ABI. So the situation is the same as 
with ctypes, and the same solutions (or work-arounds) apply, such as 
generating additional glue code that calls macros or reads compile time 
constants, for example. That's the approach that the IronPython backend has 
taken. It's a lot more complex, but also a lot more versatile in the long run.
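
A tiny sketch of that glue-code idea (header, macro name and signature
here are all hypothetical): for a macro-only entry point, emit a real
function with external linkage that the FFI layer can then bind:

    def make_glue(macro_name, header):
        # generate a C shim exposing the macro as a linkable function
        return ('#include "%s"\n'
                'int %s_glue(int x) { return %s(x); }\n'
                % (header, macro_name, macro_name))

    print(make_glue('SOME_MACRO', 'theheader.h'))

The shim is compiled once, and the wrapper then declares and calls
SOME_MACRO_glue instead of the macro itself.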

Stefan


From arigo at tunes.org  Tue Aug 30 22:02:09 2011
From: arigo at tunes.org (Armin Rigo)
Date: Tue, 30 Aug 2011 22:02:09 +0200
Subject: [Python-Dev] Software Transactional Memory for Python
In-Reply-To: <CAMSv6X2eYWVP_wgPjZtYFwRZAcDwF8a52g9Uvi1M8UkQVbqgAQ@mail.gmail.com>
References: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
	<CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>
	<CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>
	<CAMSv6X3DRGuA+L-3yZ9Ozo_bj7QpTn+qfONrK2b=mCSJKA5jiQ@mail.gmail.com>
	<CAH_1eM2Pj_AWrs_aKjgPsRtgvupM=FC9d7QMa9wZTwkG0dFe5Q@mail.gmail.com>
	<CAMSv6X2eYWVP_wgPjZtYFwRZAcDwF8a52g9Uvi1M8UkQVbqgAQ@mail.gmail.com>
Message-ID: <CAMSv6X36-tROtxZhw0NPar7tXpJO+TbRkZvM2uFhFgouAxiBuA@mail.gmail.com>

Re-hi,

2011/8/29 Armin Rigo <arigo at tunes.org>:
>> The problem is that many locks are actually acquired implicitely.
>> For example, `print` to a buffered stream will acquire the fileobject's mutex.
>
> Indeed.
> (...)
> I suspect that I need to do a more thorough review of the stdlib (...)

I found a solution not involving any change in CPython, and updated
the patch.  The solution is to say that a "with atomic" block doesn't
completely prevent other threads from re-acquiring the GIL, but only
prevents them from proceeding to the following bytecode.  So if
another thread is currently suspended in a place that releases the GIL
for other reasons, then this other thread can still be switched to as
normal, and continue running until the end of the current bytecode.  I
think it's sane enough for the original purpose, and avoids most
deadlock cases.
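
For reference, intended usage looks roughly like this (assuming the
patch exposes the context manager under the name 'atomic'; the exact
module and name may differ):

    from thread import atomic      # module and name assumed

    counter = [0]

    def increment():
        with atomic:               # other threads cannot proceed to their
            counter[0] += 1        # next bytecode while this block runs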


A bientôt,

Armin.

From stefan at brunthaler.net  Tue Aug 30 22:41:01 2011
From: stefan at brunthaler.net (stefan brunthaler)
Date: Tue, 30 Aug 2011 13:41:01 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CAP7+vJLB1f0A3WL4Q=GWXNrvXfwnoxdxBP2tHj5BkRTKihvxqg@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
	<CAP7+vJ+OJAu115PtPAGMUByoqDAfcRcfk+Er1HAPEA13xYTxKw@mail.gmail.com>
	<CA+j1x0m9Hv8tSb-83Q99xPFfswAckxX_Nob2GO-yc=ePX9ig-A@mail.gmail.com>
	<CAP7+vJLB1f0A3WL4Q=GWXNrvXfwnoxdxBP2tHj5BkRTKihvxqg@mail.gmail.com>
Message-ID: <CA+j1x0k_nWG4SZeu7PnEuXv9X8r7xg17L0=bXpuiRkDrF101Gw@mail.gmail.com>

> Ok, then there's something else you haven't told us. Are you saying
> that the original (old) bytecode is still used (and hence written to
> and read from .pyc files)?
>
Short answer: yes.
Long answer: I added an invocation counter to the code object and keep
interpreting in the usual Python interpreter until this counter
reaches a configurable threshold. When it reaches this threshold, I
create the new instruction format and interpret with this optimized
representation. All the macros look exactly the same in the source
code, they are just redefined to use the different instruction format.
I am at no point serializing this representation or the runtime
information gathered by me, as any subsequent invocation might have
different characteristics.
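
In Python-flavored pseudocode the switch looks roughly like this (the
names and the threshold value are illustrative, not the actual ones;
the three helpers stand in for the C-level machinery):

    THRESHOLD = 1000                    # configurable

    def call(code):
        code.invocation_count += 1
        if code.invocation_count < THRESHOLD:
            return interpret_bytecode(code)     # the usual interpreter
        if code.wordcode is None:
            # derived in memory only; never serialized to the .pyc
            code.wordcode = derive_optimized_format(code)
        return interpret_wordcode(code.wordcode)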

I will remove my development commentaries and create a private
repository at bitbucket for you* to take an early look, as Georg (and
more or less Terry, too) suggested. Is that a good way for most of
you? (I would then give access to whoever wants to take a look.)

Best,
--stefan

*: not personally targeted at Guido (who is naturally very much
welcome to take a look, too) but addressed to python-dev in general.

From benjamin at python.org  Tue Aug 30 22:42:54 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Tue, 30 Aug 2011 16:42:54 -0400
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0k_nWG4SZeu7PnEuXv9X8r7xg17L0=bXpuiRkDrF101Gw@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
	<CAP7+vJ+OJAu115PtPAGMUByoqDAfcRcfk+Er1HAPEA13xYTxKw@mail.gmail.com>
	<CA+j1x0m9Hv8tSb-83Q99xPFfswAckxX_Nob2GO-yc=ePX9ig-A@mail.gmail.com>
	<CAP7+vJLB1f0A3WL4Q=GWXNrvXfwnoxdxBP2tHj5BkRTKihvxqg@mail.gmail.com>
	<CA+j1x0k_nWG4SZeu7PnEuXv9X8r7xg17L0=bXpuiRkDrF101Gw@mail.gmail.com>
Message-ID: <CAPZV6o_+rxAbuNKYV8baxE04bkOixkEfiKiUV1pdQHOCYBkBiQ@mail.gmail.com>

2011/8/30 stefan brunthaler <stefan at brunthaler.net>:
> I will remove my development commentaries and create a private
> repository at bitbucket for you* to take an early look, as Georg (and
> more or less Terry, too) suggested. Is that a good way for most of
> you? (I would then give access to whoever wants to take a look.)

And what is wrong with a public one?



-- 
Regards,
Benjamin

From stefan at brunthaler.net  Tue Aug 30 22:51:44 2011
From: stefan at brunthaler.net (stefan brunthaler)
Date: Tue, 30 Aug 2011 13:51:44 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CAPZV6o_+rxAbuNKYV8baxE04bkOixkEfiKiUV1pdQHOCYBkBiQ@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
	<CAP7+vJ+OJAu115PtPAGMUByoqDAfcRcfk+Er1HAPEA13xYTxKw@mail.gmail.com>
	<CA+j1x0m9Hv8tSb-83Q99xPFfswAckxX_Nob2GO-yc=ePX9ig-A@mail.gmail.com>
	<CAP7+vJLB1f0A3WL4Q=GWXNrvXfwnoxdxBP2tHj5BkRTKihvxqg@mail.gmail.com>
	<CA+j1x0k_nWG4SZeu7PnEuXv9X8r7xg17L0=bXpuiRkDrF101Gw@mail.gmail.com>
	<CAPZV6o_+rxAbuNKYV8baxE04bkOixkEfiKiUV1pdQHOCYBkBiQ@mail.gmail.com>
Message-ID: <CA+j1x0nBzU=hzKqPKXwVHRY4-mKG=JsUc0b6m_9gH4VX_MP1Gw@mail.gmail.com>

On Tue, Aug 30, 2011 at 13:42, Benjamin Peterson <benjamin at python.org> wrote:
> 2011/8/30 stefan brunthaler <stefan at brunthaler.net>:
>> I will remove my development commentaries and create a private
>> repository at bitbucket for you* to take an early look, as Georg (and
>> more or less Terry, too) suggested. Is that a good way for most of
>> you? (I would then give access to whoever wants to take a look.)
>
> And what is wrong with a public one?
>
Well, since it does not fully pass all regression tests and is just
meant for people to take a first look to find out if it's interesting,
I think I might take it offline after you have had a look. It seems to me
that that is easier to do with a private repository, but in
general, I don't have a problem with a public one...

Regards,
--stefan

PS: If you want to, I can also just put a tarball on my home page and
post a link here. It's not that I want to have control/influence over
who is allowed to look and who isn't.

From benjamin at python.org  Tue Aug 30 22:54:07 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Tue, 30 Aug 2011 16:54:07 -0400
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0nBzU=hzKqPKXwVHRY4-mKG=JsUc0b6m_9gH4VX_MP1Gw@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
	<CAP7+vJ+OJAu115PtPAGMUByoqDAfcRcfk+Er1HAPEA13xYTxKw@mail.gmail.com>
	<CA+j1x0m9Hv8tSb-83Q99xPFfswAckxX_Nob2GO-yc=ePX9ig-A@mail.gmail.com>
	<CAP7+vJLB1f0A3WL4Q=GWXNrvXfwnoxdxBP2tHj5BkRTKihvxqg@mail.gmail.com>
	<CA+j1x0k_nWG4SZeu7PnEuXv9X8r7xg17L0=bXpuiRkDrF101Gw@mail.gmail.com>
	<CAPZV6o_+rxAbuNKYV8baxE04bkOixkEfiKiUV1pdQHOCYBkBiQ@mail.gmail.com>
	<CA+j1x0nBzU=hzKqPKXwVHRY4-mKG=JsUc0b6m_9gH4VX_MP1Gw@mail.gmail.com>
Message-ID: <CAPZV6o9LHaPpHv1i60xwZXLjDK7WZN7WmwbiHX6BvmYRkLt7gQ@mail.gmail.com>

2011/8/30 stefan brunthaler <stefan at brunthaler.net>:
> On Tue, Aug 30, 2011 at 13:42, Benjamin Peterson <benjamin at python.org> wrote:
>> 2011/8/30 stefan brunthaler <stefan at brunthaler.net>:
>>> I will remove my development commentaries and create a private
>>> repository at bitbucket for you* to take an early look, as Georg (and
>>> more or less Terry, too) suggested. Is that a good way for most of
>>> you? (I would then give access to whoever wants to take a look.)
>>
>> And what is wrong with a public one?
>>
> Well, since it does not fully pass all regression tests and is just
> meant for people to take a first look to find out if it's interesting,
> I think I might take it offline after you have had a look. It seems to me
> that that is easier to do with a private repository, but in
> general, I don't have a problem with a public one...

Well, if your intention is for people to look at it, public seems to
be the best solution.



-- 
Regards,
Benjamin

From yselivanov.ml at gmail.com  Tue Aug 30 23:33:53 2011
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 30 Aug 2011 17:33:53 -0400
Subject: [Python-Dev] Software Transactional Memory for Python
In-Reply-To: <CAMSv6X36-tROtxZhw0NPar7tXpJO+TbRkZvM2uFhFgouAxiBuA@mail.gmail.com>
References: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
	<CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>
	<CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>
	<CAMSv6X3DRGuA+L-3yZ9Ozo_bj7QpTn+qfONrK2b=mCSJKA5jiQ@mail.gmail.com>
	<CAH_1eM2Pj_AWrs_aKjgPsRtgvupM=FC9d7QMa9wZTwkG0dFe5Q@mail.gmail.com>
	<CAMSv6X2eYWVP_wgPjZtYFwRZAcDwF8a52g9Uvi1M8UkQVbqgAQ@mail.gmail.com>
	<CAMSv6X36-tROtxZhw0NPar7tXpJO+TbRkZvM2uFhFgouAxiBuA@mail.gmail.com>
Message-ID: <544C8633-8847-4018-875C-2FD093CCD885@gmail.com>

Maybe it'd be better to put 'atomic' in the threading module?

On 2011-08-30, at 4:02 PM, Armin Rigo wrote:

> Re-hi,
> 
> 2011/8/29 Armin Rigo <arigo at tunes.org>:
>>> The problem is that many locks are actually acquired implicitely.
>>> For example, `print` to a buffered stream will acquire the fileobject's mutex.
>> 
>> Indeed.
>> (...)
>> I suspect that I need to do a more thorough review of the stdlib (...)
> 
> I found a solution not involving any change in CPython, and updated
> the patch.  The solution is to say that a "with atomic" block doesn't
> completely prevent other threads from re-acquiring the GIL, but only
> prevents them from proceeding to the following bytecode.  So if
> another thread is currently suspended in a place that releases the GIL
> for other reasons, then this other thread can still be switched to as
> normal, and continue running until the end of the current bytecode.  I
> think it's sane enough for the original purpose, and avoids most
> deadlock cases.
> 
> 
> A bientôt,
> 
> Armin.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/yselivanov.ml%40gmail.com


From greg at krypto.org  Tue Aug 30 23:47:28 2011
From: greg at krypto.org (Gregory P. Smith)
Date: Tue, 30 Aug 2011 14:47:28 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CAPZV6o9LHaPpHv1i60xwZXLjDK7WZN7WmwbiHX6BvmYRkLt7gQ@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
	<CAP7+vJ+OJAu115PtPAGMUByoqDAfcRcfk+Er1HAPEA13xYTxKw@mail.gmail.com>
	<CA+j1x0m9Hv8tSb-83Q99xPFfswAckxX_Nob2GO-yc=ePX9ig-A@mail.gmail.com>
	<CAP7+vJLB1f0A3WL4Q=GWXNrvXfwnoxdxBP2tHj5BkRTKihvxqg@mail.gmail.com>
	<CA+j1x0k_nWG4SZeu7PnEuXv9X8r7xg17L0=bXpuiRkDrF101Gw@mail.gmail.com>
	<CAPZV6o_+rxAbuNKYV8baxE04bkOixkEfiKiUV1pdQHOCYBkBiQ@mail.gmail.com>
	<CA+j1x0nBzU=hzKqPKXwVHRY4-mKG=JsUc0b6m_9gH4VX_MP1Gw@mail.gmail.com>
	<CAPZV6o9LHaPpHv1i60xwZXLjDK7WZN7WmwbiHX6BvmYRkLt7gQ@mail.gmail.com>
Message-ID: <CAGE7PNKE-zws69fhi4cMShSZVKS2vv0OHWL9+bHGmT0hsUAfWg@mail.gmail.com>

On Tue, Aug 30, 2011 at 1:54 PM, Benjamin Peterson <benjamin at python.org> wrote:

> 2011/8/30 stefan brunthaler <stefan at brunthaler.net>:
> > On Tue, Aug 30, 2011 at 13:42, Benjamin Peterson <benjamin at python.org>
> wrote:
> >> 2011/8/30 stefan brunthaler <stefan at brunthaler.net>:
> >>> I will remove my development commentaries and create a private
> >>> repository at bitbucket for you* to take an early look, as Georg (and
> >>> more or less Terry, too) suggested. Is that a good way for most of
> >>> you? (I would then give access to whoever wants to take a look.)
> >>
> >> And what is wrong with a public one?
> >>
> > Well, since it does not fully pass all regression tests and is just
> > meant for people to take a first look to find out if it's interesting,
> > I think I might take it offline after you have had a look. It seems to me
> > that that is easier to do with a private repository, but in
> > general, I don't have a problem with a public one...
>
> Well, if your intention is for people to look at it, public seems to
> be the best solution.
>
>
+1

The point of open source is more eyeballs, and the ability for anyone else
to pick up the code and run in whatever direction they want with it
(license permitting). :)

From jacek.pliszka at gmail.com  Wed Aug 31 00:35:37 2011
From: jacek.pliszka at gmail.com (Jacek Pliszka)
Date: Wed, 31 Aug 2011 00:35:37 +0200
Subject: [Python-Dev] Coding guidelines for os.walk filter
Message-ID: <CABg-ORvHmAGoWhLhDKKY9hiFgxWdwucW25jEHZRodsu+966w_g@mail.gmail.com>

Hi!

I would like to get some opinion on possible os.walk improvement.
For the sake of simplicity let's assume I would like to skip all .svn
and tmp directories.

Current solution looks like this:

for t in os.walk(somedir):
    t[1][:] = set(t[1]) - {'.svn', 'tmp'}
    ... do something

This is a very clever hack but... it relies on the internal
implementation of os.walk....

An alternative is adding an os.walk parameter, e.g. like this:

def walk(top, topdown=True, onerror=None, followlinks=False, walkfilter=None):
    ...
    if walkfilter is not None:
        dirs, nondirs = walkfilter(top, dirs, nondirs)
    ...
and remove .svn and tmp in the walkfilter definition.

What I do not like here is that followlinks becomes redundant - it is
easily implementable through walkfilter.


A simpler, but backward-compatibility-breaking, option would be:

def walk(top, topdown=True, onerror=None, skipdirs=islink):
...
-        if followlinks or not islink(new_path):
-            for x in walk(new_path, topdown, onerror, followlinks):
+        if not skipdirs(new_path):
+            for x in walk(new_path, topdown, onerror, skipdirs):

And the user-given skipdirs function should return True for a new_path
ending in .svn or tmp.

Nothing is redundant, and it works fine with topdown=False!
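
As a self-contained sketch, the skipdirs idea can also be tried as a
wrapper over the current os.walk instead of a patch (note the wrapper
only prunes in topdown mode, unlike the patched version):

    import os
    from os.path import islink, join, basename

    def walk(top, topdown=True, onerror=None, skipdirs=islink):
        for root, dirs, files in os.walk(top, topdown, onerror,
                                         followlinks=True):
            dirs[:] = [d for d in dirs
                       if not skipdirs(join(root, d))]  # prune in place
            yield root, dirs, files

    skip = lambda p: islink(p) or basename(p) in ('.svn', 'tmp')
    for root, dirs, files in walk('somedir', skipdirs=skip):
        pass    # ... do something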

What do you think?  Shall we:
a) do nothing and use the implicit hack
b) make the option explicit, keeping backward compatibility, but with
redundancy and a topdown=False incompatibility
c) make the option explicit, breaking backward compatibility, but with
no redundancy

Best Regards,

Jacek Pliszka

From jnoller at gmail.com  Wed Aug 31 01:21:18 2011
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 30 Aug 2011 19:21:18 -0400
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CADiSq7fjBWXwJWUnO-B-pn=1v74GVmhU7-p4ac0GxiHtR78k5Q@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<4E5BF9F7.9020608@v.loewis.de>
	<CA+j1x0=T0wHary2800Yxju1GRcy5VkyxXtEatbY-AAXJrG1p4g@mail.gmail.com>
	<CAGE7PN+7G6VYM03q4x3x=Tug81oUePOAUgaN8xuUr-6XxBnE0A@mail.gmail.com>
	<CADiSq7eJc9b56WDvCfG=Eqgt=HN==WRVCsjuJ8Ez=+4-0nXKQw@mail.gmail.com>
	<20110830133829.099d7714@pitrou.net>
	<CADiSq7fjBWXwJWUnO-B-pn=1v74GVmhU7-p4ac0GxiHtR78k5Q@mail.gmail.com>
Message-ID: <F2F46469-9CFE-4AFF-8309-DAE3561144D7@gmail.com>



On Aug 30, 2011, at 9:05 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Tue, Aug 30, 2011 at 9:38 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> On Tue, 30 Aug 2011 13:29:59 +1000
>> Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> 
>>> Anecdotal, non-reproducible performance figures are *not* the way to
>>> go about serious optimisation efforts.
>> 
>> What about anecdotal *and* reproducible performance figures? :)
>> I may be half-joking, but we already have a set of py3k-compatible
>> benchmarks and, besides, sometimes a timeit invocation gives a good
>> idea of whether an approach is fruitful or not.
>> While a permanent public reference with historical tracking of
>> performance figures is even better, let's not freeze everything until
>> it's ready.
>> (for example, do we need to wait for speed.python.org before PEP 393 is
>> accepted?)
> 
> Yeah, I'd neglected the idea of just running perf.py for pre- and
> post-patch performance comparisons. You're right that that can
> generate sufficient info to make a well-informed decision.
> 
> I'd still really like it if some of the people advocating that we care
> about CPython performance actually volunteered to spearhead the effort
> to get speed.python.org up and running, though. As far as I know, the
> hardware's spinning idly waiting to be given work to do :P
> 
> Cheers,
> Nick.
> 

Discussion of speed.python.org should happen on the mailing list for that project if possible.

From ncoghlan at gmail.com  Wed Aug 31 01:26:53 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 31 Aug 2011 09:26:53 +1000
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0myco36U4f3vO6KbRXpLLY_K+PQABNyLR2=o7zt6+u5Xg@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<CAP7+vJ+QEcs7JAPRs60NsfqvGdypw_aadSQ9uNQ--=KOyL9xMw@mail.gmail.com>
	<CA+j1x0myco36U4f3vO6KbRXpLLY_K+PQABNyLR2=o7zt6+u5Xg@mail.gmail.com>
Message-ID: <CADiSq7cwJO_pCs=nqTchkbtXxFaCWiXJfVLtM0CtfMMP6RgQ-g@mail.gmail.com>

On Wed, Aug 31, 2011 at 3:23 AM, stefan brunthaler
<stefan at brunthaler.net> wrote:
> On Tue, Aug 30, 2011 at 09:42, Guido van Rossum <guido at python.org> wrote:
>> Stefan, have you shared a pointer to your code yet? Is it open source?
>>
> I have no shared code repository, but could create one (is there any
> pydev preferred provider?). I have all the copyrights on the code, and
> I would like to open-source it.

Currently, the easiest way to create shared repositories for CPython
variants is to start with bitbucket's mirror of the main CPython repo:
https://bitbucket.org/mirror/cpython/overview

Use the website to create your own public CPython fork, then edit the
configuration of your local copy of the CPython repo to point to the
your new bitbucket repo rather than the main one on hg.python.org. hg
push/pull can then be used as normal to publish in-development
material to the world. 'hg pull' from hg.python.org makes it fairly
easy to track the trunk.

One key thing is to avoid making any changes of your own on the
official CPython branches (i.e. default, 3.2, 2.7). Instead, use a
named branch for anything you're working on. This makes it much easier
to generate standalone patches later on.

My own public sandbox
(https://bitbucket.org/ncoghlan/cpython_sandbox/overview) is set up
that way, and you can see plenty of other examples on bitbucket.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Wed Aug 31 01:30:24 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 31 Aug 2011 09:30:24 +1000
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <F2F46469-9CFE-4AFF-8309-DAE3561144D7@gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<4E5BF9F7.9020608@v.loewis.de>
	<CA+j1x0=T0wHary2800Yxju1GRcy5VkyxXtEatbY-AAXJrG1p4g@mail.gmail.com>
	<CAGE7PN+7G6VYM03q4x3x=Tug81oUePOAUgaN8xuUr-6XxBnE0A@mail.gmail.com>
	<CADiSq7eJc9b56WDvCfG=Eqgt=HN==WRVCsjuJ8Ez=+4-0nXKQw@mail.gmail.com>
	<20110830133829.099d7714@pitrou.net>
	<CADiSq7fjBWXwJWUnO-B-pn=1v74GVmhU7-p4ac0GxiHtR78k5Q@mail.gmail.com>
	<F2F46469-9CFE-4AFF-8309-DAE3561144D7@gmail.com>
Message-ID: <CADiSq7fqVuQZQ9bOgrG4X8V+XBaZ8e=zfgZ5F31bWOUK5n83XQ@mail.gmail.com>

On Wed, Aug 31, 2011 at 9:21 AM, Jesse Noller <jnoller at gmail.com> wrote:
> Discussion of speed.python.org should happen on the mailing list for that project if possible.

Hah, that's how out of the loop I am on that front - I didn't even
know there *was* a mailing list for it :)

Subscribed!

Cheers,
Nick.

P.S. For anyone else that is interested:
http://mail.python.org/mailman/listinfo/speed

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From murman at gmail.com  Wed Aug 31 04:21:52 2011
From: murman at gmail.com (Michael Urman)
Date: Tue, 30 Aug 2011 21:21:52 -0500
Subject: [Python-Dev] Coding guidelines for os.walk filter
In-Reply-To: <CABg-ORvHmAGoWhLhDKKY9hiFgxWdwucW25jEHZRodsu+966w_g@mail.gmail.com>
References: <CABg-ORvHmAGoWhLhDKKY9hiFgxWdwucW25jEHZRodsu+966w_g@mail.gmail.com>
Message-ID: <CAOpBPYUWTTeoCNz_fBeipCqHBr-X2ibdH6OcrASSn6qGEKaMtw@mail.gmail.com>

> for t in os.walk(somedir):
>    t[1][:] = set(t[1]) - {'.svn', 'tmp'}
>    ... do something
>
> This is a very clever hack but... it relies on internal implementation
> of os.walk....

This doesn't appear to be an internal implementation detail; this is
documented behavior.
http://docs.python.org/dev/library/os.html#os.walk shows a similar example:

    for root, dirs, files in os.walk('python/Lib/email'):
        # ...
        dirs.remove('CVS')  # don't visit CVS directories

-- 
Michael Urman

From stephen at xemacs.org  Wed Aug 31 04:55:09 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 31 Aug 2011 11:55:09 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <1314724786.3554.1.camel@localhost.localdomain>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
Message-ID: <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>

Antoine Pitrou writes:

 > Sorry, what is a conformant UTF-16 array op?

For starters, one that doesn't ever return lone surrogates, but rather
interprets surrogate pairs as Unicode code points as in UTF-16.  (This
is not a Unicode standard definition, it's intended to be suggestive
of why many app writers will be distressed if they must use Python
unicode/str in a narrow build without a fairly comprehensive library
that wraps the arrays in operations that treat unicode/str as an array
of code points.)
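
A minimal sketch of one such wrapped operation -- iterating code points
over a narrow-build str by joining surrogate pairs:

    def iter_code_points(s):
        i, n = 0, len(s)
        while i < n:
            u = ord(s[i])
            if 0xD800 <= u <= 0xDBFF and i + 1 < n:
                lo = ord(s[i + 1])
                if 0xDC00 <= lo <= 0xDFFF:      # valid surrogate pair
                    yield 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00)
                    i += 2
                    continue
            yield u     # BMP code unit (or a lone surrogate, passed through)
            i += 1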

From tjreedy at udel.edu  Wed Aug 31 05:22:56 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 30 Aug 2011 23:22:56 -0400
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
Message-ID: <j3k9fr$ida$1@dough.gmane.org>

On 8/30/2011 2:12 PM, Guido van Rossum wrote:
> On Tue, Aug 30, 2011 at 10:50 AM, stefan brunthaler
> <stefan at brunthaler.net>  wrote:
>>> Do you really need it to match a machine word? Or is, say, a 16-bit
>>> format sufficient?
>>>
>> Hm, technically no, but practically it makes more sense, as (at least
>> for x86 architectures) having opargs and opcodes in half-words can be
>> efficiently expressed in assembly. On 64bit architectures, I could
>> also inline data object references that fit into the 32bit upper half.
>> It turns out that most constant objects fit nicely into this, and I
>> have used this for a special cache region (again below 2^32) for
>> global objects, too. So, technically it's not necessary, but
>> practically it makes a lot of sense. (Most of these things work on
>> 32bit systems, too. For architectures with a smaller size, we can
>> adapt or disable the optimizations.)
>
> Do I sense that the bytecode format is no longer platform-independent?
> That will need a bit of discussion. I bet there are some things around
> that depend on that.

I find myself more comfortable with Cesare Di Mauro's idea of 
expanding to 16 bits as the code unit. His basic idea was using 2, 4, or 
6 bytes instead of 1, 3, or 6. It actually tended to save space because 
many ops with small ints (which are very common) contract from 3 bytes 
to 2 bytes, or from 9(?) bytes (two instructions) to 6. I am sorry he 
was not able to follow up on the initial promising results. The dis 
output was probably easier to read than the current output.
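
A rough sketch of such a 16-bit code unit encoding (this is a guess at
the scheme, not necessarily Di Mauro's exact format): opcode in the low
byte, a small oparg in the high byte, larger args spilling into
following units:

    def encode(opcode, oparg=0):
        units = [opcode | ((oparg & 0xFF) << 8)]    # 2 bytes
        oparg >>= 8
        while oparg:                                # 4 or 6 bytes total
            units.append(oparg & 0xFFFF)
            oparg >>= 16
        return units

    assert len(encode(100, 5)) == 1         # 2 bytes vs. 3 with bytecode
    assert len(encode(100, 0x12345)) == 2   # 4 bytes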

Perhaps he made a mistake in combining the above idea with a shift from 
stack to hybrid stack+register design.

-- 
Terry Jan Reedy


From guido at python.org  Wed Aug 31 05:35:27 2011
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Aug 2011 20:35:27 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
	<8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>

On Tue, Aug 30, 2011 at 7:55 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Antoine Pitrou writes:
>
> > Sorry, what is a conformant UTF-16 array op?
>
> For starters, one that doesn't ever return lone surrogates, but rather
> interprets surrogate pairs as Unicode code points as in UTF-16. ?(This
> is not a Unicode standard definition, it's intended to be suggestive
> of why many app writers will be distressed if they must use Python
> unicode/str in a narrow build without a fairly comprehensive library
> that wraps the arrays in operations that treat unicode/str as an array
> of code points.)

That sounds like a contradiction -- it wouldn't be a UTF-16 array if
you couldn't tell that it was using UTF-16.

-- 
--Guido van Rossum (python.org/~guido)

From cesare.di.mauro at gmail.com  Wed Aug 31 07:04:35 2011
From: cesare.di.mauro at gmail.com (Cesare Di Mauro)
Date: Wed, 31 Aug 2011 07:04:35 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <20110830025510.638b41d9@pitrou.net>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net>
Message-ID: <CAP7v7k62ORPzPFcLOsgESw6Tjjc1A5yTOE+heZP71feC_OB3Hg@mail.gmail.com>

2011/8/30 Antoine Pitrou <solipsis at pitrou.net>


> Changing the bytecode width wouldn't make the interpreter more complex.


It depends on the kind of changes. :)

WPython introduced a very different "intermediate code" representation that
required a big change to the peephole optimizer in the 1.0 alpha version.
In 1.1 final I decided to move that code entirely into ast.c (mostly for
constant folding) and compiler.c (for the usual peephole job: looking for
some "patterns" to substitute with better ones), because I found it simpler
and more convenient.

In the end, leaving aside some new optimizations that I implemented "along
the way", the interpreter code is a bit more complex.

>
> Some years ago we were waiting for Unladen Swallow to improve itself
> and be ported to Python 3. Now it seems we are waiting for PyPy to be
> ported to Python 3. I'm not sure how "let's just wait" is a good
> trade-off if someone proposes interesting patches (which, of course,
> remains to be seen).
>
> Regards
>
> Antoine.
>

It isn't, because the motivation to do something new with CPython vanishes,
at least in some areas (the virtual machine / ceval.c), even when you have
ideas to experiment with. That's why in my last talk at EuroPython I decided
to move to other areas (Python objects).

Regards

Cesare

From cesare.di.mauro at gmail.com  Wed Aug 31 07:10:40 2011
From: cesare.di.mauro at gmail.com (Cesare Di Mauro)
Date: Wed, 31 Aug 2011 07:10:40 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CADiSq7e524Vtmnv2rE7ztS2K-SrgUT3eZg46=Rg4nyOH804-dQ@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<4E5C7BC8.6010302@canterbury.ac.nz>
	<CAF-Rda-6mo2mz20ssD=gq+aU8jPQH6yuFsk=hBUN7SC7=y55eQ@mail.gmail.com>
	<CADiSq7e524Vtmnv2rE7ztS2K-SrgUT3eZg46=Rg4nyOH804-dQ@mail.gmail.com>
Message-ID: <CAP7v7k4ea+cj_i4R6HsJqOHjKN-ExQjs-tyYXj+=GtNSQ8k6kQ@mail.gmail.com>

2011/8/30 Nick Coghlan <ncoghlan at gmail.com>

>
> Yeah, it's definitely a trade-off - the point I was trying to make is
> that there *is* a trade-off being made between complexity and speed.
>
> I think the computed-gotos stuff struck a nice balance - the macro-fu
> involved means that you can still understand what the main eval loop
> is *doing*, even if you don't know exactly what's hidden behind the
> target macros. Ditto for the older opcode prediction feature and the
> peephole optimiser - separation of concerns means that you can
> understand the overall flow of events without needing to understand
> every little detail.
>
> This is where the request to extract individual orthogonal changes and
> submit separate patches comes from - it makes it clear that the
> independent changes *can* be separated cleanly, and aren't a giant
> ball of incomprehensible mud. It's the difference between complex
> (lots of moving parts, that can each be understood on their own and
> are then composed into a meaningful whole) and complicated (massive
> patches that don't work at all if any one component is delayed)
>
> Eugene Toder's AST optimiser work that I still hope to get into 3.3
> will have to undergo a similar process - the current patch covers a
> bit too much ground and needs to be broken up into smaller steps
> before we can seriously consider pushing it into the core.
>
> Regards,
> Nick.
>

Sometimes it cannot be done, because big changes produce big patches as
well.

I don't see a problem here if the code is well written (as "required" by
the Python community :) and the developer is available to talk about his
work to clear up any doubts.

Regards

Cesare

From cesare.di.mauro at gmail.com  Wed Aug 31 07:14:08 2011
From: cesare.di.mauro at gmail.com (Cesare Di Mauro)
Date: Wed, 31 Aug 2011 07:14:08 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
Message-ID: <CAP7v7k7U9PFZ+sfsD0Z489-2Q=TbRe2kj357P3kBSXgzwWs+bg@mail.gmail.com>

2011/8/30 stefan brunthaler <stefan at brunthaler.net>

> Yes, indeed I have a more straightforward instruction format to allow
> for more efficient decoding. Just going from bytecode size to
> word-code size without changing the instruction format is going to
> require 8 (or word-size) times more memory on a 64bit system. From an
> optimization perspective, the irregular instruction format was the
> biggest problem, because checking for HAS_ARG is always on the fast
> path and mostly unpredictable. Hence, I chose to extend the
> instruction format to have word-size and use the additional space to
> have the upper half be used for the argument and the lower half for
> the actual opcode. Encoding is more efficient, and *not* more complex.
> Using profiling to indicate what code is hot, I don't waste too much
> memory on encoding this regular instruction format.
>
> Regards,
> --stefan
>
That seems to be exactly the WPython approach, although I used the new
"wordcode" in place of the old bytecode. Take a look at it. ;)
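
In word-size terms, the decode step is just a mask and a shift; a quick
Python sketch (the field widths are assumed for illustration, not
necessarily the actual layout):

    def decode(word):
        opcode = word & 0xFFFF            # lower half: the opcode
        oparg = (word >> 16) & 0xFFFF     # upper half: the argument
        return opcode, oparg

    word = (5 << 16) | 100                # oparg 5, opcode 100
    assert decode(word) == (100, 5)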

Regards

Cesare

From cesare.di.mauro at gmail.com  Wed Aug 31 07:16:49 2011
From: cesare.di.mauro at gmail.com (Cesare Di Mauro)
Date: Wed, 31 Aug 2011 07:16:49 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
Message-ID: <CAP7v7k5Re9v7ZC5Ti9fj7UuoLJ+MsYP5VJOeRP-ZSRfuaaTthw@mail.gmail.com>

2011/8/30 stefan brunthaler <stefan at brunthaler.net>

> > Do I sense that the bytecode format is no longer platform-independent?
> > That will need a bit of discussion. I bet there are some things around
> > that depend on that.
> >
> Hm, I haven't really thought about that in detail and for longer, I
> ran it on PowerPC 970 and Intel Atom & i7 without problems (the latter
> ones are a non-issue) and think that it can be portable. I just stuff
> argument and opcode into one word for regular instruction decoding
> like a RISC CPU, and I realize there might be little/big endian
> issues, but they surely can be conditionally compiled...
>
> --stefan
>
I think that you must deal with big endianness, because some RISC CPUs can't
handle data in little-endian format at all.

In WPython I wrote some macros which handle both byte orders, but lacking
big-endian machines I never had the opportunity to verify whether something
was wrong.
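
For what it's worth, the usual tricks (sketched here in Python with the
struct module; the actual WPython C macros may differ) are to either fix
the stored byte order and let the loader swap, or to assemble each unit
from individual bytes so the host byte order never matters:

    import struct

    def read_unit_fixed(buf, i):
        # Fix little-endian on disk; struct swaps on big-endian hosts.
        return struct.unpack_from('<H', buf, i)[0]

    def read_unit_bytewise(buf, i):
        # Assemble from bytes; identical result on any host.
        return buf[i] | (buf[i + 1] << 8)

    data = bytes([0x64, 0x05])            # opcode 0x64, oparg 0x05
    assert read_unit_fixed(data, 0) == read_unit_bytewise(data, 0) == 0x0564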

Regards

Cesare

From cesare.di.mauro at gmail.com  Wed Aug 31 07:38:44 2011
From: cesare.di.mauro at gmail.com (Cesare Di Mauro)
Date: Wed, 31 Aug 2011 07:38:44 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <j3k9fr$ida$1@dough.gmane.org>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<j3k9fr$ida$1@dough.gmane.org>
Message-ID: <CAP7v7k6pWpSG+ifdfs9VxPniTxe82zV_9G-i2cF+UiW=PurURA@mail.gmail.com>

2011/8/31 Terry Reedy <tjreedy at udel.edu>

> I find myself more comfortable with Cesare Di Mauro's idea of expanding
> to 16 bits as the code unit. His basic idea was using 2, 4, or 6 bytes
> instead of 1, 3, or 6.
>

It can be expanded to opcodes longer than 6 bytes, if needed. The format is
designed to be flexible enough to accommodate such changes painlessly.


> It actually tended to save space because many ops with small ints (which
> are very common) contract from 3 bytes to 2 bytes or from 9(?) (two
> instructions) to 6.
>

It can pack up to 4 (old) opcodes into one wordcode (superinstruction).
Wordcodes are designed to favor instruction "grouping".


> I am sorry he was not able to follow up on the initial promising results.
>

In a few words: lack of interest. Why spend (so much) time on a project
when you see that the community is oriented towards other directions
(Unladen Swallow at first, PyPy in the last period, given the substantial
decline of the former)?

Also, Guido seems to dislike what he regards as "hacks", and never showed
interest.

In WPython 1.1 I "rolled back" the "hack" that I had introduced in PyObject
types (a couple of fields) in 1.0 alpha, to make the code more "polished"
(but with a noticeable drop in performance). But again, I saw no interest
in WPython, so I decided to put a stop to it, shelving my initial idea
of going for Python 3.


> The dis output was probably easier to read than the current output.
>
> Perhaps he made a mistake in combining the above idea with a shift from
> stack to hybrid stack+register design.
>
> --
> Terry Jan Reedy
>

As I already said, wordcodes are designed to favor "grouping". So it was
quite natural for the VM to become a "hybrid" one. Anyway, both space and
performance gained from this wordcode "property". ;)

Regards

Cesare

From stephen at xemacs.org  Wed Aug 31 08:03:52 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 31 Aug 2011 15:03:52 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
	<8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>
Message-ID: <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>

Guido van Rossum writes:
 > On Tue, Aug 30, 2011 at 7:55 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

 > > For starters, one that doesn't ever return lone surrogates, but rather
 > > interprets surrogate pairs as Unicode code points as in UTF-16. (This
 > > is not a Unicode standard definition, it's intended to be suggestive
 > > of why many app writers will be distressed if they must use Python
 > > unicode/str in a narrow build without a fairly comprehensive library
 > > that wraps the arrays in operations that treat unicode/str as an array
 > > of code points.)
 > 
 > That sounds like a contradiction -- it wouldn't be a UTF-16 array if
 > you couldn't tell that it was using UTF-16.

Well, that's why I wrote "intended to be suggestive".  The Unicode
Standard does not specify at all what the internal representation of
characters may be, it only specifies what their external behavior must
be when two processes communicate.  (For "process" as used in the
standard, think "Python modules" here, since we are concerned with the
problems of folks who develop in Python.)  When observing the behavior
of a Unicode process, there are no UTF-16 arrays or UTF-8 arrays or
even UTF-32 arrays; only arrays of characters.

Thus, according to the rules of handling a UTF-16 stream, it is an
error to observe a lone surrogate or a surrogate pair that isn't a
high-low pair (Unicode 6.0, Ch. 3 "Conformance", requirements C1 and
C8-C10).  That's what I mean by "can't tell it's UTF-16".  And I
understand those requirements to mean that operations on UTF-16
streams should produce UTF-16 streams, or raise an error.  Without
that closure property for basic operations on str, I think it's a bad
idea to say that the representation of text in a str in a pre-PEP-393
"narrow" build is UTF-16.  For many users and app developers, it
creates expectations that are not fulfilled.
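
To make that closure property concrete, here is a rough sketch (mine, not
part of any proposal) of the well-formedness check implied by C1 and
C8-C10, over a sequence of 16-bit code units:

    def is_well_formed_utf16(units):
        # Reject lone surrogates and wrongly ordered surrogate pairs.
        i, n = 0, len(units)
        while i < n:
            u = units[i]
            if 0xD800 <= u <= 0xDBFF:              # high surrogate
                if i + 1 >= n or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                    return False                   # no low surrogate follows
                i += 2
            elif 0xDC00 <= u <= 0xDFFF:            # stray low surrogate
                return False
            else:
                i += 1
        return True

    assert is_well_formed_utf16([0x0041])          # 'A'
    assert is_well_formed_utf16([0xD800, 0xDC00])  # U+10000 as a pair
    assert not is_well_formed_utf16([0xD800])      # lone surrogate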

It's true that common usage is that an array of code units that
usually conforms to UTF-16 may be called "UTF-16" without the closure
properties.  I just disagree with that usage, because there are two
camps that interpret "UTF-16" differently.  One side says, "we have an
array representation in UTF-16 that can handle all Unicode code points
efficiently, and if you think you need more, think again", while the
other says "it's too painful to have to check every result for valid
UTF-16, and we need a UTF-16 type that supports the usual array
operations on *characters* via the usual operators; if you think
otherwise, think again."

Note that despite the (presumed) resolution of the UTF-16 issue for
CPython by PEP 393, at some point a very similar discussion will take
place over "characters" anyway, because users and app developers are
going to want a type that handles composition sequences and/or
grapheme clusters for them, as well as comparison that respects
canonical equivalence, even if it is inefficient compared to str.
That's why I insisted on use of "array of code points" to describe the
PEP 393 str type, rather than "array of characters".

From arigo at tunes.org  Wed Aug 31 08:43:45 2011
From: arigo at tunes.org (Armin Rigo)
Date: Wed, 31 Aug 2011 08:43:45 +0200
Subject: [Python-Dev] Software Transactional Memory for Python
In-Reply-To: <544C8633-8847-4018-875C-2FD093CCD885@gmail.com>
References: <CAMSv6X13rLCm8nu2Q8cj4PnQZETFP9h_rBtbRGMd2-1-0XOVAQ@mail.gmail.com>
	<CADiSq7d9TpL=3xqixYbktPj3kzJnRb_tb69Twj3mz-5iua1Saw@mail.gmail.com>
	<CAMSv6X0THuC-aXcPbB6wRdnE+14eiP7gXVjUg4K5zjqqdEGXQA@mail.gmail.com>
	<CAMSv6X3DRGuA+L-3yZ9Ozo_bj7QpTn+qfONrK2b=mCSJKA5jiQ@mail.gmail.com>
	<CAH_1eM2Pj_AWrs_aKjgPsRtgvupM=FC9d7QMa9wZTwkG0dFe5Q@mail.gmail.com>
	<CAMSv6X2eYWVP_wgPjZtYFwRZAcDwF8a52g9Uvi1M8UkQVbqgAQ@mail.gmail.com>
	<CAMSv6X36-tROtxZhw0NPar7tXpJO+TbRkZvM2uFhFgouAxiBuA@mail.gmail.com>
	<544C8633-8847-4018-875C-2FD093CCD885@gmail.com>
Message-ID: <CAMSv6X0Kv3+EwrD+k0mkgMRcXWvkG=1fuENHAQ6yfs5ywCV2Zw@mail.gmail.com>

Hi,

On Tue, Aug 30, 2011 at 11:33 PM, Yury Selivanov
<yselivanov.ml at gmail.com> wrote:
> Maybe it'd be better to put 'atomic' in the threading module?

'threading' is pure Python.  But anyway the consensus is to not have
'atomic' at all in the stdlib, which means it is in its own 3rd-party
extension module.


Armin

From stefan_ml at behnel.de  Wed Aug 31 09:26:49 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 31 Aug 2011 09:26:49 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0k_nWG4SZeu7PnEuXv9X8r7xg17L0=bXpuiRkDrF101Gw@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>	<20110829231420.20c3516a@pitrou.net>	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>	<20110830025510.638b41d9@pitrou.net>
	<4E5CA1F0.2070005@v.loewis.de>	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>	<20110830193806.0d718a56@pitrou.net>	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>	<CAP7+vJ+OJAu115PtPAGMUByoqDAfcRcfk+Er1HAPEA13xYTxKw@mail.gmail.com>	<CA+j1x0m9Hv8tSb-83Q99xPFfswAckxX_Nob2GO-yc=ePX9ig-A@mail.gmail.com>	<CAP7+vJLB1f0A3WL4Q=GWXNrvXfwnoxdxBP2tHj5BkRTKihvxqg@mail.gmail.com>
	<CA+j1x0k_nWG4SZeu7PnEuXv9X8r7xg17L0=bXpuiRkDrF101Gw@mail.gmail.com>
Message-ID: <j3knnr$2v1$1@dough.gmane.org>

stefan brunthaler, 30.08.2011 22:41:
>> Ok, there there's something else you haven't told us. Are you saying
>> that the original (old) bytecode is still used (and hence written to
>> and read from .pyc files)?
>>
> Short answer: yes.
> Long answer: I added an invocation counter to the code object and keep
> interpreting in the usual Python interpreter until this counter
> reaches a configurable threshold. When it reaches this threshold, I
> create the new instruction format and interpret with this optimized
> representation. All the macros look exactly the same in the source
> code, they are just redefined to use the different instruction format.
> I am at no point serializing this representation or the runtime
> information gathered by me, as any subsequent invocation might have
> different characteristics.

So, basically, you built a JIT compiler but don't want to call it that, 
right? Just because it compiles byte code to other byte code rather than to 
native CPU instructions does not mean it doesn't compile Just In Time.

That actually sounds like a nice feature in general. It could even replace 
(or accompany?) the existing peephole optimiser as part of a more general 
optimisation architecture, in the sense that it could apply byte code 
optimisations at runtime rather than compile time, potentially based on 
better knowledge about what's actually going on.
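
As a toy illustration of the counter-then-quicken scheme stefan describes
(the names and the threshold are invented here, and the real work happens
in C, not Python):

    HOT_THRESHOLD = 1000                  # hypothetical trigger point

    class CodeRecord:
        def __init__(self, bytecode):
            self.bytecode = bytecode      # the regular representation
            self.wordcode = None          # optimized form, built lazily
            self.invocations = 0

        def run(self, interpret, quicken):
            self.invocations += 1
            if self.wordcode is None and self.invocations >= HOT_THRESHOLD:
                self.wordcode = quicken(self.bytecode)   # re-encode once
            return interpret(self.wordcode if self.wordcode is not None
                             else self.bytecode)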


> I will remove my development commentaries and create a private
> repository at bitbucket

I agree with the others that it's best to open up your repository for 
everyone who is interested. I can see no reason why you would want to close 
it back down once it's there.

Stefan


From v+python at g.nevcal.com  Wed Aug 31 10:09:25 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 31 Aug 2011 01:09:25 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
	<8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>
	<87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4E5DEC35.4010404@g.nevcal.com>

On 8/30/2011 11:03 PM, Stephen J. Turnbull wrote:
> Guido van Rossum writes:
>   >  On Tue, Aug 30, 2011 at 7:55 PM, Stephen J. Turnbull<stephen at xemacs.org>  wrote:
>
>   >  >  For starters, one that doesn't ever return lone surrogates, but rather
>   >  >  interprets surrogate pairs as Unicode code points as in UTF-16.  (This
>   >  >  is not a Unicode standard definition, it's intended to be suggestive
>   >  >  of why many app writers will be distressed if they must use Python
>   >  >  unicode/str in a narrow build without a fairly comprehensive library
>   >  >  that wraps the arrays in operations that treat unicode/str as an array
>   >  >  of code points.)
>   >
>   >  That sounds like a contradiction -- it wouldn't be a UTF-16 array if
>   >  you couldn't tell that it was using UTF-16.
>
> Well, that's why I wrote "intended to be suggestive".  The Unicode
> Standard does not specify at all what the internal representation of
> characters may be, it only specifies what their external behavior must
> be when two processes communicate.  (For "process" as used in the
> standard, think "Python modules" here, since we are concerned with the
> problems of folks who develop in Python.)  When observing the behavior
> of a Unicode process, there are no UTF-16 arrays or UTF-8 arrays or
> even UTF-32 arrays; only arrays of characters.
>
> Thus, according to the rules of handling a UTF-16 stream, it is an
> error to observe a lone surrogate or a surrogate pair that isn't a
> high-low pair (Unicode 6.0, Ch. 3 "Conformance", requirements C1 and
> C8-C10).  That's what I mean by "can't tell it's UTF-16".  And I
> understand those requirements to mean that operations on UTF-16
> streams should produce UTF-16 streams, or raise an error.  Without
> that closure property for basic operations on str, I think it's a bad
> idea to say that the representation of text in a str in a pre-PEP-393
> "narrow" build is UTF-16.  For many users and app developers, it
> creates expectations that are not fulfilled.
>
> It's true that common usage is that an array of code units that
> usually conforms to UTF-16 may be called "UTF-16" without the closure
> properties.  I just disagree with that usage, because there are two
> camps that interpret "UTF-16" differently.  One side says, "we have an
> array representation in UTF-16 that can handle all Unicode code points
> efficiently, and if you think you need more, think again", while the
> other says "it's too painful to have to check every result for valid
> UTF-16, and we need a UTF-16 type that supports the usual array
> operations on *characters* via the usual operators; if you think
> otherwise, think again."
>
> Note that despite the (presumed) resolution of the UTF-16 issue for
> CPython by PEP 393, at some point a very similar discussion will take
> place over "characters" anyway, because users and app developers are
> going to want a type that handles composition sequences and/or
> grapheme clusters for them, as well as comparison that respects
> canonical equivalence, even if it is inefficient compared to str.
> That's why I insisted on use of "array of code points" to describe the
> PEP 393 str type, rather than "array of characters".

On topic:

So from reading all this discussion, I think this point is rather a key 
one... and it has been made repeatedly in different ways:  Arrays are 
not suitable for manipulating Unicode character sequences, and the str 
type is an array with a veneer of text manipulation operations, which do 
not, and cannot, by themselves, efficiently implement Unicode character 
sequences.

Python wants to, should, and can implement UTF-16 streams, UTF-8 
streams, and UTF-32 streams.  It should, and can implement streams using 
other encodings as well, and also binary streams.

Python wants to, should, and can implement 8-bit, 16-bit, 32-bit, and 
64-bit arrays.  These are efficient to access, index, and slice.

Python implements a veneer on some 8-bit, 16-bit, and 32-bit arrays 
called str (this will be more true post-PEP 393, although it is true 
with caveats presently), which interpret array elements as code units 
(currently) or codepoints (post-PEP), and implements operations that are 
interesting for text processing, with caveats.

There is presently no support for arrays of Unicode grapheme clusters or 
composed characters.  The Python type called str may or may not be 
properly documented (to the extent that there is confusion between the 
actual contents of the elements of the type, and the concept of 
character as defined by Unicode).  From comments Guido has made, he is 
not interested in changing the efficiency or access methods of the str 
type to raise the level of support of Unicode to the composed character, 
or grapheme cluster concepts.  The str type itself can presently be used 
to process other character encodings: if they use fixed-width elements 
narrower than 32 bits, those encodings might be considered Unicode encodings, but 
there is no requirement that they are, and some operations on str may 
operate with knowledge of some Unicode semantics, so there are caveats.

So it seems that any semantics in support of composed characters, 
grapheme clusters, or codepoints-stored-as-<32-bit-code-units, must be 
created as either an add-on Python package (in Python) or C extension, 
or a combination.  It could be based on extensions to the existing str 
type, or it could be based on the array type, or it could based on the 
bytes type.  It could use an internal format of 32-bit codepoints, PEP 
393 variable-size codepoints, or 8- or 16-bit codeunits.

In addition to the expected stream operations, character length, 
indexing, and slicing operations, additional more complex operations 
would be expected on Unicode string values: regular expressions, 
comparisons, collations, case-shifting, and perhaps more.  RTL and LTR 
awareness would add complexity to all operations, or at least variants 
of all operations.

The questions are:

1) Is anyone interested in writing a PEP for such a thing?
2) Is anyone interested in writing an implementation for such a thing?
3) How many conflicting opinions and arguments will be spawned, making 
the poor person or persons above lose interest?

Brainstorming ideas (which may wander off-topic in some regards, but 
were all inspired by this discussion):

BI-0: Tom's analysis makes me think that UTF-8, since it is the smallest 
encoding on average across languages, could be an appropriate starting 
point: an implementation based on a foundation type of bytes or 'B' 
arrays, plus side indexes of some sort.  UTF-8 is variable length, 
but so are composed characters and grapheme clusters. Building an array, 
each of whose units could hold the largest grapheme cluster, would seem 
extremely inefficient, just as 32-bit Unicode is extremely inefficient 
for dealing with ASCII, so variable-length units seem to be an 
imperative part of a solution.  At least until one thinks up BI-2.

BI-1: Perhaps a 32-bit base, with the upper 11 bits used to cache 
character characteristics from various character attribute database 
lookups could be an effective alternative, but wouldn't eliminate the 
need for dealing with variable length units for length, indexing, and 
slicing operations.

BI-2: Maybe a 32-bit base would be useful so that one high bit could be 
used to flag that this character position actually holds an index to a 
multi-codepoint character, and the index would then hold the actual 
codes for that character.  This would allow for at most 2^31 (and memory 
limited) different multi-codepoint characters in a string (or perhaps 
per application, if the multi-codepoint characters are shared between 
strings), but would suddenly allow array indexing of grapheme clusters 
and composed characters... with double-indexing required for 
multi-codepoint character access. [This idea seems similar to one that 
was mentioned elsewhere in this thread, suggesting that private use 
characters could be used to represent multi-codepoint characters, but 
(a) doesn't infringe on private uses, and (b) allows for a greater 
number of multi-codepoint characters to be used. A rough sketch of this 
idea appears after this list.]

BI-3: both BI-1 and BI-2 would also allow themselves to be built on top 
of PEP 393 str... allowing multi-codepoint-character-supporting 
applications to benefit from the space efficiencies of PEP 393 when no 
multi-codepoint characters are fed into the application.

BI-4: Since Unicode has 21-bit codepoints, one wonders if 24-bit array 
elements might be appropriate, rather than 32-bit.  BI-2 could still 
operate, with a theoretical reduction to 2^23 possible multi-codepoint 
characters in an application.  Access would be less efficient, but still 
O(1), and 25% of the space would be saved.  This idea could be applied 
to PEP 393 independently of multi-codepoint character support.

BI-5: I'm pretty sure there are inappropriate or illegal sequences of 
combining characters that should not stand alone.  One example of this 
is lone surrogates.  Such characters out of an appropriate sequence 
could be flagged with a high-bit so that they could be quickly 
recognized as illegal Unicode, but codecs could be provided to allow 
them to round-trip, and applications could recognize immediately that 
they should be handled as "binary gibberish" in an otherwise Unicode 
stream. This idea could be applied to PEP 393 independently of 
additional multi-codepoint character support.

BI-6: Maybe another high bit could be used with a different codec error 
handler instead of using lone surrogates when decoding 
not-quite-conformant byte streams (such as OS filenames).  Sad we didn't 
think of this one before doing all the lone surrogate stuff.  Of course, 
this solution wouldn't work on narrow builds, because not even 
surrogates can represent high bits above Unicode codepoints!  But once 
we have PEP 393, we _could_ replace inappropriate use of lone 
surrogates, with use of out-of-the-Unicode-codepoint range integers, 
without introducing ambiguity in the interpretation of lone surrogates. 
This idea could be applied to PEP 393 independently of multi-codepoint 
character support.
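
Returning to BI-2 as promised, here is a rough Python sketch of the 
double-indexed layout (every layout choice here is hypothetical, purely 
to make the indexing concrete):

    ESCAPE = 1 << 31                      # high bit set: index, not codepoint

    class GraphemeString:
        def __init__(self):
            self.units = []               # one 32-bit slot per "character"
            self.clusters = []            # side table of multi-codepoint seqs

        def append_codepoint(self, cp):
            self.units.append(cp)

        def append_cluster(self, codepoints):
            self.units.append(ESCAPE | len(self.clusters))
            self.clusters.append(tuple(codepoints))

        def __getitem__(self, i):         # O(1); double-indexed for clusters
            u = self.units[i]
            if u & ESCAPE:
                return self.clusters[u & ~ESCAPE]
            return (u,)

    s = GraphemeString()
    s.append_codepoint(ord('e'))
    s.append_cluster([ord('e'), 0x0301])  # e + combining acute accent
    assert s[1] == (ord('e'), 0x0301)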

Glenn

From stephen at xemacs.org  Wed Aug 31 14:21:38 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 31 Aug 2011 21:21:38 +0900
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E5DEC35.4010404@g.nevcal.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
	<8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>
	<87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E5DEC35.4010404@g.nevcal.com>
Message-ID: <87vctdkbzh.fsf@uwakimon.sk.tsukuba.ac.jp>

Glenn Linderman writes:

 > From comments Guido has made, he is not interested in changing the
 > efficiency or access methods of the str type to raise the level of
 > support of Unicode to the composed character, or grapheme cluster
 > concepts.

IMO, that would be a bad idea, as higher-level Unicode support should
either be a wrapper around full implementations such as ICU (or
platform support in .NET or Java), or written in pure Python at first.
Thus there is a need for an efficient array of code units type.  PEP
393 allows this to go to the level of code points, but evidently that
is inappropriate for Jython and IronPython.

 > The str type itself can presently be used to process other
 > character encodings:

Not really.  Remember, on input codecs always decode to Unicode and on
output they always encode from Unicode.  How do you propose to get
other encodings into the array of code units?

 > [A "true Unicode" type] could be based on extensions to the
 > existing str type, or it could be based on the array type, or it
 > could based on the bytes type.  It could use an internal format of
 > 32-bit codepoints, PEP 393 variable-size codepoints, or 8- or
 > 16-bit codeunits.

In theory yes, but in practice all of the string methods and libraries
like re operate on str (and often but not always bytes; in particular,
codecs always decode from byte and encode to bytes).

Why bother with anything except arrays of code points at the start?
PEP 393 makes that time-efficient and reasonably space-efficient as a
starting point and allows starting with re or MRAB's regex to get
basic RE functionality or good UTS #18 functionality respectively.
Plus str already has all the usual string operations (.startswith(),
.join(), etc), and we have modules for dealing with the Unicode
Character Database.  Why waste effort reintegrating with all that,
until we have common use cases that need more efficient representation?

There would be some issue in coming up with an appropriate UTF-16 to
code point API for Jython and IronPython, but Terry Reedy has a rather
efficient library for that already.

So this discussion of alternative representations, including use of
high bits to represent properties, is premature optimization
... especially since we don't even have a proto-PEP specifying how
much conformance we want of this new "true Unicode" type in the first
place.

We need to focus on that before optimizing anything.

From stefan at brunthaler.net  Wed Aug 31 18:54:41 2011
From: stefan at brunthaler.net (stefan brunthaler)
Date: Wed, 31 Aug 2011 09:54:41 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CAP7v7k5Re9v7ZC5Ti9fj7UuoLJ+MsYP5VJOeRP-ZSRfuaaTthw@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
	<CAP7v7k5Re9v7ZC5Ti9fj7UuoLJ+MsYP5VJOeRP-ZSRfuaaTthw@mail.gmail.com>
Message-ID: <CA+j1x0mjz8ONKJgWs3T2bL88kmHN=ja617WNUNVimvKeCe=agQ@mail.gmail.com>

> I think that you must deal with big endianess because some RISC can't handle
> at all data in little endian format.
>
> In WPython I have wrote some macros which handle both endianess, but lacking
> big endian machines I never had the opportunity to verify if something was
> wrong.
>
I am sorry for the lapse in not getting back to this directly
yesterday; we were just heading out for lunch and I only figured it
out then, but immediately forgot it on our way back to the lab...

So, as I have already said, I evaluated my optimizations on x86
(little-endian) and PowerPC 970 (big-endian) and I did not have to
change any of my instruction decoding during interpretation. (The only
nasty bug I still remember vividly was that on gcc for x86 the
data type char defaults to signed, whereas it defaults to unsigned on
PowerPC's gcc.) When I have time and access to a PowerPC machine again
(an ARM might be interesting, too), I will take a look at the
generated assembly code to figure out why this is working. (I have
some ideas why it might work without changing the code.)

If I run into any problems, I'll gladly contact you :)

BTW: AFAIR, we emailed last year regarding wpython and IIRC your
optimizations could primarily be summarized as clever
superinstructions. I have not implemented anything in that area at all
(and have in fact not even touched the compiler and its peephole
optimizer), but if parts my implementation gets in, I am sure that you
could add some of your work on top of that, too.

Cheers,
--stefan

From stefan at brunthaler.net  Wed Aug 31 19:08:12 2011
From: stefan at brunthaler.net (stefan brunthaler)
Date: Wed, 31 Aug 2011 10:08:12 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <j3knnr$2v1$1@dough.gmane.org>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
	<CAP7+vJ+OJAu115PtPAGMUByoqDAfcRcfk+Er1HAPEA13xYTxKw@mail.gmail.com>
	<CA+j1x0m9Hv8tSb-83Q99xPFfswAckxX_Nob2GO-yc=ePX9ig-A@mail.gmail.com>
	<CAP7+vJLB1f0A3WL4Q=GWXNrvXfwnoxdxBP2tHj5BkRTKihvxqg@mail.gmail.com>
	<CA+j1x0k_nWG4SZeu7PnEuXv9X8r7xg17L0=bXpuiRkDrF101Gw@mail.gmail.com>
	<j3knnr$2v1$1@dough.gmane.org>
Message-ID: <CA+j1x0kx0cfmXK1Rw-gOKhua9OwOuZ-FQMz9=GLZCziTMx=5FQ@mail.gmail.com>

> So, basically, you built a JIT compiler but don't want to call it that,
> right? Just because it compiles byte code to other byte code rather than to
> native CPU instructions does not mean it doesn't compile Just In Time.
>
For me, a definition of a JIT compiler or any dynamic compilation
subsystem entails that native machine code is generated at run-time.
Furthermore, I am not compiling from bytecode to bytecode, but rather
changing the instruction encoding underneath and subsequently using
quickening to optimize interpretation. But, OTOH, I am not aware of a
canonical definition of JIT compilation, so it depends ;)


> I agree with the others that it's best to open up your repository for
> everyone who is interested. I can see no reason why you would want to close
> it back down once it's there.
>
Well, my code has primarily been a vehicle for my research in that
area and thus is not immediately suited to adoption (it does not
adhere to Python C coding standards, contains lots of private comments
about various facts, debugging hints, etc.). The explanation for this
is easy: When I started out on my research it was far from clear that
it would be successful and really that much faster. So, I would like
to clean up the comments and some parts of the code, and publish what
I have without doing the clean-up work for naming conventions, etc.,
so that you can all take a look and it is clear what it's all about.
After that we can have a factual discussion about whether it fits the
bill for you, and if so, which changes (naming conventions, extensive
documentation, etc.) are necessary *before* any adoption is reasonable.

That seems to be a good way to start off and get results and feedback
quickly; any ideas/complaints/comments/suggestions?

Best regards,
--stefan

PS: I am using Nick's suggested plan to incorporate my changes
directly to the most recent version, as mine is currently only running
on Python 3.1.

From guido at python.org  Wed Aug 31 19:10:16 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 31 Aug 2011 10:10:16 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<4E537EEC.1070602@v.loewis.de>
	<1314099542.3485.10.camel@localhost.localdomain>
	<4E53945E.1050102@v.loewis.de>
	<1314101745.3485.18.camel@localhost.localdomain>
	<4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
	<8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>
	<87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CAP7+vJKTTD+WAtcmRa-a=R1NNFW6WCNNKCK+5=OQrkxGhXuZ9g@mail.gmail.com>

On Tue, Aug 30, 2011 at 11:03 PM, Stephen J. Turnbull
<stephen at xemacs.org> wrote:
[me]
> > That sounds like a contradiction -- it wouldn't be a UTF-16 array if
> > you couldn't tell that it was using UTF-16.
>
> Well, that's why I wrote "intended to be suggestive". The Unicode
> Standard does not specify at all what the internal representation of
> characters may be, it only specifies what their external behavior must
> be when two processes communicate. (For "process" as used in the
> standard, think "Python modules" here, since we are concerned with the
> problems of folks who develop in Python.) When observing the behavior
> of a Unicode process, there are no UTF-16 arrays or UTF-8 arrays or
> even UTF-32 arrays; only arrays of characters.

Hm, that's not how I would read "process". IMO that is an
intentionally vague term, and we are free to decide how to interpret
it. I don't think it will work very well to define a process as a
Python module; what about Python modules that agree about passing
along arrays of code units (or streams of UTF-8, for that matter)?

This is why I find the issue of Python, the language (and stdlib), as
a whole "conforming to the Unicode standard" such a troublesome
concept -- I think it is something that an application may claim, but
the language should make much more modest claims, such as "the regular
expression syntax supports features X, Y and Z from the Unicode
recommendation XXX", or "the UTF-8 codec will never emit a sequence of
bytes that is invalid according to Unicode specification YYY". (As long
as the Unicode references are also versioned or dated.)

I'm fine with saying "it is hard to write Unicode-conforming
application code for reason ZZZ" and proposing a fix (e.g. PEP 393
fixes a specific complaint about code units being inferior to code
points for most types of processing). I'm not fine with saying "the
string datatype should conform to the Unicode standard".

> Thus, according to the rules of handling a UTF-16 stream, it is an
> error to observe a lone surrogate or a surrogate pair that isn't a
> high-low pair (Unicode 6.0, Ch. 3 "Conformance", requirements C1 and
> C8-C10). That's what I mean by "can't tell it's UTF-16".

But if you can observe (valid) surrogate pairs it is still UTF-16.

> And I
> understand those requirements to mean that operations on UTF-16
> streams should produce UTF-16 streams, or raise an error. Without
> that closure property for basic operations on str, I think it's a bad
> idea to say that the representation of text in a str in a pre-PEP-393
> "narrow" build is UTF-16. For many users and app developers, it
> creates expectations that are not fulfilled.

Ok, I dig this, to some extent. However saying it is UCS-2 is equally
bad. I guess this is why Java and .NET just say their string types
contain arrays of "16-bit characters", with essentially no semantics
attached to the word "character" besides "16-bit unsigned integer".

At the same time I think it would be useful if certain string
operations like .lower() worked in such a way that *if* the input were
valid UTF-16, *then* the output would also be, while *if* the input
contained an invalid surrogate, the result would simply be something
that is no worse (in particular, those are all mapped to themselves).
We could even go further and have .lower() and friends look at
graphemes (multi-code-point characters) if the Unicode std has a
useful definition of e.g. lowercasing graphemes that differed from
lowercasing code points.

An analogy is actually found in .lower() on 8-bit strings in Python 2:
it assumes the string contains ASCII, and non-ASCII characters are
mapped to themselves. If your string contains Latin-1 or EBCDIC or
UTF-8 it will not do the right thing. But that doesn't mean strings
cannot contain those encodings, it just means that the .lower() method
is not useful if they do. (Why ASCII? Because that is the system
encoding in Python 2.)
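
A toy model of that garbage-in-garbage-out contract, over a sequence of
code points, might look like the following (as far as I can tell,
str.lower() already maps lone surrogates to themselves, which is what
makes the analogy work):

    def gigo_lower(codepoints):
        out = []
        for cp in codepoints:
            if 0xD800 <= cp <= 0xDFFF:    # lone surrogate: pass through
                out.append(cp)
            else:                         # may expand, e.g. U+0130
                out.extend(ord(c) for c in chr(cp).lower())
        return out

    assert gigo_lower([ord('A'), 0xD800]) == [ord('a'), 0xD800]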

> It's true that common usage is that an array of code units that
> usually conforms to UTF-16 may be called "UTF-16" without the closure
> properties. I just disagree with that usage, because there are two
> camps that interpret "UTF-16" differently. One side says, "we have an
> array representation in UTF-16 that can handle all Unicode code points
> efficiently, and if you think you need more, think again", while the
> other says "it's too painful to have to check every result for valid
> UTF-16, and we need a UTF-16 type that supports the usual array
> operations on *characters* via the usual operators; if you think
> otherwise, think again."

I think we should just document how it behaves and not get hung up on
what it is called. Mentioning UTF-16 is still useful because it
indicates that some operations may act properly on surrogate pairs.
(Also because of course character properties for BMP characters are
respected, etc.)

> Note that despite the (presumed) resolution of the UTF-16 issue for
> CPython by PEP 393, at some point a very similar discussion will take
> place over "characters" anyway, because users and app developers are
> going to want a type that handles composition sequences and/or
> grapheme clusters for them, as well as comparison that respects
> canonical equivalence, even if it is inefficient compared to str.
> That's why I insisted on use of "array of code points" to describe the
> PEP 393 str type, rather than "array of characters".

Let's call those things graphemes (Tom C's term, I quite like leaving
"character" ambiguous) -- they are sequences of multiple code points
that represent a single "visual squiggle" (the kind of thing that
you'd want to be swappable in vim with "xp" :-). I agree that APIs are
needed to manipulate (match, generate, validate, mutilate, etc.)
things at the grapheme level. I don't agree that this means a separate
data type is required. There are ever-larger units of information
encoded in text strings, with ever farther-reaching (and more vague)
requirements on valid sequences. Do you want to have a data type that
can represent (only valid) words in a language? Sentences? Novels?

I think that at this point in time the best we can do is claim that
Python (the language standard) uses either 16-bit code units or 21-bit
code points in its string datatype, and that, thanks to PEP 393,
CPython 3.3 and further will always use 21-bit code points (but Jython
and IronPython may forever use their platform's native 16-bit code
unit representing string type). And then we add APIs that can be used
everywhere to look for code points (even if the string contains code
units), graphemes, or larger constructs. I'd like those APIs to be
designed using a garbage-in-garbage-out principle, where if the input
conforms to some Unicode requirement, the output does too, but if the
input doesn't, the output does what makes most sense. Validation is
then limited to codecs, and optional calls.

If you index or slice a string, or create a string from chr() of a
surrogate or from some other value that the Unicode standard considers
an illegal code point, you better know what you are doing. I want
chr(i) to be valid for all values of i in range(2**21), so it can be
used to create a lone surrogate, or (on systems with 16-bit
"characters") a surrogate pair. And also ord(chr(i)) == i for all i in
range(2**21). I'm not sure about ord() on a 2-character string
containing a surrogate pair on systems where strings contain 21-bit
code points; I think it should be an error there, just as ord() on
other strings of length != 1. But on systems with 16-bit "characters",
ord() of strings of length 2 containing a valid surrogate pair should
work.
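
In other words, something like this sketch for the 16-bit case (the helper
name is invented; on a wide build the surrogate-pair branch would raise
instead, per the above):

    def wide_ord(s):
        if len(s) == 1:
            return ord(s)
        if (len(s) == 2
                and '\ud800' <= s[0] <= '\udbff'
                and '\udc00' <= s[1] <= '\udfff'):
            return (0x10000
                    + ((ord(s[0]) - 0xD800) << 10)
                    + (ord(s[1]) - 0xDC00))
        raise TypeError('expected a character or a surrogate pair')

    assert wide_ord('\ud800\udc00') == 0x10000
    assert wide_ord('A') == ord('A')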

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Wed Aug 31 19:12:44 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 31 Aug 2011 10:12:44 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E5DEC35.4010404@g.nevcal.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
	<8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>
	<87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E5DEC35.4010404@g.nevcal.com>
Message-ID: <CAP7+vJL-9vxDDrXEDA9Ussk5Eso_fTaRAKHGghnBabr8F3yT2Q@mail.gmail.com>

On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:
> So from reading all this discussion, I think this point is rather a key
> one... and it has been made repeatedly in different ways: Arrays are not
> suitable for manipulating Unicode character sequences, and the str type is
> an array with a veneer of text manipulation operations, which do not, and
> cannot, by themselves, efficiently implement Unicode character sequences.

I think this is too strong. The str type is indeed an array, and you
can build useful Unicode manipulation APIs on top of it. Just like
bytes are not UTF-8, but can be used to represent UTF-8 and a
fully-compliant UTF-8 codec can be implemented on top of it.
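
A quick illustration of that analogy (nothing here is new API, just the
existing codec machinery):

    data = 'na\u00efve'.encode('utf-8')           # bytes holding UTF-8
    assert isinstance(data, bytes)                # bytes knows no UTF-8
    assert data.decode('utf-8') == 'na\u00efve'   # the codec supplies it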

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Wed Aug 31 19:20:19 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 31 Aug 2011 10:20:19 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E5DEC35.4010404@g.nevcal.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j31hlc$dp5$2@dough.gmane.org>
	<87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
	<8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>
	<87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E5DEC35.4010404@g.nevcal.com>
Message-ID: <CAP7+vJJtYZ8vspUimoMh1j6ye6SbfT-eT-YPMG3htU+2NPSNXA@mail.gmail.com>

On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:
> The str type itself can presently be used to process other
> character encodings: if they are fixed width < 32-bit elements those
> encodings might be considered Unicode encodings, but there is no requirement
> that they are, and some operations on str may operate with knowledge of some
> Unicode semantics, so there are caveats.

Actually, the str type in Python 3 and the unicode type in Python 2
are constrained everywhere to either 16-bit or 21-bit "characters".
(Except when writing C code, which can do any number of invalid things
so is the equivalent of assuming 1 == 0.) In particular, on a wide
build, there is no way to get a code point >= 2**21, and I don't want
PEP 393 to change this. So at best we can use these types to represent
arrays of 21-bit unsigned ints. But I think it is more useful to think
of them as always representing "some form of Unicode", whether that is
UTF-16 (on narrow builds) or 21-bit code points or perhaps some
vaguely similar superset -- but for those code units/code points that
are representable *and* valid (either code points or code units)
according to the (supported version of the) Unicode standard, the
meaning of those code points/units matches that of the standard.

Note that this is different from the bytes type, where the meaning of
a byte is entirely determined by what it means in the programmer's
head.

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Wed Aug 31 19:28:57 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 31 Aug 2011 10:28:57 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CAP7v7k62ORPzPFcLOsgESw6Tjjc1A5yTOE+heZP71feC_OB3Hg@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net>
	<CAP7v7k62ORPzPFcLOsgESw6Tjjc1A5yTOE+heZP71feC_OB3Hg@mail.gmail.com>
Message-ID: <CAP7+vJKN02MDGboQDHeeDL2Tusd+yOM5JBx+D1s3Nbkucu9K-w@mail.gmail.com>

On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro
<cesare.di.mauro at gmail.com> wrote:
> It isn't, because motivation to do something new with CPython vanishes, at
> least in some areas (virtual machine / ceval.c), even having some ideas to
> experiment with. That's why in my last talk at EuroPython I decided to move
> on to other areas (Python objects).

Cesare, I'm really sorry that you became so disillusioned that you
abandoned wordcode. I agree that we were too optimistic about Unladen
Swallow. Also that the existence of PyPy and its PR machine (:-)
should not stop us from improving CPython.

I'm wondering if, with your experience in creating WPython, you could
review Stefan Brunthaler's code and approach (once he's put it up for
review) and possibly the two of you could even work on a joint
project?

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Wed Aug 31 19:31:13 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 31 Aug 2011 10:31:13 -0700
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0kx0cfmXK1Rw-gOKhua9OwOuZ-FQMz9=GLZCziTMx=5FQ@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
	<CAP7+vJ+OJAu115PtPAGMUByoqDAfcRcfk+Er1HAPEA13xYTxKw@mail.gmail.com>
	<CA+j1x0m9Hv8tSb-83Q99xPFfswAckxX_Nob2GO-yc=ePX9ig-A@mail.gmail.com>
	<CAP7+vJLB1f0A3WL4Q=GWXNrvXfwnoxdxBP2tHj5BkRTKihvxqg@mail.gmail.com>
	<CA+j1x0k_nWG4SZeu7PnEuXv9X8r7xg17L0=bXpuiRkDrF101Gw@mail.gmail.com>
	<j3knnr$2v1$1@dough.gmane.org>
	<CA+j1x0kx0cfmXK1Rw-gOKhua9OwOuZ-FQMz9=GLZCziTMx=5FQ@mail.gmail.com>
Message-ID: <CAP7+vJJ6kK9HtVQTn2CJ1qZp8FZn0xzsbpMk8i7gw+nUDG4QaQ@mail.gmail.com>

On Wed, Aug 31, 2011 at 10:08 AM, stefan brunthaler
<stefan at brunthaler.net> wrote:
> Well, my code has primarily been a vehicle for my research in that
> area and thus is not immediately suited to adoption [...].

But if you want to be taken seriously as a researcher, you should
publish your code! Without publication of your *code*, research in your
area cannot be reproduced by others, so it is not science. Please stop
being shy and open up what you have. The software engineering issues
can be dealt with separately!

-- 
--Guido van Rossum (python.org/~guido)

From v+python at g.nevcal.com  Wed Aug 31 20:51:28 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 31 Aug 2011 11:51:28 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJL-9vxDDrXEDA9Ussk5Eso_fTaRAKHGghnBabr8F3yT2Q@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
	<8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>
	<87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E5DEC35.4010404@g.nevcal.com>
	<CAP7+vJL-9vxDDrXEDA9Ussk5Eso_fTaRAKHGghnBabr8F3yT2Q@mail.gmail.com>
Message-ID: <4E5E82B0.4020302@g.nevcal.com>

On 8/31/2011 10:12 AM, Guido van Rossum wrote:
> On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:
>> So from reading all this discussion, I think this point is rather a key
>> one... and it has been made repeatedly in different ways:  Arrays are not
>> suitable for manipulating Unicode character sequences, and the str type is
>> an array with a veneer of text manipulation operations, which do not, and
>> cannot, by themselves, efficiently implement Unicode character sequences.
> I think this is too strong. The str type is indeed an array, and you
> can build useful Unicode manipulation APIs on top of it. Just like
> bytes are not UTF-8, but can be used to represent UTF-8 and a
> fully-compliant UTF-8 codec can be implemented on top of it.
>

This statement is a logical conclusion of arguments presented in this 
thread.

1) Applications that wish to do grapheme access, wish to do it by 
grapheme array indexing, because that is the efficient way to do it.

2) As long as str is restricted to holding Unicode code units or code 
points, then it cannot support grapheme array indexing efficiently.

I have not declared that useful Unicode manipulation APIs cannot be 
built on top of str, only that efficiency will suffer.
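
As a minimal sketch of that efficiency point (combining marks only, not
full UAX #29 clustering), grapheme access over a code point str needs an
O(n) segmentation pass before any O(1) cluster indexing is possible:

    import unicodedata

    def graphemes(s):
        """Toy segmentation: a base code point plus trailing marks."""
        clusters = []
        for ch in s:
            if clusters and unicodedata.combining(ch):
                clusters[-1] += ch        # attach the mark to its base
            else:
                clusters.append(ch)
        return clusters

    s = 'He\u0301llo'             # 6 code points, 5 graphemes
    g = graphemes(s)              # the unavoidable O(n) pass
    assert len(s) == 6 and len(g) == 5
    assert g[1] == 'e\u0301'      # O(1) indexing only after that pass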
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110831/2c8c7171/attachment-0001.html>

From guido at python.org  Wed Aug 31 20:56:03 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 31 Aug 2011 11:56:03 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <4E5E82B0.4020302@g.nevcal.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
	<8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>
	<87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E5DEC35.4010404@g.nevcal.com>
	<CAP7+vJL-9vxDDrXEDA9Ussk5Eso_fTaRAKHGghnBabr8F3yT2Q@mail.gmail.com>
	<4E5E82B0.4020302@g.nevcal.com>
Message-ID: <CAP7+vJ+OVGZtAh8qf4hZ0XWSx5QOWtNSNFFq73FAT3G6Dev23A@mail.gmail.com>

On Wed, Aug 31, 2011 at 11:51 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:

>  On 8/31/2011 10:12 AM, Guido van Rossum wrote:
>
> On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:
>
>  So from reading all this discussion, I think this point is rather a key
> one... and it has been made repeatedly in different ways:  Arrays are not
> suitable for manipulating Unicode character sequences, and the str type is
> an array with a veneer of text manipulation operations, which do not, and
> cannot, by themselves, efficiently implement Unicode character sequences.
>
>  I think this is too strong. The str type is indeed an array, and you
> can build useful Unicode manipulation APIs on top of it. Just like
> bytes are not UTF-8, but can be used to represent UTF-8 and a
> fully-compliant UTF-8 codec can be implemented on top of it.
>
>
>
> This statement is a logical conclusion of arguments presented in this
> thread.
>
> 1) Applications that wish to do grapheme access, wish to do it by grapheme
> array indexing, because that is the efficient way to do it.
>

I don't believe that should be taken as gospel. In Perl, they don't do array
indexing on strings at all, and use regex matching instead. An API that uses
some kind of cursor on a string might work fine in Python too (for grapheme
matching).
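
For a concrete sketch of such an API: MRAB's third-party regex module,
mentioned elsewhere in this thread, matches one grapheme cluster with
\X, so cluster-at-a-time iteration needs no grapheme array at all (an
illustration, not a finished design):

    import regex    # third-party; the stdlib re module has no \X

    s = 'He\u0301llo'                       # 6 code points, 5 graphemes
    assert regex.findall(r'\X', s)[1] == 'e\u0301'

    for m in regex.finditer(r'\X', s):      # a cursor-style walk
        cluster = m.group()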

> 2) As long as str is restricted to holding Unicode code units or code
> points, then it cannot support grapheme array indexing efficiently.
>
> I have not declared that useful Unicode manipulation APIs cannot be built
> on top of str, only that efficiency will suffer.
>

But you have not proven it.

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110831/1ecbcee5/attachment.html>

From v+python at g.nevcal.com  Wed Aug 31 21:14:25 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 31 Aug 2011 12:14:25 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJ+OVGZtAh8qf4hZ0XWSx5QOWtNSNFFq73FAT3G6Dev23A@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
	<8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>
	<87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E5DEC35.4010404@g.nevcal.com>
	<CAP7+vJL-9vxDDrXEDA9Ussk5Eso_fTaRAKHGghnBabr8F3yT2Q@mail.gmail.com>
	<4E5E82B0.4020302@g.nevcal.com>
	<CAP7+vJ+OVGZtAh8qf4hZ0XWSx5QOWtNSNFFq73FAT3G6Dev23A@mail.gmail.com>
Message-ID: <4E5E8811.90600@g.nevcal.com>

On 8/31/2011 11:56 AM, Guido van Rossum wrote:
> On Wed, Aug 31, 2011 at 11:51 AM, Glenn Linderman 
> <v+python at g.nevcal.com> wrote:
>
>     On 8/31/2011 10:12 AM, Guido van Rossum wrote:
>>     On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:
>>>     So from reading all this discussion, I think this point is rather a key
>>>     one... and it has been made repeatedly in different ways:  Arrays are not
>>>     suitable for manipulating Unicode character sequences, and the str type is
>>>     an array with a veneer of text manipulation operations, which do not, and
>>>     cannot, by themselves, efficiently implement Unicode character sequences.
>>     I think this is too strong. The str type is indeed an array, and you
>>     can build useful Unicode manipulation APIs on top of it. Just like
>>     bytes are not UTF-8, but can be used to represent UTF-8 and a
>>     fully-compliant UTF-8 codec can be implemented on top of it.
>>
>
>     This statement is a logical conclusion of arguments presented in
>     this thread.
>
>     1) Applications that wish to do grapheme access, wish to do it by
>     grapheme array indexing, because that is the efficient way to do it.
>
>
> I don't believe that should be taken as gospel. In Perl, they don't do 
> array indexing on strings at all, and use regex matching instead. An 
> API that uses some kind of cursor on a string might work fine in 
> Python too (for grapheme matching).

The last benchmark I saw, regexp in Perl was faster than regexp in 
Python; that was some years back, before regexp in Perl supported quite 
as much Unicode as it does now.  I'm not sure whether anyone has done 
recent performance benchmarks; Tom's survey indicates that the 
functionality presently differs, so it is not clear that performance 
benchmarks are presently an appropriate way to compare Unicode 
operations in regexp between the two languages.

That said, regexp, or some sort of cursor on a string, might be a 
workable solution.  Will it have adequate performance?  Perhaps, at 
least for some applications.  Will it be as conceptually simple as 
indexing an array of graphemes?  No.  Will it ever reach the efficiency 
of indexing an array of graphemes? No.  Does that matter? Depends on the 
application.

>
>     2) As long as str is restricted to holding Unicode code units or
>     code points, then it cannot support grapheme array indexing
>     efficiently.
>
>     I  have not declared that useful Unicode manipulations APIs cannot
>     be built on top of str, only that efficiency will suffer.
>
>
> But you have not proven it.

Do you disagree that indexing an array is more efficient than 
manipulating strings with regex or binary trees?  I think not, because 
you are insistent that array indexing of str be preserved as O(1).  I 
agree that I have not proven it; it largely depends on whether or not 
indexing by grapheme cluster is a useful operation in applications.  Yet 
Stephen (I think) has commented that emacs performance goes down as soon 
as multi-byte characters are introduced into an edit buffer.  So I think 
he has proven that efficiency can suffer, in some 
implementations/applications.  Terry's O(k) implementation requires data 
beyond strings, and isn't O(1).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110831/af49e07a/attachment.html>

From v+python at g.nevcal.com  Wed Aug 31 21:14:52 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 31 Aug 2011 12:14:52 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJJtYZ8vspUimoMh1j6ye6SbfT-eT-YPMG3htU+2NPSNXA@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
	<8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>
	<87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E5DEC35.4010404@g.nevcal.com>
	<CAP7+vJJtYZ8vspUimoMh1j6ye6SbfT-eT-YPMG3htU+2NPSNXA@mail.gmail.com>
Message-ID: <4E5E882C.1050006@g.nevcal.com>

On 8/31/2011 10:20 AM, Guido van Rossum wrote:
> On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:
>> The str type itself can presently be used to process other
>> character encodings: if they are fixed width < 32-bit elements those
>> encodings might be considered Unicode encodings, but there is no requirement
>> that they are, and some operations on str may operate with knowledge of some
>> Unicode semantics, so there are caveats.
> Actually, the str type in Python 3 and the unicode type in Python 2
> are constrained everywhere to either 16-bit or 21-bit "characters".
> (Except when writing C code, which can do any number of invalid things
> so is the equivalent of assuming 1 == 0.) In particular, on a wide
> build, there is no way to get a code point >= 2**21, and I don't want
> PEP 393 to change this. So at best we can use these types to represent
> arrays of 21-bit unsigned ints. But I think it is more useful to think
> of them as always representing "some form of Unicode", whether that is
> UTF-16 (on narrow builds) or 21-bit code points or perhaps some
> vaguely similar superset -- but for those code units/code points that
> are representable *and* valid (either code points or code units)
> according to the (supported version of the) Unicode standard, the
> meaning of those code points/units matches that of the standard.
>
> Note that this is different from the bytes type, where the meaning of
> a byte is entirely determined by what it means in the programmer's
> head.
>

Sorry, my Perl background is leaking through.  I didn't double check 
that str constrains the values of each element to the range 
[0, 0x110000), but I see now by testing that it does.  For some of my 
ideas, then, either a
subtype of str would have to be able to relax that constraint, or str 
would not be the appropriate base type to use (but there are other base 
types that could be used, so this is not a serious issue for the ideas).
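
For the archive, that test is a one-liner on CPython 3.2; the error
message itself names the limit:

    >>> chr(0x10FFFF)      # the largest value a str element may hold
    '\U0010ffff'
    >>> chr(0x110000)
    Traceback (most recent call last):
      ...
    ValueError: chr() arg not in range(0x110000)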

I have no problem with thinking of str as representing "some form of 
Unicode".  None of my proposals change that, although they may change 
other things, and may invent new forms of Unicode representations. You 
have stated that it is better to document what str actually does, rather 
than attempt to adhere slavishly to Unicode standard concepts.  The 
Unicode Consortium may well define legal, conforming bytestreams for 
communicating processes, but languages and applications are free to use 
other representations internally.  We can either artificially constrain 
ourselves to minor tweaks of the legal conforming bytestreams, or we can 
invent a representation (whether called str or something else) that is 
useful and efficient in practice.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110831/2e6d8256/attachment.html>

From v+python at g.nevcal.com  Wed Aug 31 21:15:12 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 31 Aug 2011 12:15:12 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <87vctdkbzh.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
	<8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>
	<87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4E5DEC35.4010404@g.nevcal.com>
	<87vctdkbzh.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4E5E8840.4080600@g.nevcal.com>

On 8/31/2011 5:21 AM, Stephen J. Turnbull wrote:
> Glenn Linderman writes:
>
>   >   From comments Guido has made, he is not interested in changing the
>   >  efficiency or access methods of the str type to raise the level of
>   >  support of Unicode to the composed character, or grapheme cluster
>   >  concepts.
>
> IMO, that would be a bad idea,

OK, you agree with Guido.

> as higher-level Unicode support should
> either be a wrapper around full implementations such as ICU (or
> platform support in .NET or Java), or written in pure Python at first.
> Thus there is a need for an efficient array of code units type.  PEP
> 393 allows this to go to the level of code points, but evidently that
> is inappropriate for Jython and IronPython.
>
>   >  The str type itself can presently be used to process other
>   >  character encodings:
>
> Not really.  Remember, on input codecs always decode to Unicode and on
> output they always encode from Unicode.  How do you propose to get
> other encodings into the array of code units?

Here are two ways (there may be more): custom codecs, and direct assignment.

>   >  [A "true Unicode" type] could be based on extensions to the
>   >  existing str type, or it could be based on the array type, or it
>   >  could based on the bytes type.  It could use an internal format of
>   >  32-bit codepoints, PEP 393 variable-size codepoints, or 8- or
>   >  16-bit codeunits.
>
> In theory yes, but in practice all of the string methods and libraries
> like re operate on str (and often but not always bytes; in particular,
> codecs always decode from bytes and encode to bytes).
>
> Why bother with anything except arrays of code points at the start?
> PEP 393 makes that time-efficient and reasonably space-efficient as a
> starting point and allows starting with re or MRAB's regex to get
> basic RE functionality or good UTS #18 functionality respectively.
> Plus str already has all the usual string operations (.startswith(),
> .join(), etc), and we have modules for dealing with the Unicode
> Character Database.  Why waste effort reintegrating with all that,
> until we have common use cases that need more efficient representation?

String methods could be reimplemented on any appropriate type, of 
course.  Rejecting alternatives too soon might make one miss the best 
design.

> There would be some issue in coming up with an appropriate UTF-16 to
> code point API for Jython and IronPython, but Terry Reedy has a rather
> efficient library for that already.

Yes, Terry's implementation is interesting, and inspiring, and that 
concept could be extended to a variety of techniques: codepoint access 
of code unit representations, and multi-codepoint character access on 
top of either code unit or codepoint representations.
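
A toy sketch of the first of those (not Terry's actual code): one O(n)
pass over a UTF-16 code unit array yields O(1) codepoint indexing;
Terry's refinement stores only the surrogate-pair positions and
bisects, trading memory for O(log k) lookups:

    def codepoint_starts(units):
        """Code-unit offset at which each code point begins."""
        starts, j = [], 0
        while j < len(units):
            starts.append(j)
            if 0xD800 <= units[j] <= 0xDBFF:   # lead surrogate: a pair
                j += 2
            else:
                j += 1
        return starts

    units = [0x0048, 0xD83D, 0xDE00, 0x0021]   # 'H', U+1F600, '!'
    assert codepoint_starts(units) == [0, 1, 3]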

> So this discussion of alternative representations, including use of
> high bits to represent properties, is premature optimization
> ... especially since we don't even have a proto-PEP specifying how
> much conformance we want of this new "true Unicode" type in the first
> place.
>
> We need to focus on that before optimizing anything.

You may call it premature optimization if you like, or you can ignore 
the concepts and emails altogether.  I call it brainstorming for ideas, 
looking for non-obvious solutions to the problem of representation of 
Unicode.

I found your discussion of streams versus arrays, as separate concepts 
related to Unicode, along with Terry's bisect indexing implementation, 
to be rather inspiring.  Just because Unicode defines streams of code 
units of various sizes (UTF-8, UTF-16, UTF-32) to represent characters 
when processes communicate and for storage (which is one way processes 
communicate), that doesn't imply that the internal representation of 
character strings in a programming language must use exactly that 
representation.  While there are efficiencies in using the same 
representation as is used by the communications streams, there are also 
inefficiencies.  I'm unaware of any current Python implementation that 
has chosen to use UTF-8 as the internal representation of character 
strings (though I'm aware Perl has made that choice), yet UTF-8 is one 
of the commonly recommended character representations on the Linux 
platform, from what I read.  So in that sense, Python has rejected the 
idea of using the "native" or "OS configured" representation as its 
internal representation.  So why, then, must one choose from a 
repertoire of Unicode-defined stream representations if they don't meet 
the goal of efficient length, indexing, or slicing operations on actual 
characters?
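
One of those inefficiencies, sketched with the stdlib: a UTF-8 stream
kept as bytes gives up O(1) code point indexing, since code unit and
code point counts diverge:

    s = 'na\u00efve'                 # 5 code points
    b = s.encode('utf-8')            # 6 code units: '\u00ef' is two bytes
    assert len(s) == 5 and len(b) == 6

    def cp_offset(b, i):
        """Byte offset of code point i: an O(n) scan over lead bytes."""
        leads = [j for j, byte in enumerate(b) if byte & 0xC0 != 0x80]
        return leads[i]

    assert b[cp_offset(b, 2):cp_offset(b, 3)].decode('utf-8') == '\u00ef'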
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110831/3f5a0d0d/attachment.html>

From v+python at g.nevcal.com  Wed Aug 31 22:04:01 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 31 Aug 2011 13:04:01 -0700
Subject: [Python-Dev] PEP 393 Summer of Code Project
In-Reply-To: <CAP7+vJKTTD+WAtcmRa-a=R1NNFW6WCNNKCK+5=OQrkxGhXuZ9g@mail.gmail.com>
References: <CAP_a28F4sDBuDkhrQseVmRLgvu8ReEWcYVUCW8fduf17YeqPpg@mail.gmail.com>
	<j32igg$hd7$1@dough.gmane.org>
	<CAP7+vJJ=BLoWjoMQQidKPshMgiSPhYv_=zZ+QYn5P8E0--B08w@mail.gmail.com>
	<CACBhJdGN_DJW+5b792WqMxFerdQ+DqFxNi-TtkiYs3QCA6hprA@mail.gmail.com>
	<CAP7+vJKbDCqcc2dDqXs5+NJc=UP7C3CBJAfnQRgOLRZ=30pX3A@mail.gmail.com>
	<CACac1F_jqN06stQ4HK9j=uZ4XtT6=rTbgZXOb4sqndAk-xPG6Q@mail.gmail.com>
	<4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu>
	<CAP7+vJ+1HzcGG5BXzjOtmz_1g+Rk11Qn19iS1yEjAVxpkRio-Q@mail.gmail.com>
	<4E5869C2.2040008@udel.edu>
	<8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com>
	<87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110829141440.2e2178c6@pitrou.net>
	<874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1314724786.3554.1.camel@localhost.localdomain>
	<8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJJ0qpBMO-T5o=CP-F5ij37fmTLUqSdMGdJ4kGLHV8FBYQ@mail.gmail.com>
	<87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAP7+vJKTTD+WAtcmRa-a=R1NNFW6WCNNKCK+5=OQrkxGhXuZ9g@mail.gmail.com>
Message-ID: <4E5E93B1.8070301@g.nevcal.com>

On 8/31/2011 10:10 AM, Guido van Rossum wrote:
> On Tue, Aug 30, 2011 at 11:03 PM, Stephen J. Turnbull
> <stephen at xemacs.org>  wrote:
> [me]
>>   >  That sounds like a contradiction -- it wouldn't be a UTF-16 array if
>>   >  you couldn't tell that it was using UTF-16.
>>
>> Well, that's why I wrote "intended to be suggestive".  The Unicode
>> Standard does not specify at all what the internal representation of
>> characters may be, it only specifies what their external behavior must
>> be when two processes communicate.  (For "process" as used in the
>> standard, think "Python modules" here, since we are concerned with the
>> problems of folks who develop in Python.)  When observing the behavior
>> of a Unicode process, there are no UTF-16 arrays or UTF-8 arrays or
>> even UTF-32 arrays; only arrays of characters.
> Hm, that's not how I would read "process". IMO that is an
> intentionally vague term, and we are free to decide how to interpret
> it. I don't think it will work very well to define a process as a
> Python module; what about Python modules that agree about passing
> along arrays of code units (or streams of UTF-8, for that matter)?
>
> This is why I find the issue of Python, the language (and stdlib), as
> a whole "conforming to the Unicode standard" such a troublesome
> concept -- I think it is something that an application may claim, but
> the language should make much more modest claims, such as "the regular
> expression syntax supports features X, Y and Z from the Unicode
> recommendation XXX, or "the UTF-8 codec will never emit a sequence of
> bytes that is invalid according to Unicode specification YYY". (As long
> as the Unicode references are also versioned or dated.)
>
> I'm fine with saying "it is hard to write Unicode-conforming
> application code for reason ZZZ" and proposing a fix (e.g. PEP 393
> fixes a specific complaint about code units being inferior to code
> points for most types of processing). I'm not fine with saying "the
> string datatype should conform to the Unicode standard".
>
>> Thus, according to the rules of handling a UTF-16 stream, it is an
>> error to observe a lone surrogate or a surrogate pair that isn't a
>> high-low pair (Unicode 6.0, Ch. 3 "Conformance", requirements C1 and
>> C8-C10).  That's what I mean by "can't tell it's UTF-16".
> But if you can observe (valid) surrogate pairs it is still UTF-16.
>
>> And I
>> understand those requirements to mean that operations on UTF-16
>> streams should produce UTF-16 streams, or raise an error.  Without
>> that closure property for basic operations on str, I think it's a bad
>> idea to say that the representation of text in a str in a pre-PEP-393
>> "narrow" build is UTF-16.  For many users and app developers, it
>> creates expectations that are not fulfilled.
> Ok, I dig this, to some extent. However saying it is UCS-2 is equally
> bad. I guess this is why Java and .NET just say their string types
> contain arrays of "16-bit characters", with essentially no semantics
> attached to the word "character" besides "16-bit unsigned integer".
>
> At the same time I think it would be useful if certain string
> operations like .lower() worked in such a way that *if* the input were
> valid UTF-16, *then* the output would also be, while *if* the input
> contained an invalid surrogate, the result would simply be something
> that is no worse (in particular, those are all mapped to themselves).
> We could even go further and have .lower() and friends look at
> graphemes (multi-code-point characters) if the Unicode std has a
> useful definition of e.g. lowercasing graphemes that differed from
> lowercasing code points.
>
> An analogy is actually found in .lower() on 8-bit strings in Python 2:
> it assumes the string contains ASCII, and non-ASCII characters are
> mapped to themselves. If your string contains Latin-1 or EBCDIC or
> UTF-8 it will not do the right thing. But that doesn't mean strings
> cannot contain those encodings, it just means that the .lower() method
> is not useful if they do. (Why ASCII? Because that is the system
> encoding in Python 2.)

So if Python 3.3+ uses Unicode codepoints as its str representation, the 
analogy to ASCII and Python 2 would imply that it should permit 
out-of-range codepoints, if they can be represented in the underlying 
data values.  Valid codecs would not create such values on input, nor 
accept them on output.  Operations on codepoints should, like .lower(), 
use the identity operation when applied to non-codepoints.
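
CPython 3.x already behaves this way for at least one such case: a lone
surrogate has no case mapping, so .lower() passes it through unchanged:

    >>> 'ABC\ud800'.lower()     # the non-codepoint maps to itself
    'abc\ud800'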

>
>> It's true that common usage is that an array of code units that
>> usually conforms to UTF-16 may be called "UTF-16" without the closure
>> properties.  I just disagree with that usage, because there are two
>> camps that interpret "UTF-16" differently.  One side says, "we have an
>> array representation in UTF-16 that can handle all Unicode code points
>> efficiently, and if you think you need more, think again", while the
>> other says "it's too painful to have to check every result for valid
>> UTF-16, and we need a UTF-16 type that supports the usual array
>> operations on *characters* via the usual operators; if you think
>> otherwise, think again."
> I think we should just document how it behaves and not get hung up on
> what it is called. Mentioning UTF-16 is still useful because it
> indicates that some operations may act properly on surrogate pairs.
> (Also because of course character properties for BMP characters are
> respected, etc.)
>
>> Note that despite the (presumed) resolution of the UTF-16 issue for
>> CPython by PEP 393, at some point a very similar discussion will take
>> place over "characters" anyway, because users and app developers are
>> going to want a type that handles composition sequences and/or
>> grapheme clusters for them, as well as comparison that respects
>> canonical equivalence, even if it is inefficient compared to str.
>> That's why I insisted on use of "array of code points" to describe the
>> PEP 393 str type, rather than "array of characters".
> Let's call those things graphemes (Tom C's term, I quite like leaving
> "character" ambiguous) -- they are sequences of multiple code points
> that represent a single "visual squiggle" (the kind of thing that
> you'd want to be swappable in vim with "xp" :-). I agree that APIs are
> needed to manipulate (match, generate, validate, mutilate, etc.)
> things at the grapheme level. I don't agree that this means a separate
> data type is required. There are ever-larger units of information
> encoded in text strings, with ever farther-reaching (and more vague)
> requirements on valid sequences. Do you want to have a data type that
> can represent (only valid) words in a language? Sentences? Novels?

Interesting ideas.  Once you break the idea that every code point must 
be directly indexed, higher level concepts can be abstracted: 
appropriate codecs could produce a sequence of words, instead of 
characters.  Whether that is interesting depends on the purpose of the 
application.  I have been working a bit with ebook searching algorithms 
lately, and one idea is to extract from the text a list of words, and 
represent the words with codes.  Do the same for the search string.  
Then the search, instead of matching characters and character strings 
and skipping over punctuation, etc., can simply look for the 
appropriate sequence of word codes.  In this case, part of the 
usefulness of the abstraction is the elimination of punctuation, so it 
is more of an index to the character text rather than an encoding of 
it... but if the encoding of the text extracted words, the creation of 
the index would then be extremely simple.  I don't have applications in 
mind where representing sentences or novels would be particularly 
useful, but representing words could be extremely useful.  Valid words?  
Given a language (or languages) and dictionary (or dictionaries), words 
could be flagged as valid or invalid for that dictionary.  Representing 
invalid words could be similar to the idea of representing invalid 
UTF-8 bytes using the lone-surrogate error handler... possible when the 
application requests such.
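
A minimal sketch of that search, with hypothetical names, assuming
words are just \w+ runs and codes are assigned on first sight:

    import re

    def word_codes(text, table):
        """Map each word to a small integer; punctuation vanishes."""
        return [table.setdefault(w, len(table))
                for w in re.findall(r"\w+", text.lower())]

    table = {}
    book  = word_codes("The cat sat.  The cat ran!", table)
    query = word_codes("the cat", table)
    hits  = [i for i in range(len(book) - len(query) + 1)
             if book[i:i + len(query)] == query]
    assert hits == [0, 3]         # both occurrences of "the cat"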

> I think that at this point in time the best we can do is claim that
> Python (the language standard) uses either 16-bit code units or 21-bit
> code points in its string datatype, and that, thanks to PEP 393,
> CPython 3.3 and further will always use 21-bit code points (but Jython
> and IronPython may forever use their platform's native 16-bit code
> unit representing string type). And then we add APIs that can be used
> everywhere to look for code points (even if the string contains code
> units), graphemes, or larger constructs. I'd like those APIs to be
> designed using a garbage-in-garbage-out principle, where if the input
> conforms to some Unicode requirement, the output does too, but if the
> input doesn't, the output does what makes most sense. Validation is
> then limited to codecs, and optional calls.

So limiting the code point values to 21 bits (wasting 11 bits) only 
serves to prevent applications from using those 11 bits when they have 
extra-Unicode values to represent.  There is no shortage of 32-bit 
datatypes to draw from, but it seems an unnecessary constraint if exact 
conformance to Unicode is not provided... conforming codecs wouldn't 
create such values on input nor accept them on output, so the 
constraint only serves to restrict applications from using all 32 bits 
of the underlying storage.

> If you index or slice a string, or create a string from chr() of a
> surrogate or from some other value that the Unicode standard considers
> an illegal code point, you better know what you are doing. I want
> chr(i) to be valid for all values of i in range(2**21), so it can be
> used to create a lone surrogate, or (on systems with 16-bit
> "characters") a surrogate pair. And also ord(chr(i)) == i for all i in
> range(2**21). I'm not sure about ord() on a 2-character string
> containing a surrogate pair on systems where strings contain 21-bit
> code points; I think it should be an error there, just as ord() on
> other strings of length != 1. But on systems with 16-bit "characters",
> ord() of strings of length 2 containing a valid surrogate pair should
> work.
>

Yep.  So str != Unicode.  You keep saying that :)  And others point out 
how some applications would benefit from encapsulating the complexities 
of Unicode semantics at various higher levels of abstraction.  Sure, it 
can be tacked on, by adding complex access methods to a subtype of str, 
but that loses O(1) indexing of those higher abstractions when that 
route is chosen.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110831/da400a6a/attachment-0001.html>

From cesare.di.mauro at gmail.com  Wed Aug 31 22:10:40 2011
From: cesare.di.mauro at gmail.com (Cesare Di Mauro)
Date: Wed, 31 Aug 2011 22:10:40 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CA+j1x0mjz8ONKJgWs3T2bL88kmHN=ja617WNUNVimvKeCe=agQ@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de>
	<CA+j1x0neePU6J_YycTKnd0crzKjdy-mPJ7=pxGerphiVj5z9dw@mail.gmail.com>
	<20110830193806.0d718a56@pitrou.net>
	<CA+j1x0n8iJhmvGC82-hhA9zGT_Y58H_DXaqoG-6jQSsVDR=BQQ@mail.gmail.com>
	<CAP7+vJ+fpR2U-Co9NssmL3a53J4JbFM2pLrtRr84Arb_W5y-Dw@mail.gmail.com>
	<CA+j1x0mBKUwj41CgCgsw55edwRvm-+nuZiJUywmQrRkQyfqzAg@mail.gmail.com>
	<CAP7v7k5Re9v7ZC5Ti9fj7UuoLJ+MsYP5VJOeRP-ZSRfuaaTthw@mail.gmail.com>
	<CA+j1x0mjz8ONKJgWs3T2bL88kmHN=ja617WNUNVimvKeCe=agQ@mail.gmail.com>
Message-ID: <CAP7v7k7d0R6pdEBTnxfsJ32dgxfTJvx=qJFK=NWEKOViyqYtqw@mail.gmail.com>

2011/8/31 stefan brunthaler <stefan at brunthaler.net>

> > I think that you must deal with big endianness because some RISC
> > CPUs can't handle data in little-endian format at all.
> >
> > In WPython I have written some macros which handle both endiannesses,
> > but lacking big-endian machines I never had the opportunity to verify
> > whether something was wrong.
> >
> I am sorry for the temporal lapse of not getting back to this directly
> yesterday; we were just heading out for lunch and I only figured it
> out then, but immediately forgot it on our way back to the lab...
>
> So, as I have already said, I evaluated my optimizations on x86
> (little-endian) and PowerPC 970 (big-endian) and I did not have to
> change any of my instruction decoding during interpretation. (The only
> nasty bug I still remember vividly was that while on gcc for x86 the
> data type char defaults to signed, whereas it defaults to unsigned on
> PowerPC's gcc.) When I have time and access to a PowerPC machine again
> (an ARM might be interesting, too), I will take a look at the
> generated assembly code to figure out why this is working. (I have
> some ideas why it might work without changing the code.)
>
> If I run into any problems, I'll gladly contact you :)
>
> BTW: AFAIR, we emailed last year regarding wpython and IIRC your
> optimizations could primarily be summarized as clever
> superinstructions. I have not implemented anything in that area at all
> (and have in fact not even touched the compiler and its peephole
> optimizer), but if parts of my implementation get in, I am sure that you
> could add some of your work on top of that, too.
>
>  Cheers,
> --stefan
>

You're right. I took a look at our old e-mails, and I found more details
about your work. It's definitely not affected by processor endianness, so you
don't need any check: it just works, because you'll produce the new opcodes
in memory, and consume them in memory as well.
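
A small illustration of why, using the stdlib struct module: wordcodes
packed and unpacked with the native byte order round-trip on any CPU;
only a stream serialized on one machine and read on another needs an
explicit order:

    import struct

    op = 0x012C                                # a 16-bit wordcode
    buf = struct.pack("=H", op)                # native byte order
    assert struct.unpack("=H", buf)[0] == op   # same machine: always fine

    assert struct.unpack("<H", struct.pack("<H", op))[0] == op  # explicit LE
    assert struct.unpack(">H", struct.pack(">H", op))[0] == op  # explicit BE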

Looking at your examples, I think that WPython wordcode usage can be useful
only for the simplest ones. That's because superinstructions group
together several actions that need to be split again into simpler ones by a
tracing JIT/compiler like yours, if you want to keep it simple. You said that
you added about 400 specialized instructions last year with the usual
bytecodes, but wordcodes will require quite a few more (and that can
compromise performance on CPUs with small data caches).

So I think that it'll be better to finish your work, with all tests passing,
before thinking about adding something on top (that, for me, sounds like a
machine code JIT O:-)

Regards,
Cesare
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110831/f8ca4ce0/attachment.html>

From cesare.di.mauro at gmail.com  Wed Aug 31 22:18:08 2011
From: cesare.di.mauro at gmail.com (Cesare Di Mauro)
Date: Wed, 31 Aug 2011 22:18:08 +0200
Subject: [Python-Dev] Python 3 optimizations continued...
In-Reply-To: <CAP7+vJKN02MDGboQDHeeDL2Tusd+yOM5JBx+D1s3Nbkucu9K-w@mail.gmail.com>
References: <CA+j1x0kpXk7B7SB=QVRF=TGSP9b3FPrwom37eGEeZoUS2XKz_A@mail.gmail.com>
	<CAPZV6o9wMj6GuK57ZGrmAj2oEJ+KQXFTELTfd+97uB2GU_JVRg@mail.gmail.com>
	<CA+j1x0nZSTdg_d1K9rov_ikRK05qewSA1y6dzp2wUdgF7wQekg@mail.gmail.com>
	<20110829231420.20c3516a@pitrou.net>
	<CADiSq7cePgwhZgvF0hvJTEofR5By5-0H-O2HD-W1xEGObHeoOA@mail.gmail.com>
	<20110830025510.638b41d9@pitrou.net>
	<CAP7v7k62ORPzPFcLOsgESw6Tjjc1A5yTOE+heZP71feC_OB3Hg@mail.gmail.com>
	<CAP7+vJKN02MDGboQDHeeDL2Tusd+yOM5JBx+D1s3Nbkucu9K-w@mail.gmail.com>
Message-ID: <CAP7v7k5brM3JOVhEzfTT0xZn9Fn5e36v3SwCj4Lx9Gsqqs5R+w@mail.gmail.com>

2011/8/31 Guido van Rossum <guido at python.org>

> On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro
> <cesare.di.mauro at gmail.com> wrote:
> > It isn't, because motivation to do something new with CPython
> > vanishes, at least in some areas (virtual machine / ceval.c), even
> > having some ideas to experiment with. That's why in my last talk at
> > EuroPython I decided to move on to other areas (Python objects).
>
> Cesare, I'm really sorry that you became so disillusioned that you
> abandoned wordcode. I agree that we were too optimistic about Unladen
> Swallow. Also that the existence of PyPy and its PR machine (:-)
> should not stop us from improving CPython.
>

I never stopped thinking about new optimizations. A lot can be done on
CPython, even without resorting to something like a JIT et al.

>
> I'm wondering if, with your experience in creating WPython, you could
> review Stefan Brunthaler's code and approach (once he's put it up for
> review) and possibly the two of you could even work on a joint
> project?
>
> --
> --Guido van Rossum (python.org/~guido)
>


Yes, I can. I'll wait for Stefan to update his source (reaching at least
Python 3.2) as he has intended to do, and for everything to be published,
in order to review the code.

I also agree with you that right now it doesn't need to look
state-of-the-art. First make it work, then make it nicer. ;)

Regards,
Cesare
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20110831/cdc5fa6b/attachment.html>