From ncoghlan at gmail.com  Thu Dec  1 00:39:52 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 1 Dec 2011 09:39:52 +1000
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
In-Reply-To: <CALeMXf4fi=wSQtjG4RjBOXgg1AG_t2LSPVej0RgTfiDfx8s9yQ@mail.gmail.com>
References: <4E43E9A6.7020608@netwok.org>
	<20110811183114.701DF3A406B@sparrow.telecommunity.com>
	<4ED1196E.8090505@netwok.org>
	<CALeMXf4fi=wSQtjG4RjBOXgg1AG_t2LSPVej0RgTfiDfx8s9yQ@mail.gmail.com>
Message-ID: <CADiSq7emFxO5WNJjenfNVyNSy4BZ=6WN1M52RTe94jfCfUoxEA@mail.gmail.com>

On Thu, Dec 1, 2011 at 1:28 AM, PJ Eby <pje at telecommunity.com> wrote:
> It doesn't help at all that I'm not really in a position to provide an
> implementation, and the persons most likely to implement have been leaning
> somewhat towards 382, or wanting to modify 402 such that it uses .pyp
> directory extensions so that PEP 395 can be supported...

While I was initially a fan of the possibilities of PEP 402, I
eventually decided that we would be trading an easy problem ("you need
an '__init__.py' marker file or a '.pyp' extension to get Python to
recognise your package directory") for a hard one ("What's your
sys.path look like? What did you mean for it to look like?"). Symlinks
(and the fact we implicitly call realpath() during system
initialisation and import) just make things even messier.
*Deliberately* allowing package structures on the filesystem to become
ambiguous is a recipe for future pain (and could potentially undo a
lot of the good work done by PEP 328's elimination of implicit
relative imports).

I acknowledge there is a lot of confusion amongst novices as to how
packages and imports actually work, but my diagnosis of the root cause
of that problem is completely different from that supposed by PEP 402
(as documented in the more recent versions of PEP 395, I've come to
believe it is due to the way we stuff up the default sys.path[0]
initialisation when packages are involved).
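
A minimal sketch of the sys.path[0] behaviour referred to above, assuming
a throwaway package called demo_pkg with a script tool.py (both names are
hypothetical, chosen only for illustration). Running the file directly
puts the package directory itself first on sys.path, while running it
with -m puts the directory that contains the package there:

```python
import os
import subprocess
import sys
import tempfile

# Body of the script placed inside the package: print the last component
# of sys.path[0] (realpath also maps an empty entry to the current directory).
SCRIPT = "import os, sys\nprint(os.path.basename(os.path.realpath(sys.path[0])))\n"


def show_sys_path0():
    """Return (direct, as_module, root_name) for a temporary demo_pkg."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "demo_pkg")  # hypothetical package name
        os.mkdir(pkg)
        open(os.path.join(pkg, "__init__.py"), "w").close()
        script = os.path.join(pkg, "tool.py")
        with open(script, "w") as f:
            f.write(SCRIPT)

        def run(args):
            out = subprocess.check_output([sys.executable] + args, cwd=root)
            return out.decode().strip()

        direct = run([script])                    # python demo_pkg/tool.py
        as_module = run(["-m", "demo_pkg.tool"])  # python -m demo_pkg.tool
        root_name = os.path.basename(os.path.realpath(root))
        return direct, as_module, root_name


direct, as_module, root_name = show_sys_path0()
print("run as a file: sys.path[0] ends in", direct)
print("run with -m:   sys.path[0] ends in", as_module)
```

In the first case a sibling module can be imported as a top-level module
rather than as part of the package, which is one plausible source of the
novice confusion described here.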

So, in the end, I've come to strongly prefer the PEP 382 approach. The
principle of "Explicit is better than implicit" applies to package
detection on the filesystem just as much as it does to any other kind
of API design, and it really isn't that different from the way we
treat actual Python files (i.e. you can *execute* arbitrary files, but
they need to have an appropriate extension if you want to import
them).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From anacrolix at gmail.com  Thu Dec  1 01:46:47 2011
From: anacrolix at gmail.com (Matt Joiner)
Date: Thu, 1 Dec 2011 11:46:47 +1100
Subject: [Python-Dev] STM and python
In-Reply-To: <CAPZV6o_Ud9-23NhOsP9sNy24LUXPRA05Ki2g3KwduZt8dNMwOg@mail.gmail.com>
References: <CAB4yi1ORAJhvASE_s9g3nJ69Be1sEwOGdRfYE62tQy1t7VMphw@mail.gmail.com>
	<CAPZV6o_Ud9-23NhOsP9sNy24LUXPRA05Ki2g3KwduZt8dNMwOg@mail.gmail.com>
Message-ID: <CAB4yi1MMWVjsHoh0yLtVsQKZAiMRq_pqR9ykGSya=27+SFd60g@mail.gmail.com>

I did see this; I'm not convinced it's relevant only to PyPy.

On Thu, Dec 1, 2011 at 2:25 AM, Benjamin Peterson <benjamin at python.org> wrote:
> 2011/11/30 Matt Joiner <anacrolix at gmail.com>:
>> Given GCC's announcement that Intel's STM will be an extension for C
>> and C++ in GCC 4.7, what does this mean for Python, and the GIL?
>>
>> I've seen efforts made to make STM available as a context, and for use
>> in user code. I've also read about the "old attempts way back" that
>> attempted to use finer-grained locking. They understandably failed due to
>> the heavy costs involved in both the locking mechanisms used, and the
>> overhead of a reference counting garbage collection system.
>>
>> However given advances in locking and garbage collection in the last
>> decade, what attempts have been made recently to try these new ideas
>> out? In particular, how unlikely is it that all the thread safe
>> primitives, global contexts, and reference counting functions be made
>> __transaction_atomic, and magical parallelism performance boosts
>> ensue?
>
> Have you seen http://morepypy.blogspot.com/2011/08/we-need-software-transactional-memory.html
> ?
>
>
> --
> Regards,
> Benjamin

From solipsis at pitrou.net  Thu Dec  1 01:50:12 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 1 Dec 2011 01:50:12 +0100
Subject: [Python-Dev] STM and python
References: <CAB4yi1ORAJhvASE_s9g3nJ69Be1sEwOGdRfYE62tQy1t7VMphw@mail.gmail.com>
Message-ID: <20111201015012.3a6f1ca2@pitrou.net>

On Thu, 1 Dec 2011 01:31:14 +1100
Matt Joiner <anacrolix at gmail.com> wrote:
> 
> However given advances in locking and garbage collection in the last
> decade, what attempts have been made recently to try these new ideas
> out? In particular, how unlikely is it that all the thread safe
> primitives, global contexts, and reference counting functions be made
> __transaction_atomic, and magical parallelism performance boosts
> ensue?

IMHO, it sounds a bit too magical to be true.

> I'm aware that C89, platforms without STM/GCC, and single threaded
> performance are concerns. Please ignore these for the sake of
> discussion about possibilities.
> 
> http://gcc.gnu.org/wiki/TransactionalMemory

I find it interesting that the only example of hardware transactional
memory mentioned in this page is a Sun CPU project which has been
cancelled. Does Intel have anything similar in the works?

Regards

Antoine.



From greg at krypto.org  Thu Dec  1 01:58:29 2011
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 30 Nov 2011 16:58:29 -0800
Subject: [Python-Dev] STM and python
In-Reply-To: <CAB4yi1MMWVjsHoh0yLtVsQKZAiMRq_pqR9ykGSya=27+SFd60g@mail.gmail.com>
References: <CAB4yi1ORAJhvASE_s9g3nJ69Be1sEwOGdRfYE62tQy1t7VMphw@mail.gmail.com>
	<CAPZV6o_Ud9-23NhOsP9sNy24LUXPRA05Ki2g3KwduZt8dNMwOg@mail.gmail.com>
	<CAB4yi1MMWVjsHoh0yLtVsQKZAiMRq_pqR9ykGSya=27+SFd60g@mail.gmail.com>
Message-ID: <CAGE7PNKs4po2pYFGsw61JoAJaQ1g=YH_5wqR7VgRMaacn1BTuA@mail.gmail.com>

Azul has been using hardware transactional memory on their custom CPUs (and
likely STM in their current x86 virtual machine based products) to great
effect for their massively parallel Java VM (700+ cpu cores and gobs of
ram) for over 4 years.  I'll leave it to the reader to do the relevant
searching to read more on that.

My point is: This is up to any given Python VM implementation to take
advantage of or not as it sees fit.  Shoe horning it into an existing VM
may not make much sense but anyone is welcome to try.

-gps

From ncoghlan at gmail.com  Thu Dec  1 06:41:35 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 1 Dec 2011 15:41:35 +1000
Subject: [Python-Dev] STM and python
In-Reply-To: <CAGE7PNKs4po2pYFGsw61JoAJaQ1g=YH_5wqR7VgRMaacn1BTuA@mail.gmail.com>
References: <CAB4yi1ORAJhvASE_s9g3nJ69Be1sEwOGdRfYE62tQy1t7VMphw@mail.gmail.com>
	<CAPZV6o_Ud9-23NhOsP9sNy24LUXPRA05Ki2g3KwduZt8dNMwOg@mail.gmail.com>
	<CAB4yi1MMWVjsHoh0yLtVsQKZAiMRq_pqR9ykGSya=27+SFd60g@mail.gmail.com>
	<CAGE7PNKs4po2pYFGsw61JoAJaQ1g=YH_5wqR7VgRMaacn1BTuA@mail.gmail.com>
Message-ID: <CADiSq7fiHBo43mChODT2cr7ydV9FB3APBW-x=cHvS95Nod3qLQ@mail.gmail.com>

On Thu, Dec 1, 2011 at 10:58 AM, Gregory P. Smith <greg at krypto.org> wrote:
> Azul has been using hardware transactional memory on their custom CPUs (and
> likely STM in their current x86 virtual machine based products) to great
> effect for their massively parallel Java VM (700+ cpu cores and gobs of ram)
> for over 4 years.  I'll leave it to the reader to do the relevant searching
> to read more on that.
>
> My point is: This is up to any given Python VM implementation to take
> advantage of or not as it sees fit.  Shoe horning it into an existing VM may
> not make much sense but anyone is welcome to try.

There's a patch somewhere on the tracker to add an "Armin Rigo hook"
to the CPython eval loop so he can play with STM in Python as well (at
least, I think it was STM he wanted it for - it might have been
something else).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From anacrolix at gmail.com  Thu Dec  1 07:06:43 2011
From: anacrolix at gmail.com (Matt Joiner)
Date: Thu, 1 Dec 2011 17:06:43 +1100
Subject: [Python-Dev] STM and python
In-Reply-To: <CADiSq7fiHBo43mChODT2cr7ydV9FB3APBW-x=cHvS95Nod3qLQ@mail.gmail.com>
References: <CAB4yi1ORAJhvASE_s9g3nJ69Be1sEwOGdRfYE62tQy1t7VMphw@mail.gmail.com>
	<CAPZV6o_Ud9-23NhOsP9sNy24LUXPRA05Ki2g3KwduZt8dNMwOg@mail.gmail.com>
	<CAB4yi1MMWVjsHoh0yLtVsQKZAiMRq_pqR9ykGSya=27+SFd60g@mail.gmail.com>
	<CAGE7PNKs4po2pYFGsw61JoAJaQ1g=YH_5wqR7VgRMaacn1BTuA@mail.gmail.com>
	<CADiSq7fiHBo43mChODT2cr7ydV9FB3APBW-x=cHvS95Nod3qLQ@mail.gmail.com>
Message-ID: <CAB4yi1MvCu258S9zTHGRdAV3RbXVWeiCO-3vYpuZNfYBA=z4Hw@mail.gmail.com>

I saw this, I believe it just exposes an STM primitive to user code.
It doesn't make use of STM for Python internals.

Explicit STM doesn't seem particularly useful for a language that
doesn't expose raw memory in its normal usage.

On Thu, Dec 1, 2011 at 4:41 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Thu, Dec 1, 2011 at 10:58 AM, Gregory P. Smith <greg at krypto.org> wrote:
>> Azul has been using hardware transactional memory on their custom CPUs (and
>> likely STM in their current x86 virtual machine based products) to great
>> effect for their massively parallel Java VM (700+ cpu cores and gobs of ram)
>> for over 4 years.  I'll leave it to the reader to do the relevant searching
>> to read more on that.
>>
>> My point is: This is up to any given Python VM implementation to take
>> advantage of or not as it sees fit.  Shoe horning it into an existing VM may
>> not make much sense but anyone is welcome to try.
>
> There's a patch somewhere on the tracker to add an "Armin Rigo hook"
> to the CPython eval loop so he can play with STM in Python as well (at
> least, I think it was STM he wanted it for - it might have been
> something else).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From raymond.hettinger at gmail.com  Thu Dec  1 07:10:12 2011
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Wed, 30 Nov 2011 22:10:12 -0800
Subject: [Python-Dev] Warnings
Message-ID: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>

When updating the documentation, please don't go overboard with warnings.
The docs need to be worded affirmatively -- say what a tool does and show how to use it correctly.
See http://docs.python.org/documenting/style.html#affirmative-tone

The docs for the subprocess module currently have SEVEN warning boxes on one page:
http://docs.python.org/library/subprocess.html#module-subprocess
The implicit message is that our tools are hazardous and should be avoided.

Please show some restraint and aim for clean looking, high-quality technical writing without the FUD.

Look at the SQLite3 docs for an example of good writing.  The prevention of SQL injection attacks is discussed briefly and effectively without big red boxes littering the page.
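
As a brief illustration of the practice those docs describe, here is a
minimal sketch using the standard sqlite3 module. The users table and the
find_user() helper are assumptions made up for this example; the point is
only that the ? placeholder passes the value as data instead of splicing
it into the SQL text:

```python
import sqlite3


def find_user(conn, name):
    # The "?" placeholder makes sqlite3 bind the value as data, so quote
    # characters in `name` cannot change the structure of the query.
    cursor = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")  # hypothetical table
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# Input that would match every row if it were pasted into the SQL string.
hostile = "alice' OR '1'='1"

print(find_user(conn, "alice"))
print(find_user(conn, hostile))
```

With the placeholder, the hostile string is compared literally against
the name column and matches nothing.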


Raymond

From glyph at twistedmatrix.com  Thu Dec  1 08:02:25 2011
From: glyph at twistedmatrix.com (Glyph)
Date: Thu, 1 Dec 2011 02:02:25 -0500
Subject: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
In-Reply-To: <CADiSq7emFxO5WNJjenfNVyNSy4BZ=6WN1M52RTe94jfCfUoxEA@mail.gmail.com>
References: <4E43E9A6.7020608@netwok.org>
	<20110811183114.701DF3A406B@sparrow.telecommunity.com>
	<4ED1196E.8090505@netwok.org>
	<CALeMXf4fi=wSQtjG4RjBOXgg1AG_t2LSPVej0RgTfiDfx8s9yQ@mail.gmail.com>
	<CADiSq7emFxO5WNJjenfNVyNSy4BZ=6WN1M52RTe94jfCfUoxEA@mail.gmail.com>
Message-ID: <B4B27DBF-D737-4C54-A81C-610FA3B48BAA@twistedmatrix.com>


On Nov 30, 2011, at 6:39 PM, Nick Coghlan wrote:

> On Thu, Dec 1, 2011 at 1:28 AM, PJ Eby <pje at telecommunity.com> wrote:
>> It doesn't help at all that I'm not really in a position to provide an
>> implementation, and the persons most likely to implement have been leaning
>> somewhat towards 382, or wanting to modify 402 such that it uses .pyp
>> directory extensions so that PEP 395 can be supported...
> 
> While I was initially a fan of the possibilities of PEP 402, I
> eventually decided that we would be trading an easy problem ("you need
> an '__init__.py' marker file or a '.pyp' extension to get Python to
> recognise your package directory") for a hard one ("What's your
> sys.path look like? What did you mean for it to look like?"). Symlinks
> (and the fact we implicitly call realpath() during system
> initialisation and import) just make things even messier.
> *Deliberately* allowing package structures on the filesystem to become
> ambiguous is a recipe for future pain (and could potentially undo a
> lot of the good work done by PEP 328's elimination of implicit
> relative imports).
> 
> I acknowledge there is a lot of confusion amongst novices as to how
> packages and imports actually work, but my diagnosis of the root cause
> of that problem is completely different from that supposed by PEP 402
> (as documented in the more recent versions of PEP 395, I've come to
> believe it is due to the way we stuff up the default sys.path[0]
> initialisation when packages are involved).
> 
> So, in the end, I've come to strongly prefer the PEP 382 approach. The
> principle of "Explicit is better than implicit" applies to package
> detection on the filesystem just as much as it does to any other kind
> of API design, and it really isn't that different from the way we
> treat actual Python files (i.e. you can *execute* arbitrary files, but
> they need to have an appropriate extension if you want to import
> them).

I've helped an almost distressing number of newbies overcome their confusion about sys.path and packages.  Systems using Twisted are, almost by definition, hairy integration problems, and are frequently being created or maintained by people with little to no previous Python experience.

Given that experience, I completely agree with everything you've written above (except for the part where you initially liked it).  I appreciate the insight that PEP 402 offers about Python's package mechanism (and the difficulties introduced by namespace packages).  Its statement of the problem is good, but in my opinion its solution points in exactly the wrong direction: packages need to be _more_ explicit about their package-ness and tools need to be stricter about how they're laid out.  It would be great if sys.path[0] were actually correct when running a script inside a package, or if Python at least issued a warning which would explain how to correctly lay out said package.  I would love to see a loud alarm every time a module accidentally got imported by the same name twice.  I wish I knew, once and for all, whether it was 'import Image' or 'from PIL import Image'.
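
One possible reading of that "loud alarm", sketched under the assumption
that the problem is a single source file being loaded under two
different module names. find_double_imports() is a hypothetical helper,
not an existing Python feature, and the module table below is simulated
so that the example does not depend on what happens to be installed:

```python
import os
import sys
import types
from collections import defaultdict


def find_double_imports(modules=None):
    """Map each source file to the module names loaded from it, keeping
    only files that appear under more than one name."""
    modules = sys.modules if modules is None else modules
    names_by_file = defaultdict(list)
    for name, module in list(modules.items()):
        path = getattr(module, "__file__", None)
        if path:
            names_by_file[os.path.realpath(path)].append(name)
    return {path: sorted(names)
            for path, names in names_by_file.items() if len(names) > 1}


# Simulated module table: the same (hypothetical) file reachable both as
# "pkg.mod" and, via a misconfigured sys.path, as plain "mod".
fake = {}
for name in ("pkg.mod", "mod"):
    module = types.ModuleType(name)
    module.__file__ = os.path.join("site", "pkg", "mod.py")
    fake[name] = module
other = types.ModuleType("other")
other.__file__ = os.path.join("site", "other.py")
fake["other"] = other

duplicates = find_double_imports(fake)
print(list(duplicates.values()))
```

Called with no argument, the helper inspects the real sys.modules, so it
could be run at the end of a test suite as a crude check.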

My hope is that if Python starts to tighten these things up a bit, or at least communicate better about best practices, editors and IDEs will develop better automatic discovery features and frameworks will start to normalize their sys.path setups and stop depending on accidents of current directory and script location.  This will in turn vastly decrease confusion among new Python developers taking on large projects with a bunch of libraries, who mostly don't care what the rules for where files are supposed to go are, and just want to put them somewhere that works.

-glyph

From glyph at twistedmatrix.com  Thu Dec  1 08:15:01 2011
From: glyph at twistedmatrix.com (Glyph)
Date: Thu, 1 Dec 2011 02:15:01 -0500
Subject: [Python-Dev] Warnings
In-Reply-To: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
Message-ID: <22C86443-2C02-4D0A-A62A-A1CD75F87D08@twistedmatrix.com>


On Dec 1, 2011, at 1:10 AM, Raymond Hettinger wrote:

> When updating the documentation, please don't go overboard with warnings.
> The docs need to be worded affirmatively -- say what a tool does and show how to use it correctly.
> See http://docs.python.org/documenting/style.html#affirmative-tone
> 
> The docs for the subprocess module currently have SEVEN warning boxes on one page:
> http://docs.python.org/library/subprocess.html#module-subprocess
> The implicit message is that our tools are hazardous and should be avoided.
> 
> Please show some restraint and aim for clean looking, high-quality technical writing without the FUD.
> 
> Look at the SQLite3 docs for an example of good writing.  The prevention of SQL injection attacks is discussed briefly and effectively without big red boxes littering the page.

I'm not convinced this is actually a great example of how to outline pitfalls clearly; it doesn't say what an SQL injection attack is, or what the consequences might be.

Also, it's not the best example of a positive tone.  The narrative is:

You probably want to do X.
Don't do Y, because it will make you vulnerable to a Q attack.
Instead, do Z.
Here's an example of Y.  Don't do it!
Okay, finally, here's an example of Z.

It would be better to say "You probably want to do X.  Here's how you do X, with Z.  Here's an example of Z."  Then, later, discuss why some people want to do Y, and why you should avoid that impulse.

However, what 'subprocess' is doing clearly isn't an improvement: it's not an effective introduction to secure process execution, just a reference document punctuated with ambiguous anxiety.  sqlite3 is at least somewhat specific :).

I think both of these documents point to a need for a recommended idiom for discussing security, or at least common antipatterns, within the Python documentation.  I like the IETF's "security considerations" section, because it separates things off into a section that can be referred to later, once the developer has had an opportunity to grasp the basics.  Any section with security implications can easily say "please refer to the 'security considerations' section for important information on how to avoid common mistakes" without turning into a big security digression on its own.

-glyph


From ncoghlan at gmail.com  Thu Dec  1 08:32:36 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 1 Dec 2011 17:32:36 +1000
Subject: [Python-Dev] Warnings
In-Reply-To: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
Message-ID: <CADiSq7cwyPZHQHSc3esQ01RE_6_jjdFvndo1LFBC3cd9Chn1QQ@mail.gmail.com>

On Thu, Dec 1, 2011 at 4:10 PM, Raymond Hettinger
<raymond.hettinger at gmail.com> wrote:
> When updating the documentation, please don't go overboard with warnings.
> The docs need to be worded affirmatively -- say what a tool does and show
> how to use it correctly.
> See http://docs.python.org/documenting/style.html#affirmative-tone
>
> The docs for the subprocess module currently have SEVEN warning boxes on one
> page:
> http://docs.python.org/library/subprocess.html#module-subprocess
> The implicit message is that our tools are hazardous and should be avoided.

I have no problem with eliminating a lot of those specific warnings -
I kept them there in the last rewrite (and added a couple of new ones)
because avoiding shell injection vulnerabilities is such a driving
theme behind the subprocess module design. Since I was already
changing a lot of other things, messing with that aspect really wasn't
high on my priority list.

Now that we have the "frequently used arguments" section, though, the
rest of the warnings could fairly readily be downgraded to notes or
inline references to that section.

> Please show some restraint and aim for clean looking, high-quality technical
> writing without the FUD.

I do object to you calling genuine attempts to educate programmers
about security issues FUD, though. It's not FUD - novice programmers
inflict shell injection, script injection and SQL injection
vulnerabilities on the world every day. The multiple warnings are
there in the subprocess docs because people often only look at the
documentation for the specific function they're interested in, not at
the broader context of the page it is part of.
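
For concreteness, a minimal sketch of the kind of usage those warnings
steer people towards: passing the command as an argument list, so that no
shell ever parses untrusted input. The echo_argument() helper is an
assumption invented for this example, and the child process is just the
current Python interpreter so that the sketch runs anywhere:

```python
import subprocess
import sys


def echo_argument(user_input):
    """Run a child process that prints its first argument back."""
    # The command is a list and shell=True is not used, so user_input
    # reaches the child as one argument and is never parsed by a shell.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input]
    return subprocess.check_output(cmd).decode().strip()


# Text that a shell would treat as two commands is passed through as data.
result = echo_argument("hello; echo injected")
print(result)
```

Building the same command as a single string and running it with
shell=True would hand the semicolon to the shell, which is the
vulnerability being warned about.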

"Overkill" is a legitimate complaint, but calling attempts to
highlight genuinely insecure practices FUD is the kind of attitude
that has given the world so many years of persistent vulnerability to
buffer overflow attacks :P

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu Dec  1 08:36:37 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 1 Dec 2011 17:36:37 +1000
Subject: [Python-Dev] Warnings
In-Reply-To: <22C86443-2C02-4D0A-A62A-A1CD75F87D08@twistedmatrix.com>
References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
	<22C86443-2C02-4D0A-A62A-A1CD75F87D08@twistedmatrix.com>
Message-ID: <CADiSq7eVZJLQ+cShRC=WQE7cD3vi6muPFuxqCf-QQoKbmW9+PQ@mail.gmail.com>

On Thu, Dec 1, 2011 at 5:15 PM, Glyph <glyph at twistedmatrix.com> wrote:
> I think both of these documents point to a need for a recommended idiom for
> discussing security, or at least common antipatterns, within the Python
> documentation.  I like the IETF's "security considerations" section, because
> it separates things off into a section that can be referred to later, once
> the developer has had an opportunity to grasp the basics.  Any section with
> security implications can easily say "please refer to the 'security
> considerations' section for important information on how to avoid common
> mistakes" without turning into a big security digression on its own.

I like that approach - one of the problems with online docs is the
fact people don't read them in order, hence the proliferation of
warnings for the subprocess module. A clear "Security Considerations"
section with appropriate cross links would allow us to be clear and
explicit about common problems without littering the docs with red
warning boxes for security issues that are inherent in a particular
task rather than being a Python-specific problem.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu Dec  1 08:55:19 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 1 Dec 2011 17:55:19 +1000
Subject: [Python-Dev] Warnings
In-Reply-To: <CADiSq7eVZJLQ+cShRC=WQE7cD3vi6muPFuxqCf-QQoKbmW9+PQ@mail.gmail.com>
References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
	<22C86443-2C02-4D0A-A62A-A1CD75F87D08@twistedmatrix.com>
	<CADiSq7eVZJLQ+cShRC=WQE7cD3vi6muPFuxqCf-QQoKbmW9+PQ@mail.gmail.com>
Message-ID: <CADiSq7fnanrSYMr0J7Oi_HdzHi4P+a1ETSfPjr-TpqpUVJ4dVQ@mail.gmail.com>

On Thu, Dec 1, 2011 at 5:36 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Thu, Dec 1, 2011 at 5:15 PM, Glyph <glyph at twistedmatrix.com> wrote:
>> I think both of these documents point to a need for a recommended idiom for
>> discussing security, or at least common antipatterns, within the Python
>> documentation.  I like the IETF's "security considerations" section, because
>> it separates things off into a section that can be referred to later, once
>> the developer has had an opportunity to grasp the basics.  Any section with
>> security implications can easily say "please refer to the 'security
>> considerations' section for important information on how to avoid common
>> mistakes" without turning into a big security digression on its own.
>
> I like that approach - one of the problems with online docs is the
> fact people don't read them in order, hence the proliferation of
> warnings for the subprocess module. A clear "Security Considerations"
> section with appropriate cross links would allow us to be clear and
> explicit about common problems without littering the docs with red
> warning boxes for security issues that are inherent in a particular
> task rather than being a Python-specific problem.

I created http://bugs.python.org/issue13515 to propose a specific
documentation style guide addition along these lines (expanded a bit to
cover other cross-cutting concerns like the pipe buffer blocking I/O
problem in subprocess).
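
A minimal sketch of the pipe buffer issue, assuming a child that writes
more output than an OS pipe buffer typically holds. run_and_capture() is
a hypothetical helper; the child is the current Python interpreter.
Calling wait() before reading could leave such a child blocked on a full
pipe, whereas communicate() keeps reading until the child exits:

```python
import subprocess
import sys


def run_and_capture(n_bytes):
    """Start a child that writes n_bytes to stdout; return (length, exit code)."""
    code = "import sys; sys.stdout.write('x' * %d)" % n_bytes
    child = subprocess.Popen([sys.executable, "-c", code],
                             stdout=subprocess.PIPE)
    # communicate() drains the pipe while waiting for the child, so the
    # child is never left blocked on a full pipe buffer.
    out, _ = child.communicate()
    return len(out), child.returncode


size, status = run_and_capture(1000000)
print(size, status)
```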

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From arigo at tunes.org  Thu Dec  1 12:01:44 2011
From: arigo at tunes.org (Armin Rigo)
Date: Thu, 1 Dec 2011 12:01:44 +0100
Subject: [Python-Dev] STM and python
In-Reply-To: <CAB4yi1MvCu258S9zTHGRdAV3RbXVWeiCO-3vYpuZNfYBA=z4Hw@mail.gmail.com>
References: <CAB4yi1ORAJhvASE_s9g3nJ69Be1sEwOGdRfYE62tQy1t7VMphw@mail.gmail.com>
	<CAPZV6o_Ud9-23NhOsP9sNy24LUXPRA05Ki2g3KwduZt8dNMwOg@mail.gmail.com>
	<CAB4yi1MMWVjsHoh0yLtVsQKZAiMRq_pqR9ykGSya=27+SFd60g@mail.gmail.com>
	<CAGE7PNKs4po2pYFGsw61JoAJaQ1g=YH_5wqR7VgRMaacn1BTuA@mail.gmail.com>
	<CADiSq7fiHBo43mChODT2cr7ydV9FB3APBW-x=cHvS95Nod3qLQ@mail.gmail.com>
	<CAB4yi1MvCu258S9zTHGRdAV3RbXVWeiCO-3vYpuZNfYBA=z4Hw@mail.gmail.com>
Message-ID: <CAMSv6X2mntP6crK2L7v4_U-p+0+GwrBYdrLQ03Kcz7zZ36sKpA@mail.gmail.com>

Hi,

On Thu, Dec 1, 2011 at 07:06, Matt Joiner <anacrolix at gmail.com> wrote:
> I saw this, I believe it just exposes an STM primitive to user code.
> It doesn't make use of STM for Python internals.

That's correct.

> Explicit STM doesn't seem particularly useful for a language that
> doesn't expose raw memory in its normal usage.

In my opinion, that sentence could not be more wrong.

It is true that, as I discuss on the blog post cited a few times in
this thread, the first goal I see is to use STM to replace the GIL as
an internal way of keeping the state of the interpreter consistent.
This could quite possibly be achieved using the new GCC
__transaction_atomic keyword, although I see already annoying issues
(e.g. the keyword can only protect a _syntactically nested_ piece of
code as a transaction).

However there is another aspect: user-exposed STM, which I didn't
explore much.  While it is potentially even more important, it is a
language design question, so I'm happy to delegate it to python-dev.
In my opinion, explicit STM (like Clojure) is not only *a* way to
write multithreaded Python programs, but it seems to be *the only* way
that really makes sense in general, for more than small examples and
more than examples where other hacks are enough (see
http://en.wikipedia.org/wiki/Software_transactional_memory#Composable_operations
).  In other words, locks are low-level and should not be used in a
high-level language, just like direct memory accesses, simply because
they force the programmer to think about increasingly complicated
situations.
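
To make the composability point concrete, here is a sketch of what
user-level atomic blocks might look like. Everything in it is an
assumption: atomic() is a hypothetical construct, emulated with a single
re-entrant global lock, so it shows only the programming model and
provides neither parallelism nor rollback on failure:

```python
import threading

# Stand-in for a real STM runtime: one re-entrant global lock. This only
# emulates the programming model; it gives no parallelism and no rollback.
_global_lock = threading.RLock()


class atomic:
    """Hypothetical 'with atomic():' block; nested uses compose."""

    def __enter__(self):
        _global_lock.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        _global_lock.release()
        return False  # never swallow exceptions


class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        with atomic():
            if self.balance < amount:
                raise ValueError("insufficient funds")
            self.balance -= amount

    def deposit(self, amount):
        with atomic():
            self.balance += amount


def transfer(src, dst, amount):
    # Two smaller atomic operations combined into one larger atomic
    # operation, with no lock ordering for the caller to get right.
    with atomic():
        src.withdraw(amount)
        dst.deposit(amount)


a, b = Account(1000), Account(1000)


def worker(src, dst):
    for _ in range(100):
        transfer(src, dst, 1)


threads = [threading.Thread(target=worker, args=pair)
           for pair in [(a, b), (b, a), (a, b), (b, a)]]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.balance, b.balance)
```

With one lock per account instead, transfer() would have to acquire both
locks in a globally agreed order to avoid deadlock, which is the kind of
reasoning the paragraph above argues should not be left to the
programmer.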

And of course there is the background idea that TM might be available
in hardware someday.  My own guess is that it will occur, and I bet
that in 5 to 10 years all new Intel and AMD CPUs will have Hybrid TM.
On such hardware, the performance penalty mostly disappears (which is
also, I guess, the reasoning behind GCC 4.7, offering a future path to
use Hybrid TM).

If python-dev people are interested in exploring the language design
space in that direction, I would be most happy to look in more detail
at GCC 4.7.  If we manage to make use of it, then we could get a
version of CPython using STM internally with a very minimal patch.  If
it seems useful we can then turn that patch into #ifdefs in the
normal CPython.  It would of course be off by default because of the
performance hit; still, it would give an optional alternate
"CPythonSTM" to play with in order to come up with good user-level
abstractions.  (This is what I'm already trying to do with PyPy
without using GCC 4.7, and it's progressing nicely.)  (My existing
patch to CPython emulating user-level STM with the GIL is not really
satisfying, also for the reason that it cannot emulate some other
potentially useful user constructs, like abort_and_retry().)


À bientôt,

Armin.

From g.brandl at gmx.net  Thu Dec  1 22:24:54 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 01 Dec 2011 22:24:54 +0100
Subject: [Python-Dev] Warnings
In-Reply-To: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
Message-ID: <jb8rau$50t$1@dough.gmane.org>

Am 01.12.2011 07:10, schrieb Raymond Hettinger:
> When updating the documentation, please don't go overboard with warnings.
> The docs need to be worded affirmatively -- say what a tool does and show how to
> use it correctly.
> See http://docs.python.org/documenting/style.html#affirmative-tone
> 
> The docs for the subprocess module currently have SEVEN warning boxes on one page:
> http://docs.python.org/library/subprocess.html#module-subprocess
> The implicit message is that our tools are hazardous and should be avoided.
> 
> Please show some restraint and aim for clean looking, high-quality technical
> writing without the FUD.
> 
> Look at the SQLite3 docs for an example of good writing.  The prevention of SQL
> injection attacks is discussed briefly and effectively without big red boxes
> littering the page.

Obviously, +1.

Georg


From anacrolix at gmail.com  Fri Dec  2 06:32:59 2011
From: anacrolix at gmail.com (Matt Joiner)
Date: Fri, 2 Dec 2011 16:32:59 +1100
Subject: [Python-Dev] STM and python
In-Reply-To: <CAMSv6X2mntP6crK2L7v4_U-p+0+GwrBYdrLQ03Kcz7zZ36sKpA@mail.gmail.com>
References: <CAB4yi1ORAJhvASE_s9g3nJ69Be1sEwOGdRfYE62tQy1t7VMphw@mail.gmail.com>
	<CAPZV6o_Ud9-23NhOsP9sNy24LUXPRA05Ki2g3KwduZt8dNMwOg@mail.gmail.com>
	<CAB4yi1MMWVjsHoh0yLtVsQKZAiMRq_pqR9ykGSya=27+SFd60g@mail.gmail.com>
	<CAGE7PNKs4po2pYFGsw61JoAJaQ1g=YH_5wqR7VgRMaacn1BTuA@mail.gmail.com>
	<CADiSq7fiHBo43mChODT2cr7ydV9FB3APBW-x=cHvS95Nod3qLQ@mail.gmail.com>
	<CAB4yi1MvCu258S9zTHGRdAV3RbXVWeiCO-3vYpuZNfYBA=z4Hw@mail.gmail.com>
	<CAMSv6X2mntP6crK2L7v4_U-p+0+GwrBYdrLQ03Kcz7zZ36sKpA@mail.gmail.com>
Message-ID: <CAB4yi1NciyQ+ROLC4DW2iFM6A3f_zNF_aqeJpNobsf1Q+J8Zzg@mail.gmail.com>

Armin, thanks for weighing in on this. I'm keen to see a CPython
making use of STM; maybe I'll give it a try over Christmas break. I'm
willing to take the single threaded performance hit, as I have several
applications that degrade due to significant contention with the GIL.

The other benefits of STM you describe make it a lot more appealing. I
actually tried out Haskell recently to make use of many of the
advanced features but came crawling back.

If anyone else is keen to try this, I'm happy to receive patches for
testing and review.

On Thu, Dec 1, 2011 at 10:01 PM, Armin Rigo <arigo at tunes.org> wrote:
> Hi,
>
> On Thu, Dec 1, 2011 at 07:06, Matt Joiner <anacrolix at gmail.com> wrote:
>> I saw this, I believe it just exposes an STM primitive to user code.
>> It doesn't make use of STM for Python internals.
>
> That's correct.
>
>> Explicit STM doesn't seem particularly useful for a language that
>> doesn't expose raw memory in its normal usage.
>
> In my opinion, that sentence could not be more wrong.
>
> It is true that, as I discuss on the blog post cited a few times in
> this thread, the first goal I see is to use STM to replace the GIL as
> an internal way of keeping the state of the interpreter consistent.
> This could quite possibly be achieved using the new GCC
> __transaction_atomic keyword, although I see already annoying issues
> (e.g. the keyword can only protect a _syntactically nested_ piece of
> code as a transaction).
>
> However there is another aspect: user-exposed STM, which I didn't
> explore much.  While it is potentially even more important, it is a
> language design question, so I'm happy to delegate it to python-dev.
> In my opinion, explicit STM (like Clojure) is not only *a* way to
> write multithreaded Python programs, but it seems to be *the only* way
> that really makes sense in general, for more than small examples and
> more than examples where other hacks are enough (see
> http://en.wikipedia.org/wiki/Software_transactional_memory#Composable_operations
> ).  In other words, locks are low-level and should not be used in a
> high-level language, like direct memory accesses, just because it
> forces the programmer to think about increasingly complicated
> situations.
>
> And of course there is the background idea that TM might be available
> in hardware someday.  My own guess is that it will occur, and I bet
> that in 5 to 10 years all new Intel and AMD CPUs will have Hybrid TM.
> On such hardware, the performance penalty mostly disappears (which is
> also, I guess, the reasoning behind GCC 4.7, offering a future path to
> use Hybrid TM).
>
> If python-dev people are interested in exploring the language design
> space in that direction, I would be most happy to look in more detail
> at GCC 4.7.  If we manage to make use of it, then we could get a
> version of CPython using STM internally with a very minimal patch.  If
> it seems useful we can then turn that patch into #ifdefs into the
> normal CPython.  It would of course be off by default because of the
> performance hit; still, it would give an optional alternate
> "CPythonSTM" to play with in order to come up with good user-level
> abstractions.  (This is what I'm already trying to do with PyPy
> without using GCC 4.7, and it's progressing nicely.)  (My existing
> patch to CPython emulating user-level STM with the GIL is not really
> satisfying, also for the reason that it cannot emulate some other
> potentially useful user constructs, like abort_and_retry().)
>
>
> A bientôt,
>
> Armin.




From status at bugs.python.org  Fri Dec  2 18:07:32 2011
From: status at bugs.python.org (Python tracker)
Date: Fri,  2 Dec 2011 18:07:32 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20111202170732.659371CE85@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2011-11-25 - 2011-12-02)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    3148 (+14)
  closed 22154 (+26)
  total  25302 (+40)

Open issues with patches: 1342 


Issues opened (29)
==================

#13483: Use VirtualAlloc to allocate memory arenas
http://bugs.python.org/issue13483  opened by pitrou

#13486: msvc9compiler.py doesn't properly generate manifest files.
http://bugs.python.org/issue13486  opened by Jahangir

#13491: Fixes for sqlite3 doc
http://bugs.python.org/issue13491  opened by Nebelhom

#13492: ./configure --with-system-ffi=LIBFFI-PATH
http://bugs.python.org/issue13492  opened by michael.kraus

#13493: Import error with embedded python on AIX 6.1
http://bugs.python.org/issue13493  opened by python_hu

#13494: 'cast' any value to a Boolean?
http://bugs.python.org/issue13494  opened by mark.dickinson

#13495: IDLE: Regression - Two ColorDelegator instances loaded
http://bugs.python.org/issue13495  opened by serwy

#13496: bisect module: Overflow at index computation
http://bugs.python.org/issue13496  opened by Voo

#13497: Fix for broken nice test on non-broken platforms with pedantic
http://bugs.python.org/issue13497  opened by yaneurabeya

#13498: os.makedirs exist_ok documentation is incorrect, as is some of
http://bugs.python.org/issue13498  opened by r.david.murray

#13499: uuid documentation example uses invalid REPL/doctest syntax
http://bugs.python.org/issue13499  opened by petri.lehtinen

#13500: Hitting EOF gets cmd.py into a infinite EOF on return loop
http://bugs.python.org/issue13500  opened by yaneurabeya

#13501: Make libedit support more generic; port readline / libedit to 
http://bugs.python.org/issue13501  opened by yaneurabeya

#13502: Documentation for Event.wait return value is either wrong or i
http://bugs.python.org/issue13502  opened by r.david.murray

#13503: improved efficiency of bytearray pickling by using bytes type 
http://bugs.python.org/issue13503  opened by irmen

#13504: Meta-issue for "Invent with Python" IDLE feedback
http://bugs.python.org/issue13504  opened by ncoghlan

#13505: Bytes objects pickled in 3.x with protocol <=2 are unpickled i
http://bugs.python.org/issue13505  opened by pitrou

#13506: IDLE sys.path does not contain Current Working Directory
http://bugs.python.org/issue13506  opened by MarcoScataglini

#13507: Modify OS X installer builds to package liblzma for the new lz
http://bugs.python.org/issue13507  opened by ned.deily

#13508: ctypes' find_library breaks with ARM ABIs
http://bugs.python.org/issue13508  opened by lool

#13510: Clarify that readlines() is not needed to iterate over a file
http://bugs.python.org/issue13510  opened by potten

#13511: ./configure --includedir, --libdir accept multiple
http://bugs.python.org/issue13511  opened by rpq

#13512: ~/.pypirc created insecurely
http://bugs.python.org/issue13512  opened by Vincent.Danen

#13513: IOBase docs incorrectly link to the GNU readline module
http://bugs.python.org/issue13513  opened by meador.inge

#13515: Consistent documentation practices for security concerns and c
http://bugs.python.org/issue13515  opened by ncoghlan

#13516: Gzip old log files in rotating handlers
http://bugs.python.org/issue13516  opened by ramhux

#13518: configparser
http://bugs.python.org/issue13518  opened by mickeyju

#13519: Tkinter rowconfigure and columnconfigure functions crash if mi
http://bugs.python.org/issue13519  opened by aoi.leslie

#13520: Patch to make pickle aware of __qualname__
http://bugs.python.org/issue13520  opened by sbt



Most recent 15 issues with no replies (15)
==========================================

#13520: Patch to make pickle aware of __qualname__
http://bugs.python.org/issue13520

#13519: Tkinter rowconfigure and columnconfigure functions crash if mi
http://bugs.python.org/issue13519

#13516: Gzip old log files in rotating handlers
http://bugs.python.org/issue13516

#13513: IOBase docs incorrectly link to the GNU readline module
http://bugs.python.org/issue13513

#13507: Modify OS X installer builds to package liblzma for the new lz
http://bugs.python.org/issue13507

#13501: Make libedit support more generic; port readline / libedit to 
http://bugs.python.org/issue13501

#13499: uuid documentation example uses invalid REPL/doctest syntax
http://bugs.python.org/issue13499

#13498: os.makedirs exist_ok documentation is incorrect, as is some of
http://bugs.python.org/issue13498

#13495: IDLE: Regression - Two ColorDelegator instances loaded
http://bugs.python.org/issue13495

#13478: No documentation for timeit.default_timer
http://bugs.python.org/issue13478

#13476: Simple exclusion filter for unittest autodiscovery
http://bugs.python.org/issue13476

#13464: HTTPResponse is missing an implementation of readinto
http://bugs.python.org/issue13464

#13463: Fix parsing of package_data
http://bugs.python.org/issue13463

#13456: Providing a custom HTTPResponse class to HTTPConnection
http://bugs.python.org/issue13456

#13438: "Delete patch set" review action doesn't work
http://bugs.python.org/issue13438



Most recent 15 issues waiting for review (15)
=============================================

#13520: Patch to make pickle aware of __qualname__
http://bugs.python.org/issue13520

#13516: Gzip old log files in rotating handlers
http://bugs.python.org/issue13516

#13513: IOBase docs incorrectly link to the GNU readline module
http://bugs.python.org/issue13513

#13512: ~/.pypirc created insecurely
http://bugs.python.org/issue13512

#13511: ./configure --includedir, --libdir accept multiple
http://bugs.python.org/issue13511

#13508: ctypes' find_library breaks with ARM ABIs
http://bugs.python.org/issue13508

#13503: improved efficiency of bytearray pickling by using bytes type 
http://bugs.python.org/issue13503

#13501: Make libedit support more generic; port readline / libedit to 
http://bugs.python.org/issue13501

#13500: Hitting EOF gets cmd.py into a infinite EOF on return loop
http://bugs.python.org/issue13500

#13497: Fix for broken nice test on non-broken platforms with pedantic
http://bugs.python.org/issue13497

#13495: IDLE: Regression - Two ColorDelegator instances loaded
http://bugs.python.org/issue13495

#13491: Fixes for sqlite3 doc
http://bugs.python.org/issue13491

#13486: msvc9compiler.py doesn't properly generate manifest files.
http://bugs.python.org/issue13486

#13483: Use VirtualAlloc to allocate memory arenas
http://bugs.python.org/issue13483

#13473: Add tests for files byte-compiled by distutils[2]
http://bugs.python.org/issue13473



Top 10 most discussed issues (10)
=================================

#6715: xz compressor support
http://bugs.python.org/issue6715  18 msgs

#7652: Merge C version of decimal into py3k.
http://bugs.python.org/issue7652  13 msgs

#11379: Remove "lightweight" from minidom description
http://bugs.python.org/issue11379  13 msgs

#1040439: Missing documentation on how to link with libpython
http://bugs.python.org/issue1040439  10 msgs

#13400: packaging: build command should have options to control byte-c
http://bugs.python.org/issue13400   9 msgs

#13493: Import error with embedded python on AIX 6.1
http://bugs.python.org/issue13493   9 msgs

#12567: curses implementation of Unicode is wrong in Python 3
http://bugs.python.org/issue12567   7 msgs

#13475: Add '-p'/'--path0' command line option to override sys.path[0]
http://bugs.python.org/issue13475   7 msgs

#13496: bisect module: Overflow at index computation
http://bugs.python.org/issue13496   7 msgs

#13405: Add DTrace probes
http://bugs.python.org/issue13405   6 msgs



Issues closed (26)
==================

#6753: Python 3.1.1 test_cmd_line fails on Fedora 11
http://bugs.python.org/issue6753  closed by haypo

#7111: abort when stderr is closed
http://bugs.python.org/issue7111  closed by pitrou

#8414: Add test cases for assert
http://bugs.python.org/issue8414  closed by ezio.melotti

#11427: ctypes from_buffer no longer accepts bytes
http://bugs.python.org/issue11427  closed by haypo

#12307: Inconsistent formatting of section titles in PEP 0
http://bugs.python.org/issue12307  closed by eric.araujo

#12618: py_compile cannot create files in current directory
http://bugs.python.org/issue12618  closed by meador.inge

#12850: [PATCH] stm.atomic
http://bugs.python.org/issue12850  closed by arigo

#12856: tempfile PRNG reuse between parent and child process
http://bugs.python.org/issue12856  closed by pitrou

#12945: ctypes works incorrectly with _swappedbytes_ = 1
http://bugs.python.org/issue12945  closed by meador.inge

#13380: ctypes: add an internal function for reseting the ctypes cache
http://bugs.python.org/issue13380  closed by meador.inge

#13434: time.xmlrpc.com dead
http://bugs.python.org/issue13434  closed by pitrou

#13448: PEP 3155 implementation
http://bugs.python.org/issue13448  closed by pitrou

#13452: PyUnicode_EncodeDecimal: reject error handlers different than 
http://bugs.python.org/issue13452  closed by haypo

#13467: Typo in doc for library/sysconfig
http://bugs.python.org/issue13467  closed by eric.araujo

#13471: setting access time beyond Jan. 2038 on remote share failes on
http://bugs.python.org/issue13471  closed by Thorsten.Simons

#13481: Use an accurate clock in timeit
http://bugs.python.org/issue13481  closed by pitrou

#13482: _tkinter.TclError: invalid command name "tixDirSelectBox"
http://bugs.python.org/issue13482  closed by Martin.Unzner

#13484: mail rejected: tutor at python.org
http://bugs.python.org/issue13484  closed by eric.araujo

#13485: tcl question
http://bugs.python.org/issue13485  closed by amaury.forgeotdarc

#13487: inspect.getmodule fails when module imports change sys.modules
http://bugs.python.org/issue13487  closed by eric.araujo

#13488: Some old preprocessors have problem with "#define" not in the 
http://bugs.python.org/issue13488  closed by jcea

#13489: collections.Counter doc does not list added version
http://bugs.python.org/issue13489  closed by ezio.melotti

#13490: broken downloads counting on pypi.python.org
http://bugs.python.org/issue13490  closed by loewis

#13509: On uninstallation, distutils bdist_wininst fails to run post i
http://bugs.python.org/issue13509  closed by eric.araujo

#13514: PIL does not support iTXt PNG chunks [patch]
http://bugs.python.org/issue13514  closed by ezio.melotti

#13517: readdir() in os.listdir not threadsafe on OSX 10.6.8
http://bugs.python.org/issue13517  closed by thouis

From solipsis at pitrou.net  Sat Dec  3 21:39:03 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 3 Dec 2011 21:39:03 +0100
Subject: [Python-Dev] Style guide for FAQs?
Message-ID: <20111203213903.1ebfe7c5@pitrou.net>


Hello,

I notice that some FAQs are not only outdated but seem to favour a
writing style that's quite lengthy and full of anecdotal details.
It seems to me that there is value in giving terse answers in FAQs (we
have - or should have - reference documentation where things are
explained in more detail).

One primary example is the performance question:
file:///home/antoine/cpython/32/Doc/build/html/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up

It mixes a couple of generalities with incredibly specific suggestions
such as early binding of methods or use of default argument values to
fold constants. I think a beginner reading this entry won't get any
meaningful information out of it.

Any advice on whether it's ok to hack and slash into the fat? :)

Regards

Antoine.



From solipsis at pitrou.net  Sat Dec  3 21:58:01 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 3 Dec 2011 21:58:01 +0100
Subject: [Python-Dev] Style guide for FAQs?
References: <20111203213903.1ebfe7c5@pitrou.net>
Message-ID: <20111203215801.74ea1209@pitrou.net>

On Sat, 3 Dec 2011 21:39:03 +0100
Antoine Pitrou <solipsis at pitrou.net> wrote:
> 
> One primary example is the performance question:
> file:///home/antoine/cpython/32/Doc/build/html/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up

Woohoo. This should of course be:
http://docs.python.org/dev/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up

cheers

Antoine.



From tjreedy at udel.edu  Sun Dec  4 03:55:35 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 03 Dec 2011 21:55:35 -0500
Subject: [Python-Dev] Style guide for FAQs?
In-Reply-To: <20111203215801.74ea1209@pitrou.net>
References: <20111203213903.1ebfe7c5@pitrou.net>
	<20111203215801.74ea1209@pitrou.net>
Message-ID: <jbenfe$j8u$2@dough.gmane.org>

On 12/3/2011 3:58 PM, Antoine Pitrou wrote:
> On Sat, 3 Dec 2011 21:39:03 +0100
> Antoine Pitrou<solipsis at pitrou.net>  wrote:
>>
>> One primary example is the performance question:
>> file:///home/antoine/cpython/32/Doc/build/html/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up
>
> Woohoo. This should of course be:
> http://docs.python.org/dev/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up

That looks like a mini-howto ;-),
rather than a FAQ entry.

The changes you have made so far have looked good to me.

-- 
Terry Jan Reedy


From ncoghlan at gmail.com  Sun Dec  4 05:11:58 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 4 Dec 2011 14:11:58 +1000
Subject: [Python-Dev] [Python-checkins] cpython (3.2): Issue #13211: Add
 .reason attribute to HTTPError to implement parent class
In-Reply-To: <E1RWqr1-0000qP-2f@dinsdale.python.org>
References: <E1RWqr1-0000qP-2f@dinsdale.python.org>
Message-ID: <CADiSq7ckdv32_JEdLxLzq6fRhx3cFs2eTpR3yAtYYR=fPz9-yA@mail.gmail.com>

On Sun, Dec 4, 2011 at 12:46 AM, jason.coombs
<python-checkins at python.org> wrote:
> +def test_HTTPError_interface():
> +    """
> +    Issue 13211 reveals that HTTPError didn't implement the URLError
> +    interface even though HTTPError is a subclass of URLError.
> +
> +    >>> err = urllib.error.HTTPError(msg='something bad happened', url=None, code=None, hdrs=None, fp=None)
> +    >>> assert hasattr(err, 'reason')
> +    >>> err.reason
> +    'something bad happened'
> +    """
> +

Did you re-run the test suite after forward-porting to 3.3? I'm
consistently getting failures:

$ ./python -m test test_urllib2
[1/1] test_urllib2
**********************************************************************
File "/home/ncoghlan/devel/py3k/Lib/test/test_urllib2.py", line 1457,
in test.test_urllib2.test_HTTPError_interface
Failed example:
    err = urllib.error.HTTPError(msg='something bad happened',
url=None, code=None, hdrs=None, fp=None)
Exception raised:
    Traceback (most recent call last):
      File "/home/ncoghlan/devel/py3k/Lib/doctest.py", line 1253, in __run
        compileflags, 1), test.globs)
      File "<doctest test.test_urllib2.test_HTTPError_interface[0]>",
line 1, in <module>
        err = urllib.error.HTTPError(msg='something bad happened',
url=None, code=None, hdrs=None, fp=None)
    TypeError: HTTPError does not take keyword arguments
**********************************************************************
File "/home/ncoghlan/devel/py3k/Lib/test/test_urllib2.py", line 1458,
in test.test_urllib2.test_HTTPError_interface
Failed example:
    assert hasattr(err, 'reason')
Exception raised:
    Traceback (most recent call last):
      File "/home/ncoghlan/devel/py3k/Lib/doctest.py", line 1253, in __run
        compileflags, 1), test.globs)
      File "<doctest test.test_urllib2.test_HTTPError_interface[1]>",
line 1, in <module>
        assert hasattr(err, 'reason')
    NameError: name 'err' is not defined
**********************************************************************
File "/home/ncoghlan/devel/py3k/Lib/test/test_urllib2.py", line 1459,
in test.test_urllib2.test_HTTPError_interface
Failed example:
    err.reason
Exception raised:
    Traceback (most recent call last):
      File "/home/ncoghlan/devel/py3k/Lib/doctest.py", line 1253, in __run
        compileflags, 1), test.globs)
      File "<doctest test.test_urllib2.test_HTTPError_interface[2]>",
line 1, in <module>
        err.reason
    NameError: name 'err' is not defined
**********************************************************************
1 items had failures:
   3 of   3 in test.test_urllib2.test_HTTPError_interface
***Test Failed*** 3 failures.
test test_urllib2 failed -- 3 of 65 doctests failed
1 test failed:
    test_urllib2
[142313 refs]

Now, this failure is quite possibly due to a flaw in the PEP 3151
implementation (see http://bugs.python.org/issue12555), but picking up
this kind of thing is the reason we say to always run the tests before
committing, even for a simple merge.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From g.brandl at gmx.net  Sun Dec  4 09:42:23 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 04 Dec 2011 09:42:23 +0100
Subject: [Python-Dev] Style guide for FAQs?
In-Reply-To: <jbenfe$j8u$2@dough.gmane.org>
References: <20111203213903.1ebfe7c5@pitrou.net>
	<20111203215801.74ea1209@pitrou.net> <jbenfe$j8u$2@dough.gmane.org>
Message-ID: <jbfbpe$ban$1@dough.gmane.org>

Am 04.12.2011 03:55, schrieb Terry Reedy:
> On 12/3/2011 3:58 PM, Antoine Pitrou wrote:
>> On Sat, 3 Dec 2011 21:39:03 +0100
>> Antoine Pitrou<solipsis at pitrou.net>  wrote:
>>>
>>> One primary example is the performance question:
>>> file:///home/antoine/cpython/32/Doc/build/html/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up
>>
>> Woohoo. This should of course be:
>> http://docs.python.org/dev/faq/programming.html#my-program-is-too-slow-how-do-i-speed-it-up
> 
> That looks like a mini-howto ;-),
> rather than a FAQ entry.
> 
> The changes you have made so far have looked good to me.

Definitely.

Georg


From martin at v.loewis.de  Sun Dec  4 10:56:06 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 04 Dec 2011 10:56:06 +0100
Subject: [Python-Dev] STM and python
In-Reply-To: <CAB4yi1ORAJhvASE_s9g3nJ69Be1sEwOGdRfYE62tQy1t7VMphw@mail.gmail.com>
References: <CAB4yi1ORAJhvASE_s9g3nJ69Be1sEwOGdRfYE62tQy1t7VMphw@mail.gmail.com>
Message-ID: <4EDB43B6.1080501@v.loewis.de>

> However given advances in locking and garbage collection in the last
> decade, what attempts have been made recently to try these new ideas
> out?

If that's the question you want an answer to, it would have been better
had you listed the efforts that you are already aware of. If you really
are unaware of any effort, try googling to find

http://www.kamaelia.org/STM
http://peak.telecommunity.com/DevCenter/TrellisSTM
http://bugs.python.org/issue12850
http://dl.acm.org/citation.cfm?id=1978911
http://www-sal.cs.uiuc.edu/~zilles/papers/python_htm.dls2006.pdf
and more

Regards,
Martin

From mail at timgolden.me.uk  Sun Dec  4 11:59:13 2011
From: mail at timgolden.me.uk (Tim Golden)
Date: Sun, 04 Dec 2011 10:59:13 +0000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
Message-ID: <4EDB5281.8040807@timgolden.me.uk>

http://bugs.python.org/issue13524

Someone raised issue13524 yesterday to illustrate that a
subprocess will crash immediately if an environment block is
passed which does not contain a valid SystemRoot environment
variable.

Note that the calling (Python) process is unaffected; this
isn't - strictly - a Python crash. The issue is essentially
a Windows one where a fairly unusual cornercase -- passing
an empty environment -- has unforeseen effects.

The smallest reproducible example is this:

import os, sys
import subprocess
subprocess.Popen(
     [sys.executable],
     env={}
)

and it can be prevented like this:

import os, sys
import subprocess
subprocess.Popen(
     [sys.executable],
     env={"SystemRoot" : os.environ['SystemRoot']}
)

There's a blog post here which gives a worked example:

 
http://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/

but as the author points out, nowhere on MSDN is there a warning
that SystemRoot is mandatory. (And, strictly speaking, it isn't: it
would be quite possible to write code which had no need of it.)

So... what's our take on this? As I see it we could:

1) Do nothing: it's the caller's responsibility to understand the
    complications of the chosen Operating System.

2) Add a doc warning (ironically, considering the recent to-and-fro
    on doc warnings in this very module).

3) Add a check into the subprocess.Popen code which would raise some
    exception if the environment block is empty (or doesn't contain
    SystemRoot) on the grounds that this probably wasn't what the user
    thought they were doing.

4) Automatically add an entry for SystemRoot to the env block if it's
    not present already.


It's tempting to opt for (1) and if we were exposing an API called
CreateProcess which mimicked the underlying Windows API I would be
inclined to go that way. But we're abstracting a little bit away
from that and I think that that layer of abstraction carries its
own responsibilities.

Option (3) seems to give the best balance. It *is* a cornercase, but at
the same time it's easy to misunderstand that the env block you're
passing in *replaces* rather than *augments* that of the current
process.
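
For concreteness, a minimal sketch of what the check in option (3)
might look like. The helper name check_env and the choice of
ValueError are illustrative assumptions, not anything that exists in
subprocess today; the lookup is case-insensitive because Windows
treats environment variable names that way.

```python
import sys


def check_env(env, platform=sys.platform):
    """Raise if an explicit env block lacks SystemRoot on Windows.

    Hypothetical helper: subprocess performs no such check today.
    """
    if env is None or not platform.startswith("win"):
        # env=None means "inherit the parent's environment", and other
        # platforms have no SystemRoot requirement, so nothing to check.
        return
    # Windows environment variable names are case-insensitive.
    names = {name.upper() for name in env}
    if "SYSTEMROOT" not in names:
        raise ValueError(
            "env replaces the parent environment and has no SystemRoot; "
            "the child process may fail to start"
        )
```

The platform argument is only there so the behaviour can be exercised
on a non-Windows box; a real check would live inside Popen itself.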

Thoughts?

TJG

From ncoghlan at gmail.com  Sun Dec  4 12:42:14 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 4 Dec 2011 21:42:14 +1000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDB5281.8040807@timgolden.me.uk>
References: <4EDB5281.8040807@timgolden.me.uk>
Message-ID: <CADiSq7cpicaUQcaM2V7xjSp2MKeD0xXu4ZVT+4QwcUiO0T8cDA@mail.gmail.com>

On Sun, Dec 4, 2011 at 8:59 PM, Tim Golden <mail at timgolden.me.uk> wrote:
> So... what's our take on this? As I see it we could:
>
> 1) Do nothing: it's the caller's responsibility to understand the
>    complications of the chosen Operating System.
>
> 2) Add a doc warning (ironically, considering the recent to-and-fro
>    on doc warnings in this very module).
>
> 3) Add a check into the subprocess.Popen code which would raise some
>    exception if the environment block is empty (or doesn't contain
>    SystemRoot) on the grounds that this probably wasn't what the user
>    thought they were doing.
>
> 4) Automatically add an entry for SystemRoot to the env block if it's
>    not present already.
>
>
> It's tempting to opt for (1) and if we were exposing an API called
> CreateProcess which mimicked the underlying Windows API I would be
> inclined to go that way. But we're abstracting a little bit away
> from that and I think that that layer of abstraction carries its
> own responsibilities.
>
> Option (3) seems to give the best balance. It *is* a cornercase, but at
> the same time it's easy to misunderstand that the env block you're
> passing in *replaces* rather than *augments* that of the current
> process.

There's actually two questions to be answered:
1. What should we do in 3.2 and 2.7?
2. Should we do anything more in 3.3?

Raising an exception is not really an appropriate response for any of
them - running without SystemRoot actually works fine in most cases,
so raising an exception could break currently working code. As the
blog post noted, it's only some specific modules that don't work if
SystemRoot is not set. Should we really be inserting workarounds in
subprocess for buggy platform code that doesn't fall back to a
sensible default if a particular environment variable isn't set?

So, I don't think this is really a subprocess problem at all. It's a
platform bug on Windows that means the 'random' module may fail if
SystemRoot is not set in the environment. So, I think the right
approach is to:

1. Unset 'SystemRoot' in a windows shell
2. Run the test suite and observe the scale of the breakage
3. Then either:
- figure out a workaround that allows us to set an appropriate default
value for SystemRoot if needed (depending on the scope of the problem,
either do this at interpreter startup, or only in affected modules)
- if no feasible workaround is found, detect the failures related to
this problem and report a more meaningful error message

Either way, add explicit tests to the test suite to ensure that
affected modules behave as expected when SystemRoot is not set.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From mail at timgolden.me.uk  Sun Dec  4 13:20:11 2011
From: mail at timgolden.me.uk (Tim Golden)
Date: Sun, 04 Dec 2011 12:20:11 +0000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <CADiSq7cpicaUQcaM2V7xjSp2MKeD0xXu4ZVT+4QwcUiO0T8cDA@mail.gmail.com>
References: <4EDB5281.8040807@timgolden.me.uk>
	<CADiSq7cpicaUQcaM2V7xjSp2MKeD0xXu4ZVT+4QwcUiO0T8cDA@mail.gmail.com>
Message-ID: <4EDB657B.3030105@timgolden.me.uk>

On 04/12/2011 11:42, Nick Coghlan wrote:
> There's actually two questions to be answered:
> 1. What should we do in 3.2 and 2.7?
> 2. Should we do anything more in 3.3?

Agreed.

> 1. Unset 'SystemRoot' in a windows shell
> 2. Run the test suite and observe the scale of the breakage

Sorry; something I should have highlighted in the earlier post.
Behaviour varies between Windows versions. On WinXP, if you
unset SystemRoot in a cmd shell, you won't be able to run the
test suite: Python won't even start up. On Win7 Python will
start but, eg, the random module will fail.

This is actually a separate issue: how much of Python will work
without a valid SystemRoot. The OP's issue was that if you use
subprocess to start an arbitrary process (you get the same problem
if you try "notepad.exe") and pass it an env block without a valid
SystemRoot then that process will likely fail to start up. And it
won't be obvious why.

The case where someone tries to run Python (in general) without
a valid SystemRoot is a tiny cornercase and you'd be quite right
to push that back and say "Don't do that". I don't believe we have
to test for it or add code to work around it.

While I put the idea forward, I agree that an exception is more likely
than not to break existing code. I just can't see any clear alternative,
apart from option 1: we do nothing.

TJG

From p.f.moore at gmail.com  Sun Dec  4 13:41:46 2011
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 4 Dec 2011 12:41:46 +0000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDB657B.3030105@timgolden.me.uk>
References: <4EDB5281.8040807@timgolden.me.uk>
	<CADiSq7cpicaUQcaM2V7xjSp2MKeD0xXu4ZVT+4QwcUiO0T8cDA@mail.gmail.com>
	<4EDB657B.3030105@timgolden.me.uk>
Message-ID: <CACac1F_aqjfngqaAvvgiB=otcaytwb5V__eFxKcSNDamvOqgEQ@mail.gmail.com>

On 4 December 2011 12:20, Tim Golden <mail at timgolden.me.uk> wrote:
> On 04/12/2011 11:42, Nick Coghlan wrote:
>>
>> There's actually two questions to be answered:
>> 1. What should we do in 3.2 and 2.7?
>> 2. Should we do anything more in 3.3?

See below...

> This is actually a separate issue: how much of Python will work
> without a valid SystemRoot. The OP's issue was that if you use
> subprocess to start an arbitrary process (you get the same problem
> if you try "notepad.exe") and pass it an env block without a valid
> SystemRoot then that process will likely fail to start up. And it
> won't be obvious why.

I'm not 100% clear on the problem here. From how I'm reading things,
the problem is that not supplying SystemRoot will cause (some or all)
invocations of subprocess.Popen to fail - it's not specific to
starting Python. In that case, it seems to me that it's an OS issue,
but one that we should work around.

My feeling is that option 4 is best - set SystemRoot to its current
value if it's not been set by the user. This leaves the user unable to
set an environment with SystemRoot missing, but if the OS fails to
handle that properly, then I'm OK with that limitation.
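
As a rough sketch of what I mean by option 4 (the name
with_system_root and the parent_env parameter are made up for
illustration; nothing like this exists in subprocess at the moment):

```python
import os


def with_system_root(env, parent_env=None):
    """Return a copy of env with SystemRoot filled in from the parent.

    Hypothetical helper sketching option 4; not part of subprocess.
    """
    if parent_env is None:
        parent_env = os.environ
    result = dict(env)  # never modify the caller's dict
    # Windows variable names are case-insensitive, so compare upper-cased.
    present = any(name.upper() == "SYSTEMROOT" for name in result)
    if not present:
        for name, value in parent_env.items():
            if name.upper() == "SYSTEMROOT":
                result[name] = value
                break
    return result
```

Note that in this sketch an entry the caller supplied is left alone
even if its value is empty; only a missing name is filled in.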

As regards the version question above, I'd take the view that as an OS
issue, it's OK to leave it unchanged in 2.7 and 3.2, but add the above
to 3.3.

Paul.

From mail at timgolden.me.uk  Sun Dec  4 15:08:36 2011
From: mail at timgolden.me.uk (Tim Golden)
Date: Sun, 04 Dec 2011 14:08:36 +0000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <CACac1F_aqjfngqaAvvgiB=otcaytwb5V__eFxKcSNDamvOqgEQ@mail.gmail.com>
References: <4EDB5281.8040807@timgolden.me.uk>
	<CADiSq7cpicaUQcaM2V7xjSp2MKeD0xXu4ZVT+4QwcUiO0T8cDA@mail.gmail.com>
	<4EDB657B.3030105@timgolden.me.uk>
	<CACac1F_aqjfngqaAvvgiB=otcaytwb5V__eFxKcSNDamvOqgEQ@mail.gmail.com>
Message-ID: <4EDB7EE4.3030403@timgolden.me.uk>

On 04/12/2011 12:41, Paul Moore wrote:
> I'm not 100% clear on the problem here. From how I'm reading things,
> the problem is that not supplying SystemRoot will cause (some or all)
> invocations of subprocess.Popen to fail - it's not specific to
> starting Python.

That's basically the situation.

>
> My feeling is that option 4 is best - set SystemRoot to its current
> value if it's not been set by the user. This leaves the user unable to
> set an environment with SystemRoot missing, but if the OS fails to
> handle that properly, then I'm OK with that limitation.

FWIW if we went this route we could set it if it's missing but
that still allows the user to set it to blank. I'm just a little
bit wary of altering the environment which the user believes has
been set.

TJG

From martin.packman at canonical.com  Sun Dec  4 17:48:16 2011
From: martin.packman at canonical.com (Martin Packman)
Date: Sun, 4 Dec 2011 16:48:16 +0000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDB5281.8040807@timgolden.me.uk>
References: <4EDB5281.8040807@timgolden.me.uk>
Message-ID: <CAOG+YdqCME+JjLiztXyZzrVhj-uOK7msKFVmD1=j3WXZ3EfQ9A@mail.gmail.com>

On 04/12/2011, Tim Golden <mail at timgolden.me.uk> wrote:
>
> Someone raised issue13524 yesterday to illustrate that a
> subprocess will crash immediately if an environment block is
> passed which does not contain a valid SystemRoot environment
> variable.
...
> 2) Add a doc warning (ironically, considering the recent to-and-fro
>     on doc warnings in this very module).

There appears to already be such a warning, added because of a similar
earlier bug:

<http://bugs.python.org/issue3440>

Really this is a problem with the subprocess api making a common case
harder to do than necessary. If you read the documentation, you'll get
it right, but that's not ideal:

<http://sourcefrog.net/weblog/software/aesthetics/interface-levels.html>

Judging from the bug, the problem with the reporter's code is that he
passes a dict with the one value he cares about as `env` to
subprocess.Popen without realising that it will prevent the child from
inheriting the current environment. Your suggested fix for him also has
an issue: it changes the environment of the parent process without
resetting it. Instead you need something like:

    e = dict(os.environ)
    e['PATH_TO_MY_APPS'] = "path/to/my/apps"

The bzrlib TestCase has a method using subprocess that provides an
`env_changes` argument. With that, it's much easier to override or
remove just one variable without accidentally clearing the current
environment.
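
As a rough illustration of that idea (a minimal sketch only: the name
`env_with_changes` and its parameters are made up for this example and
are not bzrlib's actual API), a helper could copy the current
environment and apply overrides or removals to the copy:

```python
import os


def env_with_changes(changes, base=None):
    """Return a copy of `base` (default: os.environ) with `changes` applied.

    Hypothetical helper: a value of None removes the variable from the
    copy; any other value overrides or adds it. The parent process's own
    environment is never modified.
    """
    env = dict(os.environ if base is None else base)
    for name, value in changes.items():
        if value is None:
            env.pop(name, None)
        else:
            env[name] = value
    return env
```

The result would then be passed as
subprocess.Popen(cmd, env=env_with_changes({'PATH_TO_MY_APPS': 'path/to/my/apps'})),
leaving everything else the child inherits untouched.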

Martin

From ncoghlan at gmail.com  Sun Dec  4 21:52:14 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 5 Dec 2011 06:52:14 +1000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDB657B.3030105@timgolden.me.uk>
References: <4EDB5281.8040807@timgolden.me.uk>
	<CADiSq7cpicaUQcaM2V7xjSp2MKeD0xXu4ZVT+4QwcUiO0T8cDA@mail.gmail.com>
	<4EDB657B.3030105@timgolden.me.uk>
Message-ID: <CADiSq7erFoadB7uRxYgopWQXZx2H91YeZMzyUMmgbvT6DT9OFw@mail.gmail.com>

That's why I'm suggesting we look specifically at the cases where *Python*
misbehaves in an empty environment on Windows. Those are legitimately our
issue.

The problem in *general* is a platform one, so I don't think it makes sense
for us to modify the environment that has explicitly been passed in (e.g.
how would you test running without SystemRoot if subprocess added it
automatically?).

An extra parameter in the already confusing Popen signature wouldn't be
clearer than explicitly copying os.environ and modifying it.

--
Nick Coghlan (via Gmail on Android, so likely to be more terse than usual)
On Dec 4, 2011 10:22 PM, "Tim Golden" <mail at timgolden.me.uk> wrote:

> On 04/12/2011 11:42, Nick Coghlan wrote:
>
>> There's actually two questions to be answered:
>> 1. What should we do in 3.2 and 2.7?
>> 2. Should we do anything more in 3.3?
>>
>
> Agreed.
>
>> 1. Unset 'SystemRoot' in a Windows shell
>> 2. Run the test suite and observe the scale of the breakage
>>
>
> Sorry; something I should have highlighted in the earlier post.
> Behaviour varies between Windows versions. On WinXP, if you
> unset SystemRoot in a cmd shell, you won't be able to run the
> test suite: Python won't even start up. On Win7 Python will
> start but, eg, the random module will fail.
>
> This is actually a separate issue: how much of Python will work
> without a valid SystemRoot. The OP's issue was that if you use
> subprocess to start an arbitrary process (you get the same problem
> if you try "notepad.exe") and pass it an env block without a valid
> SystemRoot then that process will likely fail to start up. And it
> won't be obvious why.
>
> The case where someone tries to run Python (in general) without
> a valid SystemRoot is a tiny cornercase and you'd be quite right
> to push that back and say "Don't do that". I don't believe we have
> to test for it or add code to work around it.
>
> While I put the idea forward, I agree that an exception is more likely
> than not to break existing code. I just can't see any clear alternative,
> apart from option 1: we do nothing.
>
> TJG
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com
>

From tjreedy at udel.edu  Sun Dec  4 22:08:33 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 04 Dec 2011 16:08:33 -0500
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDB5281.8040807@timgolden.me.uk>
References: <4EDB5281.8040807@timgolden.me.uk>
Message-ID: <jbgngp$md7$1@dough.gmane.org>

On 12/4/2011 5:59 AM, Tim Golden wrote:
> http://bugs.python.org/issue13524
>
> Someone raised issue13524 yesterday to illustrate that a
> subprocess will crash immediately if an environment block is
> passed which does not contain a valid SystemRoot environment
> variable.
>
> Note that the calling (Python) process is unaffected; this
> isn't - strictly - a Python crash. The issue is essentially
> a Windows one where a fairly unusual cornercase -- passing
> an empty environment -- has unforeseen effects.
>
> The smallest reproducible example is this:
>
> import os, sys
> import subprocess
> subprocess.Popen(
>     [sys.executable],
>     env={}
> )
>
> and it can be prevented like this:
>
> import os, sys
> import subprocess
> subprocess.Popen(
>     [sys.executable],
>     env={"SystemRoot": os.environ['SystemRoot']}
> )
>
> There's a blog post here which gives a worked example:
>
>
> http://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/
>
>
> but as the author points out, nowhere on MSDN is there a warning
> that SystemRoot is mandatory. (And, in effect, it's not as it
> would just be possible to write code which had no need of it).
>
> So... what's our take on this? As I see it we could:
>
> 1) Do nothing: it's the caller's responsibility to understand the
> complications of the chosen Operating System.
>
> 2) Add a doc warning (ironically, considering the recent to-and-fro
> on doc warnings in this very module).
>
> 3) Add a check into the subprocess.Popen code which would raise some
> exception if the environment block is empty (or doesn't contain
> SystemRoot) on the grounds that this probably wasn't what the user
> thought they were doing.
>
> 4) Automatically add an entry for SystemRoot to the env block if it's
> not present already.
>
>
> It's tempting to opt for (1) and if we were exposing an API called
> CreateProcess which mimicked the underlying Windows API I would be
> inclined to go that way. But we're abstracting a little bit away
> from that and I think that that layer of abstraction carries its
> own responsibilities.
>
> Option (3) seems to give the best balance. It *is* a cornercase, but at
> the same time it's easy to misunderstand that the env block you're
> passing in *replaces* rather than *augments* that of the current
> process.
>
> Thoughts?

My inclination would be #4 on Windows, certainly for 3.3, unless there 
is a clear reason not to.

For 2.7/3.2, at least say (not warn, just say) in the doc that a
subprocess on Windows may require that SystemRoot be set.

The blog post says the problem is worse on Win 7. So it is not going away.

The blog post has a comment from Martin Loewis a year ago linking to
http://mail.python.org/pipermail/python-dev/2010-November/105866.html
That thread refers to a bug that was not posted on the tracker. This 
makes at least three (including #3440).

-- 
Terry Jan Reedy


From ncoghlan at gmail.com  Mon Dec  5 01:16:01 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 5 Dec 2011 10:16:01 +1000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <jbgngp$md7$1@dough.gmane.org>
References: <4EDB5281.8040807@timgolden.me.uk> <jbgngp$md7$1@dough.gmane.org>
Message-ID: <CADiSq7fPfmYM6OvXLD0uJGfA+raBMsk6H4SVeXX=XdBCV+ZLvA@mail.gmail.com>

On Mon, Dec 5, 2011 at 7:08 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> My inclination would be #4 on Windows, certainly for 3.3, unless there is a
> clear reason not to.

Yes, there is: that environment is the *exact* environment that should
be passed to the child processes. It's not our place to go implicitly
adding things to it. If MS aren't willing to add SystemRoot
automatically in CreateProcess (despite releasing libraries that
require it to be set), there's no way we should be adding it for them.

Fixing our stuff (like importing the random module) to work to at
least some degree even if SystemRoot isn't set should definitely be
done, but beyond that a comment in the docs pointing out the problem
(i.e. MS releasing things that require SystemRoot be set without
updating CreateProcess to ensure that it *is* set) is as far as we
should go.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From martin at v.loewis.de  Mon Dec  5 09:10:51 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 05 Dec 2011 09:10:51 +0100
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDB5281.8040807@timgolden.me.uk>
References: <4EDB5281.8040807@timgolden.me.uk>
Message-ID: <4EDC7C8B.6040007@v.loewis.de>

> Thoughts?

Apparently, there are at least two "users" of SystemRoot:
- side-by-side (fusion?) apparently uses it to locate the WinSxS
  folder, at least on some Windows releases,
- certain registry keys contain SystemRoot, in particular the
  path names of crypto providers (this apparently is XP only,
  and fixed on Windows 7)

I agree with Nick that we shouldn't do anything except perhaps
for documentation changes. There are many other environment variables
whose absence could also cause failures to run the executable,
such as PATH, LD_LIBRARY_PATH, etc. Even not passing DISPLAY may
cause the subprocess to fail starting.

IOW, users should "normally" pass all environment variables, and
only augment it with any specific additions and deletions that
they know are needed for the subprocess. If a user deliberately
passes a small set of environment variables (e.g. none), we must
assume that it was deliberate, and that any resulting failures
are desired. People do such stuff for security reasons, and
side-stepping their enforcement is not appropriate for Python
to do.
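
For the deliberately small environment case, a caller who still wants
the child to start on Windows has to carry the needed names over
explicitly. A minimal sketch, assuming the caller decides which names
matter (`minimal_env`, `keep` and `source` are made-up names for this
example, not part of subprocess):

```python
import os


def minimal_env(extra=None, keep=("SystemRoot",), source=None):
    """Build a small environment dict for a child process.

    Hypothetical helper: only the names listed in `keep` are copied from
    `source` (default: os.environ), then `extra` is applied on top.
    Names missing from `source` are silently skipped.
    """
    source = os.environ if source is None else source
    env = {name: source[name] for name in keep if name in source}
    env.update(extra or {})
    return env
```

The caller stays in full control: passing keep=() still produces a truly
empty environment, so nothing is added behind the user's back.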

Regards,
Martin

From mail at timgolden.me.uk  Mon Dec  5 10:01:17 2011
From: mail at timgolden.me.uk (Tim Golden)
Date: Mon, 05 Dec 2011 09:01:17 +0000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDC7C8B.6040007@v.loewis.de>
References: <4EDB5281.8040807@timgolden.me.uk> <4EDC7C8B.6040007@v.loewis.de>
Message-ID: <4EDC885D.5030708@timgolden.me.uk>

On 05/12/2011 08:10, "Martin v. Löwis" wrote:
> I agree with Nick that we shouldn't do anything except perhaps
> for documentation changes. There are many other environment variables
> whose absence could also cause failures to run the executable,
> such as PATH, LD_LIBRARY_PATH, etc. Even not passing DISPLAY may
> cause the subprocess to fail starting.
>
> IOW, users should "normally" pass all environment variables, and
> only augment it with any specific additions and deletions that
> they know are needed for the subprocess. If a user deliberately
> passes a small set of environment variables (e.g. none), we must
> assume that it was deliberate, and that any resulting failures
> are desired. People do such stuff for security reasons, and
> side-stepping their enforcement is not appropriate for Python
> to do.

Having slept on this I must confess that this is pretty much the
conclusion I'd come to: we can't do anything in code which is
guaranteed to be correct in every case. The best we can do is
document. And, as Martin Packman pointed out (and I had missed),
this particular condition is already documented, at least enough
to point a user to.

We could probably do with a HOWTO (or blog post or whatever) on using
subprocess on Windows, not least because a fair amount of the docs
are Unix-centric and actually very slightly confusing for naive
Windows-based developers.

I think my proposal now is: do nothing. I'm aware that Nick Coghlan
has been making fairly extensive changes to the subprocess docs
recently and I don't think I can propose anything on this matter which
amounts to more than shuffling the pieces around.

TJG

From ncoghlan at gmail.com  Mon Dec  5 10:41:18 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 5 Dec 2011 19:41:18 +1000
Subject: [Python-Dev] Issue 13524: subprocess on Windows
In-Reply-To: <4EDC885D.5030708@timgolden.me.uk>
References: <4EDB5281.8040807@timgolden.me.uk> <4EDC7C8B.6040007@v.loewis.de>
	<4EDC885D.5030708@timgolden.me.uk>
Message-ID: <CADiSq7dDQXLXXu2kyPD5LuFK==T95ZHbatz+r1e25oktkehCuQ@mail.gmail.com>

On Mon, Dec 5, 2011 at 7:01 PM, Tim Golden <mail at timgolden.me.uk> wrote:
> We could probably do with a HOWTO (or blog post or whatever) on using
> subprocess on Windows, not least because a fair amount of the docs
> are Unix-centric and actually very slightly confusing for naive
> Windows-based developers.
>
> I think my proposal now is: do nothing. I'm aware that Nick Coghlan
> has been making fairly extensive changes to the subprocess docs
> recently and I don't think I can propose anything on this matter which
> amounts to more than shuffling the pieces around.

The subprocess module could probably do with a HOWTO, full stop.
Subprocess invocation is something where platform details are always
going to matter a lot, and there are subtle details even on Unix that
are confusing (e.g. I have a command in my current project that I've
only managed to get working by running it via the shell - I still
don't know why direct invocation of the binary with the appropriate
arguments doesn't work).

At the moment, we're still trying to cram an entire essay on
subprocess invocation into the subprocess.Popen constructor
definition, which is far from optimal.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From mathieu.malaterre at gmail.com  Mon Dec  5 16:26:50 2011
From: mathieu.malaterre at gmail.com (Mathieu Malaterre)
Date: Mon, 5 Dec 2011 16:26:50 +0100
Subject: [Python-Dev] ImportError: No module named multiarray (is back)
In-Reply-To: <4ED127C5.1060004@in.waw.pl>
References: <CA+7wUswQVDddJJiXaJFk8gNmYMW9L4W_3GtSWKDV9LAL=SatNQ@mail.gmail.com>
	<4ECBFF19.8080100@in.waw.pl> <4ECD1D31.7080802@netwok.org>
	<4ED127C5.1060004@in.waw.pl>
Message-ID: <CA+7wUswjbPL+Q3D7U_kJV2p_injQM2nTV8MVkN0X17+uUpumdw@mail.gmail.com>

Hi Zbyszek,

  See below my comment.

2011/11/26 Zbigniew Jędrzejewski-Szmek <zbyszek at in.waw.pl>:
> Hi,
> I apologize in advance for the length of this mail.
>
> sys.path
> ========
> When a script or a module is executed by invoking python with proper
> arguments, sys.path is extended. When a path to script is given, the
> directory containing the script is prepended. When '-m' or '-c' is used,
> $CWD is prepended. This is documented in
> http://docs.python.org/dev/using/cmdline.html, so far ok.
>
> sys.path and $PYTHONPATH is like $PATH -- if you can convince someone to put
> a directory under your control in any of them, you can execute code as this
> someone. Therefore, sys.path is dangerous and important. Unfortunately,
> sys.path manipulations are only described very briefly, and without any
> commentary, in the on-line documentation. python(1) manpage doesn't even
> mention them.
>
> The problem: each of the commands below is insecure:
>
> python /tmp/script.py                 (when script.py is safe by itself)
>        ('/tmp' is added to sys.path, so an attacker can override any
>         module imported in /tmp/script.py by writing to /tmp/module.py)
>
> cd /tmp && python -mtimeit -s 'import numpy' 'numpy.test()'
>        (UNIX users are accustomed to being able to safely execute
>         programs in any directory, e.g. ls, or gcc, or something.
>
>         Here '' is added to sys.path, so it is not secure to run
>         python in other-user-writable directories.)
>
> cd /tmp/ && python -c 'import numpy; print(numpy.version.version)'
>         (The same as above, '' is added to sys.path.)
>
> cd /tmp && python
>         (The same as above).
>
> IMHO, if this (long-lived) behaviour is necessary, it should at least be
> prominently documented. Also in the manpage.
>
> Prepending realpath(dirname(scriptname))
> ========================================
> Before adding a directory to sys.path as described above, Python actually
> runs os.path.realpath over it. This means that if the path to a script given
> on the commandline is actually a symlink, the directory containing the real
> file will be added instead. This behaviour is not really documented (the
> documentation only says "the directory containing that file is added to the
> start of sys.path"), but since the integrity of sys.path is so important, it
> should be, IMHO.
>
> Using realpath instead of the (expected) path specified by the user breaks
> imports of non-pure-python (mixed .py and .so) modules from modules executed
> as scripts on Debian. This is because Debian installs
> architecture-independent python files in /usr/share/pyshared, and symlinks
> those files into /usr/lib/pymodules/pythonX.Y/. The architecture-dependent
> .so and python-version-dependent .pyc files are installed in
> /usr/lib/pymodules/pythonX.Y/. When a script, e.g.
> /usr/lib/pymodules/pythonX.Y/script.py, is executed, the directory
> /usr/share/pyshared is prepended to sys.path. If the script tries to import
> a module which has architecture-dependent parts (e.g. numpy) it first sees
> the incomplete module in /usr/share/pyshared and fails.
>
> This happens for example in parallel python
> (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=620551) and recently when
> packaging CellProfiler for Debian.
>
> Again, if this is on purpose, it should be documented.
>
> PEP 395 (Qualified Names for Modules)
> =====================================
>
> PEP 395 proposes another sys.path manipulation. When running a script, the
> directory tree will be walked upwards as long as there are __init__.py
> files, and then the first directory without one will be added.
>
> This is of course a fine idea, but it makes a scenario, which was previously
> safe, insecure. More precisely, when executing a script in a directory in a
> parent directory-writable-by-other-users, the parent directory will be added
> to sys.path.
>
> So the (safe) operation of downloading an archive with a package, unzipping
> it in /tmp, changing into the created directory, checking that the script
> doesn't do anything bad, and running a script is now insecure if there is
> __init__.py in the archive root.
>
>
> I guess that it would be useful to have an option to turn off those sys.path
> manipulations.


Thanks very much for the detailed explanation. Given this, I believe I
can safely give up on CellProfiler packaging until this issue is
addressed upstream (either in CellProfiler using an indirection, or in
python).
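
For anyone wanting to check the realpath behaviour described in the
quoted message, here is a minimal sketch. It assumes a POSIX system
where os.symlink works, and the function name is made up for
illustration. It runs a symlinked script and reports which directory
ends up as sys.path[0]:

```python
import os
import subprocess
import sys
import tempfile


def sys_path0_for_symlinked_script():
    """Run a script through a symlink; return (sys.path[0], real_dir, link_dir).

    All three paths are passed through os.path.realpath so they can be
    compared even when the temporary directory itself sits behind a symlink.
    """
    with tempfile.TemporaryDirectory() as tmp:
        real_dir = os.path.join(tmp, "real")
        link_dir = os.path.join(tmp, "link")
        os.mkdir(real_dir)
        os.mkdir(link_dir)
        script = os.path.join(real_dir, "script.py")
        with open(script, "w") as f:
            f.write("import sys\nprint(sys.path[0])\n")
        link = os.path.join(link_dir, "script.py")
        os.symlink(script, link)
        # Invoke the interpreter on the symlink, not on the real file.
        out = subprocess.check_output([sys.executable, link],
                                      universal_newlines=True)
        return (os.path.realpath(out.strip()),
                os.path.realpath(real_dir),
                os.path.realpath(link_dir))
```

If the first two values agree, the directory of the real file rather
than the directory of the symlink was prepended, which is the situation
the Debian pyshared layout runs into.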

Thanks,
-- 
Mathieu

From arigo at tunes.org  Tue Dec  6 10:55:58 2011
From: arigo at tunes.org (Armin Rigo)
Date: Tue, 6 Dec 2011 10:55:58 +0100
Subject: [Python-Dev] STM and python
In-Reply-To: <CAB4yi1NciyQ+ROLC4DW2iFM6A3f_zNF_aqeJpNobsf1Q+J8Zzg@mail.gmail.com>
References: <CAB4yi1ORAJhvASE_s9g3nJ69Be1sEwOGdRfYE62tQy1t7VMphw@mail.gmail.com>
	<CAPZV6o_Ud9-23NhOsP9sNy24LUXPRA05Ki2g3KwduZt8dNMwOg@mail.gmail.com>
	<CAB4yi1MMWVjsHoh0yLtVsQKZAiMRq_pqR9ykGSya=27+SFd60g@mail.gmail.com>
	<CAGE7PNKs4po2pYFGsw61JoAJaQ1g=YH_5wqR7VgRMaacn1BTuA@mail.gmail.com>
	<CADiSq7fiHBo43mChODT2cr7ydV9FB3APBW-x=cHvS95Nod3qLQ@mail.gmail.com>
	<CAB4yi1MvCu258S9zTHGRdAV3RbXVWeiCO-3vYpuZNfYBA=z4Hw@mail.gmail.com>
	<CAMSv6X2mntP6crK2L7v4_U-p+0+GwrBYdrLQ03Kcz7zZ36sKpA@mail.gmail.com>
	<CAB4yi1NciyQ+ROLC4DW2iFM6A3f_zNF_aqeJpNobsf1Q+J8Zzg@mail.gmail.com>
Message-ID: <CAMSv6X1mw1bj8RFSkV3u4dwP9G+tk3VQWAfpftk6L-tBuCT7tg@mail.gmail.com>

Hi,

Actually, not even one month ago, Intel announced that its processors
will offer Hardware Transactional Memory in 2013:

http://www.h-online.com/newsticker/news/item/Processor-Whispers-About-Haskell-and-Haswell-1389507.html

So yes, obviously, it's going to happen.


A bientôt,

Armin.

From anacrolix at gmail.com  Tue Dec  6 13:28:42 2011
From: anacrolix at gmail.com (Matt Joiner)
Date: Tue, 6 Dec 2011 23:28:42 +1100
Subject: [Python-Dev] STM and python
In-Reply-To: <CAMSv6X1mw1bj8RFSkV3u4dwP9G+tk3VQWAfpftk6L-tBuCT7tg@mail.gmail.com>
References: <CAB4yi1ORAJhvASE_s9g3nJ69Be1sEwOGdRfYE62tQy1t7VMphw@mail.gmail.com>
	<CAPZV6o_Ud9-23NhOsP9sNy24LUXPRA05Ki2g3KwduZt8dNMwOg@mail.gmail.com>
	<CAB4yi1MMWVjsHoh0yLtVsQKZAiMRq_pqR9ykGSya=27+SFd60g@mail.gmail.com>
	<CAGE7PNKs4po2pYFGsw61JoAJaQ1g=YH_5wqR7VgRMaacn1BTuA@mail.gmail.com>
	<CADiSq7fiHBo43mChODT2cr7ydV9FB3APBW-x=cHvS95Nod3qLQ@mail.gmail.com>
	<CAB4yi1MvCu258S9zTHGRdAV3RbXVWeiCO-3vYpuZNfYBA=z4Hw@mail.gmail.com>
	<CAMSv6X2mntP6crK2L7v4_U-p+0+GwrBYdrLQ03Kcz7zZ36sKpA@mail.gmail.com>
	<CAB4yi1NciyQ+ROLC4DW2iFM6A3f_zNF_aqeJpNobsf1Q+J8Zzg@mail.gmail.com>
	<CAMSv6X1mw1bj8RFSkV3u4dwP9G+tk3VQWAfpftk6L-tBuCT7tg@mail.gmail.com>
Message-ID: <CAB4yi1Nw1G+--dab_FGH6cAW565CB5s=5rsR64DL1vB2-xb9UQ@mail.gmail.com>

This is very interesting, cheers for the link.

On Tue, Dec 6, 2011 at 8:55 PM, Armin Rigo <arigo at tunes.org> wrote:
> Hi,
>
> Actually, not even one month ago, Intel announced that its processors
> will offer Hardware Transactional Memory in 2013:
>
> http://www.h-online.com/newsticker/news/item/Processor-Whispers-About-Haskell-and-Haswell-1389507.html
>
> So yes, obviously, it's going to happen.
>
>
> A bientôt,
>
> Armin.



-- 
?_?

From jaraco at jaraco.com  Tue Dec  6 23:34:07 2011
From: jaraco at jaraco.com (Jason R. Coombs)
Date: Tue, 6 Dec 2011 22:34:07 +0000
Subject: [Python-Dev] [Python-checkins] cpython (2.7): PDB now will
 properly escape backslashes in the names of modules it executes.
In-Reply-To: <4EC67559.90409@netwok.org>
References: <E1RRBlU-0003e2-FH@dinsdale.python.org> <4EC67559.90409@netwok.org>
Message-ID: <7E79234E600438479EC119BD241B48D6A246E8@CH1PRD0602MB098.namprd06.prod.outlook.com>

Éric, These are all good suggestions. I'll make them at some point.

Thanks.
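
A sketch of how the test might look with those suggestions applied. The
class and method names are placeholders, and it assumes pdb echoes the
script's filename when it starts, which has not been checked on Windows
here:

```python
import os
import subprocess
import sys
import unittest


class PdbBackslashFilenameTests(unittest.TestCase):
    # A filename containing something that looks like a Python escape
    # sequence (here "\t") once the path separator is a backslash.
    test_fn = '.\\test7750.py'

    @unittest.skipUnless(os.path.sep == '\\',
                         "issue7750 only applies when os.sep is a backslash")
    def test_script_with_backslash_in_name_is_run(self):
        with open(self.test_fn, 'w') as f:
            f.write('print("hello world")')
        # Register the cleanup right next to the file creation.
        self.addCleanup(os.remove, self.test_fn)
        proc = subprocess.Popen(
            [sys.executable, '-m', 'pdb', self.test_fn],
            stdout=subprocess.PIPE,
            stdin=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        stdout, _ = proc.communicate('quit\n')
        # Check for the intended behaviour (pdb found and loaded the
        # file) rather than for the absence of the old IOError.
        self.assertIn('test7750.py', stdout, "pdb munged the filename")
```

On platforms where os.sep is not a backslash the test is simply skipped.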

> -----Original Message-----
> From: python-dev-bounces+jaraco=jaraco.com at python.org
> [mailto:python-dev-bounces+jaraco=jaraco.com at python.org] On Behalf Of Éric Araujo
> Sent: Friday, 18 November, 2011 10:10
> To: python-dev at python.org
> Subject: Re: [Python-Dev] [Python-checkins] cpython (2.7): PDB now will
> properly escape backslashes in the names of modules it executes.
> 
> Hi Jason,
> 
> > http://hg.python.org/cpython/rev/f7dd5178f36a
> > branch:      2.7
> > user:        Jason R. Coombs <jaraco at jaraco.com>
> > date:        Thu Nov 17 18:03:24 2011 -0500
> > summary:
> >   PDB now will properly escape backslashes in the names of modules it
> > executes. Fixes #7750
> 
> > diff --git a/Lib/test/test_pdb.py b/Lib/test/test_pdb.py
> > +class Tester7750(unittest.TestCase):
> I think we have an unwritten rule that test class and method names should
> tell something about what they test.  (We do have things like TestWeirdBugs
> and test_12345, but I don't think it's a useful pattern to follow :)  Not a big
> deal anyway.
> 
> > +    # if the filename has something that resolves to a python
> > +    #  escape character (such as \t), it will fail
> > +    test_fn = '.\\test7750.py'
> > +
> > +    msg = "issue7750 only applies when os.sep is a backslash"
> > +    @unittest.skipUnless(os.path.sep == '\\', msg)
> > +    def test_issue7750(self):
> > +        with open(self.test_fn, 'w') as f:
> > +            f.write('print("hello world")')
> > +        cmd = [sys.executable, '-m', 'pdb', self.test_fn,]
> > +        proc = subprocess.Popen(cmd,
> > +            stdout=subprocess.PIPE,
> > +            stdin=subprocess.PIPE,
> > +            stderr=subprocess.STDOUT,
> > +            )
> > +        stdout, stderr = proc.communicate('quit\n')
> > +        self.assertNotIn('IOError', stdout, "pdb munged the filename")
> Why not check for assertIn(filename, stdout)?  (In other words, check for
> intended behavior rather than implementation of the erstwhile bug.)
> 
> BTW, I've just tested that when giving a message argument to assertNotIn (the
> third argument), unittest still displays the other arguments to allow for easier
> debugging.  I didn't know that, it's cool!
> 
> > +    def tearDown(self):
> > +        if os.path.isfile(self.test_fn):
> > +            os.remove(self.test_fn)
> In my own tests, I've become fond of using "self.addCleanup(os.remove,
> filename)": it's shorter than a tearDown and is right there on the line that
> follows or precedes the file creation.
> 
> >  if __name__ == '__main__':
> >      test_main()
> > +    unittest.main()
> This looks strange.
> 
> Regards

From cs at zip.com.au  Wed Dec  7 02:23:12 2011
From: cs at zip.com.au (Cameron Simpson)
Date: Wed, 7 Dec 2011 12:23:12 +1100
Subject: [Python-Dev] Warnings
In-Reply-To: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
Message-ID: <20111207012312.GA7566@cskk.homeip.net>

On 30Nov2011 22:10, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:
| When updating the documentation, please don't go overboard with warnings.
| The docs need to be worded affirmatively -- say what a tool does and show how to use it correctly.
| See http://docs.python.org/documenting/style.html#affirmative-tone

I come to this late, but if we're going after the docs...

At the above link one finds this text:

  This assures that files are flushed [...]

It does not. It _ensures_ that files are flushed. The doco style "affirmative
tone" _assures_. The coding practice _ensures_!

Pedanticly,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

There is one evil which...should never be passed over in silence but be
continually publicly attacked, and that is corruption of the language...
        - W.H. Auden

From raymond.hettinger at gmail.com  Wed Dec  7 07:40:31 2011
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Wed, 7 Dec 2011 00:40:31 -0600
Subject: [Python-Dev] Warnings
In-Reply-To: <20111207012312.GA7566@cskk.homeip.net>
References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
	<20111207012312.GA7566@cskk.homeip.net>
Message-ID: <9B227A4E-9788-4E6D-B415-0DF7CED47455@gmail.com>


On Dec 6, 2011, at 7:23 PM, Cameron Simpson wrote:

>  This assures that files are flushed [...]
> 
> It does not. It _ensures_ that files are flushed. The doco style "affirmative
> tone" _assures_. The coding practice _ensures_!
> 
> Pedanticly,
> -- 
> Cameron Simpson 

I can assure you that I've ensured that you're fully insured ;-)

Raymond

From g.brandl at gmx.net  Wed Dec  7 19:22:56 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 07 Dec 2011 19:22:56 +0100
Subject: [Python-Dev] Warnings
In-Reply-To: <20111207012312.GA7566@cskk.homeip.net>
References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
	<20111207012312.GA7566@cskk.homeip.net>
Message-ID: <jboatu$4vl$1@dough.gmane.org>

Am 07.12.2011 02:23, schrieb Cameron Simpson:
> On 30Nov2011 22:10, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:
> | When updating the documentation, please don't go overboard with warnings.
> | The docs need to be worded affirmatively -- say what a tool does and show how to use it correctly.
> | See http://docs.python.org/documenting/style.html#affirmative-tone
> 
> I come to this late, but if we're going after the docs...
> 
> At the above link one finds this text:
> 
>   This assures that files are flushed [...]
> 
> It does not. It _ensures_ that files are flushed. The doco style "affirmative
> tone" _assures_. The coding practice _ensures_!
> 
> Pedanticly,

Oh, come on, surely this doesn't effect the casual reader?

Georg


From martin at v.loewis.de  Wed Dec  7 19:33:57 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 07 Dec 2011 19:33:57 +0100
Subject: [Python-Dev] [Python-checkins] cpython (2.7): PDB now will
 properly escape backslashes in the names of modules it executes.
In-Reply-To: <4EC67559.90409@netwok.org>
References: <E1RRBlU-0003e2-FH@dinsdale.python.org> <4EC67559.90409@netwok.org>
Message-ID: <4EDFB195.60607@v.loewis.de>

> I think we have an unwritten rule that test class and method names
> should tell something about what they test.  (We do have things like
> TestWeirdBugs and test_12345, but I don't think it's a useful pattern to
> follow :)

I completely disagree. test_12345 is a very good name for a test case,
in particular if it tests the value of a tau constant in the math
module. There can't be any more precise documentation of the test
purpose.

Regards,
Martin

From steve at holdenweb.com  Wed Dec  7 19:40:56 2011
From: steve at holdenweb.com (Steve Holden)
Date: Wed, 7 Dec 2011 10:40:56 -0800
Subject: [Python-Dev] Python Best Again
Message-ID: <48E6CE91-AA36-427D-A1C5-FFC4B9A4690E@holdenweb.com>

I've just added a news item to the python.org home page noting that Linux Journal readers have voted Python the Best Programming Language for the third year in a row.

This is excellent news, though I find it hard to believe that coming up on the outside we see C++. While it demonstrates that Linux Journal readers like object-oriented programming, it shows an uncomfortable tendency towards masochism :) and implies we can't necessarily trust their judgment. ;-)

Attempted humor aside, here I am taking the opportunity as PSF chairman to say a big "thank you" to all developers and everyone else who helps to keep putting out releases that gain the kind of popularity that this most recent vote indicates. I know we do it to create a great programming environment, not for popularity, but the Foundation's mission involves encouraging the growth of the international Python community. Please pass this on to other members of your developer community who may not receive this message directly.

Seriously, thanks. Having quality releases of a great language really does make it easier to promote Python!

regards
 Steve
-- 
Steve Holden steve at holdenweb.com,  Holden Web, LLC http://holdenweb.com/
Python classes (and much more) through the web http://oreillyschool.com/




From massimo.dipierro at gmail.com  Wed Dec  7 19:45:31 2011
From: massimo.dipierro at gmail.com (Massimo Di Pierro)
Date: Wed, 7 Dec 2011 12:45:31 -0600
Subject: [Python-Dev] [PSF-Members] Python Best Again
In-Reply-To: <48E6CE91-AA36-427D-A1C5-FFC4B9A4690E@holdenweb.com>
References: <48E6CE91-AA36-427D-A1C5-FFC4B9A4690E@holdenweb.com>
Message-ID: <37B986D5-DE20-473F-A438-D99AFB7FF7C4@gmail.com>

Hello Steve,

congratulations to all of you in the foundation who work hard to make Python the success that it is.

Massimo

On Dec 7, 2011, at 12:40 PM, Steve Holden wrote:

> I've just added a news item to the python.org home page noting that Linux Journal readers have voted Python the Best Programming Language for the third year in a row.
> 
> This is excellent news, though I find it hard to believe that coming up on the outside we see C++. While it demonstrates that Linux Journal readers like object-oriented programming, it shows an uncomfortable tendency towards masochism :) and implies we can't necessarily trust their judgment. ;-)
> 
> Attempted humor aside, here I am taking the opportunity as PSF chairman to say a big "thank you" to all developers and everyone else who helps to keep putting out releases that gain the kind of popularity that this most recent vote indicates. I know we do it to create a great programming environment, not for popularity, but the Foundation's mission involves encouraging the growth of the international Python community. Please pass this on to other members of your developer community who may not receive this message directly.
> 
> Seriously, thanks. Having quality releases of a great language really does make it easier to promote Python!
> 
> regards
> Steve
> -- 
> Steve Holden steve at holdenweb.com,  Holden Web, LLC http://holdenweb.com/
> Python classes (and much more) through the web http://oreillyschool.com/
> 
> 
> 
> _______________________________________________
> PSF-Members mailing list
> PSF-Members at python.org
> http://mail.python.org/mailman/listinfo/psf-members


From ethan at stoneleaf.us  Wed Dec  7 20:00:41 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 07 Dec 2011 11:00:41 -0800
Subject: [Python-Dev] Warnings
In-Reply-To: <jboatu$4vl$1@dough.gmane.org>
References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>	<20111207012312.GA7566@cskk.homeip.net>
	<jboatu$4vl$1@dough.gmane.org>
Message-ID: <4EDFB7D9.6010206@stoneleaf.us>

Georg Brandl wrote:
> Am 07.12.2011 02:23, schrieb Cameron Simpson:
>> On 30Nov2011 22:10, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:
>> | When updating the documentation, please don't go overboard with warnings.
>> | The docs need to be worded affirmatively -- say what a tool does and show how to use it correctly.
>> | See http://docs.python.org/documenting/style.html#affirmative-tone
>>
>> I come to this late, but if we're going after the docs...
>>
>> At the above link one finds this text:
>>
>>   This assures that files are flushed [...]
>>
>> It does not. It _ensures_ that files are flushed. The doco style "affirmative
>> tone" _assures_. The coding practice _ensures_!
>>
>> Pedanticly,
> 
> Oh, come on, surely this doesn't effect the casual reader?

No, of course not -- although it might /affect/ said reader by causing 
him/her to think, "I don't think that word means what you think it 
means..."  ;)

Seriously, it's best to use the correct words with the correct meanings. 
  If someone is willing to fix it, let them.

~Ethan~

From wolfson at gmail.com  Wed Dec  7 21:01:52 2011
From: wolfson at gmail.com (Ben Wolfson)
Date: Wed, 7 Dec 2011 12:01:52 -0800
Subject: [Python-Dev] Warnings
In-Reply-To: <4EDFB7D9.6010206@stoneleaf.us>
References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
	<20111207012312.GA7566@cskk.homeip.net>
	<jboatu$4vl$1@dough.gmane.org> <4EDFB7D9.6010206@stoneleaf.us>
Message-ID: <CAPc-aXk=pMwgU2TOVvo5fvZM9qdRZeO9pt0_BRPaO9Z1vYT4DA@mail.gmail.com>

On Wed, Dec 7, 2011 at 11:00 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
>
> No, of course not -- although it might /affect/ said reader by causing
> him/her to think, "I don't think that word means what you think it means..."
> ;)
>
> Seriously, it's best to use the correct words with the correct meanings.  If
> someone is willing to fix it, let them.

I'm sure this hypothetical reader will then look "assure" up in the
OED and find this:

 5. To make certain the occurrence or arrival of (an event); to ensure.

-- 
Ben Wolfson
"Human kind has used its intelligence to vary the flavour of drinks,
which may be sweet, aromatic, fermented or spirit-based. ... Family
and social life also offer numerous other occasions to consume drinks
for pleasure." [Larousse, "Drink" entry]

From ben+python at benfinney.id.au  Wed Dec  7 21:15:18 2011
From: ben+python at benfinney.id.au (Ben Finney)
Date: Thu, 08 Dec 2011 07:15:18 +1100
Subject: [Python-Dev] Warnings
References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
	<20111207012312.GA7566@cskk.homeip.net> <jboatu$4vl$1@dough.gmane.org>
Message-ID: <87k4682l61.fsf@benfinney.id.au>

Georg Brandl <g.brandl at gmx.net> writes:

> Am 07.12.2011 02:23, schrieb Cameron Simpson:
> >   This assures that files are flushed [...]
> > 
> > It does not. It _ensures_ that files are flushed. The doco style
> > "affirmative tone" _assures_. The coding practice _ensures_!
>
> Oh, come on, surely this doesn't effect the casual reader?

Some readers could of been confused irregardless.

-- 
 \       "We must find our way to a time when faith, without evidence, |
  `\    disgraces anyone who would claim it." -Sam Harris, _The End of |
_o__)                                                     Faith_, 2004 |
Ben Finney


From tseaver at palladion.com  Wed Dec  7 21:16:24 2011
From: tseaver at palladion.com (Tres Seaver)
Date: Wed, 07 Dec 2011 15:16:24 -0500
Subject: [Python-Dev] Warnings
In-Reply-To: <jboatu$4vl$1@dough.gmane.org>
References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
	<20111207012312.GA7566@cskk.homeip.net>
	<jboatu$4vl$1@dough.gmane.org>
Message-ID: <jbohio$mq4$1@dough.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/07/2011 01:22 PM, Georg Brandl wrote:
> Am 07.12.2011 02:23, schrieb Cameron Simpson:
>> On 30Nov2011 22:10, Raymond Hettinger <raymond.hettinger at gmail.com>
>> wrote: | When updating the documentation, please don't go overboard
>> with warnings. | The docs need to be worded affirmatively -- say
>> what a tool does and show how to use it correctly. | See
>> http://docs.python.org/documenting/style.html#affirmative-tone
>> 
>> I come to this late, but if we're going after the docs...
>> 
>> At the above link one finds this text:
>> 
>> This assures that files are flushed [...]
>> 
>> It does not. It _ensures_ that files are flushed. The doco style
>> "affirmative tone" _assures_. The coding practice _ensures_!
>> 
>> Pedanticly,
> 
> Oh, come on, surely this doesn't effect the casual reader?

/me presumes an ironic mispeling there. ;)


Tres.
- -- 
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7fyZgACgkQ+gerLs4ltQ5eaQCeL+E4CVxa1BWhm/MsPw29u/Ym
QnUAoKBOY37dNA9aT5TZkv4hu9ixZjBn
=jg86
-----END PGP SIGNATURE-----


From victor.stinner at haypocalc.com  Thu Dec  8 02:43:40 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 08 Dec 2011 02:43:40 +0100
Subject: [Python-Dev] Reject characters bigger than U+10FFFF and Solaris
	issues
Message-ID: <1504453.f4XqDVp2GQ@ned>

Hi,

I would like to deny the creation of a Unicode string containing characters 
outside the range [U+0000; U+10FFFF]. The check is already present in some 
places (e.g. the builtin chr() function), but not everywhere. The last 
important function is PyUnicode_FromWideChar, the function used to decode text 
from the OS.
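
As a rough illustration of the range check described above, here is a minimal Python sketch. The names MAX_CODE_POINT and is_valid_code_point() are made up for this example and are not part of CPython; only the behaviour of the builtin chr() is taken as given.

```python
MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode code space


def is_valid_code_point(value):
    """Return True if value lies in the range [U+0000; U+10FFFF]."""
    # Hypothetical helper, written only to illustrate the check.
    return 0 <= value <= MAX_CODE_POINT


# The builtin chr() already enforces the range and raises ValueError
# for anything above U+10FFFF.
try:
    chr(MAX_CODE_POINT + 1)
except ValueError as exc:
    print("chr() rejected U+110000:", exc)

# A value such as 0x30000020 (the Solaris localeconv() result described
# further down in this message) falls outside the range.
print(is_valid_code_point(0x30000020))
```

The point of the proposal is that the same check would apply on every path that creates a string, not only in chr().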

The problem is that test_locale fails on Solaris with such checks. I would 
like to know how to handle Solaris issues. One possible solution is to not 
handle issues, and just raise exceptions and skip the failing tests on Solaris 
;-) Another solution is to modify locale.strxfrm() on all platforms to return 
a list of int, instead of a str. The type of the result is not really 
important, we just have to be able to compare two results (equal, greater, 
lesser or equal, etc.). Another solution?

--

The two Solaris issues:

 - in the hu_HU locale, localeconv() returns U+30000020 for the thousands 
separator 
 - locale.strxfrm() calls wcsxfrm() which returns characters in the range 
[0x1000000; 0x1FFFFFF]

For localeconv(), it is the b'\xA0' byte string decoded from an encoding 
looking like ISO-8859-?? (b'\xA0' is not decodable from UTF-8). It looks like 
a bug in the decoder. It also looks like OpenIndiana doesn't use ISO-8859 
locale anymore, only UTF-8 locales (which is much better!). I'm unable to 
reproduce the issue on my OpenIndiana VM.
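
The decoding claim can be checked from Python with a short sketch. The choice of ISO-8859-2 is an assumption made for the example (the message only says the encoding looks like ISO-8859-??); the UTF-8 failure does not depend on that choice.

```python
raw = b'\xa0'

# Assumption: ISO-8859-2 stands in for the unspecified ISO-8859 variant.
# In that encoding the byte 0xA0 is NO-BREAK SPACE (U+00A0).
text = raw.decode('iso-8859-2')
print(hex(ord(text)))

# A lone 0xA0 byte is not valid UTF-8, so decoding raises an error.
try:
    raw.decode('utf-8')
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc
print(utf8_error)
```

A correct decoder would therefore be expected to yield U+00A0 here, not U+30000020.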

For wcsxfrm(), I'm not sure of the range. Example of a result: {0x1010163, 
0x1010101, 0x1010103, 0x1010101, 0x1010103, 0x1010101, 0x1010101}. It looks 
like wcsxfrm() takes the result of strxfrm(), groups the bytes three at a 
time, and adds 0x1000000 to each group. Example of strxfrm() output for the 
same input: {0x01, 0x01, 0x63, 0x01, 0x01, 0x01, ...}.
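
The guessed relationship between the two outputs can be written down as a small Python sketch. The function name group_bytes() is hypothetical, and the big-endian packing is inferred from the sample values above, not taken from the Solaris sources.

```python
def group_bytes(strxfrm_bytes, offset=0x1000000):
    """Pack each run of three bytes into one integer and add the offset."""
    # Hypothetical reconstruction of what Solaris wcsxfrm() appears to do.
    if len(strxfrm_bytes) % 3:
        raise ValueError("expected a multiple of three bytes")
    return [
        offset + int.from_bytes(strxfrm_bytes[i:i + 3], "big")
        for i in range(0, len(strxfrm_bytes), 3)
    ]


# First six bytes of the strxfrm() sample output quoted above.
sample = bytes([0x01, 0x01, 0x63, 0x01, 0x01, 0x01])
print([hex(value) for value in group_bytes(sample)])  # ['0x1010163', '0x1010101']
```

The two resulting values match the first two elements of the wcsxfrm() sample, which is consistent with the guess but does not prove it.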

See http://bugs.python.org/issue13441 for more information.

Victor

From stephen at xemacs.org  Thu Dec  8 03:13:30 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 08 Dec 2011 11:13:30 +0900
Subject: [Python-Dev] Warnings
In-Reply-To: <jboatu$4vl$1@dough.gmane.org>
References: <17CC15CD-539C-4214-ADD5-E85322259C64@gmail.com>
	<20111207012312.GA7566@cskk.homeip.net>
	<jboatu$4vl$1@dough.gmane.org>
Message-ID: <87d3bz24l1.fsf@uwakimon.sk.tsukuba.ac.jp>

Georg Brandl writes:

 > Oh, come on, surely this doesn't effect the casual reader?

Casual readers aren't effective in any case; you want to hear the
opinions of those who care.


From chrism at plope.com  Thu Dec  8 06:08:39 2011
From: chrism at plope.com (Chris McDonough)
Date: Thu, 08 Dec 2011 00:08:39 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
Message-ID: <1323320919.2710.24.camel@thinko>

On the heels of Armin's blog post about the troubles of making the same
codebase run on both Python 2 and Python 3, I have a concrete
suggestion.

It would help a lot for code that straddles both Py2 and Py3 to be able
to make use of u'' literals.  It would seem to be an easy thing to
reenable (see
http://www.reddit.com/r/Python/comments/n3q7q/thoughts_on_python_3_armin_ronachers_thoughts_and/c36397t ) .  It would seem to cost very little in terms of maintenance, and not much in docs.

It would make it possible to share code like this across py2 and py3:

   a = u'foo'

Instead of (with e.g. six):

   a = u('foo')

Or:

   from __future__ import unicode_literals
   a = 'foo'

I recognize that the last option is probably the way "it's meant to be
done", but in reality it's just more practical to not fail when literal
notation is more specific than strictly necessary.
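
For readers who have not seen the u() approach, here is a minimal sketch of such a wrapper. It is written in the spirit of the helper in six but is not copied from it; the name u and the use of the "unicode_escape" codec are assumptions of this example.

```python
import sys

# Hypothetical compatibility helper, similar in spirit to six.u().
if sys.version_info[0] >= 3:
    def u(text):
        """On Python 3 a plain string literal is already text."""
        return text
else:
    def u(text):
        """On Python 2 decode the byte-string literal to unicode."""
        return text.decode("unicode_escape")

a = u('foo')
print(repr(a))
```

Every literal wrapped this way costs a function call at run time, which is the overhead a plain u'foo' literal would avoid.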

- C



From benjamin at python.org  Thu Dec  8 07:02:22 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 8 Dec 2011 01:02:22 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323320919.2710.24.camel@thinko>
References: <1323320919.2710.24.camel@thinko>
Message-ID: <CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>

2011/12/8 Chris McDonough <chrism at plope.com>:
> On the heels of Armin's blog post about the troubles of making the same
> codebase run on both Python 2 and Python 3, I have a concrete
> suggestion.
>
> It would help a lot for code that straddles both Py2 and Py3 to be able
> to make use of u'' literals.

Helpful or not helpful, I think that ship has sailed. The earliest it
could see the light of day is 3.3, which would leave people trying to
support 3.1 and 3.2 in a bind.


-- 
Regards,
Benjamin

From chrism at plope.com  Thu Dec  8 07:10:44 2011
From: chrism at plope.com (Chris McDonough)
Date: Thu, 08 Dec 2011 01:10:44 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
Message-ID: <1323324644.2710.28.camel@thinko>

On Thu, 2011-12-08 at 01:02 -0500, Benjamin Peterson wrote:
> 2011/12/8 Chris McDonough <chrism at plope.com>:
> > On the heels of Armin's blog post about the troubles of making the same
> > codebase run on both Python 2 and Python 3, I have a concrete
> > suggestion.
> >
> > It would help a lot for code that straddles both Py2 and Py3 to be able
> > to make use of u'' literals.
> 
> Helpful or not helpful, I think that ship has sailed. The earliest it
> could see the light of day is 3.3, which would leave people trying to
> support 3.1 and 3.2 in a bind.

Right.. the title does say "readd ... support in 3.3".  Are you
suggesting "the ship has sailed" for eternity because it can't be
supported in Python < 3.3?

- C



From benjamin at python.org  Thu Dec  8 07:18:06 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 8 Dec 2011 01:18:06 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323324644.2710.28.camel@thinko>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
Message-ID: <CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>

2011/12/8 Chris McDonough <chrism at plope.com>:
> On Thu, 2011-12-08 at 01:02 -0500, Benjamin Peterson wrote:
>> 2011/12/8 Chris McDonough <chrism at plope.com>:
>> > On the heels of Armin's blog post about the troubles of making the same
>> > codebase run on both Python 2 and Python 3, I have a concrete
>> > suggestion.
>> >
>> > It would help a lot for code that straddles both Py2 and Py3 to be able
>> > to make use of u'' literals.
>>
>> Helpful or not helpful, I think that ship has sailed. The earliest it
>> could see the light of day is 3.3, which would leave people trying to
>> support 3.1 and 3.2 in a bind.
>
> Right.. the title does say "readd ... support in 3.3".  Are you
> suggesting "the ship has sailed" for eternity because it can't be
> supported in Python < 3.3?

I'm questioning the real utility of it.


-- 
Regards,
Benjamin

From chrism at plope.com  Thu Dec  8 07:31:56 2011
From: chrism at plope.com (Chris McDonough)
Date: Thu, 08 Dec 2011 01:31:56 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
Message-ID: <1323325916.2710.39.camel@thinko>

On Thu, 2011-12-08 at 01:18 -0500, Benjamin Peterson wrote:
> 2011/12/8 Chris McDonough <chrism at plope.com>:
> > On Thu, 2011-12-08 at 01:02 -0500, Benjamin Peterson wrote:
> >> 2011/12/8 Chris McDonough <chrism at plope.com>:
> >> > On the heels of Armin's blog post about the troubles of making the same
> >> > codebase run on both Python 2 and Python 3, I have a concrete
> >> > suggestion.
> >> >
> >> > It would help a lot for code that straddles both Py2 and Py3 to be able
> >> > to make use of u'' literals.
> >>
> >> Helpful or not helpful, I think that ship has sailed. The earliest it
> >> could see the light of day is 3.3, which would leave people trying to
> >> support 3.1 and 3.2 in a bind.
> >
> > Right.. the title does say "readd ... support in 3.3".  Are you
> > suggesting "the ship has sailed" for eternity because it can't be
> > supported in Python < 3.3?
> 
> I'm questioning the real utility of it.

All I can really offer is my own experience here based on writing code
that needs to straddle Python 2.5, 2.6, 2.7 and 3.2 without use of 2to3.
Having u'' work across all of these would mean porting would not require
as much eyeballing as code modified via "from __future__ import
unicode_literals", it would let more code work on 2.5 unchanged, and the
resulting code would execute faster than code that required us to use a
u() function.

What's the case against?

- C




From ncoghlan at gmail.com  Thu Dec  8 08:33:29 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 8 Dec 2011 17:33:29 +1000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323325916.2710.39.camel@thinko>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
Message-ID: <CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>

Such code still won't work on 3.2, hence restoring the redundant notation
would be ultimately pointless.

--
Nick Coghlan (via Gmail on Android, so likely to be more terse than usual)
On Dec 8, 2011 4:34 PM, "Chris McDonough" <chrism at plope.com> wrote:

> On Thu, 2011-12-08 at 01:18 -0500, Benjamin Peterson wrote:
> > 2011/12/8 Chris McDonough <chrism at plope.com>:
> > > On Thu, 2011-12-08 at 01:02 -0500, Benjamin Peterson wrote:
> > >> 2011/12/8 Chris McDonough <chrism at plope.com>:
> > >> > On the heels of Armin's blog post about the troubles of making the
> same
> > >> > codebase run on both Python 2 and Python 3, I have a concrete
> > >> > suggestion.
> > >> >
> > >> > It would help a lot for code that straddles both Py2 and Py3 to be
> able
> > >> > to make use of u'' literals.
> > >>
> > >> Helpful or not helpful, I think that ship has sailed. The earliest it
> > >> could see the light of day is 3.3, which would leave people trying to
> > >> support 3.1 and 3.2 in a bind.
> > >
> > > Right.. the title does say "readd ... support in 3.3".  Are you
> > > suggesting "the ship has sailed" for eternity because it can't be
> > > supported in Python < 3.3?
> >
> > I'm questioning the real utility of it.
>
> All I can really offer is my own experience here based on writing code
> that needs to straddle Python 2.5, 2.6, 2.7 and 3.2 without use of 2to3.
> Having u'' work across all of these would mean porting would not require
> as much eyeballing as code modified via "from future import
> unicode_literals", it would let more code work on 2.5 unchanged, and the
> resulting code would execute faster than code that required us to use a
> u() function.
>
> What's the case against?
>
> - C
>
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com
>

From chrism at plope.com  Thu Dec  8 08:45:08 2011
From: chrism at plope.com (Chris McDonough)
Date: Thu, 08 Dec 2011 02:45:08 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
Message-ID: <1323330308.2710.52.camel@thinko>

On Thu, 2011-12-08 at 17:33 +1000, Nick Coghlan wrote:
> Such code still won't work on 3.2, hence restoring the redundant
> notation would be ultimately pointless. 

None of the code I've written which straddles Python 2/3 supports
anything except Python 3.2+, and likewise I expect that for the next
crop of porters/straddlers, their code won't support anything but Python
3.3+.  So there is a point, which is to make it easier for people to
port code that can straddle the most recent Python 3 release as well as
2.7/2.6.

In that context, I don't see much relevance of having no support for u''
in Python 3.2.

- C



From lukasz at langa.pl  Thu Dec  8 08:54:18 2011
From: lukasz at langa.pl (Łukasz Langa)
Date: Thu, 8 Dec 2011 08:54:18 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323320919.2710.24.camel@thinko>
References: <1323320919.2710.24.camel@thinko>
Message-ID: <CBCA20F2-2E8A-454C-8144-691880948FA7@langa.pl>


On 8 Dec 2011, at 06:08, Chris McDonough wrote:

> It would make it possible to share code like this across py2 and py3:
> 
>   a = u'foo'
> 

As Armin himself wrote, py3k-compatible code ported from 2.x is often very ugly. This kind of change would only deepen the problem.

-1


> Or:
> 
>   from __future__ import unicode_literals
>   a = 'foo'
> 
> I recognize that the last option is probably the way "its meant to be
> done"

Yes, that's the reason 2.x has b''. If Python 2.8 ever came to be, making this __future__ work with the standard library would be the right way to do it.

-- 
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.


Please consider the environment before printing out this e-mail.


From stefan at bytereef.org  Thu Dec  8 10:17:52 2011
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 8 Dec 2011 10:17:52 +0100
Subject: [Python-Dev] Reject characters bigger than U+10FFFF and	Solaris
	issues
In-Reply-To: <1504453.f4XqDVp2GQ@ned>
References: <1504453.f4XqDVp2GQ@ned>
Message-ID: <20111208091752.GA29901@sleipnir.bytereef.org>

Victor Stinner <victor.stinner at haypocalc.com> wrote:
> For localeconv(), it is the b'\xA0' byte string decoded from an encoding 
> looking like ISO-8859-?? (b'\xA0' is not decodable from UTF-8). It looks like 
> a bug in the decoder. It also looks like OpenIndiana doesn't use ISO-8859 
> locale anymore, only UTF-8 locales (which is much better!). I'm unable to 
> reproduce the issue on my OpenIndiana VM.

I think that b'\xA0' is a valid thousands separator. The 'fi_FI' locale also
uses that. Decimal.__format__() has to handle the 'n' specifier, which takes the
thousands separator directly from localeconv(). Currently I have this horrible
function to deal with the problem:

/* Convert decimal_point or thousands_sep, which may be multibyte or in
   the range [128, 255], to a UTF8 string. */
static PyObject *
dotsep_as_utf8(const char *s)
{
        PyObject *utf8;
        PyObject *tmp;
        wchar_t buf[2];
        size_t n;

        n = mbstowcs(buf, s, 2);
        if (n != 1) { /* Issue #7442 */
                PyErr_SetString(PyExc_ValueError,
                    "invalid decimal point or unsupported "
                    "combination of LC_CTYPE and LC_NUMERIC");
                return NULL;
        }
        tmp = PyUnicode_FromWideChar(buf, n);
        if (tmp == NULL) {
                return NULL;
        }
        utf8 = PyUnicode_AsUTF8String(tmp);
        Py_DECREF(tmp);
        return utf8;
}


The main issue is that there is no portable function mbst_to_utf8()
that uses the current locale. If possible, it would be great to have
such a thing in the C-API.

I'm not sure why the b'\xA0' problem only occurs in Solaris. Many systems
have this thousands separator.



Stefan Krah



From stefan at bytereef.org  Thu Dec  8 10:42:31 2011
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 8 Dec 2011 10:42:31 +0100
Subject: [Python-Dev] Reject characters bigger than U+10FFFF and	Solaris
	issues
In-Reply-To: <20111208091752.GA29901@sleipnir.bytereef.org>
References: <1504453.f4XqDVp2GQ@ned>
	<20111208091752.GA29901@sleipnir.bytereef.org>
Message-ID: <20111208094231.GA30187@sleipnir.bytereef.org>

Stefan Krah <stefan at bytereef.org> wrote:
> I'm not sure why the b'\xA0' problem only occurs in Solaris. Many systems
> have this thousands separator.

Are LC_CTYPE and LC_NUMERIC set to the same value on the buildbot? Otherwise
you encounter http://bugs.python.org/issue7442 .


Stefan Krah



From tjreedy at udel.edu  Thu Dec  8 11:54:28 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 08 Dec 2011 05:54:28 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323325916.2710.39.camel@thinko>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
Message-ID: <jbq51g$dpl$1@dough.gmane.org>

On 12/8/2011 1:31 AM, Chris McDonough wrote:

> What's the case against?

 From a 3.x perspective, an irrelevant 'u' would be pure noise and make 
the language a bit harder to learn. The intent for 3.x is that one be 
able to learn 3.x without knowing anything about 2.x. So bridge stuff 
has been put into 2.6 and even more in 2.7. But it does not really 
belong in 3.x.

-- 
Terry Jan Reedy


From vinay_sajip at yahoo.co.uk  Thu Dec  8 12:01:49 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Thu, 8 Dec 2011 11:01:49 +0000 (UTC)
Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<1323330308.2710.52.camel@thinko>
Message-ID: <loom.20111208T115357-170@post.gmane.org>

Chris McDonough <chrism <at> plope.com> writes:

> 
> In that context, I don't see much relevance of having no support for u''
> in Python 3.2.
> 

Well, if 3.2 remains in use for a longish time, then it is relevant, in the
broader context, isn't it?  We know how conservative Linux distributions can be
with their Python releases - although most are still releasing 2.x as their
system Python, this could change at some point in the future. Even if it
doesn't, there might be a fair user base of people stuck with 3.2 for any number
of reasons, and to support them, the change you propose won't help, because some
variant of a package will still have to use u() and b(), just for 3.2 support.

I'm not arguing against your proposed change itself - just against your point
about the relevance of 3.2.

Regards,

Vinay Sajip


From stephan.richter at gmail.com  Thu Dec  8 12:05:51 2011
From: stephan.richter at gmail.com (Stephan Richter)
Date: Thu, 08 Dec 2011 06:05:51 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
Message-ID: <5242067.5aBSYdFaIB@einstein>

On Thursday, December 08, 2011 01:18:06 AM Benjamin Peterson wrote:
> > Right.. the title does say "readd ... support in 3.3".  Are you
> > suggesting "the ship has sailed" for eternity because it can't be
> > supported in Python < 3.3?
> 
> I'm questioning the real utility of it.

The real utility is to make it possible to port libraries to Py3 or at least 
make it a lot easier. It is somewhat naive to think that you can just tell 
everyone to upgrade to Python 2.7 and then use the future import. Having to 
change all that code can also be a big bug magnet.

Chris has been a great champion of bringing the Web app community closer to 
Python 3. His experience with porting code is pretty extensive especially in 
keeping it compatible with older Python 2 versions (down to 2.5).

If the Python Devs want more adoption of Python 3, they should at least throw 
a bone from time to time and make adoption a bit easier. The arguments against 
this proposal seem academic and purist to me. (Mmh, I cannot believe I just 
wrote that having been accused of that myself in the past.)

Regards,
Stephan
-- 
Entrepreneur and Software Geek
Google me. "Zope Stephan Richter"

From anacrolix at gmail.com  Thu Dec  8 12:08:17 2011
From: anacrolix at gmail.com (Matt Joiner)
Date: Thu, 8 Dec 2011 22:08:17 +1100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
Message-ID: <CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>

Nobody is using 3 yet ;)

Sure, I use it for some personal projects, and other people pretend to
support it. Not really.

The worst of the pain in porting to Python 3000 has yet to even begin!

On Thu, Dec 8, 2011 at 6:33 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Such code still won't work on 3.2, hence restoring the redundant notation
> would be ultimately pointless.
>
> --
> Nick Coghlan (via Gmail on Android, so likely to be more terse than usual)
>
> On Dec 8, 2011 4:34 PM, "Chris McDonough" <chrism at plope.com> wrote:
>>
>> On Thu, 2011-12-08 at 01:18 -0500, Benjamin Peterson wrote:
>> > 2011/12/8 Chris McDonough <chrism at plope.com>:
>> > > On Thu, 2011-12-08 at 01:02 -0500, Benjamin Peterson wrote:
>> > >> 2011/12/8 Chris McDonough <chrism at plope.com>:
>> > >> > On the heels of Armin's blog post about the troubles of making the
>> > >> > same
>> > >> > codebase run on both Python 2 and Python 3, I have a concrete
>> > >> > suggestion.
>> > >> >
>> > >> > It would help a lot for code that straddles both Py2 and Py3 to be
>> > >> > able
>> > >> > to make use of u'' literals.
>> > >>
>> > >> Helpful or not helpful, I think that ship has sailed. The earliest it
>> > >> could see the light of day is 3.3, which would leave people trying to
>> > >> support 3.1 and 3.2 in a bind.
>> > >
>> > > Right.. the title does say "readd ... support in 3.3".  Are you
>> > > suggesting "the ship has sailed" for eternity because it can't be
>> > > supported in Python < 3.3?
>> >
>> > I'm questioning the real utility of it.
>>
>> All I can really offer is my own experience here based on writing code
>> that needs to straddle Python 2.5, 2.6, 2.7 and 3.2 without use of 2to3.
>> Having u'' work across all of these would mean porting would not require
>> as much eyeballing as code modified via "from future import
>> unicode_literals", it would let more code work on 2.5 unchanged, and the
>> resulting code would execute faster than code that required us to use a
>> u() function.
>>
>> What's the case against?
>>
>> - C
>>
>>
>>



-- 
?_?

From lukasz at langa.pl  Thu Dec  8 13:08:31 2011
From: lukasz at langa.pl (Łukasz Langa)
Date: Thu, 8 Dec 2011 13:08:31 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <5242067.5aBSYdFaIB@einstein>
References: <1323320919.2710.24.camel@thinko> <1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<5242067.5aBSYdFaIB@einstein>
Message-ID: <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>

On 8 Dec 2011, at 12:05, Stephan Richter wrote:

> It is somewhat naive to think that you can just tell 
> everyone to upgrade to Python 2.7 and then use the future import. Having to 
> change all that code can also be a big bug magnet.

A big bug magnet is using a Python version that is not getting any fixes whatsoever. When I'm backporting stuff from Python 3, I'm targeting 2.6+ because it's still somewhat supported by us. What's more important though is that there were tremendous changes in that release in terms of bridging the gap between Python 2 and 3.

I'm wondering why developers go to so much trouble to support a Python version that's 5+ years old and has been replaced by a newer one in virtually every operating system. Recent versions of Mac OS X, RedHat and Debian all ship Python 2.6+. It seems only GAE and Jython are stuck on Python 2.5.

Python 2.6 has ABCs, supports b'' (and even has a "bytes" alias for the str type), forward compatibility __futures__ (print_function, unicode_literals, division and absolute_imports), "except Exception as e", etc.
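
A minimal sketch of how those 2.6+ features combine in a module meant to run unchanged on 2.6+ and 3.x. The function names classify() and safe_divide() are made up for illustration and do not come from any library.

```python
# Intended to run unchanged on Python 2.6+ and Python 3.x.
from __future__ import absolute_import, division, print_function, unicode_literals

import sys


def classify(value):
    """Return 'text' or 'bytes' for a string-like value, else 'other'."""
    # With unicode_literals, '' is unicode on 2.x and str on 3.x.
    text_type = type('')
    if isinstance(value, text_type):
        return 'text'
    # "bytes" is an alias for str on 2.6+ and a distinct type on 3.x.
    if isinstance(value, bytes):
        return 'bytes'
    return 'other'


def safe_divide(a, b):
    """True division on every version; uses the 'except ... as' syntax."""
    try:
        return a / b
    except ZeroDivisionError as exc:
        # print() as a function, thanks to print_function.
        print('cannot divide:', exc, file=sys.stderr)
        return None
```

The unprefixed literal is text and the b'' literal is bytes on both major versions, which is the property the future import buys.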

The thing we did miss was making sure the std lib doesn't break when unicode_literals are used. And that's a bummer.

-- 
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.


Please consider the environment before printing out this e-mail.


From stephan.richter at gmail.com  Thu Dec  8 13:14:09 2011
From: stephan.richter at gmail.com (Stephan Richter)
Date: Thu, 08 Dec 2011 07:14:09 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
Message-ID: <3344831.JP9Cfj4Ety@einstein>

On Thursday, December 08, 2011 01:08:31 PM Łukasz Langa wrote:
> A big bug magnet is using a Python version that is not getting any fixes
> whatsoever. When I'm backporting stuff from Python 3, I'm targeting 2.6+
> because it's still somewhat supported by us. What's more important though
> is that there were tremendous changes in that release in terms of bridging
> the gap between Python 2 and 3.

But you might not have that luxury, and updating code to a new Python version 
is a lot of work. As you can see in my signature, I am very much involved in 
the Zope community. The entire Zope, Plone and Pyramid ecosystem is extremely 
large and one simply cannot make blanket statements about Python version use. 
We try very hard to move our libraries up the version ladder but we must also 
take great care of backwards-compatibility. (We have already seen what happens 
if we do not with Zope 2 versus 3. And Python is struggling with similar 
issues, even though the changes were much less drastic.)

Regards,
Stephan
-- 
Entrepreneur and Software Geek
Google me. "Zope Stephan Richter"

From barry at python.org  Thu Dec  8 13:18:44 2011
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 Dec 2011 07:18:44 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323320919.2710.24.camel@thinko>
References: <1323320919.2710.24.camel@thinko>
Message-ID: <20111208071844.6fe1970c@limelight.wooz.org>

On Dec 08, 2011, at 12:08 AM, Chris McDonough wrote:

>   from __future__ import unicode_literals
>   a = 'foo'

I agree this is an annoying thing to have to change when supporting a
dual-Python-version codebase, but it's not the most annoying.  print-functions
are a little more painful to switch because there's no easy Emacs conversion
for them. ;)  This one is actually pretty useful because it does make you go
through and be very specific about which literals are bytes and which are
unicodes.  Also, re-adding u'' prefixes doesn't help you much because you
might still have byte literals which you have to b'' prefix.  Do you really
want both 'foo' and u'foo' to be unicode literals?

-1

Cheers,
-Barry

From barry at python.org  Thu Dec  8 13:27:20 2011
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 Dec 2011 07:27:20 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <loom.20111208T115357-170@post.gmane.org>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<1323330308.2710.52.camel@thinko>
	<loom.20111208T115357-170@post.gmane.org>
Message-ID: <20111208072720.0d243557@limelight.wooz.org>

On Dec 08, 2011, at 11:01 AM, Vinay Sajip wrote:

>Well, if 3.2 remains in use for a longish time, then it is relevant, in the
>broader context, isn't it?  We know how conservative Linux distributions can
>be with their Python releases - although most are still releasing 2.x as
>their system Python, this could change at some point in the future. Even if
>it doesn't, there might be a fair user base of people stuck with 3.2 for any
>number of reasons, and to support them, the change you propose won't help,
>because some variant of a package will still have to use u() and b(), just
>for 3.2 support.

Case in point: Ubuntu 12.04 is a long term support release, meaning 5 years of
official support on both the desktop and server.  It will ship with Python 2.7
and 3.2 only.

-Barry

From ncoghlan at gmail.com  Thu Dec  8 13:32:43 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 8 Dec 2011 22:32:43 +1000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <3344831.JP9Cfj4Ety@einstein>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
Message-ID: <CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>

If people decide to delay their Py3k migrations until they can drop 2.5
support, they're quite free to do so. The only reason for porting right now
is to support 3.2, thus making a future reintroduction of u'' useless.
Those that delay their ports can use the forward compatibility in 2.6.

Having just purged so much cruft from the language, pleas to add some back
permanently for a problem that is going to fade from significance within
the next couple of years are unlikely to get very far.

--
Nick Coghlan (via Gmail on Android, so likely to be more terse than usual)

From victor.stinner at haypocalc.com  Thu Dec  8 13:24:51 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 08 Dec 2011 13:24:51 +0100
Subject: [Python-Dev] Reject characters bigger than U+10FFFF and	Solaris
 issues
In-Reply-To: <20111208091752.GA29901@sleipnir.bytereef.org>
References: <1504453.f4XqDVp2GQ@ned>
	<20111208091752.GA29901@sleipnir.bytereef.org>
Message-ID: <4EE0AC93.5030706@haypocalc.com>

On 08/12/2011 10:17, Stefan Krah wrote:
> I think that b'\xA0' is a valid thousands separator.

I agree, but that's not the point: the problem is that b'\xA0' is decoded 
to a strange U+30000020 character by mbstowcs().

> Currently I have this horrible function to deal with the problem:
>
> ...
>          n = mbstowcs(buf, s, 2);
> ...
>          tmp = PyUnicode_FromWideChar(buf, n);
>          if (tmp == NULL) {
>                  return NULL;
>          }
>          utf8 = PyUnicode_AsUTF8String(tmp);
>          Py_DECREF(tmp);
>          return utf8;

It would not help with this specific issue: b'\xA0' is not decodable from UTF-8.
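
A small Python sketch of the two facts in play here. It assumes the hu_HU locale in question uses ISO-8859-2, which is an assumption on my part and not something established in this thread; the variable names are illustrative only.

```python
# The byte reported as the thousands separator.
raw = b'\xa0'

try:
    raw.decode('utf-8')
    decodable_as_utf8 = True
except UnicodeDecodeError:
    # 0xA0 is a UTF-8 continuation byte, so it cannot stand alone.
    decodable_as_utf8 = False

# Assumption: the locale charset is ISO-8859-2, where 0xA0 is NO-BREAK SPACE.
as_latin2 = raw.decode('iso8859-2')

# The value mbstowcs() reportedly produced lies far outside the Unicode range.
max_code_point = 0x10FFFF
bogus_code_point = 0x30000020

try:
    chr(bogus_code_point)
    representable = True
except (ValueError, OverflowError):
    representable = False
```

Under that assumption the correct result would be U+00A0, not a value above U+10FFFF.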

> I'm not sure why the b'\xA0' problem only occurs in Solaris. Many systems
> have this thousands separator.

The problem is not directly in the C localeconv() function, but in 
mbstowcs() with the hu_HU locale.

You can try my test program for this issue:
http://bugs.python.org/file23876/localeconv_wchar.c

My test is maybe not correct, because it only sets LC_ALL, which is a 
little bit different than Python tests (see below).

--

I don't remember on which buildbot the issue occurred :-(

  - "sparc solaris10 gcc 3.x" has the "LANG=C" and "TZ=Europe/Berlin" 
environment variables
  - "x86 OpenIndiana 3.x" and "AMD64 OpenIndiana 3.x" have 
"TZ=Europe/London" and no locale variable!?

The issue occurred for example in test_lc_numeric_basic() of 
test__locale which sets LC_NUMERIC and LC_CTYPE locales (but not 
LC_ALL). LC_ALL and LC_NUMERIC are different in this test, but 
LC_NUMERIC and LC_CTYPE are the same.

--

Stefan: would you accept that locale.localeconv() and locale.strxfrm() 
stop working (instead of returning invalid data) on Solaris in certain 
cases (it looks like the issue depends on the locale and the OS 
version)? It can be a motivation to fix the root of the issue ;-)

Victor

From stefan at bytereef.org  Thu Dec  8 14:42:11 2011
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 8 Dec 2011 14:42:11 +0100
Subject: [Python-Dev] Reject characters bigger than U+10FFFF and	Solaris
	issues
In-Reply-To: <4EE0AC93.5030706@haypocalc.com>
References: <1504453.f4XqDVp2GQ@ned>
	<20111208091752.GA29901@sleipnir.bytereef.org>
	<4EE0AC93.5030706@haypocalc.com>
Message-ID: <20111208134211.GA31211@sleipnir.bytereef.org>

Victor Stinner <victor.stinner at haypocalc.com> wrote:
> The problem is not directly in the C localeconv() function, but in  
> mbstowcs() with the hu_HU locale.

Ah, I see.

> You can try my test program for this issue:
> http://bugs.python.org/file23876/localeconv_wchar.c

Can't test on OpenSolaris, since Oracle removed the package repo and
I need the ISO locales.


> Stefan: would you accept that locale.localeconv() and locale.strxfrm()
> stop working (instead of returning invalid data) on Solaris in certain
> cases (it looks like the issue depends on the locale and the OS  
> version)? It can be a motivation to fix the root of the issue ;-)

Yes, if the cause is a broken mbstowcs() that sounds good.



Stefan Krah



From vinay_sajip at yahoo.co.uk  Thu Dec  8 16:27:57 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Thu, 8 Dec 2011 15:27:57 +0000 (UTC)
Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>
Message-ID: <loom.20111208T161219-187@post.gmane.org>

Matt Joiner <anacrolix <at> gmail.com> writes:

> 
> Nobody is using 3 yet ;)
> 
> Sure, I use it for some personal projects, and other people pretend to
> support it. Not really.
> 
> The worst of the pain in porting to Python 3000 has yet to even begin!
>

The classic chicken-and-egg problem, right? Someone's got to make a start. If
you aim for porting with a single codebase and are not too hung up about
"practicality beats purity" hacks like e = sys.exc_info()[1], then I think
decent progress can be made with little risk, as long as the project has good
test coverage (and if it doesn't ... well, that's risky even if you stay on 2.x
...).
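
For anyone who hasn't seen it, a minimal sketch of that hack. The function parse_port() is invented for the example; only sys.exc_info() is the real API.

```python
import sys


def parse_port(text):
    """Convert text to a TCP port number, or return an error message."""
    try:
        port = int(text)
        if not 0 < port < 65536:
            raise ValueError('port out of range: %d' % port)
        return port
    except ValueError:
        # "except ValueError, e" is a syntax error on 3.x, and
        # "except ValueError as e" is a syntax error on 2.5, so the
        # active exception is fetched from the interpreter instead.
        e = sys.exc_info()[1]
        return 'invalid port: %s' % e
```

It is uglier than either native spelling, but the same source parses on 2.5 through 3.x.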

Django porting took a week of elapsed time (i.e. < 1 person-week of effort) to
go from thousands of test failures under 3.x and sqlite to zero test failures.
Django is a pretty big project, so I can't imagine "ordinary mortal" projects
are going to be too bad (as long as not implemented pathologically). Of course,
the Django port has some way to go, but still ... pip and virtualenv are
relatively mature single code base ports, too. As additional examples - I've
done Babel, Whoosh, Elixir, WTForms and others the same way.

Of course, I understand that YMMV.

Regards,

Vinay Sajip



From jannis at leidel.info  Thu Dec  8 16:53:22 2011
From: jannis at leidel.info (Jannis Leidel)
Date: Thu, 8 Dec 2011 16:53:22 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <loom.20111208T161219-187@post.gmane.org>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>
	<loom.20111208T161219-187@post.gmane.org>
Message-ID: <C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>


On 08.12.2011, at 16:27, Vinay Sajip wrote:

> Matt Joiner <anacrolix <at> gmail.com> writes:
> 
>> 
>> Nobody is using 3 yet ;)
>> 
>> Sure, I use it for some personal projects, and other people pretend to
>> support it. Not really.
>> 
>> The worst of the pain in porting to Python 3000 has yet to even begin!
>> 
> 
> The classic chicken-and-egg problem, right? Someone's got to make a start. If
> you aim for porting with a single codebase and are not too hung up about
> "practicality beats purity" hacks like e = sys.exc_info()[1], then I think
> decent progress can be made with little risk, as long as the project has good
> test coverage (and if it doesn't ... well, that's risky even if you stay on 2.x
> ...).
> 
> Django porting took a week of elapsed time (i.e. < 1 person-week of effort) to
> go from thousands of test failures under 3.x and sqlite to zero test failures.
> Django is a pretty big project, so I can't imagine "ordinary mortal" projects
> are going to be too bad (as long as not implemented pathologically). Of course,
> the Django port has some way to go, but still ... pip and virtualenv are
> relatively mature single code base ports, too. As additional examples - I've
> done Babel, Whoosh, Elixir, WTForms and others the same way.

I don't want to rain on your parade, but even if your port of Django passes all tests, it's not at all near completion. As a framework we not only have to worry about the ability to run on Python 3.X but also how to teach our community to upgrade their projects (if possible at all). That means reducing the number of hacks needed and reviewing thoroughly so that we don't suddenly end up in a maintenance dead end. E.g. I'm still not sure the one codebase strategy is better than the 2to3 strategy.

Also, stating that pip and virtualenv were easy to port like other projects seems to me like only half of the story -- Carl and
I had to fix a non-trivial part of your port before being able to do the Py3k release.

I don't mean to diminish your work, it *is* appreciated, but I'm rather careful with generalizations when it comes to changes to a platform on such an epic scale.

Best,
Jannis


From vinay_sajip at yahoo.co.uk  Thu Dec  8 17:46:31 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Thu, 8 Dec 2011 16:46:31 +0000 (UTC)
Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>
	<loom.20111208T161219-187@post.gmane.org>
	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>
Message-ID: <loom.20111208T171956-389@post.gmane.org>

Jannis Leidel <jannis <at> leidel.info> writes:

> I don't want to rain on your parade,

Not at all - feel free. I don't feel rained on in the least :-)

> but even if your port of Django passes all tests, it's not at all near
> completion. As a framework we not only have to worry about the ability to run
> on Python 3.X but also how to teach our community to upgrade their projects
> (if possible at all). That means reducing the number of hacks needed and
> reviewing thoroughly so that we don't suddenly end up in a maintenance dead
> end. E.g. I'm still not sure the one codebase strategy is better than the
> 2to3 strategy.

Of course, and I did say in the post you're replying to that I know that the
Django port has some way to go. But even if you decide that the single code
base port is not something you want for Django, nevertheless, I think I've
shown that the single port strategy can work for a large project like Django
from a purely technical perspective such as passing a very large test suite.

Of course, there are many non-technical issues such as documentation, ease of
ongoing maintenance etc. which no doubt you will be reviewing in due course.

(In the above, I'm using "technical" in a very narrow sense, obviously.)

> Also, stating that pip and virtualenv were easy to port like other projects
> seems to me like only half of the story -- Carl and I had to fix a
> non-trivial part of your port before being able to do the Py3k release.

Sure, and I didn't mean to imply that I did all the work - but I did announce
it only after I got almost all, if not all, tests passing on 2.x and 3.x from
a single code base - just as I did with Django. If the tests didn't cover
everything, then more work would certainly have been required, but it's still
a respectable milestone to have achieved, IMO. But it's the single code base
strategy that I wanted to highlight - and AFAIK you haven't had to back-pedal
on that (or at least, if you did, it might have been nice to drop me a line to
that effect).

> I don't mean to diminish your work, it *is* appreciated, but I'm rather
> careful with generalizations when it comes to changes to a platform on
> such an epic scale.

I hope I'm not being careless where you're being careful, but where does
caution start and timidity begin? You might remember that you brought up the
desirability of the Python 3 port on django-developers in September, which
got me thinking about it. My view of it is, if everyone thinks of it like
eating an elephant, no one is even going to take the first bite, for fear of
indigestion. Don't get me wrong - I understand about priorities and
commitments, and everyone scratching their own itch. So, I scratched mine, and
bet on the hunch that the elephant was only a chocolate elephant, and not a
real one. Time will of course tell ;-)

Regards,

Vinay Sajip


From martin at v.loewis.de  Thu Dec  8 18:26:59 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 08 Dec 2011 18:26:59 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323320919.2710.24.camel@thinko>
References: <1323320919.2710.24.camel@thinko>
Message-ID: <4EE0F363.4060208@v.loewis.de>

> It would make it possible to share code like this across py2 and py3:
> 
>    a = u'foo'
> 
> Instead of (with e.g. six):
> 
>    a = u('foo')
> 
> Or:
> 
>    from __future__ import unicode_literals
>    a = 'foo'
> 
> I recognize that the last option is probably the way "it's meant to be
> done", but in reality it's just more practical to not fail when literal
> notation is more specific than strictly necessary.

You are giving these two options already:
- The former works for all Python versions. Although it may appear
  tedious to convert existing code to replace all Unicode literals
  with function calls, it would actually be possible/easy to write
  an automatic converter that does so for a complete code base,
  based on lib2to3.
- The second version is truly practical for all applications/libraries
  that only support 2.6+.

In addition, there is also another option:
- use 2to3, in some form

So you already have three solutions which are all transitional in some
sense, and you want yet another option? I fail to see why this option
is more practical than the options that are already there.
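
For reference, a minimal sketch of what a u() helper could look like. This is a hypothetical ASCII-only version written for illustration, not the implementation shipped by six or any other library.

```python
import sys

if sys.version_info[0] >= 3:
    def u(s):
        """On 3.x every unprefixed literal is already text."""
        return s
else:
    def u(s):
        """On 2.x decode an ASCII-only byte-string literal to unicode."""
        return s.decode('ascii')

# The same source line yields a text string on every version.
a = u('foo')
```

This sketch deliberately ignores non-ASCII characters and escape sequences, which a real helper has to deal with.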

Regards,
Martin

From shane at hathawaymix.org  Thu Dec  8 19:21:40 2011
From: shane at hathawaymix.org (Shane Hathaway)
Date: Thu, 08 Dec 2011 11:21:40 -0700
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323325916.2710.39.camel@thinko>
References: <1323320919.2710.24.camel@thinko>	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>	<1323324644.2710.28.camel@thinko>	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
Message-ID: <4EE10034.2070809@hathawaymix.org>

On 12/07/2011 11:31 PM, Chris McDonough wrote:
> All I can really offer is my own experience here based on writing code
> that needs to straddle Python 2.5, 2.6, 2.7 and 3.2 without use of 2to3.
> Having u'' work across all of these would mean porting would not require
> as much eyeballing as code modified via "from __future__ import
> unicode_literals", it would let more code work on 2.5 unchanged, and the
> resulting code would execute faster than code that required us to use a
> u() function.

Could you elaborate on why "from __future__ import unicode_literals" is 
inadequate (other than the Python 2.6 requirement)?

Shane

From tseaver at palladion.com  Thu Dec  8 20:03:15 2011
From: tseaver at palladion.com (Tres Seaver)
Date: Thu, 08 Dec 2011 14:03:15 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <4EE0F363.4060208@v.loewis.de>
References: <1323320919.2710.24.camel@thinko> <4EE0F363.4060208@v.loewis.de>
Message-ID: <jbr1lj$16i$1@dough.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/08/2011 12:26 PM, "Martin v. Löwis" wrote:

>> It would make it possible to share code like this across py2 and
>> py3:
>> 
>> a = u'foo'
>> 
>> Instead of (with e.g. six):
>> 
>> a = u('foo')
>> 
>> Or:
>> 
>> from __future__ import unicode_literals
>> a = 'foo'
>> 
>> I recognize that the last option is probably the way "it's meant to
>> be done", but in reality it's just more practical to not fail when
>> literal notation is more specific than strictly necessary.
> 
> You are giving these two options already:
> - The former works for all Python versions. Although it may appear
>   tedious to convert existing code to replace all Unicode literals
>   with function calls, it would actually be possible/easy to write
>   an automatic converter that does so for a complete code base,
>   based on lib2to3.


I guess this could be done to generate "straddling" code from 2-only
code.  Note that the overhead of the function call is likely significant
in some cases:  generating a module scope constant is the only sane
replacement there, which might be harder to do in a fixer (I haven't
tried to write one yet).


> - the second version is truly practical for all
> applications/libraries that only support 2.6+.


Right.  The question is would running more P2 code unmodified in P3 be a
"Good Thing" from the perspective of P3 uptake:  developers who run up
against such issues tend to hit "camelback-meet-straw" points and bounce
off the effort.  Such a tiny change (a six line patch and an extra '..
note::' in the language reference section on string literal syntax) might
be worth avoiding that risk.


> In addition, there also is another option:
> - use 2to3, in some form


2to3 is not practical in a "straddling" case:

- - The script is too slow to use in development mode (like being back
  in "compile the world" Java / C++ land).

- - The transformed code generates tracebacks that don't match the source.


> So you have already three solutions which are all transitional in
> some sense, and you want yet another option? I fail to see why this
> option is more practical than the options that are already there.


The "redundant" u'*' spelling would be present in Python3 for the same
reason that the equally redundant b'*' spelling is present in Python
2.6+:  it makes writing portable code simpler.



Tres.
- -- 
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7hCfIACgkQ+gerLs4ltQ5t8wCfalykXvpSq6awllQUpCymf8iM
3P0An0cCY/iZHcK82V+CqW07wCpGfBtf
=Q4Fv
-----END PGP SIGNATURE-----


From glyph at twistedmatrix.com  Thu Dec  8 21:32:20 2011
From: glyph at twistedmatrix.com (Glyph)
Date: Thu, 8 Dec 2011 15:32:20 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
Message-ID: <E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>

On Dec 8, 2011, at 7:32 AM, Nick Coghlan wrote:
> Having just purged so much cruft from the language, pleas to add some back permanently for a problem that is going to fade from significance within the next couple of years are unlikely to get very far.
> 

This problem is never going to go away.

This is not a comment on the success of py3, but rather the persistence of old versions of things.  Even assuming an awesomely optimistic schedule for py3k migrations, even assuming that *everything* on PyPI supports Py3 by the end of 2013, consider that all around the world, every day, new code is still being written in FORTRAN.  Much of it is being written in FORTRAN 77, despite the fact that Fortran 90 is now over 20 years old.  Efforts still crop up periodically (some successful, some failed) to migrate these "legacy" projects to other languages, some of them as modern as C.

There are plenty of proprietary Python 2 systems which exist today for which there will not be a budget for a Python 3 migration this decade.  If history is an accurate guide, people will still be hired to work on python 2.x systems in the year 2100.  Some of them will be being hired to migrate that python 2.x code to python 3 (or 4, or 5, whatever we have by then).  If they're not, it will be because they're being hired to try to migrate it to Javascript instead, not because the Python 3 migration is "done" by then.

-glyph


From martin at v.loewis.de  Thu Dec  8 22:27:06 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 08 Dec 2011 22:27:06 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
References: <1323320919.2710.24.camel@thinko>
	<5242067.5aBSYdFaIB@einstein>	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>	<3344831.JP9Cfj4Ety@einstein>	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
Message-ID: <4EE12BAA.1050601@v.loewis.de>

> This is not a comment on the success of py3, but rather the persistence
> of old versions of things.  Even assuming an awesomely optimistic
> schedule for py3k migrations, even assuming that *everything* on PyPI
> supports Py3 by the end of 2013, consider that all around the world,
> every day, new code is still being written in FORTRAN.

While this is true for FORTRAN, it is not for Python 1.5: no new
Python 1.5 code is written around the world, at least not every day.
Also for FORTRAN, new code that is written every day likely isn't
FORTRAN 66, but more likely FORTRAN 90 or newer.

The reason for that is that FORTRAN just isn't an obsolete language,
by any means, else people wouldn't bother producing new versions of
it, porting compilers to new processors, and so on. Contrast this to
Python 1, and soon Python 2, which actually *is* obsolete (just as
FORTRAN 66 *is* obsolete).

> Much of it is being written in FORTRAN 77

Can you prove this? I trust that existing code is being maintained
in FORTRAN 77. For new code, I'm skeptical.

> There are plenty of proprietary Python 2 systems which exist today for
> which there will not be a budget for a Python 3 migration this decade.

And people using it can happily continue to use Python 2. If they
don't have a need to port their code to Python 3, they are not concerned
by whether you use a u prefix for strings in Python 3 or not.

Regards,
Martin

From robert.kern at gmail.com  Thu Dec  8 22:41:09 2011
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 08 Dec 2011 21:41:09 +0000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <4EE12BAA.1050601@v.loewis.de>
References: <1323320919.2710.24.camel@thinko>
	<5242067.5aBSYdFaIB@einstein>	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>	<3344831.JP9Cfj4Ety@einstein>	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
Message-ID: <jbratl$48j$1@dough.gmane.org>

On 12/8/11 9:27 PM, "Martin v. Löwis" wrote:

[Glyph writes:]
>> Much of it is being written in FORTRAN 77
>
> Can you prove this? I trust that existing code is being maintained
> in FORTRAN 77. For new code, I'm skeptical.

Personally, I've written more new code in FORTRAN 77 than in Fortran 90+. Even 
with all of the quirks in FORTRAN 77 compilers, it's still substantially easier 
to connect FORTRAN 77 code to C and Python than 90+. When they introduced some 
of the nicer language features, they left the precise details of memory 
structures of the new types undefined, so compilers chose different ways to 
implement them. Some of the very latest developments in modern Fortran have 
begun to standardize the FFI for these features (or at least let you write a 
standardized shim for them) and compilers are catching up.

For people writing new whole programs in Fortran, yes, they are probably mostly 
using 90+.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco


From janssen at parc.com  Thu Dec  8 23:09:59 2011
From: janssen at parc.com (Bill Janssen)
Date: Thu, 8 Dec 2011 14:09:59 PST
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <4EE12BAA.1050601@v.loewis.de>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
Message-ID: <51106.1323382199@parc.com>

"Martin v. Löwis" <martin at v.loewis.de> wrote:

> While this is true for FORTRAN, it is not for Python 1.5: no new
> Python 1.5 code is written around the world, at least not every day.

I don't know about that.  I've seen a lot of Python 2 code which was
apparently written by folks who learned Python 1.5.2 and never needed to
learn about newer features.  I suspect that's still going on fairly
widely.

Bill

From solipsis at pitrou.net  Fri Dec  9 01:35:35 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 9 Dec 2011 01:35:35 +0100
Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and
 PyUnicode_EncodeCodePage()
References: <E1RYnC6-0005HV-Ep@dinsdale.python.org>
Message-ID: <20111209013535.6fb38068@pitrou.net>

On Fri, 09 Dec 2011 00:16:02 +0100
victor.stinner <python-checkins at python.org> wrote:
>  
> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode)
> +
> +   Get a new copy of a Unicode object.
> +
> +   .. versionadded:: 3.3

I'm not sure I understand. Why would you make a copy of an immutable
object?




From tjreedy at udel.edu  Fri Dec  9 01:44:32 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 08 Dec 2011 19:44:32 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>
	<loom.20111208T161219-187@post.gmane.org>
	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>
Message-ID: <jbrllu$99u$1@dough.gmane.org>

On 12/8/2011 10:53 AM, Jannis Leidel wrote:

> possible at all). That means to reduce the number of hacks needed and
> thoroughly reviewing to not suddenly lead into a maintenance dead
> end. E.g. I'm still not sure the one codebase strategy is better than
> the 2to3 strategy.

One codebase with version compatibility hacks and no use of 2to3 is one 
pure strategy. Two codebases with no compatibility hacks (at least for 2 
versus 3) and use of 2to3 to bridge all differences is another.
Perhaps we need something in between, with a mix of compatibility hacks 
and automatic 2to3 conversions that has not been discovered yet, or that 
can be customized on a project by project basis.

Deleting 'u' prefixes from string literals is something that is easy to 
do with 2to3 for anyone who cannot use the future import because of 
supporting 2.5.
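
As a rough illustration of how mechanical that particular fix is, here is a
minimal sketch that strips the prefix with the standard tokenize module
rather than with 2to3 itself. The name strip_u_prefixes is made up for this
example, and the sketch assumes it runs under a recent Python 3 on source
that tokenizes cleanly there.

```python
import io
import tokenize


def strip_u_prefixes(source):
    """Return `source` with the u/U prefix removed from string literals.

    Hypothetical helper for illustration; this is not how 2to3 does it.
    """
    lines = source.splitlines(keepends=True)
    positions = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only real string tokens are touched, so a 'u' inside a string
        # or an identifier that starts with 'u' is left alone.
        if tok.type == tokenize.STRING and tok.string[:1] in ("u", "U"):
            positions.append(tok.start)
    # Edit from the end backwards so earlier column offsets stay valid.
    for row, col in reversed(positions):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + line[col + 1:]
    return "".join(lines)
```

Working from token positions instead of a regular expression avoids
touching text that merely looks like a prefix.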

More than one person has said that *any* use of 2to3 is impractical for 
rapid-turnaround development because 2to3 is 'too slow'. If so, have the 
usual methods for speeding up a Python program been applied? Has anyone 
profiled 2to3? Is most of the time spent in 2to3 itself or some 
particular module that it uses? Is the time that is spent in 2to3 itself 
a result of the overall framework or particular fixers? If the latter, 
can slow fixers be eliminated by using a compatibility hack in the 
Python 2 code? Has anyone tried to compile 2to3 and prerequisite 
Python-coded modules?

-- 
Terry Jan Reedy


From glyph at twistedmatrix.com  Fri Dec  9 01:52:28 2011
From: glyph at twistedmatrix.com (Glyph)
Date: Thu, 8 Dec 2011 19:52:28 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <4EE12BAA.1050601@v.loewis.de>
References: <1323320919.2710.24.camel@thinko>
	<5242067.5aBSYdFaIB@einstein>	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>	<3344831.JP9Cfj4Ety@einstein>	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
Message-ID: <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>

Zooming back in to the actual issue this thread is about, I think the u""-vs-"" issue is a bit of a red herring, because the _real_ problem here is that 2to3 is slow and buggy and so migration efforts are starting to work around it, and therefore want to run the same code on 3.x and all the way back to 2.5.

In my opinion, effort should be spent on optimizing the suggested migration tools and getting them to work properly, not twiddling the syntax so that it's marginally easier to avoid them.

On Dec 8, 2011, at 4:27 PM, Martin v. Löwis wrote:

>> This is not a comment on the success of py3, but rather the persistence
>> of old versions of things.  Even assuming an awesomely optimistic
>> schedule for py3k migrations, even assuming that *everything* on PyPI
>> supports Py3 by the end of 2013, consider that all around the world,
>> every day, new code is still being written in FORTRAN.
> 
> While this is true for FORTRAN, it is not for Python 1.5: no new
> Python 1.5 code is written around the world, at least not every day.
> Also for FORTRAN, new code that is written every day likely isn't
> FORTRAN 66, but more likely FORTRAN 90 or newer.

That's because Python 1.5 was upward-compatible with 2.x, and pretty much everyone could gently migrate, and start developing on the new versions even while supporting the old ones.  That is obviously not true of 3.x, by design; 2to3 requires that you still develop on the old version even if you support a new one, not to mention the substantially increased effort of migration.

> The reason for that is that FORTRAN just isn't an obsolete language,
> by any means, else people wouldn't bother producing new versions of
> it, porting compilers to new processors, and so on. Contrast this to
> Python 1, and soon Python 2, which actually *is* obsolete (just as
> FORTRAN 66 *is* obsolete).

Much as the Python core team might wish Python 2 would "soon" be obsolete, all of these things are happening for python 2.x now and all indications are that they will continue to happen.  PyPy, Jython, ShedSkin, Skulpt, IronPython, and possibly a few others are (to varying degrees) all targeting 2.x right now, because that's where the application code they want to run is.  PyPy is even porting the JIT compiler to a new processor (ARM).

F66 is indeed obsolete, but it became obsolete because people stopped using it, not because the standards committee declared it so.

>> Much of it is being in FORTRAN 77
> 
> Can you prove this? I trust that existing code is being maintained
> in FORTRAN 77. For new code, I'm skeptical.

I am not deeply immersed in the world where F77 is still popular, so I don't have any citations for you, but casual conversations with people working in the sciences, especially chemistry and materials science, suggest to me that a lot of people still use F77 and start new projects in it.  (I can see someone with more direct experience promptly replied in this thread already, anyway.)

>> There are plenty of proprietary Python 2 systems which exist today for
>> which there will not be a budget for a Python 3 migration this decade.
> 
> And people using it can happily continue to use Python 2. If they
> don't have a need to port their code to Python 3, they are not concerned
> by whether you use a u prefix for strings in Python 3 or not.


I didn't say they didn't have a need ever, I said they didn't have a budget now.  What you are saying to those users here is basically: "if you can't migrate today, then just don't bother, we're never going to make it any easier".  Despite the fact that I ultimately agree on u'' (nobody should care about this), it is not a good message to send.

-glyph

From solipsis at pitrou.net  Fri Dec  9 01:56:00 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 9 Dec 2011 01:56:00 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
Message-ID: <20111209015600.4cbc5cf1@pitrou.net>

On Thu, 8 Dec 2011 19:52:28 -0500
Glyph <glyph at twistedmatrix.com> wrote:
> Zooming back in to the actual issue this thread is about, I think the u""-vs-"" issue is a bit of a red herring, because the _real_ problem here is that 2to3 is slow and buggy and so migration efforts are starting to work around it, and therefore want to run the same code on 3.x and all the way back to 2.5.
> 
> In my opinion, effort should be spent on optimizing the suggested migration tools and getting them to work properly, not twiddling the syntax so that it's marginally easier to avoid them.

Instead of modifying 2.x code and running 2to3 time after time on it,
you can use 2to3 on unmodified 2.x code and fix the generated 3.x code.
With proper use of branches and a DVCS, merging later 2.x changes
should be mostly painless.
(at least it works on https://bitbucket.org/pitrou/t3k/)

Regards

Antoine.



From vinay_sajip at yahoo.co.uk  Fri Dec  9 02:39:39 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Fri, 9 Dec 2011 01:39:39 +0000 (UTC)
Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>
	<loom.20111208T161219-187@post.gmane.org>
	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>
	<jbrllu$99u$1@dough.gmane.org>
Message-ID: <loom.20111209T022519-121@post.gmane.org>

Terry Reedy <tjreedy <at> udel.edu> writes:


> More than one person has said that *any* use of 2to3 is impractical for 
> rapid-turnaround development because 2to3 is 'too slow'. If so, have the 
> usual methods for speeding up a Python program been applied? Has anyone 
> profiled 2to3? Is most of the time spent in 2to3 itself or some 
> particular module that it uses? Is the time that is spent in 2to3 itself 
> a result of the overall framework or particular fixers? If the latter, 
> can slow fixers be eliminated by using a compatibility hack in the 
> Python 2 code? Has anyone tried to compile 2to3 and prerequisite 
> Python-coded modules?
> 

It's not the speed of 2to3 per se; this seems very reasonable for a tool of its
type. It's the overall process, which currently involves running 2to3 on an
entire codebase (for example, using setup.py with flags to run 2to3 during
setup). With a large project like Django, and hundreds or thousands of source
files, 2to3 used in this way is on a hiding to nothing; no amount of profiling
and tweaking is likely to lead to acceptable turnaround.

However, 2to3 tools could be developed which are based on 2to3/lib2to3 and are
*incremental* in nature; then as you edit and save a file, its processed version
could be available very shortly afterwards (since we only need to translate the
file that was saved) - this would be even quicker in an IDE where the 2to3 code
(and perhaps the AST of files being worked on) could remain loaded in memory
over an entire development session. That, along with some more/smarter fixers,
could go some way to addressing the "too slow" issue.
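
A minimal sketch of the "only translate what changed" part, assuming the
translated files live in a parallel output tree and that modification times
are a good enough signal. The names stale_sources, src_root and out_root are
hypothetical, and the actual lib2to3 translation step is deliberately left
out.

```python
from pathlib import Path


def stale_sources(src_root, out_root):
    """List .py files whose translated copy is missing or out of date.

    Hypothetical helper: a file is considered stale when the matching
    path under out_root does not exist or is older than the source.
    """
    src_root, out_root = Path(src_root), Path(out_root)
    stale = []
    for src in sorted(src_root.rglob("*.py")):
        out = out_root / src.relative_to(src_root)
        if not out.exists() or out.stat().st_mtime < src.stat().st_mtime:
            stale.append(src)
    return stale
```

An editor hook or file watcher would then hand only the returned paths to
the translator, instead of re-running it over the whole tree.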

Regards,


Vinay Sajip


From tjreedy at udel.edu  Fri Dec  9 03:01:30 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 08 Dec 2011 21:01:30 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
References: <1323320919.2710.24.camel@thinko>
	<5242067.5aBSYdFaIB@einstein>	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>	<3344831.JP9Cfj4Ety@einstein>	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
Message-ID: <jbrq67$28o$1@dough.gmane.org>

On 12/8/2011 7:52 PM, Glyph wrote:
> Zooming back in to the actual issue this thread is about, I think the
> u""-vs-"" issue is a bit of a red herring, because the _real_ problem
> here is that 2to3 is slow and buggy and so migration efforts are
> starting to work around it, and therefore want to run the same code on
> 3.x and all the way back to 2.5.

I would expect that running one codebase would push one to only run on 
2.6+, which would make one codebase easier, but it does not seem to.

> In my opinion, effort should be spent on optimizing the suggested
> migration tools and getting them to work properly, not twiddling the
> syntax so that it's marginally easier to avoid them.

This is what I tried to say in my last post.

...
> I didn't say they didn't have a /need ever/, I said they didn't have a
> /budget now/. What you are saying to those users here is basically: "if
> you can't migrate today, then just don't bother, we're never going to
> make it any easier". Despite the fact that I ultimately agree on u''
> (nobody should care about this), it is not a good message to send.

I agree that would not be a good message, but a) I do not think that was 
the intent (I think it was more like "the *current* state of porting 
tools is a moot point for those not now porting") and b) good messages 
go both ways. People say "Python 2 is where the money is, it has 
(almost?) all the production apps, etcetera." Probably (mostly?) true. 
So where is the support from the vast army of 2.7 users for continuing 
to polish 2.7 past the normal 2 years (which ended last June)? Or for 
improving the migration tools?

-- 
Terry Jan Reedy


From regebro at gmail.com  Fri Dec  9 03:50:16 2011
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 9 Dec 2011 03:50:16 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <jbrq67$28o$1@dough.gmane.org>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
Message-ID: <CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>

"from future import unicode_literals" is my fault. I'm sorry. It's
pretty useless. It was suggested by somebody and I then supported its
addition, instead of allowing u'' which I suggested. But it doesn't
work.

One reason is that you need to be able to say "This should be str in
Python 2, and binary in Python 3, that should be Unicode in Python 2
and str in Python 3, and that over there should be str in both
versions", and the future import doesn't support that.

Adding u'' support solves the problem, but then again, so does having
a b() and a u() function. I'm not sure of the utility of adding
functionality to Python 3 that can be solved with six.
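
For readers who have not seen that style, here is a minimal sketch of what
such b() and u() helpers could look like. It is written from scratch for
illustration and is not six's actual implementation; the PY3 flag and the
choice of codecs are assumptions of this sketch.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    def b(s):
        # A native str literal becomes bytes; latin-1 maps code points
        # 0-255 one-to-one onto single bytes.
        return s.encode("latin-1")

    def u(s):
        # A native str is already Unicode on Python 3.
        return s
else:
    def b(s):
        # A native str is already an 8-bit string on Python 2.
        return s

    def u(s):
        # Decode the 8-bit literal, honouring \uXXXX escapes.
        return s.decode("unicode_escape")

# The three kinds of string from the paragraph above:
binary = b("GIF89a")       # str on Python 2, bytes on Python 3
text = u("caf\u00e9")      # unicode on Python 2, str on Python 3
native = "Content-Type"    # str on both
```

The cost is a function call at runtime for every wrapped literal, which is
the trade-off against having the prefix in the syntax.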

//Lennart

From guido at python.org  Fri Dec  9 03:53:55 2011
From: guido at python.org (Guido van Rossum)
Date: Thu, 8 Dec 2011 18:53:55 -0800
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
Message-ID: <CAP7+vJKJ5VrPj1hR0TOFmCUU6hBz5tD2KZTQcqBC1pYkhBUA4w@mail.gmail.com>

Are you saying that with that future import, b"..." is still a Unicode
literal?

On Thu, Dec 8, 2011 at 6:50 PM, Lennart Regebro <regebro at gmail.com> wrote:

> "from future import unicode_literals" is my fault. I'm sorry. It's
> pretty useless. It was suggested by somebody and I then supported its
> addition, instead of allowing u'' which I suggested. But it doesn't
> work.
>
> One reason is that you need to be able to say "This should be str in
> Python 2, and binary in Python 3, that should be Unicode in Python 2
> and str in Python 3, and that over there should be str in both
> versions", and the future import doesn't support that.
>
> Adding u'' support solves the problem, but then again, so does having
> a b() and a u() function. I'm not sure of the utility of adding
> functionality to Python 3 that can be solved with six.
>
> //Lennart



-- 
--Guido van Rossum (python.org/~guido)

From ncoghlan at gmail.com  Fri Dec  9 04:11:10 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 9 Dec 2011 13:11:10 +1000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <jbrq67$28o$1@dough.gmane.org>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
Message-ID: <CADiSq7dFKqzHuJs2ve39UAqt4uwtTQZT9JzzU3bA1m+G3kUL9A@mail.gmail.com>

On Fri, Dec 9, 2011 at 12:01 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 12/8/2011 7:52 PM, Glyph wrote:
>>
>> Zooming back in to the actual issue this thread is about, I think the
>> u""-vs-"" issue is a bit of a red herring, because the _real_ problem
>> here is that 2to3 is slow and buggy and so migration efforts are
>> starting to work around it, and therefore want to run the same code on
>> 3.x and all the way back to 2.5.
>
>
> I would expect that running one codebase would push one to only run on 2.6+,
> which would make one codebase easier, but it does not seem to.

Actually, most of the feedback I've heard is that using one codebase
is comparatively straightforward if you can drop support for 2.5 and
earlier. Mainly because of this:

>>> from __future__ import unicode_literals
>>> from __future__ import print_function
>>> print
<built-in function print>
>>> print(type(''))
<type 'unicode'>
>>> print(type(b''))
<type 'str'>

That's why I'm quite happy to say to people that if they currently
have to support 2.5 or earlier, and they're not prepared to fork their
codebase or drop support for those earlier Python versions in new
releases, then it's *perfectly fine* for them to delay their 3.x
support until they *can* use the compatibility tools we provide to
make "single source" approaches easier.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From barry at python.org  Fri Dec  9 04:34:08 2011
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 Dec 2011 22:34:08 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
Message-ID: <20111208223408.0e2e8bd1@limelight.wooz.org>

On Dec 09, 2011, at 03:50 AM, Lennart Regebro wrote:

>One reason is that you need to be able to say "This should be str in
>Python 2, and binary in Python 3, that should be Unicode in Python 2
>and str in Python 3, and that over there should be str in both
>versions", and the future import doesn't support that.

Sorry, I don't understand this.  What does it mean to be "str in both
versions"?  And why would you want that?

As for "str in Python 2 and binary in Python 3", b'' prefixes do that in
Python >= 2.6 without the future import (if I take "binary" to mean bytes
type).

As for "Unicode in Python 2 and str in Python 3", unadorned strings with the
future import in Python >= 2.6 do that just fine.

One of the nice things too is that with #include <bytesobject.h> in Python >=
2.6, changing all your PyStrings to PyBytes, you can get the same behavior in
your extension modules.

You still need to be clear about what are bytes and what are strings.  The
problem comes when you aren't or can't be sure, i.e. you have objects that are
sometimes one and sometimes the other.  Such as email headers.  In that case,
you're kind of screwed.  Python 2's str type let you cheat, but not without
consequences.  Those consequences are spelled "UnicodeErrors" and I'll be glad
to be rid of them.
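
A small sketch of that difference as seen from Python 3, assuming a header
that arrives as UTF-8 encoded bytes (the header value is invented for the
example). Python 2 would attempt an implicit ASCII decode when such a value
met a unicode object and fail with a UnicodeDecodeError on the non-ASCII
byte; Python 3 refuses to mix the two types at all.

```python
# Raw header as it might come off the wire (hypothetical value).
header_bytes = b"Subject: caf\xc3\xa9"

# Mixing text and bytes is rejected up front, whatever the content is.
try:
    "Header: " + header_bytes
except TypeError as exc:
    mixed_error = exc
else:
    mixed_error = None

# Being explicit about the encoding turns the bytes into text once,
# at the boundary, and everything after that is plain str.
header_text = header_bytes.decode("utf-8")
combined = "Header: " + header_text
```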

Cheers,
-Barry

From barry at python.org  Fri Dec  9 04:38:16 2011
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 Dec 2011 22:38:16 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CAP7+vJKJ5VrPj1hR0TOFmCUU6hBz5tD2KZTQcqBC1pYkhBUA4w@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<CAP7+vJKJ5VrPj1hR0TOFmCUU6hBz5tD2KZTQcqBC1pYkhBUA4w@mail.gmail.com>
Message-ID: <20111208223816.2329a110@limelight.wooz.org>

On Dec 08, 2011, at 06:53 PM, Guido van Rossum wrote:

>Are you saying that with that future import, b"..." is still a Unicode
>literal?

No, the future import has no impact on b-strings.

-----snip snip-----
from __future__ import print_function
import sys
print(sys.version_info.major, sys.version_info.minor, type(b''))
-----snip snip-----

$ python /tmp/foo.py
2 7 <type 'str'>
$ python3 /tmp/foo.py
3 2 <class 'bytes'>

-----snip snip-----
from __future__ import print_function, unicode_literals
import sys
print(sys.version_info.major, sys.version_info.minor, type(b''))
-----snip snip-----

$ python /tmp/foo.py
2 7 <type 'str'>
$ python3 /tmp/foo.py
3 2 <class 'bytes'>

Cheers,
-Barry

From chrism at plope.com  Fri Dec  9 05:24:33 2011
From: chrism at plope.com (Chris McDonough)
Date: Thu, 08 Dec 2011 23:24:33 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <20111208223408.0e2e8bd1@limelight.wooz.org>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
Message-ID: <1323404673.2710.132.camel@thinko>

On Thu, 2011-12-08 at 22:34 -0500, Barry Warsaw wrote:
> On Dec 09, 2011, at 03:50 AM, Lennart Regebro wrote:
> 
> >One reason is that you need to be able to say "This should be str in
> >Python 2, and binary in Python 3, that should be Unicode in Python 2
> >and str in Python 3, and that over there should be str in both
> >versions", and the future import doesn't support that.
> 
> Sorry, I don't understand this.  What does it mean to be "str in both
> versions"?  And why would you want that?
> 
> As for "str in Python 2 and binary in Python 3", b'' prefixes do that in
> Python >= 2.6 without the future import (if I take "binary" to mean bytes
> type).
> 
> As for "Unicode in Python 2 and str in Python 3", unadorned strings with the
> future import in Python >= 2.6 do that just fine.
> 
> One of the nice things too is that with #include <bytesobject.h> in Python >=
> 2.6, changing all your PyStrings to PyBytes, you can get the same behavior in
> your extension modules.
> 
> You still need to be clear about what are bytes and what are strings.  The
> problem comes when you aren't or can't be sure, i.e. you have objects that are
> sometimes one and sometimes the other.  Such as email headers.  In that case,
> you're kind of screwed.  Python 2's str type let you cheat, but not without
> consequences.  Those consequences are spelled "UnicodeErrors" and I'll be glad
> to be rid of them.

The PEP 3333 WSGI protocol *requires* that you present its APIs with
"native strings" (str on Python 3, str on Python 2).  So while the
oversimplification "don't do that" sounds great here, in real life, not
so much.
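
To make the constraint concrete, here is a minimal sketch of a WSGI
application in the PEP 3333 style: the status line and the header names and
values are native strings, while the body is bytes. The call_app driver is a
hypothetical stand-in for a real server and only passes a bare-bones
environ.

```python
def app(environ, start_response):
    # Status and headers must be native strings: plain "" literals,
    # with no u'' prefix and no unicode_literals import in effect.
    status = "200 OK"
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    start_response(status, headers)
    # The response body is always a sequence of bytes objects.
    return [b"Hello, world\n"]


def call_app(application):
    """Hypothetical driver that records what the application sends."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
    body = b"".join(application(environ, start_response))
    return captured["status"], captured["headers"], body
```

With "from __future__ import unicode_literals" active on Python 2, the same
unadorned literals would become unicode objects and no longer satisfy the
"native string" requirement.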

- C



From chrism at plope.com  Fri Dec  9 05:33:24 2011
From: chrism at plope.com (Chris McDonough)
Date: Thu, 08 Dec 2011 23:33:24 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
Message-ID: <1323405204.2710.139.camel@thinko>

On Fri, 2011-12-09 at 03:50 +0100, Lennart Regebro wrote:
> "from future import unicode_literals" is my fault. I'm sorry. It's
> pretty useless. It was suggested by somebody and I then supported its
> addition, instead of allowing u'' which I suggested. But it doesn't
> work.
> 
> One reason is that you need to be able to say "This should be str in
> Python 2, and binary in Python 3, that should be Unicode in Python 2
> and str in Python 3, and that over there should be str in both
> versions", and the future import doesn't support that.

This is also true.

But even so, b'' exists as a porting nicety.  The argument for
supporting u'' is the same as the one which exists for b'', except in
the opposite direction.  Since popular library code is going to need to
run on both Python 2 and Python 3 for the foreseeable future, anything
to make this easier helps.

Supporting u'' in 3.3 will prevent me from needing to think about
bytes/text distinction again while porting/straddling.  Every time I say
this to somebody who isn't listening closely they say "AHA!  You're
*supposed* to think about bytes vs. text, that's the whole point
stupid!"

They fail to hear the "again" in that sentence.  I've clearly already
thought about the distinction between bytes and text at least once:
that's *why* I'm using a u'' literal there.  I shouldn't have to think
about it again to service syntax constraints.  Code that is more
explicit than strictly necessary should not be needlessly punished.

Continuing to not support u'' in Python 3 will be like having an
immigration station where folks who have a  b'ritish' passport can get
through right away, but folks with a u'kranian' passport need to get
back on a plane that appears to come from the Ukraine before they
receive another tag that says they are indeed from the Ukraine.  It's
just pointless makework.

- C



From ncoghlan at gmail.com  Fri Dec  9 06:30:36 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 9 Dec 2011 15:30:36 +1000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323405204.2710.139.camel@thinko>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<1323405204.2710.139.camel@thinko>
Message-ID: <CADiSq7deGnPd3M=Zao_6qThSvL2YGYfTOwCLephDHp0z0cKfNQ@mail.gmail.com>

On Fri, Dec 9, 2011 at 2:33 PM, Chris McDonough <chrism at plope.com> wrote:
> Continuing to not support u'' in Python 3 will be like having an
> immigration station where folks who have a  b'ritish' passport can get
> through right away, but folks with a u'kranian' passport need to get
> back on a plane that appears to come from the Ukraine before they
> receive another tag that says they are indeed from the Ukraine.  It's
> just pointless makework.

OK, I think I finally understand your point. You want the ability to
be able to, in your Python 2.x code, write modules that use *all
three* kinds of string literal:

----------
foo = u"this is a Unicode string in both Python 2.x and 3.x"
bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x"
baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
----------

This is driven by the desire to use APIs (like the PEP 3333 version of
WSGI) that are defined in terms of "native strings" in the context of
applications that already include a strong binary/text separation.

Currently, in modules shared between the two series, you can't use the
"u" marker at all, since Python 3.x leaves it out as being redundant -
instead, you have a binary switch (in the form of the future import)
that lets you toggle the behaviour of basic string literals between
the first two forms:

----------
bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x"
baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
----------
from __future__ import unicode_literals
foo = "this is a Unicode string in both Python 2.x and 3.x"
baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
----------

Currently, to get all 3 kinds of behaviour in a shared codebase
without additional function calls at runtime, you need to pick one set
of strings (either "always Unicode" or "native string type") and move
them out to a separate module. So, for example, depending on which set
you decided to move:

----------
from unicode_strings import foo
bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x"
baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
----------
from __future__ import unicode_literals
foo = "this is a Unicode string in both Python 2.x and 3.x"
from native_strings import bar
baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
----------

Or, alternatively, you use 'six' (or a similar compatibility module)
and ensure unicode at runtime, using native or binary strings
otherwise:

----------
from six import u
foo = u("this is a Unicode string in both Python 2.x and 3.x")
bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x"
baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
----------

If you want to target 3.2, you *have* to use one of those mechanisms -
any potential restoration of u'' syntax support won't help you (and
even after 3.3 gets released in the latter half of next year, it's
still going to be a fair while before it makes its way into the
various distros, especially the ones that include long term support
from major vendors).

So, instead of attempting to paper over the problem by reintroducing
u'', perhaps the discussion we should be having is whether or not PEP
3333's superficially appealing concept of defining an API in terms of
"native strings" is a loser in practice, and we should instead be
looking more closely at PEP 444 (since that goes the route of using
'str' in 2.x and 'bytes' in 3.x, thus rendering "from __future__
import unicode_literals" an adequate solution for 2.6+ compatibility).

The amount of pain that PEP 3333 seems to be causing in the web
development world suggests to me we may simply have been *wrong* to
think that PEP 3333 would be a workable long term approach.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From chrism at plope.com  Fri Dec  9 06:33:59 2011
From: chrism at plope.com (Chris McDonough)
Date: Fri, 09 Dec 2011 00:33:59 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
Message-ID: <1323408839.2710.143.camel@thinko>

On Thu, 2011-12-08 at 19:52 -0500, Glyph wrote:
> Zooming back in to the actual issue this thread is about, I think the
> u""-vs-"" issue is a bit of a red herring, because the _real_ problem
> here is that 2to3 is slow and buggy and so migration efforts are
> starting to work around it, and therefore want to run the same code on
> 3.x and all the way back to 2.5.

Even if it weren't slow, I still wouldn't use it to automatically
convert code at install time; a single codebase is easier to reason
about, and easier to support.  Users send me tracebacks all the time;
having them match the source is a wonderful thing.

- C




From ncoghlan at gmail.com  Fri Dec  9 06:41:40 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 9 Dec 2011 15:41:40 +1000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323408839.2710.143.camel@thinko>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<1323408839.2710.143.camel@thinko>
Message-ID: <CADiSq7cbZfkAOU4+1V+WmThMFyUAeMSQrOVw-TLePz=fQnXiog@mail.gmail.com>

On Fri, Dec 9, 2011 at 3:33 PM, Chris McDonough <chrism at plope.com> wrote:
> Even if it weren't slow, I still wouldn't use it to automatically
> convert code at install time; a single codebase is easier to reason
> about, and easier to support.  Users send me tracebacks all the time;
> having them match the source is a wonderful thing.

Yeah, if single source doesn't work, then I think Antoine's suggested
way (i.e. convert once, then maintain two distinct branches and
builds, the way python-dev did for years with the standard library) is
a more sane option. It lets you investigate tracebacks properly, it
reduces your cycle times, etc, etc.

With a modern DVCS, it should be significantly less painful than it
was for us when we were maintaining four branches with only svnmerge
to help out.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From guido at python.org  Fri Dec  9 06:43:35 2011
From: guido at python.org (Guido van Rossum)
Date: Thu, 8 Dec 2011 21:43:35 -0800
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323408839.2710.143.camel@thinko>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<1323408839.2710.143.camel@thinko>
Message-ID: <CAP7+vJJmA7afxXdvNYDFjOaOMkDw1Fcaqu6a+y3F6HyfKcBfMw@mail.gmail.com>

On Thu, Dec 8, 2011 at 9:33 PM, Chris McDonough <chrism at plope.com> wrote:

> On Thu, 2011-12-08 at 19:52 -0500, Glyph wrote:
> > Zooming back in to the actual issue this thread is about, I think the
> > u""-vs-"" issue is a bit of a red herring, because the _real_ problem
> > here is that 2to3 is slow and buggy and so migration efforts are
> > starting to work around it, and therefore want to run the same code on
> > 3.x and all the way back to 2.5.
>
> Even if it weren't slow, I still wouldn't use it to automatically
> convert code at install time; a single codebase is easier to reason
> about, and easier to support.  Users send me tracebacks all the time;
> having them match the source is a wonderful thing.


Even though 2to3 was my idea, I am gradually beginning to appreciate this
approach. I skimmed the docs for "six" and liked it.

But I think the specific proposal of adding u"..." literals back to 3.3 is
not going to do much good. If we had had the foresight way back when, we
could have added them back to 3.1 and we would have been okay. But having
them in 3.3 but not in 3.2 is just adding insult to injury. I recommend
writing b"...".decode('utf-8'); maybe six's u() does the same?
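
A minimal sketch of that spelling, assuming the source bytes are UTF-8
(the sample text and variable names are only illustrative). The b""
prefix is accepted from 2.6 onwards, so the same line should give a
text string on both 2.6+ and 3.x:

```python
# UTF-8 bytes for "café": 5 bytes, 4 characters.
greeting = b"caf\xc3\xa9".decode("utf-8")

# The decoded type is unicode on 2.x and str on 3.x.
text_type = type(b"".decode("utf-8"))

assert isinstance(greeting, text_type)
assert len(greeting) == 4  # counted in characters, not bytes
```

The cost is one method call per literal at runtime, which matters most
for literals inside functions or loops.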

-- 
--Guido van Rossum (python.org/~guido)

From chrism at plope.com  Fri Dec  9 07:01:10 2011
From: chrism at plope.com (Chris McDonough)
Date: Fri, 09 Dec 2011 01:01:10 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CAP7+vJJmA7afxXdvNYDFjOaOMkDw1Fcaqu6a+y3F6HyfKcBfMw@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<1323408839.2710.143.camel@thinko>
	<CAP7+vJJmA7afxXdvNYDFjOaOMkDw1Fcaqu6a+y3F6HyfKcBfMw@mail.gmail.com>
Message-ID: <1323410470.2710.158.camel@thinko>

On Thu, 2011-12-08 at 21:43 -0800, Guido van Rossum wrote:
> On Thu, Dec 8, 2011 at 9:33 PM, Chris McDonough <chrism at plope.com>
> wrote:
>         On Thu, 2011-12-08 at 19:52 -0500, Glyph wrote:
>         > Zooming back in to the actual issue this thread is about, I
>         think the
>         > u""-vs-"" issue is a bit of a red herring, because the
>         _real_ problem
>         > here is that 2to3 is slow and buggy and so migration efforts
>         are
>         > starting to work around it, and therefore want to run the
>         same code on
>         > 3.x and all the way back to 2.5.
>         
>         
>         Even if it weren't slow, I still wouldn't use it to
>         automatically
>         convert code at install time; a single codebase is easier to
>         reason
>         about, and easier to support.  Users send me tracebacks all
>         the time;
>         having them match the source is a wonderful thing.
> 
> Even though 2to3 was my idea, I am gradually beginning to appreciate
> this approach. I skimmed the docs for "six" and liked it.
> 
> But I think the specific proposal of adding u"..." literals back to
> 3.3 is not going to do much good. If we had had the foresight way back
> when, we could have added them back to 3.1 and we would have been
> okay. But having them in 3.3 but not in 3.2 is just adding insult to
> injury.

AFAICT, at the current pace of porting, lots of authors of existing,
popular Python 2 libraries won't be releasing a ported/straddled version
any time soon; almost certainly many won't even begin work on a port
until after 3.3 is final.  As a result, on the supplier side, there will
be plenty of code that will eventually work only as a straddle across
2.6, 2.7, and 3.3.

On the consumer side, folks who want to run 2.6/2.7/3.3-only codebases
will have the wherewithal to compile their own Python 3 (or use a PPA or
equivalent) until the distros catch up.

So I'm not sure why 3.2 not having support for u'' should be a real
blocker for the change.

>  I recommend writing b"...".decode('utf-8'); maybe six's u() does the
> same?

It does this:

    def u(s):
        return unicode(s, "unicode_escape")

That's two Python function calls, of course, which is obviously icky if
you use a lot of literals at a nonmodule scope.
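
For context, a rough sketch of how such a helper can straddle both
lines. The 2.x branch is the one quoted above; the 3.x branch (simply
returning its argument) is my recollection of six rather than a
verified copy, so treat it as an assumption:

```python
import sys

if sys.version_info[0] >= 3:
    def u(s):
        # Plain literals are already text on 3.x, so there is nothing to do.
        return s
else:
    def u(s):
        # On 2.x the literal is an 8-bit str; decode it, honouring
        # escapes such as \u00e9 written in the source.
        return unicode(s, "unicode_escape")  # noqa: F821 (2.x builtin)

foo = u("this is a Unicode string in both Python 2.x and 3.x")
```

On 3.x that leaves one extra call per literal; on 2.x it is the two
calls mentioned above.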

- C




From ncoghlan at gmail.com  Fri Dec  9 07:36:03 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 9 Dec 2011 16:36:03 +1000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323410470.2710.158.camel@thinko>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<1323408839.2710.143.camel@thinko>
	<CAP7+vJJmA7afxXdvNYDFjOaOMkDw1Fcaqu6a+y3F6HyfKcBfMw@mail.gmail.com>
	<1323410470.2710.158.camel@thinko>
Message-ID: <CADiSq7cwER0Hkkv2xGh_4p_1aZ3DBqFkkUt8Yk_fwxxDGp3NNA@mail.gmail.com>

On Fri, Dec 9, 2011 at 4:01 PM, Chris McDonough <chrism at plope.com> wrote:
> On the consumer side, folks who want to run 2.6/2.7/3.3-only codebases
> will have the wherewithal to compile their own Python 3 (or use a PPA or
> equivalent) until the distros catch up.
>
> So I'm not sure why 3.2 not having support for u'' should be a real
> blocker for the change.

If this argument was valid, people wouldn't be so worried about
maintaining 2.5 compatibility in their libraries. Consider if I tried
to make this argument to justify everyone dropping 2.5 and earlier
support today:

"""On the consumer side, folks who want to run 2.6+ codebases on older
Linux distros have the wherewithal to compile their own more recent
Python 2 (or use a PPA or
equivalent) until they can move to a more recent version of their distro."""

It's simply not true in the general case - people don't maintain 2.4+
compatibility for fun, they do it because RHEL5 (and CentOS 5, etc)
are still reasonably common and ship with 2.4 as the system Python. As
soon as you switch away from the system provided Python, you're
switching away from the vendor's entire pre-packaged Python *stack*,
not just the interpreter itself. You then have to install (and
generally build) *everything* for yourself. While that is certainly
possible these days (and a lot simpler than it used to be), it's still
not trivial [1].

Since 3.2 is already quite usable for applications that aren't
fighting with the "native strings" problem (which seems to be the
common thread running through the complaints I've heard from web
framework authors), and with it being included in at least the next
Ubuntu LTS, current versions of Fedora, Arch, etc, it's going to be
around for a long time. Ignoring 3.1 is a reasonable option. Ignoring
3.2 entirely is unlikely to be viable for anyone that is interested in
supporting 3.x within the next couple of years - the 3.3 release is at
least 9 months away, and it's also going to take a while for it to
make its way into distros after the final release gets published on
python.org.

Hence my suggestion: perhaps the problem is the fact that PEP 3333/WSGI
1.0.1 introduced the "native string" concept as a minimalist hack to
try to get a usable gateway interface in Python 3, and that just
doesn't work in practice when attempting to straddle 2.x and 3.x
(because the values WSGI is dealing with aren't really text, they're
bytes, only *some* of which represent text). Perhaps a PEP 444 based
model would be less painful and more coherent in the long run?
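
To make the "native string" split concrete, here is a small sketch of
a PEP 3333 style application driven by hand. The app and the fake
start_response are illustrative names of mine, not taken from any
framework; wsgiref.util.setup_testing_defaults is the stdlib helper
for building a dummy environ:

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # Status line and header names/values are native strings:
    # plain "..." literals on both 2.x and 3.x.
    status = "200 OK"
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    start_response(status, headers)
    # The response body is always bytes.
    return [b"Hello, world\n"]


# Call the app directly with a fake environ; no server is involved.
environ = {}
setup_testing_defaults(environ)
captured = {}


def start_response(status, headers, exc_info=None):
    # Record what the application handed to the gateway.
    captured["status"] = status
    captured["headers"] = headers


body = b"".join(app(environ, start_response))
```

With "from __future__ import unicode_literals" in effect on 2.x, the
status and header literals above would become unicode rather than
native str, which is exactly where the straddling pain comes from.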

Cheers,
Nick.

[1] http://readthedocs.org/docs/ncoghlan_devs-python-notes/en/latest/venv_bootstrap.html

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From chrism at plope.com  Fri Dec  9 08:38:05 2011
From: chrism at plope.com (Chris McDonough)
Date: Fri, 09 Dec 2011 02:38:05 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CADiSq7cwER0Hkkv2xGh_4p_1aZ3DBqFkkUt8Yk_fwxxDGp3NNA@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<1323408839.2710.143.camel@thinko>
	<CAP7+vJJmA7afxXdvNYDFjOaOMkDw1Fcaqu6a+y3F6HyfKcBfMw@mail.gmail.com>
	<1323410470.2710.158.camel@thinko>
	<CADiSq7cwER0Hkkv2xGh_4p_1aZ3DBqFkkUt8Yk_fwxxDGp3NNA@mail.gmail.com>
Message-ID: <1323416285.2710.219.camel@thinko>

On Fri, 2011-12-09 at 16:36 +1000, Nick Coghlan wrote:
> On Fri, Dec 9, 2011 at 4:01 PM, Chris McDonough <chrism at plope.com> wrote:
> > On the consumer side, folks who want to run 2.6/2.7/3.3-only codebases
> > will have the wherewithal to compile their own Python 3 (or use a PPA or
> > equivalent) until the distros catch up.
> >
> > So I'm not sure why 3.2 not having support for u'' should be a real
> > blocker for the change.
> 
> If this argument was valid, people wouldn't be so worried about
> maintaining 2.5 compatibility in their libraries. Consider if I tried
> to make this argument to justify everyone dropping 2.5 and earlier
> support today:
> 
> """On the consumer side, folks who want to run 2.6+ codebases on older
> Linux distros have the wherewithal to compile their own more recent
> Python 2 (or use a PPA or
> equivalent) until they can move to a more recent version of their distro."""

Fair point.

That said, personally, I have given up entirely on Python 2.4 and 2.5
support for newer versions of my OSS libraries.  I continue to backport
fixes and (some) features to older library versions so folks can run
those on systems that require older Pythons.  I gave up 2.5 support
fairly recently across everything new, and I gave up support for 2.4 a
year ago or more in new releases with the same intent.

In reality, there is only one major platform that requires 2.4: RHEL 5
and folks who use it will just need to also use old versions of popular
libraries; trying to support it for all future feature work until it's
EOLed is not sane unless someone pays for it.  Python 2.5 has slightly
more compelling platforms (GAE and Jython), but GAE is moving to Python
2.7 and Jython is a bit moribund these days and is not really popular
enough that a critical mass of folks will clamor for new-and-shiny
releases that run on it.

The upshot is that most newly created code only needs to run on Python
2.6 and *some* version of Python 3.  And being able to eventually write
that code in a nonsucky subset of Python 2/3 is important to me, because
I'm going to be developing software in that subset for many years (way
past the timeframe we're talking about in which Python 3.2 will rule the
roost).

> It's simply not true in the general case - people don't maintain 2.4+
> compatibility for fun, they do it because RHEL5 (and CentOS 5, etc)
> are still reasonably common and ship with 2.4 as the system Python. As
> soon as you switch away from the system provided Python, you're
> switching away from the vendor's entire pre-packaged Python *stack*,
> not just the interpreter itself. You then have to install (and
> generally build) *everything* for yourself. While that is certainly
> possible these days (and a lot simpler than it used to be), it's still
> not trivial [1].
> 
> Since 3.2 is already quite usable for applications that aren't
> fighting with the "native strings" problem (which seems to be the
> common thread running through the complaints I've heard from web
> framework authors), and with it being included in at least the next
> Ubuntu LTS, current versions of Fedora, Arch, etc, it's going to be
> around for a long time. Ignoring 3.1 is a reasonable option. Ignoring
> 3.2 entirely is unlikely to be viable for anyone that is interested in
> supporting 3.x within the next couple of years - the 3.3 release is at
> least 9 months away, and it's also going to take a while for it to
> make its way into distros after the final release gets published on
> python.org.
> 
> Hence my suggestion: perhaps the problem is the fact that PEP 3333/WSGI
> 1.0.1 introduced the "native string" concept as a minimalist hack to
> try to get a usable gateway interface in Python 3, and that just
> doesn't work in practice when attempting to straddle 2.x and 3.x
> (because the values WSGI is dealing with aren't really text, they're
> bytes, only *some* of which represent text). Perhaps a PEP 444 based
> model would be less painful and more coherent in the long run?

Possibly.  I was the original author of PEP 444, with help from Armin
(although it has since been taken up by Alice, and I do not support the
updates it has received since then).

A bytes-oriented WSGI-like protocol was always the saner option.  The
native string idea optimized in exactly the wrong place, which was to
make it easy to write WSGI middleware, where you're required to do lots
of textlike manipulation of header values.  The idea of using bytes in
places where PEP 3333 now mandates native strings was rejected because
people were (somewhat justifiably) horrified at what they had to do in
order to attempt to treat bytes like strings in this context on Python 3 at
the time.  It has gotten better, but maybe still not better enough to
appease the folks who blocked the idea originally.

But all of that is just arguing with the umpire at this point.
Promoting and getting consensus about a different protocol will hurt a
lot.  PEP 3333 was born of months of intense periods of arguing and
compromise.  It is the way it is now because everyone was too exhausted
to argue about it any more.  I don't think that has changed much since
it was accepted, and asking folks to go back to that particular drawing
board is unlikely to have promising results.  Folks have already spent
many hours, and lots of money, on implementations of the current PEP.
They may hunt us down and murder us one by one. ;-)  PEP 3333, to its
credit, is also remarkably backwards compatible with PEP 333, requiring
very little change in existing Python 2 WSGI implementations, which
helps Python 2 folks a lot.

Given an effective choice between enabling six lines of code in Python
3.3 to support u'' and months of political wrangling and code rewriting,
I'll choose the former any day.  If we were talking about a change to
Python that actually required nontrivial effort, had some sort of
nominal consequence, or had some sort of non-theoretical downside, I'd
be a lot less sanguine about it.  But this is just a no-brainer in the
long term, AFAICT.

- C



From stefan_ml at behnel.de  Fri Dec  9 09:02:35 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 09 Dec 2011 09:02:35 +0100
Subject: [Python-Dev] Fixing the XML batteries
Message-ID: <jbsfar$en7$1@dough.gmane.org>

Hi everyone,

I think Py3.3 would be a good milestone for cleaning up the stdlib support 
for XML. Note upfront: you may or may not know me as the maintainer of 
lxml, the de-facto non-stdlib standard Python XML tool. This (lengthy) post 
was triggered by the following kind of conversation that I keep having with 
new XML users in Python (mostly on c.l.py), which hints at some serious 
flaw in the stdlib.

User: I'm trying to do XML stuff XYZ in Python and have problem ABC.
Me: What library are you using? Could you show us some code?
User: My code looks like this snippet: ...
Me: You are using minidom which is known to be hard to use, slow and uses 
lots of memory. Use the xml.etree.ElementTree package instead, or rather 
its C implementation cElementTree, also in the stdlib.
User (coming back after a while): thanks, that was exactly what [I didn't 
know] I was looking for.

What does this tell us?

1) MiniDOM is what new users find first. It's highly visible because there 
are still lots of ancient "Python and XML" web pages out there that date 
back from the time before Python 2.5 (or rather something like 2.2), when 
it was the only XML tree library in the stdlib. It's also the first hit 
from the top when you search for "XML" on the stdlib docs page and contains 
the (to some people) familiar word "DOM", which lets users stop their 
search and start writing code, not expecting to find a separate alternative 
in the same stdlib, way further down. And the description as "mini", 
"simple" and "lightweight" suggests to users that it's going to be easy to 
use and efficient.

2) MiniDOM is not what users want. It leads to complicated, unpythonic code 
and lots of problems. It is neither easy to use, nor efficient, nor 
"lightweight", "simple" or "mini", not in absolute numbers (see 
http://bugs.python.org/issue11379#msg148584 and following for a recent 
discussion). It's also badly maintained in the sense that its performance 
characteristics could likely be improved, but no-one is seriously 
interested in doing that, because it would not lead to something that 
actually *is* fast or memory friendly compared to any of the 'real' 
alternatives that are available right now.

3) ElementTree is what users should use, MiniDOM is not. ElementTree was 
added to the stdlib in Py2.5 on popular demand, exactly because it is very 
easy to use, very fast, and very memory friendly. And because users did not 
want to use MiniDOM any more. Today, ElementTree has a rather straight 
upgrade path towards lxml.etree if more XML features like validation or 
XSLT are needed. MiniDOM has nothing like that to offer. It's a dead end.

4) In the stdlib, cElementTree is independent of ElementTree, but totally 
hidden in the documentation. In conversations like the above, it's 
unnecessarily complex to explain to users that there is ElementTree (which 
is documented in the stdlib), but that what they want to use is really 
cElementTree, which has the same API but does not have a stdlib 
documentation page that I can send them to. Note that the other Python 
implementations simply provide cElementTree as an alias for ElementTree. 
That leaves CPython as the only Python implementation that really has these 
two separate modules.
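
To illustrate points 2) to 4), here is a small side-by-side sketch.
The XML snippet and variable names are made up for the example. The
import fallback is there because cElementTree does not exist as a
separate module on every interpreter:

```python
from xml.dom import minidom

try:
    # C accelerator, where it is available as a separate module.
    import xml.etree.cElementTree as ET
except ImportError:
    # Otherwise use the documented module, which has the same API.
    import xml.etree.ElementTree as ET

XML = "<root><item name='a'>1</item><item name='b'>2</item></root>"

# ElementTree: elements behave like lists with a dict of attributes.
root = ET.fromstring(XML)
et_values = [(item.get("name"), item.text) for item in root.findall("item")]

# MiniDOM: the same extraction goes through DOM nodes and text nodes.
doc = minidom.parseString(XML)
dom_values = [
    (node.getAttribute("name"), node.firstChild.data)
    for node in doc.getElementsByTagName("item")
]
```

Both lists end up holding the same (name, text) pairs; the difference
is in how much DOM machinery the user has to learn to get there.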

So, there are many problems here. And I think they make it unnecessarily 
complicated for users to process XML in Python and that the current 
situation helps in turning away new users from Python as a language for XML 
processing. Python does have impressively great tools for working with XML. 
It's just that the stdlib and its documentation do not reflect or even 
appreciate that.

What should change?

a) The stdlib documentation should help users to choose the right tool 
right from the start. Instead of using the totally misleading wording that 
it uses now, it should be honest about the performance characteristics of 
MiniDOM and should actively suggest that those who don't know what to 
choose (or even *that* they can choose) should not use MiniDOM in the first 
place. I created a ticket (issue11379) for a minor step in this direction, 
but given the responses, I'm rather convinced that there's a lot more that 
can be done and should be done, and that it should be done now, right for 
the next release.

b) cElementTree should finally lose its "special" status as a separate 
library and disappear as an accelerator module behind ElementTree. This has 
been suggested a couple of times already, and AFAIR, there was some 
opposition because 1) ET was maintained outside of the stdlib and 2) the 
APIs of both were not identical. However, getting ET 1.3 into Py2.7 and 3.2 
was a U-turn. Today, ET is *only* being maintained in the stdlib by Florent 
Xicluna (who is doing a good job with it), and ET 1.3 has basically made 
the APIs of both implementations compatible again. So, 3.3 would be the 
right milestone for fixing the "two libs for one" quirk.

Given that this is the third time during the last couple of years that I'm 
suggesting to finally fix the stdlib and its documentation, I won't provide 
any further patches before it has finally been accepted that a) this is a 
problem and b) it should be fixed, thus allowing the patches to actually 
serve a purpose. If we can agree on that, I'll happily help in making this 
change happen.

Stefan


From ncoghlan at gmail.com  Fri Dec  9 09:09:46 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 9 Dec 2011 18:09:46 +1000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323416285.2710.219.camel@thinko>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<1323408839.2710.143.camel@thinko>
	<CAP7+vJJmA7afxXdvNYDFjOaOMkDw1Fcaqu6a+y3F6HyfKcBfMw@mail.gmail.com>
	<1323410470.2710.158.camel@thinko>
	<CADiSq7cwER0Hkkv2xGh_4p_1aZ3DBqFkkUt8Yk_fwxxDGp3NNA@mail.gmail.com>
	<1323416285.2710.219.camel@thinko>
Message-ID: <CADiSq7c=Q-aqn3MKiDBiRBOZVZ0hJ1cGg2TrasxuB=74Ak1KkQ@mail.gmail.com>

Given that WSGI 1.0.1 is defined in terms of native strings and restoring
u'' support allows that to be expressed clearly in a shared codebase, I at
least understand the point of the suggestion now. I'm not quite convinced
restoring u'' is the right answer as yet, but a solid use case is always a
nice place to start :)

--
Nick Coghlan (via Gmail on Android, so likely to be more terse than usual)
On Dec 9, 2011 5:38 PM, "Chris McDonough" <chrism at plope.com> wrote:

> On Fri, 2011-12-09 at 16:36 +1000, Nick Coghlan wrote:
> > On Fri, Dec 9, 2011 at 4:01 PM, Chris McDonough <chrism at plope.com>
> wrote:
> > > On the consumer side, folks who want to run 2.6/2.7/3.3-only codebases
> > > will have the wherewithal to compile their own Python 3 (or use a PPA
> or
> > > equivalent) until the distros catch up.
> > >
> > > So I'm not sure why 3.2 not having support for u'' should be a real
> > > blocker for the change.
> >
> > If this argument was valid, people wouldn't be so worried about
> > maintaining 2.5 compatibility in their libraries. Consider if I tried
> > to make this argument to justify everyone dropping 2.5 and earlier
> > support today:
> >
> > """On the consumer side, folks who want to run 2.6+ codebases on older
> > Linux distros have the wherewithal to compile their own more recent
> > Python 2 (or use a PPA or
> > equivalent) until they can move to a more recent version of their
> distro."""
>
> Fair point.
>
> That said, personally, I have given up entirely on Python 2.4 and 2.5
> support for newer versions of my OSS libraries.  I continue to backport
> fixes and (some) features to older library versions so folks can run
> those on systems that require older Pythons.  I gave up 2.5 support
> fairly recently across everything new, and I gave up support for 2.4 a
> year ago or more in new releases with the same intent.
>
> In reality, there is only one major platform that requires 2.4: RHEL 5
> and folks who use it will just need to also use old versions of popular
> libraries; trying to support it for all future feature work until it's
> EOLed is not sane unless someone pays for it.  Python 2.5 has slightly
> more compelling platforms (GAE and Jython), but GAE is moving to Python
> 2.7 and Jython is a bit moribund these days and is not really popular
> enough that a critical mass of folks will clamor for new-and-shiny
> releases that run on it.
>
> The upshot is that most newly created code only needs to run on Python
> 2.6 and *some* version of Python 3.  And being able to eventually write
> that code in a nonsucky subset of Python 2/3 is important to me, because
> I'm going to be developing software in that subset for many years (way
> past the timeframe we're talking about in which Python 3.2 will rule the
> roost).
>
> > It's simply not true in the general case - people don't maintain 2.4+
> > compatibility for fun, they do it because RHEL5 (and CentOS 5, etc)
> > are still reasonably common and ship with 2.4 as the system Python. As
> > soon as you switch away from the system provided Python, you're
> > switching away from the vendor's entire pre-packaged Python *stack*,
> > not just the interpreter itself. You then have to install (and
> > generally build) *everything* for yourself. While that is certainly
> > possible these days (and a lot simpler than it used to be), it's still
> > not trivial [1].
> >
> > Since 3.2 is already quite usable for applications that aren't
> > fighting with the "native strings" problem (which seems to be the
> > common thread running through the complaints I've heard from web
> > framework authors), and with it being included in at least the next
> > Ubuntu LTS, current versions of Fedora, Arch, etc, it's going to be
> > around for a long time. Ignoring 3.1 is a reasonable option. Ignoring
> > 3.2 entirely is unlikely to be viable for anyone that is interested in
> > supporting 3.x within the next couple of years - the 3.3 release is at
> > least 9 months away, and it's also going to take a while for it to
> > make its way into distros after the final release gets published on
> > python.org.
> >
> > Hence my suggestion: perhaps the problem is the fact that PEP 3333/WSGI
> > 1.0.1 introduced the "native string" concept as a minimalist hack to
> > try to get a usable gateway interface in Python 3, and that just
> > doesn't work in practice when attempting to straddle 2.x and 3.x
> > (because the values WSGI is dealing with aren't really text, they're
> > bytes, only *some* of which represent text). Perhaps a PEP 444 based
> > model would be less painful and more coherent in the long run?
>
> Possibly.  I was the original author of PEP 444, with help from Armin
> (although it has since been taken up by Alice, and I do not support the
> updates it has received since then).
>
> A bytes-oriented WSGI-like protocol was always the saner option.  The
> native string idea optimized in exactly the wrong place, which was to
> make it easy to write WSGI middleware, where you're required to do lots
> of textlike manipulation of header values.  The idea of using bytes in
> places where PEP 3333 now mandates native strings was rejected because
> people were (somewhat justifiably) horrified at what they had to do in
> order to attempt to treat bytes like strings in this context on Python 3 at
> the time.  It has gotten better, but maybe still not better enough to
> appease the folks who blocked the idea originally.
>
> But all of that is just arguing with the umpire at this point.
> Promoting and getting consensus about a different protocol will hurt a
> lot.  PEP 3333 was born of months of intense periods of arguing and
> compromise.  It is the way it is now because everyone was too exhausted
> to argue about it any more.  I don't think that has changed much since
> it was accepted, and asking folks to go back to that particular drawing
> board is unlikely to have promising results.  Folks have already spent
> many hours, and lots of money, on implementations of the current PEP.
> They may hunt us down and murder us one by one. ;-)  PEP 3333, to its
> credit, is also remarkably backwards compatible with PEP 333, requiring
> very little change in existing Python 2 WSGI implementations, which
> helps Python 2 folks a lot.
>
> Given an effective choice between enabling six lines of code in Python
> 3.3 to support u'' and months of political wrangling and code rewriting,
> I'll choose the former any day.  If we were talking about a change to
> Python that actually required nontrivial effort, had some sort of
> nominal consequence, or had some sort of non-theoretical downside, I'd
> be a lot less sanguine about it.  But this is just a no-brainer in the
> long term, AFAICT.
>
> - C
>
>
>

From martin at v.loewis.de  Fri Dec  9 09:20:42 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 09 Dec 2011 09:20:42 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <20111208223408.0e2e8bd1@limelight.wooz.org>
References: <1323320919.2710.24.camel@thinko>
	<5242067.5aBSYdFaIB@einstein>	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>	<3344831.JP9Cfj4Ety@einstein>	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>	<4EE12BAA.1050601@v.loewis.de>	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>	<jbrq67$28o$1@dough.gmane.org>	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
Message-ID: <4EE1C4DA.9060809@v.loewis.de>

> Sorry, I don't understand this.  What does it mean to be "str in both
> versions"?  And why would you want that?

One use case (and the only one I'm aware of) is to pass keyword
parameters. Python 2 insists that they are str (and doesn't accept
unicode), Python 3 insists that they are str (and doesn't accept bytes).

This is fairly uncommon as a problem, though, and is also solved in
Python 2.6, which does accept Unicode strings as keyword parameter
names.
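A minimal sketch of the Python 3 half of that constraint, assuming
nothing beyond the language itself (the function name `greet` is purely
illustrative): keyword names passed via ** have to be native str, and
bytes keys are rejected.

```python
def greet(**kwargs):
    # Return the keyword names we were called with, sorted for stable output.
    return sorted(kwargs)

# Native str keys are accepted as keyword names.
accepted = greet(**{"name": "world"})

# bytes keys are rejected on Python 3 with a TypeError.
try:
    greet(**{b"name": "world"})
    rejected = False
except TypeError:
    rejected = True
```

This is why a literal that is "str in both versions" matters for
dictionaries that end up being splatted into a call.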

Regards,
Martin

From martin at v.loewis.de  Fri Dec  9 09:25:08 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 09 Dec 2011 09:25:08 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323405204.2710.139.camel@thinko>
References: <1323320919.2710.24.camel@thinko>
	<5242067.5aBSYdFaIB@einstein>	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>	<3344831.JP9Cfj4Ety@einstein>	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>	<4EE12BAA.1050601@v.loewis.de>	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>	<jbrq67$28o$1@dough.gmane.org>	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<1323405204.2710.139.camel@thinko>
Message-ID: <4EE1C5E4.6090602@v.loewis.de>

> They fail to hear the "again" in that sentence.  I've clearly already
> thought about the distinction between bytes and text at least once:
> that's *why* I'm using a u'' literal there.  I shouldn't have to think
> about it again to service syntax constraints.  Code that is more
> explicit than strictly necessary should not be needlessly punished.

But you don't have to think about this *again* in any of the proposed
alternatives (whether you use a u() function, whether you use the future
import, or whether you use 2to3). They differ only (slightly) in how
you spell Unicode literals, but all provide for explicit spelling of
Unicode literals when applied.

Regards,
Martin


From martin at v.loewis.de  Fri Dec  9 09:32:03 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 09 Dec 2011 09:32:03 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CADiSq7deGnPd3M=Zao_6qThSvL2YGYfTOwCLephDHp0z0cKfNQ@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko>
	<5242067.5aBSYdFaIB@einstein>	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>	<3344831.JP9Cfj4Ety@einstein>	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>	<4EE12BAA.1050601@v.loewis.de>	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>	<jbrq67$28o$1@dough.gmane.org>	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>	<1323405204.2710.139.camel@thinko>
	<CADiSq7deGnPd3M=Zao_6qThSvL2YGYfTOwCLephDHp0z0cKfNQ@mail.gmail.com>
Message-ID: <4EE1C783.8050306@v.loewis.de>

> Or, alternatively, you use 'six' (or a similar compatibility module)
> and ensure unicode at runtime, using native or binary strings
> otherwise:
> 
> ----------
> from six import u
> foo = u("this is a Unicode string in both Python 2.x and 3.x")
> bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x"
> baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
> ----------

An alternative here is to use a function for bar, not foo:

from __future__ import unicode_literals
from six.next import native_str
foo = "this is a Unicode string in both Python 2.x and 3.x"
bar = native_str("this is a 7-bit string in Python 2.x"
                 " and a Unicode string in 3.x")
baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"

Which of them is "better" depends on which of the two string types is
more common.
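For illustration only, here is one way such a helper could look. This is
a sketch under stated assumptions: `native_str` is a hypothetical
function (the `six.next` import above is not an existing module), and
the ASCII default is my assumption, matching the "7-bit" wording.

```python
import sys

def native_str(text, encoding="ascii"):
    """Return `text` as the interpreter's native str type (hypothetical helper)."""
    if sys.version_info[0] >= 3:
        # Python 3: native str is text, so only bytes need converting.
        if isinstance(text, bytes):
            return text.decode(encoding)
        return text
    # Python 2: native str is a byte string, so unicode objects get encoded.
    if isinstance(text, unicode):  # noqa: F821 (name only exists on Python 2)
        return text.encode(encoding)
    return text

bar = native_str("this is a native string on either version")
```

With unicode_literals in effect, only the (presumably rarer) native
strings need the function call, and plain literals stay unadorned.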

Regards,
Martin

From martin at v.loewis.de  Fri Dec  9 09:41:15 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 09 Dec 2011 09:41:15 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jbsfar$en7$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>
Message-ID: <4EE1C9AB.2040301@v.loewis.de>

> a) The stdlib documentation should help users to choose the right tool
> right from the start. Instead of using the totally misleading wording
> that it uses now, it should be honest about the performance
> characteristics of MiniDOM and should actively suggest that those who
> don't know what to choose (or even *that* they can choose) should not
> use MiniDOM in the first place.

I disagree. The right approach is not to document performance problems,
but to fix them.

> b) cElementTree should finally lose its "special" status as a separate
> library and disappear as an accelerator module behind ElementTree. This
> has been suggested a couple of times already, and AFAIR, there was some
> opposition because 1) ET was maintained outside of the stdlib and 2) the
> APIs of both were not identical. However, getting ET 1.3 into Py2.7 and
> 3.2 was a U-turn.

Unfortunately (?), there is a near-contract-like agreement with Fredrik
Lundh that any significant changes to ElementTree in the standard
library have to be agreed by him. So whatever change you plan: make sure
Fredrik gives his explicit support.

Regards,
Martin

From martin at v.loewis.de  Fri Dec  9 09:44:13 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 09 Dec 2011 09:44:13 +0100
Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and
	PyUnicode_EncodeCodePage()
In-Reply-To: <20111209013535.6fb38068@pitrou.net>
References: <E1RYnC6-0005HV-Ep@dinsdale.python.org>
	<20111209013535.6fb38068@pitrou.net>
Message-ID: <4EE1CA5D.70705@v.loewis.de>

Am 09.12.2011 01:35, schrieb Antoine Pitrou:
> On Fri, 09 Dec 2011 00:16:02 +0100
> victor.stinner <python-checkins at python.org> wrote:
>>  
>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode)
>> +
>> +   Get a new copy of a Unicode object.
>> +
>> +   .. versionadded:: 3.3
> 
> I'm not sure I understand. Why would you make a copy of an immutable
> object?

It can convert a unicode subtype object into an exact unicode
object.

I'd rename it to _PyUnicode_AsExactUnicode, and undocument it.

Regards,
Martin

From stefan_ml at behnel.de  Fri Dec  9 09:59:24 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 09 Dec 2011 09:59:24 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <4EE1C9AB.2040301@v.loewis.de>
References: <jbsfar$en7$1@dough.gmane.org> <4EE1C9AB.2040301@v.loewis.de>
Message-ID: <jbsile$4vu$1@dough.gmane.org>

"Martin v. Löwis", 09.12.2011 09:41:
>> a) The stdlib documentation should help users to choose the right tool
>> right from the start. Instead of using the totally misleading wording
>> that it uses now, it should be honest about the performance
>> characteristics of MiniDOM and should actively suggest that those who
>> don't know what to choose (or even *that* they can choose) should not
>> use MiniDOM in the first place.
>
> I disagree. The right approach is not to document performance problems,
> but to fix them.

Here's the relevant part of my mail that you stripped:

>> It's also badly maintained in the sense that its performance
>> characteristics could likely be improved, but no-one is seriously
>> interested in doing that, because it would not lead to something that
>> actually *is* fast or memory friendly compared to any of the 'real'
>> alternatives that are available right now.

I can't recall anyone working on any substantial improvements during the 
last six years or so, and the reason for that seems obvious to me.


>> b) cElementTree should finally lose its "special" status as a separate
>> library and disappear as an accelerator module behind ElementTree. This
>> has been suggested a couple of times already, and AFAIR, there was some
>> opposition because 1) ET was maintained outside of the stdlib and 2) the
>> APIs of both were not identical. However, getting ET 1.3 into Py2.7 and
>> 3.2 was a U-turn.
>
> Unfortunately (?), there is a near-contract-like agreement with Fredrik
> Lundh that any significant changes to ElementTree in the standard
> library have to be agreed by him. So whatever change you plan: make sure
> Fredrik gives his explicit support.

Ok, I'll try to contact him.

Stefan


From solipsis at pitrou.net  Fri Dec  9 09:54:15 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 9 Dec 2011 09:54:15 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<1323405204.2710.139.camel@thinko>
	<CADiSq7deGnPd3M=Zao_6qThSvL2YGYfTOwCLephDHp0z0cKfNQ@mail.gmail.com>
Message-ID: <20111209095415.1ae242d0@pitrou.net>

On Fri, 9 Dec 2011 15:30:36 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
> Or, alternatively, you use 'six' (or a similar compatibility module)
> and ensure unicode at runtime, using native or binary strings
> otherwise:
> 
> ----------
> from six import u
> foo = u("this is a Unicode string in both Python 2.x and 3.x")
> bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x"
> baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
> ----------
> 
> If you want to target 3.2, you *have* to use one of those mechanisms -
> any potential restoration of u'' syntax support won't help you (and
> even after 3.3 gets released in the latter half of next year, it's
> still going to be a fair while before it makes its way into the
> various distros, especially the ones that include long term support
> from major vendors).
> 
> So, instead of attempting to paper over the problem by reintroducing
> u'', perhaps the discussion we should be having is whether or not PEP
> 3333's superficially appealing concept of defining an API in terms of
> "native strings" is a loser in practice, and we should instead be
> looking more closely at PEP 444

It's not only PEP 3333. Many network protocol implementations will
show the same characteristics (an FTP implementation accepting str in
2.x will also want to accept str in 3.x). But using six is a reasonable
suggestion for those who want to share a single codebase accross 2.x
and 3.x.
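To make the quoted snippet concrete, here is a minimal sketch of what a
u()-style helper can look like. It is an assumption-laden illustration,
not six's actual source: on Python 3 the literal is already text, and on
Python 2 the byte string is decoded at runtime.

```python
import sys

if sys.version_info[0] >= 3:
    def u(s):
        # Python 3: an unadorned literal is already a Unicode string.
        return s
else:
    def u(s):
        # Python 2: decode the byte string, honouring \uXXXX escapes.
        return s.decode("unicode_escape")

foo = u("this is a Unicode string in both Python 2.x and 3.x")
bar = "this is a native string: bytes in 2.x, text in 3.x"
```

The cost is a function call per literal at runtime, which is the
trade-off against having the syntax supported directly.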

Regards

Antoine.



From python-dev at masklinn.net  Fri Dec  9 10:09:39 2011
From: python-dev at masklinn.net (Xavier Morel)
Date: Fri, 9 Dec 2011 10:09:39 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <4EE1C9AB.2040301@v.loewis.de>
References: <jbsfar$en7$1@dough.gmane.org> <4EE1C9AB.2040301@v.loewis.de>
Message-ID: <E8A216E9-6AC9-4307-8250-B9B185604B82@masklinn.net>

On 2011-12-09, at 09:41 , Martin v. Löwis wrote:
>> a) The stdlib documentation should help users to choose the right tool
>> right from the start. Instead of using the totally misleading wording
>> that it uses now, it should be honest about the performance
>> characteristics of MiniDOM and should actively suggest that those who
>> don't know what to choose (or even *that* they can choose) should not
>> use MiniDOM in the first place.
> 
> I disagree. The right approach is not to document performance problems,
> but to fix them.
Even if performance problems "should not be documented", I think
Stefan's point that users should be steered away from minidom and
towards ET and cET is completely valid and worthy of support: the *only*
advantage minidom has over ET is that it uses an interface familiar to
Java users[0]. They are about the only people using the actual W3C DOM;
while the DOM exists in JavaScript, I'd say most code out there actively
tries not to touch it with anything less than a 10-foot library pole
like jQuery. That interface is also, of course, absolutely dreadful.

Minidom is inferior in interface flow and pythonicity, in terseness, in
speed, in memory consumption (even more so when compared with
cElementTree, and that's not something which can be fixed unless minidom
gets a C accelerator), etc. Even after fixing minidom (if anybody has
the time and drive to commit to it), ET/cET should be preferred over it.

And that's not even considering the ease of switching to lxml (if only
for validators), which Stefan outlined.

[0] not 100% true now that I think about it: handling mixed content is
simpler in minidom as there is no .text/.tail duality and text nodes are
nodes like every other, but I really can't think of another reason to
prefer minidom
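A small sketch of the .text/.tail point from the footnote, using only
the stdlib (the sample document is made up for illustration): the same
mixed content shows up as attributes in ET and as sibling text nodes in
minidom.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

doc = "<p>Hello <b>bold</b> world</p>"

# ElementTree: text before the first child lives in .text,
# text after a child lives in that child's .tail.
root = ET.fromstring(doc)
et_view = (root.text, root[0].text, root[0].tail)

# minidom: text runs are ordinary child nodes next to the element.
p = minidom.parseString(doc).documentElement
dom_view = [
    node.data if node.nodeType == node.TEXT_NODE else node.tagName
    for node in p.childNodes
]
```

Here et_view holds the three strings split across .text and .tail,
while dom_view lists the children in document order.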

From ncoghlan at gmail.com  Fri Dec  9 10:10:07 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 9 Dec 2011 19:10:07 +1000
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <4EE1C9AB.2040301@v.loewis.de>
References: <jbsfar$en7$1@dough.gmane.org>
	<4EE1C9AB.2040301@v.loewis.de>
Message-ID: <CADiSq7etPwJh4sz+k1AnYm0iZFDP4FtSvb9dkK+D64Zs48h00w@mail.gmail.com>

On Fri, Dec 9, 2011 at 6:41 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> a) The stdlib documentation should help users to choose the right tool
>> right from the start. Instead of using the totally misleading wording
>> that it uses now, it should be honest about the performance
>> characteristics of MiniDOM and should actively suggest that those who
>> don't know what to choose (or even *that* they can choose) should not
>> use MiniDOM in the first place.
>
> I disagree. The right approach is not to document performance problems,
> but to fix them.

When we offer a better way to do something that new users are wont to
do, we generally redirect them to the more recent alternative. I
believe the redirection from the getopt module to the argparse module
strikes the right tone for that kind of thing:
http://docs.python.org/library/getopt

For the various XML libraries, I'd suggest a message along the lines of
"Note: The <whatever> module is a <yada, yada, DOM based, whatever>. If
all you are trying to do is read and write XML files, consider using the
xml.etree.ElementTree module instead".
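As a rough illustration of the "just read and write XML" case that such
a note would point beginners to (the element and attribute names are
invented for the example):

```python
import xml.etree.ElementTree as ET

# Read: parse a document and look up an element by tag.
root = ET.fromstring("<config><option name='debug'>off</option></config>")
option = root.find("option")

# Modify: text and attributes are plain Python attributes and methods.
option.text = "on"
option.set("name", "verbose")

# Write: serialise back to a text string.
serialized = ET.tostring(root, encoding="unicode")
```

For files on disk, ET.parse() and ElementTree.write() cover the same
ground.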

I'd also be +1 on adjusting the order of the XML pages in the main
index such that xml.etree.ElementTree appeared before xml.parsers.expat
and all the others slid down one entry.

These are simple changes that don't harm current users of the modules
in the least, while being up front and very helpful for beginners.
Again, I think argparse vs getopt is a good comparison: argparse
appears first in the main index, and there's a redirection from getopt
to argparse that says "if you don't have a specific reason to be using
getopt, you probably want argparse instead".

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Fri Dec  9 10:12:50 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 9 Dec 2011 19:12:50 +1000
Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and
	PyUnicode_EncodeCodePage()
In-Reply-To: <4EE1CA5D.70705@v.loewis.de>
References: <E1RYnC6-0005HV-Ep@dinsdale.python.org>
	<20111209013535.6fb38068@pitrou.net> <4EE1CA5D.70705@v.loewis.de>
Message-ID: <CADiSq7fMeU+8L95ziXepBbA1bQ98Sut-3_Uzz6GT9mvn1symdw@mail.gmail.com>

On Fri, Dec 9, 2011 at 6:44 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Am 09.12.2011 01:35, schrieb Antoine Pitrou:
>> On Fri, 09 Dec 2011 00:16:02 +0100
>> victor.stinner <python-checkins at python.org> wrote:
>>>
>>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode)
>>> +
>>> +   Get a new copy of a Unicode object.
>>> +
>>> +   .. versionadded:: 3.3
>>
>> I'm not sure I understand. Why would you make a copy of an immutable
>> object?
>
> It can convert a unicode subtype object into an exact unicode
> object.
>
> I'd rename it to _PyUnicode_AsExactUnicode, and undocument it.

Isn't it basically just exposing a C level version of the unicode()
builtin's behaviour? While I agree the name could be better (and
PyUnicode_AsExactUnicode would certainly work), why make it private?
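A Python-level sketch of the behaviour in question, with the caveat
that on Python 3 the builtin is spelled str() rather than unicode(),
and that the subclass name `Tagged` is made up for the example:

```python
class Tagged(str):
    """A trivial str subclass, purely for illustration."""

tagged = Tagged("spam")

# Passing a subclass instance through the builtin yields an exact str
# with the same contents.
exact = str(tagged)
```

The value compares equal, but type(exact) is str while type(tagged) is
still the subclass.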

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From solipsis at pitrou.net  Fri Dec  9 10:15:17 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 9 Dec 2011 10:15:17 +0100
Subject: [Python-Dev] Fixing the XML batteries
References: <jbsfar$en7$1@dough.gmane.org>
Message-ID: <20111209101517.47e03eae@pitrou.net>



Mostly uninformed +1 to Stefan's suggestions from me.

Regards

Antoine.


On Fri, 09 Dec 2011 09:02:35 +0100
Stefan Behnel <stefan_ml at behnel.de> wrote:
> Hi everyone,
> 
> I think Py3.3 would be a good milestone for cleaning up the stdlib support 
> for XML. Note upfront: you may or may not know me as the maintainer of 
> lxml, the de-facto non-stdlib standard Python XML tool. This (lengthy) post 
> was triggered by the following kind of conversation that I keep having with 
> new XML users in Python (mostly on c.l.py), which hints at some serious 
> flaw in the stdlib.
[etc.]



From tjreedy at udel.edu  Fri Dec  9 11:03:41 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 09 Dec 2011 05:03:41 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <loom.20111209T022519-121@post.gmane.org>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>
	<loom.20111208T161219-187@post.gmane.org>
	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>
	<jbrllu$99u$1@dough.gmane.org>
	<loom.20111209T022519-121@post.gmane.org>
Message-ID: <jbsme9$t5d$1@dough.gmane.org>

On 12/8/2011 8:39 PM, Vinay Sajip wrote:
> It's not the speed of 2to3 per se; this seems very reasonable for a
> tool of its type. It's the overall process, which currently involves
> running 2to3 on an entire codebase (for example, using setup.py with
> flags to run 2to3 during setup).

Oh. That explains the 'slow' complaint.

> However, 2to3 tools could be developed which are based on
> 2to3/lib2to3 and are *incremental* in nature; then as you edit and
> save a file, its processed version could be available very shortly
> afterwards (since we only need to translate the file that was saved)

I had assumed that people were already running 2to3 on a per-edited-file
basis. On a multi-core machine, I would think it possible to run
2to3 and then a test on the result in a separate process while tests are 
running on the 2.x version.
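To sketch the incremental idea quoted above: the part that makes it
cheap is working out which files actually need re-translating. The
helper below is hypothetical (`stale_sources` and the directory layout
are my assumptions, and the translation step itself is left out); it
just compares modification times.

```python
import os
import tempfile

def stale_sources(src_dir, out_dir):
    """Yield relative paths of .py files whose translated copy is missing or older."""
    for dirpath, _dirs, files in os.walk(src_dir):
        for name in files:
            if not name.endswith(".py"):
                continue
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(out_dir, rel)
            if not os.path.exists(dst) or os.path.getmtime(dst) < os.path.getmtime(src):
                yield rel

# Demo on a throwaway tree: one 2.x source file, no translated output yet.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, "src")
    out_dir = os.path.join(tmp, "py3")
    os.makedirs(src_dir)
    os.makedirs(out_dir)
    src = os.path.join(src_dir, "mod.py")
    with open(src, "w") as f:
        f.write("print 'hello'\n")
    before = list(stale_sources(src_dir, out_dir))

    # Pretend a translator wrote the output shortly after the source was saved.
    dst = os.path.join(out_dir, "mod.py")
    with open(dst, "w") as f:
        f.write("print('hello')\n")
    newer = os.path.getmtime(src) + 10
    os.utime(dst, (newer, newer))
    after = list(stale_sources(src_dir, out_dir))
```

Only the files reported as stale would then be handed to the
translator, so the cost per save is one file rather than the whole tree.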

-- 
Terry Jan Reedy


From ncoghlan at gmail.com  Fri Dec  9 11:17:29 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 9 Dec 2011 20:17:29 +1000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <jbsme9$t5d$1@dough.gmane.org>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>
	<loom.20111208T161219-187@post.gmane.org>
	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>
	<jbrllu$99u$1@dough.gmane.org>
	<loom.20111209T022519-121@post.gmane.org>
	<jbsme9$t5d$1@dough.gmane.org>
Message-ID: <CADiSq7eh--b4Moe1XHnXnH1tHJNa51HHp-18hpcsWcGg_dTYyQ@mail.gmail.com>

On Fri, Dec 9, 2011 at 8:03 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 12/8/2011 8:39 PM, Vinay Sajip wrote:
>> on an entire codebase (for example, using setup.py with flags to run
>> 2to3 during setup).
>
> Oh. That explains the 'slow' complaint.

As Chris pointed out though, the real problem with the "repeatedly run
2to3" workflow is that it can make interpreting tracebacks from the
field *really* hard. That's where Antoine's suggested approach may be
better - use 2to3 to do the initial mechanical update in a new branch,
then subsequently use a process similar to what we do ourselves for
the standard library (i.e. update the 2.x and 3.x versions in
parallel, perhaps using 2to3 on a few files if they have changed
substantially in a particular patch).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From solipsis at pitrou.net  Fri Dec  9 11:35:35 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 9 Dec 2011 11:35:35 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<1323408839.2710.143.camel@thinko>
	<CADiSq7cbZfkAOU4+1V+WmThMFyUAeMSQrOVw-TLePz=fQnXiog@mail.gmail.com>
Message-ID: <20111209113535.2e9d2d1b@pitrou.net>

On Fri, 9 Dec 2011 15:41:40 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Fri, Dec 9, 2011 at 3:33 PM, Chris McDonough <chrism at plope.com> wrote:
> > Even if it weren't slow, I still wouldn't use it to automatically
> > convert code at install time; a single codebase is easier to reason
> > about, and easier to support.  Users send me tracebacks all the time;
> > having them match the source is a wonderful thing.
> 
> Yeah, if single source doesn't work, then I think Antoine's suggested
> way (i.e. convert once, then maintain two distinct branches and
> builds, the way python-dev did for years with the standard library) is
> a more sane option.

My suggestion is actually to convert each time you pull changes from
the 2.x sources. You have three branches:

- the default 2.x branch
- a branch containing changesets which are pristine 2to3 runs over the
  2.x codebase
- a branch containing the modified 3.x code

The 2to3 branch can be updated through an automatic script. Each
changeset should be a child of both the previous 2to3 changeset, and
the 2.x changeset which 2to3 has been run on (in other words, each
changeset - except the first one - is a merge).

Then the changes from the 2to3 branch are simply merged to the 3.x
branch. This is the only manual step, in that you have to fix
potential conflicts and regressions.

(I suppose the strategy can be reversed, i.e. maintain code primarily
in the 3.x branch and use 3to2 to backport them to the 2.x codebase)

Regards

Antoine.



From regebro at gmail.com  Fri Dec  9 15:11:17 2011
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 9 Dec 2011 15:11:17 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CAP7+vJKJ5VrPj1hR0TOFmCUU6hBz5tD2KZTQcqBC1pYkhBUA4w@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<CAP7+vJKJ5VrPj1hR0TOFmCUU6hBz5tD2KZTQcqBC1pYkhBUA4w@mail.gmail.com>
Message-ID: <CAL0kPAVtCYV7LMo=QSH-SFKxAuJWFf1benGWRgAjMB8ikG60nw@mail.gmail.com>

On Fri, Dec 9, 2011 at 03:53, Guido van Rossum <guido at python.org> wrote:
> Are you saying that with that future import, b"..." is still a Unicode
> literal?

If I said that, this is not what I was trying to say. :-)

//Lennart

From stephen at xemacs.org  Fri Dec  9 15:14:08 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 09 Dec 2011 23:14:08 +0900
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CADiSq7deGnPd3M=Zao_6qThSvL2YGYfTOwCLephDHp0z0cKfNQ@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<1323405204.2710.139.camel@thinko>
	<CADiSq7deGnPd3M=Zao_6qThSvL2YGYfTOwCLephDHp0z0cKfNQ@mail.gmail.com>
Message-ID: <87ty59zvbj.fsf@uwakimon.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > So, instead of attempting to paper over the problem by reintroducing
 > u'', perhaps the discussion we should be having is whether or not PEP
 > 3333's superficially appealing concept of defining an API in terms of
 > "native strings" is a loser in practice,

+1

to that discussion.  str is a different type in the two
implementations, binary sludge with essentially undefined semantics in
Python 2 and highly standardized text in Python 3.  I don't understand
how this can be expected to work well, and especially not in a code
base that is trying to be portable across Python 2 and 3.

I sympathize with Chris's complaint that he has to think about it
"again", but in fact, it seems to me that may not be entirely true.
AFAICS, having the WSGI APIs mask the difference between str and bytes
(or unicode and str, depending on where you're working) requires that
you think about it every time you pass something to a WSGI API.
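For readers who, like me, don't do WSGI: under PEP 3333 on Python 3 the
"native strings" are str objects carrying bytes as ISO-8859-1 code
points, so recovering real text is an explicit re-encode and decode.
The sketch below shows that step; the helper names and the UTF-8 default
are assumptions for the example, not part of the PEP.

```python
def wsgi_to_text(value, encoding="utf-8"):
    # Undo the latin-1 "bytes in str clothing" step, then decode for real.
    return value.encode("iso-8859-1").decode(encoding)

def text_to_wsgi(text, encoding="utf-8"):
    # Encode to bytes, then wrap those bytes as latin-1 code points.
    return text.encode(encoding).decode("iso-8859-1")

# A path as it might arrive in the environ: UTF-8 bytes seen through latin-1.
raw = "/caf\xc3\xa9"
path = wsgi_to_text(raw)
round_tripped = text_to_wsgi(path)
```

That conversion is the thinking-about-it step: it has to happen at each
point where such a value crosses the API boundary.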

I could be wrong, of course (I don't do WSGI stuff, which is why I'm
really surprised to hear this, and so I don't know the rationale for
the WSGI API decision), but this description of the API just triggers
all my alarms.

I am somewhat sympathetic to the request for reintroduction of u'' (in
my personal use it would just be cruft, so I'm -0.1 on that ground),
but I can't see how the WSGI API is an argument for it.  Making that
case requires showing that the "native string" API makes pragmatic
sense, and then that u'' can help.


From regebro at gmail.com  Fri Dec  9 15:18:33 2011
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 9 Dec 2011 15:18:33 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <20111208223408.0e2e8bd1@limelight.wooz.org>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
Message-ID: <CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>

On Fri, Dec 9, 2011 at 04:34, Barry Warsaw <barry at python.org> wrote:
> Sorry, I don't understand this.  What does it mean to be "str in both
> versions"?  And why would you want that?

It means that it's a str, that is a string of bytes, in Python 2, and
a str, that is a string of Unicode characters, in Python 3. There are
cases where you want this, for example not all libraries will accept
both str and Unicode under Python 2.

> As for "Unicode in Python 2 and str in Python 3", unadorned strings with the
> future import in Python >= 2.6 does that just fine.

Yes, but the future import will change this for *all* strings, making
it impossible to have a string that is a "str" in both Python 2 and
Python 3. For that reason, the future import is not enough as a
solution (and I suspect, one major reason why I haven't actually seen
anyone using it).

For most cases, using something like six's b() and u() has turned out
to be a better solution. It's uglier than having u'' support in Python
3, but has the benefit that b() works also in Python 2.5.
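A minimal sketch of a b()-style helper, to show why it also works on
interpreters without b'' literal syntax. This is written from memory of
the general approach rather than copied from six, so treat the details
as assumptions.

```python
import sys

if sys.version_info[0] >= 3:
    def b(s):
        # Python 3: the literal is text; map code points 0-255 to bytes.
        return s.encode("latin-1")
else:
    def b(s):
        # Python 2: an unadorned literal is already a byte string.
        return s

payload = b("GET / HTTP/1.0\r\n")
```

Because the argument is an ordinary unadorned literal, the source
parses on any version; only the runtime behaviour differs.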

> The
> problem comes when you aren't or can't be sure, i.e. you have objects that are
> sometimes one and sometimes the other.  Such as email headers.  In that case,
> you're kind of screwed.  Python 2's str type let you cheat, but not without
> consequences.  Those consequences are spelled "UnicodeErrors" and I'll be glad
> to be rid of them.

Me too.

From stephen at xemacs.org  Fri Dec  9 15:27:56 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 09 Dec 2011 23:27:56 +0900
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323416285.2710.219.camel@thinko>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<1323408839.2710.143.camel@thinko>
	<CAP7+vJJmA7afxXdvNYDFjOaOMkDw1Fcaqu6a+y3F6HyfKcBfMw@mail.gmail.com>
	<1323410470.2710.158.camel@thinko>
	<CADiSq7cwER0Hkkv2xGh_4p_1aZ3DBqFkkUt8Yk_fwxxDGp3NNA@mail.gmail.com>
	<1323416285.2710.219.camel@thinko>
Message-ID: <87sjktzuoj.fsf@uwakimon.sk.tsukuba.ac.jp>

Chris McDonough writes:

 > Given an effective choice between enabling six lines of code in Python
 > 3.3 to support u'' and months of political wrangling and code rewriting,
 > I'll choose the former any day.

Sure, but the real question is whether that *is* the effective choice.
Maybe the effective choice is between enabling six lines of code in
Python 3.3 to support u'' and not doing so, with both options
eventually entailing months of political wrangling and code rewriting
because it doesn't help with the underlying problems.


From fuzzyman at voidspace.org.uk  Fri Dec  9 15:42:43 2011
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 9 Dec 2011 14:42:43 +0000
Subject: [Python-Dev] Unicode re support in Python 3
Message-ID: <D1AFFF88-54C6-4B91-A635-53850A1CA7FE@voidspace.org.uk>

Hey python-devers,

As I'm sure many of you are aware, Armin Ronacher posted a blog entry explaining the reasons he dislikes Python 3 in its current form. 

Whilst I don't agree with all of his complaints, he makes a fair point about the re module Unicode support. It seems that the specific issue he has could be fixed by accepting the re module improvement / overhaul implemented by mrab:

	http://bugs.python.org/issue2636

As it comes with an active maintainer, and is a big step forward for Python regex support, I'd like to see it in Python 3.3. Reading through the issue it's not clear to me what needs to be done for it to be accepted (or rejected), beyond a general "it's a big change". 

All the best,

Michael Foord

--
http://www.voidspace.org.uk/


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing 
http://www.sqlite.org/different.html






From regebro at gmail.com  Fri Dec  9 15:45:39 2011
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 9 Dec 2011 15:45:39 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
Message-ID: <CAL0kPAWOzQ80HjfpkRTL-DH1jamm364FGJ4Ap5FfPAf75jGkyQ@mail.gmail.com>

Slightly OT:

The slowness of running 2to3 during install time can be fixed by not
doing so, but instead running it when the distribution is created,
including both Python 2 and Python 3 code in the distribution.

http://python3porting.com/2to3.html#distribution-section

There are no tools that support this at the moment though. I guess it
would be cool if Distribute supported making these kinds of
distributions...

//Lennart

From solipsis at pitrou.net  Fri Dec  9 15:43:49 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 9 Dec 2011 15:43:49 +0100
Subject: [Python-Dev] Unicode re support in Python 3
References: <D1AFFF88-54C6-4B91-A635-53850A1CA7FE@voidspace.org.uk>
Message-ID: <20111209154349.47eb6dcc@pitrou.net>

On Fri, 9 Dec 2011 14:42:43 +0000
Michael Foord <fuzzyman at voidspace.org.uk> wrote:
> 
> Whilst I don't agree with all of his complaints, he makes a fair point about the re module Unicode support. It seems that the specific issue he has could be fixed by accepting the re module improvement / overhaul implemented by mrab:
> 
> 	http://bugs.python.org/issue2636
> 
> As it comes with an active maintainer, and is a big step forward for Python regex support, I'd like to see it in Python 3.3. Reading through the issue it's not clear to me what needs to be done for it to be accepted (or rejected), beyond a general "it's a big change".

Reviewing. Do you volunteer?

Regards

Antoine.



From dirkjan at ochtman.nl  Fri Dec  9 16:09:51 2011
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Fri, 9 Dec 2011 16:09:51 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jbsfar$en7$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>
Message-ID: <CAKmKYaBid4c8Y0pe7txxZMk9+0WN8Hr5ZodS=HP05MdV-ysPhQ@mail.gmail.com>

On Fri, Dec 9, 2011 at 09:02, Stefan Behnel <stefan_ml at behnel.de> wrote:
> a) The stdlib documentation should help users to choose the right tool right
> from the start.
> b) cElementTree should finally lose its "special" status as a separate
> library and disappear as an accelerator module behind ElementTree.

An at least somewhat informed +1 from me. The ElementTree API is a
very good way to deal with XML from Python, and it deserves to be
promoted over the included alternatives.

Let's deprecate the NiCad batteries and try to guide users toward the
Li-Ion ones.
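
A minimal sketch of what point b) looks like from the user's side today,
assuming only the stdlib xml.etree package (the XML snippet and the variable
names are made up for illustration). Callers who want the accelerator have to
ask for it by name and fall back by hand:

```python
# Prefer the C accelerator when this interpreter ships it, otherwise
# fall back to the pure-Python module; both expose the same API.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Illustrative document only; not taken from the thread.
root = ET.fromstring(
    "<batteries><cell kind='Li-Ion'/><cell kind='NiCad'/></batteries>"
)

# findall() with a tag name returns the direct children with that tag.
kinds = [cell.get("kind") for cell in root.findall("cell")]
```

If the accelerator sat transparently behind ElementTree, the try/except
would be unnecessary and the plain import would be enough.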

Cheers,

Dirkjan

From barry at python.org  Fri Dec  9 16:11:23 2011
From: barry at python.org (Barry Warsaw)
Date: Fri, 9 Dec 2011 10:11:23 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
Message-ID: <20111209101123.01e92326@limelight.wooz.org>

On Dec 09, 2011, at 03:18 PM, Lennart Regebro wrote:

>On Fri, Dec 9, 2011 at 04:34, Barry Warsaw <barry at python.org> wrote:
>> Sorry, I don't understand this.  What does it mean to be "str in both
>> versions"?  And why would you want that?
>
>It means that it's a str, that is a string of bytes, in Python 2, and
>a str, that is a string of Unicode characters, in Python 3. There are
>cases where you want this, for example not all libraries will accept
>both str and Unicode under Python 2.

As Chris points out, this seems to be a use case tied to WSGI and PEP 3333.  I
guess it's an unfortunate choice for so recent a PEP, but maybe there was no
way to do better.  Still, it seems the "native string" discussion is an
indication that the PEP is introducing a binary vs. text ambiguity when
switching Python versions.  My previous "you're screwed" comment comes back to
mind. ;)

>> As for "Unicode in Python 2 and str in Python 3", unadorned strings with the
>> future import in Python >= 2.6 do that just fine.
>
>Yes, but the future import will change this for *all* strings, making
>it impossible to have a string that is a "str" in both Python 2 and
>Python 3. For that reason, the future import is not enough as a
>solution (and I suspect, one major reason why I haven't actually seen
>anyone using it).

It can certainly be useful in many contexts outside of WSGI.
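
A rough sketch of the "native string" distinction, assuming a hypothetical
helper called native_str (it is not part of the stdlib or of PEP 3333). The
latin-1 default reflects the PEP 3333 rule that native strings may only
contain code points representable in ISO-8859-1:

```python
import sys

PY2 = sys.version_info[0] == 2


def native_str(text, encoding="latin-1"):
    """Return `text` as the interpreter's own `str` type.

    Hypothetical helper for illustration. On Python 2, a unicode object
    is encoded to a byte str; on Python 3, text is already str and is
    returned unchanged.
    """
    if PY2 and not isinstance(text, str):
        return text.encode(encoding)
    return text


# If a module used "from __future__ import unicode_literals", this
# literal would be unicode on Python 2, so a native string would have
# to be produced explicitly, as done here.
header_name = native_str("Content-Type")
```

The helper is only needed for the handful of values that must be str on both
versions; everything else can stay as text.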

-Barry

From barry at python.org  Fri Dec  9 16:13:17 2011
From: barry at python.org (Barry Warsaw)
Date: Fri, 9 Dec 2011 10:13:17 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <4EE1C4DA.9060809@v.loewis.de>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<4EE1C4DA.9060809@v.loewis.de>
Message-ID: <20111209101317.0f7f6db7@limelight.wooz.org>

On Dec 09, 2011, at 09:20 AM, Martin v. Löwis wrote:

>One use case (and the only one I'm aware of) is to pass keyword
>parameters. Python 2 insists that they are str (and doesn't accept
>unicode), Python 3 insists that they are str (and doesn't accept bytes).
>
>This is fairly uncommon as a problem, though, and is also solved in
>Python 2.6, which does accept Unicode strings as keyword parameter
>names.

Oh, I remember this one, because I think I reported and fixed it.  But I take
it as a given that Python 2.6 is the minimal (sane) version to target for
one-codebase cross-Python code.
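
A small sketch of the keyword-parameter case, with a made-up connect()
function standing in for any callable. Plain literals used as dictionary keys
are native str on both major versions, so the ** call works unchanged; the
second call shows the mismatch Martin describes when the keys are bytes:

```python
import sys


def connect(host, port=80):
    # Stand-in function for illustration only.
    return "%s:%d" % (host, port)


# Keys written as plain literals are native str on Python 2 and 3.
options = {"host": "example.org", "port": 8080}
result = connect(**options)

# On Python 3, bytes keys are rejected with a TypeError. On Python 2,
# b"host" is an ordinary str, so the same call is accepted.
try:
    connect(**{b"host": "example.org"})
except TypeError:
    bytes_keys_rejected = True
else:
    bytes_keys_rejected = False
```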

-Barry

From barry at python.org  Fri Dec  9 16:23:56 2011
From: barry at python.org (Barry Warsaw)
Date: Fri, 9 Dec 2011 10:23:56 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CADiSq7c=Q-aqn3MKiDBiRBOZVZ0hJ1cGg2TrasxuB=74Ak1KkQ@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<1323408839.2710.143.camel@thinko>
	<CAP7+vJJmA7afxXdvNYDFjOaOMkDw1Fcaqu6a+y3F6HyfKcBfMw@mail.gmail.com>
	<1323410470.2710.158.camel@thinko>
	<CADiSq7cwER0Hkkv2xGh_4p_1aZ3DBqFkkUt8Yk_fwxxDGp3NNA@mail.gmail.com>
	<1323416285.2710.219.camel@thinko>
	<CADiSq7c=Q-aqn3MKiDBiRBOZVZ0hJ1cGg2TrasxuB=74Ak1KkQ@mail.gmail.com>
Message-ID: <20111209102356.4ec6c646@limelight.wooz.org>

On Dec 09, 2011, at 06:09 PM, Nick Coghlan wrote:

>Given that WSGI 1.0.1 is defined in terms of native strings and restoring
>u'' support allows that to be expressed clearly in a shared codebase, I at
>least understand the point of the suggestion now. I'm not quite convinced
>restoring u'' is the right answer as yet, but a solid use case is always a
>nice place to start :)

Maybe a more interesting approach would be to expand on the `six` idea and
bring some of those concepts into the stdlib for 3.3.  You could implement the
u() function somewhat more efficiently in an extension module, and make that
available for older Pythons via the Cheeseshop.  I now also have a few more
Python and C level compatibility hacks that could make it into such a module.
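
For reference, a simplified pure-Python sketch of what such u() and b()
helpers do. This is written in the spirit of `six`, not copied from it, and
the edge cases (literal backslashes in particular) are deliberately ignored:

```python
import sys

if sys.version_info[0] >= 3:
    def u(literal):
        # Python 3: a plain literal is already text.
        return literal

    def b(literal):
        # Python 3: turn the text literal into bytes.
        return literal.encode("latin-1")
else:
    def u(literal):
        # Python 2: decode the byte literal, processing \uXXXX escapes
        # that the compiler left untouched.
        return literal.decode("unicode_escape")

    def b(literal):
        # Python 2: a plain literal is already a byte string.
        return literal

# The same source line yields a four-character text string on both.
greeting = u("caf\u00e9")
payload = b("abc")
```

The Python 2 branch pays for a function call and a decode every time u() is
used, which is the cost an extension-module implementation would try to
reduce.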

-Barry

From fuzzyman at voidspace.org.uk  Fri Dec  9 16:35:05 2011
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 9 Dec 2011 15:35:05 +0000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <20111209101317.0f7f6db7@limelight.wooz.org>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<4EE1C4DA.9060809@v.loewis.de>
	<20111209101317.0f7f6db7@limelight.wooz.org>
Message-ID: <2DBA4A01-EA18-4F33-908F-D271633AB52D@voidspace.org.uk>


On 9 Dec 2011, at 15:13, Barry Warsaw wrote:

> On Dec 09, 2011, at 09:20 AM, Martin v. Löwis wrote:
> 
>> One use case (and the only one I'm aware of) is to pass keyword
>> parameters. Python 2 insists that they are str (and doesn't accept
>> unicode), Python 3 insists that they are str (and doesn't accept bytes).
>> 
>> This is fairly uncommon as a problem, though, and is also solved in
>> Python 2.6, which does accept Unicode strings as keyword parameter
>> names.
> 
> Oh, I remember this one, because I think I reported and fixed it.  But I take
> it as a given that Python 2.6 is the minimal (sane) version to target for
> one-codebase cross-Python code.
> 

In mock (at least 5000 lines of code including tests) I target 2.4 -> 3.2+. Admittedly mock does little I/O but does some fairly crazy introspection (and even found bugs in Python 3 because of it).

The exception handling is the worst - no compatible syntax between 2.4-5 and Python 3. So you have to use sys.exc_info. Other than that it isn't too hard / bad.
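
A minimal sketch of the sys.exc_info approach, using a made-up parse_port()
function as the example:

```python
import sys


def parse_port(value):
    """Return `value` as an int, or an error message string."""
    try:
        return int(value)
    except ValueError:
        # "except ValueError, e" is a syntax error on Python 3, and
        # "except ValueError as e" is a syntax error on 2.4 and 2.5,
        # so the active exception is fetched from sys.exc_info().
        exc = sys.exc_info()[1]
        return "invalid port: %s" % (exc,)
```

Calling parse_port("80") returns the integer 80, while parse_port("http")
returns a message built from the caught exception.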

All the best,

Michael

> -Barry


--
http://www.voidspace.org.uk/








From barry at python.org  Fri Dec  9 16:42:51 2011
From: barry at python.org (Barry Warsaw)
Date: Fri, 9 Dec 2011 10:42:51 -0500
Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and
 PyUnicode_EncodeCodePage()
In-Reply-To: <CADiSq7fMeU+8L95ziXepBbA1bQ98Sut-3_Uzz6GT9mvn1symdw@mail.gmail.com>
References: <E1RYnC6-0005HV-Ep@dinsdale.python.org>
	<20111209013535.6fb38068@pitrou.net> <4EE1CA5D.70705@v.loewis.de>
	<CADiSq7fMeU+8L95ziXepBbA1bQ98Sut-3_Uzz6GT9mvn1symdw@mail.gmail.com>
Message-ID: <20111209104251.072d9766@limelight.wooz.org>

On Dec 09, 2011, at 07:12 PM, Nick Coghlan wrote:

>Isn't it basically just exposing a C level version of the unicode()
>builtin's behaviour? While I agree the name could be better (and
>PyUnicode_AsExactUnicode would certainly work), why make it private?

Don't we already have that in PyObject_Str(), or in Python 2,
PyObject_Unicode()?

-Barry

From carl at oddbird.net  Fri Dec  9 17:34:47 2011
From: carl at oddbird.net (Carl Meyer)
Date: Fri, 09 Dec 2011 09:34:47 -0700
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <2DBA4A01-EA18-4F33-908F-D271633AB52D@voidspace.org.uk>
References: <1323320919.2710.24.camel@thinko>
	<5242067.5aBSYdFaIB@einstein>	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>	<3344831.JP9Cfj4Ety@einstein>	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>	<4EE12BAA.1050601@v.loewis.de>	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>	<jbrq67$28o$1@dough.gmane.org>	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>	<20111208223408.0e2e8bd1@limelight.wooz.org>	<4EE1C4DA.9060809@v.loewis.de>	<20111209101317.0f7f6db7@limelight.wooz.org>
	<2DBA4A01-EA18-4F33-908F-D271633AB52D@voidspace.org.uk>
Message-ID: <4EE238A7.9050708@oddbird.net>

On 12/09/2011 08:35 AM, Michael Foord wrote:
> On 9 Dec 2011, at 15:13, Barry Warsaw wrote:
>> Oh, I remember this one, because I think I reported and fixed it.
>> But I take it as a given that Python 2.6 is the minimal (sane)
>> version to target for one-codebase cross-Python code.
>> 
> 
> In mock (at least 5000 lines of code including tests) I target 2.4 ->
> 3.2+. Admittedly mock does little I/O but does some fairly crazy
> introspection (and even found bugs in Python 3 because of it).

pip and virtualenv also both support 2.4 - 3.2+ from a single codebase
(pip is ~7300 lines of code including tests, virtualenv ~1600). I
consider them a bit of a special case; since they are both early-stage
bootstrapping tools, the inconvenience level for users of a 2to3 step or
having to keep separate versions around would be higher than for an
ordinary library.

But I will say that the workarounds necessary to support 2.4 - 3.2 have
not really been problematic enough to tempt me towards a more complex
workflow, and I would probably take the single-codebase approach with
another port, even if I needed to support pre-2.6. The sys.exc_info()
business is ugly indeed, but (IMHO) not bad enough to warrant adding
2to3 hassles into the maintenance workflow.

Carl

From l at lrowe.co.uk  Fri Dec  9 17:36:40 2011
From: l at lrowe.co.uk (Laurence Rowe)
Date: Fri, 09 Dec 2011 17:36:40 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<1323330308.2710.52.camel@thinko>
	<loom.20111208T115357-170@post.gmane.org>
	<20111208072720.0d243557@limelight.wooz.org>
Message-ID: <op.v58drewf2y3dqy@laurence-rowes-macbook-3.local>

On Thu, 08 Dec 2011 13:27:20 +0100, Barry Warsaw <barry at python.org> wrote:

> On Dec 08, 2011, at 11:01 AM, Vinay Sajip wrote:
>
>> Well, if 3.2 remains in use for a longish time, then it is relevant, in
>> the broader context, isn't it?  We know how conservative Linux
>> distributions can be with their Python releases - although most are still
>> releasing 2.x as their system Python, this could change at some point in
>> the future.  Even if it doesn't, there might be a fair user base of people
>> stuck with 3.2 for any number of reasons, and to support them, the change
>> you propose won't help, because some variant of a package will still have
>> to use u() and b(), just for 3.2 support.
>
> Case in point: Ubuntu 12.04 is a long term support release, meaning 5
> years of official support on both the desktop and server.  It will ship
> with Python 2.7 and 3.2 only.

 From a Plone perspective, Python 3 support is something that I don't see  
becoming important for maybe 5 years, so support for 3.2 is simply not an  
issue for us. Before Plone can consider a move to Python 3 we first need  
support in the libraries we depend on. For those libraries under active  
development it seems that compatibility with both 2.x and 3.x is the best  
way to go. Adding support for u'' to Python 3.x certainly looks like it  
would cut down the amount of work required for libraries like the Zope  
Toolkit which already use unicode extensively.

Laurence


From merwok at netwok.org  Fri Dec  9 17:38:56 2011
From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Fri, 09 Dec 2011 17:38:56 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <loom.20111209T022519-121@post.gmane.org>
References: <1323320919.2710.24.camel@thinko>	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>	<1323324644.2710.28.camel@thinko>	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>	<1323325916.2710.39.camel@thinko>	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>	<loom.20111208T161219-187@post.gmane.org>	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>	<jbrllu$99u$1@dough.gmane.org>
	<loom.20111209T022519-121@post.gmane.org>
Message-ID: <4EE239A0.2020004@netwok.org>

Hi,

When running 2to3 from a setup.py script, does it run on the whole
codebase or only files that are found newer by the make-like
timestamp-based dependency system?  If it's the former, as some messages
seem to show (sorry no time to test right now), ISTM we can fix
distutils to do the latter (unless there are bugs due to import
rewriting to use explicit relative imports when there are extension
modules - blergh).

Regards

From carl at oddbird.net  Fri Dec  9 17:23:18 2011
From: carl at oddbird.net (Carl Meyer)
Date: Fri, 09 Dec 2011 09:23:18 -0700
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CAL0kPAWOzQ80HjfpkRTL-DH1jamm364FGJ4Ap5FfPAf75jGkyQ@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko>
	<5242067.5aBSYdFaIB@einstein>	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>	<3344831.JP9Cfj4Ety@einstein>	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>	<4EE12BAA.1050601@v.loewis.de>	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>	<jbrq67$28o$1@dough.gmane.org>	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>	<20111208223408.0e2e8bd1@limelight.wooz.org>	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<CAL0kPAWOzQ80HjfpkRTL-DH1jamm364FGJ4Ap5FfPAf75jGkyQ@mail.gmail.com>
Message-ID: <4EE235F6.6030001@oddbird.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/09/2011 07:45 AM, Lennart Regebro wrote:
> The slowness of running 2to3 during install time can be fixed by not
> doing so, but instead running it when the distribution is created,
> including both Python 2 and Python 3 code in the distribution.
> 
> http://python3porting.com/2to3.html#distribution-section
> 
> There are no tools that support this at the moment though. I guess it
> would be cool if Distribute supported making these kinds of
> distributions...

Doesn't this just move the problem to testing? Presumably one wants to
test that changes to the code don't break under Python 3, and ideally at
every change, not only at release time.

Carl
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7iNfYACgkQ8W4rlRKtE2dsqACffHkX7fVtCnmu8E4rdbfNdAfS
0fIAoLKzkmV3woLjXQP2sb8FcnlSgrux
=7pRs
-----END PGP SIGNATURE-----

From solipsis at pitrou.net  Fri Dec  9 17:46:31 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 9 Dec 2011 17:46:31 +0100
Subject: [Python-Dev] 2to3 and timestamps
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>
	<loom.20111208T161219-187@post.gmane.org>
	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>
	<jbrllu$99u$1@dough.gmane.org>
	<loom.20111209T022519-121@post.gmane.org>
	<4EE239A0.2020004@netwok.org>
Message-ID: <20111209174631.68a311f5@pitrou.net>

On Fri, 09 Dec 2011 17:38:56 +0100
Éric Araujo <merwok at netwok.org> wrote:
> Hi,
> 
> When running 2to3 from a setup.py script, does it run on the whole
> codebase or only files that are found newer by the make-like
> timestamp-based dependency system?  If it's the former, as some messages
> seem to show (sorry no time to test right now), ISTM we can fix
> distutils to do the latter (unless there are bugs due to import
> rewriting to use explicit relative imports when there are extension
> modules - blergh).

It would be better to teach 2to3 to do it by itself. Not everybody runs
2to3 through a setup.py script.
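
The make-like check itself is small. A sketch, assuming hypothetical helper
names (needs_conversion and stale_sources are not part of lib2to3 or
distutils) and a caller that knows where each converted file is written:

```python
import os


def needs_conversion(source, target):
    """Return True if `target` is missing or older than `source`."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)


def stale_sources(pairs):
    """Filter (source, target) pairs down to the sources to re-run.

    `pairs` is an iterable of (source_path, target_path) tuples; how
    the target paths are chosen is left to the caller.
    """
    return [src for src, dst in pairs if needs_conversion(src, dst)]
```

Only the files returned by stale_sources() would then be handed to the
converter, whichever front end is driving it.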

Regards

Antoine.



From anacrolix at gmail.com  Fri Dec  9 18:02:26 2011
From: anacrolix at gmail.com (Matt Joiner)
Date: Sat, 10 Dec 2011 04:02:26 +1100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <CAKmKYaBid4c8Y0pe7txxZMk9+0WN8Hr5ZodS=HP05MdV-ysPhQ@mail.gmail.com>
References: <jbsfar$en7$1@dough.gmane.org>
	<CAKmKYaBid4c8Y0pe7txxZMk9+0WN8Hr5ZodS=HP05MdV-ysPhQ@mail.gmail.com>
Message-ID: <CAB4yi1NAfpKVGzKkM9a1UjhC-Y6ZJrA_P6Pxw2PH3TNfQcYxpA@mail.gmail.com>

+1

On Sat, Dec 10, 2011 at 2:09 AM, Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:
> On Fri, Dec 9, 2011 at 09:02, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> a) The stdlib documentation should help users to choose the right tool right
>> from the start.
>> b) cElementTree should finally lose its "special" status as a separate
>> library and disappear as an accelerator module behind ElementTree.
>
> An at least somewhat informed +1 from me. The ElementTree API is a
> very good way to deal with XML from Python, and it deserves to be
> promoted over the included alternatives.
>
> Let's deprecate the NiCad batteries and try to guide users toward the
> Li-Ion ones.
>
> Cheers,
>
> Dirkjan



-- 
?_?

From status at bugs.python.org  Fri Dec  9 18:07:33 2011
From: status at bugs.python.org (Python tracker)
Date: Fri,  9 Dec 2011 18:07:33 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20111209170733.899571CDE4@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2011-12-02 - 2011-12-09)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    3169 (+21)
  closed 22180 (+26)
  total  25349 (+47)

Open issues with patches: 1351 


Issues opened (39)
==================

#5905: strptime fails in non-UTF locale
http://bugs.python.org/issue5905  reopened by haypo

#12555: PEP 3151 implementation
http://bugs.python.org/issue12555  reopened by ncoghlan

#13521: Make dict.setdefault() atomic
http://bugs.python.org/issue13521  opened by rhettinger

#13522: Document error return values for PyFloat_* and PyComplex_*
http://bugs.python.org/issue13522  opened by skrah

#13525: Tutorial: Example of Source Code Encoding triggers error
http://bugs.python.org/issue13525  opened by nicolasg

#13528: Rework performance FAQ
http://bugs.python.org/issue13528  opened by pitrou

#13530: Docs for os.lseek neglect to mention what it returns
http://bugs.python.org/issue13530  opened by nedbat

#13532: In IDLE, sys.stdout.write and sys.stderr can write any picklea
http://bugs.python.org/issue13532  opened by maniram.maniram

#13533: Would like Py_Initialize to play friendly with host app
http://bugs.python.org/issue13533  opened by dangermouseb

#13535: Improved two's complement arithmetic support: to_signed() and 
http://bugs.python.org/issue13535  opened by ncoghlan

#13537: Namedtuple instances can't be pickled in a daemonized process
http://bugs.python.org/issue13537  opened by Popa.Claudiu

#13538: Docstring of str() and/or behavior
http://bugs.python.org/issue13538  opened by Guillaume.Bouchard

#13539: A return is missing in TimeEncoding of calendar.py
http://bugs.python.org/issue13539  opened by psam

#13540: Document the Action API in argparse
http://bugs.python.org/issue13540  opened by jason.coombs

#13541: HTTPResponse (urllib) has no attribute read1 needed for TextIO
http://bugs.python.org/issue13541  opened by maubp

#13543: shlex with string ending in space gives "ValueError: No closin
http://bugs.python.org/issue13543  opened by ekorn

#13544: Add __qualname__ to functools.WRAPPER_ASSIGNMENTS
http://bugs.python.org/issue13544  opened by ncoghlan

#13545: Pydoc3.2: TypeError: unorderable types
http://bugs.python.org/issue13545  opened by threewestwinds

#13547: Clean Lib/_sysconfigdata.py and Modules/_testembed
http://bugs.python.org/issue13547  opened by skrah

#13548: Invalid 'line' tracer event on pass within else clause
http://bugs.python.org/issue13548  opened by sdeibel

#13549: Incorrect nested list comprehension documentation
http://bugs.python.org/issue13549  opened by mattlong

#13550: Rewrite logging hack of the threading module
http://bugs.python.org/issue13550  opened by haypo

#13551: pulldom doesn't populate DOM tree
http://bugs.python.org/issue13551  opened by AchimGaedke

#13552: Compilation issues of the curses module on OpenIndiana
http://bugs.python.org/issue13552  opened by haypo

#13553: Tkinter doesn't set proper application name
http://bugs.python.org/issue13553  opened by th9

#13554: Tkinter doesn't use higher resolution app icon
http://bugs.python.org/issue13554  opened by th9

#13555: cPickle MemoryError when loading large file (while pickle work
http://bugs.python.org/issue13555  opened by phillies

#13556: When tzinfo.utcoffset is out-of-bounds, the exception message 
http://bugs.python.org/issue13556  opened by exarkun

#13557: exec of list comprehension fails on NameError
http://bugs.python.org/issue13557  opened by sdeibel

#13558: multiprocessing package incompatible with PyObjC
http://bugs.python.org/issue13558  opened by mrmekon

#13559: Use sendfile where possible in httplib
http://bugs.python.org/issue13559  opened by benjamin.peterson

#13560: Add PyUnicode_DecodeLocale and PyUnicode_DecodeLocaleAndSize
http://bugs.python.org/issue13560  opened by haypo

#13561: os.listdir documentation should mention surrogateescape
http://bugs.python.org/issue13561  opened by michael.foord

#13562: Notes about module load path
http://bugs.python.org/issue13562  opened by Nam.Nguyen

#13563: Make use of with statement in ftplib
http://bugs.python.org/issue13563  opened by giampaolo.rodola

#13564: ftplib and sendfile()
http://bugs.python.org/issue13564  opened by giampaolo.rodola

#13565: test_multiprocessing.test_notify_all() hangs on "AMD64 Snow Le
http://bugs.python.org/issue13565  opened by haypo

#13566: Array objects pickled in 3.x with protocol <=2 are unpickled i
http://bugs.python.org/issue13566  opened by sbt

#13567: HTTPError interface changes / breaks depending on what was pas
http://bugs.python.org/issue13567  opened by Keto



Most recent 15 issues with no replies (15)
==========================================

#13565: test_multiprocessing.test_notify_all() hangs on "AMD64 Snow Le
http://bugs.python.org/issue13565

#13564: ftplib and sendfile()
http://bugs.python.org/issue13564

#13562: Notes about module load path
http://bugs.python.org/issue13562

#13561: os.listdir documentation should mention surrogateescape
http://bugs.python.org/issue13561

#13560: Add PyUnicode_DecodeLocale and PyUnicode_DecodeLocaleAndSize
http://bugs.python.org/issue13560

#13556: When tzinfo.utcoffset is out-of-bounds, the exception message 
http://bugs.python.org/issue13556

#13554: Tkinter doesn't use higher resolution app icon
http://bugs.python.org/issue13554

#13553: Tkinter doesn't set proper application name
http://bugs.python.org/issue13553

#13544: Add __qualname__ to functools.WRAPPER_ASSIGNMENTS
http://bugs.python.org/issue13544

#13540: Document the Action API in argparse
http://bugs.python.org/issue13540

#13539: A return is missing in TimeEncoding of calendar.py
http://bugs.python.org/issue13539

#13528: Rework performance FAQ
http://bugs.python.org/issue13528

#13525: Tutorial: Example of Source Code Encoding triggers error
http://bugs.python.org/issue13525

#13516: Gzip old log files in rotating handlers
http://bugs.python.org/issue13516

#13507: Modify OS X installer builds to package liblzma for the new lz
http://bugs.python.org/issue13507



Most recent 15 issues waiting for review (15)
=============================================

#13567: HTTPError interface changes / breaks depending on what was pas
http://bugs.python.org/issue13567

#13564: ftplib and sendfile()
http://bugs.python.org/issue13564

#13563: Make use of with statement in ftplib
http://bugs.python.org/issue13563

#13562: Notes about module load path
http://bugs.python.org/issue13562

#13560: Add PyUnicode_DecodeLocale and PyUnicode_DecodeLocaleAndSize
http://bugs.python.org/issue13560

#13552: Compilation issues of the curses module on OpenIndiana
http://bugs.python.org/issue13552

#13550: Rewrite logging hack of the threading module
http://bugs.python.org/issue13550

#13549: Incorrect nested list comprehension documentation
http://bugs.python.org/issue13549

#13528: Rework performance FAQ
http://bugs.python.org/issue13528

#13520: Patch to make pickle aware of __qualname__
http://bugs.python.org/issue13520

#13516: Gzip old log files in rotating handlers
http://bugs.python.org/issue13516

#13515: Consistent documentation practices for security concerns and c
http://bugs.python.org/issue13515

#13512: ~/.pypirc created insecurely
http://bugs.python.org/issue13512

#13511: ./configure --includedir, --libdir accept multiple
http://bugs.python.org/issue13511

#13508: ctypes' find_library breaks with ARM ABIs
http://bugs.python.org/issue13508



Top 10 most discussed issues (10)
=================================

#11051: system calls per import
http://bugs.python.org/issue11051   9 msgs

#13549: Incorrect nested list comprehension documentation
http://bugs.python.org/issue13549   8 msgs

#11816: Refactor the dis module to provide better building blocks for 
http://bugs.python.org/issue11816   6 msgs

#12555: PEP 3151 implementation
http://bugs.python.org/issue12555   6 msgs

#6715: xz compressor support
http://bugs.python.org/issue6715   5 msgs

#11682: PEP 380 reference implementation for 3.3
http://bugs.python.org/issue11682   5 msgs

#11838: IDLE: make interactive code savable as a runnable script
http://bugs.python.org/issue11838   5 msgs

#13515: Consistent documentation practices for security concerns and c
http://bugs.python.org/issue13515   5 msgs

#13538: Docstring of str() and/or behavior
http://bugs.python.org/issue13538   5 msgs

#13545: Pydoc3.2: TypeError: unorderable types
http://bugs.python.org/issue13545   5 msgs



Issues closed (26)
==================

#3635: pickle.dumps cannot save instance of dict-derived class that o
http://bugs.python.org/issue3635  closed by alexandre.vassalotti

#9663: importlib should exclusively open bytecode files
http://bugs.python.org/issue9663  closed by brett.cannon

#11147: _Py_ANNOTATE_MEMORY_ORDER has unused argument, effects code wh
http://bugs.python.org/issue11147  closed by barry

#11894: test_multiprocessing failure on "AMD64 OpenIndiana 3.x": KeyEr
http://bugs.python.org/issue11894  closed by haypo

#12208: Glitches in email.policy docs
http://bugs.python.org/issue12208  closed by eric.araujo

#12567: curses implementation of Unicode is wrong in Python 3
http://bugs.python.org/issue12567  closed by haypo

#12612: Valgrind suppressions
http://bugs.python.org/issue12612  closed by neologix

#12666: map semantic change not documented in What's New
http://bugs.python.org/issue12666  closed by jason.coombs

#13211: urllib2.HTTPError does not have 'reason' attribute.
http://bugs.python.org/issue13211  closed by jason.coombs

#13441: TestEnUSCollation.test_strxfrm() fails on Solaris
http://bugs.python.org/issue13441  closed by haypo

#13464: HTTPResponse is missing an implementation of readinto
http://bugs.python.org/issue13464  closed by pitrou

#13494: 'cast' any value to a Boolean?
http://bugs.python.org/issue13494  closed by ezio.melotti

#13499: uuid documentation example uses invalid REPL/doctest syntax
http://bugs.python.org/issue13499  closed by ezio.melotti

#13500: Hitting EOF gets cmd.py into a infinite EOF on return loop
http://bugs.python.org/issue13500  closed by python-dev

#13503: improved efficiency of bytearray pickling by using bytes type 
http://bugs.python.org/issue13503  closed by pitrou

#13513: IOBase docs incorrectly link to the readline module
http://bugs.python.org/issue13513  closed by meador.inge

#13523: Python does not warn in module .py files does not exist if the
http://bugs.python.org/issue13523  closed by ncoghlan

#13524: critical error with import tempfile
http://bugs.python.org/issue13524  closed by Andrey.Morozov

#13526: Deprecate the old Unicode API
http://bugs.python.org/issue13526  closed by loewis

#13527: Remove obsolete mentions in the GUIs page
http://bugs.python.org/issue13527  closed by pitrou

#13529: Segfault inside of gc/weakref
http://bugs.python.org/issue13529  closed by alex

#13531: add test for defaultdict with non-callable first argument
http://bugs.python.org/issue13531  closed by ezio.melotti

#13534: test_cmath fails on ppc with glibc-2.14.90 due to buggy archit
http://bugs.python.org/issue13534  closed by dmalcolm

#13536: ast.literal_eval fails on sets
http://bugs.python.org/issue13536  closed by benjamin.peterson

#13542: Memory leak in multiprocessing.pool
http://bugs.python.org/issue13542  closed by neologix

#13546: sys.setrecursionlimit() crashes IDLE
http://bugs.python.org/issue13546  closed by ned.deily

From janssen at parc.com  Fri Dec  9 18:47:11 2011
From: janssen at parc.com (Bill Janssen)
Date: Fri, 9 Dec 2011 09:47:11 PST
Subject: [Python-Dev] Unicode re support in Python 3
In-Reply-To: <D1AFFF88-54C6-4B91-A635-53850A1CA7FE@voidspace.org.uk>
References: <D1AFFF88-54C6-4B91-A635-53850A1CA7FE@voidspace.org.uk>
Message-ID: <67010.1323452831@parc.com>

Michael Foord <fuzzyman at voidspace.org.uk> wrote:

> Hey python-devers,
> 
> As I'm sure many of you are aware, Armin Ronacher posted a blog entry
> explaining the reasons he dislikes Python 3 in its current form.
> 
> Whilst I don't agree with all of his complaints, he makes a fair point
> about the re module Unicode support. It seems that the specific issue
> he has could be fixed by accepting the re module improvement /
> overhaul implemented by mrab:
> 
> 	http://bugs.python.org/issue2636
> 
> As it comes with an active maintainer, and is a big step forward for
> Python regex support, I'd like to see it in Python 3.3. Reading
> through the issue it's not clear to me what needs to be done for it to
> be accepted (or rejected), beyond a general "it's a big change".

I've been using mrab's regex daily for about six months, and have found
it stable and useful.  It now includes two features which are both
unusual and useful (very Pythonic!), named lists and fuzzy matching.

Bill

From mwm at mired.org  Fri Dec  9 19:07:36 2011
From: mwm at mired.org (Mike Meyer)
Date: Fri, 9 Dec 2011 10:07:36 -0800
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jbsfar$en7$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>
Message-ID: <20111209100736.5f16419a@mikmeyer-vm-fedora>

On Fri, 09 Dec 2011 09:02:35 +0100
Stefan Behnel <stefan_ml at behnel.de> wrote:

> a) The stdlib documentation should help users to choose the right
> tool right from the start.
> b) cElementTree should finally lose its "special" status as a
> separate library and disappear as an accelerator module behind
> ElementTree.

+1 and +1.

I've done a lot of xml work in Python, and unless you've got a
particular reason for wanting to use the dom, ElementTree is the only
sane way to go.
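
For context, the "special" status mentioned above means that code of this
era usually has to select the C accelerator by hand. A minimal sketch of that
idiom follows; the fallback branch keeps it working on interpreters where
xml.etree.cElementTree is not available, and the sample document is made up
purely for illustration.

```python
# Prefer the C accelerator when it exists, otherwise fall back to the
# pure-Python module; both expose the same ElementTree API.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
doc = ET.fromstring(
    "<root><item id='1'>spam</item><item id='2'>eggs</item></root>"
)

# Collect (id attribute, text) pairs from every <item> child.
items = [(el.get("id"), el.text) for el in doc.findall("item")]
print(items)
```

If ElementTree picked the accelerator up on its own, the try/except at the
top would presumably shrink to a single import.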

I recently converted a middling-sized app from using the dom to using
ElementTree, and wrote up some guidelines for the process for the
client. I can try and shake it out of my client's lawyers if it would
help with this or others are interested.

     <mike

From janssen at parc.com  Fri Dec  9 19:15:54 2011
From: janssen at parc.com (Bill Janssen)
Date: Fri, 9 Dec 2011 10:15:54 PST
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <20111209100736.5f16419a@mikmeyer-vm-fedora>
References: <jbsfar$en7$1@dough.gmane.org>
	<20111209100736.5f16419a@mikmeyer-vm-fedora>
Message-ID: <68178.1323454554@parc.com>

Mike Meyer <mwm at mired.org> wrote:

> On Fri, 09 Dec 2011 09:02:35 +0100
> Stefan Behnel <stefan_ml at behnel.de> wrote:
> 
> > a) The stdlib documentation should help users to choose the right
> > tool right from the start.
> > b) cElementTree should finally lose its "special" status as a
> > separate library and disappear as an accelerator module behind
> > ElementTree.
> 
> +1 and +1.
> 
> I've done a lot of xml work in Python, and unless you've got a
> particular reason for wanting to use the dom, ElementTree is the only
> sane way to go.

I use ElementTree for parsing valid XML, but minidom for producing it.
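
A small sketch of that split, assuming nothing beyond the standard library:
ElementTree reads a document, and minidom builds the output node by node. The
input document and the element names are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Parsing with ElementTree (hypothetical input document).
root = ET.fromstring("<books><book title='Dune'/><book title='Emma'/></books>")
titles = [book.get("title") for book in root.findall("book")]

# Producing with minidom: create each element and text node explicitly.
out = minidom.Document()
listing = out.createElement("titles")
out.appendChild(listing)
for title in titles:
    node = out.createElement("title")
    node.appendChild(out.createTextNode(title))
    listing.appendChild(node)

# Serialise just the <titles> element, without an XML declaration.
xml_text = listing.toxml()
print(xml_text)
```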

I think another thing that might go into "refreshing the batteries" is a
feature comparison of BeautifulSoup and HTML5lib against the stdlib
competition, to see what needs to be added/revised.  Having to switch to
an outside package for parsing possibly invalid HTML is a pain.

Bill

From p.f.moore at gmail.com  Fri Dec  9 19:24:41 2011
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 9 Dec 2011 18:24:41 +0000
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <68178.1323454554@parc.com>
References: <jbsfar$en7$1@dough.gmane.org>
	<20111209100736.5f16419a@mikmeyer-vm-fedora>
	<68178.1323454554@parc.com>
Message-ID: <CACac1F9sNFt0uEmWtkoTWPjzjgVjrQ7hy4wqaqjr6s9Vq-5Nzg@mail.gmail.com>

On 9 December 2011 18:15, Bill Janssen <janssen at parc.com> wrote:
> I use ElementTree for parsing valid XML, but minidom for producing it.
>
> I think another thing that might go into "refreshing the batteries" is a
> feature comparison of BeautifulSoup and HTML5lib against the stdlib
> competition, to see what needs to be added/revised.  Having to switch to
> an outside package for parsing possibly invalid HTML is a pain.

For what little use I make of XML/HTML parsing, I use lxml, simply
because it has a parser that covers the sort of HTML I have to deal
with in real life. As I have lxml installed, I use it for any XML
parsing tasks, just because I'm used to it.

Paul

From glyph at twistedmatrix.com  Fri Dec  9 19:39:20 2011
From: glyph at twistedmatrix.com (Glyph)
Date: Fri, 9 Dec 2011 13:39:20 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CAP7+vJJmA7afxXdvNYDFjOaOMkDw1Fcaqu6a+y3F6HyfKcBfMw@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<1323408839.2710.143.camel@thinko>
	<CAP7+vJJmA7afxXdvNYDFjOaOMkDw1Fcaqu6a+y3F6HyfKcBfMw@mail.gmail.com>
Message-ID: <7DEE32A7-1426-4E93-8708-BDF3B0CAF8EC@twistedmatrix.com>


On Dec 9, 2011, at 12:43 AM, Guido van Rossum wrote:

> Even if it weren't slow, I still wouldn't use it to automatically
> convert code at install time; a single codebase is easier to reason
> about, and easier to support.  Users send me tracebacks all the time;
> having them match the source is a wonderful thing.
> 
> Even though 2to3 was my idea, I am gradually beginning to appreciate this approach. I skimmed the docs for "six" and liked it.

Actually, maybe I like it a bit better than I thought.

The biggest issue for the single-codebase approach is 'except ... as ...'.  Peppering one's codebase with calls to sys.exc_info() can be a real performance problem, especially on PyPy.  Not to mention how ugly it is.  For some reason I thought that this syntax was only supported by 2.7 and up; I see now that it's 2.6 and up.

This is still a problem for 2.5 support, of course, but 2.6-only may not be too far away for many projects; Twisted's support schedule for Python versions typically follows Ubuntu's, which means that we might be able to drop 2.5 as early as 2013! :).  Even in the plans that involve 2to3 though, "drop everything prior to 2.6" was always supposed to be step 0, so "single codebase" adds much less of a burden than I thought.
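
To make the trade-off concrete, here is a minimal sketch of the two spellings.
The function names are hypothetical; the first form needs 2.6 or later, while
the second is the sys.exc_info() workaround that a codebase still supporting
2.5 would have to use.

```python
import sys


def describe_modern(text):
    """Bind the exception with 'as' (Python 2.6+ and 3.x)."""
    try:
        return int(text)
    except ValueError as exc:
        return "bad input: %s" % (exc,)


def describe_portable(text):
    """Fetch the exception from sys.exc_info() (also parses on 2.5)."""
    try:
        return int(text)
    except ValueError:
        exc = sys.exc_info()[1]  # (type, value, traceback)[1] is the value
        return "bad input: %s" % (exc,)


print(describe_modern("x"))
print(describe_portable("x"))
```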

-glyph


From python-dev at masklinn.net  Fri Dec  9 19:39:17 2011
From: python-dev at masklinn.net (Xavier Morel)
Date: Fri, 9 Dec 2011 19:39:17 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <68178.1323454554@parc.com>
References: <jbsfar$en7$1@dough.gmane.org>
	<20111209100736.5f16419a@mikmeyer-vm-fedora>
	<68178.1323454554@parc.com>
Message-ID: <C0C8FDCA-61F7-469C-9C62-6861D604C637@masklinn.net>

On 2011-12-09, at 19:15 , Bill Janssen wrote:
> I use ElementTree for parsing valid XML, but minidom for producing it.
Could you expand on your reasons to use minidom for producing XML?

From victor.stinner at haypocalc.com  Fri Dec  9 19:51:14 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 09 Dec 2011 19:51:14 +0100
Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and
	PyUnicode_EncodeCodePage()
In-Reply-To: <20111209013535.6fb38068@pitrou.net>
References: <E1RYnC6-0005HV-Ep@dinsdale.python.org>
	<20111209013535.6fb38068@pitrou.net>
Message-ID: <4EE258A2.8020902@haypocalc.com>

On 09/12/2011 01:35, Antoine Pitrou wrote:
> On Fri, 09 Dec 2011 00:16:02 +0100
> victor.stinner<python-checkins at python.org>  wrote:
>>
>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode)
>> +
>> +   Get a new copy of a Unicode object.
>> +
>> +   .. versionadded:: 3.3
>
> I'm not sure I understand. Why would you make a copy of an immutable
> object?

PyUnicode_Copy() can be used to modify a string to create a new string 
with the same length. It is used for example by str.upper(), 
str.title(), ... (fixup()).

It is also used by str.__getnewargs__(). I am not sure that 
str.__getnewargs__() must be a copy of str (s.__getnewargs__() is not x).

As mentioned by Martin, PyUnicode_Copy() is also used to get "an exact" 
Unicode object when you have a subtype.
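
The subtype case can be observed from Python code. This is only a sketch of
the visible behaviour on CPython, not of the C implementation; the Tagged
class is hypothetical.

```python
class Tagged(str):
    """Hypothetical str subclass used only for this illustration."""


s = Tagged("spam")

# str.__getnewargs__() hands back a one-element tuple holding the value.
(arg,) = s.__getnewargs__()

# Converting with str() likewise yields a plain str with the same contents.
plain = str(s)

print(type(s).__name__, type(arg).__name__, type(plain).__name__)
```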

We can maybe make the function private.

Victor

From flying-sheep at web.de  Thu Dec  8 14:31:26 2011
From: flying-sheep at web.de (Philipp A.)
Date: Thu, 8 Dec 2011 14:31:26 +0100
Subject: [Python-Dev] re.findall() should return named tuple
Message-ID: <CAN8d9gkTRgupzXE-eBEmtRNsC2diY0MA_7KvAuYwBQ15=4h2iA@mail.gmail.com>

hi devs,

just an idea that popped up in my mind: re.findall() returns a list of
tuples, where every entry of each tuple represents a match group.
since match groups can be named, we are able to use named tuples instead of
plain tuples here, in the same fashion as namedtuple's rename works:
missing group names get renamed to _1 and so on. i suggest to add the
rename keyword option to findall, defaulting to True, since mixed
positional and named tuples are more common than in usual use cases of
namedtuple.
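
A rough sketch of how the proposal could behave, written as a standalone
helper rather than as a change to re itself. The name findall_named is
hypothetical. It relies on namedtuple's existing rename=True option, which
numbers replaced fields by zero-based position, so an unnamed second group
becomes _1.

```python
import re
from collections import namedtuple


def findall_named(pattern, string, flags=0, rename=True):
    """Like re.findall(), but return one named tuple per match.

    Hypothetical helper; not part of the re module.
    """
    compiled = re.compile(pattern, flags)
    if compiled.groups == 0:
        raise ValueError("pattern has no groups to name")
    # Use the group number as a placeholder field name. It is not a valid
    # identifier, so namedtuple(rename=True) swaps it for _<position>.
    fields = [str(number) for number in range(1, compiled.groups + 1)]
    # groupindex maps each declared group name to its group number.
    for name, number in compiled.groupindex.items():
        fields[number - 1] = name
    Match = namedtuple("Match", fields, rename=rename)
    # groups("") mirrors findall(), which reports unmatched groups as ''.
    return [Match(*m.groups("")) for m in compiled.finditer(string)]


rows = findall_named(r"(?P<key>\w+)=(\d+)", "a=1 b=22")
print(rows)
```

With rename=False, a pattern containing unnamed groups would make namedtuple
raise ValueError, which is one argument for the suggested default of True.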

do you think it's a good idea?

finally: should i join the mailing list to see answers? should i file a
PEP? i have no idea how the inner workings of python development are, but i
wanted to share this idea with you :)

thanks for keeping python great,
philipp

From janssen at parc.com  Fri Dec  9 20:33:17 2011
From: janssen at parc.com (Bill Janssen)
Date: Fri, 9 Dec 2011 11:33:17 PST
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <C0C8FDCA-61F7-469C-9C62-6861D604C637@masklinn.net>
References: <jbsfar$en7$1@dough.gmane.org>
	<20111209100736.5f16419a@mikmeyer-vm-fedora>
	<68178.1323454554@parc.com>
	<C0C8FDCA-61F7-469C-9C62-6861D604C637@masklinn.net>
Message-ID: <69816.1323459197@parc.com>

Xavier Morel <python-dev at masklinn.net> wrote:

> On 2011-12-09, at 19:15 , Bill Janssen wrote:
> > I use ElementTree for parsing valid XML, but minidom for producing it.
> Could you expand on your reasons to use minidom for producing XML?

Inertia, I guess.  I tried that first, and it seems to work.

I tend to use html5lib and/or BeautifulSoup instead of ElementTree, and
that's mainly because I find the documentation for ElementTree is
confusing and partial and inconsistent.  Having various undated but
obsolete tutorials and documentation still up on effbot.org doesn't
help.


Bill

From solipsis at pitrou.net  Fri Dec  9 20:32:16 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 9 Dec 2011 20:32:16 +0100
Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and
 PyUnicode_EncodeCodePage()
References: <E1RYnC6-0005HV-Ep@dinsdale.python.org>
	<20111209013535.6fb38068@pitrou.net>
	<4EE258A2.8020902@haypocalc.com>
Message-ID: <20111209203216.2c627d61@pitrou.net>

On Fri, 09 Dec 2011 19:51:14 +0100
Victor Stinner <victor.stinner at haypocalc.com> wrote:
> On 09/12/2011 01:35, Antoine Pitrou wrote:
> > On Fri, 09 Dec 2011 00:16:02 +0100
> > victor.stinner<python-checkins at python.org>  wrote:
> >>
> >> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode)
> >> +
> >> +   Get a new copy of a Unicode object.
> >> +
> >> +   .. versionadded:: 3.3
> >
> > I'm not sure I understand. Why would you make a copy of an immutable
> > object?
> 
> PyUnicode_Copy() can be used to modify a string to create a new string 
> with the same length. It is used for example by str.upper(), 
> str.title(), ... (fixup()).

Then the doc should mention that the returned string can be modified.
Otherwise it's a bit obscure why the function exists.

Regards

Antoine.



From pje at telecommunity.com  Fri Dec  9 20:58:10 2011
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 9 Dec 2011 14:58:10 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <20111209101123.01e92326@limelight.wooz.org>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
Message-ID: <CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>

On Fri, Dec 9, 2011 at 10:11 AM, Barry Warsaw <barry at python.org> wrote:

> As Chris points out, this seems to be a use case tied to WSGI and PEP 3333.
> I guess it's an unfortunate choice for so recent a PEP, but maybe there was
> no way to do better.


For the record, "native strings" are defined the way they are because of
IronPython and Jython, which had unicode strings long before CPython.  At
the time WSGI was developed, the approach for Python 3 (then called "3000")
was expected to be similar, and the new I/O system was not (AFAIR) designed
yet.

All that changed in PEP 3333 was introducing *byte* strings (to accommodate
the I/O changes), not native strings.

In fact, I'm not sure why people are bringing it into this discussion at
all: PEP 3333 was designed to work well with 2to3, which does the right
thing for WSGI code: it converts 2.x "str" to 3.x "str", as it should.  If
you're writing 2.x WSGI code with 'u' literals, *your code is broken*.

WSGI doesn't need 'u' literals and never has.  It *does* need b'' literals
for stuff that refers to request and response bodies, but everything else
should be plain old string literals for the appropriate Python version.
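
As a minimal sketch of that rule (the application and the start_response stub
are hypothetical, written only for illustration): the status line and headers
are plain string literals, and only the body carries a b'' prefix.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical WSGI application used only for illustration."""
    # Status and headers are native strings: plain literals, no u'' prefix.
    start_response("200 OK", [("Content-Type", "text/plain")])
    # The response body is a byte string, hence the b'' literal.
    return [b"hello\n"]


# Drive the app by hand with a stub start_response that records its input.
environ = {}
setup_testing_defaults(environ)
captured = {}


def start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = headers


body = b"".join(app(environ, start_response))
print(captured["status"], body)
```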


> It can certainly be useful in many contexts outside of WSGI.
>

And *only* there, pretty much.  ;-)  PEP 3333 was designed to work with the
official upgrade path (2to3), which is why it has a concept of native
strings.  Thing is, if you mark them with a 'u', you're writing incorrect
code for  2.x.

From manday at gmx.net  Fri Dec  9 21:26:29 2011
From: manday at gmx.net (Cedric Sodhi)
Date: Fri, 9 Dec 2011 21:26:29 +0100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
Message-ID: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>

IF YOU THINK YOU MUST REPLY SOMETHING WITTY, REITERATE THAT THIS HAS BEEN
DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GONNA HAPPEN", THAT "WHO
DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING
SIMILAR, JUST DON'T.

Otherwise, read on.

I know very well that this topic has been discussed before. On forums.
Mailing lists. IRC. Blogs. From person to person, even.

And I know equally well, from all those years experiencing
argument-turned-debates on the internet, how a (minor|major) fraction of
participants make up for their inability to lead a proper debate by
speaking the loudest of all, so that eventually quantity triumphs over
quality and logic.

That said, I hope you can try not to fall into that category. Let instead
reason prevail over sentimentalism, misguided purism, elitism, and all
other sorts of isms which hinder advancement in the greater context.

Python has surprised once already: The changes from 2 to 3 were not
downwards compatible because the core developers realized there is more
to a sustainable language than constantly patching it up until it comes
apart like the roman empire.

Let's keep that spirit for a second and let us discuss braces, again,
with the clear goal of improving the language.

End of disclaimer?

End of disclaimer!

Whitespace-Blocking (WSB) as opposed to Delimiter-Blocking (DB) has
reasons. What are those reasons? Well, primarily, it forces the
programmer to maintain well readable code. Then, some might argue, it is
quicker to type.

Two reasons, but of what importance are they? And are they actually
reasons?

You may have guessed from the questions themselves that I'm about to
question that.

I don't intend to connote brazen implications, so let me spell out what
I just implied: I think anyone who thinks that exclusive WSB is a good
alternative or even preferable to DB is actually deluding themselves for
some personal version of one of those isms mentioned above.

Let's examine these alleged advantages objectively one for one. But
before that, just to calm troubled waters a little, allow me to bring
forward the conclusion:

Absolutely no intentions to remove WSB from Python. Although one might
have gotten that impression from the early paragraphs, no intentions to
break downwards compatibility, either.

What Python needs is an alternative to WSB and can stay Python by still
offering WSB to all those who happen to like it.

Readable code, is it really an advantage?

Two linebreaks, just for the suspense, then:

Of course it is.

Forcing the programmer to write readable code, is that an advantage? No
suspense, the answer is Of course not.

Python may have started off as the casual scripting language for casual
people. People, who may not even have known programming. And perhaps it
has made sense to force -- or shall we say motivate, since you can still
produce perfectly obfuscated code with Python -- them to write readably.

But Python has matured and so has its clientele. Python does not become
a better language, neither for beginners nor for experienced programmers
who also frequently use Python these days, by patronizing them and
restricting them in their freedom.

Readable code? Yes. Forcing people to write readable code by artificial
means? No.

Practice is evidence for the mischief of this policy: Does the FOSS
community suffer from a notorious lack of proper indentation or
readability of code? Of course we don't.

I'm not a native speaker, but dict.cc tells me that what we call "mit
Kanonen auf Spatzen schießen" (firing cannons at sparrows) is called
breaking a fly on the wheel in English.

I may lack the analogy for the fly on the wheel, which, if I'm not
mistaken, used to be a device for torture in the Middle Ages, but I can
tell you that the cannon ball which might have struck the sparrows,
coincidentally caused havoc in the hinterlands.

For the wide-spread and professional language Python is today, the idea
of forcing people to indent is misguided. These days, it may address a
negligible minority of absolute beginners who barely started programming
and would not listen to the simple advice of indenting properly, but on
the other hand it hurts/annoys/deters a great community of typical
programmers for whom DB has long become a de facto standard.

For them, it's not a mere inconsistency without, for them, any apparent
reason. It's more than the inconvenience of not being able to follow one's
long time practices, using the scripts one wrote for delimiters, the
shortcuts that are usually offered by editor, etc.

It also brings about a whole class of new problems which may be
anticipated and prevented, yet bear a great potential for new, even
hard-to-find bugs (just in case anyone would respond that we had
eventually successfully redeemed the mismatched parenthesis problem - at
what cost?!).

Not just difficult to find, near to impossible would be the right word
for anyone who has to review someone else's patch.

It is widely known among the programmer's community that spaces and tabs
are remarkably similar to each other. So similar even, that people fight
wars about which to use in a non-py context. It might strike one as an
equally remarkably nonsensical idea to give them programmatic meaning -
two DIFFERENT meanings, to make things even worse.
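
For reference, a small sketch of what Python 3 does with the ambiguous case
described here: when two lines of one block are indented with a tab and with
spaces respectively, so that the meaning would depend on the tab width, the
compiler refuses the source instead of guessing. The snippet compiles a
made-up three-line program and records the result.

```python
# One line indented with a tab, the next with eight spaces: identical on
# screen in many editors, but the block structure depends on tab width.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<demo>", "exec")
    outcome = "compiled"
except TabError as exc:
    # TabError is a subclass of IndentationError (and of SyntaxError).
    outcome = "TabError: %s" % exc.msg

print(outcome)
```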

While it becomes a practical impossibility to spot these kinds of bugs
while reviewing code -- optionally mangled through a medium which
expands tabs to whitespace, not so much of a rarity -- it is still a
time-consuming and tedious job to find them in a local situation.

More or less easily rectified, but once you spent a while trying to
figure something like that out, you inevitably have the urge to ask: Why?

Last of all, some might argue that it's convenient not to have to type
delimiters. Well, be my guest. I also appreciate single-line
conditionals or loops once in a while. I understand how not having to
type delimiters if you don't want them lifts a burden. Hence I would not
want to rid Python of them. WSB may come in handy. But equally, it may not.

Proposing the actual changes that would have to be made to accommodate
both, WSB and DB is beyond the scope of this script. It is the
CONCLUSION that the current situation is undesirable and Python,
although not apparent at the first glance, suffers from exclusive WSB,
which is the goal of this thread.

Discussing has its etymological roots in Discourse, which connotes a
loosely guided conversation about a topic. Therefore, I conclude with a 

DEBATE!!!111

kind regards,
-- MD

(not proof-read)

From brian at python.org  Fri Dec  9 21:36:21 2011
From: brian at python.org (Brian Curtin)
Date: Fri, 9 Dec 2011 14:36:21 -0600
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
Message-ID: <CAD+XWwqB2h1n_MnOAbJvW-gRDw6WcQ2GiLoBBywdc7SiCLMGHA@mail.gmail.com>

On Fri, Dec 9, 2011 at 14:26, Cedric Sodhi <manday at gmx.net> wrote:
> IF YOU THINK YOU MUST REPLY SOMETHING WITTY, REITERATE THAT THIS HAS BEEN
> DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GONNA HAPPEN", THAT "WHO
> DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING
> SIMILAR, JUST DON'T.
>
> Otherwise, read on.
>
> I know very well that this topic has been discussed before. On forums.
> Mailing lists. IRC. Blogs. From person to person, even.
>
> And I know equally well, from all those years experiencing
> argument-turned-debates on the internet, how a (minor|major) fraction of
> participants make up for their inability to lead a proper debate by
> speaking the loudest of all, so that eventually quantity triumphs over
> quality and logic.
>
> That said, I hope you can try not to fall into that category. Let instead
> reason prevail over sentimentalism, misguided purism, elitism, and all
> other sorts of isms which hinder advancement in the greater context.
>
> Python has surprised once already: The changes from 2 to 3 were not
> downwards compatible because the core developers realized there is more
> to a sustainable language than constantly patching it up until it comes
> apart like the roman empire.
>
> Let's keep that spirit for a second and let us discuss braces, again,
> with the clear goal of improving the language.
>
> End of disclaimer?
>
> End of disclaimer!
>
> Whitespace-Blocking (WSB) as opposed to Delimiter-Blocking (DB) has
> reasons. What are those reasons? Well, primarily, it forces the
> programmer to maintain well readable code. Then, some might argue, it is
> quicker to type.
>
> Two reasons, but of what importance are they? And are they actually
> reasons?
>
> You may have guessed from the questions themselves that I'm about to
> question that.
>
> I don't intend to connote brazen implications, so let me spell out what
> I just implied: I think anyone who thinks that exclusive WSB is a good
> alternative or even preferable to DB is actually deluding themselves for
> some personal version of one of those isms mentioned above.
>
> Let's examine these alleged advantages objectively one for one. But
> before that, just to calm troubled waters a little, allow me to bring
> forward the conclusion:
>
> Absolutely no intentions to remove WSB from Python. Although one might
> have gotten that impression from the early paragraphs, no intentions to
> break downwards compatibility, either.
>
> What Python needs is an alternative to WSB and can stay Python by still
> offering WSB to all those who happen to like it.
>
> Readable code, is it really an advantage?
>
> Two linebreaks, just for the suspense, then:
>
> Of course it is.
>
> Forcing the programmer to write readable code, is that an advantage? No
> suspense, the answer is Of course not.
>
> Python may have started off as the casual scripting language for casual
> people. People, who may not even have known programming. And perhaps it
> has made sense to force -- or shall we say motivate, since you can still
> produce perfectly obfuscated code with Python -- them to write readably.
>
> But Python has matured and so has its clientele. Python does not become
> a better language, neither for beginners nor for experienced programmers
> who also frequently use Python these days, by patronizing them and
> restricting them in their freedom.
>
> Readable code? Yes. Forcing people to write readable code by artificial
> means? No.
>
> Practice is evidence for the mischief of this policy: Does the FOSS
> community suffer from a notorious lack of proper indentation or
> readability of code? Of course we don't.
>
> I'm not a native speaker, but dict.cc tells me that what we call "mit
> Kanonen auf Spatzen schießen" (firing cannons at sparrows) is called
> breaking a fly on the wheel in English.
>
> I may lack the analogy for the fly on the wheel, which, if I'm not
> mistaken, used to be a device for torture in the Middle Ages, but I can
> tell you that the cannon ball which might have struck the sparrows,
> coincidentally caused havoc in the hinterlands.
>
> For the wide-spread and professional language Python is today, the idea
> of forcing people to indent is misguided. These days, it may address a
> negligible minority of absolute beginners who barely started programming
> and would not listen to the simple advice of indenting properly, but on
> the other hand it hurts/annoys/deters a great community of typical
> programmers for whom DB has long become a de facto standard.
>
> For them, it's not a mere inconsistency without, for them, any apparent
> reason. It's more than the inconvenience of not being able to follow one's
> long time practices, using the scripts one wrote for delimiters, the
> shortcuts that are usually offered by editor, etc.
>
> It also brings about a whole class of new problems which may be
> anticipated and prevented, yet bear a great potential for new, even
> hard-to-find bugs (just in case anyone would respond that we had
> eventually successfully redeemed the mismatched parenthesis problem - at
> what cost?!).
>
> Not just difficult to find, near to impossible would be the right word
> for anyone who has to review someone else's patch.
>
> It is widely known among the programmer's community that spaces and tabs
> are remarkably similar to each other. So similar even, that people fight
> wars about which to use in a non-py context. It might strike one as an
> equally remarkably nonsensical idea to give them programmatic meaning -
> two DIFFERENT meanings, to make things even worse.
>
> While it becomes a practical impossibility to spot these kinds of bugs
> while reviewing code -- optionally mangled through a medium which
> expands tabs to whitespace, not so much of a rarity -- it is still a
> time-consuming and tedious job to find them in a local situation.
>
> More or less easily rectified, but once you spent a while trying to
> figure something like that out, you inevitably have the urge to ask: Why?
>
> Last of all, some might argue that it's convenient not to have to type
> delimiters. Well, be my guest. I also appreciate single-line
> conditionals or loops once in a while. I understand how not having to
> type delimiters if you don't want them lifts a burden. Hence I would not
> want to rid Python of them. WSB may come in handy. But equally, it may not.
>
> Proposing the actual changes that would have to be made to accommodate
> both, WSB and DB is beyond the scope of this script. It is the
> CONCLUSION that the current situation is undesirable and Python,
> although not apparent at the first glance, suffers from exclusive WSB,
> which is the goal of this thread.
>
> Discussing has its etymological roots in Discourse, which connotes a
> loosely guided conversation about a topic. Therefore, I conclude with a
>
> DEBATE!!!111
>
> kind regards,
> -- MD
>
> (not proof-read)

You forgot the patch.

From regebro at gmail.com  Fri Dec  9 21:46:42 2011
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 9 Dec 2011 21:46:42 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <4EE239A0.2020004@netwok.org>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>
	<loom.20111208T161219-187@post.gmane.org>
	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>
	<jbrllu$99u$1@dough.gmane.org>
	<loom.20111209T022519-121@post.gmane.org>
	<4EE239A0.2020004@netwok.org>
Message-ID: <CAL0kPAUVHuts8vZujGjB6QjHi599fsCPpWTszq59OKvCfdHz_g@mail.gmail.com>

On Fri, Dec 9, 2011 at 17:38, Éric Araujo <merwok at netwok.org> wrote:
> When running 2to3 from a setup.py script, does it run on the whole
> codebase or only files that are found newer by the make-like
> timestamp-based dependency system?

Only on the ones that are newer. But since at install time, that's all
of them, it doesn't really help. :-)
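
A minimal sketch of what such a make-like timestamp check amounts to,
for illustration only. The helper name needs_rebuild is hypothetical and
this is not the actual distutils code; it only assumes the usual rule
that a missing or older target counts as out of date:

```python
import os


def needs_rebuild(source, target):
    """Return True if `target` is missing or older than `source`."""
    # A target that does not exist yet is always out of date. In a fresh
    # build directory at install time this is true for every file.
    if not os.path.exists(target):
        return True
    # Otherwise compare modification times, as make would.
    return os.path.getmtime(source) > os.path.getmtime(target)
```

With an empty build directory every target is missing, so every source
file gets converted, which is why the check saves nothing at install time.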

//Lennart

From jnoller at gmail.com  Fri Dec  9 21:59:05 2011
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 9 Dec 2011 15:59:05 -0500
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
Message-ID: <80BEA596DA974153981E972102B17257@gmail.com>



On Friday, December 9, 2011 at 3:26 PM, Cedric Sodhi wrote:

> IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD BEEN
> DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", THAT "WHO
> DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING
> SIMILAR, JUST DON'T.
>  
> Otherwise, read on.
>  
> I know very well that this topic has been discussed before. On forums.
> Mailing lists. IRC. Blogs. From person to person, even.
>  
> And I know equally well, from all those years experiencing
> argument-turned-debates on the internet, how a (minor|major) fraction of
> participants make up for their inability to lead a proper debate by
> speaking the loudest of all, so that eventually quantity triumphs over
> quality and logic.
>  
> That ahead; I hope you can try not to fall in that category. Let instead
> reason prevail over sentimentalism, mislead purism, elitism, and all
> other sorts of isms which hinder advancement in the greater context.
>  
> Python has surprised once already: The changes from 2 to 3 were not
> downwards compatible because the core developers realized there is more
> to a sustainable language than constantly patching it up until it comes
> apart like the roman empire.
>  
> Let's keep that spirit for a second and let us discuss braces, again,
> with the clear goal of improving the language.
>  
> End of disclaimer?
>  
> End of disclaimer!
>  
> Whitespace-Blocking (WSB) as opposed to Delimiter-Blocking (DB) has
> reasons. What are those reasons? Well, primarily, it forces the
> programmer to maintain well readable code. Then, some might argue, it is
> quicker to type.
>  
> Two reasons, but of what importance are they? And are they actually
> reasons?
>  
> You may guessed it from the questions themselves that I'm about to
> question that.
>  
> I don't intend to connote brazen implications, so let me spell out what
> I just implied: I think anyone who thinks that exclusive WSB is a good
> alternative or even preferable to DB is actually deluding themselves for
> some personal version of one of those isms mentioned above.
>  
> Let's examine these alleged advantages objectively one for one. But
> before that, just to calm troubled waters a little, allow me bring
> forward the conclusion:
>  
> Absolutely no intentions to remowe WSB from Python. Although one might
> have gotten that impression from the early paragraphs, no intentions to
> break downwards compatibility, either.
>  
> What Python needs is an alternative to WSB and can stay Python by still
> offering WSB to all those who happen to like it.
>  
> Readable code, is it really an advantage?
>  
> Two linebreaks, just for the suspense, then:
>  
> Of course it is.
>  
> Forcing the programmer to write readable code, is that an advantage? No
> suspense, the answer is Of course not.
>  
> Python may have started off as the casual scripting language for casual
> people. People, who may not even have known programming. And perhaps it
> has made sense to force -- or shall we say motivate, since you can still
> produce perfectly obfuscated code with Python -- them to write readably.
>  
> But Python has matured and so has its clientele. Python does not become
> a better language, neither for beginners nor for experienced programmers
> who also frequently use Python these days, by patronizing them and
> restricting them in their freedom.
>  
> Readable code? Yes. Forcing people to write readable code by artificial
> means? No.
>  
> Practice is evidence for the mischief of this policy: Does the FOSS
> community suffer from a notorious lack of proper indention or
> readability of code? Of course we don't.
>  
> I'm not a native speaker, but dict.cc (http://dict.cc) tells me that what we call "mit
> Kanonen auf Spatzen schießen" (firing cannons at sparrows) is called
> breaking a fly on the wheel in English.
>  
> I may lack the analogy for the fly on the wheel, which, if I'm not
> mistaken, used to be a device for torture in the Middle Ages, but I can
> tell you that the cannon ball which might have struck the sparrows,
> coincidently caused havoc in the hinterlands.
>  
> For the wide-spread and professional language Python is today, the idea
> of forcing people to indent is misguided. These days, it may address a
> neglible minority of absolute beginners who barely started programming
> and would not listen to the simple advice of indenting properly, but on
> the other hand it hurts/annoys/deters a great community of typical
> programmers for whom DB has long become a de facto standard.
>  
> For them, it's not a mere inconsistency without, for them, any apparent
> reason. It's more than the inconvenience not being able to follow ones
> long time practices, using the scripts one wrote for delimiters, the
> shortcuts that are usually offered by editor, etc.
>  
> It also brings about a whole class of new problems which may be
> anticipated and prevent, yet bear a great potential for new, even
> hard-to-find bugs (just in case anyone would respond that we had
> eventually successfully redeemed the mismatched parenthesis problem - at
> what cost?!).
>  
> Not just difficult to find, near to impossible would be the right word
> for anyone who has to review someone else's patch.
>  
> It is widely known among the programmer's community that spaces and tabs
> are remarkably similar to eachother. So similar even, that people fight
> wars about which to use in a non-py context. It might strike one as an
> equally remarkably nonsensical idea to give them programmatic meaning -
> two DIFFERENT meanings, to make things even worse.
>  
> While it becomes a practical impossibility to spot these kind of bugs
> while reviewing code -- optionally mangled through a medium which
> expands tabs to whitespace, not so much of a rarity -- it is still a
> time-consuming and tedious job to find them in a local situation.
>  
> More or less easily rectified, but once you spent a while trying to
> figure something like that out, you inevitably have the urge to ask: Why?
>  
> Last of all, some might argue that it's convenient to not to have type
> delimiters. Well, be my guest. I also appreciate single lined
> conditional or loops once in a while. I understand how not having to
> type delimiters if you don't want them lifts a burden. Hence I would not
> want rid Python of them. WSB may come in handy. But equally, it may not.
>  
> Proposing the actual changes that would have to be made to accomodate
> both, WSB and DB is beyond the scope of this script. It is the
> CONCLUSION that the current situation is undesirable and Python,
> although not apparent at the first glance, suffers from exclusive WSB,
> which is the goal of this thread.
>  
> Discussing has its etymological roots in Discourse, which connotes a
> loosely guided conversation about a topic. Therefore, I conclude with a  
>  
> DEBATE!!!111
>  
> kind regards,
> -- MD
>  
> (not proof-read)
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org (mailto:Python-Dev at python.org)
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/jnoller%40gmail.com


+1

From python-dev at masklinn.net  Fri Dec  9 22:02:54 2011
From: python-dev at masklinn.net (Xavier Morel)
Date: Fri, 9 Dec 2011 22:02:54 +0100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
Message-ID: <85361AAC-2DB1-4EF4-8DB6-07AB8846BDDA@masklinn.net>

On 2011-12-09, at 21:26 , Cedric Sodhi wrote:
> IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD BEEN
> DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", THAT "WHO
> DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING
> SIMILAR, JUST DON'T.
> 
> Otherwise, read on.
> 
> I know very well that this topic has been discussed before. On forums.
> Mailing lists. IRC. Blogs. From person to person, even.
> 
> And I know equally well, from all those years experiencing
> argument-turned-debates on the internet, how a (minor|major) fraction of
> participants make up for their inability to lead a proper debate by
> speaking the loudest of all, so that eventually quantity triumphs over
> quality and logic.
> 
> That ahead; I hope you can try not to fall in that category. Let instead
> reason prevail over sentimentalism, mislead purism, elitism, and all
> other sorts of isms which hinder advancement in the greater context.
> 
> Python has surprised once already: The changes from 2 to 3 were not
> downwards compatible because the core developers realized there is more
> to a sustainable language than constantly patching it up until it comes
> apart like the roman empire.
> 
> Let's keep that spirit for a second and let us discuss braces, again,
> with the clear goal of improving the language.
> 
> End of disclaimer?
> 
> End of disclaimer!
> 
> Whitespace-Blocking (WSB) as opposed to Delimiter-Blocking (DB) has
> reasons. What are those reasons? Well, primarily, it forces the
> programmer to maintain well readable code. Then, some might argue, it is
> quicker to type.
> 
> Two reasons, but of what importance are they? And are they actually
> reasons?
> 
> You may guessed it from the questions themselves that I'm about to
> question that.
> 
> I don't intend to connote brazen implications, so let me spell out what
> I just implied: I think anyone who thinks that exclusive WSB is a good
> alternative or even preferable to DB is actually deluding themselves for
> some personal version of one of those isms mentioned above.
> 
> Let's examine these alleged advantages objectively one for one. But
> before that, just to calm troubled waters a little, allow me bring
> forward the conclusion:
> 
> Absolutely no intentions to remowe WSB from Python. Although one might
> have gotten that impression from the early paragraphs, no intentions to
> break downwards compatibility, either.
> 
> What Python needs is an alternative to WSB and can stay Python by still
> offering WSB to all those who happen to like it.
> 
> Readable code, is it really an advantage?
> 
> Two linebreaks, just for the suspense, then:
> 
> Of course it is.
> 
> Forcing the programmer to write readable code, is that an advantage? No
> suspense, the answer is Of course not.
> 
> Python may have started off as the casual scripting language for casual
> people. People, who may not even have known programming. And perhaps it
> has made sense to force -- or shall we say motivate, since you can still
> produce perfectly obfuscated code with Python -- them to write readably.
> 
> But Python has matured and so has its clientele. Python does not become
> a better language, neither for beginners nor for experienced programmers
> who also frequently use Python these days, by patronizing them and
> restricting them in their freedom.
> 
> Readable code? Yes. Forcing people to write readable code by artificial
> means? No.
> 
> Practice is evidence for the mischief of this policy: Does the FOSS
> community suffer from a notorious lack of proper indention or
> readability of code? Of course we don't.
> 
> I'm not a native speaker, but dict.cc tells me that what we call "mit
> Kanonen auf Spatzen schießen" (firing cannons at sparrows) is called
> breaking a fly on the wheel in English.
> 
> I may lack the analogy for the fly on the wheel, which, if I'm not
> mistaken, used to be a device for torture in the Middle Ages, but I can
> tell you that the cannon ball which might have struck the sparrows,
> coincidently caused havoc in the hinterlands.
> 
> For the wide-spread and professional language Python is today, the idea
> of forcing people to indent is misguided. These days, it may address a
> neglible minority of absolute beginners who barely started programming
> and would not listen to the simple advice of indenting properly, but on
> the other hand it hurts/annoys/deters a great community of typical
> programmers for whom DB has long become a de facto standard.
> 
> For them, it's not a mere inconsistency without, for them, any apparent
> reason. It's more than the inconvenience not being able to follow ones
> long time practices, using the scripts one wrote for delimiters, the
> shortcuts that are usually offered by editor, etc.
> 
> It also brings about a whole class of new problems which may be
> anticipated and prevent, yet bear a great potential for new, even
> hard-to-find bugs (just in case anyone would respond that we had
> eventually successfully redeemed the mismatched parenthesis problem - at
> what cost?!).
> 
> Not just difficult to find, near to impossible would be the right word
> for anyone who has to review someone else's patch.
> 
> It is widely known among the programmer's community that spaces and tabs
> are remarkably similar to eachother. So similar even, that people fight
> wars about which to use in a non-py context. It might strike one as an
> equally remarkably nonsensical idea to give them programmatic meaning -
> two DIFFERENT meanings, to make things even worse.
> 
> While it becomes a practical impossibility to spot these kind of bugs
> while reviewing code -- optionally mangled through a medium which
> expands tabs to whitespace, not so much of a rarity -- it is still a
> time-consuming and tedious job to find them in a local situation.
> 
> More or less easily rectified, but once you spent a while trying to
> figure something like that out, you inevitably have the urge to ask: Why?
> 
> Last of all, some might argue that it's convenient to not to have type
> delimiters. Well, be my guest. I also appreciate single lined
> conditional or loops once in a while. I understand how not having to
> type delimiters if you don't want them lifts a burden. Hence I would not
> want rid Python of them. WSB may come in handy. But equally, it may not.
> 
> Proposing the actual changes that would have to be made to accomodate
> both, WSB and DB is beyond the scope of this script. It is the
> CONCLUSION that the current situation is undesirable and Python,
> although not apparent at the first glance, suffers from exclusive WSB,
> which is the goal of this thread.
> 
> Discussing has its etymological roots in Discourse, which connotes a
> loosely guided conversation about a topic. Therefore, I conclude with a 
> 
> DEBATE!!!111
> 
> kind regards,
> -- MD

You do know braces are already in __future__, right?
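
For anyone who has not tried it, here is a small sketch of what that
import does. It assumes CPython, and the helper name is made up for the
example:

```python
def future_braces_error():
    """Compile the import and return the SyntaxError message, if any."""
    try:
        compile("from __future__ import braces", "<example>", "exec")
    except SyntaxError as exc:
        # CPython special-cases this name and refuses with a fixed
        # message ("not a chance").
        return exc.msg
    return None


print(future_braces_error())
```

So the feature is "in" __future__ only in the sense that the compiler
recognises the name and rejects it.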

Also, why did you feel the need to post eleven hundred (1100) words on a topic settled more than a decade ago (Wed Feb 28 17:47:12 2001 +0000)?

As far as I can see, you also don't provide any argument for making the language more complex beyond your not liking its current state. Your whole email is word salad circling around that, and around your treating insufficient tooling as a reason to alter the language to fit the tool.

PS: Haskell's core syntax is defined through braces and semicolons, and "layout" (also called "off-side rule") is added to allow getting rid of these when writing code.

I do not remember *ever* seeing Haskell code written by a human which used braces. Not in repositories, not in blogs, not in forums, not in mailing lists, not anywhere.

If the issues of indentation-based blocks were as dire as you seem to believe, this would not be possible, as developers would flock towards "safer" delimiter-block syntax, but facts belie your assertions, and explicit braces are instead relegated to generated code.

Which could be a defense of adding braces to Python, but really, if you're generating Python code I think you should generate bytecode directly to ensure nobody will go editing generated code. Hence the issue of having to generate indentation is not an issue.

From ben+python at benfinney.id.au  Fri Dec  9 22:07:39 2011
From: ben+python at benfinney.id.au (Ben Finney)
Date: Sat, 10 Dec 2011 08:07:39 +1100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
Message-ID: <8762hp3144.fsf@benfinney.id.au>

Cedric Sodhi <manday at gmx.net> writes:

> IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD BEEN
> DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", THAT "WHO
> DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING
> SIMILAR, JUST DON'T.

If you're going to post a long screed on a settled subject, and try to
lay a heap of special restrictions in an open discussion forum on how
you want people to respond: just don't.

-- 
 \     “Don't be afraid of missing opportunities. Behind every failure |
  `\       is an opportunity somebody wishes they had missed.” -- Jane |
_o__)                                          Wagner, via Lily Tomlin |
Ben Finney


From mwm at mired.org  Fri Dec  9 22:11:29 2011
From: mwm at mired.org (Mike Meyer)
Date: Fri, 9 Dec 2011 13:11:29 -0800
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
Message-ID: <20111209131129.1b804e52@mikmeyer-vm-fedora>

On Fri, 9 Dec 2011 21:26:29 +0100
Cedric Sodhi <manday at gmx.net> wrote:
> Readable code, is it really an advantage?
> Of course it is.

Ok, you got that right.

> Forcing the programmer to write readable code, is that an advantage?
> No suspense, the answer is Of course not.

This is *not* an "Of course". Readable code is *important*. Giving
programmers more power in exchange for less readable code is a bad
trade.  For an extended analysis, see:
http://blog.mired.org/2011/10/more-power-is-not-always-good-thing.html

One of Python's best points is that the community resists the urge to
add things just to add things. The community generally applies three
tests to any feature before accepting it:

1) It should have a good use case.
2) It should enable more readable code for that use case.
3) It shouldn't make writing unreadable code easy.

DB fails all three of these tests. It doesn't have a good use
case. The code you create using it is not more readable than the
alternative. And it definitely makes writing unreadable code easy.

And of course, it violates TOOWTDI.

    <mike

From manday at gmx.net  Fri Dec  9 22:26:50 2011
From: manday at gmx.net (Cedric Sodhi)
Date: Fri, 9 Dec 2011 22:26:50 +0100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <20111209131129.1b804e52@mikmeyer-vm-fedora>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
	<20111209131129.1b804e52@mikmeyer-vm-fedora>
Message-ID: <20111209212650.GA4346@slate.Speedport_W_723V_Typ_A>

On Fri, Dec 09, 2011 at 01:11:29PM -0800, Mike Meyer wrote:
> On Fri, 9 Dec 2011 21:26:29 +0100
> Cedric Sodhi <manday at gmx.net> wrote:
> > Readable code, is it really an advantage?
> > Of course it is.
> 
> Ok, you got that right.

Thank you. It doesn't go unnoticed that you learned your Feedback Rules.
> 
> > Forcing the programmer to write readable code, is that an advantage?
> > No suspense, the answer is Of course not.
> 
> This is *not* an "Of course". Readable code is *important*. Giving
> programmers more power in exchange for less readable code is a bad
> trade.  For an extended analysis, see:
> http://blog.mired.org/2011/10/more-power-is-not-always-good-thing.html

And here is the catch. The typical ignoratio elenchi which is frequently
put forward by those who want to depict WSB as a necessity, as a social
contract à la Locke for the Python community, by which they oblige
themselves to write readable code.

The fallacy is trivial, though, and even further supported by evidence
presented by reality. Indeed, you pretty much serve the comeback on a
silver plate:

"Power in exchange for less readable code"

There is no such exchange.

Instead of further elaborating on why I say that, I leave it to you and
possibly other readers to recognize the fallacy as a whole.

Rather, let me support the argument by the apparent evidence which I
already emphasized in the introductory script:

Not a single language in the FOSS community suffers from a lack of
proper indentation. The propagated fear of unreadable code is unjustified.
That article you linked also completely ignores that.

kind regards,
Cedric

From solipsis at pitrou.net  Fri Dec  9 22:43:32 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 9 Dec 2011 22:43:32 +0100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
Message-ID: <20111209224332.2ad19a28@pitrou.net>


Dear Cedric,

I'm guessing you drank too much (perhaps you are training for New Year's
Eve), ate some bad sausages or are simply very self-complacent.
python-dev is not the place to post long unstructured ramblings
with no practical value. Consider writing on your personal blog
instead.

Thank you

Antoine.



On Fri, 9 Dec 2011 21:26:29 +0100
Cedric Sodhi <manday at gmx.net> wrote:
> IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD BEEN
> DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", THAT "WHO
> DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING
> SIMILAR, JUST DON'T.
> 
> Otherwise, read on.
> 
> I know very well that this topic has been discussed before. On forums.
> Mailing lists. IRC. Blogs. From person to person, even.
> 
> And I know equally well, from all those years experiencing
> argument-turned-debates on the internet, how a (minor|major) fraction of
> participants make up for their inability to lead a proper debate by
> speaking the loudest of all, so that eventually quantity triumphs over
> quality and logic.
> 
> That ahead; I hope you can try not to fall in that category. Let instead
> reason prevail over sentimentalism, mislead purism, elitism, and all
> other sorts of isms which hinder advancement in the greater context.
> 
> Python has surprised once already: The changes from 2 to 3 were not
> downwards compatible because the core developers realized there is more
> to a sustainable language than constantly patching it up until it comes
> apart like the roman empire.
> 
> Let's keep that spirit for a second and let us discuss braces, again,
> with the clear goal of improving the language.
> 
> End of disclaimer?
> 
> End of disclaimer!
> 
> Whitespace-Blocking (WSB) as opposed to Delimiter-Blocking (DB) has
> reasons. What are those reasons? Well, primarily, it forces the
> programmer to maintain well readable code. Then, some might argue, it is
> quicker to type.
> 
> Two reasons, but of what importance are they? And are they actually
> reasons?
> 
> You may guessed it from the questions themselves that I'm about to
> question that.
> 
> I don't intend to connote brazen implications, so let me spell out what
> I just implied: I think anyone who thinks that exclusive WSB is a good
> alternative or even preferable to DB is actually deluding themselves for
> some personal version of one of those isms mentioned above.
> 
> Let's examine these alleged advantages objectively one for one. But
> before that, just to calm troubled waters a little, allow me bring
> forward the conclusion:
> 
> Absolutely no intentions to remowe WSB from Python. Although one might
> have gotten that impression from the early paragraphs, no intentions to
> break downwards compatibility, either.
> 
> What Python needs is an alternative to WSB and can stay Python by still
> offering WSB to all those who happen to like it.
> 
> Readable code, is it really an advantage?
> 
> Two linebreaks, just for the suspense, then:
> 
> Of course it is.
> 
> Forcing the programmer to write readable code, is that an advantage? No
> suspense, the answer is Of course not.
> 
> Python may have started off as the casual scripting language for casual
> people. People, who may not even have known programming. And perhaps it
> has made sense to force -- or shall we say motivate, since you can still
> produce perfectly obfuscated code with Python -- them to write readably.
> 
> But Python has matured and so has its clientele. Python does not become
> a better language, neither for beginners nor for experienced programmers
> who also frequently use Python these days, by patronizing them and
> restricting them in their freedom.
> 
> Readable code? Yes. Forcing people to write readable code by artificial
> means? No.
> 
> Practice is evidence for the mischief of this policy: Does the FOSS
> community suffer from a notorious lack of proper indention or
> readability of code? Of course we don't.
> 
> I'm not a native speaker, but dict.cc tells me that what we call "mit
> Kanonen auf Spatzen schießen" (firing cannons at sparrows) is called
> breaking a fly on the wheel in English.
> 
> I may lack the analogy for the fly on the wheel, which, if I'm not
> mistaken, used to be a device for torture in the Middle Ages, but I can
> tell you that the cannon ball which might have struck the sparrows,
> coincidently caused havoc in the hinterlands.
> 
> For the wide-spread and professional language Python is today, the idea
> of forcing people to indent is misguided. These days, it may address a
> neglible minority of absolute beginners who barely started programming
> and would not listen to the simple advice of indenting properly, but on
> the other hand it hurts/annoys/deters a great community of typical
> programmers for whom DB has long become a de facto standard.
> 
> For them, it's not a mere inconsistency without, for them, any apparent
> reason. It's more than the inconvenience not being able to follow ones
> long time practices, using the scripts one wrote for delimiters, the
> shortcuts that are usually offered by editor, etc.
> 
> It also brings about a whole class of new problems which may be
> anticipated and prevent, yet bear a great potential for new, even
> hard-to-find bugs (just in case anyone would respond that we had
> eventually successfully redeemed the mismatched parenthesis problem - at
> what cost?!).
> 
> Not just difficult to find, near to impossible would be the right word
> for anyone who has to review someone else's patch.
> 
> It is widely known among the programmer's community that spaces and tabs
> are remarkably similar to eachother. So similar even, that people fight
> wars about which to use in a non-py context. It might strike one as an
> equally remarkably nonsensical idea to give them programmatic meaning -
> two DIFFERENT meanings, to make things even worse.
> 
> While it becomes a practical impossibility to spot these kind of bugs
> while reviewing code -- optionally mangled through a medium which
> expands tabs to whitespace, not so much of a rarity -- it is still a
> time-consuming and tedious job to find them in a local situation.
> 
> More or less easily rectified, but once you spent a while trying to
> figure something like that out, you inevitably have the urge to ask: Why?
> 
> Last of all, some might argue that it's convenient to not to have type
> delimiters. Well, be my guest. I also appreciate single lined
> conditional or loops once in a while. I understand how not having to
> type delimiters if you don't want them lifts a burden. Hence I would not
> want rid Python of them. WSB may come in handy. But equally, it may not.
> 
> Proposing the actual changes that would have to be made to accomodate
> both, WSB and DB is beyond the scope of this script. It is the
> CONCLUSION that the current situation is undesirable and Python,
> although not apparent at the first glance, suffers from exclusive WSB,
> which is the goal of this thread.
> 
> Discussing has its etymological roots in Discourse, which connotes a
> loosely guided conversation about a topic. Therefore, I conclude with a 
> 
> DEBATE!!!111
> 
> kind regards,
> -- MD
> 
> (not proof-read)



From ethan at stoneleaf.us  Fri Dec  9 22:36:25 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 09 Dec 2011 13:36:25 -0800
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
Message-ID: <4EE27F59.6080308@stoneleaf.us>

This belongs on python-ideas.  Please take it there.

~Ethan~

From donald.stufft at gmail.com  Fri Dec  9 22:53:32 2011
From: donald.stufft at gmail.com (Donald Stufft)
Date: Fri, 9 Dec 2011 16:53:32 -0500
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <20111209224332.2ad19a28@pitrou.net>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
	<20111209224332.2ad19a28@pitrou.net>
Message-ID: <3202CD7C22604106874163646AD9E8D9@gmail.com>

I don't always post to python-dev, but when I do I ask for braces.

On Friday, December 9, 2011 at 4:43 PM, Antoine Pitrou wrote:

>  
> Dear Cedric,
>  
> I'm guessing you drank too much (perhaps you are training for New Year's
> Eve), ate some bad sausages or are simply very self-complacent.
> python-dev is not the place where to post long unstructured ramblings
> with no practical value. Consider writing on your personal blog
> instead.
>  
> Thank you
>  
> Antoine.
>  
>  
>  
> On Fri, 9 Dec 2011 21:26:29 +0100
> Cedric Sodhi <manday at gmx.net (mailto:manday at gmx.net)> wrote:
> > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD BEEN
> > DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", THAT "WHO
> > DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING
> > SIMILAR, JUST DON'T.
> >  
> > Otherwise, read on.
> >  
> > I know very well that this topic has been discussed before. On forums.
> > Mailing lists. IRC. Blogs. From person to person, even.
> >  
> > And I know equally well, from all those years experiencing
> > argument-turned-debates on the internet, how a (minor|major) fraction of
> > participants make up for their inability to lead a proper debate by
> > speaking the loudest of all, so that eventually quantity triumphs over
> > quality and logic.
> >  
> > That ahead; I hope you can try not to fall in that category. Let instead
> > reason prevail over sentimentalism, misled purism, elitism, and all
> > other sorts of isms which hinder advancement in the greater context.
> >  
> > Python has surprised once already: The changes from 2 to 3 were not
> > downwards compatible because the core developers realized there is more
> > to a sustainable language than constantly patching it up until it comes
> > apart like the roman empire.
> >  
> > Let's keep that spirit for a second and let us discuss braces, again,
> > with the clear goal of improving the language.
> >  
> > End of disclaimer?
> >  
> > End of disclaimer!
> >  
> > Whitespace-Blocking (WSB) as opposed to Delimiter-Blocking (DB) has
> > reasons. What are those reasons? Well, primarily, it forces the
> > programmer to maintain well readable code. Then, some might argue, it is
> > quicker to type.
> >  
> > Two reasons, but of what importance are they? And are they actually
> > reasons?
> >  
> > You may have guessed from the questions themselves that I'm about to
> > question that.
> >  
> > I don't intend to connote brazen implications, so let me spell out what
> > I just implied: I think anyone who thinks that exclusive WSB is a good
> > alternative or even preferable to DB is actually deluding themselves for
> > some personal version of one of those isms mentioned above.
> >  
> > Let's examine these alleged advantages objectively one for one. But
> > before that, just to calm troubled waters a little, allow me to bring
> > forward the conclusion:
> >  
> > Absolutely no intentions to remove WSB from Python. Although one might
> > have gotten that impression from the early paragraphs, no intentions to
> > break downwards compatibility, either.
> >  
> > What Python needs is an alternative to WSB and can stay Python by still
> > offering WSB to all those who happen to like it.
> >  
> > Readable code, is it really an advantage?
> >  
> > Two linebreaks, just for the suspense, then:
> >  
> > Of course it is.
> >  
> > Forcing the programmer to write readable code, is that an advantage? No
> > suspense, the answer is Of course not.
> >  
> > Python may have started off as the casual scripting language for casual
> > people. People, who may not even have known programming. And perhaps it
> > has made sense to force -- or shall we say motivate, since you can still
> > produce perfectly obfuscated code with Python -- them to write readably.
> >  
> > But Python has matured and so has its clientele. Python does not become
> > a better language, neither for beginners nor for experienced programmers
> > who also frequently use Python these days, by patronizing them and
> > restricting them in their freedom.
> >  
> > Readable code? Yes. Forcing people to write readable code by artificial
> > means? No.
> >  
> > Practice is evidence for the mischief of this policy: Does the FOSS
> > community suffer from a notorious lack of proper indentation or
> > readability of code? Of course we don't.
> >  
> > I'm not a native speaker, but dict.cc tells me that what we call "mit
> > Kanonen auf Spatzen schießen" (firing cannons at sparrows) is called
> > breaking a fly on the wheel in English.
> >  
> > I may lack the analogy for the fly on the wheel, which, if I'm not
> > mistaken, used to be a device for torture in the Middle Ages, but I can
> > tell you that the cannon ball which might have struck the sparrows,
> > coincidently caused havoc in the hinterlands.
> >  
> > For the wide-spread and professional language Python is today, the idea
> > of forcing people to indent is misguided. These days, it may address a
> > negligible minority of absolute beginners who barely started programming
> > and would not listen to the simple advice of indenting properly, but on
> > the other hand it hurts/annoys/deters a great community of typical
> > programmers for whom DB has long become a de facto standard.
> >  
> > For them, it's not a mere inconsistency without, for them, any apparent
> > reason. It's more than the inconvenience of not being able to follow one's
> > long-time practices, using the scripts one wrote for delimiters, the
> > shortcuts that are usually offered by editors, etc.
> >  
> > It also brings about a whole class of new problems which may be
> > anticipated and prevented, yet bear a great potential for new, even
> > hard-to-find bugs (just in case anyone would respond that we had
> > eventually successfully redeemed the mismatched parenthesis problem - at
> > what cost?!).
> >  
> > Not just difficult to find, near to impossible would be the right word
> > for anyone who has to review someone else's patch.
> >  
> > It is widely known among the programmer's community that spaces and tabs
> > are remarkably similar to each other. So similar even, that people fight
> > wars about which to use in a non-py context. It might strike one as an
> > equally remarkably nonsensical idea to give them programmatic meaning -
> > two DIFFERENT meanings, to make things even worse.
> >  
> > While it becomes a practical impossibility to spot these kinds of bugs
> > while reviewing code -- optionally mangled through a medium which
> > expands tabs to whitespace, not so much of a rarity -- it is still a
> > time-consuming and tedious job to find them in a local situation.
> >  
> > More or less easily rectified, but once you spent a while trying to
> > figure something like that out, you inevitably have the urge to ask: Why?
> >  
> > Last of all, some might argue that it's convenient not to have to type
> > delimiters. Well, be my guest. I also appreciate single lined
> > conditional or loops once in a while. I understand how not having to
> > type delimiters if you don't want them lifts a burden. Hence I would not
> > want to rid Python of them. WSB may come in handy. But equally, it may not.
> >  
> > Proposing the actual changes that would have to be made to accommodate
> > both, WSB and DB is beyond the scope of this script. It is the
> > CONCLUSION that the current situation is undesirable and Python,
> > although not apparent at the first glance, suffers from exclusive WSB,
> > which is the goal of this thread.
> >  
> > Discussing has its etymological roots in Discourse, which connotes a
> > loosely guided conversation about a topic. Therefore, I conclude with a  
> >  
> > DEBATE!!!111
> >  
> > kind regards,
> > -- MD
> >  
> > (not proof-read)
>  
>  
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/donald.stufft%40gmail.com
>  
>  



From marty at martyalchin.com  Fri Dec  9 22:53:45 2011
From: marty at martyalchin.com (Marty Alchin)
Date: Fri, 9 Dec 2011 13:53:45 -0800
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
Message-ID: <CAMTa0JcZrQDjT=QohGrmybPrsJRT69oP6EpcR8RyG5-gz=-D4w@mail.gmail.com>

You've really only given one reason why braces are a good idea:

"I also appreciate single lined conditional or loops once in a while."

Not only is this argument even weaker than the two you yourself gave in
defense of whitespace, these two features are already supported in Python.
If you're not aware of them, perhaps you should spend some quality time
with the documentation rather than suggesting unnecessary changes.
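
For illustration, a minimal sketch of the single-line forms that Python
already accepts. The variable names are made up for this example and do not
come from the thread:

```python
values = [3, -1, 4]

total = 0
for v in values: total += v            # loop body on the same line as the header

if total > 5: label = "big"            # conditional body on the same line
else: label = "small"

sign = "neg" if values[1] < 0 else "pos"   # conditional expression
squares = [v * v for v in values]          # loop written as a comprehension
```

Style guides such as PEP 8 discourage the first two forms for anything
non-trivial, but the grammar allows them.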

-Marty

From guido at python.org  Fri Dec  9 23:21:42 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 9 Dec 2011 14:21:42 -0800
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
Message-ID: <CAP7+vJLxqwwHmtkdATN6HCQeC005wkTj-wiNzqcLocVTdguELA@mail.gmail.com>

On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi <manday at gmx.net> wrote:

> IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD BEEN
> DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", THAT "WHO
> DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING
> SIMILAR, JUST DON'T.
>

Every single response in this thread so far has ignored this request. The
correct response honoring this should have been deafening silence.

For me, if I had to design a new language today, I would probably use
braces, not because they're better than whitespace, but because pretty much
every other language uses them, and there are more interesting concepts to
distinguish a new language. That said, I don't regret that Python uses
indentation, and the rest I have to say about the topic would violate the
above request.

-- 
--Guido van Rossum (python.org/~guido)

From manday at gmx.net  Fri Dec  9 23:29:30 2011
From: manday at gmx.net (Cedric Sodhi)
Date: Fri, 9 Dec 2011 23:29:30 +0100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <CAP7+vJLxqwwHmtkdATN6HCQeC005wkTj-wiNzqcLocVTdguELA@mail.gmail.com>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
	<CAP7+vJLxqwwHmtkdATN6HCQeC005wkTj-wiNzqcLocVTdguELA@mail.gmail.com>
Message-ID: <20111209222930.GD4346@slate.Speedport_W_723V_Typ_A>

On Fri, Dec 09, 2011 at 02:21:42PM -0800, Guido van Rossum wrote:
>    On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi <[1]manday at gmx.net> wrote:
> 
>      IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD BEEN
>      DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", THAT "WHO
>      DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING
>      SIMILAR, JUST DON'T.
> 
>    Every single response in this thread so far has ignored this request. The
>    correct response honoring this should have been deafening silence.
> 
>    For me, if I had to design a new language today, I would probably use
>    braces, not because they're better than whitespace, but because pretty
>    much every other language uses them, and there are more interesting
>    concepts to distinguish a new language. That said, I don't regret that
>    Python uses indentation, and the rest I have to say about the topic would
>    violate the above request.
> 

I think this deserves a reply. Thank you for contributing your opinion
and respecting my request and therefore honoring the rules of a
civilized debate.

-- Cedric

From anacrolix at gmail.com  Fri Dec  9 23:40:54 2011
From: anacrolix at gmail.com (Matt Joiner)
Date: Sat, 10 Dec 2011 09:40:54 +1100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <20111209222930.GD4346@slate.Speedport_W_723V_Typ_A>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
	<CAP7+vJLxqwwHmtkdATN6HCQeC005wkTj-wiNzqcLocVTdguELA@mail.gmail.com>
	<20111209222930.GD4346@slate.Speedport_W_723V_Typ_A>
Message-ID: <CAB4yi1OD46=r-L-x+ifhzeyWazRzXNByoSub7YhHZC7spPu9Rw@mail.gmail.com>

If braces were introduced I would switch to Haskell, I can't stand the
noise. If you want to see a language that allows both whitespace, semi
colons and braces take a look at it. Nails it.
On Dec 10, 2011 9:31 AM, "Cedric Sodhi" <manday at gmx.net> wrote:

> On Fri, Dec 09, 2011 at 02:21:42PM -0800, Guido van Rossum wrote:
> >    On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi <[1]manday at gmx.net>
> wrote:
> >
> >      IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD
> BEEN
> >      DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN", THAT
> "WHO
> >      DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING
> >      SIMILAR, JUST DON'T.
> >
> >    Every single response in this thread so far has ignored this request.
> The
> >    correct response honoring this should have been deafening silence.
> >
> >    For me, if I had to design a new language today, I would probably use
> >    braces, not because they're better than whitespace, but because pretty
> >    much every other language uses them, and there are more interesting
> >    concepts to distinguish a new language. That said, I don't regret that
> >    Python uses indentation, and the rest I have to say about the topic
> would
> >    violate the above request.
> >
>
> I think this deserves a reply. Thank you for contributing your opinion
> and respecting my request and therefore honoring the rules of a
> civilized debate.
>
> -- Cedric

From anacrolix at gmail.com  Fri Dec  9 23:43:59 2011
From: anacrolix at gmail.com (Matt Joiner)
Date: Sat, 10 Dec 2011 09:43:59 +1100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <69816.1323459197@parc.com>
References: <jbsfar$en7$1@dough.gmane.org>
	<20111209100736.5f16419a@mikmeyer-vm-fedora>
	<68178.1323454554@parc.com>
	<C0C8FDCA-61F7-469C-9C62-6861D604C637@masklinn.net>
	<69816.1323459197@parc.com>
Message-ID: <CAB4yi1PySf1FY3kLwgrq3U1j_GK7jt+DH+pKDNypQQnjPZTUhQ@mail.gmail.com>

I second this. The doco is very bad.
On Dec 10, 2011 6:34 AM, "Bill Janssen" <janssen at parc.com> wrote:

> Xavier Morel <python-dev at masklinn.net> wrote:
>
> > On 2011-12-09, at 19:15 , Bill Janssen wrote:
> > > I use ElementTree for parsing valid XML, but minidom for producing it.
> > Could you expand on your reasons to use minidom for producing XML?
>
> Inertia, I guess.  I tried that first, and it seems to work.
>
> I tend to use html5lib and/or BeautifulSoup instead of ElementTree, and
> that's mainly because I find the documentation for ElementTree is
> confusing and partial and inconsistent.  Having various undated but
> obsolete tutorials and documentation still up on effbot.org doesn't
> help.
>
>
> Bill

From manday at gmx.net  Fri Dec  9 23:58:06 2011
From: manday at gmx.net (Cedric Sodhi)
Date: Fri, 9 Dec 2011 23:58:06 +0100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <CAB4yi1OD46=r-L-x+ifhzeyWazRzXNByoSub7YhHZC7spPu9Rw@mail.gmail.com>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
	<CAP7+vJLxqwwHmtkdATN6HCQeC005wkTj-wiNzqcLocVTdguELA@mail.gmail.com>
	<20111209222930.GD4346@slate.Speedport_W_723V_Typ_A>
	<CAB4yi1OD46=r-L-x+ifhzeyWazRzXNByoSub7YhHZC7spPu9Rw@mail.gmail.com>
Message-ID: <20111209225806.GF4346@slate.Speedport_W_723V_Typ_A>

I reply to your contribution mainly because I see another, valid
argument hidden in what you formulated as an opinion:

Readability would be reduced by such "noise". To anticipate other people
agreeing with that, let me say, that it would be exactly one more
character, and the same amount of key presses. All that, assuming you
use editor automatisms, which particularly the advocates of WSB tend to
bring forth in defense of WSB and aforementioned problems associated
with it.

Only one more character and not more key presses? Yes, instead of
opening a block with a colon, you open it with an opening bracket. And
you close it with a closing one.

Referring to "noise", I take it you are preferring naturally expressed
languages (what Roff's PIC, for example, exemplifies to banality).

How is a COLON, which, in natural language, PUNCTUATES a context, any
more suited than braces, which naturally ENCLOSE a structure?

Obviously, it by far is not, even from the standpoint of not
interspersing readable code with unnatural characters.

On Sat, Dec 10, 2011 at 09:40:54AM +1100, Matt Joiner wrote:
>    If braces were introduced I would switch to Haskell, I can't stand the
>    noise. If you want to see a language that allows both whitespace, semi
>    colons and braces take a look at it. Nails it.
> 
>    On Dec 10, 2011 9:31 AM, "Cedric Sodhi" <[1]manday at gmx.net> wrote:
> 
>      On Fri, Dec 09, 2011 at 02:21:42PM -0800, Guido van Rossum wrote:
>      >    On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi
>      <[1][2]manday at gmx.net> wrote:
>      >
>      >      IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS
>      HAD BEEN
>      >      DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN",
>      THAT "WHO
>      >      DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR SOMETHING
>      >      SIMILAR, JUST DON'T.
>      >
>      >    Every single response in this thread so far has ignored this
>      request. The
>      >    correct response honoring this should have been deafening silence.
>      >
>      >    For me, if I had to design a new language today, I would probably
>      use
>      >    braces, not because they're better than whitespace, but because
>      pretty
>      >    much every other language uses them, and there are more interesting
>      >    concepts to distinguish a new language. That said, I don't regret
>      that
>      >    Python uses indentation, and the rest I have to say about the topic
>      would
>      >    violate the above request.
>      >
> 
>      I think this deserves a reply. Thank you for contributing your opinion
>      and respecting my request and therefore honoring the rules of a
>      civilized debate.
> 
>      -- Cedric
>      _______________________________________________
>      Python-Dev mailing list
>      [3]Python-Dev at python.org
>      [4]http://mail.python.org/mailman/listinfo/python-dev
>      Unsubscribe:
>      [5]http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com
> 
> References
> 
>    Visible links
>    1. mailto:manday at gmx.net
>    2. mailto:manday at gmx.net
>    3. mailto:Python-Dev at python.org
>    4. http://mail.python.org/mailman/listinfo/python-dev
>    5. http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com

From guido at python.org  Sat Dec 10 00:03:08 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 9 Dec 2011 15:03:08 -0800
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <20111209225806.GF4346@slate.Speedport_W_723V_Typ_A>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
	<CAP7+vJLxqwwHmtkdATN6HCQeC005wkTj-wiNzqcLocVTdguELA@mail.gmail.com>
	<20111209222930.GD4346@slate.Speedport_W_723V_Typ_A>
	<CAB4yi1OD46=r-L-x+ifhzeyWazRzXNByoSub7YhHZC7spPu9Rw@mail.gmail.com>
	<20111209225806.GF4346@slate.Speedport_W_723V_Typ_A>
Message-ID: <CAP7+vJKOgYz4CDA3LptrJ05moaOVMkVSpeCoWzfyCuzkyFZsWA@mail.gmail.com>

Point of order (repeated), please move this thread to python-ideas.

--
--Guido van Rossum (python.org/~guido)

From vinay_sajip at yahoo.co.uk  Sat Dec 10 00:12:09 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Fri, 9 Dec 2011 23:12:09 +0000 (UTC)
Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<1323408839.2710.143.camel@thinko>
	<CAP7+vJJmA7afxXdvNYDFjOaOMkDw1Fcaqu6a+y3F6HyfKcBfMw@mail.gmail.com>
	<7DEE32A7-1426-4E93-8708-BDF3B0CAF8EC@twistedmatrix.com>
Message-ID: <loom.20111210T000622-853@post.gmane.org>

Glyph <glyph <at> twistedmatrix.com> writes:


> The biggest issue for the single-codebase approach is 'except ... as ...'.
> Peppering one's codebase with calls to sys.exc_info() can be a real
> performance problem, especially on PyPy.  Not to mention how ugly it is.
> For some reason I thought that this syntax was only supported by 2.7 and up;
> I see now that it's 2.6 and up.

Granted that it's ugly, but where is the evidence that it can be a real
performance problem? I mean in practice on real projects, rather than in theory
or on code contrived to show up a problem.

Please note, I'm not saying it isn't a real performance problem, I'm just asking
where the evidence is, whether running on PyPy or elsewhere.
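
For reference, a minimal sketch of the two spellings being compared. The
function names are hypothetical and the example says nothing about relative
performance, which is the open question here:

```python
import sys


def parse_int_portable(text):
    """Spelling that avoids the 'as' clause, so it also parses on Python 2.5."""
    try:
        return int(text)
    except ValueError:
        exc = sys.exc_info()[1]        # fetch the active exception by hand
        return "error: %s" % (exc,)


def parse_int_modern(text):
    """Spelling that needs Python 2.6 or later (and works on 3.x)."""
    try:
        return int(text)
    except ValueError as exc:
        return "error: %s" % (exc,)
```

Both functions return the same result for the same input; only the way the
exception object is obtained differs.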

Regards,


Vinay Sajip


From steve at pearwood.info  Sat Dec 10 01:01:04 2011
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 10 Dec 2011 11:01:04 +1100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <CAP7+vJKOgYz4CDA3LptrJ05moaOVMkVSpeCoWzfyCuzkyFZsWA@mail.gmail.com>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>	<CAP7+vJLxqwwHmtkdATN6HCQeC005wkTj-wiNzqcLocVTdguELA@mail.gmail.com>	<20111209222930.GD4346@slate.Speedport_W_723V_Typ_A>	<CAB4yi1OD46=r-L-x+ifhzeyWazRzXNByoSub7YhHZC7spPu9Rw@mail.gmail.com>	<20111209225806.GF4346@slate.Speedport_W_723V_Typ_A>
	<CAP7+vJKOgYz4CDA3LptrJ05moaOVMkVSpeCoWzfyCuzkyFZsWA@mail.gmail.com>
Message-ID: <4EE2A140.2010509@pearwood.info>

Guido van Rossum wrote:
> Point of order (repeated), please move this thread to python-ideas.

Isn't that cruel to the people reading python-ideas?


-- 
Steven


From anacrolix at gmail.com  Sat Dec 10 02:35:20 2011
From: anacrolix at gmail.com (Matt Joiner)
Date: Sat, 10 Dec 2011 12:35:20 +1100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <20111209225806.GF4346@slate.Speedport_W_723V_Typ_A>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
	<CAP7+vJLxqwwHmtkdATN6HCQeC005wkTj-wiNzqcLocVTdguELA@mail.gmail.com>
	<20111209222930.GD4346@slate.Speedport_W_723V_Typ_A>
	<CAB4yi1OD46=r-L-x+ifhzeyWazRzXNByoSub7YhHZC7spPu9Rw@mail.gmail.com>
	<20111209225806.GF4346@slate.Speedport_W_723V_Typ_A>
Message-ID: <CAB4yi1N9CjsHxFwtc9arwM5uvxrJ821cKX9H=0ZVhysJVjWTMA@mail.gmail.com>

Ditch the colon too. Also you're a troll.
On Dec 10, 2011 9:58 AM, "Cedric Sodhi" <manday at gmx.net> wrote:

> I reply to your contribution mainly because I see another, valid
> argument hidden in what you formulated as an opinion:
>
> Readability would be reduced by such "noise". To anticipate other people
> agreeing with that, let me say, that it would be exactly one more
> character, and the same amount of key presses. All that, assuming you
> use editor automatisms, which particularly the advocates of WSB tend to
> bring forth in defense of WSB and aforementioned problems associated
> with it.
>
> Only one more character and not more key presses? Yes, instead of
> opening a block with a colon, you open it with an opening bracket. And
> you close it with a closing one.
>
> Referring to "noise", I take it you are preferring naturally expressed
> languages (what Roff's PIC, for example, exemplifies to banality).
>
> How is a COLON, which, in natural language, PUNCTUATES a context, any
> more suited than braces, which naturally ENCLOSE a structure?
>
> Obviously, it by far is not, even from the standpoint of not
> interspersing readable code with unnatural characters.
>
> On Sat, Dec 10, 2011 at 09:40:54AM +1100, Matt Joiner wrote:
> >    If braces were introduced I would switch to Haskell, I can't stand the
> >    noise. If you want to see a language that allows both whitespace, semi
> >    colons and braces take a look at it. Nails it.
> >
> >    On Dec 10, 2011 9:31 AM, "Cedric Sodhi" <[1]manday at gmx.net> wrote:
> >
> >      On Fri, Dec 09, 2011 at 02:21:42PM -0800, Guido van Rossum wrote:
> >      >    On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi
> >      <[1][2]manday at gmx.net> wrote:
> >      >
> >      >      IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT
> THIS
> >      HAD BEEN
> >      >      DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN",
> >      THAT "WHO
> >      >      DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR
> SOMETHING
> >      >      SIMILAR, JUST DON'T.
> >      >
> >      >    Every single response in this thread so far has ignored this
> >      request. The
> >      >    correct response honoring this should have been deafening
> silence.
> >      >
> >      >    For me, if I had to design a new language today, I would
> probably
> >      use
> >      >    braces, not because they're better than whitespace, but because
> >      pretty
> >      >    much every other language uses them, and there are more
> interesting
> >      >    concepts to distinguish a new language. That said, I don't
> regret
> >      that
> >      >    Python uses indentation, and the rest I have to say about the
> topic
> >      would
> >      >    violate the above request.
> >      >
> >
> >      I think this deserves a reply. Thank you for contributing your
> opinion
> >      and respecting my request and therefore honoring the rules of a
> >      civilized debate.
> >
> >      -- Cedric
> >      _______________________________________________
> >      Python-Dev mailing list
> >      [3]Python-Dev at python.org
> >      [4]http://mail.python.org/mailman/listinfo/python-dev
> >      Unsubscribe:
> >      [5]
> http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com
> >
> > References
> >
> >    Visible links
> >    1. mailto:manday at gmx.net
> >    2. mailto:manday at gmx.net
> >    3. mailto:Python-Dev at python.org
> >    4. http://mail.python.org/mailman/listinfo/python-dev
> >    5.
> http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com
>

From eliben at gmail.com  Sat Dec 10 04:28:06 2011
From: eliben at gmail.com (Eli Bendersky)
Date: Sat, 10 Dec 2011 05:28:06 +0200
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <CAB4yi1PySf1FY3kLwgrq3U1j_GK7jt+DH+pKDNypQQnjPZTUhQ@mail.gmail.com>
References: <jbsfar$en7$1@dough.gmane.org>
	<20111209100736.5f16419a@mikmeyer-vm-fedora>
	<68178.1323454554@parc.com>
	<C0C8FDCA-61F7-469C-9C62-6861D604C637@masklinn.net>
	<69816.1323459197@parc.com>
	<CAB4yi1PySf1FY3kLwgrq3U1j_GK7jt+DH+pKDNypQQnjPZTUhQ@mail.gmail.com>
Message-ID: <CAF-Rda9XeBHtvmGQ18UoErBO4sP9X_fkQ+A13Bd=39+LzCAvOA@mail.gmail.com>

On Sat, Dec 10, 2011 at 00:43, Matt Joiner <anacrolix at gmail.com> wrote:

> I second this. The doco is very bad.
>

It would be constructive to open issues for specific problems in the
documentation. I'm sure this won't be hard to fix. Documentation should not
be the roadblock for using a library.
Eli

From tjreedy at udel.edu  Sat Dec 10 05:01:00 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 09 Dec 2011 23:01:00 -0500
Subject: [Python-Dev] re.findall() should return named tuple
In-Reply-To: <CAN8d9gkTRgupzXE-eBEmtRNsC2diY0MA_7KvAuYwBQ15=4h2iA@mail.gmail.com>
References: <CAN8d9gkTRgupzXE-eBEmtRNsC2diY0MA_7KvAuYwBQ15=4h2iA@mail.gmail.com>
Message-ID: <jbulia$1re$1@dough.gmane.org>

On 12/8/2011 8:31 AM, Philipp A. wrote:
> hi devs,
>
> just an idea that popped up in my mind: re.findall() returns a list of
> tuples, where every entry of each tuple represents a match group.
> since match groups can be named, we are able to use named tuples instead
> of plain tuples here, in the same fashion as namedtuple's rename works:
> missing group names get renamed to _1 and so on. i suggest to add the
> rename keyword option, to findall, defaulting to True, since mixed
> positional and named tuples are more common than in usual use cases of
> namedtuple.
>
> do you think it's a good idea?

I have not used named tuples or re.findall (much), so I have no opinion.
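
A rough sketch of what the proposal could look like as a helper outside the
re module. The name findall_named is hypothetical; nothing like it exists in
the standard library. It relies on namedtuple's rename=True, which numbers
unnamed fields by zero-based position (_0, _1, ...):

```python
import re
from collections import namedtuple


def findall_named(pattern, string, flags=0):
    """Like re.findall, but each match is a namedtuple keyed by group name."""
    regex = re.compile(pattern, flags)
    if regex.groups == 0:
        # No groups: nothing to name, fall back to the plain behaviour.
        return regex.findall(string)
    # Start with invalid placeholder names; rename=True turns them into _N.
    names = [""] * regex.groups
    for name, index in regex.groupindex.items():
        names[index - 1] = name        # group numbers are 1-based
    Row = namedtuple("Match", names, rename=True)
    # default="" mirrors findall, which yields '' for groups that did not match.
    return [Row(*m.groups(default="")) for m in regex.finditer(string)]
```

Unlike re.findall, this sketch returns one-element tuples when the pattern
has exactly one group, so it is not a drop-in replacement.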

> finally: should i join the mailing list to see answers? should i file a
> PEP? i have no idea how the inner workings of python development are,
> but i wanted to share this idea with you :)

Ideas like this should go either to the python-ideas list or to the 
tracker at bugs.python.org as a feature request. If you post to the 
list, you should either subscribe at mail.python.org or follow it as a 
newsgroup at news.gmane.org (which is what I do). Posting a tracker 
issue requires registration of some sort.

-- 
Terry Jan Reedy



From tjreedy at udel.edu  Sat Dec 10 05:11:57 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 09 Dec 2011 23:11:57 -0500
Subject: [Python-Dev] Tag trackbacks with version (was Re: readd u'' literal
 support in 3.3?)
In-Reply-To: <CADiSq7eh--b4Moe1XHnXnH1tHJNa51HHp-18hpcsWcGg_dTYyQ@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>
	<loom.20111208T161219-187@post.gmane.org>
	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>
	<jbrllu$99u$1@dough.gmane.org>
	<loom.20111209T022519-121@post.gmane.org>
	<jbsme9$t5d$1@dough.gmane.org>
	<CADiSq7eh--b4Moe1XHnXnH1tHJNa51HHp-18hpcsWcGg_dTYyQ@mail.gmail.com>
Message-ID: <jbum6r$4vk$1@dough.gmane.org>

On 12/9/2011 5:17 AM, Nick Coghlan wrote:

> As Chris pointed out though, the real problem with the "repeatedly run
> 2to3" workflow is that it can make interpreting tracebacks from the
> field *really* hard.

This just gave me the idea of tagging tracebacks with the Python version 
number. Something like

Traceback (Py3.2.2, most recent call last):

and perhaps with the platform also

Traceback (most recent call last) [Py3.2.2 on win32]:

Since computation has stopped, the few extra milliseconds is trivial. 
This would certainly help on Python list and the tracker when people do 
post the traceback (which they do not always) without version and system 
(which they often do not, especially on Python list). It might suggest 
to people that this is important info to include. I wonder if this would 
also help with tracebacks sent to library/app developers.
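
The idea can be approximated today without changing the interpreter. This is
only a sketch: format_tagged and install are hypothetical names, and the
header format is the second variant suggested above:

```python
import platform
import sys
import traceback

HEADER = "Traceback (most recent call last):"


def format_tagged(exc_type, exc_value, exc_tb):
    """Format a traceback with the Python version and platform in its header."""
    tag = "[Py%s on %s]" % (platform.python_version(), sys.platform)
    tagged = "Traceback (most recent call last) %s:" % tag
    lines = traceback.format_exception(exc_type, exc_value, exc_tb)
    return "".join(line.replace(HEADER, tagged) for line in lines)


def install():
    """Route uncaught exceptions through format_tagged (opt-in hook)."""
    def hook(exc_type, exc_value, exc_tb):
        sys.stderr.write(format_tagged(exc_type, exc_value, exc_tb))
    sys.excepthook = hook
```

Calling install() once at program start would tag every uncaught traceback;
tools that parse the standard header would need to tolerate the extra text.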

-- 
Terry Jan Reedy


From ncoghlan at gmail.com  Sat Dec 10 06:55:45 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 10 Dec 2011 15:55:45 +1000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
Message-ID: <CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>

On Sat, Dec 10, 2011 at 5:58 AM, PJ Eby <pje at telecommunity.com> wrote:
> In fact, I'm not sure why people are bringing it into this discussion at
> all: PEP 3333 was designed to work well with 2to3, which does the right
> thing for WSGI code: it converts 2.x "str" to 3.x "str", as it should.  If
> you're writing 2.x WSGI code with 'u' literals, *your code is broken*.
>
> WSGI doesn't need 'u' literals and never has.  It *does* need b'' literals
> for stuff that refers to request and response bodies, but everything else
> should be plain old string literals for the appropriate Python version.

The reason it came up is that "from __future__ import
unicode_literals" doesn't obviously help with single codebase
style ports of a lot of WSGI related code, because such code
actually has *3* string types to deal with:

Actual text (u'', unicode -> str)
Native strings for WSGI ('', str -> str)
Binary data (b'', str -> bytes)

That works fine with 2to3, since 2to3 will strip out the leading 'u'
from the actual text literals, but presents a potential hassle for the
single codebase approach. Most other contexts only need the
binary->binary and text->text conversion, so the future import really
helps out.

However, I just realised that there actually *is* a relatively clear
way to spell this for all 2.6+ versions: the future import *doesn't*
change the meaning of the 'str' builtin (it's still the 8-bit string
type in 2.x), so the native way to spell the above distinction when
"from __future__ import unicode_literals" is in effect is as follows:

Actual text: ''
Native strings for WSGI: str('')
Binary data: b''

Calling a builtin is much lower overhead than calling a helper from a
compatibility module, and this also makes it clear that native strings
are the odd ones out.

So I'm back to being -1 on the idea of adding back u'' literals for
3.3. Instead, people should explicitly call str() on any literals that
they want to be actual str instances both in 3.x and in 2.x when the
unicode literals future import is in effect.
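
A minimal sketch of the spelling described above, assuming a module
that must run unchanged on 2.6+ and 3.x. The constant names and the
`kinds` helper are made up for illustration.

```python
from __future__ import unicode_literals

# Actual text: unicode in 2.x (because of the future import), str in 3.x.
TEXT = 'Hello, world'

# Native strings: calling the builtin yields str in both 2.x and 3.x,
# which is what WSGI wants for the status line and the headers.
STATUS = str('200 OK')
HEADERS = [(str('Content-Type'), str('text/plain'))]

# Binary data: str in 2.x, bytes in 3.x.
BODY = b'Hello, world'


def kinds():
    """Return the type names of the three literals on this interpreter."""
    return type(TEXT).__name__, type(STATUS).__name__, type(BODY).__name__
```

On 2.x, `kinds()` would report the text literal as `unicode` and the
other two as `str`; on 3.x the text and native strings are both `str`
and the body is `bytes`.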

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From stefan_ml at behnel.de  Sat Dec 10 08:38:35 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 10 Dec 2011 08:38:35 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <68178.1323454554@parc.com>
References: <jbsfar$en7$1@dough.gmane.org>	<20111209100736.5f16419a@mikmeyer-vm-fedora>
	<68178.1323454554@parc.com>
Message-ID: <jbv29r$de$1@dough.gmane.org>

Bill Janssen, 09.12.2011 19:15:
> I think another thing that might go into "refreshing the batteries" is a
> feature comparison of BeautifulSoup and HTML5lib against the stdlib
> competition, to see what needs to be added/revised.  Having to switch to
> an outside package for parsing possibly invalid HTML is a pain.

Such a feature request should be worth a separate thread.

Note, however, that html5lib is likely way too big to add it to the stdlib, 
and that BeautifulSoup lacks a parser for non-conforming HTML in Python 3, 
which would be the target release series for better HTML support. So, 
whatever library or API you would want to use for HTML processing is 
currently only the second question as long as Py3 lacks a real-world HTML 
parser in the stdlib, as well as a robust character detection mechanism. I 
don't think that can be fixed all that easily.

Stefan


From timwintle at gmail.com  Sat Dec 10 09:28:33 2011
From: timwintle at gmail.com (Tim Wintle)
Date: Sat, 10 Dec 2011 08:28:33 +0000
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <C0C8FDCA-61F7-469C-9C62-6861D604C637@masklinn.net>
References: <jbsfar$en7$1@dough.gmane.org>
	<20111209100736.5f16419a@mikmeyer-vm-fedora>
	<68178.1323454554@parc.com>
	<C0C8FDCA-61F7-469C-9C62-6861D604C637@masklinn.net>
Message-ID: <1323505713.13580.19.camel@tim-laptop>

On Fri, 2011-12-09 at 19:39 +0100, Xavier Morel wrote:
> On 2011-12-09, at 19:15 , Bill Janssen wrote:
> > I use ElementTree for parsing valid XML, but minidom for producing it.
> Could you expand on your reasons to use minidom for producing XML?

To throw my 2c in here:

I personally normally use minidom for manipulating (x)html data (through
html5lib), and for writing XML.

I think it's primarily because DOM:

a) matches the way I think about XML documents.

b) Provides the same API as I use in other languages. (FWIW, I do a lot
of DOM manipulation in javascript)

c) "Feels" (to me) more similar to other formats I work with.


All three may be because I haven't spent enough time with ElementTree -
again I've found the documentation lacking.

Tim


From ben+python at benfinney.id.au  Sat Dec 10 13:15:07 2011
From: ben+python at benfinney.id.au (Ben Finney)
Date: Sat, 10 Dec 2011 23:15:07 +1100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
	<CAP7+vJLxqwwHmtkdATN6HCQeC005wkTj-wiNzqcLocVTdguELA@mail.gmail.com>
Message-ID: <871usc39o4.fsf@benfinney.id.au>

Guido van Rossum <guido at python.org> writes:

> On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi <manday at gmx.net> wrote:
>
> > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD
> > BEEN DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN",
> > THAT "WHO DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR
> > SOMETHING SIMILAR, JUST DON'T.
>
> Every single response in this thread so far has ignored this request.

The request was completely unreasonable. Cedric does not get to
unilaterally restrict who responds to a screed in a public forum, or
how they respond.

> the rest I have to say about the topic would violate the above
> request.

You have my permission to violate the above request. That should have at
least as much authority as the request itself, so you are hereby
empowered to respond as you like.

-- 
 \             "We can't depend for the long run on distinguishing one |
  `\         bitstream from another in order to figure out which rules |
_o__)              apply." --Eben Moglen, _Anarchism Triumphant_, 1999 |
Ben Finney


From guido at python.org  Sat Dec 10 17:06:57 2011
From: guido at python.org (Guido van Rossum)
Date: Sat, 10 Dec 2011 08:06:57 -0800
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <871usc39o4.fsf@benfinney.id.au>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
	<CAP7+vJLxqwwHmtkdATN6HCQeC005wkTj-wiNzqcLocVTdguELA@mail.gmail.com>
	<871usc39o4.fsf@benfinney.id.au>
Message-ID: <CAP7+vJJtcQF+LcKSOydtQ+KBXFp=q+4G9=yjTbpTEhFbS2HN-w@mail.gmail.com>

On Sat, Dec 10, 2011 at 4:15 AM, Ben Finney <ben+python at benfinney.id.au> wrote:

> Guido van Rossum <guido at python.org> writes:
>
> > On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi <manday at gmx.net> wrote:
> >
> > > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD
> > > BEEN DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN",
> > > THAT "WHO DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR
> > > SOMETHING SIMILAR, JUST DON'T.
> >
> > Every single response in this thread so far has ignored this request.
>
> The request was completely unreasonable. Cedric does not get to
> unilaterally set restrictions on who and how people respond to a screed
> in a public forum.
>

Oh, of course. I was just playing along. But my real point was to berate
the community for responding at all to such an obvious trolling post.


> > the rest I have to say about the topic would violate the above
> > request.
>
> You have my permission to violate the above request. That should have at
> least as much authority as the request itself, so you are hereby
> empowered to respond as you like.
>

It would be utterly redundant. He said it all in his all-caps message.

-- 
--Guido van Rossum (python.org/~guido)

From francismb at email.de  Sat Dec 10 12:14:13 2011
From: francismb at email.de (francis)
Date: Sat, 10 Dec 2011 12:14:13 +0100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
Message-ID: <4EE33F05.4000907@email.de>

Hi Cedric,

On 12/09/2011 09:26 PM, Cedric Sodhi wrote:
> It is widely known among the programmer's community that spaces and tabs
> are remarkably similar to eachother. So similar even, that people fight
> wars about which to use in a non-py context. It might strike one as an
> equally remarkably nonsensical idea to give them programmatic meaning -
> two DIFFERENT meanings, to make things even worse.
>
> While it becomes a practical impossibility to spot these kind of bugs
> while reviewing code -- optionally mangled through a medium which
> expands tabs to whitespace, not so much of a rarity -- it is still a
> time-consuming and tedious job to find them in a local situation.
>

I'm not as experienced with Python as the majority of
people here, but I've read that the usual practice is not to
mix them (spaces and tabs).

If this is taking much of your time while reviewing, I would
recommend running a script over your code first to
spot that mixture. IMHO that is a rule that should go in the
coding rules of your project, and the build process should break
if this mixture is found. Don't let that code reach the sync
repository. As I said, I may be failing to see some case.
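
A minimal sketch of such a check, assuming the only goal is to flag
lines whose leading whitespace contains both tabs and spaces. The
function name `find_mixed_indentation` is made up for illustration;
the stdlib's `tabnanny` module is a more thorough existing tool for
detecting ambiguous indentation.

```python
def find_mixed_indentation(source):
    """Return 1-based numbers of lines whose indentation mixes tabs and spaces."""
    mixed = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace only: everything before the first character
        # that is neither a space nor a tab.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            mixed.append(lineno)
    return mixed
```

A build step could call this on every source file and fail when the
returned list is not empty.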

Formatting is like food: everyone has their own taste. One has
to use spices to change it (if possible). For me, the view of
the code (the layout) seen by the programmer should be automatically
changed by the tool that reads the code. Here you could have
a Python with braces if you want... (I think that 'go' has some
autoformatter or a standard way of formatting).

-- francis


From pje at telecommunity.com  Sat Dec 10 18:09:59 2011
From: pje at telecommunity.com (PJ Eby)
Date: Sat, 10 Dec 2011 12:09:59 -0500
Subject: [Python-Dev] Tag trackbacks with version (was Re: readd u''
 literal support in 3.3?)
In-Reply-To: <jbum6r$4vk$1@dough.gmane.org>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>
	<loom.20111208T161219-187@post.gmane.org>
	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>
	<jbrllu$99u$1@dough.gmane.org>
	<loom.20111209T022519-121@post.gmane.org>
	<jbsme9$t5d$1@dough.gmane.org>
	<CADiSq7eh--b4Moe1XHnXnH1tHJNa51HHp-18hpcsWcGg_dTYyQ@mail.gmail.com>
	<jbum6r$4vk$1@dough.gmane.org>
Message-ID: <CALeMXf7vA=z0uvsXMVxNEfqgTBdcmApXC_q-=zuyiaXTsx5gUA@mail.gmail.com>

On Fri, Dec 9, 2011 at 11:11 PM, Terry Reedy <tjreedy at udel.edu> wrote:

> This just gave me the idea of tagging tracebacks with the Python version
> number. Something like
>
> Traceback (Py3.2.2, most recent call last):
>
> and perhaps with the platform also
>
> Traceback (most recent call last) [Py3.2.2 on win32]:
>
> Since computation has stopped, the few extra milliseconds is trivial. This
> would certainly help on Python list and the tracker when people do post the
> traceback (which they do not always) without version and system (which they
> often do not, especially on Python list). It might suggest to people that
> this is important info to include. I wonder if this would also help with
> tracebacks sent to library/app developers.
>

Yes, but doctest will need to take this into account, both for its native
traceback matcher, and for traceback matches using ellipses.  Otherwise you
introduce more Python version hell for doctest users.

From steve at pearwood.info  Sat Dec 10 18:52:37 2011
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 11 Dec 2011 04:52:37 +1100
Subject: [Python-Dev] Tag trackbacks with version (was Re: readd u''
 literal support in 3.3?)
In-Reply-To: <jbum6r$4vk$1@dough.gmane.org>
References: <1323320919.2710.24.camel@thinko>	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>	<1323324644.2710.28.camel@thinko>	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>	<1323325916.2710.39.camel@thinko>	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>	<loom.20111208T161219-187@post.gmane.org>	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>	<jbrllu$99u$1@dough.gmane.org>	<loom.20111209T022519-121@post.gmane.org>	<jbsme9$t5d$1@dough.gmane.org>	<CADiSq7eh--b4Moe1XHnXnH1tHJNa51HHp-18hpcsWcGg_dTYyQ@mail.gmail.com>
	<jbum6r$4vk$1@dough.gmane.org>
Message-ID: <4EE39C65.60508@pearwood.info>

Terry Reedy wrote:
> On 12/9/2011 5:17 AM, Nick Coghlan wrote:
> 
>> As Chris pointed out though, the real problem with the "repeatedly run
>> 2to3" workflow is that it can make interpreting tracebacks from the
>> field *really* hard.
> 
> This just gave me the idea of tagging tracebacks with the Python version 
> number. Something like
> 
> Traceback (Py3.2.2, most recent call last):
> 
> and perhaps with the platform also
> 
> Traceback (most recent call last) [Py3.2.2 on win32]:
> 
> Since computation has stopped, the few extra milliseconds is trivial. 
> This would certainly help on Python list and the tracker when people do 
> post the traceback (which they do not always) without version and system 
> (which they often do not, especially on Python list). It might suggest 
> to people that this is important info to include.
[...]

But how often is it actually important information to include?

I am active on both the tutor and the python-list lists, and it seems to me 
that this proposed feature won't be very useful in either place. In my 
experience, the version number is rarely important for the sorts of questions 
that are commonly asked. Python is quite a stable language, and alist = 
alist.append(1) has confused newbies since version 1.5 and will probably 
continue confusing them in version 4000. (Aside: I was reading historical 
What's New docs today, and was stunned to realise how many cool features go 
back all the way to version 2.0.)

Obviously there are times when knowing the version is useful, but you
can often derive the version number from the error (at least to 1 significant
figure):

 >>> map(chr, (40, 41, 42))[1]
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: 'map' object is not subscriptable

Assuming map has not been shadowed, this is obviously Python 3.

If the question involves tracking down an actual bug in Python, the version 
number becomes important. E.g. "it works as documented in 2.6 on Linux, but 
not in 2.7 on OS-X" sort of thing. But that's quite unusual.

Newbies barely read tracebacks at all. Adding the version number and platform 
will just add more text which they won't read and will probably discourage 
them further from reading it (more text = less chance they read it). 
Experienced coders tend to know when the version number is important and
provide it only when necessary. So it's hard to see who this is aimed at...
users experienced enough to pay attention to tracebacks but not experienced 
enough to know when to provide the version number?

YMMV, but I don't see much value in this. If it comes at the cost of making 
doctest harder to use, I'm actively against it. Otherwise I'm just mildly 
"meh, why bother?".



-- 
Steven


From storchaka at gmail.com  Sat Dec 10 20:50:15 2011
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 10 Dec 2011 21:50:15 +0200
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <4EE33F05.4000907@email.de>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
	<4EE33F05.4000907@email.de>
Message-ID: <jc0d5v$r7g$1@dough.gmane.org>

10.12.11 13:14, francis wrote:
> Formatting is like food: everyone has their own taste. One has
> to use spices to change it (if possible). For me, the view of
> the code (the layout) seen by the programmer should be automatically
> changed by the tool that reads the code. Here you could have
> a Python with braces if you want... (I think that 'go' has some
> autoformatter or a standard way of formatting).

pindent -c


From python-dev at masklinn.net  Sat Dec 10 21:35:40 2011
From: python-dev at masklinn.net (Xavier Morel)
Date: Sat, 10 Dec 2011 21:35:40 +0100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <4EE33F05.4000907@email.de>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
	<4EE33F05.4000907@email.de>
Message-ID: <A51763BE-D0CB-4BB5-8B1A-EC60657B244E@masklinn.net>

On 2011-12-10, at 12:14 , francis wrote:
> 
> (I think that 'go' has some
> autoformatter or a standard way of formatting).
`gofmt`, yes. It simply reformats all the code to match the style
decided by the core Go team; it does not support
formatting-independent editing.

Think of it as pep8.py editing the code in place instead of just
reporting the stuff it does not like.

From janssen at parc.com  Sat Dec 10 21:54:09 2011
From: janssen at parc.com (Bill Janssen)
Date: Sat, 10 Dec 2011 12:54:09 PST
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jbv29r$de$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>
	<20111209100736.5f16419a@mikmeyer-vm-fedora>
	<68178.1323454554@parc.com> <jbv29r$de$1@dough.gmane.org>
Message-ID: <85935.1323550449@parc.com>

Stefan Behnel <stefan_ml at behnel.de> wrote:

> Bill Janssen, 09.12.2011 19:15:
> > I think another thing that might go into "refreshing the batteries" is a
> > feature comparison of BeautifulSoup and HTML5lib against the stdlib
> > competition, to see what needs to be added/revised.  Having to switch to
> > an outside package for parsing possibly invalid HTML is a pain.
> 
> Such a feature request should be worth a separate thread.
> 
> Note, however, that html5lib is likely way too big to add it to the
> stdlib, and that BeautifulSoup lacks a parser for non-conforming HTML
> in Python 3, which would be the target release series for better HTML
> support. So, whatever library or API you would want to use for HTML
> processing is currently only the second question as long as Py3 lacks
> a real-world HTML parser in the stdlib, as well as a robust character
> detection mechanism. I don't think that can be fixed all that easily.

Sounds like it needs a PEP.

I'm only advocating spending some thought on what needs to be done --
whether outside libraries need to be adopted into the stdlib would be a
step after that.  But understanding *why* those libraries exist and are
widely used should be a prerequisite to "refreshing" the stdlib's support.

Bill

From glyph at twistedmatrix.com  Sat Dec 10 22:32:46 2011
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Sat, 10 Dec 2011 16:32:46 -0500
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jbv29r$de$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>	<20111209100736.5f16419a@mikmeyer-vm-fedora>
	<68178.1323454554@parc.com> <jbv29r$de$1@dough.gmane.org>
Message-ID: <4A13B293-A093-4E86-ACD7-7B22590CEC7E@twistedmatrix.com>

On Dec 10, 2011, at 2:38 AM, Stefan Behnel wrote:

> Note, however, that html5lib is likely way too big to add it to the stdlib, and that BeautifulSoup lacks a parser for non-conforming HTML in Python 3, which would be the target release series for better HTML support. So, whatever library or API you would want to use for HTML processing is currently only the second question as long as Py3 lacks a real-world HTML parser in the stdlib, as well as a robust character detection mechanism. I don't think that can be fixed all that easily.


Here's the problem in a nutshell, I think:

1. Everybody wants an HTML parser in the stdlib, because it's inconvenient to pull in a dependency for such a "simple" task.
2. Everybody wants the stdlib to remain small, stable, and simple and not get "overcomplicated".
3. Parsing arbitrary HTML5 is a monstrously complex problem, for which there exist rapidly-evolving standards and libraries to deal with it.  Parsing 'the web' (which is rapidly growing to include stuff like SVG, MathML etc) is even harder.

My personal opinion is that HTML5Lib gets this problem almost completely right, and so it should be absorbed by the stdlib.  Trying to re-invent this from scratch, or even use something like BeautifulSoup which uses a bunch of heuristics and hacks rather than reference to the laboriously-crafted standard that says exactly how parsing malformed stuff has to go to be "like a browser", seems like it will just give the stdlib solution a reputation for working on the test input but not working in the real world.

(No disrespect to BeautifulSoup: it was a great attempt in the pre-HTML5 world which it was born into, and I've used it numerous times to implement useful things.  But much more effort has been poured into this problem since then, and the problems are better understood now.)

-glyph


From regebro at gmail.com  Sat Dec 10 22:56:15 2011
From: regebro at gmail.com (Lennart Regebro)
Date: Sat, 10 Dec 2011 22:56:15 +0100
Subject: [Python-Dev] [PATCH] Adding braces to __future__
In-Reply-To: <871usc39o4.fsf@benfinney.id.au>
References: <20111209202629.GB2319@slate.Speedport_W_723V_Typ_A>
	<CAP7+vJLxqwwHmtkdATN6HCQeC005wkTj-wiNzqcLocVTdguELA@mail.gmail.com>
	<871usc39o4.fsf@benfinney.id.au>
Message-ID: <CAL0kPAXqncs6AA1vmLjbtYmm4HNu+ndhU0eezfxx3AG9_DYZfw@mail.gmail.com>

On Sat, Dec 10, 2011 at 13:15, Ben Finney <ben+python at benfinney.id.au> wrote:
> Guido van Rossum <guido at python.org> writes:
>
>> On Fri, Dec 9, 2011 at 12:26 PM, Cedric Sodhi <manday at gmx.net> wrote:
>>
>> > IF YOU THINK YOU MUST REPLY SOMETHING WITTY, ITERATE THAT THIS HAD
>> > BEEN DISCUSSED BEFORE, REPLY THAT "IT'S SIMPLY NOT GO'NNA HAPPEN",
>> > THAT "WHO DOESN'T LIKE IT IS FREE TO CHOOSE ANOTHER LANGUAGE" OR
>> > SOMETHING SIMILAR, JUST DON'T.
>>
>> Every single response in this thread so far has ignored this request.
>
> The request was completely unreasonable.

As it basically said "I will ignore everything everyone ever will say
on this issue, and if you don't think I should do that, then you
should ignore me", I find the request very reasonable. I wish more
people would advertise that they not only know about the facts of the
matter but completely ignore them. It's basically a big sign saying
"LALALALIMNOTLISTENING", which would shorten a lot of internet debates
if it was more widely used. :-)

//Lennart

From tjreedy at udel.edu  Sat Dec 10 23:30:49 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 10 Dec 2011 17:30:49 -0500
Subject: [Python-Dev] Tag trackbacks with version (was Re: readd u''
 literal support in 3.3?)
In-Reply-To: <CALeMXf7vA=z0uvsXMVxNEfqgTBdcmApXC_q-=zuyiaXTsx5gUA@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>
	<loom.20111208T161219-187@post.gmane.org>
	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>
	<jbrllu$99u$1@dough.gmane.org>
	<loom.20111209T022519-121@post.gmane.org>
	<jbsme9$t5d$1@dough.gmane.org>
	<CADiSq7eh--b4Moe1XHnXnH1tHJNa51HHp-18hpcsWcGg_dTYyQ@mail.gmail.com>
	<jbum6r$4vk$1@dough.gmane.org>
	<CALeMXf7vA=z0uvsXMVxNEfqgTBdcmApXC_q-=zuyiaXTsx5gUA@mail.gmail.com>
Message-ID: <jc0mj9$jtj$1@dough.gmane.org>

On 12/10/2011 12:09 PM, PJ Eby wrote:
> On Fri, Dec 9, 2011 at 11:11 PM, Terry Reedy <tjreedy at udel.edu
> <mailto:tjreedy at udel.edu>> wrote:
>
>     This just gave me the idea of tagging tracebacks with the Python
>     version number. Something like
>
>     Traceback (Py3.2.2, most recent call last):
>
>     and perhaps with the platform also
>
>     Traceback (most recent call last) [Py3.2.2 on win32]:
>
>     Since computation has stopped, the few extra milliseconds is
>     trivial. This would certainly help on Python list and the tracker
>     when people do post the traceback (which they do not always) without
>     version and system (which they often do not, especially on Python
>     list). It might suggest to people that this is important info to
>     include. I wonder if this would also help with tracebacks sent to
>     library/app developers.
>
>
> Yes, but doctest will need to take this into account, both for its
> native traceback matcher, and for traceback matches using ellipses.
>   Otherwise you introduce more Python version hell for doctest users.

Is doctest really insisting that the whole line
   Traceback (most recent call last):
exactly match, with nothing added? It really should not, as that is not 
part of the language spec. This seems like the tail wagging the dog.

-- 
Terry Jan Reedy


From tjreedy at udel.edu  Sat Dec 10 23:44:18 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 10 Dec 2011 17:44:18 -0500
Subject: [Python-Dev] Tag trackbacks with version (was Re: readd u''
 literal support in 3.3?)
In-Reply-To: <4EE39C65.60508@pearwood.info>
References: <1323320919.2710.24.camel@thinko>	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>	<1323324644.2710.28.camel@thinko>	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>	<1323325916.2710.39.camel@thinko>	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>	<loom.20111208T161219-187@post.gmane.org>	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>	<jbrllu$99u$1@dough.gmane.org>	<loom.20111209T022519-121@post.gmane.org>	<jbsme9$t5d$1@dough.gmane.org>	<CADiSq7eh--b4Moe1XHnXnH1tHJNa51HHp-18hpcsWcGg_dTYyQ@mail.gmail.com>
	<jbum6r$4vk$1@dough.gmane.org> <4EE39C65.60508@pearwood.info>
Message-ID: <jc0nci$onl$1@dough.gmane.org>

On 12/10/2011 12:52 PM, Steven D'Aprano wrote:
> Terry Reedy wrote:
>> On 12/9/2011 5:17 AM, Nick Coghlan wrote:
>>
>>> As Chris pointed out though, the real problem with the "repeatedly run
>>> 2to3" workflow is that it can make interpreting tracebacks from the
>>> field *really* hard.
>>
>> This just gave me the idea of tagging tracebacks with the Python
>> version number. Something like
>>
>> Traceback (Py3.2.2, most recent call last):
>>
>> and perhaps with the platform also
>>
>> Traceback (most recent call last) [Py3.2.2 on win32]:
>>
>> Since computation has stopped, the few extra milliseconds is trivial.
>> This would certainly help on Python list and the tracker when people
>> do post the traceback (which they do not always) without version and
>> system (which they often do not, especially on Python list). It might
>> suggest to people that this is important info to include.
> [...]
>
> But how often is it actually important information to include?
>
> I am active on both the tutor and the python-list lists, and it seems to
> me that this proposed feature won't be very useful in either place. In
> my experience, the version number is rarely important for the sorts of
> questions that are commonly asked.

My experience on Python list is that version and platform are often 
important. But leave that aside. It is definitely important on the 
tracker, which I already mentioned. Just a few days ago, for instance, 
the opening message of
http://bugs.python.org/issue13538
has
" >>> bytes("foo")
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   TypeError: string argument without an encoding"
with no indication of the version anywhere in the message.

Perhaps in such cases the OP correctly marks the version up in the 
header, but it would be nice to have it right there in the traceback.

As for doctest, it could/should be changed to check for 
s.startswith("Traceback (most recent call last)") (instead of s == ...) 
if it does not do that now.
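
A sketch of the kind of relaxed header check suggested here. The
regular expression and the name `is_traceback_header` are hypothetical;
this is not doctest's actual matching code, only an illustration of
accepting both proposed tag placements alongside the current header.

```python
import re

# Accepts the current header plus the two tagged variants floated in this
# thread: a tag inside the parentheses, or a bracketed tag after them.
_HEADER_RE = re.compile(
    r"^Traceback \((?:.*, )?most recent call last\)(?: \[.*\])?:\s*$"
)


def is_traceback_header(line):
    """Return True if the line looks like a (possibly tagged) traceback header."""
    return _HEADER_RE.match(line) is not None
```

A plain `startswith` test would cover only the variant that appends the
tag after the closing parenthesis, which is why the sketch uses a
pattern instead.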

-- 
Terry Jan Reedy


From jcea at jcea.es  Sun Dec 11 00:30:49 2011
From: jcea at jcea.es (Jesus Cea)
Date: Sun, 11 Dec 2011 00:30:49 +0100
Subject: [Python-Dev] Adding GNU conditional execution in the Makefile?
Message-ID: <4EE3EBA9.2050600@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Working on the DTrace probes, I think I can simplify the build logic
quite a bit using GNU make's conditional syntax:
<https://www.gnu.org/s/hello/manual/make/Conditional-Syntax.html>.

Concretely, I have object files that must be compiled and linked, or
not, according to a "configure" test result.

But currently I think we are not using these features. Maybe because
we don't want to force the use of GMAKE, I don't know.

If this is a policy, I would like to know.

And if somebody has a suggestion to cope with this difficulty...

- -- 
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQCVAwUBTuPrqZlgi5GaxT1NAQIRmQP/ebIcya/xg/lCTXPd6QyaBaFxrhL6jLiP
osKeklCSH/aw6tt6v1lK7XgPf8HBEU11KGBmL4xJUsVcDExkNb3Mdu3bSW4Gb5ao
Ep1PxvEWLxa/yVkKuvgdBpvdCoxibhNLfGgVTj08ZE18o9tGbhNKS6EN94uAQJT9
ZASlf8baOss=
=5lr+
-----END PGP SIGNATURE-----

From tjreedy at udel.edu  Sun Dec 11 00:30:34 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 10 Dec 2011 18:30:34 -0500
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <4A13B293-A093-4E86-ACD7-7B22590CEC7E@twistedmatrix.com>
References: <jbsfar$en7$1@dough.gmane.org>	<20111209100736.5f16419a@mikmeyer-vm-fedora>
	<68178.1323454554@parc.com> <jbv29r$de$1@dough.gmane.org>
	<4A13B293-A093-4E86-ACD7-7B22590CEC7E@twistedmatrix.com>
Message-ID: <jc0q3b$8qu$1@dough.gmane.org>

On 12/10/2011 4:32 PM, Glyph Lefkowitz wrote:
> On Dec 10, 2011, at 2:38 AM, Stefan Behnel wrote:
>
>> Note, however, that html5lib is likely way too big to add it to the
>> stdlib, and that BeautifulSoup lacks a parser for non-conforming HTML
>> in Python 3, which would be the target release series for better HTML
>> support. So, whatever library or API you would want to use for HTML
>> processing is currently only the second question as long as Py3 lacks
>> a real-world HTML parser in the stdlib, as well as a robust character
>> detection mechanism. I don't think that can be fixed all that easily.
>
> Here's the problem in a nutshell, I think:
>
>  1. Everybody wants an HTML parser in the stdlib, because it's
>     inconvenient to pull in a dependency for such a "simple" task.
>  2. Everybody wants the stdlib to remain small, stable, and simple and
>     not get "overcomplicated".
>  3. Parsing arbitrary HTML5 is a monstrously complex problem, for which
>     there exist rapidly-evolving standards and libraries to deal with
>     it. Parsing 'the web' (which is rapidly growing to include stuff
>     like SVG, MathML etc) is even harder.
>
>
> My personal opinion is that HTML5Lib gets this problem almost completely
> right, and so it should be absorbed by the stdlib.

A little data: the HTML5lib project lives at
https://code.google.com/p/html5lib/
It has 4 owners and 22 other committers.

The most recent release, html5lib 0.90 for Python, is nearly 2 years 
old. Since there is a separate Python3 repository, and there is no 
mention of Python3 compatibility elsewhere that I saw, including the 
pypi listing, I assume that is for Python2 only.

A comment on a recent (July 11) Python3 issue
https://code.google.com/p/html5lib/issues/detail?id=187&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20Port
suggests that the Python3 version still has problems. "Merged in now, 
though still lots of errors and failures in the testsuite."

-- 
Terry Jan Reedy


From guido at python.org  Sun Dec 11 02:02:28 2011
From: guido at python.org (Guido van Rossum)
Date: Sat, 10 Dec 2011 17:02:28 -0800
Subject: [Python-Dev] Adding GNU conditional execution in the Makefile?
In-Reply-To: <4EE3EBA9.2050600@jcea.es>
References: <4EE3EBA9.2050600@jcea.es>
Message-ID: <CAP7+vJJ3pr_YSR7p82wHPGrNT-EnKiYsJStO7Ymq3upRDQf70A@mail.gmail.com>

I don't know how widespread gmake is, but I certainly don't want Python to
be dependent on GNU tools exclusively. You don't have to use GCC to compile
it. (Autoconfig is a different story, it is only needed when
config.in changes. Similarly, readline is optional.)

--Guido

On Sat, Dec 10, 2011 at 3:30 PM, Jesus Cea <jcea at jcea.es> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Working in the DTRACE probes, I think I can simplify the build logic
> quite a bit using the GNU Makefile conditional execution:
> <https://www.gnu.org/s/hello/manual/make/Conditional-Syntax.html>.
>
> In concrete, I have object files that must be compiled and linked, or
> not, according to a "configure" test result.
>
> But currently I think we are not using these features. Maybe because
> we don't want to force the use of GMAKE, I don't know.
>
> If this is a policy, I would like to know.
>
> And if somebody has a suggestion to cope with this difficulty...
>
> - --
> Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
> jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
> jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
> .                              _/_/  _/_/    _/_/          _/_/  _/_/
> "Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
> "My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
> "El amor es poner tu felicidad en la felicidad de otro" - Leibniz
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iQCVAwUBTuPrqZlgi5GaxT1NAQIRmQP/ebIcya/xg/lCTXPd6QyaBaFxrhL6jLiP
> osKeklCSH/aw6tt6v1lK7XgPf8HBEU11KGBmL4xJUsVcDExkNb3Mdu3bSW4Gb5ao
> Ep1PxvEWLxa/yVkKuvgdBpvdCoxibhNLfGgVTj08ZE18o9tGbhNKS6EN94uAQJT9
> ZASlf8baOss=
> =5lr+
> -----END PGP SIGNATURE-----



-- 
--Guido van Rossum (python.org/~guido)

From glyph at twistedmatrix.com  Sun Dec 11 03:25:33 2011
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Sat, 10 Dec 2011 21:25:33 -0500
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jc0q3b$8qu$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>	<20111209100736.5f16419a@mikmeyer-vm-fedora>
	<68178.1323454554@parc.com> <jbv29r$de$1@dough.gmane.org>
	<4A13B293-A093-4E86-ACD7-7B22590CEC7E@twistedmatrix.com>
	<jc0q3b$8qu$1@dough.gmane.org>
Message-ID: <489D0DBD-FE82-4200-97DF-5FA425E2AEF6@twistedmatrix.com>


On Dec 10, 2011, at 6:30 PM, Terry Reedy wrote:

> A little data: the HTML5lib project lives at
> https://code.google.com/p/html5lib/
> It has 4 owners and 22 other committers.
> 
> The most recent release, html5lib 0.90 for Python, is nearly 2 years old. Since there is a separate Python3 repository, and there is no mention on Python3 compatibility elsewhere that I saw, including the pypi listing, I assume that is for Python2 only.

I believe that you are correct.

> A comment on a recent (July 11) Python3 issue
> https://code.google.com/p/html5lib/issues/detail?id=187&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20Port
> suggest that the Python3 version still has problems. "Merged in now, though still lots of errors and failures in the testsuite."


I don't see what bearing this has on the discussion.  There are three possible ways I can imagine to interpret this information.

First, you could believe that porting a codebase from Python 2 to Python 3 is much easier than solving a difficult domain-specific problem.  In that case, html5lib has done the hard part and someone interested in html-in-the-stdlib should do the rest.

Second, you could believe that porting a codebase from Python 2 to Python 3 is harder than solving a difficult domain-specific problem, in which case something is seriously wrong with Python 3 or its attendant migration tools and that needs to be fixed, so someone should fix that rather than worrying about parsing HTML right now.  (I doubt that many subscribers to this list would share this opinion, though.)

Third, you could believe that parsing HTML is not a difficult domain-specific problem.  But only a crazy person would believe that, so you're left with one of the previous options :).

-glyph


From tjreedy at udel.edu  Sun Dec 11 06:55:01 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 11 Dec 2011 00:55:01 -0500
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <489D0DBD-FE82-4200-97DF-5FA425E2AEF6@twistedmatrix.com>
References: <jbsfar$en7$1@dough.gmane.org>
	<20111209100736.5f16419a@mikmeyer-vm-fedora>
	<68178.1323454554@parc.com> <jbv29r$de$1@dough.gmane.org>
	<4A13B293-A093-4E86-ACD7-7B22590CEC7E@twistedmatrix.com>
	<jc0q3b$8qu$1@dough.gmane.org>
	<489D0DBD-FE82-4200-97DF-5FA425E2AEF6@twistedmatrix.com>
Message-ID: <4EE445B5.5090802@udel.edu>



On 12/10/2011 9:25 PM, Glyph Lefkowitz wrote:
> On Dec 10, 2011, at 6:30 PM, Terry Reedy wrote:

>> A little data: the HTML5lib project lives at
>> https://code.google.com/p/html5lib/
>> It has 4 owners and 22 other committers.

If there really are 4 'owners' rather than 4 people with admin access to 
the site, then there are 4 people to negotiate with.

>> The most recent release, html5lib 0.90 for Python, is nearly 2 years
>> old. Since there is a separate Python3 repository, and there is no
>> mention on Python3 compatibility elsewhere that I saw, including the
>> pypi listing, I assume that is for Python2 only.
>
> I believe that you are correct.

There are issues pointing to a 1.0 release, but I could not find any 
current timetable. The project looks a bit stagnant. That does not bode 
well for a commitment to future active maintenance.

>> A comment on a recent (July 11) Python3 issue
>> https://code.google.com/p/html5lib/issues/detail?id=187&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20Port
>> suggest that the Python3 version still has problems. "Merged in now,
>> though still lots of errors and failures in the testsuite."
>
> I don't see what bearing this has on the discussion.

I think both points above show that 'absorbing HTML5Lib in the stdlib' 
will involve more sociological and technical problems than doing so with 
an active one-person module that already runs on 3.2. One is that the 
multiple version Python 2.x codebase is the reference version and that 
will not be incorporated. A serious plan will have to address the real 
situation.

---
Terry Jan Reedy


From python at mrabarnett.plus.com  Sun Dec 11 21:12:41 2011
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 11 Dec 2011 20:12:41 +0000
Subject: [Python-Dev] Omission in re.sub?
Message-ID: <4EE50EB9.3000606@mrabarnett.plus.com>

I've just come across an omission in re.sub which I hadn't noticed
before.

In re.sub the replacement string can contain escape sequences, for
example:

 >>> repr(re.sub(r"x", r"\n", "axb"))
"'a\\nb'"

However:

 >>> repr(re.sub(r"x", r"\x0A", "axb"))
"'a\\\\x0Ab'"

Yes, it doesn't recognise "\xNN".

Is there a reason for this?

The regex module does the same, but is there any objection to me fixing
it in the regex module? (I'm thinking about compatibility with re here.)
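
A minimal sketch of the replacement-string behaviour discussed above,
limited to cases that should not depend on how a given Python version
treats "\xNN" in a template. The variable names are illustrative only.
A callable replacement is returned as-is, without template escape
processing, so it is one way to insert an arbitrary character.

```python
import re

# "\n" in a replacement template is converted to a real newline.
template_newline = re.sub(r"x", r"\n", "axb")

# A doubled backslash in the template yields one literal backslash.
template_backslash = re.sub(r"x", r"\\", "axb")

# A callable's return value is inserted unchanged, so the "\x0A" here is
# resolved by the Python string literal, not by the re template parser.
callable_result = re.sub(r"x", lambda match: "\x0A", "axb")

print(repr(template_newline), repr(template_backslash), repr(callable_result))
```

How "\xNN" itself is treated inside a template is the open question in
this thread, and it depends on the Python version, so the sketch avoids
relying on it.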

From pje at telecommunity.com  Sun Dec 11 21:12:52 2011
From: pje at telecommunity.com (PJ Eby)
Date: Sun, 11 Dec 2011 15:12:52 -0500
Subject: [Python-Dev] Tag trackbacks with version (was Re: readd u''
 literal support in 3.3?)
In-Reply-To: <jc0mj9$jtj$1@dough.gmane.org>
References: <1323320919.2710.24.camel@thinko>
	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>
	<1323324644.2710.28.camel@thinko>
	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>
	<1323325916.2710.39.camel@thinko>
	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>
	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>
	<loom.20111208T161219-187@post.gmane.org>
	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>
	<jbrllu$99u$1@dough.gmane.org>
	<loom.20111209T022519-121@post.gmane.org>
	<jbsme9$t5d$1@dough.gmane.org>
	<CADiSq7eh--b4Moe1XHnXnH1tHJNa51HHp-18hpcsWcGg_dTYyQ@mail.gmail.com>
	<jbum6r$4vk$1@dough.gmane.org>
	<CALeMXf7vA=z0uvsXMVxNEfqgTBdcmApXC_q-=zuyiaXTsx5gUA@mail.gmail.com>
	<jc0mj9$jtj$1@dough.gmane.org>
Message-ID: <CALeMXf77yzWsE4jpSMjsFiSY6G3OvGcCFdJqGdoL9Xmjdbu3HA@mail.gmail.com>

On Sat, Dec 10, 2011 at 5:30 PM, Terry Reedy <tjreedy at udel.edu> wrote:

> Is doctest really insisting that the whole line
>  Traceback (most recent call last):
> exactly match, with nothing added? It really should not, as that is not
> part of the language spec. This seems like the tail wagging the dog.
>

It's a regular expression match, actually.  The standard matcher ignores
everything between the Traceback line (matched by a regex) and the first
unindented line that follows in the doctest.  However, if you explicitly
try to match a traceback with the ellipsis matcher, intending to observe
whether certain specific lines are printed, then you wouldn't be using
doctest's built-in matcher, and that was the case I was concerned about.
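
A small sketch of the standard matcher's behaviour described above,
using only the documented doctest classes. The transcript, its made-up
stack line, and the name "traceback_demo" are assumptions for
illustration. The stack lines between the Traceback header and the
final exception line are nonsense on purpose, yet the example passes.

```python
import doctest

# Hypothetical transcript: the indented stack line is deliberately wrong.
TRANSCRIPT = '''
>>> raise ValueError("boom")
Traceback (most recent call last):
  File "made-up-file.py", line 999, in nowhere
ValueError: boom
'''

parser = doctest.DocTestParser()
# Arguments: string, globs, name, filename, lineno.
test = parser.get_doctest(TRANSCRIPT, {}, "traceback_demo", None, 0)

runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

# Only the header and the final "ValueError: boom" line are compared.
print(results.attempted, results.failed)
```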

However, as it turns out, I was confused about when this latter case
occurs: in order to do it, you have to actually intentionally print a
traceback (e.g. via traceback.format_exception() and friends), rather than
allowing the exception to propagate normally.  This doesn't happen nearly
as often in my doctests as I thought it did, but if format_exception()
changes it'll still affect some people.

The other piece I was pointing out was that if you change the message
without changing the doctest regex, then pasting an interpreter transcript
into a doctest will no longer work, because doctest will think it's trying
to match non-error output.  So that has to be changed when the exception
format changes.

So, no actual objection here; just saying that if you don't change that
regex, people who create *new* doctests with tracebacks won't be able to
get them to work without deleting the version info from their copy-pasted
tracebacks.  I was also concerned about a situation that, while it exists,
does not occur anywhere near as frequently as I thought it would in my own
tests, even for things that seriously abuse Python internals and likely
can't be ported to Python 3 anyway.  ;-)

From guido at python.org  Sun Dec 11 21:27:35 2011
From: guido at python.org (Guido van Rossum)
Date: Sun, 11 Dec 2011 12:27:35 -0800
Subject: [Python-Dev] Omission in re.sub?
In-Reply-To: <4EE50EB9.3000606@mrabarnett.plus.com>
References: <4EE50EB9.3000606@mrabarnett.plus.com>
Message-ID: <CAP7+vJLwOnMGbvFTF=ONChyW8oVsSeiqoTvrBgOTwg6Fwr0YOA@mail.gmail.com>

As long as there's a way to place a single backslash in the output
this seems fine to me, though I'm not sure it's important. Of course
it will likely break some test... the test will then have to be fixed.

I can't remember why we did this -- is there a full list of all the
escapes that re.sub() interprets somewhere? I thought it was pretty
limited. Maybe it's the related list of escapes that are supported in
regular expressions?

--Guido

On Sun, Dec 11, 2011 at 12:12 PM, MRAB <python at mrabarnett.plus.com> wrote:
> I've just come across an omission in re.sub which I hadn't noticed
> before.
>
> In re.sub the replacement string can contain escape sequences, for
> example:
>
>>>> repr(re.sub(r"x", r"\n", "axb"))
> "'a\\nb'"
>
> However:
>
>>>> repr(re.sub(r"x", r"\x0A", "axb"))
> "'a\\\\x0Ab'"
>
> Yes, it doesn't recognise "\xNN".
>
> Is there a reason for this?
>
> The regex module does the same, but is there any objection to me fixing
> it in the regex module? (I'm thinking about compatibility with re here.)



-- 
--Guido van Rossum (python.org/~guido)

From python at mrabarnett.plus.com  Sun Dec 11 21:47:48 2011
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 11 Dec 2011 20:47:48 +0000
Subject: [Python-Dev] Omission in re.sub?
In-Reply-To: <CAP7+vJLwOnMGbvFTF=ONChyW8oVsSeiqoTvrBgOTwg6Fwr0YOA@mail.gmail.com>
References: <4EE50EB9.3000606@mrabarnett.plus.com>
	<CAP7+vJLwOnMGbvFTF=ONChyW8oVsSeiqoTvrBgOTwg6Fwr0YOA@mail.gmail.com>
Message-ID: <4EE516F4.4000208@mrabarnett.plus.com>

On 11/12/2011 20:27, Guido van Rossum wrote:
> On Sun, Dec 11, 2011 at 12:12 PM, MRAB<python at mrabarnett.plus.com>
> wrote:
>> I've just come across an omission in re.sub which I hadn't noticed
>> before.
>>
>> In re.sub the replacement string can contain escape sequences, for
>> example:
>>
>>>>> repr(re.sub(r"x", r"\n", "axb"))
>> "'a\\nb'"
>>
>> However:
>>
>>>>> repr(re.sub(r"x", r"\x0A", "axb"))
>> "'a\\\\x0Ab'"
>>
>> Yes, it doesn't recognise "\xNN".
>>
>> Is there a reason for this?
>>
>> The regex module does the same, but is there any objection to me
>> fixing it in the regex module? (I'm thinking about compatibility
>> with re here.)
>
> As long as there's a way to place a single backslash in the output
> this seems fine to me, though I'm not sure it's important. Of course
> it will likely break some test... the test will then have to be
> fixed.
>
> I can't remember why we did this -- is there a full list of all the
> escapes that re.sub() interprets somewhere? I thought it was pretty
> limited. Maybe it's the related list of escapes that are supported
> in regular expressions?
>
The documentation says: """That is, \n is converted to a single newline 
character, \r is converted to a linefeed, and so forth."""

All of the other escape sequences work as expected, except for \uNNNN
and \UNNNNNNNN which aren't supported at all in re.

I should probably also add \N{...} to the list for completeness.

From guido at python.org  Sun Dec 11 22:04:56 2011
From: guido at python.org (Guido van Rossum)
Date: Sun, 11 Dec 2011 13:04:56 -0800
Subject: [Python-Dev] Omission in re.sub?
In-Reply-To: <4EE516F4.4000208@mrabarnett.plus.com>
References: <4EE50EB9.3000606@mrabarnett.plus.com>
	<CAP7+vJLwOnMGbvFTF=ONChyW8oVsSeiqoTvrBgOTwg6Fwr0YOA@mail.gmail.com>
	<4EE516F4.4000208@mrabarnett.plus.com>
Message-ID: <CAP7+vJ+yH0iw3y_9mhuHSk6DGDtEj3U5khf7BXY62p0OqjJvRA@mail.gmail.com>

I guess the current rule is that any escapes referring to characters
by a numeric value are not supported; this probably made some kind of
sense because \1 etc. are backreferences. But since we're discouraging
octal escapes anyway I think it's fine to improve over this.

On Sun, Dec 11, 2011 at 12:47 PM, MRAB <python at mrabarnett.plus.com> wrote:
> On 11/12/2011 20:27, Guido van Rossum wrote:
>>
>> On Sun, Dec 11, 2011 at 12:12 PM, MRAB<python at mrabarnett.plus.com>
>> wrote:
>>>
>>> I've just come across an omission in re.sub which I hadn't noticed
>>> before.
>>>
>>> In re.sub the replacement string can contain escape sequences, for
>>> example:
>>>
>>>>>> repr(re.sub(r"x", r"\n", "axb"))
>>>
>>> "'a\\nb'"
>>>
>>> However:
>>>
>>>>>> repr(re.sub(r"x", r"\x0A", "axb"))
>>>
>>> "'a\\\\x0Ab'"
>>>
>>> Yes, it doesn't recognise "\xNN".
>>>
>>> Is there a reason for this?
>>>
>>> The regex module does the same, but is there any objection to me
>>> fixing it in the regex module? (I'm thinking about compatibility
>>> with re here.)
>>
>>
>> As long as there's a way to place a single backslash in the output
>> this seems fine to me, though I'm not sure it's important. Of course
>> it will likely break some test... the test will then have to be
>> fixed.
>>
>> I can't remember why we did this -- is there a full list of all the
>> escapes that re.sub() interprets somewhere? I thought it was pretty
>> limited. Maybe it's the related list of escapes that are supported
>> in regular expressions?
>>
> The documentation says: """That is, \n is converted to a single newline
> character, \r is converted to a linefeed, and so forth."""
>
> All of the other escape sequences work as expected, except for \uNNNN
> and \UNNNNNNNN which aren't supported at all in re.
>
> I should probably also add \N{...} to the list for completeness.
>



-- 
--Guido van Rossum (python.org/~guido)

From martin at v.loewis.de  Sun Dec 11 23:03:41 2011
From: martin at v.loewis.de (=?windows-1252?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 11 Dec 2011 23:03:41 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <E8A216E9-6AC9-4307-8250-B9B185604B82@masklinn.net>
References: <jbsfar$en7$1@dough.gmane.org> <4EE1C9AB.2040301@v.loewis.de>
	<E8A216E9-6AC9-4307-8250-B9B185604B82@masklinn.net>
Message-ID: <4EE528BD.2040102@v.loewis.de>

On 09.12.2011 10:09, Xavier Morel wrote:
> On 2011-12-09, at 09:41 , Martin v. Löwis wrote:
>>> a) The stdlib documentation should help users to choose the right
>>> tool right from the start. Instead of using the totally
>>> misleading wording that it uses now, it should be honest about
>>> the performance characteristics of MiniDOM and should actively
>>> suggest that those who don't know what to choose (or even *that*
>>> they can choose) should not use MiniDOM in the first place.
>> 
[...]
> 
> Minidom is inferior in interface flow and pythonicity, in terseness,
> in speed, in memory consumption (even more so using cElementTree, and
> that's not something which can be fixed unless minidom gets a C
> accelerator), etc. Even after fixing minidom (if anybody has the time
> and drive to commit to it), ET/cET should be preferred over it.

I don't mind pointing people to ElementTree, although I disagree that
the ET interface is "superior" to DOM. What I object to is Stefan's
reasoning as to *why* people should be pointed to ET, and the words that
should be used to do that. IOW, I detest bashing some part of the standard
library, just to urge users to use some other part of the standard library.

People are still using PyXML, despite its not being maintained anymore.
Telling them to replace 4DOM with minidom is much more appropriate than
telling them to rewrite in ET.

Regards,
Martin

From martin at v.loewis.de  Sun Dec 11 23:07:07 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 11 Dec 2011 23:07:07 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <CADiSq7etPwJh4sz+k1AnYm0iZFDP4FtSvb9dkK+D64Zs48h00w@mail.gmail.com>
References: <jbsfar$en7$1@dough.gmane.org>	<4EE1C9AB.2040301@v.loewis.de>
	<CADiSq7etPwJh4sz+k1AnYm0iZFDP4FtSvb9dkK+D64Zs48h00w@mail.gmail.com>
Message-ID: <4EE5298B.8090908@v.loewis.de>

> For the various XML libraries, a message along the lines of "Note: The
> <whatever> module is a <yada, yada, DOM based, whatever>. If all you
> are trying to do is read and write XML files, consider using the
> xml.etree.ElementTree module instead".

I wouldn't mind such a wording. I still would mind the changes that
Stefan proposed (which are actually different from yours).

Regards,
Martin

From martin at v.loewis.de  Sun Dec 11 23:14:57 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 11 Dec 2011 23:14:57 +0100
Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and
	PyUnicode_EncodeCodePage()
In-Reply-To: <CADiSq7fMeU+8L95ziXepBbA1bQ98Sut-3_Uzz6GT9mvn1symdw@mail.gmail.com>
References: <E1RYnC6-0005HV-Ep@dinsdale.python.org>	<20111209013535.6fb38068@pitrou.net>	<4EE1CA5D.70705@v.loewis.de>
	<CADiSq7fMeU+8L95ziXepBbA1bQ98Sut-3_Uzz6GT9mvn1symdw@mail.gmail.com>
Message-ID: <4EE52B61.40801@v.loewis.de>

On 09.12.2011 10:12, Nick Coghlan wrote:
> On Fri, Dec 9, 2011 at 6:44 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> On 09.12.2011 01:35, Antoine Pitrou wrote:
>>> On Fri, 09 Dec 2011 00:16:02 +0100
>>> victor.stinner <python-checkins at python.org> wrote:
>>>>
>>>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode)
>>>> +
>>>> +   Get a new copy of a Unicode object.
>>>> +
>>>> +   .. versionadded:: 3.3
>>>
>>> I'm not sure I understand. Why would you make a copy of an immutable
>>> object?
>>
>> It can convert a unicode subtype object into an exact unicode
>> object.
>>
>> I'd rename it to _PyUnicode_AsExactUnicode, and undocument it.
> 
> Isn't it basically just exposing a C level version of the unicode()
> builtin's behaviour?

No. To call the unicode() builtin, do

  PyObject_CallFunction(&PyUnicode_Type, "O", param)

or some such. PyUnicode_Copy doesn't correspond to any Python-level
API.

> While I agree the name could be better (and
> PyUnicode_AsExactUnicode would certainly work), why make it private?

I suggest being minimalistic in extensions to the API. There should
be a demonstrated need for an API before adding it, which I don't
see in this case.

In general, it will be difficult to find a demonstrable need for new
APIs, since the majority (more than 99%) of API use cases is already
covered by the abstract object API (i.e. what ceval uses).

The unicode type in particular has a bad tradition of adding tons
of functions to the C API, only so we find out a few releases later
that the API is obsolete (e.g. needs additional/different parameters),
so we carry unused functions around just because some extension module
may use them.

Regards,
Martin

From python at mrabarnett.plus.com  Sun Dec 11 23:36:32 2011
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 11 Dec 2011 22:36:32 +0000
Subject: [Python-Dev] Omission in re.sub?
In-Reply-To: <CAP7+vJ+yH0iw3y_9mhuHSk6DGDtEj3U5khf7BXY62p0OqjJvRA@mail.gmail.com>
References: <4EE50EB9.3000606@mrabarnett.plus.com>
	<CAP7+vJLwOnMGbvFTF=ONChyW8oVsSeiqoTvrBgOTwg6Fwr0YOA@mail.gmail.com>
	<4EE516F4.4000208@mrabarnett.plus.com>
	<CAP7+vJ+yH0iw3y_9mhuHSk6DGDtEj3U5khf7BXY62p0OqjJvRA@mail.gmail.com>
Message-ID: <4EE53070.9010702@mrabarnett.plus.com>

On 11/12/2011 21:04, Guido van Rossum wrote:
> On Sun, Dec 11, 2011 at 12:47 PM, MRAB<python at mrabarnett.plus.com>  wrote:
>> On 11/12/2011 20:27, Guido van Rossum wrote:
>>>
>>> On Sun, Dec 11, 2011 at 12:12 PM, MRAB<python at mrabarnett.plus.com>
>>> wrote:
>>>>
>>>> I've just come across an omission in re.sub which I hadn't noticed
>>>> before.
>>>>
>>>> In re.sub the replacement string can contain escape sequences, for
>>>> example:
>>>>
>>>>>>> repr(re.sub(r"x", r"\n", "axb"))
>>>>
>>>> "'a\\nb'"
>>>>
>>>> However:
>>>>
>>>>>>> repr(re.sub(r"x", r"\x0A", "axb"))
>>>>
>>>> "'a\\\\x0Ab'"
>>>>
>>>> Yes, it doesn't recognise "\xNN".
>>>>
>>>> Is there a reason for this?
>>>>
>>>> The regex module does the same, but is there any objection to me
>>>> fixing it in the regex module? (I'm thinking about compatibility
>>>> with re here.)
>>>
>>>
>>> As long as there's a way to place a single backslash in the output
>>> this seems fine to me, though I'm not sure it's important. Of course
>>> it will likely break some test... the test will then have to be
>>> fixed.
>>>
>>> I can't remember why we did this -- is there a full list of all the
>>> escapes that re.sub() interprets somewhere? I thought it was pretty
>>> limited. Maybe it's the related list of escapes that are supported
>>> in regular expressions?
>>>
>> The documentation says: """That is, \n is converted to a single newline
>> character, \r is converted to a linefeed, and so forth."""
>>
>> All of the other escape sequences work as expected, except for \uNNNN
>> and \UNNNNNNNN which aren't supported at all in re.
>>
>> I should probably also add \N{...} to the list for completeness.
>>
> I guess the current rule is that any escapes referring to characters
> by a numeric value are not supported; this probably made some kind of
> sense because \1 etc. are backreferences. But since we're discouraging
> octal escapes anyway I think it's fine to improve over this.
>
A pattern can contain them, even octal escapes (must be 3 digits).
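
A short sketch of that point, restricted to patterns (not replacement
templates). The variable names are illustrative only. A hex escape and
a three-digit octal escape both denote a character, while a backslash
followed by a single non-zero digit is a group reference.

```python
import re

# "\x41" and "\101" both stand for the character "A" inside a pattern.
hex_match = re.fullmatch(r"\x41", "A")
octal_match = re.fullmatch(r"\101", "A")

# "\1" refers back to group 1 rather than to a character code.
backref_match = re.fullmatch(r"(a)\1", "aa")

print(hex_match is not None, octal_match is not None, backref_match is not None)
```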

From martin at v.loewis.de  Sun Dec 11 23:39:53 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 11 Dec 2011 23:39:53 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jbsile$4vu$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org> <4EE1C9AB.2040301@v.loewis.de>
	<jbsile$4vu$1@dough.gmane.org>
Message-ID: <4EE53139.8020500@v.loewis.de>

> I can't recall anyone working on any substantial improvements during the
> last six years or so, and the reason for that seems obvious to me.

What do you think is the reason? It's not at all obvious to me.

Regards,
Martin

From martin at v.loewis.de  Sun Dec 11 23:40:47 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 11 Dec 2011 23:40:47 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <CAKmKYaBid4c8Y0pe7txxZMk9+0WN8Hr5ZodS=HP05MdV-ysPhQ@mail.gmail.com>
References: <jbsfar$en7$1@dough.gmane.org>
	<CAKmKYaBid4c8Y0pe7txxZMk9+0WN8Hr5ZodS=HP05MdV-ysPhQ@mail.gmail.com>
Message-ID: <4EE5316F.9060004@v.loewis.de>

On 09.12.2011 16:09, Dirkjan Ochtman wrote:
> On Fri, Dec 9, 2011 at 09:02, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> a) The stdlib documentation should help users to choose the right tool right
>> from the start.
>> b) cElementTree should finally lose its "special" status as a separate
>> library and disappear as an accelerator module behind ElementTree.
> 
> An at least somewhat informed +1 from me. The ElementTree API is a
> very good way to deal with XML from Python, and it deserves to be
> promoted over the included alternatives.
> 
> Let's deprecate the NiCad batteries and try to guide users toward the
> Li-Ion ones.

If you are proposing to deprecate minidom: -1

Regards,
Martin

From solipsis at pitrou.net  Sun Dec 11 23:45:06 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 11 Dec 2011 23:45:06 +0100
Subject: [Python-Dev] cpython: Issue #5689: Add support for lzma
 compression to the tarfile module.
References: <E1RZSmP-0003XL-SZ@dinsdale.python.org>
Message-ID: <20111211234506.071db305@pitrou.net>

On Sat, 10 Dec 2011 20:40:17 +0100
lars.gustaebel <python-checkins at python.org> wrote:
>  
>  The :mod:`tarfile` module makes it possible to read and write tar
> -archives, including those using gzip or bz2 compression.
> +archives, including those using gzip, bz2 and lzma compression.
>  (:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Perhaps there should be a "versionchanged" directive for lzma support?

Regards

Antoine.



From martin at v.loewis.de  Sun Dec 11 23:44:50 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 11 Dec 2011 23:44:50 +0100
Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and
	PyUnicode_EncodeCodePage()
In-Reply-To: <20111209203216.2c627d61@pitrou.net>
References: <E1RYnC6-0005HV-Ep@dinsdale.python.org>	<20111209013535.6fb38068@pitrou.net>	<4EE258A2.8020902@haypocalc.com>
	<20111209203216.2c627d61@pitrou.net>
Message-ID: <4EE53262.1080405@v.loewis.de>

On 09.12.2011 20:32, Antoine Pitrou wrote:
> On Fri, 09 Dec 2011 19:51:14 +0100
> Victor Stinner <victor.stinner at haypocalc.com> wrote:
>> On 09/12/2011 01:35, Antoine Pitrou wrote:
>>> On Fri, 09 Dec 2011 00:16:02 +0100
>>> victor.stinner<python-checkins at python.org>  wrote:
>>>>
>>>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode)
>>>> +
>>>> +   Get a new copy of a Unicode object.
>>>> +
>>>> +   .. versionadded:: 3.3
>>>
>>> I'm not sure I understand. Why would you make a copy of an immutable
>>> object?
>>
>> PyUnicode_Copy() can be used to modify a string to create a new string 
>> with the same length. It is used for example by str.upper(), 
>> str.title(), ... (fixup()).
> 
> Then the doc should mention that the returned string can be modified.
> Otherwise it's a bit obscure why the function exists.

I'm skeptical about this modification part. If you make a copy, it's
not clear at all that the new characters that you put in will fit
in range with the width of the unicode string. Even decreasing the
ordinal of a character may be incorrect as the result may not be
canonical anymore.

Regards,
Martin

From solipsis at pitrou.net  Sun Dec 11 23:46:09 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 11 Dec 2011 23:46:09 +0100
Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and
 PyUnicode_EncodeCodePage()
In-Reply-To: <4EE53262.1080405@v.loewis.de>
References: <E1RYnC6-0005HV-Ep@dinsdale.python.org>
	<20111209013535.6fb38068@pitrou.net>	<4EE258A2.8020902@haypocalc.com>
	<20111209203216.2c627d61@pitrou.net>  <4EE53262.1080405@v.loewis.de>
Message-ID: <1323643569.3366.19.camel@localhost.localdomain>

On Sunday 11 December 2011 at 23:44 +0100, "Martin v. Löwis" wrote:
> On 09.12.2011 20:32, Antoine Pitrou wrote:
> > On Fri, 09 Dec 2011 19:51:14 +0100
> > Victor Stinner <victor.stinner at haypocalc.com> wrote:
> >> On 09/12/2011 01:35, Antoine Pitrou wrote:
> >>> On Fri, 09 Dec 2011 00:16:02 +0100
> >>> victor.stinner<python-checkins at python.org>  wrote:
> >>>>
> >>>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode)
> >>>> +
> >>>> +   Get a new copy of a Unicode object.
> >>>> +
> >>>> +   .. versionadded:: 3.3
> >>>
> >>> I'm not sure I understand. Why would you make a copy of an immutable
> >>> object?
> >>
> >> PyUnicode_Copy() can be used to modify a string to create a new string 
> >> with the same length. It is used for example by str.upper(), 
> >> str.title(), ... (fixup()).
> > 
> > Then the doc should mention that the returned string can be modified.
> > Otherwise it's a bit obscure why the function exists.
> 
> I'm skeptical about this modification part. If you make a copy, it's
> not clear at all that the new characters that you put in will fit
> in range with the width of the unicode string. Even decreasing the
> ordinal of a character may be incorrect as the result may not be
> canonical anymore.

Ah, good point. And perhaps a good reason to make the API private.

Regards

Antoine.



From python-dev at masklinn.net  Sun Dec 11 23:47:45 2011
From: python-dev at masklinn.net (Xavier Morel)
Date: Sun, 11 Dec 2011 23:47:45 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <4EE528BD.2040102@v.loewis.de>
References: <jbsfar$en7$1@dough.gmane.org> <4EE1C9AB.2040301@v.loewis.de>
	<E8A216E9-6AC9-4307-8250-B9B185604B82@masklinn.net>
	<4EE528BD.2040102@v.loewis.de>
Message-ID: <4E7DB3D7-F4DF-40D8-981D-23F71658EBCB@masklinn.net>

On 2011-12-11, at 23:03, Martin v. Löwis wrote:
> People are still using PyXML, despite its not being maintained anymore.
> Telling them to replace 4DOM with minidom is much more appropriate than
> telling them to rewrite in ET.

From my understanding, Stefan's suggestion is mostly aimed at "new"
python users trying to manipulate XML and not knowing what to use
(yet). It's not about telling people to rewrite existing codebase
(it's a good idea as well when possible, as far as I'm concerned, but
it's a different issue).

From martin at v.loewis.de  Sun Dec 11 23:50:50 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 11 Dec 2011 23:50:50 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CADiSq7eh--b4Moe1XHnXnH1tHJNa51HHp-18hpcsWcGg_dTYyQ@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko>	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>	<1323324644.2710.28.camel@thinko>	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>	<1323325916.2710.39.camel@thinko>	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>	<loom.20111208T161219-187@post.gmane.org>	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>	<jbrllu$99u$1@dough.gmane.org>	<loom.20111209T022519-121@post.gmane.org>	<jbsme9$t5d$1@dough.gmane.org>
	<CADiSq7eh--b4Moe1XHnXnH1tHJNa51HHp-18hpcsWcGg_dTYyQ@mail.gmail.com>
Message-ID: <4EE533CA.4000605@v.loewis.de>

On 09.12.2011 11:17, Nick Coghlan wrote:
> On Fri, Dec 9, 2011 at 8:03 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>> On 12/8/2011 8:39 PM, Vinay Sajip wrote:
>>> on an
>>>
>>> entire codebase (for example, using setup.py with flags to run 2to3
>>> during setup).
>>
>>
>> Oh. That explains the 'slow' complaint.
> 
> As Chris pointed out though, the real problem with the "repeatedly run
> 2to3" workflow is that it can make interpreting tracebacks from the
> field *really* hard.

It's hard, but not *really* hard. In most cases, the line numbers
in the 2to3 result are exactly the same as in the original, and if
not, the quoted source in the traceback will give you enough context
to find the source line of the problem.

Regards,
Martin

From martin at v.loewis.de  Sun Dec 11 23:58:42 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 11 Dec 2011 23:58:42 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <4EE239A0.2020004@netwok.org>
References: <1323320919.2710.24.camel@thinko>	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>	<1323324644.2710.28.camel@thinko>	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>	<1323325916.2710.39.camel@thinko>	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>	<loom.20111208T161219-187@post.gmane.org>	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>	<jbrllu$99u$1@dough.gmane.org>	<loom.20111209T022519-121@post.gmane.org>
	<4EE239A0.2020004@netwok.org>
Message-ID: <4EE535A2.1000002@v.loewis.de>

> When running 2to3 from a setup.py script, does it run on the whole
> codebase or only files that are found newer by the make-like
> timestamp-based dependency system? 

If you run "build" repeatedly (e.g. in a development cycle), then
it will process only the modified files (comparing time stamps
between the build/ area and the original source).
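
The time stamp comparison can be sketched roughly as below. This is not
the distutils code itself, just the make-like idea; the helper names
is_newer and stale_sources are invented for the illustration.

```python
import os

def is_newer(source, target):
    """Return True if `target` is missing or older than `source`."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)

def stale_sources(pairs):
    """Given (source, built_copy) pairs, list the sources needing conversion."""
    return [src for src, dst in pairs if is_newer(src, dst)]
```

Only the files returned by stale_sources() would be fed to 2to3 again;
everything else in the build/ area is left alone.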

Regards,
Martin

From martin at v.loewis.de  Mon Dec 12 00:00:43 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 12 Dec 2011 00:00:43 +0100
Subject: [Python-Dev] 2to3 and timestamps
In-Reply-To: <20111209174631.68a311f5@pitrou.net>
References: <1323320919.2710.24.camel@thinko>	<CAPZV6o_BDEp6+FvzjUJLiRbecUzOQ3VfHi8+GXTmo+OTEjzVrA@mail.gmail.com>	<1323324644.2710.28.camel@thinko>	<CAPZV6o--O=oMZ90ir0ZxqhLA=WKFz+tF7Bn8XoPvuRwM9J6gKQ@mail.gmail.com>	<1323325916.2710.39.camel@thinko>	<CADiSq7cTfbm6DmT0s4zNWKzMifDus=K=MH2vFhgii0Sewg_pvg@mail.gmail.com>	<CAB4yi1NeuHDXA5GhoCBQ_i7B5pXn_QH0qejS45Mk5GYYXfipfQ@mail.gmail.com>	<loom.20111208T161219-187@post.gmane.org>	<C9CA33E8-0FC8-446A-A838-BEB1A4880057@leidel.info>	<jbrllu$99u$1@dough.gmane.org>	<loom.20111209T022519-121@post.gmane.org>	<4EE239A0.2020004@netwok.org>
	<20111209174631.68a311f5@pitrou.net>
Message-ID: <4EE5361B.4050408@v.loewis.de>

>> When running 2to3 from a setup.py script, does it run on the whole
>> codebase or only files that are found newer by the make-like
>> timestamp-based dependency system?  If it's the former, as some messages
>> seem to show (sorry no time to test right now), ISTM we can fix
>> distutils to do the latter (unless there are bugs due to import
>> rewriting to use explicit relative imports when there are extension
>> modules... blergh).
> 
> It would be better to teach 2to3 to do it by itself. Not everybody runs
> 2to3 through a setup.py script.

For the 2to3 command line tool, the issue is where it shall place the
output. It currently supports writing diffs to stdout (without saving
any conversion result), and overwriting the original file (which means
that it loses the original files).

So before you try to consider incremental output, you need to consider
original-preserving saves first.

Regards,
Martin

From martin at v.loewis.de  Mon Dec 12 00:04:24 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 12 Dec 2011 00:04:24 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <7DEE32A7-1426-4E93-8708-BDF3B0CAF8EC@twistedmatrix.com>
References: <1323320919.2710.24.camel@thinko>
	<5242067.5aBSYdFaIB@einstein>	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>	<3344831.JP9Cfj4Ety@einstein>	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>	<4EE12BAA.1050601@v.loewis.de>	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>	<1323408839.2710.143.camel@thinko>	<CAP7+vJJmA7afxXdvNYDFjOaOMkDw1Fcaqu6a+y3F6HyfKcBfMw@mail.gmail.com>
	<7DEE32A7-1426-4E93-8708-BDF3B0CAF8EC@twistedmatrix.com>
Message-ID: <4EE536F8.2010209@v.loewis.de>

> Even in the plans that involve 2to3
> though, "drop everything prior to 2.6" was always supposed to be step 0,
> so "single codebase" adds much less of a burden than I thought.

Are you talking about general porting, or about Twisted?

It is a common misconception that "drop everything prior to 2.6" was
a recommended step 0 for porting to Python 3. That was never
recommended.

Instead, what *was* recommended is "port to Python 2.6", which for many
projects already supporting, say, 2.5, was a no-op, so people read more
into that than was actually necessary. With the project ported to 2.6,
you could then make use of the 3k warnings to learn what issues you
would face when porting to 3k.

Regards,
Martin

From victor.stinner at haypocalc.com  Mon Dec 12 01:54:44 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 12 Dec 2011 01:54:44 +0100
Subject: [Python-Dev] cpython: Document PyUnicode_Copy() and
	PyUnicode_EncodeCodePage()
In-Reply-To: <20111209203216.2c627d61@pitrou.net>
References: <E1RYnC6-0005HV-Ep@dinsdale.python.org>
	<4EE258A2.8020902@haypocalc.com>
	<20111209203216.2c627d61@pitrou.net>
Message-ID: <1618219.EXT6TC1vln@ned>

On Friday, 9 December 2011 at 20:32:16, Antoine Pitrou wrote:
> ... it's a bit obscure why the function exists.

Yeah, OK. I made the function private: I renamed it to _PyUnicode_Copy()
and removed it from the documentation.

Victor

From guido at python.org  Mon Dec 12 04:14:48 2011
From: guido at python.org (Guido van Rossum)
Date: Sun, 11 Dec 2011 19:14:48 -0800
Subject: [Python-Dev] Omission in re.sub?
In-Reply-To: <4EE53070.9010702@mrabarnett.plus.com>
References: <4EE50EB9.3000606@mrabarnett.plus.com>
	<CAP7+vJLwOnMGbvFTF=ONChyW8oVsSeiqoTvrBgOTwg6Fwr0YOA@mail.gmail.com>
	<4EE516F4.4000208@mrabarnett.plus.com>
	<CAP7+vJ+yH0iw3y_9mhuHSk6DGDtEj3U5khf7BXY62p0OqjJvRA@mail.gmail.com>
	<4EE53070.9010702@mrabarnett.plus.com>
Message-ID: <CAP7+vJJHo+CYxpWOO2znfeFkNUQpT7fcoT9qDe+Aq0J4XUdEUg@mail.gmail.com>

On Sun, Dec 11, 2011 at 2:36 PM, MRAB <python at mrabarnett.plus.com> wrote:
> On 11/12/2011 21:04, Guido van Rossum wrote:
>>
>> On Sun, Dec 11, 2011 at 12:47 PM, MRAB<python at mrabarnett.plus.com> wrote:
>>>
>>> On 11/12/2011 20:27, Guido van Rossum wrote:
>>>>
>>>>
>>>> On Sun, Dec 11, 2011 at 12:12 PM, MRAB<python at mrabarnett.plus.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> I've just come across an omission in re.sub which I hadn't noticed
>>>>> before.
>>>>>
>>>>> In re.sub the replacement string can contain escape sequences, for
>>>>> example:
>>>>>
>>>>>>>> repr(re.sub(r"x", r"\n", "axb"))
>>>>>
>>>>>
>>>>> "'a\\nb'"
>>>>>
>>>>> However:
>>>>>
>>>>>>>> repr(re.sub(r"x", r"\x0A", "axb"))
>>>>>
>>>>>
>>>>> "'a\\\\x0Ab'"
>>>>>
>>>>> Yes, it doesn't recognise "\xNN".
>>>>>
>>>>> Is there a reason for this?
>>>>>
>>>>> The regex module does the same, but is there any objection to me
>>>>> fixing it in the regex module? (I'm thinking about compatibility
>>>>> with re here.)
>>>>
>>>>
>>>>
>>>> As long as there's a way to place a single backslash in the output
>>>> this seems fine to me, though I'm not sure it's important. Of course
>>>> it will likely break some test... the test will then have to be
>>>> fixed.
>>>>
>>>> I can't remember why we did this -- is there a full list of all the
>>>> escapes that re.sub() interprets somewhere? I thought it was pretty
>>>> limited. Maybe it's the related list of escapes that are supported
>>>> in regular expressions?
>>>>
>>> The documentation says: """That is, \n is converted to a single newline
>>> character, \r is converted to a linefeed, and so forth."""
>>>
>>> All of the other escape sequences work as expected, except for \uNNNN
>>> and \UNNNNNNNN which aren't supported at all in re.
>>>
>>> I should probably also add \N{...} to the list for completeness.
>>>
>> I guess the current rule is that any escapes referring to characters
>> by a numeric value are not supported; this probably made some kind of
>> sense because \1 etc. are backreferences. But since we're discouraging
>> octal escapes anyway I think it's fine to improve over this.
>>
> A pattern can contain them, even octal escapes (must be 3 digits).

Fine, then I think we should model this. Though I think that we could
start deprecating octal escapes in patterns so that eventually we can
support over 99 backreferences. So maybe we should just not start
supporting octal in the substitution string now.
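
For reference, a small sketch of the replacement-string behaviours that
are not in dispute here. It deliberately avoids a "\xNN" template, since
how that is treated depends on the Python version; a function as the
replacement sidesteps the question.

```python
import re

# "\n" in the replacement template is converted to a real newline.
newline = re.sub(r"x", r"\n", "axb")

# A callable replacement is inserted as-is, so any character can be
# produced by numeric value without relying on template escapes.
by_value = re.sub(r"x", lambda m: "\x0A", "axb")

# A doubled backslash in the template yields a single backslash.
backslash = re.sub(r"x", r"\\", "axb")

# "\1" is a backreference to group 1, not an octal escape.
doubled = re.sub(r"(x)", r"\1\1", "axb")

print(repr(newline), repr(by_value), repr(backslash), repr(doubled))
```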

-- 
--Guido van Rossum (python.org/~guido)

From ethan at stoneleaf.us  Mon Dec 12 08:32:37 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 11 Dec 2011 23:32:37 -0800
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <4EE528BD.2040102@v.loewis.de>
References: <jbsfar$en7$1@dough.gmane.org>
	<4EE1C9AB.2040301@v.loewis.de>	<E8A216E9-6AC9-4307-8250-B9B185604B82@masklinn.net>
	<4EE528BD.2040102@v.loewis.de>
Message-ID: <4EE5AE15.7060208@stoneleaf.us>

Martin,

You seem heavily invested in minidom.

In the near future I will need to parse and rewrite parts of an xml file 
created by a third-party program (PrintShopMail, for the curious).
It contains both binary and textual data.

Would you recommend minidom for this purpose?  What other purposes would 
you recommend minidom for?

xml-confused-ly yours,

~Ethan~

(Comments by others are, of course, also welcome. :)

From chrism at plope.com  Mon Dec 12 09:40:42 2011
From: chrism at plope.com (Chris McDonough)
Date: Mon, 12 Dec 2011 03:40:42 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
Message-ID: <1323679242.2710.350.camel@thinko>

On Sat, 2011-12-10 at 15:55 +1000, Nick Coghlan wrote:

> So I'm back to being -1 on the idea of adding back u'' literals for
> 3.3. Instead, people should explicitly call str() on any literals that
> they want to be actual str instances both in 3.x and in 2.x when the
> unicode literals future import is in effect.

After thinking on it a while, I can't see anything wrong with this
strategy except for the 10X performance hit for defining native
literals.
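
A minimal sketch of that strategy, with a way to measure the cost on
your own interpreter. The names HEADER_NAME, make_headers and
literal_cost are made up for the example, and I am not asserting any
particular slowdown factor here.

```python
# On Python 2 this module would start with:
#     from __future__ import unicode_literals
# On Python 3 that import changes nothing, so it is left out here.
import timeit

# str() turns the literal into the native string type on 2.x and 3.x.
HEADER_NAME = str("Content-Type")
HEADER_VALUE = str("text/plain")

def make_headers():
    """Build a WSGI-style header list out of native strings."""
    return [(HEADER_NAME, HEADER_VALUE)]

def literal_cost(number=100000):
    """Time a bare literal against the same literal wrapped in str()."""
    plain = timeit.timeit('"Content-Type"', number=number)
    wrapped = timeit.timeit('str("Content-Type")', number=number)
    return plain, wrapped
```

Hoisting the str() calls to module level, as above, means the extra
call is paid once at import time rather than on every request.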

Truth be told, in the vast majority of WSGI apps only high-level WSGI
libraries (like WebOb and Werkzeug) and standalone middleware really
needs to work with native strings.  And the middleware really should be
using the high-level libraries to parse WSGI anyway.  So there are a
finite number of places where it's actually a real issue.

As someone who ported WebOb and other stuff built on top of it to Python
3 without using "from __future__ import unicode_literals", I'm kinda sad
that to be using best practice I'll have to go back and flip the
polarity on everything.  It's my cross to bear, though.  If I have any
issue with it in the future I'll bring u'' back up.

- C




From stefan_ml at behnel.de  Mon Dec 12 10:04:22 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 12 Dec 2011 10:04:22 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <4EE53139.8020500@v.loewis.de>
References: <jbsfar$en7$1@dough.gmane.org>
	<4EE1C9AB.2040301@v.loewis.de>	<jbsile$4vu$1@dough.gmane.org>
	<4EE53139.8020500@v.loewis.de>
Message-ID: <jc4g2m$5hn$1@dough.gmane.org>

"Martin v. Löwis", 11.12.2011 23:39:
>> I can't recall anyone working on any substantial improvements during the
>> last six years or so, and the reason for that seems obvious to me.
>
> What do you think is the reason? It's not at all obvious to me.

Just to repeat myself for the third time here: lack of interest.

Stefan


From lars at gustaebel.de  Mon Dec 12 10:28:16 2011
From: lars at gustaebel.de (lars at gustaebel.de)
Date: Mon, 12 Dec 2011 10:28:16 +0100
Subject: [Python-Dev] cpython: Issue #5689: Add support for lzma
 compression to the tarfile module.
In-Reply-To: <20111211234506.071db305@pitrou.net>
References: <E1RZSmP-0003XL-SZ@dinsdale.python.org>
	<20111211234506.071db305@pitrou.net>
Message-ID: <20111212092815.GA19922@axis.g33x.de>

On Sun, Dec 11, 2011 at 11:45:06PM +0100, Antoine Pitrou wrote:
> On Sat, 10 Dec 2011 20:40:17 +0100
> lars.gustaebel <python-checkins at python.org> wrote:
> >  
> >  The :mod:`tarfile` module makes it possible to read and write tar
> > -archives, including those using gzip or bz2 compression.
> > +archives, including those using gzip, bz2 and lzma compression.
> >  (:file:`.zip` files can be read and written using the :mod:`zipfile` module.)
> 
> Perhaps there should be a "versionchanged" directive for lzma support?

This is now fixed.
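
For anyone who wants to try the new support, a short sketch of an
in-memory round trip. It assumes Python 3.3+ built with the lzma
module; the function name roundtrip is only for illustration.

```python
import io
import tarfile

def roundtrip(name, payload):
    """Write `payload` into an xz-compressed tar archive and read it back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:xz") as tar:
        return tar.extractfile(tar.getmember(name)).read()

print(roundtrip("hello.txt", b"hello"))
```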

-- 
Lars Gustäbel
lars at gustaebel.de

There's no present. There's only the immediate future and
the recent past.
(George Carlin)

From stefan_ml at behnel.de  Mon Dec 12 10:59:23 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 12 Dec 2011 10:59:23 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <4EE528BD.2040102@v.loewis.de>
References: <jbsfar$en7$1@dough.gmane.org>
	<4EE1C9AB.2040301@v.loewis.de>	<E8A216E9-6AC9-4307-8250-B9B185604B82@masklinn.net>
	<4EE528BD.2040102@v.loewis.de>
Message-ID: <jc4j9r$rbt$1@dough.gmane.org>

"Martin v. Löwis", 11.12.2011 23:03:
> On 09.12.2011 10:09, Xavier Morel wrote:
>> On 2011-12-09, at 09:41 , Martin v. Löwis wrote:
>>>> a) The stdlib documentation should help users to choose the right
>>>> tool right from the start. Instead of using the totally
>>>> misleading wording that it uses now, it should be honest about
>>>> the performance characteristics of MiniDOM and should actively
>>>> suggest that those who don't know what to choose (or even *that*
>>>> they can choose) should not use MiniDOM in the first place.
>>>
> [...]
>>
>> Minidom is inferior in interface flow and pythonicity, in terseness,
>> in speed, in memory consumption (even more so using cElementTree, and
>> that's not something which can be fixed unless minidom gets a C
>> accelerator), etc. Even after fixing minidom (if anybody has the time
>> and drive to commit to it), ET/cET should be preferred over it.
>
> I don't mind pointing people to ElementTree, despite that I disagree
> whether the ET interface is "superior" to DOM.

Yes, that's clearly a point where we agree to disagree, and I understand 
that you are as biased towards minidom as I am biased towards ElementTree.

However, I think I made it clear that the implementation of cElementTree
(and lxml.etree as well, for that matter) is far superior to MiniDOM
in terms of performance, for any sensible meaning of the word performance.

And I'm also convinced that the API is far superior in terms of
usability. ET certainly matches Python as a language much better than
MiniDOM. But that's just my personal opinion.


> It's Stefan's reasoning
> as to *why* people should be pointed to ET, and what words should be
> used to do that. IOW, I detest bashing some part of the standard
> library, just to urge users to use some other part of the standard library.

I'm all for finding a good way of putting it into words, as long as it
keeps uninformed users from making the wrong decision and getting the wrong
idea of how complicated and slow Python is.


> People are still using PyXML, despite its not being maintained anymore.

My experience with that is that it's only *new* users that are still 
running into PyXML by accident, because they didn't see that it's a dead 
project and they find it through ancient web pages that tell them that they 
need it because "it's the way to do XML in Python" and "if minidom is not 
enough, use PyXML". Maybe we should "misuse" the stdlib documentation to 
clear that up as well. "PyXML" is just too attractive a name for a dead 
project.

Just look through the xml-sig page, basically all requests regarding PyXML 
during the last five years deal with problems in installing it, i.e. 
*before* even starting to use it. So you can't use this to claim that 
people really *are* still using it.


> Telling them to replace 4DOM with minidom is much more appropriate

Do you actually have any evidence that anyone is still actively using 4DOM?


> than telling them to rewrite in ET.

I usually encourage people to rewrite minidom code for ET. It makes the 
code simpler, more readable, more maintainable and much faster.
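
As a rough illustration of what such a rewrite looks like, here is the
same small task done with both APIs. The sample document and the
function names are invented for the example; it says nothing about
speed, only about how the code reads.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

XML = "<root><item name='a'>1</item><item name='b'>2</item></root>"

def names_et(text):
    """Collect (name, text) pairs using ElementTree."""
    root = ET.fromstring(text)
    return [(item.get("name"), item.text) for item in root.findall("item")]

def names_minidom(text):
    """Collect the same pairs using minidom."""
    doc = minidom.parseString(text)
    result = []
    for node in doc.getElementsByTagName("item"):
        # Element text lives in child text nodes and has to be joined.
        value = "".join(child.data for child in node.childNodes
                        if child.nodeType == child.TEXT_NODE)
        result.append((node.getAttribute("name"), value))
    return result

print(names_et(XML))
print(names_minidom(XML))
```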

Stefan


From stefan_ml at behnel.de  Mon Dec 12 11:08:44 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 12 Dec 2011 11:08:44 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jc4j9r$rbt$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>	<4EE1C9AB.2040301@v.loewis.de>	<E8A216E9-6AC9-4307-8250-B9B185604B82@masklinn.net>	<4EE528BD.2040102@v.loewis.de>
	<jc4j9r$rbt$1@dough.gmane.org>
Message-ID: <jc4jrc$v8b$1@dough.gmane.org>

Stefan Behnel, 12.12.2011 10:59:
> Just look through the xml-sig page

Hmm, I meant "xml-sig mailing list archive" here ...

Stefan


From pje at telecommunity.com  Mon Dec 12 15:50:46 2011
From: pje at telecommunity.com (PJ Eby)
Date: Mon, 12 Dec 2011 09:50:46 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <1323679242.2710.350.camel@thinko>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
Message-ID: <CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>

On Mon, Dec 12, 2011 at 3:40 AM, Chris McDonough <chrism at plope.com> wrote:

> Truth be told, in the vast majority of WSGI apps only high-level WSGI
> libraries (like WebOb and Werkzeug) and standalone middleware really
> needs to work with native strings.  And the middleware really should be
> using the high-level libraries to parse WSGI anyway.  So there are a
> finite number of places where it's actually a real issue.
>

And those only if they're using "six" or a similar joint-codebase strategy,
*and* using unicode_literals in a 2.x module that also does WSGI.  If
they're using 2to3 and stick with explicit u'', they'll be fine.

Unfortunately, AFAIR, nobody in the PEP 3333 discussions brought up either
the unicode_literals import OR the strategy of using a common codebase, so
2to3 on plain code and writing new Python3 code were the only porting
scenarios discussed.  (Not that I'm sure it would've made a difference, as
I'm not sure what we could have done differently that would still support
simple Python3 code and easy 2to3 porting.)

> As someone who ported WebOb and other stuff built on top of it to Python
> 3 without using "from __future__ import unicode_literals", I'm kinda sad
> that to be using best practice I'll have to go back and flip the
> polarity on everything.


Eh?  If you don't need unicode_literals, what's the problem?

From chrism at plope.com  Mon Dec 12 22:18:40 2011
From: chrism at plope.com (Chris McDonough)
Date: Mon, 12 Dec 2011 16:18:40 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
Message-ID: <1323724720.2710.388.camel@thinko>

On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote:


> > As someone who ported WebOb and other stuff built on top of it to
> > Python 3 without using "from __future__ import unicode_literals", I'm
> > kinda sad that to be using best practice I'll have to go back and flip
> > the polarity on everything.
> 
> 
> Eh?  If you don't need unicode_literals, what's the problem?

Porting the WebOb code sucked.  It's only about 5K lines of code but the
porting effort took me about 80 hours.  Some of the problem is certainly
my own idiocy, but some of it is just because straddling code across
Python 2 and Python 3 currently requires that you change lots and lots
of code for suspect benefit.

- C




From ericsnowcurrently at gmail.com  Mon Dec 12 22:44:56 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 12 Dec 2011 14:44:56 -0700
Subject: [Python-Dev] (no subject)
Message-ID: <CALFfu7CXv3E1TnoEYsx4+srQ5CUoA1=Rpkma-RJE5OKacE5=7Q@mail.gmail.com>

Guido posted this on Google+:

> IEEE/ISO are working on a draft document about Python vulnerabilities: http://grouper.ieee.org/groups/plv/DocLog/300-399/360-thru-379/22-WG23-N-0372/n0372.pdf (in the context of a larger effort to classify vulnerabilities in all languages: ISO/IEC TR 24772:2010, available from ISO at no cost at: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html (its link is near the bottom of the web page).

Will this document have a broad use, such that we should make sure it
is accurate (to avoid any future confusion)?  I skimmed through and
found that it covers a lot of ground, not necessarily about
vulnerabilities, with some inaccuracies but not a ton that I noticed.
If it doesn't matter then no big deal.  Just thought I'd bring it up.

-eric

From ericsnowcurrently at gmail.com  Mon Dec 12 22:46:32 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 12 Dec 2011 14:46:32 -0700
Subject: [Python-Dev] IEEE/ISO draft on Python vulnerabilities
Message-ID: <CALFfu7Av3jiaSpSdVdVV5K_28ZrhArQ2dzRF39c2egZxKpmi-w@mail.gmail.com>

re-sending with subject :)

On Mon, Dec 12, 2011 at 2:44 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> Guido posted this on Google+:
>
>> IEEE/ISO are working on a draft document about Python vulnerabilities: http://grouper.ieee.org/groups/plv/DocLog/300-399/360-thru-379/22-WG23-N-0372/n0372.pdf (in the context of a larger effort to classify vulnerabilities in all languages: ISO/IEC TR 24772:2010, available from ISO at no cost at: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html (its link is near the bottom of the web page).
>
> Will this document have a broad use, such that we should make sure it
> is accurate (to avoid any future confusion)?  I skimmed through and
> found that it covers a lot of ground, not necessarily about
> vulnerabilities, with some inaccuracies but not a ton that I noticed.
> If it doesn't matter then no big deal.  Just thought I'd bring it up.
>
> -eric

From guido at python.org  Mon Dec 12 22:52:49 2011
From: guido at python.org (Guido van Rossum)
Date: Mon, 12 Dec 2011 13:52:49 -0800
Subject: [Python-Dev] (no subject)
In-Reply-To: <CALFfu7CXv3E1TnoEYsx4+srQ5CUoA1=Rpkma-RJE5OKacE5=7Q@mail.gmail.com>
References: <CALFfu7CXv3E1TnoEYsx4+srQ5CUoA1=Rpkma-RJE5OKacE5=7Q@mail.gmail.com>
Message-ID: <CAP7+vJ+vrmh_iV+pHQDqvdCqYNKeszEP39hboHVOskg0ckwE0w@mail.gmail.com>

The authors are definitely interested in feedback! Best probably to
post it to my G+ thread.

On Mon, Dec 12, 2011 at 1:44 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> Guido posted this on Google+:
>
>> IEEE/ISO are working on a draft document about Python vulnerabilities: http://grouper.ieee.org/groups/plv/DocLog/300-399/360-thru-379/22-WG23-N-0372/n0372.pdf (in the context of a larger effort to classify vulnerabilities in all languages: ISO/IEC TR 24772:2010, available from ISO at no cost at: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html (its link is near the bottom of the web page).
>
> Will this document have a broad use, such that we should make sure it
> is accurate (to avoid any future confusion)?  I skimmed through and
> found that it covers a lot of ground, not necessarily about
> vulnerabilities, with some inaccuracies but not a ton that I noticed.
> If it doesn't matter then no big deal.  Just thought I'd bring it up.
>
> -eric



-- 
--Guido van Rossum (python.org/~guido)

From victor.stinner at haypocalc.com  Mon Dec 12 23:56:50 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 12 Dec 2011 23:56:50 +0100
Subject: [Python-Dev] IEEE/ISO draft on Python vulnerabilities
In-Reply-To: <CALFfu7Av3jiaSpSdVdVV5K_28ZrhArQ2dzRF39c2egZxKpmi-w@mail.gmail.com>
References: <CALFfu7Av3jiaSpSdVdVV5K_28ZrhArQ2dzRF39c2egZxKpmi-w@mail.gmail.com>
Message-ID: <4EE686B2.6040806@haypocalc.com>

>>> IEEE/ISO are working on a draft document about Python vulnerabilities: http://grouper.ieee.org/groups/plv/DocLog/300-399/360-thru-379/22-WG23-N-0372/n0372.pdf (in the context of a larger effort to classify vulnerabilities in all languages: ISO/IEC TR 24772:2010, available from ISO at no cost at: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html (its link is near the bottom of the web page).

Random comments. I didn't read everything.

--

"Vulnerability descriptions for the language Python Standards
and terminology based on the 3.x standard only."
(...)
"Automatic conversion also occurs when an integer becomes too large to 
fit within the constraints of the large integer specified in the 
language (typically C) used to create the Python interpreter. On a 
32-bit machine this would be the range -2^30 to 2^30-1. When an integer 
becomes too large to fit into that range it is converted to an extended 
precision integer of arbitrary length."
(...)
"otherwise, if either argument is a floating point number, the other is 
converted to floating otherwise, if either argument is a long integer, 
the other is converted to long integer;"

10 and 2**1024 have the same type (int) in Python 3. I don't really 
understand what "extended precision" means. There are no more "long" 
integers.
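
A quick check of that claim on any Python 3 interpreter (the variable
names are just for the example):

```python
small = 10
big = 2 ** 1024

# Both values are instances of the single built-in int type.
same_type = type(small) is type(big)

print(type(small).__name__, type(big).__name__, same_type)
print(big.bit_length())
```

There is no visible conversion step: arithmetic on int simply produces
another int, however large the result.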

--

"Python.16 Wrap-around Error [XYY]"
(...)
"... exception handling for floating point operations cannot be assumed 
to catch this type of error because they are not standardized in the 
underlying C language."

Can you give me an example of such problem? If there is really an issue, 
can we configure the FPU to catch such error?

pyfpe.h has PyFPE_START_PROTECT and PyFPE_END_PROTECT macros, but they 
do nothing by default. You can enable this protection using 
./configure --with-fpectl.
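
One possible illustration, which is only my guess at what the draft has
in mind: float overflow is reported inconsistently in a default
CPython build. Multiplication quietly produces infinity, while
math.exp() raises an exception. The function name below is invented
for the example.

```python
import math

def overflow_behaviour():
    """Show how two overflowing float operations are reported."""
    results = {}
    # Plain multiplication overflows silently to infinity.
    results["multiply"] = 1e308 * 10
    # math.exp() reports the overflow as an exception instead.
    try:
        results["exp"] = math.exp(1000)
    except OverflowError:
        results["exp"] = "OverflowError"
    return results

print(overflow_behaviour())
```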

--

"if(y > 0):print(x)"

Even if this example is valid, it is surprising to see parentheses 
around the condition in Python.

"if y > 0: print(x)"
or even
"if y > 0:
     print(x)"

would be better.

--

"Python also encourages structured programming by not introducing any of 
the following constructs which could easily lead to unstructured code:

- Labels and branching statements such as GO TO;
- Case, GO TO DEPENDING, EVALUATE, switch and other statements that 
branch dependent on a variable's value; and
- ALTER which changes GO TO label to branch to a different label."

You have to modify the language (and so build your own interpreter) to 
add a "goto" instruction to Python. Or do you mean that someone may want 
to implement something like goto using exceptions for example?

--

"When sorting a list using the sort() method, attempting to inspect or 
mutate the content of the list will result in undefined behaviour."

Oh... I never imagined such "use case". Let's try:

$ ./python
Python 3.3.0a0 (default:3ad7d01acbf4+, Dec 12 2011, 21:07:55)
 >>> def hack(x):
...  mylist.append(10)
...  return
...
 >>> mylist=[1]
 >>> mylist.sort(key=hack)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
ValueError: list modified during sort

Same behaviour with Python 2.7 and 3.2: so the Python behaviour is 
defined, you get a ValueError.

Are there other ways to inspect or mutate a list while sorting it?

--

"The sequence of keys in a dictionary is undefined because the hashing 
function used to index the keys is unspecified therefore different 
implementations are likely to yield different sequences."

Exactly. You might mention that collections.OrderedDict has a defined 
behaviour: it lists keys (and values) in the insertion order.
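
A short sketch of that defined behaviour (the keys are arbitrary example 
values):

```python
from collections import OrderedDict

# Keys come back in the order they were inserted, not in hash order.
d = OrderedDict()
d["zebra"] = 1
d["apple"] = 2
d["mango"] = 3

keys = list(d)             # ['zebra', 'apple', 'mango']
values = list(d.values())  # [1, 2, 3]
```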

--

"Mixing tabs and spaces to indent is defined differently for UNIX and 
non-UNIX platforms;"

You can use the -tt command line option to raise an IndentationError (a 
block can still be indented using spaces and tabs).
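
As a sketch for Python 3, where this check needs no command line option: 
compiling a block whose lines are indented inconsistently with a tab and 
with spaces raises TabError, a subclass of IndentationError. The source 
string below is a made-up example:

```python
source = (
    "if True:\n"
    "\tx = 1\n"          # indented with one tab
    "        y = 2\n"    # indented with eight spaces
)

try:
    compile(source, "<example>", "exec")
except TabError:
    outcome = "TabError"
else:
    outcome = "compiled"
```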

Victor

From ncoghlan at gmail.com  Tue Dec 13 01:14:08 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 13 Dec 2011 10:14:08 +1000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
Message-ID: <CADiSq7fCE4UtS0LjXH91jkuYjtNCaZBoO1NoQHiKCg0JsdfVHA@mail.gmail.com>

On Tue, Dec 13, 2011 at 12:50 AM, PJ Eby <pje at telecommunity.com> wrote:
> Unfortunately, AFAIR, nobody in the PEP 3333 discussions brought up either
> the unicode_literals import OR the strategy of using a common codebase, so
> 2to3 on plain code and writing new Python3 code were the only porting
> scenarios discussed. (Not that I'm sure it would've made a difference, as
> I'm not sure what we could have done differently that would still support
> simple Python3 code and easy 2to3 porting.)

That's not web-sig's fault though - it's only as people have been
trying it and *succeeding* that we've come to realise that single code
base approaches are significantly more feasible than we originally
anticipated. Now, depending on whether you need to support 2.5 and
earlier, we even have a reasonable answer to the native strings
problem:

If supporting only 2.6+, use "from __future__ import unicode_literals"
and the 'str' builtin:

    Import at top of module: "from __future__ import unicode_literals"
    Text: ""
    Native: str("")
    Binary: b""

If also supporting 2.5 and earlier, use "six" (or an equivalent
compatibility module):

    Import at top of module: "from six import u, b"
    Text: u("")
    Native: ""
    Binary: b("")
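
To make the second variant concrete without requiring "six" to be
installed, here is a rough sketch of what such a compatibility module
provides. The helpers u and b below are simplified, hypothetical
stand-ins, not six's actual implementation:

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str

    def u(s):
        # Literals are already text on 3.x.
        return s

    def b(s):
        # Turn a native str literal into bytes.
        return s.encode("latin-1")
else:
    text_type = unicode  # noqa: F821 (this name only exists on 2.x)

    def u(s):
        # Decode a native (byte) str literal into unicode.
        return s.decode("unicode_escape")

    def b(s):
        # Literals are already bytes on 2.x.
        return s

text = u("spam")      # unicode on 2.x, str on 3.x
native = "spam"       # str on both
binary = b("spam")    # str on 2.x, bytes on 3.x
```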

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From wolfson at gmail.com  Tue Dec 13 04:56:16 2011
From: wolfson at gmail.com (Ben Wolfson)
Date: Mon, 12 Dec 2011 19:56:16 -0800
Subject: [Python-Dev] str.format implementation
Message-ID: <CAPc-aXkg37nqw1_mBFOU7Om8dS2vYvGsJbEzF2CQSXaPW1jMog@mail.gmail.com>

Hi,

I'm hoping to get some kind of consensus about the divergences between
the implementation and documentation of str.format
(http://mail.python.org/pipermail/python-dev/2011-June/111860.html and
the linked bug report contain examples of the divergences). These
pertain to the arg_name, attribute_name, and element_index fields of
the grammar in the docs:

    replacement_field ::=  "{" [field_name] ["!" conversion] [":"
format_spec] "}"
    field_name        ::=  arg_name ("." attribute_name | "["
element_index "]")*
    arg_name          ::=  [identifier | integer]
    attribute_name    ::=  identifier
    element_index     ::=  integer | index_string
    index_string      ::=  <any source character except "]"> +

Nothing definitive emerged from the last round of discussion, and as
far as I can recall there are now three proposals for what kind of
changes might be worth making:

 (1) the implementation should conform to the docs;*
 (2) like (1) with the change that element_index should be changed to
"integer | identifier" (rendering index_string otiose);
 (3) like (1) with the change that index_string should be changed to
'<any source character except "]", "}", or "{">'.

* the docs link "integer" to
http://docs.python.org/reference/lexical_analysis.html#grammar-token-integer
but the current implementation only allows decimal integers, which
seems reasonable and worth retaining.

(2) was suggested by Greg Ewing on python-dev and (3) by Petri
Lehtinen in the bug report. (Petri actually suggested that braces be
disallowed except for the nesting in the format_spec, but it comes to
the same thing.)

None of these should be difficult to implement; patches exist for (1)
and (2). (2) and (3) would lead to format strings that are easier
for the programmer to visually parse; (1) would make the indexing part
of the replacement field conform more closely to the way indexing with
strings behaves in Python generally, where arbitrary strings can be
used. (It wouldn't conform exactly, obviously, since ']' would still
be excluded.)
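
For readers less familiar with the grammar, here is a sketch of the
uncontested cases, where docs and implementation agree (the Point class
and the sample values are only illustrative):

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


# attribute_name: plain attribute access on the argument
by_attribute = "{0.x},{0.y}".format(Point(1, 2))

# element_index as an integer: used as an int index
by_position = "{0[1]}".format(["a", "b"])

# element_index as an index_string: used as a string key, written unquoted
by_string = "{0[some key]}".format({"some key": "v"})

# arg_name as an identifier: looked up among the keyword arguments
by_name = "{p.x}".format(p=Point(3, 4))
```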

I personally would prefer (1) to (2) or (3), and (3) to (2), had I my
druthers, but it doesn't matter a *whole* lot to me; I'd prefer any of
them to nothing (or to changing the docs to reflect the current batty
behavior).

-- 
Ben Wolfson
"Human kind has used its intelligence to vary the flavour of drinks,
which may be sweet, aromatic, fermented or spirit-based. ... Family
and social life also offer numerous other occasions to consume drinks
for pleasure." [Larousse, "Drink" entry]

From jimjjewett at gmail.com  Tue Dec 13 08:09:02 2011
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 13 Dec 2011 02:09:02 -0500
Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions
Message-ID: <CA+OGgf6CrT5Pi+f698qK9PnYfB0WqhHncwXZMsDuhLznYPYgNw@mail.gmail.com>

(see http://www.python.org/dev/peps/pep-0393/ and
http://hg.python.org/cpython/file/6f097ff9ac04/Include/unicodeobject.h
)


	typedef struct {
	  PyObject_HEAD
	  Py_ssize_t length;
	  Py_hash_t hash;
	  struct {
		  unsigned int interned:2;
		  unsigned int kind:2;   /* now 3 in implementation */
		  unsigned int compact:1;
		  unsigned int ascii:1;
		  unsigned int ready:1;
	  } state;
	  wchar_t *wstr;
	} PyASCIIObject;

	typedef struct {
	  PyASCIIObject _base;
	  Py_ssize_t utf8_length;
	  char *utf8;
	  Py_ssize_t wstr_length;
	} PyCompactUnicodeObject;

	typedef struct {
	  PyCompactUnicodeObject _base;
	  union {
		  void *any;
		  Py_UCS1 *latin1;
		  Py_UCS2 *ucs2;
		  Py_UCS4 *ucs4;
	  } data;
	} PyUnicodeObject;

(1)  Why is PyObject_HEAD used instead of PyObject_VAR_HEAD?  Is it
because of the names (.length vs .size), or a holdover from when
unicode (as opposed to str) did not expect to be compact, or is there
a deeper reason?

(2)  Why does PyASCIIObject have a wstr member, and why does
PyCompactUnicodeObject have wstr_length?  As best I can tell from the
PEP or header file, wstr is only meaningful when either:

    (2a)  wstr is shared with (and redundant to) the canonical representation
         -- which will therefore not be ASCII.  So wstr (and
wstr_length) shouldn't need to be
        represented explicitly, and certainly not in the PyASCIIObject base.

or

    (2b)  The string is a "Legacy String" (and PyUnicode_READY has not
been called).  Because
        it is a Legacy String, the object header must already be a
full PyUnicodeObject, and the wstr
        fields could at least be stored there.

        I'm also not sure why wstr can't be stored in the existing
.data member -- once PyUnicode_READY
        is called, it will either be there (shared) or be discarded.

        Are there other times when the wstr will be explicitly
re-filled and cached?

(3)  I would feel much less nervous if the remaining 4 values of
PyUnicode_Kind were explicitly reserved, and the macros raised an
error when they showed up.  (Better still would be to allow other
values, and to have the macros delegate to some attribute on the (sub)
type object.)

Discussion on py-ideas strongly suggested that people should not be
rolling their own string representations, and that it won't
really save as much as people think it will, etc ... but I'm not sure
that saying "do it without inheritance" is the best solution -- and
that is what treating kind as an exhaustive list does.

-jJ

From martin at v.loewis.de  Tue Dec 13 08:55:02 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 13 Dec 2011 08:55:02 +0100
Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions
In-Reply-To: <CA+OGgf6CrT5Pi+f698qK9PnYfB0WqhHncwXZMsDuhLznYPYgNw@mail.gmail.com>
References: <CA+OGgf6CrT5Pi+f698qK9PnYfB0WqhHncwXZMsDuhLznYPYgNw@mail.gmail.com>
Message-ID: <4EE704D6.5000901@v.loewis.de>


> (1)  Why is PyObject_HEAD used instead of PyObject_VAR_HEAD?  Is it
> because of the names (.length vs .size), or a holdover from when
> unicode (as opposed to str) did not expect to be compact, or is there
> a deeper reason?

The unicode object is not a var object. In a var object, tp_itemsize
gives the element size, which is not possible for unicode objects,
since the itemsize may vary by instance. In addition, not all instances
have the items after the base object (plus the size of the base object
in tp_basicsize is also not always correct).
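
The per-instance item size can be observed from Python, as a rough
sketch assuming CPython 3.3 or later (the helper name bytes_per_char is
hypothetical, and the sample characters are arbitrary picks from each
range):

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character from how the object size grows."""
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n


samples = [
    "a",           # ASCII
    "\xe9",        # Latin-1
    "\u20ac",      # other BMP character
    "\U0001F600",  # outside the BMP
]
widths = [bytes_per_char(ch) for ch in samples]
# On CPython 3.3+ this gives [1, 1, 2, 4].
```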

> (2)  Why does PyASCIIObject have a wstr member, and why does
> PyCompactUnicodeObject have wstr_length?  As best I can tell from the
> PEP or header file, wstr is only meaningful when either:

No. wstr is most of all relevant if someone calls
PyUnicode_AsUnicode(AndSize); any unicode object might get the wstr
pointer filled out at some point. It can be shared only if
sizeof(Py_UNICODE) matches the canonical width of the string.

wstr_length is only relevant if wstr is not NULL. For a pure ASCII
string (and also for Latin-1 and other BMP strings), the wstr length
will always equal the canonical length (number of code points). Only
for ASCII objects the optimization was made to drop the wstr_length
from the representation.

>         I'm also not sure why wstr can't be stored in the existing
> .data member -- once PyUnicode_READY
>         is called, it will either be there (shared) or be discarded.

Most objects won't have the .data member. For those that do, .data
holds the canonical representation (and *only* after PyUnicode_READY
has been called).

> (3)  I would feel much less nervous if the remaining 4 values of
> PyUnicode_Kind were explicitly reserved, and the macros raised an
> error when they showed up.  (Better still would be to allow other
> values, and to have the macros delegate to some attribute on the (sub)
> type object.)
> 
> Discussion on py-ideas strongly suggested that people should not be
> rolling their own string representations, and that it won't
> really save as much as people think it will, etc ... but I'm not sure
> that saying "do it without inheritance" is the best solution -- and
> that is what treating kind as an exhaustive list does.

If people use C, they can construct all kinds of "illegal"
representations, for any object (e.g. lists where the stored length
differs from the actual length, dictionaries where key and value are
switched, and so on). If they do that, they likely get crashes and
other failures, so they quickly stop doing it. In the specific case
of kind values: many places will either work incorrectly, or have
an assertion in debug mode already if an unexpected kind is
encountered. I don't mind adding such checks to more places, but I
also don't see a need to explicitly care about this specific class
of bugs where people would have to deliberately try to "cheat".

Regards,
Martin

From raymond.hettinger at gmail.com  Tue Dec 13 09:37:20 2011
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Tue, 13 Dec 2011 00:37:20 -0800
Subject: [Python-Dev] str.format implementation
In-Reply-To: <CAPc-aXkg37nqw1_mBFOU7Om8dS2vYvGsJbEzF2CQSXaPW1jMog@mail.gmail.com>
References: <CAPc-aXkg37nqw1_mBFOU7Om8dS2vYvGsJbEzF2CQSXaPW1jMog@mail.gmail.com>
Message-ID: <A1950A33-499B-4C9A-9708-0C21CA3DDF4E@gmail.com>


On Dec 12, 2011, at 7:56 PM, Ben Wolfson wrote:

> I personally would prefer (1) to (2) or (3), and (3) to (2), had I my
> druthers, but it doesn't matter a *whole* lot to me; I'd prefer any of
> them to nothing (or to changing the docs to reflect the current batty
> behavior).

+1 on changing the batty behavior.


Raymond


From ncoghlan at gmail.com  Tue Dec 13 11:11:07 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 13 Dec 2011 20:11:07 +1000
Subject: [Python-Dev] str.format implementation
In-Reply-To: <A1950A33-499B-4C9A-9708-0C21CA3DDF4E@gmail.com>
References: <CAPc-aXkg37nqw1_mBFOU7Om8dS2vYvGsJbEzF2CQSXaPW1jMog@mail.gmail.com>
	<A1950A33-499B-4C9A-9708-0C21CA3DDF4E@gmail.com>
Message-ID: <CADiSq7fJ0qkZ4i40AmYTfJ_03FrEckKu+HVPhn-FiNUDGSjnjg@mail.gmail.com>

On Tue, Dec 13, 2011 at 6:37 PM, Raymond Hettinger
<raymond.hettinger at gmail.com> wrote:
>
> On Dec 12, 2011, at 7:56 PM, Ben Wolfson wrote:
>
> I personally would prefer (1) to (2) or (3), and (3) to (2), had I my
> druthers, but it doesn't matter a *whole* lot to me; I'd prefer any of
> them to nothing (or to changing the docs to reflect the current batty
> behavior).
>
>
> +1 on changing the batty behavior.

Skimming my comments from last time this came up, +1 on just going
with what the docs say. The PEP underspecified it, so taking the docs
as the spec for this aspect seems like a reasonable course of action.

Cheers,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From petri at digip.org  Tue Dec 13 10:44:53 2011
From: petri at digip.org (Petri Lehtinen)
Date: Tue, 13 Dec 2011 11:44:53 +0200
Subject: [Python-Dev] str.format implementation
In-Reply-To: <CAPc-aXkg37nqw1_mBFOU7Om8dS2vYvGsJbEzF2CQSXaPW1jMog@mail.gmail.com>
References: <CAPc-aXkg37nqw1_mBFOU7Om8dS2vYvGsJbEzF2CQSXaPW1jMog@mail.gmail.com>
Message-ID: <20111213094453.GD27440@p16>

Ben Wolfson wrote:
> Hi,
> 
> I'm hoping to get some kind of consensus about the divergences between
> the implementation and documentation of str.format
> (http://mail.python.org/pipermail/python-dev/2011-June/111860.html and
> the linked bug report contain examples of the divergences). These
> pertain to the arg_name, attribute_name, and element_index fields of
> the grammar in the docs:
> 
>     replacement_field ::=  "{" [field_name] ["!" conversion] [":"
> format_spec] "}"
>     field_name        ::=  arg_name ("." attribute_name | "["
> element_index "]")*
>     arg_name          ::=  [identifier | integer]
>     attribute_name    ::=  identifier
>     element_index     ::=  integer | index_string
>     index_string      ::=  <any source character except "]"> +
> 
> Nothing definitive emerged from the last round of discussion, and as
> far as I can recall there are now three proposals for what kind of
> changes might be worth making:
> 
>  (1) the implementation should conform to the docs;*
>  (2) like (1) with the change that element_index should be changed to
> "integer | identifier" (rendering index_string otiose);
>  (3) like (1) with the change that index_string should be changed to
> '<any source character except "]", "}", or "{">'.
> 
> * the docs link "integer" to
> http://docs.python.org/reference/lexical_analysis.html#grammar-token-integer
> but the current implementation only allows decimal integers, which
> seems reasonable and worth retaining.
> 
> (2) was suggested by Greg Ewing on python-dev and (3) by Petri
> Lehtinen in the bug report. (Petri actually suggested that braces be
> disallowed except for the nesting in the format_spec, but it comes to
> the same thing.)
> 
> None of these should be difficult to implement; patches exist for (1)
> and (2). (2) and (3) would lead to format strings that are easier
> for the programmer to visually parse; (1) would make the indexing part
> of the replacement field conform more closely to the way indexing with
> strings behaves in Python generally, where arbitrary strings can be
> used. (It wouldn't conform exactly, obviously, since ']' would still
> be excluded.)
> 
> I personally would prefer (1) to (2) or (3), and (3) to (2), had I my
> druthers, but it doesn't matter a *whole* lot to me; I'd prefer any of
> them to nothing (or to changing the docs to reflect the current batty
> behavior).

+1 for changing. And as I've said before, I prefer proposal (3).

From amauryfa at gmail.com  Tue Dec 13 11:37:32 2011
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 13 Dec 2011 11:37:32 +0100
Subject: [Python-Dev] IEEE/ISO draft on Python vulnerabilities
In-Reply-To: <4EE686B2.6040806@haypocalc.com>
References: <CALFfu7Av3jiaSpSdVdVV5K_28ZrhArQ2dzRF39c2egZxKpmi-w@mail.gmail.com>
	<4EE686B2.6040806@haypocalc.com>
Message-ID: <CAGmFidYEcgxHLYEi=A9zM0QPFskMcinY8U1MZQDMdMzxJTnNVA@mail.gmail.com>

2011/12/12 Victor Stinner <victor.stinner at haypocalc.com>

> "When sorting a list using the sort() method, attempting to inspect or
> mutate the content of the list will result in undefined behaviour."


But is this even true? in listobject.c::listsort(), since 2002,
/* The list is temporarily made empty, so that mutations performed
 * by comparison functions can't affect the slice of memory we're
 * sorting (allowing mutations during sorting is a core-dump
 * factory, since ob_item may change).
 */
So behaviour is not undefined at all... maybe this report is only based on
note #10 of the documentation:
http://docs.python.org/library/stdtypes.html#mutable-sequence-types
and only considers python 2.2 or older...
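
A small sketch of what that comment implies on CPython: inspecting the
list from a key function sees it as empty for the duration of the sort.
This is a CPython implementation detail, not a language guarantee, and
the names below are only illustrative:

```python
observed_lengths = []
data = [3, 1, 2]


def spy(item):
    # Inspect (but do not mutate) the list while it is being sorted.
    observed_lengths.append(len(data))
    return item


data.sort(key=spy)
# On CPython: observed_lengths == [0, 0, 0] and data == [1, 2, 3]
```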

-- 
Amaury Forgeot d'Arc

From arigo at tunes.org  Tue Dec 13 14:13:55 2011
From: arigo at tunes.org (Armin Rigo)
Date: Tue, 13 Dec 2011 14:13:55 +0100
Subject: [Python-Dev] IEEE/ISO draft on Python vulnerabilities
In-Reply-To: <CAGmFidYEcgxHLYEi=A9zM0QPFskMcinY8U1MZQDMdMzxJTnNVA@mail.gmail.com>
References: <CALFfu7Av3jiaSpSdVdVV5K_28ZrhArQ2dzRF39c2egZxKpmi-w@mail.gmail.com>
	<4EE686B2.6040806@haypocalc.com>
	<CAGmFidYEcgxHLYEi=A9zM0QPFskMcinY8U1MZQDMdMzxJTnNVA@mail.gmail.com>
Message-ID: <CAMSv6X0Uzoi_xuSGsdo_vLMYt+T3Y5=brWk3Hx0+X=iGyVk-zg@mail.gmail.com>

Hi,

On Tue, Dec 13, 2011 at 11:37, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote:
>> "When sorting a list using the sort() method, attempting to inspect or
>> mutate the content of the list will result in undefined behaviour."
>
> (...)
> So behaviour is not undefined at all...

No, the behavior _is_ undefined.  The comment you cited says that it
cannot crash the Python interpreter; additionally, it makes a
best-effort attempt at catching such accesses and raising ValueError.
But I think I can build a strange-looking example where you mutate a
list during sorting and don't get a ValueError (although admittedly it
needs a lot of hacking to do that nowadays, e.g. multiple threads).


A bientôt,

Armin.

From l at lrowe.co.uk  Tue Dec 13 14:33:42 2011
From: l at lrowe.co.uk (Laurence Rowe)
Date: Tue, 13 Dec 2011 14:33:42 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
Message-ID: <op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>

On Mon, 12 Dec 2011 22:18:40 +0100, Chris McDonough <chrism at plope.com>  
wrote:

> On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote:
>
>
>>         As someone who ported WebOb and other stuff built on top of it
>>         to Python
>>         3 without using "from __future__ import unicode_literals", I'm
>>         kinda sad
>>         that to be using best practice I'll have to go back and flip
>>         the
>>         polarity on everything.
>>
>>
>> Eh?  If you don't need unicode_literals, what's the problem?
>
> Porting the WebOb code sucked.  It's only about 5K lines of code but the
> porting effort took me about 80 hours.  Some of the problem is certainly
> my own idiocy, but some of it is just because straddling code across
> Python 2 and Python 3 currently requires that you change lots and lots
> of code for suspect benefit.

Could this manual work be cut down if there was a version of 2to3 that  
targeted the subset of the language that is compatible with both 2 and 3?  
That would seem to avoid most of the drawbacks to the current 2to3  
approach.

Laurence


From amauryfa at gmail.com  Tue Dec 13 14:35:08 2011
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 13 Dec 2011 14:35:08 +0100
Subject: [Python-Dev] IEEE/ISO draft on Python vulnerabilities
In-Reply-To: <CAMSv6X0Uzoi_xuSGsdo_vLMYt+T3Y5=brWk3Hx0+X=iGyVk-zg@mail.gmail.com>
References: <CALFfu7Av3jiaSpSdVdVV5K_28ZrhArQ2dzRF39c2egZxKpmi-w@mail.gmail.com>
	<4EE686B2.6040806@haypocalc.com>
	<CAGmFidYEcgxHLYEi=A9zM0QPFskMcinY8U1MZQDMdMzxJTnNVA@mail.gmail.com>
	<CAMSv6X0Uzoi_xuSGsdo_vLMYt+T3Y5=brWk3Hx0+X=iGyVk-zg@mail.gmail.com>
Message-ID: <CAGmFidZm6aAwst6KOG1HPLdZkbkz3h15Jcybeh0AT1Rv+ni-8w@mail.gmail.com>

2011/12/13 Armin Rigo <arigo at tunes.org>

> No, the behavior _is_ undefined.  The comment you cited says that it
> cannot crash the Python interpreter; additionally, it makes a
> best-effort attempt at catching such accesses and raising ValueError.
> But I think I can build a strange-looking example where you mutate a
> list during sorting and don't get a ValueError (although admittedly it
> needs a lot of hacking to do that nowadays, e.g. multiple threads).
>

I'm interested to see how!
The current implementation installs an empty array in the list,
and the initial array is only held by a local variable in listsort().
even gc.get_referrers() can return the empty list...

-- 
Amaury Forgeot d'Arc

From fuzzyman at voidspace.org.uk  Tue Dec 13 14:42:12 2011
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Tue, 13 Dec 2011 13:42:12 +0000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
Message-ID: <4EE75634.6000208@voidspace.org.uk>

On 13/12/2011 13:33, Laurence Rowe wrote:
> On Mon, 12 Dec 2011 22:18:40 +0100, Chris McDonough <chrism at plope.com> 
> wrote:
>
>> On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote:
>>
>>
>>>         As someone who ported WebOb and other stuff built on top of it
>>>         to Python
>>>         3 without using "from __future__ import unicode_literals", I'm
>>>         kinda sad
>>>         that to be using best practice I'll have to go back and flip
>>>         the
>>>         polarity on everything.
>>>
>>>
>>> Eh?  If you don't need unicode_literals, what's the problem?
>>
>> Porting the WebOb code sucked.  It's only about 5K lines of code but the
>> porting effort took me about 80 hours.  Some of the problem is certainly
>> my own idiocy, but some of it is just because straddling code across
>> Python 2 and Python 3 currently requires that you change lots and lots
>> of code for suspect benefit.
>
> Could this manual work be cut down if there was a version of 2to3 that 
> targeted the subset of the language that is compatible with both 2 and 
> 3? That would seem to avoid most of the drawbacks to the current 2to3 
> approach.
>
I'm not sure what you mean, but it *reads* as if you mean "a version of 
2to3 that only converts code that doesn't need converting". Could you 
clarify?

Thanks,

Michael

> Laurence
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk
>


-- 
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html


From ncoghlan at gmail.com  Tue Dec 13 15:24:16 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 14 Dec 2011 00:24:16 +1000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <4EE75634.6000208@voidspace.org.uk>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
Message-ID: <CADiSq7ePr_5G3G1e-Z_RVK-LuYGk7Hqg8aJ1KXzrQcUhCWKVCw@mail.gmail.com>

Input = normal 2.x code; Output = code that runs on both 2.x and 3.x.

That is, tinkering with what 2to3 produces, not what it accepts.

--
Nick Coghlan (via Gmail on Android, so likely to be more terse than usual)
On Dec 13, 2011 11:46 PM, "Michael Foord" <fuzzyman at voidspace.org.uk> wrote:

> On 13/12/2011 13:33, Laurence Rowe wrote:
>
>> On Mon, 12 Dec 2011 22:18:40 +0100, Chris McDonough <chrism at plope.com>
>> wrote:
>>
>>  On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote:
>>>
>>>
>>>         As someone who ported WebOb and other stuff built on top of it
>>>>        to Python
>>>>        3 without using "from __future__ import unicode_literals", I'm
>>>>        kinda sad
>>>>        that to be using best practice I'll have to go back and flip
>>>>        the
>>>>        polarity on everything.
>>>>
>>>>
>>>> Eh?  If you don't need unicode_literals, what's the problem?
>>>>
>>>
>>> Porting the WebOb code sucked.  It's only about 5K lines of code but the
>>> porting effort took me about 80 hours.  Some of the problem is certainly
>>> my own idiocy, but some of it is just because straddling code across
>>> Python 2 and Python 3 currently requires that you change lots and lots
>>> of code for suspect benefit.
>>>
>>
>> Could this manual work be cut down if there was a version of 2to3 that
>> targeted the subset of the language that is compatible with both 2 and 3?
>> That would seem to avoid most of the drawbacks to the current 2to3 approach.
>>
>>  I'm not sure what you mean, but it *reads* as if you mean "a version of
> 2to3 that only converts code that doesn't need converting". Could you
> clarify?
>
> Thanks,
>
> Michael
>
>  Laurence
>>
>>
>>
>
> --
> http://www.voidspace.org.uk/
>
> May you do good and not evil
> May you find forgiveness for yourself and forgive others
> May you share freely, never taking more than you give.
> -- the sqlite blessing http://www.sqlite.org/different.html
>
>

From fuzzyman at voidspace.org.uk  Tue Dec 13 15:27:04 2011
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Tue, 13 Dec 2011 14:27:04 +0000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CADiSq7ePr_5G3G1e-Z_RVK-LuYGk7Hqg8aJ1KXzrQcUhCWKVCw@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<CADiSq7ePr_5G3G1e-Z_RVK-LuYGk7Hqg8aJ1KXzrQcUhCWKVCw@mail.gmail.com>
Message-ID: <4EE760B8.50807@voidspace.org.uk>

On 13/12/2011 14:24, Nick Coghlan wrote:
>
> Input = normal 2.x code; Output = code that runs on both 2.x and 3.x.
>
> That is, tinkering with what 2to3 produces, not what it accepts.
>

How is that different from what 2to3 currently does? Are you agreeing 
with Laurence, suggesting an alternative, or something else?

Michael

> --
> Nick Coghlan (via Gmail on Android, so likely to be more terse than usual)
>
> On Dec 13, 2011 11:46 PM, "Michael Foord" <fuzzyman at voidspace.org.uk 
> <mailto:fuzzyman at voidspace.org.uk>> wrote:
>
>     On 13/12/2011 13:33, Laurence Rowe wrote:
>
>         On Mon, 12 Dec 2011 22:18:40 +0100, Chris McDonough
>         <chrism at plope.com <mailto:chrism at plope.com>> wrote:
>
>             On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote:
>
>
>                        As someone who ported WebOb and other stuff
>                 built on top of it
>                        to Python
>                        3 without using "from __future__ import
>                 unicode_literals", I'm
>                        kinda sad
>                        that to be using best practice I'll have to go
>                 back and flip
>                        the
>                        polarity on everything.
>
>
>                 Eh?  If you don't need unicode_literals, what's the
>                 problem?
>
>
>             Porting the WebOb code sucked.  It's only about 5K lines
>             of code but the
>             porting effort took me about 80 hours.  Some of the
>             problem is certainly
>             my own idiocy, but some of it is just because straddling
>             code across
>             Python 2 and Python 3 currently requires that you change
>             lots and lots
>             of code for suspect benefit.
>
>
>         Could this manual work be cut down if there was a version of
>         2to3 that targeted the subset of the language that is
>         compatible with both 2 and 3? That would seem to avoid most of
>         the drawbacks to the current 2to3 approach.
>
>     I'm not sure what you mean, but it *reads* as if you mean "a
>     version of 2to3 that only converts code that doesn't need
>     converting". Could you clarify?
>
>     Thanks,
>
>     Michael
>
>         Laurence
>
>
>
>
>     -- 
>     http://www.voidspace.org.uk/
>
>     May you do good and not evil
>     May you find forgiveness for yourself and forgive others
>     May you share freely, never taking more than you give.
>     -- the sqlite blessing http://www.sqlite.org/different.html
>
>


-- 
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html


From l at lrowe.co.uk  Tue Dec 13 15:28:31 2011
From: l at lrowe.co.uk (Laurence Rowe)
Date: Tue, 13 Dec 2011 15:28:31 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
Message-ID: <op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>

On Tue, 13 Dec 2011 14:42:12 +0100, Michael Foord  
<fuzzyman at voidspace.org.uk> wrote:

> On 13/12/2011 13:33, Laurence Rowe wrote:
>> On Mon, 12 Dec 2011 22:18:40 +0100, Chris McDonough <chrism at plope.com>  
>> wrote:
>>
>>> On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote:
>>>
>>>
>>>>         As someone who ported WebOb and other stuff built on top of it
>>>>         to Python
>>>>         3 without using "from __future__ import unicode_literals", I'm
>>>>         kinda sad
>>>>         that to be using best practice I'll have to go back and flip
>>>>         the
>>>>         polarity on everything.
>>>>
>>>>
>>>> Eh?  If you don't need unicode_literals, what's the problem?
>>>
>>> Porting the WebOb code sucked.  It's only about 5K lines of code but  
>>> the
>>> porting effort took me about 80 hours.  Some of the problem is  
>>> certainly
>>> my own idiocy, but some of it is just because straddling code across
>>> Python 2 and Python 3 currently requires that you change lots and lots
>>> of code for suspect benefit.
>>
>> Could this manual work be cut down if there was a version of 2to3 that  
>> targeted the subset of the language that is compatible with both 2 and  
>> 3? That would seem to avoid most of the drawbacks to the current 2to3  
>> approach.
>>
> I'm not sure what you mean, but it *reads* as if you mean "a version of  
> 2to3 that only converts code that doesn't need converting". Could you  
> clarify?
>

The approach that most people seem to have settled on for porting  
libraries to Python 3 is to make a single codebase that is compatible with  
both Python 2 and Python 3, perhaps making use of the six library. If I  
understand correctly, Chris' experience of porting WebOb was that there is  
a large amount of manual work required in this approach in part because of  
the many u'' strings in libraries that extensively use unicode. It should  
be possible to automate this with the same approach as 2to3, but instead  
of a transform from 2->3 it would transform code from 2->(2 & 3). In this  
case the transform would only have to be run once (rather than on every  
setup.py install) and would avoid the difficulties of debugging with  
tracebacks that do not exactly match the source code.

Laurence


From fuzzyman at voidspace.org.uk  Tue Dec 13 15:34:00 2011
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Tue, 13 Dec 2011 14:34:00 +0000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
References: <1323320919.2710.24.camel@thinko> <3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
Message-ID: <4EE76258.3040404@voidspace.org.uk>

On 13/12/2011 14:28, Laurence Rowe wrote:
> On Tue, 13 Dec 2011 14:42:12 +0100, Michael Foord 
> <fuzzyman at voidspace.org.uk> wrote:
>
>> On 13/12/2011 13:33, Laurence Rowe wrote:
>>> On Mon, 12 Dec 2011 22:18:40 +0100, Chris McDonough 
>>> <chrism at plope.com> wrote:
>>>
>>>> On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote:
>>>>
>>>>
>>>>>         As someone who ported WebOb and other stuff built on top 
>>>>> of it
>>>>>         to Python
>>>>>         3 without using "from __future__ import unicode_literals", 
>>>>> I'm
>>>>>         kinda sad
>>>>>         that to be using best practice I'll have to go back and flip
>>>>>         the
>>>>>         polarity on everything.
>>>>>
>>>>>
>>>>> Eh?  If you don't need unicode_literals, what's the problem?
>>>>
>>>> Porting the WebOb code sucked.  It's only about 5K lines of code 
>>>> but the
>>>> porting effort took me about 80 hours.  Some of the problem is 
>>>> certainly
>>>> my own idiocy, but some of it is just because straddling code across
>>>> Python 2 and Python 3 currently requires that you change lots and lots
>>>> of code for suspect benefit.
>>>
>>> Could this manual work be cut down if there was a version of 2to3 
>>> that targeted the subset of the language that is compatible with 
>>> both 2 and 3? That would seem to avoid most of the drawbacks to the 
>>> current 2to3 approach.
>>>
>> I'm not sure what you mean, but it *reads* as if you mean "a version 
>> of 2to3 that only converts code that doesn't need converting". Could 
>> you clarify?
>>
>
> The approach that most people seem to have settled on for porting 
> libraries to Python 3 is to make a single codebase that is compatible 
> with both Python 2 and Python 3, perhaps making use of the six 
> library. If I understand correctly, Chris' experience of porting WebOb 
> was that there is a large amount of manual work required in this 
> approach in part because of the many u'' strings in libraries that 
> extensively use unicode. It should be possible to automate this with 
> the same approach as 2to3, but instead of a transform from 2->3 it 
> would transform code from 2->(2 & 3). In this case the transform would 
> only have to be run once (rather than on every setup.py install) and 
> would avoid the difficulties of debugging with tracebacks that do not 
> exactly match the source code.

Ah, you mean a 2toPython3compatible2 converter. Not a bad idea.

Michael

>
> Laurence
>
>


-- 
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html


From vinay_sajip at yahoo.co.uk  Tue Dec 13 16:54:21 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Tue, 13 Dec 2011 15:54:21 +0000 (UTC)
Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
Message-ID: <loom.20111213T163802-841@post.gmane.org>

Laurence Rowe <l <at> lrowe.co.uk> writes:

> The approach that most people seem to have settled on for porting  
> libraries to Python 3 is to make a single codebase that is compatible with  
> both Python 2 and Python 3, perhaps making use of the six library. If I  
> understand correctly, Chris' experience of porting WebOb was that there is  
> a large amount of manual work required in this approach in part because of  
> the many u'' strings in libraries that extensively use unicode. It should  
> be possible to automate this with the same approach as 2to3, but instead  
> of a transform from 2->3 it would transform code from 2->(2 & 3). In this  
> case the transform would only have to be run once (rather than on every  
> setup.py install) and would avoid the difficulties of debugging with  
> tracebacks that do not exactly match the source code.

I started writing a tool today, tentatively called '2to23', which aims to do
this. It's basically 2to3, but with a set of custom fixers in a package
'lib2to23.fixers', adapted from the corresponding fixers in lib2to3. It's
experimental work in progress at the moment. With a sample file like

import anything
import dummy

class CustomException(Exception):
    pass

def func1():
    a = u'abc'
    b = b'def'
    c = 'unchanged'
    c1 = u'abc' u'def'

def func2():
    try:
        d = 5L
        e = (int, long)
        f = (long, int)
        g = func3()
        if isinstance(g, basestring):
            print 'a string'
        elif isinstance(g, bytes):
            print 'some bytes'
        elif isinstance(g, unicode):
            print 'a unicode string'
        else:
            print
        for i in xrange(3):
            pass
    except Exception:
        e = sys.exc_info()
        raise CustomException, e[1], e[2]
        
class BaseClass:
    pass

class OtherBaseClass:
    pass
    
class MetaClass:
    pass

class DerivedClass(BaseClass, OtherBaseClass):
    __metaclass__ = MetaClass


2to23 gives the following suggested changes:

--- sample.py	(original)
+++ sample.py	(refactored)
@@ -1,34 +1,41 @@
 import anything
 import dummy
+from django.utils.py3 import long_type
+from django.utils.py3 import string_types
+from django.utils.py3 import binary_type
+from django.utils.py3 import b
+from django.utils.py3 import text_type
+from django.utils.py3 import u
+from django.utils.py3 import xrange
 
 class CustomException(Exception):
     pass
 
 def func1():
-    a = u'abc'
-    b = b'def'
+    a = u('abc')
+    b = b('def')
     c = 'unchanged'
-    c1 = u'abc' u'def'
+    c1 = u('abc') u('def')
 
 def func2():
     try:
-        d = 5L
+        d = long_type(5)
         e = (int, long)
         f = (long, int)
         g = func3()
-        if isinstance(g, basestring):
-            print 'a string'
-        elif isinstance(g, bytes):
-            print 'some bytes'
-        elif isinstance(g, unicode):
-            print 'a unicode string'
+        if isinstance(g, string_types):
+            print('a string')
+        elif isinstance(g, binary_type):
+            print('some bytes')
+        elif isinstance(g, text_type):
+            print('a unicode string')
         else:
-            print
+            print()
         for i in xrange(3):
             pass
     except Exception:
         e = sys.exc_info()
-        raise CustomException, e[1], e[2]
+        raise CustomException(e[1]).with_traceback(e[2])
         
 class BaseClass:
     pass
@@ -39,8 +46,8 @@
 class MetaClass:
     pass
 
-class DerivedClass(BaseClass, OtherBaseClass):
-    __metaclass__ = MetaClass
+class DerivedClass(with_metaclass(MetaClass, BaseClass, OtherBaseClass)):
+    pass

As you can see, there's still a bit of work to do, and the sample doesn't cover
all use cases yet. I'll be cross-checking it using my recent Django porting work
to confirm that it covers at least everything needed for that port, which is why
the fixers currently generate imports from django.utils.py3. Obviously, I'll
change this in due course.
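
For readers who have not seen the helpers that the diff above imports, here
is a minimal sketch of what such a compatibility module might contain. The
names match the imports in the diff (plus with_metaclass, which the diff uses);
the bodies are an assumption modelled loosely on the six library, not the
actual contents of django.utils.py3 or anything that 2to23 ships.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    string_types = (str,)
    text_type = str
    binary_type = bytes
    long_type = int

    def u(s):
        # Native str literals are already text on Python 3.
        return s

    def b(s):
        # Turn a native str literal into bytes.
        return s.encode("latin-1")
else:
    # These names exist only on Python 2; this branch never runs on 3.
    string_types = (basestring,)
    text_type = unicode
    binary_type = str
    long_type = long

    def u(s):
        # Native str literals are bytes on Python 2; decode them to text.
        return s.decode("unicode_escape")

    def b(s):
        return s


def with_metaclass(meta, *bases):
    # Build a throwaway base class whose metaclass is `meta`, so that
    # "class C(with_metaclass(Meta, Base))" works on both 2 and 3.
    return meta("NewBase", bases, {})
```

With helpers like these, u('abc') yields text and b('def') yields bytes on
either major version, which is what lets the refactored sample avoid the u''
literal entirely.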

Regards,

Vinay Sajip


From solipsis at pitrou.net  Tue Dec 13 17:24:23 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 13 Dec 2011 17:24:23 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
References: <1323320919.2710.24.camel@thinko> <3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
Message-ID: <20111213172423.2c567d8b@pitrou.net>

On Tue, 13 Dec 2011 15:28:31 +0100
"Laurence Rowe" <l at lrowe.co.uk> wrote:
> 
> The approach that most people seem to have settled on for porting  
> libraries to Python 3 is to make a single codebase that is compatible with  
> both Python 2 and Python 3, perhaps making use of the six library.

Do you have evidence that "most" people have settled on that approach?
(besides the couple of library writers who have commented on this
thread)



From barry at python.org  Tue Dec 13 17:21:04 2011
From: barry at python.org (Barry Warsaw)
Date: Tue, 13 Dec 2011 11:21:04 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <20111213172423.2c567d8b@pitrou.net>
References: <1323320919.2710.24.camel@thinko>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
	<20111213172423.2c567d8b@pitrou.net>
Message-ID: <20111213112104.6b02cd30@resist.wooz.org>

On Dec 13, 2011, at 05:24 PM, Antoine Pitrou wrote:

>On Tue, 13 Dec 2011 15:28:31 +0100
>"Laurence Rowe" <l at lrowe.co.uk> wrote:
>> 
>> The approach that most people seem to have settled on for porting  
>> libraries to Python 3 is to make a single codebase that is compatible with  
>> both Python 2 and Python 3, perhaps making use of the six library.
>
>Do you have evidence that "most" people have settled on that approach?
>(besides the couple of library writers who have commented on this
>thread)

I'm not sure there's any settling at all when it comes to Python 3 porting
yet. ;)

Sometimes, one code base works better, other times 2to3 works well.  I tend to
use the latter on pure-Python setuptools-based projects, and the former on
projects with C extensions, autoconf-based libraries.

-Barry

From regebro at gmail.com  Tue Dec 13 17:40:46 2011
From: regebro at gmail.com (Lennart Regebro)
Date: Tue, 13 Dec 2011 17:40:46 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
References: <1323320919.2710.24.camel@thinko> <5242067.5aBSYdFaIB@einstein>
	<6EB3EF7C-C742-44BD-9588-B6088282D146@langa.pl>
	<3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
Message-ID: <CAL0kPAWiGqxbgjYK284_+q9O1E1u=v8vTA634jM6i9_6U9HXCw@mail.gmail.com>

On Tue, Dec 13, 2011 at 14:33, Laurence Rowe <l at lrowe.co.uk> wrote:
> Could this manual work be cut down if there was a version of 2to3 that
> targeted the subset of the language that is compatible with both 2 and 3?

Not really, but a 2to6, ie something that tries to keep Python 2
compatibility by using the six library, might be useful.

//Lennart

From pje at telecommunity.com  Tue Dec 13 20:02:45 2011
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 13 Dec 2011 14:02:45 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <20111213172423.2c567d8b@pitrou.net>
References: <1323320919.2710.24.camel@thinko> <3344831.JP9Cfj4Ety@einstein>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
	<20111213172423.2c567d8b@pitrou.net>
Message-ID: <CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>

On Tue, Dec 13, 2011 at 11:24 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Tue, 13 Dec 2011 15:28:31 +0100
> "Laurence Rowe" <l at lrowe.co.uk> wrote:
> >
> > The approach that most people seem to have settled on for porting
> > libraries to Python 3 is to make a single codebase that is compatible
> with
> > both Python 2 and Python 3, perhaps making use of the six library.
>
> Do you have evidence that "most" people have settled on that approach?
> (besides the couple of library writers who have commented on this
> thread)
>

I've seen more projects doing it that way than maintaining dual code bases.
In retrospect, it seems way more attractive than having to run a converter
all the time, especially if I could run a "2to6" tool *once* and then
simply write new code using six-isms.

Among other things, it means that:

* There's only one codebase
* If the conversion isn't perfect, you only have to fix it once
* Line numbers are the same
* There's no conversion step slowing down development

So, I expect that if the approach is at all viable, it'll quickly become
the One Obvious Way to do it.  In effect, 2to3 is a "purity" solution, but
six is more like a "practicality" solution.

And if there's official support for it, so much the better.

From eric at trueblade.com  Tue Dec 13 17:31:41 2011
From: eric at trueblade.com (Eric V. Smith)
Date: Tue, 13 Dec 2011 11:31:41 -0500
Subject: [Python-Dev] str.format implementation
In-Reply-To: <CAPc-aXkg37nqw1_mBFOU7Om8dS2vYvGsJbEzF2CQSXaPW1jMog@mail.gmail.com>
References: <CAPc-aXkg37nqw1_mBFOU7Om8dS2vYvGsJbEzF2CQSXaPW1jMog@mail.gmail.com>
Message-ID: <4EE77DED.904@trueblade.com>

On 12/12/2011 10:56 PM, Ben Wolfson wrote:
> Hi,
> 
> I'm hoping to get some kind of consensus about the divergences between
> the implementation and documentation of str.format
> (http://mail.python.org/pipermail/python-dev/2011-June/111860.html and
> the linked bug report contain examples of the divergences). These
> pertain to the arg_name, attribute_name, and element_index fields of
> the grammar in the docs:
> 
>     replacement_field ::=  "{" [field_name] ["!" conversion] [":" format_spec] "}"
>     field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
>     arg_name          ::=  [identifier | integer]
>     attribute_name    ::=  identifier
>     element_index     ::=  integer | index_string
>     index_string      ::=  <any source character except "]"> +
> 
> Nothing definitive emerged from the last round of discussion, and as
> far as I can recall there are now three proposals for what kind of
> changes might be worth making:
> 
>  (1) the implementation should conform to the docs;*
>  (2) like (1) with the change that element_index should be changed to
> "integer | identifier" (rendering index_string otiose);

I've now learned what "otiose" means. Thanks!

>  (3) like (1) with the change that index_string should be changed to
> '<any source character except "]", "}", or "{">'.

This is still on my plate. I just haven't had a lot of Python time
recently. But I do plan to address this.
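
As a reminder of what the grammar quoted above covers, here is a short sketch
of the uncontroversial field_name forms. The Point class and the variable
names are made up for illustration; the sketch deliberately stays away from
the disputed corner cases and only shows behaviour that docs and
implementation agree on.

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(3, 4)

# arg_name "." attribute_name, with a positional and a keyword arg_name
by_attr = "{0.x},{0.y}".format(p)
by_name = "{pt.x}".format(pt=p)

# "[" element_index "]" with an integer index into a sequence
by_index = "{0[1]}".format(["a", "b"])

# "[" element_index "]" with an index_string: used as a str key, unquoted
by_key = "{0[name]}".format({"name": "spam"})

# An all-digit element_index is converted to int before the lookup,
# so it finds the integer key 1, not the string key "1".
by_int_key = "{0[1]}".format({1: "int key"})
```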

Eric.


From tjreedy at udel.edu  Tue Dec 13 22:10:27 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 13 Dec 2011 16:10:27 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
	<20111213172423.2c567d8b@pitrou.net>
	<CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>
Message-ID: <jc8f06$r97$1@dough.gmane.org>

On 12/13/2011 2:02 PM, PJ Eby wrote:
> On Tue, Dec 13, 2011 at 11:24 AM, Antoine Pitrou <solipsis at pitrou.net
> <mailto:solipsis at pitrou.net>> wrote:
>
>     On Tue, 13 Dec 2011 15:28:31 +0100
>     "Laurence Rowe" <l at lrowe.co.uk <mailto:l at lrowe.co.uk>> wrote:
>      >
>      > The approach that most people seem to have settled on for porting
>      > libraries to Python 3 is to make a single codebase that is
>     compatible with
>      > both Python 2 and Python 3, perhaps making use of the six library.
>
>     Do you have evidence that "most" people have settled on that approach?
>     (besides the couple of library writers who have commented on this
>     thread)

I think there are clearly enough 'some' people to justify official 
support of a 2to23 (or, more obscurely, 2to6; I only just got the point 
that 6=2*3). Beyond that, we don't know and don't need to know.

> I've seen more projects doing it that way than maintaining dual code
> bases.  In retrospect, it seems way more attractive than having to run a
> converter all the time, especially if I could run a "2to6" tool *once*
> and then simply write new code using six-isms
>
> Among other things, it means that:
>
> * There's only one codebase
> * If the conversion isn't perfect, you only have to fix it once
> * Line numbers are the same
> * There's no conversion step slowing down development
>
> So, I expect that if the approach is at all viable, it'll quickly become
> the One Obvious Way to do it.  In effect, 2to3 is a "purity" solution,
> but six is more like a "practicality" solution.

2to3 is the practical solution for someone converting private Python 2 
code to run on Python 3 *instead* of Python 2, without looking back. By 
the nature of things, such conversions will be private and scattered 
over the next decade or so. If 2to3 works well, we will never hear about 
them, except for the rare praise. Ditto for public code whose author 
wishes to abandon Py 2. But that seems to be rare so far.

So we are really talking about upgrading public libraries and apps to 
work with Python 3 *as well as* 'recent' Python 2 versions. For that, a 
'Python23' bridge seems to work for some.

Looking ahead, there will in the future be a need for a 23to3 converter 
to remove the then extraneous bridge code. But that will need a 
semi-standard 'Python23' to work from.
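
No such 23to3 tool exists in this thread, so the following is only a toy
sketch of the idea, assuming the bridge code uses a six-style u('...') helper.
It uses the standard ast module (ast.unparse needs Python 3.9 or later);
StripUCalls and strip_bridge are hypothetical names. A real converter would
more likely be built on lib2to3-style fixers, because an ast round trip drops
comments and original formatting.

```python
import ast


class StripUCalls(ast.NodeTransformer):
    """Replace u('literal') calls with the bare string literal."""

    def visit_Call(self, node):
        self.generic_visit(node)  # handle nested calls first
        if (isinstance(node.func, ast.Name)
                and node.func.id == "u"
                and len(node.args) == 1
                and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            return node.args[0]
        return node


def strip_bridge(source):
    """Return `source` with u('...') bridge calls removed."""
    tree = StripUCalls().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

Calls to u() whose argument is not a plain string literal are left alone,
since removing them could change behaviour.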

-- 
Terry Jan Reedy


From jimjjewett at gmail.com  Tue Dec 13 22:17:13 2011
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 13 Dec 2011 16:17:13 -0500
Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions
In-Reply-To: <4EE704D6.5000901@v.loewis.de>
References: <CA+OGgf6CrT5Pi+f698qK9PnYfB0WqhHncwXZMsDuhLznYPYgNw@mail.gmail.com>
	<4EE704D6.5000901@v.loewis.de>
Message-ID: <CA+OGgf50PmG24op6ADkNRcXfv3hm2GwsRx=+G1=HN9nF5xfzEA@mail.gmail.com>

On Tue, Dec 13, 2011 at 2:55 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> (1) Why is PyObject_HEAD used instead of PyObject_VAR_HEAD?

> The unicode object is not a var object. In a var object, tp_itemsize
> gives the element size, which is not possible for unicode objects,
> since the itemsize may vary by instance. In addition, not all instances
> have the items after the base object (plus the size of the base object
> in tp_basicsize is also not always correct).

That makes perfect sense.
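
As a rough illustration of why a single tp_itemsize cannot describe
these objects, the sketch below measures how many bytes one extra
character costs for three sample strings. It assumes CPython 3.3 or
later (the PEP 393 representation); bytes_per_char is a name made up for
this example, and other implementations may report different numbers.

```python
import sys

def bytes_per_char(ch):
    """Extra bytes sys.getsizeof() reports for one more copy of ch."""
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)

# Sample characters: ASCII, BMP (euro sign) and non-BMP (an emoji).
widths = [bytes_per_char(c) for c in ("a", "\u20ac", "\U0001F600")]
print(widths)
```

The per-character cost differs from one instance to the next, which is
the point made in the quoted reply.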

Any chance of adding the rationale to the code?  Either inline, such
as changing unicodeobject.h line 291 from

    PyObject_HEAD
to something like:
    PyObject_HEAD               /* Not VAR_HEAD, because tp_itemsize
                                   varies, and data may be elsewhere. */

or in the large comments around line 288:

    Note that Strings use PyObject_HEAD and a length field instead of
    PyObject_VAR_HEAD, because the tp_itemsize varies by instance, and the
    actual data is not always immediately after the PyASCIIObject header.



>> (2) Why does PyASCIIObject have a wstr member, and why does
>> PyCompactUnicodeObject have wstr_length? As best I can tell from the
>> PEP or header file, wstr is only meaningful when either:

> No. wstr is most of all relevant if someone calls
> PyUnicode_AsUnicode(AndSize); any unicode object might get the
> wstr pointer filled out at some point.

I am willing to believe that requests for a wchar_t (or utf-8 or
System Locale charset) representation are common enough to justify
caching the data after the first request.

But then why throw it away in the first place?  Wouldn't programs that
create unicode from wchar_t data also be the most likely to request
wchar_t data back?

> wstr_length is only relevant if wstr is not NULL. For a pure ASCII
> string (and also for Latin-1 and other BMP strings), the wstr length
> will always equal the canonical length (number of code points).

wstr_length != length exactly when:

    2==sizeof(wchar_t) &&
    PyUnicode_4BYTE_KIND == PyUnicode_KIND( str )

which can sometimes be eliminated at compile-time, and always by
string creation time.

In all other cases, (wstr_length == length), and wstr can be generated
by widening the data without having to inspect it.  Is it worth
eliminating wstr_length (or even wstr) in those cases, or is that too
much complexity?
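
The condition above can be modelled in a few lines of Python. This is
only a sketch of the arithmetic, not CPython code: wstr_length here is a
hypothetical helper, sizeof_wchar_t is passed in by hand, and it assumes
an interpreter where len() counts code points (3.3 or later).

```python
def wstr_length(text, sizeof_wchar_t):
    """Number of wchar_t units needed for text (a model, not CPython code)."""
    if sizeof_wchar_t == 2:
        # UTF-16: each non-BMP code point takes two units (a surrogate pair).
        return len(text.encode("utf-16-le")) // 2
    return len(text)  # 4-byte wchar_t: one unit per code point

bmp = "caf\xe9"          # fits in the BMP, so both lengths agree
astral = "a\U0001F600"   # one non-BMP character

print(wstr_length(bmp, 2), len(bmp))
print(wstr_length(astral, 2), len(astral))
print(wstr_length(astral, 4), len(astral))
```

Only the combination of a 2-byte wchar_t and a string that needs the
4-byte kind gives a different length.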



>> (3) I would feel much less nervous if the remaining 4 values of
>> PyUnicode_Kind were explicitly reserved, and the macros raised an
>> error when they showed up. ...

> If people use C, they can construct all kinds of "illegal" ...
> kind values: many places will either work incorrectly, or have
> an assertion in debug mode already if an unexpected kind is
> encountered.

What I'm asking is that
(1)  The other values be documented as reserved, rather than as illegal.
(2)  The macros produce an error rather than silently corrupting data.

This allows at least the possibility of a later change such that

(3)  The macros handle the new values correctly, if only by delegating
back to type-supplied functions.

-jJ

From tjreedy at udel.edu  Tue Dec 13 22:37:10 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 13 Dec 2011 16:37:10 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <loom.20111213T163802-841@post.gmane.org>
References: <1323320919.2710.24.camel@thinko>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
	<loom.20111213T163802-841@post.gmane.org>
Message-ID: <jc8gib$6h3$1@dough.gmane.org>

On 12/13/2011 10:54 AM, Vinay Sajip wrote:

> I started writing a tool today, tentatively called '2to23', which aims to do
> this. It's basically 2to3, but with a package of custom fixers in a package
> 'lib2to23.fixers' adapted from the corresponding fixers in lib2to3.

When, some year in the future, people want to drop Python 2 
compatibility from their Python23 code, they will need a 23to3 tool. You 
might keep this in mind when designing and documenting a bridge 
language. For each 2to23 fixer, is there a 23to3 fixer so that 
23to3(2to23(code)) == 2to3(code), or close enough? (23to3 can and should 
assume that its input is the output of 2to23, and only look to convert 
the ultimately temporary scaffolding inserted by 2to23.)

The point about documentation is to list the names that 2to23 introduces 
(with its special meanings) and that 23to3 will remove (assuming the 
special meanings). So these names should neither be in the 2 code before 
running 2to23 nor added to 23 code (with a different meaning) before 
running 23to3.

If 2to23 were paired with a 23to3, so people knew that its output is not 
a dead end but a stepping stone to the future, it would be even more 
attractive.
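
To make the round-trip property concrete, here is a toy sketch. It is
not how lib2to3 fixers are written: the three functions are hypothetical
regex stand-ins that handle only simple double-quoted u"" literals
without escapes, and u() stands for whatever bridge helper a real 2to23
would introduce.

```python
import re

# Name the toy 2to23 introduces and the toy 23to3 removes; code being
# converted must not use it with any other meaning.
BRIDGE_NAMES = ("u",)

_U_LITERAL = re.compile(r'\bu("[^"\\]*")')    # u"text"
_U_CALL = re.compile(r'\bu\(("[^"\\]*")\)')   # u("text")

def toy_2to3(src):
    """Python 2 only -> Python 3 only: drop the u prefix."""
    return _U_LITERAL.sub(r'\1', src)

def toy_2to23(src):
    """Python 2 only -> bridge code: wrap the literal in u()."""
    return _U_LITERAL.sub(r'u(\1)', src)

def toy_23to3(src):
    """Bridge code -> Python 3 only: remove the u() scaffolding."""
    return _U_CALL.sub(r'\1', src)

source = 'name = u"abc"\nprint(len(name))\n'
print(toy_23to3(toy_2to23(source)) == toy_2to3(source))
```

The BRIDGE_NAMES tuple plays the role of the documented list of names:
23to3 can only strip u() safely if nothing else in the code defines it.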

-- 
Terry Jan Reedy


From fuzzyman at voidspace.org.uk  Tue Dec 13 23:17:16 2011
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Tue, 13 Dec 2011 22:17:16 +0000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <jc8f06$r97$1@dough.gmane.org>
References: <1323320919.2710.24.camel@thinko> <4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
	<20111213172423.2c567d8b@pitrou.net>
	<CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>
	<jc8f06$r97$1@dough.gmane.org>
Message-ID: <4EE7CEEC.2020306@voidspace.org.uk>

On 13/12/2011 21:10, Terry Reedy wrote:
> On 12/13/2011 2:02 PM, PJ Eby wrote:
>> On Tue, Dec 13, 2011 at 11:24 AM, Antoine Pitrou <solipsis at pitrou.net
>> <mailto:solipsis at pitrou.net>> wrote:
>>
>>     On Tue, 13 Dec 2011 15:28:31 +0100
>>     "Laurence Rowe" <l at lrowe.co.uk <mailto:l at lrowe.co.uk>> wrote:
>> >
>> > The approach that most people seem to have settled on for porting
>> > libraries to Python 3 is to make a single codebase that is
>>     compatible with
>> > both Python 2 and Python 3, perhaps making use of the six library.
>>
>>     Do you have evidence that "most" people have settled on that 
>> approach?
>>     (besides the couple of library writers who have commented on this
>>     thread)
>
> I think there is clearly enough 'some' people to justify official 
> support of a 2to23 (or more obscurely, 2to6, but I just got the point 
> that 6=2*3).

More specifically "six" [1] is the name of Benjamin Peterson's support 
package to help write code that works on both 2 and 3. So the idea is 
that the conversion isn't just a straight syntax conversion - but that 
it [could] generate code using this library.

All the best,

Michael

[1] http://packages.python.org/six/
> Beyond that, we don't know and don't need to know.
>
>> I've seen more projects doing it that way than maintaining dual code
>> bases.  In retrospect, it seems way more attractive than having to run a
>> converter all the time, especially if I could run a "2to6" tool *once*
>> and then simply write new code using six-isms
>>
>> Among other things, it means that:
>>
>> * There's only one codebase
>> * If the conversion isn't perfect, you only have to fix it once
>> * Line numbers are the same
>> * There's no conversion step slowing down development
>>
>> So, I expect that if the approach is at all viable, it'll quickly become
>> the One Obvious Way to do it.  In effect, 2to3 is a "purity" solution,
>> but six is more like a "practicality" solution.
>
> 2to3 is the practical solution for someone converting private Python 2 
> code to run on Python 3 *instead* of Python 2, without looking back. 
> By the nature of things, such conversions will be private and 
> scattered over the next decade or so. If 2to3 works well, we will 
> never hear about them, except for the rare praise. Ditto for public 
> code whose author wishes to abandon Py 2. But that seems to be rare so far.
>
> So we are really talking about upgrading public libraries and apps to 
> work with Python 3 *as well as* 'recent' Python 2 versions. For that, 
> a 'Python23' bridge seems to work for some.
>
> Looking ahead, there will in the future be a need for a 23to3 
> converter to remove the then extraneous bridge code. But that will 
> need a semi-standard 'Python23' to work from.
>


-- 
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html


From ncoghlan at gmail.com  Tue Dec 13 23:38:06 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 14 Dec 2011 08:38:06 +1000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <4EE7CEEC.2020306@voidspace.org.uk>
References: <1323320919.2710.24.camel@thinko> <4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
	<20111213172423.2c567d8b@pitrou.net>
	<CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>
	<jc8f06$r97$1@dough.gmane.org> <4EE7CEEC.2020306@voidspace.org.uk>
Message-ID: <CADiSq7c=pjuSomB59zjLwhvWTVQNacRmLC1_kfjempcwUgEM8Q@mail.gmail.com>

On Wed, Dec 14, 2011 at 8:17 AM, Michael Foord
<fuzzyman at voidspace.org.uk> wrote:
> More specifically "six" [1] is the name of Benjamin Peterson's support
> package to help write code that works on both 2 and 3. So the idea is that
> the conversion isn't just a straight syntax conversion - but that it [could]
> generate code using this library.

The thing is, the code you want to generate varies depending on
whether you want to target 2.6+, or include 2.5 and earlier.

For 2.6+, you can just use the print_function and unicode_literals
__future__ imports to minimise overhead.

But if 2.5 and earlier is in the mix, you need to lean more heavily on
six (for u(), b() and print_()).
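
For readers who have not seen such helpers, the following is a minimal
sketch in the style of u() and b(). It is an illustration written for
this note, not six's actual source, and it assumes the literals passed
in contain only ASCII or Latin-1 text.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    def b(s):
        """Plain literal -> bytes."""
        return s.encode("latin-1")

    def u(s):
        """Plain literal -> text (already text on 3.x)."""
        return s
else:
    def b(s):
        """Plain literal -> bytes (already bytes on 2.x)."""
        return s

    def u(s):
        """Plain literal -> unicode; 'unicode' exists only on 2.x."""
        return unicode(s, "unicode_escape")

header = b("GET / HTTP/1.0")
title = u("hello")
```

The cost relative to the __future__ imports is a function call per
literal, which is the overhead mentioned above.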

String translation is also an open question. For some codebases, you
want both u"" and "" to translate to a Unicode "" (either in Py3k or
via the future import), but if a code base deals with WSGI-style
native strings (by means of u"" for text, "" for native, b"" for
binary), then the more appropriate translation is to use the future
import and map them to "", str("") and b"" respectively.
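
A short sketch of that second mapping, assuming 2.6+ or 3.x and ASCII
content for the native string (the variable names are only for
illustration):

```python
from __future__ import unicode_literals

text = "caf\xe9"               # text: unicode on 2.x (via the import), str on 3.x
native = str("Content-Type")   # native: byte str on 2.x, unicode str on 3.x
binary = b"\x00\x01"           # binary: bytes on both

text_type = type("")           # unicode on 2.x, str on 3.x
print(type(text).__name__, type(native).__name__, type(binary).__name__)
```

Wrapping a literal in str() is what keeps it "native" once the future
import has turned bare literals into text.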

So, rather than an overall "2to6", it may be better to focus on
*specific* fixers that can be tweaked or added to help with:

2.4+ -> 2.4+, 3.2+
2.4+ -> 2.6+, 3.2+
2.6+ -> 2.6+, 3.2+
2.6+, 3.2+ -> 3.2+

(with handling of string literals being the most significant, and
likely most complicated)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From barry at python.org  Tue Dec 13 23:52:54 2011
From: barry at python.org (Barry Warsaw)
Date: Tue, 13 Dec 2011 17:52:54 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CADiSq7c=pjuSomB59zjLwhvWTVQNacRmLC1_kfjempcwUgEM8Q@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
	<20111213172423.2c567d8b@pitrou.net>
	<CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>
	<jc8f06$r97$1@dough.gmane.org> <4EE7CEEC.2020306@voidspace.org.uk>
	<CADiSq7c=pjuSomB59zjLwhvWTVQNacRmLC1_kfjempcwUgEM8Q@mail.gmail.com>
Message-ID: <20111213175254.7b2cd6d0@resist.wooz.org>

On Dec 14, 2011, at 08:38 AM, Nick Coghlan wrote:

>String translation is also an open question. For some codebases, you
>want both u"" and "" to translate to a Unicode "" (either in Py3k or
>via the future import)

I have a fixer for this:

http://bazaar.launchpad.net/~barry/flufl.i18n/devel/view/head:/myfixers/fix_ugettext.py

(or maybe by "translation" you don't mean "gettext").

Cheers,
-Barry

From storchaka at gmail.com  Wed Dec 14 00:16:34 2011
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 14 Dec 2011 01:16:34 +0200
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CADiSq7c=pjuSomB59zjLwhvWTVQNacRmLC1_kfjempcwUgEM8Q@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko> <jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
	<20111213172423.2c567d8b@pitrou.net>
	<CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>
	<jc8f06$r97$1@dough.gmane.org> <4EE7CEEC.2020306@voidspace.org.uk>
	<CADiSq7c=pjuSomB59zjLwhvWTVQNacRmLC1_kfjempcwUgEM8Q@mail.gmail.com>
Message-ID: <jc8mco$dmr$1@dough.gmane.org>

14.12.11 00:38, Nick Coghlan wrote:
> String translation is also an open question. For some codebases, you
> want both u"" and "" to translate to a Unicode "" (either in Py3k or
> via the future import), but if a code base deals with WSGI-style
> native strings (by means of u"" for text, "" for native, b"" for
> binary), then the more appropriate translation is to use the future
> import and map them to "", str("") and b"" respectively.

There is another place for native strings -- sys.argv.

import sys

if sys.argv[1] == str('-'):
    f = sys.stdin
else:
    f = open(sys.argv[1], 'r')

Yet another pitfall -- sys.stdin is a bytes stream in 2.x and a text
stream in 3.x. For reading binary data:

if sys.argv[1] == str('-'):
    if sys.version_info[0] >= 3:
        f = sys.stdin.buffer.raw   # underlying binary stream in 3.x
    else:
        f = sys.stdin              # already a bytes stream in 2.x
else:
    f = open(sys.argv[1], 'rb')    # 'rb', since binary data is wanted

Reading text data is even more complicated in Python 2.x.
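
One possible shape for the text case, as a sketch only: open_text_input
is a hypothetical helper, UTF-8 is an assumed default, and the 2.x
branch uses codecs.getreader() to decode the byte stream.

```python
import io
import sys

def open_text_input(path, encoding="utf-8"):
    """Return a stream yielding text on both 2.6+ and 3.x ('-' means stdin)."""
    if path != "-":
        # io.open() decodes for us on both major versions.
        return io.open(path, "r", encoding=encoding)
    if sys.version_info[0] >= 3:
        # Re-wrap the binary layer so the encoding is ours, not the locale's.
        return io.TextIOWrapper(sys.stdin.buffer, encoding=encoding)
    import codecs
    return codecs.getreader(encoding)(sys.stdin)
```

Usage would be f = open_text_input(sys.argv[1]); every read then returns
text (unicode on 2.x, str on 3.x) rather than bytes.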


From exarkun at twistedmatrix.com  Wed Dec 14 00:36:28 2011
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Tue, 13 Dec 2011 23:36:28 -0000
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <jc8gib$6h3$1@dough.gmane.org>
References: <1323320919.2710.24.camel@thinko>
	<CADiSq7cBJynu1pTJ-gumoe5Dkicaz6gmKEjEFnYsYAS5Rndk3g@mail.gmail.com>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
	<loom.20111213T163802-841@post.gmane.org>
	<jc8gib$6h3$1@dough.gmane.org>
Message-ID: <20111213233628.1828.718618756.divmod.xquotient.271@localhost.localdomain>

On 09:37 pm, tjreedy at udel.edu wrote:
>On 12/13/2011 10:54 AM, Vinay Sajip wrote:
>>I started writing a tool today, tentatively called '2to23', which aims 
>>to do
>>this. It's basically 2to3, but with a package of custom fixers in a 
>>package
>>'lib2to23.fixers' adapted from the corresponding fixers in lib2to3.
>
>When, some year in the future, people want to drop Python 2 
>compatibility from their Python23 code, they will need a 23to3 tool.

No, they will not.  They only need a 2to3 or 2to6 tool because Python 2 
and Python 3 are not compatible with each other, but they want one 
program to be valid in Python 2 and Python 3 simultaneously.

When they decide they no longer care about Python 2, they can just stop 
taking care to keep their program valid as Python 2 and only take care 
to keep it a valid Python 3 program.  There's no specific change to 
make, just a different approach to take with future maintenance.

You might say that they will *want* to immediately discard all of their 
legacy Python 2 support code.  I suspect many of them will not want 
this; but either way it's a want, not a need.

Jean-Paul

From martin at v.loewis.de  Wed Dec 14 01:01:40 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 14 Dec 2011 01:01:40 +0100
Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions
In-Reply-To: <CA+OGgf50PmG24op6ADkNRcXfv3hm2GwsRx=+G1=HN9nF5xfzEA@mail.gmail.com>
References: <CA+OGgf6CrT5Pi+f698qK9PnYfB0WqhHncwXZMsDuhLznYPYgNw@mail.gmail.com>
	<4EE704D6.5000901@v.loewis.de>
	<CA+OGgf50PmG24op6ADkNRcXfv3hm2GwsRx=+G1=HN9nF5xfzEA@mail.gmail.com>
Message-ID: <4EE7E764.5050008@v.loewis.de>

> Any chance of adding the rationale to the code?

I'm really short of time right now, so you need to find somebody
else to make such a change.

> I am willing to believe that requests for a wchar_t (or utf-8 or
> System Locale charset) representation are common enough to justify
> caching the data after the first request.

That's not the issue; the real issue is memory management.

> But then why throw it away in the first place?  Wouldn't programs that
> create unicode from wchar_t data also be the most likely to request
> wchar_t data back?

Perhaps. But are they likely to access the string they just created
again at all? They know what's in it, so why look at it again?

> In all other cases, (wstr_length == length), and wstr can be generated
> by widening the data without having to inspect it.  Is it worth
> eliminating wstr_length (or even wstr) in those cases, or is that too
> much complexity?

It's too little saving.

> What I'm asking is that
> (1)  The other values be documented as reserved, rather than as illegal.

How is that different?

> (2)  The macros produce an error rather than silently corrupting data.

In debug mode, or release mode? -1 on release mode.

Regards,
Martin

From solipsis at pitrou.net  Wed Dec 14 01:30:24 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 14 Dec 2011 01:30:24 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
	<20111213172423.2c567d8b@pitrou.net>
	<CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>
Message-ID: <20111214013024.74addba7@pitrou.net>

On Tue, 13 Dec 2011 14:02:45 -0500
PJ Eby <pje at telecommunity.com> wrote:
> 
> Among other things, it means that:
> 
> * There's only one codebase
> * If the conversion isn't perfect, you only have to fix it once
> * Line numbers are the same
> * There's no conversion step slowing down development
> 
> So, I expect that if the approach is at all viable, it'll quickly become
> the One Obvious Way to do it.

Well, with all due respect, this is hand-waving. Sure, if it's
viable, then fine. The question is if it's "viable", precisely. That
depends on which project we're talking about.

> In effect, 2to3 is a "purity" solution, but
> six is more like a "practicality" solution.

This sounds like your personal interpretation. I see nothing "pure" in
2to3.

Regards

Antoine.

From pje at telecommunity.com  Wed Dec 14 03:42:48 2011
From: pje at telecommunity.com (PJ Eby)
Date: Tue, 13 Dec 2011 21:42:48 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <20111214013024.74addba7@pitrou.net>
References: <1323320919.2710.24.camel@thinko>
	<E39AB326-CCA4-404D-9B16-2BBB090B83B0@twistedmatrix.com>
	<4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
	<20111213172423.2c567d8b@pitrou.net>
	<CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>
	<20111214013024.74addba7@pitrou.net>
Message-ID: <CALeMXf5dtaDxEMtcJM=ifua_u8oVFV8vc9+E5zeSWo9zhZcoJw@mail.gmail.com>

On Tue, Dec 13, 2011 at 7:30 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Tue, 13 Dec 2011 14:02:45 -0500
> PJ Eby <pje at telecommunity.com> wrote:
> >
> > Among other things, it means that:
> >
> > * There's only one codebase
> > * If the conversion isn't perfect, you only have to fix it once
> > * Line numbers are the same
> > * There's no conversion step slowing down development
> >
> > So, I expect that if the approach is at all viable, it'll quickly become
> > the One Obvious Way to do it.
>
> Well, with all due respect, this is hand-waving. Sure, if it's
> viable, then fine. The question is if it's "viable", precisely. That
> depends on which project we're talking about.
>

What I'm saying is that it has many characteristics that are desirable for
people who need to support Python 2 and 3 - which is likely the most common
use case for library developers.

> In effect, 2to3 is a "purity" solution, but
> > six is more like a "practicality" solution.
>
> This sounds like your personal interpretation. I see nothing "pure" in
> 2to3.
>

It's "pure" in being optimized for a world where you just stop using Python
2 one day, and start using 3 the next, without any crossover support.

As someone else pointed out, this is a more common case for application
developers than for library developers.  However, until the libraries are
ported, it's harder for the app developers to port their apps.

Anyway, if you're supporting both 2 and 3, a common code base offers many
attractions, so if it can be done, it will.

From tjreedy at udel.edu  Wed Dec 14 05:29:27 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 13 Dec 2011 23:29:27 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <20111213233628.1828.718618756.divmod.xquotient.271@localhost.localdomain>
References: <1323320919.2710.24.camel@thinko> <4EE12BAA.1050601@v.loewis.de>
	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>
	<jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
	<loom.20111213T163802-841@post.gmane.org>
	<jc8gib$6h3$1@dough.gmane.org>
	<20111213233628.1828.718618756.divmod.xquotient.271@localhost.localdomain>
Message-ID: <jc98na$g3q$1@dough.gmane.org>

On 12/13/2011 6:36 PM, exarkun at twistedmatrix.com wrote:
> On 09:37 pm, tjreedy at udel.edu wrote:
>> On 12/13/2011 10:54 AM, Vinay Sajip wrote:
>>> I started writing a tool today, tentatively called '2to23', which
>>> aims to do
>>> this. It's basically 2to3, but with a package of custom fixers in a
>>> package
>>> 'lib2to23.fixers' adapted from the corresponding fixers in lib2to3.
>>
>> When, some year in the future, people want to drop Python 2
>> compatibility from their Python23 code, they will need a 23to3 tool.
>
> No, they will not.

Yes they will, if you read my conditional statement properly.

Anyway, quibbling over the meaning of 'need' is quite useless. It has 
two shades of meaning: lack of something required, and lack of something 
desired. You could have made the valid part of your point without 
starting off as you did.

But I already implied that removal is less urgent when I wrote "When, 
some year in the future...".

> They only need a 2to3 or 2to6 tool because Python 2
> and Python 3 are not compatible with each other, but they want one
> program to be valid in Python 2 and Python 3 simultaneously.

They *need* the extra stuff inserted. They do not *want* to insert by 
hand. So by your narrow meaning of 'need', one could say that having the 
insertion done by program is a want, not a need.

> When they decide they no longer care about Python 2, they can just stop
> taking care to keep their program valid as Python 2 and only take care
> to keep it a valid Python 3 program. There's no specific change to make,
> just a different approach to take with future maintenance.
>
> You might say that they will *want* to immediately discard all of their
> legacy Python 2 support code. I suspect many of them will not want this;
> but either way it's a want, not a need.

If and when someone wants the extra stuff removed to eliminate both the 
extra run-time and mental overhead of having it around, and they do not 
want to remove it by hand, they will want and therefore need in the more 
general sense to have it done automatically. In both cases, addition and 
removal, the process is tedious and error-prone if done by hand.

-- 
Terry Jan Reedy


From tjreedy at udel.edu  Wed Dec 14 05:51:00 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 13 Dec 2011 23:51:00 -0500
Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions
In-Reply-To: <4EE7E764.5050008@v.loewis.de>
References: <CA+OGgf6CrT5Pi+f698qK9PnYfB0WqhHncwXZMsDuhLznYPYgNw@mail.gmail.com>
	<4EE704D6.5000901@v.loewis.de>
	<CA+OGgf50PmG24op6ADkNRcXfv3hm2GwsRx=+G1=HN9nF5xfzEA@mail.gmail.com>
	<4EE7E764.5050008@v.loewis.de>
Message-ID: <jc99vn$mgp$1@dough.gmane.org>

On 12/13/2011 7:01 PM, "Martin v. Löwis" wrote:

>> What I'm asking is that
>> (1)  The other values be documented as reserved, rather than as illegal.
> How is that different?
>> (2)  The macros produce an error rather than silently corrupting data.
> In debug mode, or release mode? -1 on release mode.

These two requests seem slightly contradictory. Non-official __xxx__ names 
are reserved for future use but not illegal now for user-use, and 
user-generated examples do not raise an exception. They simply do not 
get any special attention unless and until given an official meaning. 
Then too bad if that breaks code.

So by analogy, reserved type values would be ignored, neither corrupting 
data nor raising errors, until put in use. But I don't know how 
easy/practical that would be.

Or maybe more to the point, how expensive a check would be. Not checking 
names for reservedness is the easiest thing to do.

-- 
Terry Jan Reedy



From regebro at gmail.com  Wed Dec 14 08:21:07 2011
From: regebro at gmail.com (Lennart Regebro)
Date: Wed, 14 Dec 2011 08:21:07 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
Message-ID: <CAL0kPAVxdenaw47NA25wQZYbSJOF28qKrnSNey7zdnkFPNSxHA@mail.gmail.com>

On Tue, Dec 13, 2011 at 23:38, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Wed, Dec 14, 2011 at 8:17 AM, Michael Foord
> <fuzzyman at voidspace.org.uk> wrote:
>> More specifically "six" [1] is the name of Benjamin Peterson's support
>> package to help write code that works on both 2 and 3. So the idea is that
>> the conversion isn't just a straight syntax conversion - but that it [could]
>> generate code using this library.
>
> The thing is, the code you want to generate varies depending on
> whether you want to target 2.6+, or include 2.5 and earlier.

Sure. These would be different fixers, and the script that runs them
could take a parameter for the target version.
I'd expect though that a 2to6 would first target 2.6+, and possibly never
end up supporting 2.5 at all. I do realize there still is 2.4 out in
the wild, but fewer and fewer people need to support it, and the
effort to support it is much higher.

> String translation is also an open question. For some codebases, you
> want both u"" and "" to translate to a Unicode "" (either in Py3k or
> via the future import), but if a code base deals with WSGI-style
> native strings (by means of u"" for text, "" for native, b"" for
> binary), then the more appropriate translation is to use the future
> import and map them to "", str("") and b"" respectively.

Yeah, that can't be done automatically. There is no generic way to
determine if a string should be binary, unicode or native.
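
To make the three cases concrete, here is a minimal sketch (not taken
from any existing fixer) that compiles a literal with or without the
unicode_literals future flag and reports the resulting type. The helper
name literal_type is made up for illustration; compile() and
__future__.unicode_literals.compiler_flag are standard.

```python
import __future__

# Compiler flag equivalent to "from __future__ import unicode_literals".
UNICODE_LITERALS = __future__.unicode_literals.compiler_flag


def literal_type(source, unicode_literals=False):
    """Return the type name a literal expression evaluates to.

    literal_type is a hypothetical helper for this sketch only.
    """
    flags = UNICODE_LITERALS if unicode_literals else 0
    # dont_inherit=True so only the flags given here apply.
    code = compile(source, "<literal>", "eval", flags, True)
    return type(eval(code)).__name__


# The WSGI-style mapping from the quoted paragraph:
#   text   -> ""        (unicode under the future import)
#   native -> str("")   (native str on both 2 and 3)
#   binary -> b""       (bytes)
for source in ('""', 'str("")', 'b""'):
    print(source, "->", literal_type(source, unicode_literals=True))
```

On Python 2 the plain "" literal becomes unicode under the flag while
str("") stays a native byte string; on Python 3 both are str. Which of
the three a given literal in an existing code base should become is
exactly the part a tool cannot decide.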

From victor.stinner at haypocalc.com  Wed Dec 14 09:31:40 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 14 Dec 2011 09:31:40 +0100
Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions
In-Reply-To: <CA+OGgf6CrT5Pi+f698qK9PnYfB0WqhHncwXZMsDuhLznYPYgNw@mail.gmail.com>
References: <CA+OGgf6CrT5Pi+f698qK9PnYfB0WqhHncwXZMsDuhLznYPYgNw@mail.gmail.com>
Message-ID: <2338734.i6kI0g9g3P@ned>

On Tuesday 13 December 2011 02:09:02 Jim Jewett wrote:
> (3)  I would feel much less nervous if the remaining 4 values of
> PyUnicode_Kind were explicitly reserved, and the macros raised an
> error when they showed up.  (Better still would be to allow other
> values, and to have the macros delegate to some attribute on the (sub)
> type object.)

A macro is not supposed to raise an error. In debug mode, 
_PyUnicode_CheckConsistency() ensures that the kind is valid and 
PyUnicode_KIND() fails with an assertion error if kind is 
PyUnicode_WCHAR_KIND.

Python cannot create a string with a kind other than PyUnicode_1BYTE_KIND, 
PyUnicode_2BYTE_KIND or PyUnicode_4BYTE_KIND (the legacy API creates strings 
with a temporary PyUnicode_WCHAR_KIND kind, which is quickly replaced by 
PyUnicode_READY).

If you write your own extension generating an invalid string, I don't think 
that Python can help you. Python cannot check all data, it would be too slow.

If we change something, I would suggest removing PyUnicode_WCHAR_KIND from 
the PyUnicode_Kind enum, so you can be sure that the PyUnicode_KIND() result 
is an enum with 3 possible values (PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND 
or PyUnicode_4BYTE_KIND). It would help to quiet the compiler on switch/case 
;-)
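
As a rough illustration of how the three remaining kinds are chosen,
the sketch below mirrors the PEP 393 rule in pure Python: the width is
picked from the highest code point in the string. It does not call the
C API; expected_kind is a hypothetical name used only here.

```python
def expected_kind(text):
    """Return the per-character width in bytes (1, 2 or 4) that a
    PEP 393 style representation would pick for `text`.

    Pure-Python model of the rule, not a call into PyUnicode_KIND().
    """
    highest = max(map(ord, text)) if text else 0
    if highest < 0x100:       # fits in Latin-1: 1 byte per character
        return 1
    if highest < 0x10000:     # fits in the BMP: 2 bytes per character
        return 2
    return 4                  # needs the full range: 4 bytes per character


for sample in ("abc", "\u0100", "\U00010000"):
    print(repr(sample), expected_kind(sample))
```

With only these three outcomes, a switch over the kind has no fourth
case to handle.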

Victor

From martin at v.loewis.de  Wed Dec 14 10:15:00 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 14 Dec 2011 10:15:00 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <CALeMXf5dtaDxEMtcJM=ifua_u8oVFV8vc9+E5zeSWo9zhZcoJw@mail.gmail.com>
References: <1323320919.2710.24.camel@thinko>	<37AC50BA-EE24-4CFC-8B16-8A2C567A6F9F@twistedmatrix.com>	<jbrq67$28o$1@dough.gmane.org>	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>	<20111208223408.0e2e8bd1@limelight.wooz.org>	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>	<20111209101123.01e92326@limelight.wooz.org>	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>	<1323679242.2710.350.camel@thinko>	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>	<1323724720.2710.388.camel@thinko>	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>	<4EE75634.6000208@voidspace.org.uk>	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>	<20111213172423.2c567d8b@pitrou.net>	<CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>	<20111214013024.74addba7@pitrou.net>
	<CALeMXf5dtaDxEMtcJM=ifua_u8oVFV8vc9+E5zeSWo9zhZcoJw@mail.gmail.com>
Message-ID: <4EE86914.1050904@v.loewis.de>

>     > In effect, 2to3 is a "purity" solution, but
>     > six is more like a "practicality" solution.
> 
>     This sounds like your personal interpretation. I see nothing "pure" in
>     2to3.
> 
> 
> It's "pure" in being optimized for a world where you just stop using
> Python 2 one day, and start using 3 the next, without any crossover support.

That's not true. 2to3 is well suited for supporting both 2 and 3 from
the same code base, and reduces the number of compromises you have to
make compared to an identical-source approach (more dramatically so
if you also want to support 2.5 or 2.4).

> Anyway, if you're supporting both 2 and 3, a common code base offers
> many attractions, so if it can be done, it will.

And 2to3 is a good approach to maintaining a common code base.

Regards,
Martin

From solipsis at pitrou.net  Wed Dec 14 10:58:42 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 14 Dec 2011 10:58:42 +0100
Subject: [Python-Dev] PyUnicodeObject / PyASCIIObject questions
References: <CA+OGgf6CrT5Pi+f698qK9PnYfB0WqhHncwXZMsDuhLznYPYgNw@mail.gmail.com>
	<4EE704D6.5000901@v.loewis.de>
	<CA+OGgf50PmG24op6ADkNRcXfv3hm2GwsRx=+G1=HN9nF5xfzEA@mail.gmail.com>
	<4EE7E764.5050008@v.loewis.de> <jc99vn$mgp$1@dough.gmane.org>
Message-ID: <20111214105842.1eea1ced@pitrou.net>

On Tue, 13 Dec 2011 23:51:00 -0500
Terry Reedy <tjreedy at udel.edu> wrote:
> So by analogy, reserved type value would be ignored, neither corrupting 
> data or raising errors, until put in use.

That simply doesn't make sense.

Regards

Antoine.



From tseaver at palladion.com  Wed Dec 14 17:33:32 2011
From: tseaver at palladion.com (Tres Seaver)
Date: Wed, 14 Dec 2011 11:33:32 -0500
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <4EE86914.1050904@v.loewis.de>
References: <1323320919.2710.24.camel@thinko>	<jbrq67$28o$1@dough.gmane.org>	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>	<20111208223408.0e2e8bd1@limelight.wooz.org>	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>	<20111209101123.01e92326@limelight.wooz.org>	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>	<1323679242.2710.350.camel@thinko>	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>	<1323724720.2710.388.camel@thinko>	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>	<4EE75634.6000208@voidspace.org.uk>	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>	<20111213172423.2c567d8
	b@pitrou.net>	<CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>	<20111214013024.74addba7@pitrou.net>
	<CALeMXf5dtaDxEMtcJM=ifua_u8oVFV8vc9+E5zeSWo9zhZcoJw@mail.gmail.com>
	<4EE86914.1050904@v.loewis.de>
Message-ID: <jcaj4s$j3h$1@dough.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/14/2011 04:15 AM, "Martin v. Löwis" wrote:

>> It's "pure" in being optimized for a world where you just stop
>> using Python 2 one day, and start using 3 the next, without any
>> crossover support.
> 
> That's not true. 2to3 is well suited for supporting both 2 and 3 from 
> the same code base, and reduces the number of compromises you have to 
> make compared to an identical-source approach (more dramatically so if
> you also want to support 2.5 or 2.4).
> 
>> Anyway, if you're supporting both 2 and 3, a common code base
>> offers many attractions, so if it can be done, it will.
> 
> And 2to3 is a good approach to maintaining a common code base.


Not in the experience of the folks who are actually doing that task:  the
overhead of running 2to3 every time 'setup.py develop' etc. runs dooms
the effort.  For instance, we have a report that the 2to3 step takes more
than half an hour (on at least one user's development machine) when
installing / refreshing zope.interface in a Python 3.2 virtualenv.  (Note
that I'm in the process of getting that package's unit test coverage up
to snuff before ripping out the 2to3 support in favor of a subset).

Using 2to3 during ongoing development makes Python feel like Java/C++,
where "get a cup of coffee while we rebuild the world" is a frequent
occurrence.



Tres.
- -- 
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7oz9wACgkQ+gerLs4ltQ7i4wCgh+9GliqukApx1skTs/0AnjKU
CUMAoLzzkctR0gcSBR3qBxZmsAg1kvvt
=FVtj
-----END PGP SIGNATURE-----


From martin at v.loewis.de  Wed Dec 14 18:23:12 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 14 Dec 2011 18:23:12 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <jcaj4s$j3h$1@dough.gmane.org>
References: <1323320919.2710.24.camel@thinko>	<jbrq67$28o$1@dough.gmane.org>	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>	<20111208223408.0e2e8bd1@limelight.wooz.org>	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>	<20111209101123.01e92326@limelight.wooz.org>	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>	<1323679242.2710.350.camel@thinko>	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>	<1323724720.2710.388.camel@thinko>	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>	<4EE75634.6000208@voidspace.org.uk>	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>	<20111213172423.2c567d8	b@pitrou.net>	<CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>	<20111214013024.74addba7@pitrou.net>	<CALeMXf5dtaDxEMtcJM=ifua_u8oVFV8vc9+E5zeSWo9zhZcoJw@mail.gmail.com>	<4EE86914.1050904@v.loewis.de>
	<jcaj4s$j3h$1@dough.gmane.org>
Message-ID: <4EE8DB80.5050007@v.loewis.de>

>> And 2to3 is a good approach to maintaining a common code base.
> 
> 
> Not in the experience of the folks who are actually doing that task:

Well, I personally actually *did* the task, so that experience certainly
isn't universal.

> the
> overhead of running 2to3 every time 'setup.py develop' etc. runs dooms
> the effort.

How so? Running 2to3 after every change is very fast. I never use
setup.py develop, though.

> Using 2to3 during ongoing development makes Python feel like Java/C++,
> where "get a cup of coffee while we rebuild the world" is a frequent
> occurrence.

Unfortunately, these issues never get reported. I worked on porting
zope.interface, and it never took 30 minutes for me, not even remotely.

Regards,
Martin

From stefan_ml at behnel.de  Wed Dec 14 19:05:54 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 14 Dec 2011 19:05:54 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <4EE8DB80.5050007@v.loewis.de>
References: <1323320919.2710.24.camel@thinko>	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>	<20111208223408.0e2e8bd1@limelight.wooz.org>	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>	<20111209101123.01e92326@limelight.wooz.org>	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>	<1323679242.2710.350.camel@thinko>	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>	<1323724720.2710.388.camel@thinko>	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>	<4EE75634.6000208@voidspace.org.uk>	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>	<20111213172423.2c567d8	b@pitrou.net>	<CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1
	MTgpJuLVWQ@mail.gmail.com>	<20111214013024.74addba7@pitrou.net>	<CALeMXf5dtaDxEMtcJM=ifua_u8oVFV8vc9+E5zeSWo9zhZcoJw@mail.gmail.com>	<4EE86914.1050904@v.loewis.de>	<jcaj4s$j3h$1@dough.gmane.org>
	<4EE8DB80.5050007@v.loewis.de>
Message-ID: <jcaoi2$tjs$1@dough.gmane.org>

"Martin v. L?wis", 14.12.2011 18:23:
>> overhead of running 2to3 every time 'setup.py develop' etc. runs dooms
>> the effort.
>
> How so? Running 2to3 after every change is very fast. I never use
> setup.py develop, though.

I think the problem starts with the fact that it needs to be run in the 
first place. It's not enough any more to just fire up the interpreter and 
run a test, you first have to build your code before you can get back to 
work, and it gets moved away into a separate directory and runs from there. 
So your workspace looks different depending on the environment you are 
currently testing with, and all your development tools have to support that 
as well.

Even if the build step does not take half an hour, it's an otherwise 
unnecessary step that makes working and testing with Python 3 substantially 
less comfortable, and thus less likely to happen. And we all know where a 
reluctance against testing leads us.

And, just for the record, we use 2to3 for Cython's code base, and I'm not 
convinced that this was a good decision. Testing the code in Py3 is 
actually something that I avoid if not strictly necessary, and that I leave 
to our CI server in most cases.

I'm much happier with lxml, which was ported before there even was a 
2to3, so it works on 2 and 3 out of the box. That alone makes it much nicer 
to develop on, and I think that it was clearly worth the additional porting 
work at the time.
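
For readers who have not seen the single-source style, here is a
minimal sketch of the kind of shim it relies on. It is not lxml's or
six's actual code; the names text_type, iteritems and describe are
chosen for illustration only.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    # 'unicode' only exists on Python 2; this branch never runs on 3.
    text_type = unicode  # noqa: F821

    def iteritems(mapping):
        return mapping.iteritems()
else:
    text_type = str

    def iteritems(mapping):
        return iter(mapping.items())


def describe(mapping):
    """Render a mapping as sorted 'key=value' text, on 2 and 3 alike."""
    return sorted(
        text_type(key) + text_type("=") + text_type(value)
        for key, value in iteritems(mapping)
    )


print(describe({"a": 1, "b": 2}))
```

The same file is imported directly by either interpreter, so there is
no build step between editing and testing.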

Stefan


From martin at v.loewis.de  Wed Dec 14 19:14:28 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 14 Dec 2011 19:14:28 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jc4g2m$5hn$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>	<4EE1C9AB.2040301@v.loewis.de>	<jbsile$4vu$1@dough.gmane.org>	<4EE53139.8020500@v.loewis.de>
	<jc4g2m$5hn$1@dough.gmane.org>
Message-ID: <4EE8E784.2050406@v.loewis.de>

On 12.12.2011 10:04, Stefan Behnel wrote:
> "Martin v. Löwis", 11.12.2011 23:39:
>>> I can't recall anyone working on any substantial improvements during the
>>> last six years or so, and the reason for that seems obvious to me.
>>
>> What do you think is the reason? It's not at all obvious to me.
> 
> Just to repeat myself for the third time here: lack of interest.

Ah, that's certainly wrong. I am interested in these libraries.

Regards,
Martin

From martin at v.loewis.de  Wed Dec 14 19:18:13 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 14 Dec 2011 19:18:13 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jc4j9r$rbt$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>	<4EE1C9AB.2040301@v.loewis.de>	<E8A216E9-6AC9-4307-8250-B9B185604B82@masklinn.net>	<4EE528BD.2040102@v.loewis.de>
	<jc4j9r$rbt$1@dough.gmane.org>
Message-ID: <4EE8E865.9070307@v.loewis.de>

> Just look through the xml-sig page, basically all requests regarding
> PyXML during the last five years deal with problems in installing it,
> i.e. *before* even starting to use it. So you can't use this to claim
> that people really *are* still using it.

I'm not so sure. In many of these cases, it turned out that they were
trying to run some software that uses PyXML, and that they tried
installing PyXML to satisfy the prerequisite. So while they may not
be software developers, they are indirectly "users" of PyXML.

Regards,
Martin

From cpmicropro at gmail.com  Wed Dec 14 17:04:49 2011
From: cpmicropro at gmail.com (Hossein)
Date: Wed, 14 Dec 2011 19:34:49 +0330
Subject: [Python-Dev] Compiling the source without stat
Message-ID: <4EE8C921.9000503@gmail.com>

Hi. I just started porting the latest Python 2.7.2 to another platform 
(don't think you're eager to know... well, it's the CASIO ClassPad).
And I faced a "minor" problem... It hasn't got stat or fstat or 
anything. (It supports a very limited subset of the C standard library.)
As pyport.h suggested, I defined both DONT_HAVE_STAT and 
DONT_HAVE_FSTAT, but that is where the problems began.
It failed to compile most of import.c, particularly because it fell into 
the wrong `#if !defined(PYOS_Something)' blocks. Sometimes it just fell 
into an #else part which assumed stat is available. So although 
HAVE_STAT is meant to control file operations, most of the source code 
isn't implemented to use it. You see how "minor" the problem was?
So now I need advice from developers.
Is there a fix for it? What a question... there is definitely no
replacement for stat.
It's 99% certain that I can't compile further without touching the 
source code. I have to #define my own PYOS_whatever and handle files in 
my own way. In that case, where should my special file handling cases go? 
I saw some marshal.c code which seemed to want to abstract away the 
platform's file handling from the source code; but from what I understood, 
it can't be made to use alternate file handling methods.
If there is anything I should do (maybe show you my handmade 
pyconfig.h?) tell me.
[My first post in a mailing list... Should I say] Best Regards, Hossein 
[in here?]

From petri at digip.org  Wed Dec 14 20:26:29 2011
From: petri at digip.org (Petri Lehtinen)
Date: Wed, 14 Dec 2011 21:26:29 +0200
Subject: [Python-Dev] Compiling the source without stat
In-Reply-To: <4EE8C921.9000503@gmail.com>
References: <4EE8C921.9000503@gmail.com>
Message-ID: <20111214192629.GA2054@ihaa>

Hossein wrote:
> Hi. I just started to port latest python 2.7.2 to another platform
> (don't think you're eager to know... well it's CASIO ClassPad).
> And I faced a "minor" problem... It hasn't got stat or fstat or
> anything. (It supports a very limited set of c std lib).
> As pyport.c suggested, i defined both DONT_HAVE_STAT and
> DONT_HAVE_FSTAT, but problems only began.
> It failed to compile most of import.c, particularly because it fell
> into the wrong `#if !defined(PYOS_Something)' blocks. Sometimes it
> just fell into an #else part which assumed stat are available. So
> although HAVE_STAT is meant to control file operations, most of the
> source code aren't implement to use it. You see how "minor" the
> problem was?
> So now I need advice from developers.
> Is there a fix for it? What a question... definitely no replacement to stat.
> It's 99% definite that I can't compile further without touching the
> source code. I have to #define my own PYOS_whatever and handle files
> in my own way. In that case where should my special file handling
> cases go? I saw some marshal.c code which seemed it wants to
> abstract away platform's file handling from source code; but from
> what I understood it can't be made to use alternate file handling
> methods.
> If there is anything I should do (maybe show you my handmade
> pyconfig.h?) tell me.

See http://bugs.python.org/issue12082. Currently neither Python 2.x
nor 3.x can be compiled without stat() or fstat(). Python 2.7 almost
compiles, but Python 3 depends heavily on them.

The problem boils down to the fact that you cannot really check
whether a filesystem entry is a directory without calling stat() or
fstat().
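
The dependency is visible even at the Python level. The sketch below
is an illustration only, not the C code in import.c; is_directory is
a made-up name. It shows the usual way to answer "is this a
directory?", which goes through stat().

```python
import os
import stat


def is_directory(path):
    """Return True if `path` names a directory.

    The answer comes from the st_mode field filled in by stat(); without
    that call there is no portable way to obtain it.
    """
    try:
        info = os.stat(path)
    except OSError:
        # Missing entry, permission problem, etc.
        return False
    return stat.S_ISDIR(info.st_mode)


print(is_directory(os.curdir))
```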

My personal opinion is that support for DONT_HAVE_STAT and
DONT_HAVE_FSTAT defines should be dropped because they don't work, and
would only be useful in a very limited set of cases.

> [My first post in a mailing list... Should I say] Best Regards,
> Hossein [in here?]

Yeah, why not? :)

Regards, Petri

From stefan_ml at behnel.de  Wed Dec 14 20:41:42 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 14 Dec 2011 20:41:42 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <4EE8E784.2050406@v.loewis.de>
References: <jbsfar$en7$1@dough.gmane.org>	<4EE1C9AB.2040301@v.loewis.de>	<jbsile$4vu$1@dough.gmane.org>	<4EE53139.8020500@v.loewis.de>	<jc4g2m$5hn$1@dough.gmane.org>
	<4EE8E784.2050406@v.loewis.de>
Message-ID: <jcau5m$89n$1@dough.gmane.org>

"Martin v. L?wis", 14.12.2011 19:14:
> On 12.12.2011 10:04, Stefan Behnel wrote:
>> "Martin v. Löwis", 11.12.2011 23:39:
>>>> I can't recall anyone working on any substantial improvements during the
>>>> last six years or so, and the reason for that seems obvious to me.
>>>
>>> What do you think is the reason? It's not at all obvious to me.
>>
>> Just to repeat myself for the third time here: lack of interest.
>
> Ah, that's certainly wrong. I am interested in these libraries.

I meant: "lack of interest in improving them". It's clear from the 
discussion that there are still users and that new code is still being 
written that uses MiniDOM. However, I would argue that this cannot possibly 
be performance critical code and that it only deals with somewhat small 
documents. I say that because MiniDOM is evidently not suitable for large 
documents or performance critical applications, so this is the only 
explanation I have why the performance problems would not be obvious in the 
cases where it is still being used. And if they do show, it appears to be 
much more likely that users rewrite their code using ElementTree or lxml 
than that they try to fix MiniDOM's performance issues.

Now, read my first quote above again (and preferably also its context, 
which I already emphasized in a previous post), it should be clearer now.

Stefan


From martin at v.loewis.de  Wed Dec 14 20:51:15 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 14 Dec 2011 20:51:15 +0100
Subject: [Python-Dev] Compiling the source without stat
In-Reply-To: <4EE8C921.9000503@gmail.com>
References: <4EE8C921.9000503@gmail.com>
Message-ID: <4EE8FE33.8040906@v.loewis.de>

> It's 99% definite that I can't compile further without touching the
> source code. I have to #define my own PYOS_whatever and handle files in
> my own way. In that case where should my special file handling cases go?

It's difficult to say how to proceed. On one hand, I don't see an
overwhelming need to support systems without stat, and am tempted
to say that you are on your own.

On the other hand, it appears that people keep asking for it, from
time to time. So if it was possible to support such systems without
making the code too convoluted, it may be worth supporting it.

One thing seems clear: without stat(), we cannot possibly support
.pyc files, at least not in __pycache__. So one consequence of
a lacking stat should be that all the code dealing with caching of
byte code files needs to be disabled. Supporting .pyc as modules
might still be possible.

It's questionable how to deal with path searching in the absence
of stat. Testing for the presence of a file is possible in principle
by trying to open the file, and closing it when it was found to be
present. So in the places where we only check for the presence of
a file, an alternative implementation could be provided.
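
The open-and-close test is simple enough to sketch. This is Python
rather than the C that import.c would need, and file_is_present is a
hypothetical name; it only illustrates the logic.

```python
def file_is_present(path):
    """Check for a file without stat(): try to open it, then close it.

    This only tells us that the entry can be opened for reading. It
    cannot report size or modification time, which is why it is no
    substitute for stat() where byte-code caching is concerned.
    """
    try:
        handle = open(path, "rb")
    except (IOError, OSError):
        return False
    handle.close()
    return True
```

Note that an unreadable file is reported as absent, which is one of the
compromises such a port would have to accept.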

In any case, it needs someone to champion such a project, preferably
in an ongoing manner (i.e. several years). So if you are interested,
you should
- volunteer to maintain stat-less systems for some time
- create a port of Python 3 that works stat-less
- come back to python-dev for review to determine whether it's
  worth supporting such systems.

Alternatively, you can just make your own fork of Python, which
you may or may not publish.

Regards,
Martin

From catch-all at masklinn.net  Wed Dec 14 20:54:50 2011
From: catch-all at masklinn.net (Xavier Morel)
Date: Wed, 14 Dec 2011 20:54:50 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jcau5m$89n$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>	<4EE1C9AB.2040301@v.loewis.de>	<jbsile$4vu$1@dough.gmane.org>	<4EE53139.8020500@v.loewis.de>	<jc4g2m$5hn$1@dough.gmane.org>
	<4EE8E784.2050406@v.loewis.de> <jcau5m$89n$1@dough.gmane.org>
Message-ID: <4503F565-5CD6-476A-9697-16FC5517659A@masklinn.net>

On 2011-12-14, at 20:41 , Stefan Behnel wrote:
> I meant: "lack of interest in improving them". It's clear from the
> discussion that there are still users and that new code is still being
> written that uses MiniDOM. However, I would argue that this cannot
> possibly be performance critical code and that it only deals with
> somewhat small documents. I say that because MiniDOM is evidently not
> suitable for large documents or performance critical applications, so
> this is the only explanation I have why the performance problems would
> not be obvious in the cases where it is still being used. And if they
> do show, it appears to be much more likely that users rewrite their
> code using ElementTree or lxml than that they try to fix MiniDOM's
> performance issues.

Could also be because "XML is slow (and sucks)" is part of the global
consciousness at this point, and that minidom is slow and verbose
doesn't surprise much.

From stefan_ml at behnel.de  Wed Dec 14 20:59:06 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 14 Dec 2011 20:59:06 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <4503F565-5CD6-476A-9697-16FC5517659A@masklinn.net>
References: <jbsfar$en7$1@dough.gmane.org>	<4EE1C9AB.2040301@v.loewis.de>	<jbsile$4vu$1@dough.gmane.org>	<4EE53139.8020500@v.loewis.de>	<jc4g2m$5hn$1@dough.gmane.org>	<4EE8E784.2050406@v.loewis.de>
	<jcau5m$89n$1@dough.gmane.org>
	<4503F565-5CD6-476A-9697-16FC5517659A@masklinn.net>
Message-ID: <jcav6a$ev4$1@dough.gmane.org>

Xavier Morel, 14.12.2011 20:54:
> On 2011-12-14, at 20:41 , Stefan Behnel wrote:
>> I meant: "lack of interest in improving them". It's clear from the
>> discussion that there are still users and that new code is still being
>> written that uses MiniDOM. However, I would argue that this cannot
>> possibly be performance critical code and that it only deals with
>> somewhat small documents. I say that because MiniDOM is evidently not
>> suitable for large documents or performance critical applications, so
>> this is the only explanation I have why the performance problems would
>> not be obvious in the cases where it is still being used. And if they
>> do show, it appears to be much more likely that users rewrite their
>> code using ElementTree or lxml than that they try to fix MiniDOM's
>> performance issues.
>
> Could also be because "XML is slow (and sucks)" is part of the global
> consciousness at this point, and that minidom is slow and verbose
> doesn't surprise much.

Possibly, yes. Or that "Python is slow and sucks". But I think there are 
good counter arguments against both.

Stefan


From tjreedy at udel.edu  Wed Dec 14 21:29:43 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 14 Dec 2011 15:29:43 -0500
Subject: [Python-Dev] Compiling the source without stat
In-Reply-To: <20111214192629.GA2054@ihaa>
References: <4EE8C921.9000503@gmail.com> <20111214192629.GA2054@ihaa>
Message-ID: <jcb0vs$tjt$1@dough.gmane.org>

On 12/14/2011 2:26 PM, Petri Lehtinen wrote:

> The problem boils down to the fact that you cannot really check
> whether a filesystem entry is a directory without calling stat() or
> fstat().
>
> My personal opinion is that support for DONT_HAVE_STAT and
> DONT_HAVE_FSTAT defines should be dropped because they don't work, and
> would only be useful in a very limited set of cases.

At present, it seems to be an attractive nuisance, tempting people like 
Hossein to try something that does not work.

-- 
Terry Jan Reedy


From martin at v.loewis.de  Wed Dec 14 22:20:14 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 14 Dec 2011 22:20:14 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jcau5m$89n$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>	<4EE1C9AB.2040301@v.loewis.de>	<jbsile$4vu$1@dough.gmane.org>	<4EE53139.8020500@v.loewis.de>	<jc4g2m$5hn$1@dough.gmane.org>	<4EE8E784.2050406@v.loewis.de>
	<jcau5m$89n$1@dough.gmane.org>
Message-ID: <4EE9130E.7020104@v.loewis.de>

On 14.12.2011 20:41, Stefan Behnel wrote:
> "Martin v. Löwis", 14.12.2011 19:14:
>> On 12.12.2011 10:04, Stefan Behnel wrote:
>>> "Martin v. Löwis", 11.12.2011 23:39:
>>>>> I can't recall anyone working on any substantial improvements
>>>>> during the
>>>>> last six years or so, and the reason for that seems obvious to me.
>>>>
>>>> What do you think is the reason? It's not at all obvious to me.
>>>
>>> Just to repeat myself for the third time here: lack of interest.
>>
>> Ah, that's certainly wrong. I am interested in these libraries.
> 
> I meant: "lack of interest in improving them".

That's also what I meant. I'm interested in improving them.

> Now, read my first quote above again (and preferably also its context,
> which I already emphasized in a previous post), it should be clearer now.

I (now) know what you mean - but you are incorrect.

Regards,
Martin

From stefan_ml at behnel.de  Wed Dec 14 22:47:17 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 14 Dec 2011 22:47:17 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <4EE9130E.7020104@v.loewis.de>
References: <jbsfar$en7$1@dough.gmane.org>	<4EE1C9AB.2040301@v.loewis.de>	<jbsile$4vu$1@dough.gmane.org>	<4EE53139.8020500@v.loewis.de>	<jc4g2m$5hn$1@dough.gmane.org>	<4EE8E784.2050406@v.loewis.de>	<jcau5m$89n$1@dough.gmane.org>
	<4EE9130E.7020104@v.loewis.de>
Message-ID: <jcb5h6$2n2$1@dough.gmane.org>

"Martin v. L?wis", 14.12.2011 22:20:
> On 14.12.2011 20:41, Stefan Behnel wrote:
>> "Martin v. Löwis", 14.12.2011 19:14:
>>> On 12.12.2011 10:04, Stefan Behnel wrote:
>>>> "Martin v. Löwis", 11.12.2011 23:39:
>>>>>> I can't recall anyone working on any substantial improvements
>>>>>> during the
>>>>>> last six years or so, and the reason for that seems obvious to me.
>>>>>
>>>>> What do you think is the reason? It's not at all obvious to me.
>>>>
>>>> Just to repeat myself for the third time here: lack of interest.
>>>
>>> Ah, that's certainly wrong. I am interested in these libraries.
>>
>> I meant: "lack of interest in improving them".
>
> That's also what I meant. I'm interested in improving them.

Then please do. I posted the numbers, so you know what the baseline is, 
both in terms of speed and memory usage. If you need further benchmarks of 
other areas of the API (e.g. tag search or whatever), just ask.

Note, however, that even an improvement by an order of magnitude wouldn't 
solve the API issue for new users, so I'd still suggest adding an 
appropriate link towards ET to the MiniDOM documentation.
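
To show what the API difference looks like for a newcomer, here is a
small side-by-side sketch that extracts the same data with both stdlib
modules. The sample document and the function names are invented for
illustration; it says nothing about speed or memory.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

SAMPLE = "<root><item>a</item><item>b</item></root>"


def items_minidom(xml_text):
    """Collect the text of every <item> using MiniDOM."""
    document = minidom.parseString(xml_text)
    # Text lives in a separate child node, reached via firstChild.data.
    return [node.firstChild.data
            for node in document.getElementsByTagName("item")]


def items_etree(xml_text):
    """Collect the text of every <item> using ElementTree."""
    root = ET.fromstring(xml_text)
    # Text is an attribute of the element itself.
    return [element.text for element in root.findall("item")]


print(items_minidom(SAMPLE))
print(items_etree(SAMPLE))
```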

Stefan


From regebro at gmail.com  Wed Dec 14 22:57:37 2011
From: regebro at gmail.com (Lennart Regebro)
Date: Wed, 14 Dec 2011 22:57:37 +0100
Subject: [Python-Dev] readd u'' literal support in 3.3?
In-Reply-To: <jcaj4s$j3h$1@dough.gmane.org>
References: <1323320919.2710.24.camel@thinko> <jbrq67$28o$1@dough.gmane.org>
	<CAL0kPAX7rGcc76qjG8jOYwV=dj-famn=_oETaQaQ_YS_S65OwQ@mail.gmail.com>
	<20111208223408.0e2e8bd1@limelight.wooz.org>
	<CAL0kPAXAizTFR30M5+Sh-KdmdsinW=0TsNL7HjKhQ3cco+gU1g@mail.gmail.com>
	<20111209101123.01e92326@limelight.wooz.org>
	<CALeMXf6Ku-r49SmsWAPY9xsiL8nSo-LadP33bS-thSL2QbYeAg@mail.gmail.com>
	<CADiSq7dndySuPaBcBNLdXmHpCRGRU=g0rPqwmP5Uqt6sCmfMsw@mail.gmail.com>
	<1323679242.2710.350.camel@thinko>
	<CALeMXf6y_2cHzcbtaiDej01qdmFTbNa5B99sMg3+3TA1ybwY1A@mail.gmail.com>
	<1323724720.2710.388.camel@thinko>
	<op.v6fjygmc2y3dqy@laurence-rowes-macbook-3.local>
	<4EE75634.6000208@voidspace.org.uk>
	<op.v6fmhtb32y3dqy@laurence-rowes-macbook-3.local>
	<CALeMXf5SS9x4p6iZCFangddPc-n6mze=RSGXA7y1MTgpJuLVWQ@mail.gmail.com>
	<20111214013024.74addba7@pitrou.net>
	<CALeMXf5dtaDxEMtcJM=ifua_u8oVFV8vc9+E5zeSWo9zhZcoJw@mail.gmail.com>
	<4EE86914.1050904@v.loewis.de> <jcaj4s$j3h$1@dough.gmane.org>
Message-ID: <CAL0kPAV1+V2-6Mf2rKCYnprPtHeEiJVBpa9x2abGC9r=VKZRzQ@mail.gmail.com>

On Wed, Dec 14, 2011 at 17:33, Tres Seaver <tseaver at palladion.com> wrote:
> Not in the experience of the folks who are actually doing that task:  the
> overhead of running 2to3 every time 'setup.py develop' etc. runs dooms
> the effort.  For instance, we have a report that the 2to3 step takes more
> than half an hour (on at least one user's development machine) when
> installing / refreshing zope.interface in a Python 3.2 virtualenv.

If that is true, then there has to be a bug somewhere...
I might not have tried on 3.2 with virtualenv, but it doesn't take
anywhere near that time normally, and this is not a normal runtime at
all. When we are talking about 2to3 being slow here, we are talking
about it taking 10 seconds to install software that would have taken
under a second to install on Python 2. (Yes, I'm thinking of
Distribute, I just checked. ;-) ).

//Lennart

From techtonik at gmail.com  Thu Dec 15 09:58:31 2011
From: techtonik at gmail.com (anatoly techtonik)
Date: Thu, 15 Dec 2011 11:58:31 +0300
Subject: [Python-Dev] Inconsistent script/console behaviour
In-Reply-To: <j5k46t$dc6$1@dough.gmane.org>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
	<j5k46t$dc6$1@dough.gmane.org>
Message-ID: <CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>

On Sat, Sep 24, 2011 at 11:27 AM, Georg Brandl <g.brandl at gmx.net> wrote:

> On 24.09.2011 01:32, Guido van Rossum wrote:
> > On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik <techtonik at gmail.com>
> wrote:
> >> Currently if you work in console and define a function and then
> >> immediately call it - it will fail with SyntaxError.
> >> For example, copy paste this completely valid Python script into
> console:
> >>
> >> def some():
> >>  print "XXX"
> >> some()
> >>
> >> There is an issue for that that was just closed by Eric. However, I'd
> >> like to know if there are people here that agree that if you paste a
> >> valid Python script into console - it should work without changes.
> >
> > You can't fix this without completely changing the way the interactive
> > console treats blank lines. Note that it's not just that a blank line
> > is required after a function definition -- you also *can't* have a
> > blank line *inside* a function definition.
>
> While the former could be changed (I think), the latter certainly cannot.
> So it's probably not worth changing established behavior.


I've just hit this UX bug once more, but now I am more prepared. Despite
Guido's proposal to move to python-ideas, I am continuing the discussion
here, because:

1. It is not a proposal, but a defect (well, you may argue, but please,
don't)
2. This thread has a history of analysis of what's going wrong in the console
3. This thread also has the developers' decision, which answers the questions
    "why is it so wrong?" and "why can't/won't it be fixed?"
4. Yesterday I heard from a Java person that Python is hard to pick up,
    and remembered how I struggled with indentation myself when trying to
    'learn by example' in the console

Right now I am trying to cope with point (3). To summarize, let's speak in
code that is copy/pasted into the console. Two things will make me happy if
they behave the same in the console as in a .py file:

---ex1---
def some():
    print "XXX"
some()
---/ex1---

--ex1.output--
[ex1.py]
XXX
[console]
  File "<stdin>", line 3
    some()
       ^
SyntaxError: invalid syntax
--/ex1.output--


--ex2--
def some():
pass
--/ex2--

--ex2.output--
[ex2.py]
  File "./ex2.py", line 2
    pass
       ^
IndentationError: expected an indented block
[console]
  File "<stdin>", line 2
    pass
       ^
IndentationError: expected an indented block
--/ex2.output--
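
The ex1 difference can be approximated outside the console with the
built-in compile(): mode "exec" treats the text as a script, while mode
"single" accepts only one interactive statement. This is a rough sketch,
not what the console does internally (the console reads line by line).
The helper name compiles() is made up for illustration, and print is
written as a Python 3 function call.

```python
# ex1 as one pasted string, written for Python 3.
EX1 = "def some():\n    print('XXX')\nsome()\n"


def compiles(source, mode):
    """Return True if `source` compiles in the given compile() mode."""
    try:
        compile(source, "<pasted>", mode)
    except SyntaxError:
        return False
    return True


print(compiles(EX1, "exec"))    # the text is accepted as a script
print(compiles(EX1, "single"))  # rejected as one interactive statement
```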


The second example already works as expected. Why is it not possible to fix
ex1? Guido said:

> You can't fix this without completely changing the way the interactive
> console treats blank lines.

But the fix doesn't require changing the way the interactive console treats
blank lines at all. It only requires finishing the current block when a
dedented line is encountered, instead of throwing an obviously confusing
SyntaxError. At the very least it should not say it is a SyntaxError, because
the code is perfectly valid Python code. If it is invalid "Python
Console code", the error message should say so explicitly. That would be
a correct user-friendly fix for this UX issue, but I'd still like the
behavior to be fixed, i.e. "allow dedented lines to end the current block in
the console without a SyntaxError". Right now I don't see any reason why
that is not possible.
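
One way to picture the requested behaviour, as a sketch only: if the
whole pasted text is available at once, it can be parsed as a script and
each top-level statement then run the way the console runs a single
statement. The function name run_pasted() is hypothetical, and this is
not how the real console works, since it reads input one line at a time
and cannot look ahead.

```python
import ast


def run_pasted(source, namespace=None):
    """Run each top-level statement of `source` as its own console input."""
    if namespace is None:
        namespace = {}
    # Parse the paste as a script, so a dedented line simply starts the
    # next top-level statement.
    tree = ast.parse(source, "<stdin>")
    for node in tree.body:
        # "single" mode gives console semantics: the value of a bare
        # expression is echoed through sys.displayhook.
        code = compile(ast.Interactive(body=[node]), "<stdin>", "single")
        exec(code, namespace)
    return namespace


run_pasted("def some():\n    print('XXX')\nsome()\n")
```

Because the parser sees the complete text, an "else:" at column 0 still
attaches to its "if". A line-by-line console does not have that
information when the dedented line arrives.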

Please speak code when replying about use cases/examples that will be
broken - I didn't quite get the problem with "global scope if" statements.
-- 
anatoly t.

From g.rodola at gmail.com  Thu Dec 15 10:40:41 2011
From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=)
Date: Thu, 15 Dec 2011 10:40:41 +0100
Subject: [Python-Dev] Inconsistent script/console behaviour
In-Reply-To: <CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
	<j5k46t$dc6$1@dough.gmane.org>
	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>
Message-ID: <CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>

On 15 December 2011 09:58, anatoly techtonik <techtonik at gmail.com> wrote:
> 1. It is not a proposal, but a defect (well, you may argue, but please, don't)

You can't copy/paste multiline scripts into a system shell either,
unless you append "\".
It's likely that similar problems exist in a lot of other interactive
shells (ruby?).
And that makes sense to me, because they are supposed to be used interactively.
It might be good to change this? Maybe.
Is the current behavior objectively wrong? No, in my opinion.

--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/

From hrvoje.niksic at avl.com  Thu Dec 15 11:00:06 2011
From: hrvoje.niksic at avl.com (Hrvoje Niksic)
Date: Thu, 15 Dec 2011 11:00:06 +0100
Subject: [Python-Dev] Compiling the source without stat
In-Reply-To: <4EE8C921.9000503@gmail.com>
References: <4EE8C921.9000503@gmail.com>
Message-ID: <4EE9C526.3000404@avl.com>

On 12/14/2011 05:04 PM, Hossein wrote:
> If there is anything I should do

You can determine what the code that calls stat() is trying to do, and 
implement that with other primitives that your platform provides.  For 
example, you can determine whether a file exists by trying to open it in 
read-only mode and checking the error.  You can find whether a 
filesystem path names a directory by trying to chdir into it and 
checking the error.  You can find the size of a regular file by opening 
it and seeking to the end.  These substitutions would not be acceptable 
for a desktop system, but may be perfectly adequate for an embedded one 
that doesn't provide stat() in the first place.  Either way, I expect 
that you will need to modify the sources.
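
The three substitutions above can be sketched in Python for readability.
The real change would have to be made in CPython's C sources, and the
function names here are made up for illustration.

```python
import os


def file_exists(path):
    """Check for a readable file by trying to open it read-only."""
    try:
        with open(path, "rb"):
            return True
    except OSError:
        return False


def is_directory(path):
    """Check for a directory by trying to chdir into it."""
    saved = os.getcwd()
    try:
        os.chdir(path)
    except OSError:
        return False
    os.chdir(saved)  # go back to where we started
    return True


def file_size(path):
    """Find the size of a regular file by seeking to its end."""
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)
```

Note that file_exists() also returns False for a file that exists but
cannot be opened, which is one reason such substitutes would not be
acceptable on a desktop system.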

Finally, are you 100% sure that your platform doesn't provide another 
API similar to stat()?

From vinay_sajip at yahoo.co.uk  Thu Dec 15 11:31:08 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Thu, 15 Dec 2011 10:31:08 +0000 (UTC)
Subject: [Python-Dev] Proposed changes to provide compression support for
	rotated log files
Message-ID: <loom.20111215T112645-234@post.gmane.org>

In response to http://bugs.python.org/issue13516 I'm thinking of implementing
some changes in the rotating file handlers, as outlined here:

http://plumberjack.blogspot.com/2011/12/improved-flexibility-for-log-file.html

The changes (including tests) are almost ready to check in, but I thought I'd
give any one here who's interested a chance to comment, in case they can spot
any shortcomings of the approach I suggest.

Regards,

Vinay Sajip




From cpmicropro at gmail.com  Thu Dec 15 12:59:23 2011
From: cpmicropro at gmail.com (Hossein)
Date: Thu, 15 Dec 2011 15:29:23 +0330
Subject: [Python-Dev] Compiling the source without stat
In-Reply-To: <20111214192629.GA2054@ihaa>
References: <20111214192629.GA2054@ihaa>
Message-ID: <4EE9E11B.6090202@gmail.com>

I wanted to say something on the bug page Petri pointed to ( 
http://bugs.python.org/issue12082 ), but I thought I'd discuss it here 
first. If faking a stat struct and a function to fill it 
solves the problem, and checking for existing files and folders is the 
only thing that Python needs in order to be compiled (I'm talking about 
2.7), then it's possible to do that check by just trying to open the file.

If you don't want to change the stat mechanism, you can add a new 
#define that lets the user point it at his own faked stat function and 
struct.
I'm currently trying to fake stat to see what happens next, but I guess 
I will have more problems with file handling later.

By the way, some people with the same problem there said they "used" 
Python by setting the Py_DontWriteBytecodeFlag flag, but my problem here 
is that I can't compile it. Dunno what they really did.

From mhazadmanesh2009 at gmail.com  Thu Dec 15 12:41:20 2011
From: mhazadmanesh2009 at gmail.com (Hossein Azadmanesh)
Date: Thu, 15 Dec 2011 15:11:20 +0330
Subject: [Python-Dev] Compiling the source without stat
In-Reply-To: <4EE9C526.3000404@avl.com>
References: <4EE9C526.3000404@avl.com>
Message-ID: <4EE9DCE0.3090701@gmail.com>

It does have its own file handling functions: Opening, getting the size, 
enumerating directories, etc.
It has its own limitations too. No dates supported, folders only one 
level deep, maximum 99 files inside each folder, etc.
There is not a function called stat. But I am considering faking it, 
will explain in another reply.

From solipsis at pitrou.net  Thu Dec 15 14:58:26 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 15 Dec 2011 14:58:26 +0100
Subject: [Python-Dev] Proposed changes to provide compression support
 for rotated log files
References: <loom.20111215T112645-234@post.gmane.org>
Message-ID: <20111215145826.770ddff1@pitrou.net>

On Thu, 15 Dec 2011 10:31:08 +0000 (UTC)
Vinay Sajip <vinay_sajip at yahoo.co.uk> wrote:
> In response to http://bugs.python.org/issue13516 I'm thinking of implementing
> some changes in the rotating file handlers, as outlined here:
> 
> http://plumberjack.blogspot.com/2011/12/improved-flexibility-for-log-file.html
> 
> The changes (including tests) are almost ready to check in, but I thought I'd
> give any one here who's interested a chance to comment, in case they can spot
> any shortcomings of the approach I suggest.

"def filename(self, name)" sounds like a poor method name.

Regards

Antoine.



From victor.stinner at haypocalc.com  Thu Dec 15 15:03:16 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 15 Dec 2011 15:03:16 +0100
Subject: [Python-Dev] Compiling the source without stat
In-Reply-To: <4EE9E11B.6090202@gmail.com>
References: <20111214192629.GA2054@ihaa> <4EE9E11B.6090202@gmail.com>
Message-ID: <4615736.IsbhkMj20n@ned>

On Thursday 15 December 2011 15:29:23 you wrote:
> If faking a stat struct and a function to fill it
> solves the problem, and checking for existing files and folders is the
> only thing that python needs to be compiled (i'm talking about 2.7) then
> it's possible to fail-check it by just trying to open the file.

It's better to only work on Python 3.3. I consider "support platform without 
stat" as a new feature, and new features are only accepted in Python 3.3.

Victor

From tjreedy at udel.edu  Thu Dec 15 19:40:57 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 15 Dec 2011 13:40:57 -0500
Subject: [Python-Dev] Proposed changes to provide compression support
 for rotated log files
In-Reply-To: <loom.20111215T112645-234@post.gmane.org>
References: <loom.20111215T112645-234@post.gmane.org>
Message-ID: <jcdevv$btu$1@dough.gmane.org>

On 12/15/2011 5:31 AM, Vinay Sajip wrote:
> In response to http://bugs.python.org/issue13516 I'm thinking of implementing
> some changes in the rotating file handlers, as outlined here:
>
> http://plumberjack.blogspot.com/2011/12/improved-flexibility-for-log-file.html
>
> The changes (including tests) are almost ready to check in, but I thought I'd
> give any one here who's interested a chance to comment, in case they can spot
> any shortcomings of the approach I suggest.

It appears you are adding two methods to do the same thing. One is to 
subclass and override one or two functions. The other is to define one 
or two custom functions and attach as attributes. Both seem equally 
easy. (Actually, subclassing takes one line less.) Are both really needed?

-- 
Terry Jan Reedy


From vinay_sajip at yahoo.co.uk  Thu Dec 15 19:49:18 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Thu, 15 Dec 2011 18:49:18 +0000 (UTC)
Subject: [Python-Dev] Proposed changes to provide compression support
	for rotated log files
References: <loom.20111215T112645-234@post.gmane.org>
	<20111215145826.770ddff1@pitrou.net>
Message-ID: <loom.20111215T194806-897@post.gmane.org>

Antoine Pitrou <solipsis <at> pitrou.net> writes:

> 
> "def filename(self, name)" sounds like a poor method name.
> 

You're right - perhaps "def rotation_filename(self, default_name)" is better.

Regards,

Vinay Sajip




From vinay_sajip at yahoo.co.uk  Thu Dec 15 19:56:26 2011
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Thu, 15 Dec 2011 18:56:26 +0000 (UTC)
Subject: [Python-Dev] Proposed changes to provide compression support
	for rotated log files
References: <loom.20111215T112645-234@post.gmane.org>
	<jcdevv$btu$1@dough.gmane.org>
Message-ID: <loom.20111215T194933-23@post.gmane.org>

Terry Reedy <tjreedy <at> udel.edu> writes:

> 
> It appears you are adding two methods to do the same thing. One is to 
> subclass and override one or two functions. The other is to define one 
> or two custom functions and attach as attributes. Both seem equally 
> easy. (Actually, subclassing takes one line less.) Are both really needed?
> 

That's why I asked for comments. Subclassing can be avoided if the callable
attributes are used, which is a win, for example, if you have both timed and
non-timed rotating handlers: you can use the same callables in each case,
whereas with subclassing you would have to subclass both the timed and non-timed
handler classes. Also, in scenarios where one might want to use alternative
compression formats based on an application's configuration, there would be less
work because one wouldn't need to create multiple subclasses.

So for most cases the strategy would be to use the callable attributes, and
if those were inappropriate for some reason, users could subclass and
override the methods. I've factored out the two methods from the existing
implementation because at the moment, it's hard to subclass without copying
the whole doRollover method (as in the ActiveState example).
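
A sketch of the callable-attribute style, using the attribute names
("namer" and "rotator") under which this feature appears in
logging.handlers from Python 3.3 onwards. The proposal as it stood at
the time of this message may have used different names. The helpers
gzip_namer(), gzip_rotator() and make_handlers() are illustrative only.

```python
import gzip
import logging.handlers
import os
import shutil


def gzip_namer(default_name):
    """Name the rotated file; here we only add a .gz suffix."""
    return default_name + ".gz"


def gzip_rotator(source, dest):
    """Compress the current log into `dest`, then remove the original."""
    with open(source, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(source)


def make_handlers(size_path, timed_path):
    """Attach the same two callables to a sized and a timed handler."""
    size_handler = logging.handlers.RotatingFileHandler(
        size_path, maxBytes=1024, backupCount=3)
    timed_handler = logging.handlers.TimedRotatingFileHandler(
        timed_path, when="midnight", backupCount=3)
    for handler in (size_handler, timed_handler):
        handler.namer = gzip_namer
        handler.rotator = gzip_rotator
    return size_handler, timed_handler
```

The same pair of functions serves both handler classes, which is the
point made above: no subclass of either class is needed.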

Regards,

Vinay Sajip




From tjreedy at udel.edu  Thu Dec 15 20:06:30 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 15 Dec 2011 14:06:30 -0500
Subject: [Python-Dev] Inconsistent script/console behaviour
In-Reply-To: <CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
	<j5k46t$dc6$1@dough.gmane.org>
	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>
Message-ID: <jcdgft$mmb$1@dough.gmane.org>

On 12/15/2011 3:58 AM, anatoly techtonik wrote:

> 1. It is not a proposal, but a defect (well, you may argue, but please,
> don't)

You state a controversial opinion as a fact and then request that others 
not discuss it. To me, this is a somewhat obnoxious hit-and-run tactic. 
If you do not want the point discussed, don't bring it up.

Anyway, I will follow your request and not argue. Since that opinion is 
a central point, not discussing it does not leave much to say.

-- 
Terry Jan Reedy


From victor.stinner at haypocalc.com  Thu Dec 15 20:45:42 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 15 Dec 2011 20:45:42 +0100
Subject: [Python-Dev] French sprint this week-end
Message-ID: <4EEA4E66.3040008@haypocalc.com>

Hi,

I am organizing an online sprint on CPython this weekend with French 
developers. At least six developers will participate; some of them don't 
know C, most know Python.

Do you know of a simple task for getting started with contributing to 
Python? Something useful and not boring if possible :-) There is the 
"easy" tag on the bug tracker, but many issues have a long history, 
already have a patch, etc. Do you know of other generic tasks, like 
improving code coverage or support for some rare platforms?

Eric Araujo, Antoine Pitrou and Charles-François Natali should help me, 
so I'm not alone in organizing the sprint.

Don't watch the buildbot until Monday. You can expect more activity on 
our bug tracker (and maybe on the #python-dev channel) ;-)

--

If you speak French, join the #python-dev-fr IRC channel (on Freenode) and 
see the wiki page http://wiki.python.org/moin/SprintFranceDec2011

Victor

From nadeem.vawda at gmail.com  Thu Dec 15 22:07:55 2011
From: nadeem.vawda at gmail.com (Nadeem Vawda)
Date: Thu, 15 Dec 2011 23:07:55 +0200
Subject: [Python-Dev] [Python-checkins] cpython: input() in this sense
	is gone
In-Reply-To: <E1RbIA9-0006Oo-Ql@dinsdale.python.org>
References: <E1RbIA9-0006Oo-Ql@dinsdale.python.org>
Message-ID: <CANF4RM=Oq+x+SEykSHaZ+pt+AEHN-f2m4hu4M_1iq19z0Pvv=Q@mail.gmail.com>

On Thu, Dec 15, 2011 at 10:44 PM, benjamin.peterson
<python-checkins at python.org> wrote:
> +#       eval_input is the input for the eval() functions.

Shouldn't this be "function" rather than "functions"?

From mark at hotpy.org  Thu Dec 15 23:18:18 2011
From: mark at hotpy.org (Mark Shannon)
Date: Thu, 15 Dec 2011 22:18:18 +0000
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>	<j5k46t$dc6$1@dough.gmane.org>	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>
	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>
Message-ID: <4EEA722A.10403@hotpy.org>

Hi all,

The current dict implementation is getting pretty old,
isn't it time we had a new one (for xmas)?

I have a new dict implementation which allows sharing of keys between 
objects of the same class.
You can check it out here:
http://bitbucket.org/markshannon/hotpy_new_dict

Performance:

For numerical applications, with few instances of user-defined classes,
performance is pretty much unchanged, degrading about 1% for pystones.

For applications that create lots of instances of user-defined classes,
performance is improved and memory savings are large.

For the gcbench benchmark (from unladen swallow),
cpython with the new dict is about 9% faster and, more importantly,
reduces memory use from 99 Mbytes to 61Mbytes (a 38% reduction).

All tests were done on my ancient 32 bit intel linux  machine,
please try it out on your machines and let me know what sort of results 
you get.

By the way it passes all the tests,
but there are strange interactions with weakrefs and the GC.
(Try running the tests, you'll see what I mean)


Cheers,
Mark

From solipsis at pitrou.net  Fri Dec 16 00:15:16 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 16 Dec 2011 00:15:16 +0100
Subject: [Python-Dev] A new dict for Xmas?
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
	<j5k46t$dc6$1@dough.gmane.org>
	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>
	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>
	<4EEA722A.10403@hotpy.org>
Message-ID: <20111216001516.3698109e@pitrou.net>

On Thu, 15 Dec 2011 22:18:18 +0000
Mark Shannon <mark at hotpy.org> wrote:
> 
> For the gcbench benchmark (from unladen swallow),
> cpython with the new dict is about 9% faster and, more importantly,
> reduces memory use from 99 Mbytes to 61Mbytes (a 38% reduction).
> 
> All tests were done on my ancient 32 bit intel linux  machine,
> please try it out on your machines and let me know what sort of results 
> you get.

Benchmark results under a Core i5, 64-bit Linux:

Report on Linux localhost.localdomain 2.6.38.8-desktop-8.mga #1 SMP Fri
Nov 4 00:05:53 UTC 2011 x86_64 x86_64 Total CPU cores: 4

### call_method ###
Min: 0.292352 -> 0.274041: 1.07x faster
Avg: 0.292978 -> 0.277124: 1.06x faster
Significant (t=17.31)
Stddev: 0.00053 -> 0.00351: 6.5719x larger

### call_method_slots ###
Min: 0.284101 -> 0.273508: 1.04x faster
Avg: 0.285029 -> 0.274534: 1.04x faster
Significant (t=26.86)
Stddev: 0.00068 -> 0.00135: 1.9969x larger

### call_simple ###
Min: 0.225191 -> 0.222104: 1.01x faster
Avg: 0.227443 -> 0.222776: 1.02x faster
Significant (t=9.53)
Stddev: 0.00181 -> 0.00056: 3.2266x smaller

### fastpickle ###
Min: 0.482402 -> 0.493695: 1.02x slower
Avg: 0.486077 -> 0.496568: 1.02x slower
Significant (t=-5.35)
Stddev: 0.00340 -> 0.00276: 1.2335x smaller

### fastunpickle ###
Min: 0.394846 -> 0.433733: 1.10x slower
Avg: 0.397362 -> 0.436318: 1.10x slower
Significant (t=-23.73)
Stddev: 0.00234 -> 0.00283: 1.2129x larger

### float ###
Min: 0.052567 -> 0.051377: 1.02x faster
Avg: 0.053812 -> 0.052669: 1.02x faster
Significant (t=3.72)
Stddev: 0.00110 -> 0.00107: 1.0203x smaller

### json_dump ###
Min: 0.381395 -> 0.391053: 1.03x slower
Avg: 0.381937 -> 0.393219: 1.03x slower
Significant (t=-7.15)
Stddev: 0.00043 -> 0.00350: 8.1447x larger

### json_load ###
Min: 0.347112 -> 0.369763: 1.07x slower
Avg: 0.347490 -> 0.370317: 1.07x slower
Significant (t=-69.64)
Stddev: 0.00045 -> 0.00058: 1.2717x larger

### nbody ###
Min: 0.238068 -> 0.219208: 1.09x faster
Avg: 0.238951 -> 0.220000: 1.09x faster
Significant (t=36.09)
Stddev: 0.00076 -> 0.00090: 1.1863x larger

### nqueens ###
Min: 0.262282 -> 0.252576: 1.04x faster
Avg: 0.263835 -> 0.254497: 1.04x faster
Significant (t=7.12)
Stddev: 0.00117 -> 0.00269: 2.2914x larger

### regex_effbot ###
Min: 0.060298 -> 0.057791: 1.04x faster
Avg: 0.060435 -> 0.058128: 1.04x faster
Significant (t=17.82)
Stddev: 0.00012 -> 0.00026: 2.1761x larger

### richards ###
Min: 0.148266 -> 0.143755: 1.03x faster
Avg: 0.150677 -> 0.145003: 1.04x faster
Significant (t=5.74)
Stddev: 0.00200 -> 0.00094: 2.1329x smaller

### silent_logging ###
Min: 0.057191 -> 0.059082: 1.03x slower
Avg: 0.057335 -> 0.059194: 1.03x slower
Significant (t=-17.40)
Stddev: 0.00020 -> 0.00013: 1.4948x smaller

### unpack_sequence ###
Min: 0.000046 -> 0.000042: 1.10x faster
Avg: 0.000048 -> 0.000044: 1.09x faster
Significant (t=128.98)
Stddev: 0.00000 -> 0.00000: 1.8933x smaller


gcbench first showed no memory consumption difference (using "ps -u").
I then removed the "stretch tree" (which apparently reserves memory
upfront) and I saw a ~30% memory saving as well as a 20% performance
improvement on large sizes.

Regards

Antoine.



From martin at v.loewis.de  Fri Dec 16 00:16:29 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 16 Dec 2011 00:16:29 +0100
Subject: [Python-Dev] Compiling the source without stat
In-Reply-To: <4EE9E11B.6090202@gmail.com>
References: <20111214192629.GA2054@ihaa> <4EE9E11B.6090202@gmail.com>
Message-ID: <4EEA7FCD.3060805@v.loewis.de>

On 15.12.2011 12:59, Hossein wrote:
> I wanted to say something in the bug page petri showed (
> http://bugs.python.org/issue12082 ) however I though about first
> discussing it here. If faking a stat struct and a function to fill it
> solves the problem, and checking for existing files and folders is the
> only thing that python needs to be compiled (i'm talking about 2.7) then
> it's possible to fail-check it by just trying to open the file.

That's not true. It also looks at the file modification time.

Regards,
Martin

From ncoghlan at gmail.com  Fri Dec 16 00:18:16 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 16 Dec 2011 09:18:16 +1000
Subject: [Python-Dev] [Python-checkins] cpython: improve abstract
 property support (closes #11610)
In-Reply-To: <E1RbI0I-0005Iw-A0@dinsdale.python.org>
References: <E1RbI0I-0005Iw-A0@dinsdale.python.org>
Message-ID: <CADiSq7dWDtS1BacWSDmHos89Y5Gr0_rOvpNajOa_HkwmtjdV2g@mail.gmail.com>

On Fri, Dec 16, 2011 at 6:34 AM, benjamin.peterson
<python-checkins at python.org> wrote:
> +abc
> +---
> +
> +Improved support for abstract base classes containing descriptors composed with
> +abstract methods. The recommended approach to declaring abstract descriptors is
> +now to provide :attr:`__isabstractmethod__` as a dynamically updated
> +property. The built-in descriptors have been updated accordingly.
> +
> +  * :class:`abc.abstractproperty` has been deprecated, use :class:`property`
> +    with :func:`abc.abstractmethod` instead.
> +  * :class:`abc.abstractclassmethod` has been deprecated, use
> +    :class:`classmethod` with :func:`abc.abstractmethod` instead.
> +  * :class:`abc.abstractstaticmethod` has been deprecated, use
> +    :class:`property` with :func:`abc.abstractmethod` instead.
> +
> +(Contributed by Darren Dale in :issue:`11610`)

s/property/staticmethod/ in the final bullet point here.
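
For reference, a small sketch of the three stacked forms the entry
recommends, with staticmethod in the last one. The class and method
names are invented for illustration, and abc.ABC is a convenience base
class that arrived in a later release than the one discussed here.

```python
import abc


class Shape(abc.ABC):
    @property
    @abc.abstractmethod
    def area(self):
        """Replaces abc.abstractproperty."""

    @classmethod
    @abc.abstractmethod
    def unit(cls):
        """Replaces abc.abstractclassmethod."""

    @staticmethod
    @abc.abstractmethod
    def describe():
        """Replaces abc.abstractstaticmethod."""


class Square(Shape):
    def __init__(self, side):
        self.side = side

    @property
    def area(self):
        return self.side ** 2

    @classmethod
    def unit(cls):
        return "m^2"

    @staticmethod
    def describe():
        return "a square"
```

In each case abc.abstractmethod is the innermost decorator, so the
built-in descriptor wrapped around it can report itself as abstract.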

Cheers,
Nick.
-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From mark at hotpy.org  Fri Dec 16 00:43:35 2011
From: mark at hotpy.org (Mark Shannon)
Date: Thu, 15 Dec 2011 23:43:35 +0000
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <20111216001516.3698109e@pitrou.net>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>	<j5k46t$dc6$1@dough.gmane.org>	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>	<4EEA722A.10403@hotpy.org>
	<20111216001516.3698109e@pitrou.net>
Message-ID: <4EEA8627.5030500@hotpy.org>

Antoine Pitrou wrote:
> On Thu, 15 Dec 2011 22:18:18 +0000
> Mark Shannon <mark at hotpy.org> wrote:
>> For the gcbench benchmark (from unladen swallow),
>> cpython with the new dict is about 9% faster and, more importantly,
>> reduces memory use from 99 Mbytes to 61Mbytes (a 38% reduction).
>>
>> All tests were done on my ancient 32 bit intel linux  machine,
>> please try it out on your machines and let me know what sort of results 
>> you get.
> 
> Benchmark results under a Core i5, 64-bit Linux:
> 
> Report on Linux localhost.localdomain 2.6.38.8-desktop-8.mga #1 SMP Fri
> Nov 4 00:05:53 UTC 2011 x86_64 x86_64 Total CPU cores: 4
> 
> ### call_method ###
> Min: 0.292352 -> 0.274041: 1.07x faster
> Avg: 0.292978 -> 0.277124: 1.06x faster
> Significant (t=17.31)
> Stddev: 0.00053 -> 0.00351: 6.5719x larger
> 
> ### call_method_slots ###
> Min: 0.284101 -> 0.273508: 1.04x faster
> Avg: 0.285029 -> 0.274534: 1.04x faster
> Significant (t=26.86)
> Stddev: 0.00068 -> 0.00135: 1.9969x larger
> 
> ### call_simple ###
> Min: 0.225191 -> 0.222104: 1.01x faster
> Avg: 0.227443 -> 0.222776: 1.02x faster
> Significant (t=9.53)
> Stddev: 0.00181 -> 0.00056: 3.2266x smaller
> 
> ### fastpickle ###
> Min: 0.482402 -> 0.493695: 1.02x slower
> Avg: 0.486077 -> 0.496568: 1.02x slower
> Significant (t=-5.35)
> Stddev: 0.00340 -> 0.00276: 1.2335x smaller
> 
> ### fastunpickle ###
> Min: 0.394846 -> 0.433733: 1.10x slower
> Avg: 0.397362 -> 0.436318: 1.10x slower
> Significant (t=-23.73)
> Stddev: 0.00234 -> 0.00283: 1.2129x larger
> 
> ### float ###
> Min: 0.052567 -> 0.051377: 1.02x faster
> Avg: 0.053812 -> 0.052669: 1.02x faster
> Significant (t=3.72)
> Stddev: 0.00110 -> 0.00107: 1.0203x smaller
> 
> ### json_dump ###
> Min: 0.381395 -> 0.391053: 1.03x slower
> Avg: 0.381937 -> 0.393219: 1.03x slower
> Significant (t=-7.15)
> Stddev: 0.00043 -> 0.00350: 8.1447x larger
> 
> ### json_load ###
> Min: 0.347112 -> 0.369763: 1.07x slower
> Avg: 0.347490 -> 0.370317: 1.07x slower
> Significant (t=-69.64)
> Stddev: 0.00045 -> 0.00058: 1.2717x larger
> 
> ### nbody ###
> Min: 0.238068 -> 0.219208: 1.09x faster
> Avg: 0.238951 -> 0.220000: 1.09x faster
> Significant (t=36.09)
> Stddev: 0.00076 -> 0.00090: 1.1863x larger
> 
> ### nqueens ###
> Min: 0.262282 -> 0.252576: 1.04x faster
> Avg: 0.263835 -> 0.254497: 1.04x faster
> Significant (t=7.12)
> Stddev: 0.00117 -> 0.00269: 2.2914x larger
> 
> ### regex_effbot ###
> Min: 0.060298 -> 0.057791: 1.04x faster
> Avg: 0.060435 -> 0.058128: 1.04x faster
> Significant (t=17.82)
> Stddev: 0.00012 -> 0.00026: 2.1761x larger
> 
> ### richards ###
> Min: 0.148266 -> 0.143755: 1.03x faster
> Avg: 0.150677 -> 0.145003: 1.04x faster
> Significant (t=5.74)
> Stddev: 0.00200 -> 0.00094: 2.1329x smaller
> 
> ### silent_logging ###
> Min: 0.057191 -> 0.059082: 1.03x slower
> Avg: 0.057335 -> 0.059194: 1.03x slower
> Significant (t=-17.40)
> Stddev: 0.00020 -> 0.00013: 1.4948x smaller
> 
> ### unpack_sequence ###
> Min: 0.000046 -> 0.000042: 1.10x faster
> Avg: 0.000048 -> 0.000044: 1.09x faster
> Significant (t=128.98)
> Stddev: 0.00000 -> 0.00000: 1.8933x smaller

Thanks for running the benchmarks.
It's probably best not to attach too much significance to
a few percent here and there, but it's good to see that performance is OK.

> 
> 
> gcbench first showed no memory consumption difference (using "ps -u").
> I then removed the "stretch tree" (which apparently reserves memory
> upfront) and I saw a ~30% memory saving as well as a 20% performance
> improvement on large sizes.

I should say how I did my memory tests.
I did a search using ulimit to limit the maximum amount of memory the 
process was allowed. The numbers given were the minimum required for the 
benchmark to complete; I did not remove the "stretch tree".
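
The message does not say how the search was carried out. One plausible
form is a bisection over the memory limit, sketched here under the
assumption that a run which completes at some limit also completes at
every larger one. The completes() callback is hypothetical; in practice
it would run the benchmark under the given ulimit and report whether it
finished.

```python
def minimum_passing_limit(completes, low, high):
    """Return the smallest limit in [low, high] at which completes() is true.

    Assumes completes(limit) is false below some threshold and true at
    and above it.
    """
    if not completes(high):
        raise ValueError("does not complete even at the upper bound")
    while low < high:
        mid = (low + high) // 2
        if completes(mid):
            high = mid      # mid is enough; try a smaller limit
        else:
            low = mid + 1   # mid is too small
    return low
```

With a stand-in predicate such as "lambda mb: mb >= 61", searching the
range 1 to 256 settles on 61.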

Cheers,
Mark.

From ron3200 at gmail.com  Fri Dec 16 05:15:53 2011
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 15 Dec 2011 22:15:53 -0600
Subject: [Python-Dev] generators and ceval
Message-ID: <1324008953.18721.18.camel@Gutsy>


Hi, I just added issue 13607 with a patch that moves the generator-specific
checks and code out of the ceval PyEval_EvalFrameEx() function.

Those parts were moved up into the generator gen_send_ex() function.

Doing that removed the generator flag checks from the eval loop and made
it a bit cleaner.  In order to do that, I needed to give generators a
way to look at the 'why' value.  Doing that also cleaned up the code in
gen_send_ex(), as it can use the 'why' in a select instead of several
indirect if tests.

http://bugs.python.org/issue13607

Altogether it made yields about 10% faster, and everything else about
2%-3% faster (on average).   But it does need to be checked.

Cheers,
   Ron



From greg.ewing at canterbury.ac.nz  Fri Dec 16 06:57:06 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 16 Dec 2011 18:57:06 +1300
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EEA722A.10403@hotpy.org>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
	<j5k46t$dc6$1@dough.gmane.org>
	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>
	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>
	<4EEA722A.10403@hotpy.org>
Message-ID: <4EEADDB2.2020202@canterbury.ac.nz>

Mark Shannon wrote:

> I have a new dict implementation which allows sharing of keys between 
> objects of the same class.

We already have the __slots__ mechanism for memory savings.
Have you done any comparisons with that?

Seems to me that __slots__ ought to save even more memory,
since it eliminates the per-instance dict altogether rather
than just the keys half of it.
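
A minimal illustration of the point about the per-instance dict, with
class names invented for the example:

```python
class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ("x", "y")  # fixed attribute set, no per-instance dict

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PlainPoint(1, 2)
slotted = SlottedPoint(1, 2)

print(hasattr(plain, "__dict__"))    # the plain instance carries a dict
print(hasattr(slotted, "__dict__"))  # the slotted instance does not
```

The trade-off is that a slotted instance cannot grow new attributes at
run time, which a key-sharing dict would still allow.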

-- 
Greg

From stefan_ml at behnel.de  Fri Dec 16 07:53:09 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 16 Dec 2011 07:53:09 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jbsfar$en7$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>
Message-ID: <jcepsn$jpi$1@dough.gmane.org>

Stefan Behnel, 09.12.2011 09:02:
> I think Py3.3 would be a good milestone for cleaning up the stdlib support
> for XML.
> [...]

I still think it is, so let me sum up the current discussion here.


> What should change?
>
> a) The stdlib documentation should help users to choose the right tool
> right from the start.

It looks like there's agreement on this part.


> Instead of using the totally misleading wording that
> it uses now, it should be honest about the performance characteristics of
> MiniDOM and should actively suggest that those who don't know what to
> choose (or even *that* they can choose) should not use MiniDOM in the first
> place.

There was some disagreement on whether MiniDOM should publicly disclose its 
performance characteristics in the documentation, and whether its use 
should be discouraged, even just for new users.

However, it seemed that there was enough consensus to settle on Nick 
Coghlan's proposal for a compromise to move ElementTree up to the top of 
the list, and to add a visible note to the top of each of the XML modules 
like this:

"Note: The
<whatever> module is a <yada, yada, DOM based, whatever>. If all you
are trying to do is read and write XML files, consider using the
xml.etree.ElementTree module instead"

That template could (with a bit of peeking into the getopt documentation) 
be expanded into the following.

"""
[[Note: The xml.dom.minidom module provides an implementation of the 
W3C-DOM whose API is similar to that in other programming languages. Users 
who are unfamiliar with the W3C-DOM interface or who would like to write 
less code for processing XML files should consider using the 
xml.etree.ElementTree module instead.]]
"""

I think this should go on the xml.dom.minidom page as well as the xml.dom 
package page. Hand-wavingly, users who are new to the DOM are more likely 
to hit the package page first, whereas those who know it already will 
likely find the MiniDOM page directly.
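
To illustrate the "less code" point in the proposed note, here is a small sketch that extracts the same data through both APIs. The sample document and variable names are invented for the example; it assumes Python 3 and only uses stdlib calls.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

XML = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)

# minidom: W3C-DOM style, element text lives in separate child text nodes.
dom = xml.dom.minidom.parseString(XML)
titles_dom = [
    book.getElementsByTagName("title")[0].firstChild.data
    for book in dom.getElementsByTagName("book")
]

# ElementTree: elements are list-like, text is reached via findtext()/.text.
root = ET.fromstring(XML)
titles_et = [book.findtext("title") for book in root.iter("book")]

print(titles_dom, titles_et)
```

Both approaches yield the same list of titles; the difference is only in how much API the user has to know to get there.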

Note that I'd still encourage the removal of the misleading word 
"lightweight" until it makes sense to put it back in a meaningful way. I 
therefore propose the following minimalistic changes to the first paragraph 
on the minidom page:

"""
xml.dom.minidom is a [-XXX: light-weight] implementation of the Document 
Object Model interface. It is intended to be simpler than the full DOM and 
also [+XXX: provide a] significantly smaller [+XXX: API].
"""

@Martin: note how the original paragraph does not refer to "4DOM" or 
"PyXML". It only generically mentions "the DOM interface". It is certainly 
not true that MiniDOM is more "light-weight" and "significantly smaller" 
than (most) other DOM interface implementations outside of the Python 
world, for example. So the current wording actually makes no sense at all.

Additionally, the documentation on the xml.sax page would benefit from the 
following paragraph:

"""
[[Note: The xml.sax package provides an implementation of the SAX interface 
whose API is similar to that in other programming languages. Users who are 
unfamiliar with the SAX interface or who would like to write less code for 
efficient stream processing of XML files should consider using the 
iterparse() function in the xml.etree.ElementTree module instead.]]
"""
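
As a rough illustration of what such a note would be pointing at, the sketch below collects the same data once with a SAX handler and once with iterparse(). The sample data and the ErrorCollector class are invented for the example; it assumes Python 3.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

XML = (
    b"<log>"
    b"<entry level='info'>start</entry>"
    b"<entry level='error'>disk full</entry>"
    b"</log>"
)


class ErrorCollector(xml.sax.ContentHandler):
    """SAX: state has to be tracked by hand across the callbacks."""

    def __init__(self):
        super().__init__()
        self.errors = []
        self._in_error = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "entry" and attrs.get("level") == "error":
            self._in_error = True
            self._buf = []

    def characters(self, content):
        if self._in_error:
            self._buf.append(content)

    def endElement(self, name):
        if name == "entry" and self._in_error:
            self.errors.append("".join(self._buf))
            self._in_error = False


handler = ErrorCollector()
xml.sax.parseString(XML, handler)
errors_sax = handler.errors

# iterparse: each "end" event hands over a complete element.
errors_et = [
    elem.text
    for _event, elem in ET.iterparse(io.BytesIO(XML), events=("end",))
    if elem.tag == "entry" and elem.get("level") == "error"
]

print(errors_sax, errors_et)
```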

If these changes are considered acceptable, I'll copy the above over to the 
documentation bug I opened at

http://bugs.python.org/issue11379

Can these doc changes go into both 2.7 and 3.3? Given that there is no 
important difference between the implementations, I don't see why the 
documentation should differ in Py2.


> b) cElementTree should finally lose its "special" status as a separate
> library and disappear as an accelerator module behind ElementTree.

There was no opposition and a general agreement on this in the thread, 
except for the warning that Fredrik Lundh should have a say in this. I 
wrote him an e-mail and haven't received a response so far. We can wait a 
little longer, I guess; there's still time before 3.3beta.

Stefan


From ncoghlan at gmail.com  Fri Dec 16 09:54:17 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 16 Dec 2011 18:54:17 +1000
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jcepsn$jpi$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>
	<jcepsn$jpi$1@dough.gmane.org>
Message-ID: <CADiSq7e8fqms7SrjP-rYb4B3=PKhTzLW1iAUPv5hP8RNsSAsdA@mail.gmail.com>

On Fri, Dec 16, 2011 at 4:53 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> If these changes are considered acceptable, I'll copy the above over to the
> documentation bug I opened at
>
> http://bugs.python.org/issue11379
>
> Can these doc changes go into both 2.7 and 3.3? Given that there is no
> important difference between the implementations, I don't see why the
> documentation should differ in Py2.

Your suggested tweaks look good to me and could go into all of 2.7, 3.2 and 3.3

>> b) cElementTree should finally lose its "special" status as a separate
>> library and disappear as an accelerator module behind ElementTree.
>
> There was no opposition and a general agreement on this in the thread,
> except for the warning that Fredrik Lundh should have a word in this. I
> wrote him an e-mail and didn't get a response so far. We can wait a little
> longer, I guess, there's still time before 3.3beta.

Having ElementTree implicitly do "from _elementtree import *" is a 3.3
only change, though. (Note that xml.etree.cElementTree isn't the
actual acceleration module - that honor already goes to
"_elementtree". The only bit missing is the automatic import in
xml.etree.ElementTree and the appropriate test updates to ensure the
Python version still gets tested)
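
The general shape of that pattern, as a hedged sketch: the pure-Python definitions come first, and an optional C module overrides them when it can be imported. The module name `_checksum_accel_hypothetical` and the `checksum` function are invented for illustration (the real accelerator here would be `_elementtree`), and the sketch imports one name explicitly where the change described above would use a star import.

```python
# Pure-Python reference implementation, always defined first.
def checksum(data):
    """Sum of byte values modulo 256 (toy stand-in for real work)."""
    return sum(bytearray(data)) % 256


using_accelerator = False
try:
    # Hypothetical C accelerator: if it is importable, its function
    # replaces the pure-Python one defined above.
    from _checksum_accel_hypothetical import checksum
    using_accelerator = True
except ImportError:
    pass  # no accelerator available, keep the pure-Python definition

print(checksum(b"abc"), using_accelerator)
```

Because the override happens silently, the test suite has to import the module both with and without the accelerator, otherwise the Python version stops being exercised.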

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From stefan at bytereef.org  Fri Dec 16 10:00:29 2011
From: stefan at bytereef.org (Stefan Krah)
Date: Fri, 16 Dec 2011 10:00:29 +0100
Subject: [Python-Dev] French sprint this week-end
In-Reply-To: <4EEA4E66.3040008@haypocalc.com>
References: <4EEA4E66.3040008@haypocalc.com>
Message-ID: <20111216090029.GA30463@sleipnir.bytereef.org>

Victor Stinner <victor.stinner at haypocalc.com> wrote:
> Do you know simple task to start contributing to Python? Something  
> useful and not boring if possible :-) There is the "easy" tag on the bug  
> tracker, but many issues have a long history, already have a patch, etc.  
> Do know other generic task like improving code coverage or support of  
> some rare platforms?

On some buildbots compiler warnings are starting to accumulate. Installing
a recent version of gcc and fixing those might be a good task. If the
participants are new to buildbot, it might even be interesting for them. :)


Stefan Krah



From eliben at gmail.com  Fri Dec 16 10:17:33 2011
From: eliben at gmail.com (Eli Bendersky)
Date: Fri, 16 Dec 2011 11:17:33 +0200
Subject: [Python-Dev] French sprint this week-end
In-Reply-To: <20111216090029.GA30463@sleipnir.bytereef.org>
References: <4EEA4E66.3040008@haypocalc.com>
	<20111216090029.GA30463@sleipnir.bytereef.org>
Message-ID: <CAF-Rda8fr8d25qeMk--Fg92uQz=esYTXRUGk5ofSP5SZB6F72A@mail.gmail.com>

On Fri, Dec 16, 2011 at 11:00, Stefan Krah <stefan at bytereef.org> wrote:

> Victor Stinner <victor.stinner at haypocalc.com> wrote:
> > Do you know simple task to start contributing to Python? Something
> > useful and not boring if possible :-) There is the "easy" tag on the bug
> > tracker, but many issues have a long history, already have a patch, etc.
> > Do know other generic task like improving code coverage or support of
> > some rare platforms?
>
> On some buildbots compiler warnings are starting to accumulate. Installing
> a recent version of gcc and fixing those might be a good task. If the
> participants are new to buildbot, it might even be interesting for them. :)
>
>
Do we have buildbots that build Python with Clang instead of GCC? The
reason I'm asking is that Clang's diagnostics are usually better, and
fixing all its warnings could nicely complement fixing GCC's qualms.

Eli

From dirkjan at ochtman.nl  Fri Dec 16 10:32:11 2011
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Fri, 16 Dec 2011 10:32:11 +0100
Subject: [Python-Dev] French sprint this week-end
In-Reply-To: <CAF-Rda8fr8d25qeMk--Fg92uQz=esYTXRUGk5ofSP5SZB6F72A@mail.gmail.com>
References: <4EEA4E66.3040008@haypocalc.com>
	<20111216090029.GA30463@sleipnir.bytereef.org>
	<CAF-Rda8fr8d25qeMk--Fg92uQz=esYTXRUGk5ofSP5SZB6F72A@mail.gmail.com>
Message-ID: <CAKmKYaA1exex9MRiuaCM=qoZbq-V-Oi57mRVwuziRkypCk=Q0w@mail.gmail.com>

On Fri, Dec 16, 2011 at 10:17, Eli Bendersky <eliben at gmail.com> wrote:
> Do we have buildbots that build Python with Clang instead of GCC? The reason
> I'm asking is that Clang's diagnostics are usually better, and fixing all
> its warnings could nicely complement fixing GCC's qualms.

The box running my buildslave has clang installed, so someone with
access to the buildmaster could probably set that up without too much
trouble.

Cheers,

Dirkjan

From mark at hotpy.org  Fri Dec 16 11:03:30 2011
From: mark at hotpy.org (Mark Shannon)
Date: Fri, 16 Dec 2011 10:03:30 +0000
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EEADDB2.2020202@canterbury.ac.nz>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>	<j5k46t$dc6$1@dough.gmane.org>	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>	<4EEA722A.10403@hotpy.org>
	<4EEADDB2.2020202@canterbury.ac.nz>
Message-ID: <4EEB1772.1030300@hotpy.org>

Greg Ewing wrote:
> Mark Shannon wrote:
> 
>> I have a new dict implementation which allows sharing of keys between 
>> objects of the same class.
> 
> We already have the __slots__ mechanism for memory savings.
> Have you done any comparisons with that?
> 

You can't make Python programmers use slots, neither can you
automatically change existing programs.

Are you suggesting that because the __slots__ mechanism exists,
the dict implementation doesn't have to be efficient?

> Seems to me that __slots__ ought to save even more memory,
> since it eliminates the per-instance dict altogether rather
> than just the keys half of it.
> 

Of course using __slots__ saves more memory,
but people don't use them much.

Cheers,
Mark.


From stefan_ml at behnel.de  Fri Dec 16 17:00:26 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 16 Dec 2011 17:00:26 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jcau5m$89n$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org>	<4EE1C9AB.2040301@v.loewis.de>	<jbsile$4vu$1@dough.gmane.org>	<4EE53139.8020500@v.loewis.de>	<jc4g2m$5hn$1@dough.gmane.org>	<4EE8E784.2050406@v.loewis.de>
	<jcau5m$89n$1@dough.gmane.org>
Message-ID: <jcfpur$lpd$1@dough.gmane.org>

Stefan Behnel, 14.12.2011 20:41:
> It's clear from the
> discussion that there are still users and that new code is still being
> written that uses MiniDOM. However, I would argue that this cannot possibly
> be performance critical code and that it only deals with somewhat small
> documents. I say that because MiniDOM is evidently not suitable for large
> documents or performance critical applications, so this is the only
> explanation I have why the performance problems would not be obvious in the
> cases where it is still being used. And if they do show, it appears to be
> much more likely that users rewrite their code using ElementTree or lxml
> than that they try to fix MiniDOM's performance issues.

Out of curiosity, I reran my benchmarks under PyPy 1.7.

http://blog.behnel.de/index.php?p=210

In short: MiniDOM performs substantially better there, both in terms of 
time and space. That by itself doesn't make PyPy an interesting platform 
for XML processing (using lxml in CPython is way faster), but I found it 
interesting to note that the problem is not strictly inherent in MiniDOM. 
It also depends a lot on the runtime environment, even when it comes to 
memory usage.

Stefan


From devel at baptiste-carvello.net  Fri Dec 16 17:40:02 2011
From: devel at baptiste-carvello.net (Baptiste Carvello)
Date: Fri, 16 Dec 2011 17:40:02 +0100
Subject: [Python-Dev] Fixing the XML batteries
In-Reply-To: <jcepsn$jpi$1@dough.gmane.org>
References: <jbsfar$en7$1@dough.gmane.org> <jcepsn$jpi$1@dough.gmane.org>
Message-ID: <jcfs8p$7ar$1@dough.gmane.org>

On 16/12/2011 07:53, Stefan Behnel wrote:

> Additionally, the documentation on the xml.sax page would benefit from
> the following paragraph:
> 
> """
> [[Note: The xml.sax package provides an implementation of the SAX
> interface whose API is similar to that in other programming languages.
> Users who are unfamiliar with the SAX interface or who would like to
> write less code for efficient stream processing of XML files should
> consider using the iterparse() function in the xml.etree.ElementTree
> module instead.]]
> """
> 

A small caveat to note about iterparse(), which I otherwise like a lot:
when processing very big data (I encountered this with a region-wide
openstreetmap XML dump), you have to remove the processed nodes from the
root element. Otherwise, its memory footprint increases with the size of
the document.
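
A minimal sketch of that idiom, assuming Python 3 and a document whose interesting elements are direct children of the root. The generated sample data and the function name are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a large dump: 1000 <node/> elements under one root.
XML = (
    "<osm>" + "".join("<node id='%d'/>" % i for i in range(1000)) + "</osm>"
).encode("ascii")


def count_nodes(source):
    """Count <node> elements while keeping the in-memory tree small."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    _event, root = next(context)  # the first event is the start of the root
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == "node":
            count += 1
            root.clear()  # drop the children that were already processed
    return count, len(root)


processed, left_in_root = count_nodes(io.BytesIO(XML))
print(processed, left_in_root)
```

Note that root.clear() also resets the root's own attributes and text, so anything needed from the root element should be read before the loop.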


From status at bugs.python.org  Fri Dec 16 18:07:29 2011
From: status at bugs.python.org (Python tracker)
Date: Fri, 16 Dec 2011 18:07:29 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20111216170729.974CD1DEC6@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2011-12-09 - 2011-12-16)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    3175 ( +6)
  closed 22220 (+40)
  total  25395 (+46)

Open issues with patches: 1360 


Issues opened (31)
==================

#11886: test_time.test_tzset() fails on "x86 FreeBSD 7.2 3.x": AEST ti
http://bugs.python.org/issue11886  reopened by haypo

#13571: Backup files support in IDLE
http://bugs.python.org/issue13571  opened by maniram.maniram

#13572: import _curses fails because of UnicodeDecodeError('utf8' code
http://bugs.python.org/issue13572  opened by haypo

#13573: csv.writer uses str() for floats instead of repr()
http://bugs.python.org/issue13573  opened by rhettinger

#13574: refresh example in doc for Extending and Embedding
http://bugs.python.org/issue13574  opened by flox

#13576: Handling of broken condcoms in HTMLParser
http://bugs.python.org/issue13576  opened by ezio.melotti

#13577: __qualname__ is not present on builtin methods and functions
http://bugs.python.org/issue13577  opened by meador.inge

#13578: Add subprocess.iter_output() convenience function
http://bugs.python.org/issue13578  opened by ncoghlan

#13579: string.Formatter doesn't understand the !a conversion specifie
http://bugs.python.org/issue13579  opened by ncoghlan

#13581: help() appears to be broken; doesn't display __doc__ for class
http://bugs.python.org/issue13581  opened by christopherthemagnificent

#13582: IDLE and pythonw.exe stderr problem
http://bugs.python.org/issue13582  opened by serwy

#13583: sqlite3.Row doesn't support slice indexes
http://bugs.python.org/issue13583  opened by xapple

#13585: Add contextlib.CleanupManager
http://bugs.python.org/issue13585  opened by Nikratio

#13586: Replace selected not working/consistent with find
http://bugs.python.org/issue13586  opened by marco

#13587: Correcting the typos error in Doc/howto/urllib2.rst
http://bugs.python.org/issue13587  opened by Bithin.A

#13588: Change name of internal closure functions in importlib
http://bugs.python.org/issue13588  opened by brett.cannon

#13589: Aifc low level serialization primitives fix
http://bugs.python.org/issue13589  opened by Oleg.Plakhotnyuk

#13590: Prebuilt python-2.7.2 binaries for macosx can not compile c ex
http://bugs.python.org/issue13590  opened by teamnoir

#13592: repr(regex) doesn't include actual regex
http://bugs.python.org/issue13592  opened by dwt

#13594: Aifc markers write fix
http://bugs.python.org/issue13594  opened by Oleg.Plakhotnyuk

#13598: string.Formatter doesn't support empty curly braces "{}"
http://bugs.python.org/issue13598  opened by maniram.maniram

#13601: sys.stderr should be unbuffered (or always line-buffered)
http://bugs.python.org/issue13601  opened by pitrou

#13604: update PEP 393 (match implementation)
http://bugs.python.org/issue13604  opened by Jim.Jewett

#13605: document argparse's nargs=REMAINDER
http://bugs.python.org/issue13605  opened by bethard

#13607: Move generator specific sections out of ceval.
http://bugs.python.org/issue13607  opened by ron_adam

#13608: remove born-deprecated PyUnicode_AsUnicodeAndSize
http://bugs.python.org/issue13608  opened by Jim.Jewett

#13609: Add "os.get_terminal_size()" function
http://bugs.python.org/issue13609  opened by denilsonsa

#13610: On Python parsing numbers.
http://bugs.python.org/issue13610  opened by Jean-Michel.Fauth

#13611: Integrate ElementC14N module into xml.etree package
http://bugs.python.org/issue13611  opened by scoder

#13612: xml.etree.ElementTree says unknown encoding of a regular encod
http://bugs.python.org/issue13612  opened by dongying

#13613: Small error in regular expression poker hand example
http://bugs.python.org/issue13613  opened by Eddie E



Most recent 15 issues with no replies (15)
==========================================

#13611: Integrate ElementC14N module into xml.etree package
http://bugs.python.org/issue13611

#13608: remove born-deprecated PyUnicode_AsUnicodeAndSize
http://bugs.python.org/issue13608

#13605: document argparse's nargs=REMAINDER
http://bugs.python.org/issue13605

#13594: Aifc markers write fix
http://bugs.python.org/issue13594

#13590: Prebuilt python-2.7.2 binaries for macosx can not compile c ex
http://bugs.python.org/issue13590

#13587: Correcting the typos error in Doc/howto/urllib2.rst
http://bugs.python.org/issue13587

#13586: Replace selected not working/consistent with find
http://bugs.python.org/issue13586

#13583: sqlite3.Row doesn't support slice indexes
http://bugs.python.org/issue13583

#13579: string.Formatter doesn't understand the !a conversion specifie
http://bugs.python.org/issue13579

#13576: Handling of broken condcoms in HTMLParser
http://bugs.python.org/issue13576

#13574: refresh example in doc for Extending and Embedding
http://bugs.python.org/issue13574

#13565: test_multiprocessing.test_notify_all() hangs on "AMD64 Snow Le
http://bugs.python.org/issue13565

#13556: When tzinfo.utcoffset is out-of-bounds, the exception message 
http://bugs.python.org/issue13556

#13554: Tkinter doesn't use higher resolution app icon
http://bugs.python.org/issue13554

#13553: Tkinter doesn't set proper application name
http://bugs.python.org/issue13553



Most recent 15 issues waiting for review (15)
=============================================

#13613: Small error in regular expression poker hand example
http://bugs.python.org/issue13613

#13609: Add "os.get_terminal_size()" function
http://bugs.python.org/issue13609

#13607: Move generator specific sections out of ceval.
http://bugs.python.org/issue13607

#13604: update PEP 393 (match implementation)
http://bugs.python.org/issue13604

#13598: string.Formatter doesn't support empty curly braces "{}"
http://bugs.python.org/issue13598

#13594: Aifc markers write fix
http://bugs.python.org/issue13594

#13589: Aifc low level serialization primitives fix
http://bugs.python.org/issue13589

#13588: Change name of internal closure functions in importlib
http://bugs.python.org/issue13588

#13585: Add contextlib.CleanupManager
http://bugs.python.org/issue13585

#13583: sqlite3.Row doesn't support slice indexes
http://bugs.python.org/issue13583

#13582: IDLE and pythonw.exe stderr problem
http://bugs.python.org/issue13582

#13577: __qualname__ is not present on builtin methods and functions
http://bugs.python.org/issue13577

#13576: Handling of broken condcoms in HTMLParser
http://bugs.python.org/issue13576

#13567: HTTPError interface changes / breaks depending on what was pas
http://bugs.python.org/issue13567

#13564: ftplib and sendfile()
http://bugs.python.org/issue13564



Top 10 most discussed issues (10)
=================================

#13521: Make dict.setdefault() atomic
http://bugs.python.org/issue13521  18 msgs

#13585: Add contextlib.CleanupManager
http://bugs.python.org/issue13585  17 msgs

#13577: __qualname__ is not present on builtin methods and functions
http://bugs.python.org/issue13577  14 msgs

#13405: Add DTrace probes
http://bugs.python.org/issue13405  13 msgs

#13609: Add "os.get_terminal_size()" function
http://bugs.python.org/issue13609   9 msgs

#13516: Gzip old log files in rotating handlers
http://bugs.python.org/issue13516   8 msgs

#13592: repr(regex) doesn't include actual regex
http://bugs.python.org/issue13592   8 msgs

#13248: deprecated in 3.2, should be removed in 3.3
http://bugs.python.org/issue13248   7 msgs

#13604: update PEP 393 (match implementation)
http://bugs.python.org/issue13604   7 msgs

#1559549: ImportError needs attributes for module and file name
http://bugs.python.org/issue1559549   7 msgs



Issues closed (37)
==================

#2979: use_builtin_types in xmlrpc.server
http://bugs.python.org/issue2979  closed by python-dev

#4028: Problem compiling the multiprocessing module on sunos5
http://bugs.python.org/issue4028  closed by neologix

#4625: IDLE won't open anymore, .idlerc unaccessible
http://bugs.python.org/issue4625  closed by ned.deily

#6570: Tutorial clarity: section 4.7.2, parameters and arguments
http://bugs.python.org/issue6570  closed by ezio.melotti

#6695: PyXXX_ClearFreeList for dict, set, and list
http://bugs.python.org/issue6695  closed by pitrou

#8373: socket: AF_UNIX socket paths not handled according to PEP 383
http://bugs.python.org/issue8373  closed by pitrou

#8684: improvements to sched.py
http://bugs.python.org/issue8684  closed by giampaolo.rodola

#9404: IDLE won't launch on XP
http://bugs.python.org/issue9404  closed by ned.deily

#10350: errno is read too late
http://bugs.python.org/issue10350  closed by pitrou

#10364: IDLE: make .py default added extension on save
http://bugs.python.org/issue10364  closed by terry.reedy

#13449: sched - provide an "async" argument for run() method
http://bugs.python.org/issue13449  closed by giampaolo.rodola

#13479: pickle too picky on re-defined classes
http://bugs.python.org/issue13479  closed by gvanrossum

#13505: Bytes objects pickled in 3.x with protocol <=2 are unpickled i
http://bugs.python.org/issue13505  closed by alexandre.vassalotti

#13528: Rework performance FAQ
http://bugs.python.org/issue13528  closed by pitrou

#13543: shlex with string ending in space gives "ValueError: No closin
http://bugs.python.org/issue13543  closed by ekorn

#13544: Add __qualname__ to functools.WRAPPER_ASSIGNMENTS
http://bugs.python.org/issue13544  closed by meador.inge

#13545: Pydoc3.2: TypeError: unorderable types
http://bugs.python.org/issue13545  closed by haypo

#13547: Clean Lib/_sysconfigdata.py and Modules/_testembed
http://bugs.python.org/issue13547  closed by skrah

#13549: Incorrect nested list comprehension documentation
http://bugs.python.org/issue13549  closed by ezio.melotti

#13563: Make use of with statement in ftplib
http://bugs.python.org/issue13563  closed by giampaolo.rodola

#13568: sqlite3 convert_date error with DATE type
http://bugs.python.org/issue13568  closed by sherpya

#13569: Loggers cannot be pickled
http://bugs.python.org/issue13569  closed by vinay.sajip

#13570: Expose faster unicode<->ascii functions in the C-API
http://bugs.python.org/issue13570  closed by skrah

#13575: old style classes still alive
http://bugs.python.org/issue13575  closed by flox

#13580: Pre-linkage of CPython >=2.6 binary on Linux too fat (libssl, 
http://bugs.python.org/issue13580  closed by pitrou

#13584: argparse doesn't respect double quotes
http://bugs.python.org/issue13584  closed by bethard

#13591: import_module potentially imports a module twice
http://bugs.python.org/issue13591  closed by meador.inge

#13593: importlib needs to be updated for __qualname__
http://bugs.python.org/issue13593  closed by meador.inge

#13595: Weird behavior with generators with self-referencing output.
http://bugs.python.org/issue13595  closed by amaury.forgeotdarc

#13596: Only recompile Lib/_sysconfigdata.py when needed
http://bugs.python.org/issue13596  closed by python-dev

#13597: Improve documentation of stdout/stderr buffering in Python 3.x
http://bugs.python.org/issue13597  closed by pitrou

#13599: Compiled regexes don't show all attributes in dir()
http://bugs.python.org/issue13599  closed by ezio.melotti

#13600: rot_13 codec not working
http://bugs.python.org/issue13600  closed by petri.lehtinen

#13602: format string '%b' doesn't work as expected
http://bugs.python.org/issue13602  closed by James.Classen

#13603: Add prime-related and number theory functions to Python
http://bugs.python.org/issue13603  closed by maniram.maniram

#13606: test_clear_dict_in_ref_cycle in test_module only works by coin
http://bugs.python.org/issue13606  closed by python-dev

#11610: Improved support for abstract base classes with descriptors
http://bugs.python.org/issue11610  closed by python-dev

From jimjjewett at gmail.com  Fri Dec 16 21:14:02 2011
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 16 Dec 2011 15:14:02 -0500
Subject: [Python-Dev]  A new dict for Xmas?
Message-ID: <CA+OGgf6_DdEgVBovmnNLXeBUB_vt08WXiFrC-izhgeV+DBBPsQ@mail.gmail.com>

> Greg Ewing wrote:
>> Mark Shannon wrote:

>>> I have a new dict implementation which allows sharing of keys between
>>> objects of the same class.

>> We already have the __slots__ mechanism for memory savings.
>> Have you done any comparisons with that?

> You can't make Python programmers use slots, neither can you
> automatically change existing programs.

The automatic change is exactly what a dictionary upgrade provides.

I haven't read your patch in detail yet, but it sounds like you're
replacing the array of keys + array of values with just an array of
values, and getting the numerical index from a single per-class array
of keys.

That would normally be sensible (so thanks!), but it isn't a drop-in
replacement.  If you have a "Data" class intended to take arbitrary
per-instance attributes, it just forces them all to keep resizing up,
even though individual instances would be small with the current dict.

How is this more extreme than replacing a pure dict with some
auto-calculated slots and an "other_attrs" dict that would normally
remain empty?

[It may be harder to implement, because of the difficulty of
calculating the slots in advance ... but I don't see it as any worse,
once implemented.]

Of course, maybe your shared dict just points to sequential array
positions (rather than matching the key position) ... in which case,
it may well beat slots, though the "Data" class would still be a
problem.

-jJ

From tjreedy at udel.edu  Fri Dec 16 22:32:05 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 16 Dec 2011 16:32:05 -0500
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EEB1772.1030300@hotpy.org>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>	<j5k46t$dc6$1@dough.gmane.org>	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>	<4EEA722A.10403@hotpy.org>
	<4EEADDB2.2020202@canterbury.ac.nz> <4EEB1772.1030300@hotpy.org>
Message-ID: <jcgdct$s27$1@dough.gmane.org>

On 12/16/2011 5:03 AM, Mark Shannon wrote:

> Of course using __slots__ saves more memory,
> but people don't use them much.

Do you think the stdlib should be using __slots__ more?

-- 
Terry Jan Reedy


From mark at hotpy.org  Fri Dec 16 22:32:44 2011
From: mark at hotpy.org (Mark Shannon)
Date: Fri, 16 Dec 2011 21:32:44 +0000
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <CA+OGgf6_DdEgVBovmnNLXeBUB_vt08WXiFrC-izhgeV+DBBPsQ@mail.gmail.com>
References: <CA+OGgf6_DdEgVBovmnNLXeBUB_vt08WXiFrC-izhgeV+DBBPsQ@mail.gmail.com>
Message-ID: <4EEBB8FC.2010405@hotpy.org>

Jim Jewett wrote:
>> Greg Ewing wrote:
>>> Mark Shannon wrote:
> 
>>>> I have a new dict implementation which allows sharing of keys between
>>>> objects of the same class.
> 
>>> We already have the __slots__ mechanism for memory savings.
>>> Have you done any comparisons with that?
> 
>> You can't make Python programmers use slots, neither can you
>> automatically change existing programs.
> 
> The automatic change is exactly what a dictionary upgrade provides.
> 
> I haven't read your patch in detail yet, but it sounds like you're
> replacing the array of keys + array of values with just an array of
> values, and getting the numerical index from a single per-class array
> of keys.

Each dictionary has key/hash/values as before, but instead of one array,
they are broken into two: a key/hash array and a value array.
The key/hash arrays can be shared amongst dicts;
this happens for well-behaved classes and completely empty dicts,
otherwise each dict gets two arrays of its own.
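
A toy Python model of the split layout, only to make the idea concrete. This is not the patch itself: the real implementation is C inside the dict object and uses hash tables plus a cut-off for badly behaved classes, and the names SharedKeys and SplitDict are invented here.

```python
class SharedKeys:
    """Stands in for the shared key/hash array: maps each key to a slot index."""

    def __init__(self):
        self.index = {}

    def slot_for(self, key, create=False):
        if key not in self.index:
            if not create:
                raise KeyError(key)
            self.index[key] = len(self.index)
        return self.index[key]


_MISSING = object()


class SplitDict:
    """Per-instance part: only a value array, indexed via the shared keys."""

    def __init__(self, keys):
        self.keys = keys
        self.values = []

    def __setitem__(self, key, value):
        i = self.keys.slot_for(key, create=True)
        # Grow the value array if keys were added since the last write.
        self.values.extend([_MISSING] * (i + 1 - len(self.values)))
        self.values[i] = value

    def __getitem__(self, key):
        i = self.keys.slot_for(key)
        if i >= len(self.values) or self.values[i] is _MISSING:
            raise KeyError(key)
        return self.values[i]


point_keys = SharedKeys()  # one per class in this model
a = SplitDict(point_keys)
b = SplitDict(point_keys)
a["x"], a["y"] = 1, 2
b["x"], b["y"] = 10, 20

print(a["x"], b["y"], point_keys.index)
```

In this model each instance pays only for its value list, while the key-to-slot mapping is stored once. It also shows the "Data" class concern: every new attribute name on any instance grows the shared key table.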

> 
> That would normally be sensible (so thanks!), but it isn't a drop-in
> replacement.  If you have a "Data" class intended to take arbitrary

It is a drop in replacement. It conforms to the current API.

> per-instance attributes, it just forces them all to keep resizing up,
> even though individual instances would be small with the current dict.
There is a cut-off point, at the moment it's quite unsophisticated about 
how it does this, but it could easily be improved.
Suggestions are welcome.

> 
> How is this more extreme than replacing a pure dict with some
> auto-calculated slots and an "other_attrs" dict that would normally
> remain empty?

It's less extreme, but equally effective.

> 
> [It may be harder to implement, because of the difficulty of
> calculating the slots in advance ... but I don't see it as any worse,
> once implemented.]
It's a trade-off between ease of implementation and effectiveness.
I think the shared key/hash array approach gets most of the advantages of
a full map implementation (like PyPy or V8) with much less hassle.

> 
> Of course, maybe your shared dict just points to sequential array
> positions (rather than matching the key position) ... in which case,
> it may well beat slots, though the "Data" class would still be a
> problem.

It won't beat slots, mainly due to the extra space required to minimise 
collisions, but it is a lot more compact than the present approach.

For a well-behaved class with lots of instances, each with 3 or 4 
attributes (i.e. the minimum-size dict), it cuts the space used by the 
per-instance dict from 136 bytes (32bit machine) to 64 bytes plus the 
shared key/hash array. Slots would only require 12 or 16 bytes.

(When verifying these numbers I found a bug in the resizing,
which I have just fixed)

The next enhancement would be to store the naked value array directly 
into an instance, trimming the space cost down to just 32 bytes, but 
this would cause compatibility issues as the (internal) API would need 
to change.

Cheers,
Mark.

From mark at hotpy.org  Fri Dec 16 22:42:11 2011
From: mark at hotpy.org (Mark Shannon)
Date: Fri, 16 Dec 2011 21:42:11 +0000
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <jcgdct$s27$1@dough.gmane.org>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>	<j5k46t$dc6$1@dough.gmane.org>	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>	<4EEA722A.10403@hotpy.org>	<4EEADDB2.2020202@canterbury.ac.nz>
	<4EEB1772.1030300@hotpy.org> <jcgdct$s27$1@dough.gmane.org>
Message-ID: <4EEBBB33.50306@hotpy.org>

Terry Reedy wrote:
> On 12/16/2011 5:03 AM, Mark Shannon wrote:
> 
>> Of course using __slots__ saves more memory,
>> but people don't use them much.
> 
> Do you think the stdlib should be using __slots__ more?

For some things yes, but where it's critical slots are already used.
Take the ordered dict, the nodes in that use slots.

The advantage of improving things in the VM is that
we don't have to rewrite half of the stdlib.

Cheers,
Mark.


From mmueller at vigilantsw.com  Sat Dec 17 10:55:55 2011
From: mmueller at vigilantsw.com (Michael Mueller)
Date: Sat, 17 Dec 2011 01:55:55 -0800
Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c
Message-ID: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>

Hi Guys,

We've been analyzing CPython with our static analysis tool (Sentry)
and a NULL pointer dereference popped up the other day, in
Objects/descrobject.c:

    if (descr != NULL) {
        Py_XINCREF(type);
        descr->d_type = type;
        descr->d_name = PyUnicode_InternFromString(name);
        if (descr->d_name == NULL) {
            Py_DECREF(descr);
            descr = NULL;
        }
        descr->d_qualname = NULL; // Possible NULL pointer dereference
    }

If the inner conditional block can be reached, descr will be set NULL
and then dereferenced on the next line.  The commented line above was
added in this commit: http://hg.python.org/cpython/rev/73948#l4.92

Hopefully someone can take a look and determine the appropriate fix.

Best,
Mike

-- 
Mike Mueller
Phone: (401) 405-1525
Email: mmueller at vigilantsw.com

http://www.vigilantsw.com/
Static Analysis for C and C++

From anacrolix at gmail.com  Sat Dec 17 11:33:53 2011
From: anacrolix at gmail.com (Matt Joiner)
Date: Sat, 17 Dec 2011 21:33:53 +1100
Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c
In-Reply-To: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>
References: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>
Message-ID: <CAB4yi1MU698zzgztLKwuRXUE0+LG5R6bgWWpy0-+vJE6R4S7XA@mail.gmail.com>

ಠ_ಠ

On Sat, Dec 17, 2011 at 8:55 PM, Michael Mueller
<mmueller at vigilantsw.com> wrote:
> Hi Guys,
>
> We've been analyzing CPython with our static analysis tool (Sentry)
> and a NULL pointer dereference popped up the other day, in
> Objects/descrobject.c:
>
>     if (descr != NULL) {
>         Py_XINCREF(type);
>         descr->d_type = type;
>         descr->d_name = PyUnicode_InternFromString(name);
>         if (descr->d_name == NULL) {
>             Py_DECREF(descr);
>             descr = NULL;
>         }
>         descr->d_qualname = NULL; // Possible NULL pointer dereference
>     }
>
> If the inner conditional block can be reached, descr will be set NULL
> and then dereferenced on the next line.  The commented line above was
> added in this commit: http://hg.python.org/cpython/rev/73948#l4.92
>
> Hopefully someone can take a look and determine the appropriate fix.
>
> Best,
> Mike
>
> --
> Mike Mueller
> Phone: (401) 405-1525
> Email: mmueller at vigilantsw.com
>
> http://www.vigilantsw.com/
> Static Analysis for C and C++



-- 
ಠ_ಠ

From fijall at gmail.com  Sat Dec 17 12:53:05 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sat, 17 Dec 2011 13:53:05 +0200
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <jcgdct$s27$1@dough.gmane.org>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
	<j5k46t$dc6$1@dough.gmane.org>
	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>
	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>
	<4EEA722A.10403@hotpy.org> <4EEADDB2.2020202@canterbury.ac.nz>
	<4EEB1772.1030300@hotpy.org> <jcgdct$s27$1@dough.gmane.org>
Message-ID: <CAK5idxRgJkmSzn_yLZA5C3=XTJ9jCTbNB7tFS+xPmsEvBY0SwA@mail.gmail.com>

On Fri, Dec 16, 2011 at 11:32 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 12/16/2011 5:03 AM, Mark Shannon wrote:
>
>> Of course using __slots__ saves more memory,
>> but people don't use them much.
>
>
> Do you think the stdlib should be using __slots__ more?

Note that unlike some other more advanced approaches, slots do change
semantics. There are many cases out there where people would stuff
arbitrary things on stdlib objects and this works fine without
__slots__, but will stop working as soon as you introduce them. A
change from no slots to using slots is not only a performance issue.
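
The semantic difference is easy to demonstrate. A minimal sketch, with
made-up class names, of what breaks for code that attaches extra attributes
once a class gains __slots__:

```python
class Plain:
    """No __slots__: instances accept arbitrary attributes."""


class Slotted:
    """Only the names listed in __slots__ may be assigned."""

    __slots__ = ("value",)


p = Plain()
p.extra = "anything"       # fine: stored in the instance __dict__

s = Slotted()
s.value = 1                # fine: 'value' is a declared slot
try:
    s.extra = "anything"   # not a declared slot
except AttributeError as exc:
    slot_error = exc
else:
    slot_error = None

print("assigning an undeclared attribute raised:", repr(slot_error))
```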

Cheers,
fijal

From dirkjan at ochtman.nl  Sat Dec 17 13:31:01 2011
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sat, 17 Dec 2011 13:31:01 +0100
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <CAK5idxRgJkmSzn_yLZA5C3=XTJ9jCTbNB7tFS+xPmsEvBY0SwA@mail.gmail.com>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
	<j5k46t$dc6$1@dough.gmane.org>
	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>
	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>
	<4EEA722A.10403@hotpy.org> <4EEADDB2.2020202@canterbury.ac.nz>
	<4EEB1772.1030300@hotpy.org> <jcgdct$s27$1@dough.gmane.org>
	<CAK5idxRgJkmSzn_yLZA5C3=XTJ9jCTbNB7tFS+xPmsEvBY0SwA@mail.gmail.com>
Message-ID: <CAKmKYaCOZTbTJxtCWOFDW1AjwnJ2jLL-EzQeGycjQq2WoOzf4A@mail.gmail.com>

On Sat, Dec 17, 2011 at 12:53, Maciej Fijalkowski <fijall at gmail.com> wrote:
> Note that unlike some other more advanced approaches, slots do change
> semantics. There are many cases out there where people would stuff
> arbitrary things on stdlib objects and this works fine without
> __slots__, but will stop working as soon as you introduce them. A
> change from no slots to using slots is not only a performance issue.

Yeah... This whole idea reeks of polymorphic inline caches (called
"shapes" or "hidden classes" in SpiderMonkey and v8, respectively),
where they dynamically try to infer what kind of class an object has,
such that the __slots__ optimization can be done without making it
visible in the semantics. The Unladen Swallow guys mention in their
ProjectPlan that the overhead of opcode fetch/dispatch makes that
hard, though.

Cheers,

Dirkjan

From fijall at gmail.com  Sat Dec 17 13:34:52 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sat, 17 Dec 2011 14:34:52 +0200
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <CAKmKYaCOZTbTJxtCWOFDW1AjwnJ2jLL-EzQeGycjQq2WoOzf4A@mail.gmail.com>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
	<j5k46t$dc6$1@dough.gmane.org>
	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>
	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>
	<4EEA722A.10403@hotpy.org> <4EEADDB2.2020202@canterbury.ac.nz>
	<4EEB1772.1030300@hotpy.org> <jcgdct$s27$1@dough.gmane.org>
	<CAK5idxRgJkmSzn_yLZA5C3=XTJ9jCTbNB7tFS+xPmsEvBY0SwA@mail.gmail.com>
	<CAKmKYaCOZTbTJxtCWOFDW1AjwnJ2jLL-EzQeGycjQq2WoOzf4A@mail.gmail.com>
Message-ID: <CAK5idxSH3tUc5OfLN0SVXSRardN5znaVnAC2++TN8wMsEG1U8w@mail.gmail.com>

On Sat, Dec 17, 2011 at 2:31 PM, Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:
> On Sat, Dec 17, 2011 at 12:53, Maciej Fijalkowski <fijall at gmail.com> wrote:
>> Note that unlike some other more advanced approaches, slots do change
>> semantics. There are many cases out there where people would stuff
>> arbitrary things on stdlib objects and this works fine without
>> __slots__, but will stop working as soon as you introduce them. A
>> change from no slots to using slots is not only a performance issue.
>
> Yeah... This whole idea reeks of polymorphic inline caches (called
> "shapes" or "hidden classes" in SpiderMonkey and v8, respectively),
> where they dynamically try to infer what kind of class an object has,
> such that the __slots__ optimization can be done without making it
> visible in the semantics. The Unladen Swallow guys mention in their
> ProjectPlan that the overhead of opcode fetch/dispatch makes that
> hard, though.
>
> Cheers,
>
> Dirkjan

It's done in PyPy btw. Works like a charm :) It's called sharing dict
and the idea dates back to Self and its maps. There is also an
ongoing effort to specialize on types of fields, so you don't have to
box say ints stored on classes. That's however in-progress now :)
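
For readers unfamiliar with maps, here is a toy model of the idea in pure
Python. It is not PyPy's implementation; Map, Obj and with_attr are
hypothetical names chosen for this sketch. Each instance holds a pointer to
a shared layout object plus a flat list of values, so instances built the
same way pay for the attribute names only once.

```python
class Map:
    """Describes a layout: which attribute name lives at which index."""

    def __init__(self, names=()):
        self.names = tuple(names)
        self.index = {name: i for i, name in enumerate(self.names)}
        self.transitions = {}   # attribute name -> Map with that name added

    def with_attr(self, name):
        # Reuse the same successor map so objects built the same way share it.
        if name not in self.transitions:
            self.transitions[name] = Map(self.names + (name,))
        return self.transitions[name]


EMPTY_MAP = Map()


class Obj:
    """An instance is a pointer to a shared map plus a flat list of values."""

    def __init__(self):
        self.map = EMPTY_MAP
        self.storage = []

    def set(self, name, value):
        i = self.map.index.get(name)
        if i is None:
            # New attribute: move to the successor layout and grow storage.
            self.map = self.map.with_attr(name)
            self.storage.append(value)
        else:
            self.storage[i] = value

    def get(self, name):
        return self.storage[self.map.index[name]]


a, b = Obj(), Obj()
for obj, (x, y) in ((a, (1, 2)), (b, (3, 4))):
    obj.set("x", x)
    obj.set("y", y)

print(a.map is b.map, a.get("y"), b.get("x"))
```

Unlike __slots__, nothing here restricts which attributes can be added; an
unusual attribute just moves that one object onto a different map.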

From g.brandl at gmx.net  Sat Dec 17 13:57:53 2011
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 17 Dec 2011 13:57:53 +0100
Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c
In-Reply-To: <CAB4yi1MU698zzgztLKwuRXUE0+LG5R6bgWWpy0-+vJE6R4S7XA@mail.gmail.com>
References: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>
	<CAB4yi1MU698zzgztLKwuRXUE0+LG5R6bgWWpy0-+vJE6R4S7XA@mail.gmail.com>
Message-ID: <jci3kg$2uk$1@dough.gmane.org>

On 12/17/2011 11:33 AM, Matt Joiner wrote:
> ಠ_ಠ

Would you please stop this?  It may have been funny the first time, but
now it looks like pure trolling.

Georg


From benjamin at python.org  Sat Dec 17 14:02:40 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Sat, 17 Dec 2011 08:02:40 -0500
Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c
In-Reply-To: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>
References: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>
Message-ID: <CAPZV6o9wWm2=FV+0tOxTxPye-srbB0a6XhZVFKLANYxihy91_A@mail.gmail.com>

2011/12/17 Michael Mueller <mmueller at vigilantsw.com>:
>
> Hopefully someone can take a look and determine the appropriate fix.

Fixed.


-- 
Regards,
Benjamin

From elic at astllc.org  Sat Dec 17 17:02:05 2011
From: elic at astllc.org (Eli Collins)
Date: Sat, 17 Dec 2011 11:02:05 -0500
Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c
In-Reply-To: <CAPZV6o9wWm2=FV+0tOxTxPye-srbB0a6XhZVFKLANYxihy91_A@mail.gmail.com>
References: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>
	<CAPZV6o9wWm2=FV+0tOxTxPye-srbB0a6XhZVFKLANYxihy91_A@mail.gmail.com>
Message-ID: <CADMi=5AhMiuVTjruEUxR2h+LzUzeHDriCSye=vGKp11t06MejA@mail.gmail.com>

In that same code, right before "Py_DECREF(descr)", should there also be a
"Py_XDECREF(type)"? It looks like it might leak a reference to "type"
otherwise.

The line in question:
http://hg.python.org/cpython/file/8c355edc5b1d/Objects/descrobject.c#l628

- Eli Collins

On Sat, Dec 17, 2011 at 8:02 AM, Benjamin Peterson <benjamin at python.org>wrote:

> 2011/12/17 Michael Mueller <mmueller at vigilantsw.com>:
> >
> > Hopefully someone can take a look and determine the appropriate fix.
>
> Fixed.
>
>
> --
> Regards,
> Benjamin



-- 
 Eli Collins   elic at assurancetechnologies.com
Software Development & I.T. Consulting
Assurance Technologies   www.assurancetechnologies.com

From kevinjcoyne at hotmail.com  Sat Dec 17 16:54:17 2011
From: kevinjcoyne at hotmail.com (Kevin Coyne)
Date: Sat, 17 Dec 2011 15:54:17 +0000 (UTC)
Subject: [Python-Dev] IEEE/ISO draft on Python vulnerabilities
Message-ID: <loom.20111217T165307-461@post.gmane.org>

Victor:

Python.3 Type System [IHN] - The use of "extended precision" as a term to 
express Python's ability to create and manipulate integers of any size (within 
the memory limitations of the computer) is poor since that term is used in 
reference to floating point numbers almost exclusively. I will change it to 
"unlimited precision" in the revised annex.

Python.16 Wrap-around Error [XYY] - My source for this is in the Python 
documentation under the 2nd reference to OverflowError in:
http://docs.python.org/py3k/library/exceptions.html?highlight=overflowerror
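
As a concrete illustration of the two points above (unlimited-precision
integers, and OverflowError rather than wrap-around), here is a short
sketch. The specific values are arbitrary examples, not taken from the
draft annex.

```python
import math

# Integer arithmetic never wraps around; the result just gets bigger.
big = 2 ** 64 + 1
print(big)

# Floating-point results that are too large raise OverflowError instead.
try:
    math.exp(1000)
except OverflowError as exc:
    float_error = exc
print("math.exp(1000) raised:", repr(float_error))

# Converting an integer that does not fit into a float also raises it.
try:
    float(10 ** 400)
except OverflowError as exc:
    convert_error = exc
print("float(10 ** 400) raised:", repr(convert_error))
```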

Python.23 Initialization of Variables [LAV] - Point taken on the unusual syntax 
(I am not a Python programmer) and I will change to the more common syntax as per 
your 2nd suggested syntax.

Python.32 Structured Programming [EWD] - The point I was trying to make was 
that, unlike many early languages, Python has no constructs, like the ones 
mentioned, that can be used to create an unstructured program. I am not 
advocating, nor would it be proper in this kind of paper, that the Python 
language be extended to allow for unstructured statements. I will try to clarify 
this better in the revised version.

Python.51 Undefined Behaviour [EWF] #1 - I need to do more research as your 
example does suggest that mutating, at least, does raise an exception.  Here are 
a few references that claim that this is undefined:
Refer to (10) under:
http://docs.python.org/release/2.4/lib/typesseq-mutable.html
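
For reference, the behaviour in question can be reproduced with a key
function that mutates the list being sorted. This is a sketch of what
CPython does in practice; the language reference leaves the effect
undefined, so other implementations may behave differently, and the helper
name is made up for the example.

```python
data = [3, 1, 2]


def key_that_mutates(item):
    # Mutating the list from inside the sort is the undefined part.
    data.append(item)
    return item


try:
    data.sort(key=key_that_mutates)
except ValueError as exc:
    sort_error = exc
else:
    sort_error = None

print("sort raised:", repr(sort_error))
```

CPython detects the mutation after the sort finishes and raises ValueError;
that detection is an implementation detail, not a language guarantee.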

Python.51 Undefined Behaviour [EWF] #2 - In regard to collections.OrderedDict, 
since I am only listing undefined behaviors I don't think adding a defined 
behaviour here is appropriate.

Python.52 Implementation-defined Behaviour [FAB] - In regard to mixing tabs and 
spaces, I will add your advice to the 52.2 Guidance section.

Thanks for your excellent comments; the paper will be improved because of them.

Kevin Coyne
703.901.6774


From benjamin at python.org  Sat Dec 17 17:20:38 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Sat, 17 Dec 2011 11:20:38 -0500
Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c
In-Reply-To: <CADMi=5AhMiuVTjruEUxR2h+LzUzeHDriCSye=vGKp11t06MejA@mail.gmail.com>
References: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>
	<CAPZV6o9wWm2=FV+0tOxTxPye-srbB0a6XhZVFKLANYxihy91_A@mail.gmail.com>
	<CADMi=5AhMiuVTjruEUxR2h+LzUzeHDriCSye=vGKp11t06MejA@mail.gmail.com>
Message-ID: <CAPZV6o81VfESRDLekGi=LvvtXqMUN-w0m8gffb7n4+_d7PO4jA@mail.gmail.com>

2011/12/17 Eli Collins <elic at astllc.org>
>
> In that same code, right before "Py_DECREF(descr)", should there also be a "Py_XDECREF(type)"? It looks like it might leak a reference to "type" otherwise.


No. The descr will deallocate it.

PS. Please don't send HTML mail.



--
Regards,
Benjamin

From elic at astllc.org  Sat Dec 17 18:00:23 2011
From: elic at astllc.org (Eli Collins)
Date: Sat, 17 Dec 2011 12:00:23 -0500
Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c
In-Reply-To: <CAPZV6o81VfESRDLekGi=LvvtXqMUN-w0m8gffb7n4+_d7PO4jA@mail.gmail.com>
References: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>
	<CAPZV6o9wWm2=FV+0tOxTxPye-srbB0a6XhZVFKLANYxihy91_A@mail.gmail.com>
	<CADMi=5AhMiuVTjruEUxR2h+LzUzeHDriCSye=vGKp11t06MejA@mail.gmail.com>
	<CAPZV6o81VfESRDLekGi=LvvtXqMUN-w0m8gffb7n4+_d7PO4jA@mail.gmail.com>
Message-ID: <CADMi=5BpTpEB8qrNYAqUaS+mqwZOOX+hZZazKT2TfcQSyMaj6Q@mail.gmail.com>

On Sat, Dec 17, 2011 at 11:20 AM, Benjamin Peterson <benjamin at python.org> wrote:
>
> No. The descr will deallocate it.
>
> PS. Please don't send HTML mail.
>

Thank you for the explanation.

And my apologies to the entire list for the HTML; it's way too early
for me, I forgot to turn that mess off.

From mmueller at vigilantsw.com  Sat Dec 17 18:45:11 2011
From: mmueller at vigilantsw.com (Michael Mueller)
Date: Sat, 17 Dec 2011 09:45:11 -0800
Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c
In-Reply-To: <CAPZV6o9wWm2=FV+0tOxTxPye-srbB0a6XhZVFKLANYxihy91_A@mail.gmail.com>
References: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>
	<CAPZV6o9wWm2=FV+0tOxTxPye-srbB0a6XhZVFKLANYxihy91_A@mail.gmail.com>
Message-ID: <CANV9Rr_ir=DQGJr4aG1N150AOn4=gz5CRFimPQ_0iHvpvE-4hQ@mail.gmail.com>

On Sat, Dec 17, 2011 at 5:02 AM, Benjamin Peterson <benjamin at python.org> wrote:
> 2011/12/17 Michael Mueller <mmueller at vigilantsw.com>:
>>
>> Hopefully someone can take a look and determine the appropriate fix.
>
> Fixed.
>
> --
> Regards,
> Benjamin

Excellent!

-- 
Mike Mueller
Phone: (401) 405-1525
Email: mmueller at vigilantsw.com

http://www.vigilantsw.com/
Static Analysis for C and C++

From greg.ewing at canterbury.ac.nz  Sun Dec 18 01:09:16 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 18 Dec 2011 13:09:16 +1300
Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c
In-Reply-To: <CAB4yi1MU698zzgztLKwuRXUE0+LG5R6bgWWpy0-+vJE6R4S7XA@mail.gmail.com>
References: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>
	<CAB4yi1MU698zzgztLKwuRXUE0+LG5R6bgWWpy0-+vJE6R4S7XA@mail.gmail.com>
Message-ID: <4EED2F2C.2070409@canterbury.ac.nz>

Matt Joiner wrote:
> ?_?

What's up with these ?_? messages?

-- 
Greg

From solipsis at pitrou.net  Sun Dec 18 01:20:57 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 18 Dec 2011 01:20:57 +0100
Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c
References: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>
	<CAB4yi1MU698zzgztLKwuRXUE0+LG5R6bgWWpy0-+vJE6R4S7XA@mail.gmail.com>
	<4EED2F2C.2070409@canterbury.ac.nz>
Message-ID: <20111218012057.25fe6903@pitrou.net>

On Sun, 18 Dec 2011 13:09:16 +1300
Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Matt Joiner wrote:
> > ?_?
> 
> What's up with these ?_? messages?

>>> print(ascii("ಠ_ಠ"))
'\u0ca0_\u0ca0'


Antoine.



From steve at pearwood.info  Sun Dec 18 01:33:20 2011
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 18 Dec 2011 11:33:20 +1100
Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c
In-Reply-To: <4EED2F2C.2070409@canterbury.ac.nz>
References: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>	<CAB4yi1MU698zzgztLKwuRXUE0+LG5R6bgWWpy0-+vJE6R4S7XA@mail.gmail.com>
	<4EED2F2C.2070409@canterbury.ac.nz>
Message-ID: <4EED34D0.8030507@pearwood.info>

Greg Ewing wrote:
> Matt Joiner wrote:
>> ?_?
> 
> What's up with these ?_? messages?
> 

I think that, depending on the typeface you view it with, it is supposed to be 
some sort of smiley: two big wide open square eyes with tightly pursed lips. 
Presumably it is supposed to be a look of shock and surprise.

As smileys go, it's pretty poor, because people are unlikely to see the same 
thing. The supposed eyes are probably intended to be square boxes; in my email 
client, the boxes contain tiny 0ca0 characters, which completely ruins the 
effect. Apparently you see a question mark instead of a box. Depending on the 
typeface, others might see a full box, an empty box, a diamond with a question 
mark in it, nothing at all, or some other glyph.


-- 
Steven

From steve at pearwood.info  Sun Dec 18 01:39:11 2011
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 18 Dec 2011 11:39:11 +1100
Subject: [Python-Dev] Potential NULL pointer dereference in descrobject.c
In-Reply-To: <4EED34D0.8030507@pearwood.info>
References: <CANV9Rr8T4PQUss_jCWXQeKGdBxLWkr7-rMLmWEcYAE7QwmWHjA@mail.gmail.com>	<CAB4yi1MU698zzgztLKwuRXUE0+LG5R6bgWWpy0-+vJE6R4S7XA@mail.gmail.com>	<4EED2F2C.2070409@canterbury.ac.nz>
	<4EED34D0.8030507@pearwood.info>
Message-ID: <4EED362F.5060203@pearwood.info>

Steven D'Aprano wrote:
> Greg Ewing wrote:
>> Matt Joiner wrote:
>>> ?_?
>>
>> What's up with these ?_? messages?
>>
> 
> I think that, depending on the typeface you view it with, it is supposed 
> to be some sort of smiley: two big wide open square eyes with tightly 
> pursed lips. Presumably it is supposed to be a look of shock and surprise.

Apparently it is supposed to be a look of disapproval:

http://knowyourmeme.com/memes/%E0%B2%A0%E0%B2%A0-look-of-disapproval

and the 0ca0 characters on either side of the underscore are KANNADA LETTER 
TTHA: http://www.fileformat.info/info/unicode/char/ca0/index.htm
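
The character names can also be checked from Python itself with the
standard unicodedata module. A small sketch (the variable name is
arbitrary):

```python
import unicodedata

face = "\u0ca0_\u0ca0"

# Print the code point and official Unicode name of each character.
for ch in face:
    print("U+%04X" % ord(ch), unicodedata.name(ch))
```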



-- 
Steven

From fperez.net at gmail.com  Sun Dec 18 08:46:48 2011
From: fperez.net at gmail.com (Fernando Perez)
Date: Sun, 18 Dec 2011 07:46:48 +0000 (UTC)
Subject: [Python-Dev] Inconsistent script/console behaviour
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
Message-ID: <jck5p8$i3g$1@dough.gmane.org>

On Fri, 23 Sep 2011 16:32:30 -0700, Guido van Rossum wrote:

> You can't fix this without completely changing the way the interactive
> console treats blank lines. Note that it's not just that a blank line is
> required after a function definition -- you also *can't* have a blank
> line *inside* a function definition.
> 
> The interactive console is optimized for people entering code by typing,
> not by copying and pasting large gobs of text.
> 
> If you think you can have it both, show us the code.

Apology for the advertising, but if the OP is really interested in that 
kind of behavior, then instead of asking for making the default shell more 
complex, he can use ipython which supports what he's looking for:

In [5]: def some():
   ...:   print 'xxx'
   ...: some()
   ...: 
xxx

and even blank lines inside functions (albeit only in certain locations):

In [6]: def some():
   ...: 
   ...:   print 'xxx'
   ...: some()
   ...: 
xxx


Now, the dances we have to do in ipython to achieve that are much more 
complex than what would be reasonable to have in the default '>>>' python 
shell, which should remain simple, light and robust.  But ipython is a 
simple install for someone who wants fancier features for interactive work.

Cheers,

f


From roundup-admin at psf.upfronthosting.co.za  Sun Dec 18 20:28:46 2011
From: roundup-admin at psf.upfronthosting.co.za (Python tracker)
Date: Sun, 18 Dec 2011 19:28:46 +0000
Subject: [Python-Dev] Failed issue tracker submission
Message-ID: <20111218192846.6098A1DE8A@psf.upfronthosting.co.za>


An unexpected error occurred during the processing
of your message. The tracker administrator is being
notified.
-------------- next part --------------
Return-Path: <python-dev at python.org>
X-Original-To: report at bugs.python.org
Delivered-To: roundup+tracker at psf.upfronthosting.co.za
Received: from mail.python.org (mail.python.org [82.94.164.166])
	by psf.upfronthosting.co.za (Postfix) with ESMTPS id EF0611DE20
	for <report at bugs.python.org>; Sun, 18 Dec 2011 20:23:39 +0100 (CET)
Received: from albatross.python.org (localhost [127.0.0.1])
	by mail.python.org (Postfix) with ESMTP id 3T5wK759MkzQ00
	for <report at bugs.python.org>; Sun, 18 Dec 2011 20:23:39 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=python.org; s=200901;
	t=1324236219; bh=/yIht6I8EmPEiXZ9KLwjVNemYVkalK/1gPj7HIPxFXM=;
	h=Date:Message-Id:Content-Type:MIME-Version:
	 Content-Transfer-Encoding:From:To:Subject;
	b=oFlrztFHjmQi6JK3VCXIic9qr39+OWQ4rGmVoFTk59ABwcLwBJpJGa4BQq74DRZT9
	 BoWSENTtwjmDIiLNg3LgIXv9RioJHWtR6EWlj1R7fvPUfTgnjXd7fJNgbVSPG5BbgU
	 VzVC5bQYIO9aKpzYWBTTxH700UdCfLAC27/GwIKY=
Received: from localhost (HELO mail.python.org) (127.0.0.1)
  by albatross.python.org with SMTP; 18 Dec 2011 20:23:39 +0100
Received: from dinsdale.python.org (svn.python.org [IPv6:2001:888:2000:d::a4])
	(using TLSv1 with cipher AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.python.org (Postfix) with ESMTPS
	for <report at bugs.python.org>; Sun, 18 Dec 2011 20:23:39 +0100 (CET)
Received: from localhost
	([127.0.0.1] helo=dinsdale.python.org ident=hg)
	by dinsdale.python.org with esmtp (Exim 4.72)
	(envelope-from <python-dev at python.org>)
	id 1RcMKh-0006D3-Ii
	for report at bugs.python.org; Sun, 18 Dec 2011 20:23:39 +0100
Date: Sun, 18 Dec 2011 20:23:39 +0100
Message-Id: <E1RcMKh-0006D3-Ii at dinsdale.python.org>
Content-Type: text/plain; charset="utf8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
From: python-dev at python.org
To: report at bugs.python.org
Subject: [issue7502]

TmV3IGNoYW5nZXNldCBlMzdjNzE2OTg0MDkgYnkgQW50b2luZSBQaXRyb3UgaW4gYnJhbmNoICcz
LjInOgpGb2xsb3d1cCB0byAjNzUwMjogYWRkIF9faGFzaF9fIG1ldGhvZCBhbmQgdGVzdHMuCmh0
dHA6Ly9oZy5weXRob24ub3JnL2NweXRob24vcmV2L2UzN2M3MTY5ODQwOQoKCk5ldyBjaGFuZ2Vz
ZXQgNGZmYTk5OTJhN2Q4IGJ5IEFudG9pbmUgUGl0cm91IGluIGJyYW5jaCAnZGVmYXVsdCc6CkZv
bGxvd3VwIHRvICM3NTAyOiBhZGQgX19oYXNoX18gbWV0aG9kIGFuZCB0ZXN0cy4KaHR0cDovL2hn
LnB5dGhvbi5vcmcvY3B5dGhvbi9yZXYvNGZmYTk5OTJhN2Q4Cg==

From martin at v.loewis.de  Sun Dec 18 20:34:49 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 18 Dec 2011 20:34:49 +0100
Subject: [Python-Dev] [Python-checkins] cpython: Move
 PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum
In-Reply-To: <E1Rc1xb-0001Rp-OL@dinsdale.python.org>
References: <E1Rc1xb-0001Rp-OL@dinsdale.python.org>
Message-ID: <4EEE4059.5040807@v.loewis.de>

>   Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum

What's the rationale for that change? It's a valid kind value, after
all, and the C convention is that an enumeration lists all valid values
(else there wouldn't be a need for an enumeration in the first place).

Regards,
Martin

From victor.stinner at haypocalc.com  Sun Dec 18 20:45:40 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sun, 18 Dec 2011 20:45:40 +0100
Subject: [Python-Dev] [Python-checkins] cpython: Move
 PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum
In-Reply-To: <4EEE4059.5040807@v.loewis.de>
References: <E1Rc1xb-0001Rp-OL@dinsdale.python.org>
	<4EEE4059.5040807@v.loewis.de>
Message-ID: <4EEE42E4.5020905@haypocalc.com>

On 18/12/2011 20:34, "Martin v. Löwis" wrote:
>>    Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum
>
> What's the rationale for that change? It's a valid kind value, after
> all, and the C convention is that an enumeration lists all valid values
> (else there wouldn't be a need for an enumeration in the first place).

PyUnicode_KIND() only returns PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND 
or PyUnicode_4BYTE_KIND. Outside unicodeobject.c, you are not supposed 
to see PyUnicode_WCHAR_KIND.

For switch/case, it avoids the need of adding a dummy 
PyUnicode_WCHAR_KIND case (or a default case).

Victor

From martin at v.loewis.de  Sun Dec 18 21:04:24 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 18 Dec 2011 21:04:24 +0100
Subject: [Python-Dev] [Python-checkins] cpython: Move
 PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum
In-Reply-To: <4EEE42E4.5020905@haypocalc.com>
References: <E1Rc1xb-0001Rp-OL@dinsdale.python.org>	<4EEE4059.5040807@v.loewis.de>
	<4EEE42E4.5020905@haypocalc.com>
Message-ID: <4EEE4748.2010901@v.loewis.de>

Am 18.12.2011 20:45, schrieb Victor Stinner:
> On 18/12/2011 20:34, "Martin v. Löwis" wrote:
>>>    Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum
>>
>> What's the rationale for that change? It's a valid kind value, after
>> all, and the C convention is that an enumeration lists all valid values
>> (else there wouldn't be a need for an enumeration in the first place).
> 
> PyUnicode_KIND() only returns PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND
> or PyUnicode_4BYTE_KIND. Outside unicodeobject.c, you are not supposed
> to see PyUnicode_WCHAR_KIND.

Why do you say that? It can very well happen, assuming you call
PyUnicode_KIND on a string that is not ready. That would be a
bug in the module, but people do make bugs when programming.

> For switch/case, it avoids the need of adding a dummy
> PyUnicode_WCHAR_KIND case (or a default case).

... and thus hides a potential source of errors, as people may
forget to call ready, and then fall through the case, letting
god-knows-what happen.

If the rationale is to simplify silencing compiler errors, I
vote for reverting the enumeration back to a macro list.

Regards,
Martin

From victor.stinner at haypocalc.com  Sun Dec 18 21:16:19 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sun, 18 Dec 2011 21:16:19 +0100
Subject: [Python-Dev] [Python-checkins] cpython: Move
 PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum
In-Reply-To: <4EEE4748.2010901@v.loewis.de>
References: <E1Rc1xb-0001Rp-OL@dinsdale.python.org>	<4EEE4059.5040807@v.loewis.de>
	<4EEE42E4.5020905@haypocalc.com> <4EEE4748.2010901@v.loewis.de>
Message-ID: <4EEE4A13.1040808@haypocalc.com>

On 18/12/2011 21:04, "Martin v. Löwis" wrote:
>> PyUnicode_KIND() only returns PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND
>> or PyUnicode_4BYTE_KIND. Outside unicodeobject.c, you are not supposed
>> to see PyUnicode_WCHAR_KIND.
>
> Why do you say that? It can very well happen, assuming you call
> PyUnicode_KIND on a string that is not ready. That would be a
> bug in the module, but people do make bugs when programming.

I added assert(PyUnicode_IS_READY(op)) to the macro, so the bug will be 
quickly caught in debug mode. I forgot that it is just an assertion and 
few people use Python compiled in debug mode.

> If the rationale is to simplify silencing compiler errors, I
> vote for reverting the enumeration back to a macro list.

I'm not sure that gcc will not complain if only 3 values are handled. I 
agree to revert the commit if that helps developers to write bugs.

Victor

From martin at v.loewis.de  Sun Dec 18 21:36:44 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 18 Dec 2011 21:36:44 +0100
Subject: [Python-Dev] [Python-checkins] cpython: Move
 PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum
In-Reply-To: <4EEE4A13.1040808@haypocalc.com>
References: <E1Rc1xb-0001Rp-OL@dinsdale.python.org>	<4EEE4059.5040807@v.loewis.de>	<4EEE42E4.5020905@haypocalc.com>
	<4EEE4748.2010901@v.loewis.de> <4EEE4A13.1040808@haypocalc.com>
Message-ID: <4EEE4EDC.2000606@v.loewis.de>

Am 18.12.2011 21:16, schrieb Victor Stinner:
> On 18/12/2011 21:04, "Martin v. Löwis" wrote:
>>> PyUnicode_KIND() only returns PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND
>>> or PyUnicode_4BYTE_KIND. Outside unicodeobject.c, you are not supposed
>>> to see PyUnicode_WCHAR_KIND.
>>
>> Why do you say that? It can very well happen, assuming you call
>> PyUnicode_KIND on a string that is not ready. That would be a
>> bug in the module, but people do make bugs when programming.
> 
> I added assert(PyUnicode_IS_READY(op)) to the macro, so the bug will be
> quickly caught in debug mode. I forgot that it is just an assertion and
> few people use Python compiled in debug mode.
> 
>> If the rationale is to simplify silencing compiler errors, I
>> vote for reverting the enumeration back to a macro list.
> 
> I'm not sure that gcc will not complain if only 3 values are handled. I
> agree to revert the commit if that helps developers to write bugs.

It helps to detect bugs. Users should be aware that there is an
additional case, and put something like

  case PyUnicode_WCHAR_KIND:
     /* string is guaranteed to be ready here */
     assert(0);

into their code.

Regards,
Martin

From solipsis at pitrou.net  Sun Dec 18 23:55:16 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 18 Dec 2011 23:55:16 +0100
Subject: [Python-Dev] A new dict for Xmas?
References: <CA+OGgf6_DdEgVBovmnNLXeBUB_vt08WXiFrC-izhgeV+DBBPsQ@mail.gmail.com>
	<4EEBB8FC.2010405@hotpy.org>
Message-ID: <20111218235516.741cc14d@pitrou.net>

On Fri, 16 Dec 2011 21:32:44 +0000
Mark Shannon <mark at hotpy.org> wrote:
> 
> > per-instance attributes, it just forces them all to keep resizing up,
> > even though individual instances would be small with the current dict.
> There is a cut-off point, at the moment it's quite unsophisticated about 
> how it does this, but it could easily be improved.
> Suggestions are welcome.

Can you open an issue on the bug tracker?
There you can either give your repo URL or upload a patch.
Either would allow us to start reviewing the code :)

Regards

Antoine.



From stephen at xemacs.org  Mon Dec 19 05:47:57 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 19 Dec 2011 13:47:57 +0900
Subject: [Python-Dev] Inconsistent script/console behaviour
In-Reply-To: <jck5p8$i3g$1@dough.gmane.org>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
	<jck5p8$i3g$1@dough.gmane.org>
Message-ID: <87y5u9jhfm.fsf@uwakimon.sk.tsukuba.ac.jp>

Fernando Perez writes:

 > Apology for the advertising,

If there's any apologizing to be done, it's on Anatoly's part.  Your
post was short, to the point, information-packed, and should put a big
fat open-centered ideographic full stop period to this thread.


From solipsis at pitrou.net  Tue Dec 20 09:51:49 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 20 Dec 2011 09:51:49 +0100
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
Message-ID: <20111220095149.6187cca8@pitrou.net>

On Mon, 19 Dec 2011 22:42:43 +0100
benjamin.peterson <python-checkins at python.org> wrote:
> http://hg.python.org/cpython/rev/d85efd73b0e1
> changeset:   74088:d85efd73b0e1
> branch:      3.2
> parent:      74082:71e5a083f9b1
> user:        Benjamin Peterson <benjamin at python.org>
> date:        Mon Dec 19 16:41:11 2011 -0500
> summary:
>   don't mention implementation detail
> 
> files:
>   Doc/library/operator.rst |  10 +++++-----
>   1 files changed, 5 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/Doc/library/operator.rst b/Doc/library/operator.rst
> --- a/Doc/library/operator.rst
> +++ b/Doc/library/operator.rst
> @@ -12,11 +12,11 @@
>     from operator import itemgetter, iadd
>  
>  
> -The :mod:`operator` module exports a set of functions implemented in C
> -corresponding to the intrinsic operators of Python.  For example,
> -``operator.add(x, y)`` is equivalent to the expression ``x+y``.  The function
> -names are those used for special class methods; variants without leading and
> -trailing ``__`` are also provided for convenience.

I disagree with this change. Knowing that they are written in C is
important when deciding to pass them to e.g. sort() or sorted(),
because you know it will be faster than an arbitrary pure Python
function.

You could tag it as a "CPython implementation detail" if you want, or
talk about performance rather than mention "C".
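
To make the sort() case concrete, here is a minimal sketch (the `records`
list and variable names are made up for illustration). It shows an
operator-module callable and the equivalent lambda used as a sort key. Both
produce the same ordering; only their speed may differ, and that depends on
the implementation:

```python
from operator import itemgetter

# Hypothetical sample data: (name, score) pairs.
records = [("carol", 72), ("alice", 91), ("bob", 85)]

# Key function built by the operator module.
by_score_operator = sorted(records, key=itemgetter(1))

# Equivalent key written as a plain Python function.
by_score_lambda = sorted(records, key=lambda rec: rec[1])

# The two spellings are interchangeable as far as results go.
print(by_score_operator == by_score_lambda)
```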

Regards

Antoine.



From solipsis at pitrou.net  Tue Dec 20 09:54:40 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 20 Dec 2011 09:54:40 +0100
Subject: [Python-Dev] Difference between PyUnicode_IS_ASCII and
 PyUnicode_IS_COMPACT_ASCII ?
Message-ID: <20111220095440.43ff9f41@pitrou.net>


Hello,

The include file (unicodeobject.h) seems to imply that some pure ASCII
strings can be non-compact, but I don't understand how that can happen.

Besides, the following comment also seems wrong:

       - compact:

         * structure = PyCompactUnicodeObject
         * test: PyUnicode_IS_ASCII(op) && !PyUnicode_IS_COMPACT(op)

Regards

Antoine.



From fijall at gmail.com  Tue Dec 20 11:01:04 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 20 Dec 2011 12:01:04 +0200
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <20111220095149.6187cca8@pitrou.net>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net>
Message-ID: <CAK5idxRe01Yh6-vdmfCtr5rS5AqkYX4OOG2=be9EmSDo0=Arbw@mail.gmail.com>

On Tue, Dec 20, 2011 at 10:51 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Mon, 19 Dec 2011 22:42:43 +0100
> benjamin.peterson <python-checkins at python.org> wrote:
>> http://hg.python.org/cpython/rev/d85efd73b0e1
>> changeset:   74088:d85efd73b0e1
>> branch:      3.2
>> parent:      74082:71e5a083f9b1
>> user:        Benjamin Peterson <benjamin at python.org>
>> date:        Mon Dec 19 16:41:11 2011 -0500
>> summary:
>>   don't mention implementation detail
>>
>> files:
>>   Doc/library/operator.rst |  10 +++++-----
>>   1 files changed, 5 insertions(+), 5 deletions(-)
>>
>>
>> diff --git a/Doc/library/operator.rst b/Doc/library/operator.rst
>> --- a/Doc/library/operator.rst
>> +++ b/Doc/library/operator.rst
>> @@ -12,11 +12,11 @@
>>     from operator import itemgetter, iadd
>>
>>
>> -The :mod:`operator` module exports a set of functions implemented in C
>> -corresponding to the intrinsic operators of Python.  For example,
>> -``operator.add(x, y)`` is equivalent to the expression ``x+y``.  The function
>> -names are those used for special class methods; variants without leading and
>> -trailing ``__`` are also provided for convenience.
>
> I disagree with this change. Knowing that they are written in C is
> important when deciding to pass them to e.g. sort() or sorted(),
> because you know it will be faster than an arbitrary pure Python
> function.
>
> You could tag it as a "CPython implementation detail" if you want, or
> talk about performance rather than mention "C".
>
> Regards
>
> Antoine.

If this documentation is to be used by other python implementations,
then mentions of performance are outright harmful, since the
performance characteristics differ quite drastically. Written in C is
also not a part of specification as far as I know :)

Cheers,
fijal

From solipsis at pitrou.net  Tue Dec 20 11:08:30 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 20 Dec 2011 11:08:30 +0100
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <CAK5idxRe01Yh6-vdmfCtr5rS5AqkYX4OOG2=be9EmSDo0=Arbw@mail.gmail.com>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net>
	<CAK5idxRe01Yh6-vdmfCtr5rS5AqkYX4OOG2=be9EmSDo0=Arbw@mail.gmail.com>
Message-ID: <1324375710.3368.17.camel@localhost.localdomain>


On Tuesday 20 December 2011 at 12:01 +0200, Maciej Fijalkowski wrote:
> 
> If this documentation is to be used by other python implementations,
> then mentions of performance are outright harmful, since the
> performance characteristics differ quite drastically. Written in C is
> also not a part of specification as far as I know :)

But that's basically the only reason to invoke the
`operator.attrgetter("foo")` ugliness, instead of writing the explicit
and obvious `lambda x: x.foo`.
So not mentioning that it provides a speed benefit on CPython hides the
primary reason for using the operator module. Otherwise it's just a bunch
of useless wrappers.
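
A rough way to check this claim on whatever interpreter is at hand is a
timeit comparison. The sketch below is only an illustration: the `Point`
class, the `time_sort` helper, the list size and the repeat count are all
assumptions, and the numbers it prints will vary between machines and
implementations.

```python
import timeit
from operator import attrgetter


class Point:
    """Hypothetical class used only for this measurement."""

    def __init__(self, foo):
        self.foo = foo


points = [Point(i) for i in range(1000)]

get_foo_operator = attrgetter("foo")
get_foo_lambda = lambda x: x.foo


def time_sort(key, repeats=50):
    # Total seconds spent sorting `points` `repeats` times with `key`.
    return timeit.timeit(lambda: sorted(points, key=key), number=repeats)


t_operator = time_sort(get_foo_operator)
t_lambda = time_sort(get_foo_lambda)
print("attrgetter: %.4fs  lambda: %.4fs" % (t_operator, t_lambda))
```

Which of the two comes out ahead is exactly the implementation-dependent
detail under discussion, so the sketch reports timings rather than assuming
a winner.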

---------

More generally, not talking about performance at all is more harmful
than making CPython-specific comments in the documentation. 

Implementation details *deserve* to be documented when they have an
impact on behaviour (including performance / resource usage). Python is
not just a platonic ideal. Do you suggest we also remove this part:
http://docs.python.org/dev/library/io.html#performance
?

Regards

Antoine.



From dirkjan at ochtman.nl  Tue Dec 20 11:14:15 2011
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Tue, 20 Dec 2011 11:14:15 +0100
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <1324375710.3368.17.camel@localhost.localdomain>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net>
	<CAK5idxRe01Yh6-vdmfCtr5rS5AqkYX4OOG2=be9EmSDo0=Arbw@mail.gmail.com>
	<1324375710.3368.17.camel@localhost.localdomain>
Message-ID: <CAKmKYaCRa6QXQzJWTud_u4gkF+Y2AVsFPab3Dbu9eLth0YxdBg@mail.gmail.com>

On Tue, Dec 20, 2011 at 11:08, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> If this documentation is to be used by other python implementations,
>> then mentions of performance are outright harmful, since the
>> performance characteristics differ quite drastically. Written in C is
>> also not a part of specification as far as I know :)
>
> But that's basically the only reason to invoke the
> `operator.attrgetter("foo")` ugliness, instead of writing the explicit
> and obvious `lambda x: x.foo`.
> So not mentioning that it provides a speed benefit on CPython hides the
> primary reason for using the operator module. Otherwise it's just a bunch
> of useless wrappers.

So the question is if the docs are Python documentation or CPython
documentation? On PyPy, I'm guessing lambda x: x.foo might (some day)
be just as fast as operator.attrgetter("foo").

> Implementation details *deserve* to be documented when they have an
> impact on behaviour (including performance / resource usage). Python is
> not just a platonic ideal. Do you suggest we also remove this part:
> http://docs.python.org/dev/library/io.html#performance
> ?

I agree that it's good to document some implementation details, but it
seems like the paragraph, as it was before, documented too many
details. It seems like a paragraph that mentions the specificity of
this aspect for CPython and omits the reference to C as the VM
implementation should be acceptable to all parties.

Cheers,

Dirkjan

From solipsis at pitrou.net  Tue Dec 20 11:22:28 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 20 Dec 2011 11:22:28 +0100
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <CAKmKYaCRa6QXQzJWTud_u4gkF+Y2AVsFPab3Dbu9eLth0YxdBg@mail.gmail.com>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net>
	<CAK5idxRe01Yh6-vdmfCtr5rS5AqkYX4OOG2=be9EmSDo0=Arbw@mail.gmail.com>
	<1324375710.3368.17.camel@localhost.localdomain>
	<CAKmKYaCRa6QXQzJWTud_u4gkF+Y2AVsFPab3Dbu9eLth0YxdBg@mail.gmail.com>
Message-ID: <20111220112228.320c389b@pitrou.net>

On Tue, 20 Dec 2011 11:14:15 +0100
Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:
> On Tue, Dec 20, 2011 at 11:08, Antoine Pitrou <solipsis at pitrou.net> wrote:
> >> If this documentation is to be used by other python implementations,
> >> then mentions of performance are outright harmful, since the
> >> performance characteristics differ quite drastically. Written in C is
> >> also not a part of specification as far as I know :)
> >
> > But that's basically the only reason to invoke the
> > `operator.attrgetter("foo")` ugliness, instead of writing the explicit
> > and obvious `lambda x: x.foo`.
> > So not mentioning that it provides a speed benefit on CPython hides the
> > primary reason for using the operator module. Otherwise it's just a bunch
> > of useless wrappers.
> 
> So the question is if the docs are Python documentation or CPython
> documentation? On PyPy, I'm guessing lambda x: x.foo might (some day)
> be just as fast as operator.attrgetter("foo").

I would expect it to be just as fast right now, although that's just
an uninformed guess. That said, CPython is both the dominant
implementation and the only one (AFAIR) to have stable 3.2 support.

> > Implementation details *deserve* to be documented when they have an
> > impact on behaviour (including performance / resource usage). Python is
> > not just a platonic ideal. Do you suggest we also remove this part:
> > http://docs.python.org/dev/library/io.html#performance
> > ?
> 
> I agree that it's good to document some implementation details, but it
> seems like the paragraph, as it was before, documented too many
> details. It seems like a paragraph that mentions the specificity of
> this aspect for CPython and omits the reference to C as the VM
> implementation should be acceptable to all parties.

Agreed. The original wording was poor since it mentioned C while what
is really significant is performance. There are probably Python
programmers who don't even know what C is.

Regards

Antoine.

From python-dev at masklinn.net  Tue Dec 20 11:25:32 2011
From: python-dev at masklinn.net (Xavier Morel)
Date: Tue, 20 Dec 2011 11:25:32 +0100
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <1324375710.3368.17.camel@localhost.localdomain>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net>
	<CAK5idxRe01Yh6-vdmfCtr5rS5AqkYX4OOG2=be9EmSDo0=Arbw@mail.gmail.com>
	<1324375710.3368.17.camel@localhost.localdomain>
Message-ID: <4E93C721-BD86-42C6-81A4-FD2ED92FD4C6@masklinn.net>

On 2011-12-20, at 11:08 , Antoine Pitrou wrote:
> But that's basically the only reason to invoke the
> `operator.attrgetter("foo")` ugliness, instead of writing the explicit
> and obvious `lambda x: x.foo`.
I don't agree with this: an attrgetter in the current namespace can be clearer than an explicit lambda in place and, more importantly, when fetching more than one attribute, attrgetter is far superior to lambdas as far as I'm concerned.

I don't think I've ever seen `attrgetter` (or any of the other `operator` functions) advocated on the basis of speed. This mention does not even exist in the Python 2 docs, which does not prevent people from using `operator`.
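
The multi-attribute case can be sketched as follows. The `Employee` record
and the `staff` data are hypothetical, chosen only to show that a single
`attrgetter` call returns a tuple of attributes, while the lambda spelling
has to build that tuple by hand:

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical record type for illustration.
Employee = namedtuple("Employee", ["name", "dept", "age"])

staff = [
    Employee("dana", "ops", 41),
    Employee("eli", "dev", 29),
    Employee("fay", "dev", 35),
]

# One attrgetter call fetches several attributes as a tuple.
dept_and_age = attrgetter("dept", "age")

# The lambda spelling has to construct the tuple explicitly.
dept_and_age_lambda = lambda e: (e.dept, e.age)

# Sort by department first, then by age within each department.
ordered = sorted(staff, key=dept_and_age)
print([e.name for e in ordered])
```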

From tjreedy at udel.edu  Tue Dec 20 11:27:41 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 20 Dec 2011 05:27:41 -0500
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <20111220095149.6187cca8@pitrou.net>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net>
Message-ID: <jcpnv9$k38$1@dough.gmane.org>

On 12/20/2011 3:51 AM, Antoine Pitrou wrote:
> On Mon, 19 Dec 2011 22:42:43 +0100
> benjamin.peterson <python-checkins at python.org> wrote:
>> http://hg.python.org/cpython/rev/d85efd73b0e1
>> changeset:   74088:d85efd73b0e1
>> branch:      3.2
>> parent:      74082:71e5a083f9b1
>> user:        Benjamin Peterson <benjamin at python.org>
>> date:        Mon Dec 19 16:41:11 2011 -0500
>> summary:
>>    don't mention implementation detail
>>
>> files:
>>    Doc/library/operator.rst |  10 +++++-----
>>    1 files changed, 5 insertions(+), 5 deletions(-)
>>
>>
>> diff --git a/Doc/library/operator.rst b/Doc/library/operator.rst
>> --- a/Doc/library/operator.rst
>> +++ b/Doc/library/operator.rst
>> @@ -12,11 +12,11 @@
>>      from operator import itemgetter, iadd
>>
>>
>> -The :mod:`operator` module exports a set of functions implemented in C
>> -corresponding to the intrinsic operators of Python.  For example,
>> -``operator.add(x, y)`` is equivalent to the expression ``x+y``.  The function
>> -names are those used for special class methods; variants without leading and
>> -trailing ``__`` are also provided for convenience.
>
> I disagree with this change. Knowing that they are written in C is
> important when deciding to pass them to e.g. sort() or sorted(),
> because you know it will be faster than an arbitrary pure Python
> function.
>
> You could tag it as a "CPython implementation detail" if you want, or
> talk about performance rather than mention "C".

The existence of operator and the behavior of its functions is not a C
implementation detail. So some change was needed. I think a programmer
can assume that they are written in the implementation language to
be as fast as possible. I do not think we should load the manual with
"In CPython, this is implemented in C" notes all over. For instance,
there is nothing in the library manual that I can see that specifies
that the builtin functions and types are written in C (for CPython). And
I remember that Guido has asked that the manual not discuss big O()
behavior of the methods of builtin classes.

I do see a note like "The binascii module contains low-level functions
written in C for greater speed that are used by the higher-level
modules." But that should be revised somehow for the same reason as
operator. But I don't think this is typical. The heapq module makes no
mention of _heapq. I think all this sort of stuff belongs in a separate
CPython Notes.

Perhaps Python Setup and Usage could be renamed CPython Setup and Usage 
and expanded with more info on gc (ref counting), O() notes, Python vs. 
C code, etc. I presume that other implementations are not run with 
'python script.py', so the very first section is CPython specific 
anyway. In fact, I have the impression that for some *nix systems, that 
is CPython 2 specific.

-- 
Terry Jan Reedy


From solipsis at pitrou.net  Tue Dec 20 11:57:08 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 20 Dec 2011 11:57:08 +0100
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net> <jcpnv9$k38$1@dough.gmane.org>
Message-ID: <20111220115708.40bd882e@pitrou.net>

On Tue, 20 Dec 2011 05:27:41 -0500
Terry Reedy <tjreedy at udel.edu> wrote:
> >
> > I disagree with this change. Knowing that they are written in C is
> > important when deciding to pass them to e.g. sort() or sorted(),
> > because you know it will be faster than an arbitrary pure Python
> > function.
> >
> > You could tag it as a "CPython implementation detail" if you want, or
> > talk about performance rather than mention "C".
> 
> The existence of operator and the behavior of its functions is not a C 
> implementation detail.

And?

> I think a programmer 
> can assume that they are written in the implementation language to 
> be as fast as possible.

Yeah, you can assume anything, and then get bitten by the fact that
e.g. OrderedDict is pure Python and thus massively slower than dict.
But at least you've achieved some platonic ideal of how documentation
should not talk about implementation details, which is great, right?
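
Absent a note in the docs, a user can at least probe the running interpreter.
The sketch below is an illustration only: `is_pure_python` is a hypothetical
helper, and it relies on the assumption that callables written in Python are
`types.FunctionType` instances while natively provided ones are not. What it
prints for `OrderedDict` depends on the interpreter and its version.

```python
import types
from collections import OrderedDict


def is_pure_python(func):
    # Functions defined with `def` or `lambda` are FunctionType instances;
    # callables provided natively by the interpreter are not.
    return isinstance(func, types.FunctionType)


for name, func in [
    ("dict.get", dict.get),
    ("OrderedDict.popitem", OrderedDict.popitem),
    ("len", len),
]:
    print("%-20s pure Python: %s" % (name, is_pure_python(func)))
```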

Why you think we should leave users in the dark rather than inform them
is beyond me. While we certainly should find a good compromise between
readability and completeness, and should certainly tweak the doc's
wording and layout adequately, removing useful information is nonsense.

> For instance, 
> there is nothing in the library manual that I can see that specifies 
> that the builtin functions and types are written in C (for CPython).

I guess everyone expects builtin functions and types to be
reasonably fast, regardless of the language or implementation.
(even though I did see some beginner code rewrite its own slow "list"
wrapper, so it's probably not a universal expectation)

> Perhaps Python Setup and Usage could be renamed CPython Setup and Usage 
> and expanded with more info on gc (ref counting), O() notes, Python vs. 
> C code, etc.

Really? That's a perfectly inappropriate place to talk about performance
details of *any* implementation.

Regards

Antoine.



From dirkjan at ochtman.nl  Tue Dec 20 12:24:58 2011
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Tue, 20 Dec 2011 12:24:58 +0100
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <jcpnv9$k38$1@dough.gmane.org>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net> <jcpnv9$k38$1@dough.gmane.org>
Message-ID: <CAKmKYaD_kFusBh7awduX1A92jdUBHZfgaEW68yAJu67jUUsdkQ@mail.gmail.com>

On Tue, Dec 20, 2011 at 11:27, Terry Reedy <tjreedy at udel.edu> wrote:
> And I remember that Guido has
> asked that the manual not discuss big O()
> behavior of the methods of builtin classes.

Do you know when/where he did that? It seems useful to know that on
CPython, list.insert(0, x) will become slow as the list grows... It
probably shouldn't be upfront, but O() hints for some of the core
stuff seem useful (though again, in some cases they should probably
be limited to CPython).
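
The list.insert(0, x) point can be measured with a short sketch. The helper
names and sizes below are assumptions made for illustration, and
`collections.deque` is brought in only as a point of comparison (it is not
mentioned above). Timings will differ between machines and implementations.

```python
import timeit
from collections import deque


def fill_list_front(n):
    # On CPython, each insert at index 0 shifts the existing elements.
    items = []
    for i in range(n):
        items.insert(0, i)
    return items


def fill_deque_front(n):
    # deque supports appending on the left without shifting elements.
    items = deque()
    for i in range(n):
        items.appendleft(i)
    return items


for n in (1000, 10000):
    t_list = timeit.timeit(lambda: fill_list_front(n), number=5)
    t_deque = timeit.timeit(lambda: fill_deque_front(n), number=5)
    print("n=%d  list.insert(0, x): %.4fs  deque.appendleft: %.4fs"
          % (n, t_list, t_deque))
```

If the list timing grows much faster than the deque timing as n increases,
that is the kind of behaviour an O() hint in the docs would warn about.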

Cheers,

Dirkjan

From lukasz at langa.pl  Tue Dec 20 13:27:11 2011
From: lukasz at langa.pl (Łukasz Langa)
Date: Tue, 20 Dec 2011 13:27:11 +0100
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <20111220115708.40bd882e@pitrou.net>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net> <jcpnv9$k38$1@dough.gmane.org>
	<20111220115708.40bd882e@pitrou.net>
Message-ID: <09213AF2-38D1-447C-BD23-7E30D2C54EBA@langa.pl>


Message written by Antoine Pitrou on 20 Dec 2011, at 11:57:

> Why you think we should leave users in the dark rather than inform them
> is beyond me. While we certainly should find a good compromise between
> readability and completeness, and should certainly tweak the doc's
> wording and layout adequately, removing useful information is nonsense.

+1

-- 
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.

From lukasz at langa.pl  Tue Dec 20 13:29:01 2011
From: lukasz at langa.pl (Łukasz Langa)
Date: Tue, 20 Dec 2011 13:29:01 +0100
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <CAKmKYaD_kFusBh7awduX1A92jdUBHZfgaEW68yAJu67jUUsdkQ@mail.gmail.com>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net> <jcpnv9$k38$1@dough.gmane.org>
	<CAKmKYaD_kFusBh7awduX1A92jdUBHZfgaEW68yAJu67jUUsdkQ@mail.gmail.com>
Message-ID: <2F828189-2D3F-4E10-A70F-7DBECC8870C9@langa.pl>

Message written by Dirkjan Ochtman on 20 Dec 2011, at 12:24:

> On Tue, Dec 20, 2011 at 11:27, Terry Reedy <tjreedy at udel.edu> wrote:
>> And I remember that Guido has
>> asked that the manual not discuss big O()
>> behavior of the methods of builtin classes.
> 
> Do you know when/where he did that?

http://mail.python.org/pipermail/python-dev/2008-March/077511.html

-- 
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.


From jxo6948 at rit.edu  Tue Dec 20 13:29:31 2011
From: jxo6948 at rit.edu (John O'Connor)
Date: Tue, 20 Dec 2011 07:29:31 -0500
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <CAKmKYaD_kFusBh7awduX1A92jdUBHZfgaEW68yAJu67jUUsdkQ@mail.gmail.com>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net> <jcpnv9$k38$1@dough.gmane.org>
	<CAKmKYaD_kFusBh7awduX1A92jdUBHZfgaEW68yAJu67jUUsdkQ@mail.gmail.com>
Message-ID: <CABCbifUU5FbZjEjdPYf_HPPgaWAK0zYEy=3_itQuVGm3+EoPXQ@mail.gmail.com>

On Tue, Dec 20, 2011 at 6:24 AM, Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:
> On Tue, Dec 20, 2011 at 11:27, Terry Reedy <tjreedy at udel.edu> wrote:
>> And I remember that Guido has
>> asked that the manual not discuss big O()
>> behavior of the methods of builtin classes.
>
> Do you know when/where he did that? It seems useful to know that on
> CPython, list.insert(0, x) will become slow as the list grows... It
> probably shouldn't be upfront, but O() hints for some of the core
> stuff seem useful (though again, in some cases they should probably
> be limited to CPython).

I think the question of the day is whether the documentation is
targeting those who wish to have an understanding of what is happening
under the hood, or those that want to take such details for granted. I
much prefer the little notes and performance hints.

- John

From benjamin at python.org  Tue Dec 20 16:57:06 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Tue, 20 Dec 2011 10:57:06 -0500
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <20111220095149.6187cca8@pitrou.net>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net>
Message-ID: <CAPZV6o__qDZdSSD+AH=PvvW7Yt8fJ_A6saKYhKWHH0jruuDzJg@mail.gmail.com>

2011/12/20 Antoine Pitrou <solipsis at pitrou.net>:
> On Mon, 19 Dec 2011 22:42:43 +0100
> benjamin.peterson <python-checkins at python.org> wrote:
>> http://hg.python.org/cpython/rev/d85efd73b0e1
>> changeset:   74088:d85efd73b0e1
>> branch:      3.2
>> parent:      74082:71e5a083f9b1
>> user:        Benjamin Peterson <benjamin at python.org>
>> date:        Mon Dec 19 16:41:11 2011 -0500
>> summary:
>>   don't mention implementation detail
>>
>> files:
>>   Doc/library/operator.rst |  10 +++++-----
>>   1 files changed, 5 insertions(+), 5 deletions(-)
>>
>>
>> diff --git a/Doc/library/operator.rst b/Doc/library/operator.rst
>> --- a/Doc/library/operator.rst
>> +++ b/Doc/library/operator.rst
>> @@ -12,11 +12,11 @@
>>     from operator import itemgetter, iadd
>>
>>
>> -The :mod:`operator` module exports a set of functions implemented in C
>> -corresponding to the intrinsic operators of Python.  For example,
>> -``operator.add(x, y)`` is equivalent to the expression ``x+y``.  The function
>> -names are those used for special class methods; variants without leading and
>> -trailing ``__`` are also provided for convenience.
>
> I disagree with this change. Knowing that they are written in C is
> important when deciding to pass them to e.g. sort() or sorted(),
> because you know it will be faster than an arbitrary pure Python
> function.

In that case, I would rather speak of "fast" functions rather than
"implemented in C" functions (a la the itertools docs). Would that be
acceptable?



-- 
Regards,
Benjamin

From solipsis at pitrou.net  Tue Dec 20 17:10:50 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 20 Dec 2011 17:10:50 +0100
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <CAPZV6o__qDZdSSD+AH=PvvW7Yt8fJ_A6saKYhKWHH0jruuDzJg@mail.gmail.com>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net>
	<CAPZV6o__qDZdSSD+AH=PvvW7Yt8fJ_A6saKYhKWHH0jruuDzJg@mail.gmail.com>
Message-ID: <1324397450.3368.25.camel@localhost.localdomain>

On Tuesday 20 December 2011 at 10:57 -0500, Benjamin Peterson wrote:
> 2011/12/20 Antoine Pitrou <solipsis at pitrou.net>:
> > On Mon, 19 Dec 2011 22:42:43 +0100
> > benjamin.peterson <python-checkins at python.org> wrote:
> >> http://hg.python.org/cpython/rev/d85efd73b0e1
> >> changeset:   74088:d85efd73b0e1
> >> branch:      3.2
> >> parent:      74082:71e5a083f9b1
> >> user:        Benjamin Peterson <benjamin at python.org>
> >> date:        Mon Dec 19 16:41:11 2011 -0500
> >> summary:
> >>   don't mention implementation detail
> >>
> >> files:
> >>   Doc/library/operator.rst |  10 +++++-----
> >>   1 files changed, 5 insertions(+), 5 deletions(-)
> >>
> >>
> >> diff --git a/Doc/library/operator.rst b/Doc/library/operator.rst
> >> --- a/Doc/library/operator.rst
> >> +++ b/Doc/library/operator.rst
> >> @@ -12,11 +12,11 @@
> >>     from operator import itemgetter, iadd
> >>
> >>
> >> -The :mod:`operator` module exports a set of functions implemented in C
> >> -corresponding to the intrinsic operators of Python.  For example,
> >> -``operator.add(x, y)`` is equivalent to the expression ``x+y``.  The function
> >> -names are those used for special class methods; variants without leading and
> >> -trailing ``__`` are also provided for convenience.
> >
> > I disagree with this change. Knowing that they are written in C is
> > important when deciding to pass them to e.g. sort() or sorted(),
> > because you know it will be faster than an arbitrary pure Python
> > function.
> 
> In that case, I would rather speak of "fast" functions rather than
> "implemented in C" functions (a la the itertools docs). Would that be
> acceptable?

Definitely.

Regards

Antoine.



From benjamin at python.org  Tue Dec 20 17:15:12 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Tue, 20 Dec 2011 11:15:12 -0500
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <1324397450.3368.25.camel@localhost.localdomain>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net>
	<CAPZV6o__qDZdSSD+AH=PvvW7Yt8fJ_A6saKYhKWHH0jruuDzJg@mail.gmail.com>
	<1324397450.3368.25.camel@localhost.localdomain>
Message-ID: <CAPZV6o90V5TCi=zbdd6TANa+v5Z7JM2inRFWaKy_FmzrZi84sQ@mail.gmail.com>

2011/12/20 Antoine Pitrou <solipsis at pitrou.net>:
> On Tuesday 20 December 2011 at 10:57 -0500, Benjamin Peterson wrote:
>> In that case, I would rather speak of "fast" functions rather than
>> "implemented in C" functions (a la the itertools docs). Would that be
>> acceptable?
>
> Definitely.

Done.



-- 
Regards,
Benjamin

From techtonik at gmail.com  Tue Dec 20 19:40:28 2011
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 20 Dec 2011 21:40:28 +0300
Subject: [Python-Dev] Inconsistent script/console behaviour
In-Reply-To: <87y5u9jhfm.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
	<jck5p8$i3g$1@dough.gmane.org>
	<87y5u9jhfm.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CAPkN8xKZJH=Y_S1WaSOeywwgsF-kUDOQeBB+SQv5s17gYbUwww@mail.gmail.com>

On Mon, Dec 19, 2011 at 7:47 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> Fernando Perez writes:
>
>  > Apology for the advertising,
>
> If there's any apologizing to be done, it's on Anatoly's part.  Your
> post was short, to the point, information-packed, and should put a big
> fat open-centered ideographic full stop period to this thread.


Fernando clearly showed that IPython rocks, because CPython suxx. I don't
think anybody should apologize for the intention to fix this by enhancing
CPython, so as a python-dev subscriber you should be ashamed of yourself
for this proposal already. ;)


Thanks everyone else for explaining the problem with the current
implementation. I'll post a follow-up as soon as I have time to wrap my
head around the details and see for myself why the IPython solution is so
hard to implement.
-- 
anatoly t.

From victor.stinner at haypocalc.com  Tue Dec 20 20:26:51 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 20 Dec 2011 20:26:51 +0100
Subject: [Python-Dev] Difference between PyUnicode_IS_ASCII and
 PyUnicode_IS_COMPACT_ASCII ?
In-Reply-To: <20111220095440.43ff9f41@pitrou.net>
References: <20111220095440.43ff9f41@pitrou.net>
Message-ID: <4EF0E17B.6010503@haypocalc.com>

On 20/12/2011 09:54, Antoine Pitrou wrote:
>
> Hello,
>
> The include file (unicodeobject.h) seems to imply that some pure ASCII
> strings can be non-compact, but I don't understand how that can happen.

If you create a string from Py_UNICODE* or wchar_t* (using the legacy 
API), PyUnicode_READY() may create a non-compact but ASCII string.

Such a string would be in the following state (extract of unicodeobject.h):

        - legacy string, ready:

          * structure = PyUnicodeObject structure
          * test: !PyUnicode_IS_COMPACT(op) && kind != PyUnicode_WCHAR_KIND
          * kind = PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND or
            PyUnicode_4BYTE_KIND
          * compact = 0
          * ready = 1
          * data.any is not NULL
          * utf8 is shared and utf8_length = length with data.any if ascii = 1
          * utf8_length = 0 if utf8 is NULL

> Besides, the following comment also seems wrong:
>
>         - compact:
>
>           * structure = PyCompactUnicodeObject
>           * test: PyUnicode_IS_ASCII(op) && !PyUnicode_IS_COMPACT(op)

I added the "test" lines recently because I always forget how to get the 
structure type. The correct test should be:

        - compact:

          * structure = PyCompactUnicodeObject
          * test: PyUnicode_IS_COMPACT(op) && !PyUnicode_IS_ASCII(op)

Victor

From fijall at gmail.com  Tue Dec 20 21:22:04 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 20 Dec 2011 22:22:04 +0200
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <CAKmKYaCRa6QXQzJWTud_u4gkF+Y2AVsFPab3Dbu9eLth0YxdBg@mail.gmail.com>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net>
	<CAK5idxRe01Yh6-vdmfCtr5rS5AqkYX4OOG2=be9EmSDo0=Arbw@mail.gmail.com>
	<1324375710.3368.17.camel@localhost.localdomain>
	<CAKmKYaCRa6QXQzJWTud_u4gkF+Y2AVsFPab3Dbu9eLth0YxdBg@mail.gmail.com>
Message-ID: <CAK5idxQZ51_UkLnAuWHRpg3ZezuEdqyP=R0qUcUxjeXscy6qGw@mail.gmail.com>

On Tue, Dec 20, 2011 at 12:14 PM, Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:
> On Tue, Dec 20, 2011 at 11:08, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>> If this documentation is to be used by other python implementations,
>>> then mentions of performance are outright harmful, since the
>>> performance characteristics differ quite drastically. Written in C is
>>> also not a part of specification as far as I know :)
>>
>> But that's basically the only reason to invoke the
>> `operator.attrgetter("foo")` ugliness, instead of writing the explicit
>> and obvious `lambda x: x.foo`.
>> So not mentioning that it provides a speed benefit on CPython hides the
>> primary reason for using the operator module. Otherwise it's just a bunch
>> of useless wrappers.
>
> So the question is if the docs are Python documentation or CPython
> documentation? On PyPy, I'm guessing lambda x: x.foo might (some day)
> be just as fast as operator.attrgetter("foo").
>

As of now, lambda is much faster on PyPy for a constant name (there is
no good reason why attrgetter is slower, but it somehow loses the fact
that the name is constant when it is).

I'm in general fine with saying that this is either Python
documentation or CPython documentation, but leaving the two intermingled
has caused us quite some headaches in the past. For example, using
attrgetter and map rather than just writing a loop is slower on PyPy,
so the knowledge that the operator module is *fast* is misleading
*in Python*. How about we somehow mark that whenever the Python
documentation talks about performance, it means CPython performance?
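
For anyone who wants to compare the two spellings on their own interpreter, here is a minimal sketch. The Point class, the variable names, and the time_getter helper are made up for illustration; the only claim is that both getters return the same values, and the timings will differ between CPython, PyPy, and other implementations.

```python
import operator
import timeit


class Point(object):
    """Tiny example class with a single attribute named foo."""

    def __init__(self, foo):
        self.foo = foo


# The two spellings under discussion.
get_foo_op = operator.attrgetter("foo")
get_foo_lambda = lambda x: x.foo

points = [Point(i) for i in range(1000)]


def time_getter(getter, repeat=100):
    """Time mapping `getter` over `points`; the result is implementation-specific."""
    return timeit.timeit(lambda: list(map(getter, points)), number=repeat)
```

Calling time_getter(get_foo_op) and time_getter(get_foo_lambda) on each implementation of interest gives numbers that are only meaningful for that implementation.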

Cheers,
fijal

From tjreedy at udel.edu  Tue Dec 20 22:57:13 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 20 Dec 2011 16:57:13 -0500
Subject: [Python-Dev] cpython (3.2): don't mention implementation detail
In-Reply-To: <CAPZV6o90V5TCi=zbdd6TANa+v5Z7JM2inRFWaKy_FmzrZi84sQ@mail.gmail.com>
References: <E1Rckyp-0002W0-7n@dinsdale.python.org>
	<20111220095149.6187cca8@pitrou.net>
	<CAPZV6o__qDZdSSD+AH=PvvW7Yt8fJ_A6saKYhKWHH0jruuDzJg@mail.gmail.com>
	<1324397450.3368.25.camel@localhost.localdomain>
	<CAPZV6o90V5TCi=zbdd6TANa+v5Z7JM2inRFWaKy_FmzrZi84sQ@mail.gmail.com>
Message-ID: <jcr0c6$toh$1@dough.gmane.org>

On 12/20/2011 11:15 AM, Benjamin Peterson wrote:
> 2011/12/20 Antoine Pitrou<solipsis at pitrou.net>:
>> On Tuesday, 20 December 2011 at 10:57 -0500, Benjamin Peterson wrote:
>>> In that case, I would rather speak of "fast" functions rather than
>>> "implemented in C" functions (a la the itertools docs). Would that be
>>> acceptable?
>>
>> Definitely.
>
> Done.

I like what you did too.

-- 
Terry Jan Reedy



From stephen at xemacs.org  Wed Dec 21 03:14:05 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 21 Dec 2011 11:14:05 +0900
Subject: [Python-Dev] Inconsistent script/console behaviour
In-Reply-To: <CAPkN8xKZJH=Y_S1WaSOeywwgsF-kUDOQeBB+SQv5s17gYbUwww@mail.gmail.com>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
	<jck5p8$i3g$1@dough.gmane.org>
	<87y5u9jhfm.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAPkN8xKZJH=Y_S1WaSOeywwgsF-kUDOQeBB+SQv5s17gYbUwww@mail.gmail.com>
Message-ID: <87iplak6xe.fsf@uwakimon.sk.tsukuba.ac.jp>

anatoly techtonik writes:

 > Fernando clearly showed that IPython rocks, because CPython suxx.

<sigh/>

No, IPython rocks because it focuses on doing one thing well:
providing an interactive environment that takes advantage of the many
features that Python provides in support.  CPython should do the same:
specifically, focus on the *language* that we all consider excellent
but still can be improved, and on the (still) leading implementation
of the language and the stdlib.[1]

 > so as a python-dev subscriber you should be ashamed of yourself for
 > this proposal already. ;)

ROTFLMAO!  No, I still think you're making an awfully big deal of
something that doesn't need fixing, and I wish you would stop.

Footnotes: 
[1]  Note that this *is* *one* task, because CPython has chosen a
definition of "language excellence" that includes prototype
implementation of proposed language features and "batteries
included".


From chris at simplistix.co.uk  Wed Dec 21 08:16:06 2011
From: chris at simplistix.co.uk (Chris Withers)
Date: Wed, 21 Dec 2011 07:16:06 +0000
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <4EF187A2.6070909@simplistix.co.uk>
References: <4EF187A2.6070909@simplistix.co.uk>
Message-ID: <4EF187B6.3080406@simplistix.co.uk>

What's the python-dev view on this?

-------- Original Message --------
Subject: Anyone still using Python 2.5?
Date: Wed, 21 Dec 2011 07:15:46 +0000
From: Chris Withers <chris at simplistix.co.uk>
To: Python List <python-list at python.org>, 
"testing-in-python at lists.idyll.org" <testing-in-python at lists.idyll.org>, 
simplistix at googlegroups.com

Hi All,

What's the general consensus on supporting Python 2.5 nowadays?

Do people still have to use this in commercial environments or is
everyone on 2.6+ nowadays?

I'm finally getting some continuous integration set up for my packages
and it's highlighting some 2.5 compatibility issues. I'm wondering
whether to fix those (lots of ugly "from __future__ import
with_statement" everywhere) or just to drop Python 2.5 support.

What do people feel?

cheers,

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
             - http://www.simplistix.co.uk

From python.leojay at gmail.com  Wed Dec 21 09:50:46 2011
From: python.leojay at gmail.com (Leo Jay)
Date: Wed, 21 Dec 2011 16:50:46 +0800
Subject: [Python-Dev] Cannot use multiprocessing and zip together on windows
In-Reply-To: <CANMCpS5HxTDc8MnF9m=g6M1RuAHcoWV8CacgKdmJO-CtvRTaFQ@mail.gmail.com>
References: <CANMCpS5HxTDc8MnF9m=g6M1RuAHcoWV8CacgKdmJO-CtvRTaFQ@mail.gmail.com>
Message-ID: <CANMCpS7cpCSp87gRgrLGUjwRh8M0c+TN2oaRSvmM8o_n10iqGQ@mail.gmail.com>

Hi All,

I posted this several days ago on the python mailing list but got no response,
and I think it might be a bug, so I am posting it here. Apologies if it's
not appropriate.

I have a file p.zip containing a __main__.py, and the content of
__main__.py is:

from multiprocessing import Process
import os

def f():
    print 'in f, pid:', os.getpid()

if __name__ == '__main__':
    print 'pid:', os.getpid()
    p = Process(target=f)
    p.start()
    p.join()


On Linux, I get the expected result when running "python p.zip",
but on Windows XP, I got:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\python27\lib\multiprocessing\forking.py", line 346, in main
    prepare(preparation_data)
  File "C:\python27\lib\multiprocessing\forking.py", line 454, in prepare
    assert main_name not in sys.modules, main_name
AssertionError: __main__
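
A minimal sketch for rebuilding the archive, in case it helps with reproducing this. The build_archive helper and the MAIN_SOURCE name are made up for illustration, and the embedded script uses print() calls so that the same archive also runs on Python 3; otherwise it mirrors the __main__.py shown above.

```python
import os
import zipfile

# Source of the __main__.py placed in the archive (same logic as above,
# written with print() so it parses on both Python 2.7 and Python 3).
MAIN_SOURCE = '''\
from multiprocessing import Process
import os

def f():
    print('in f, pid: %d' % os.getpid())

if __name__ == '__main__':
    print('pid: %d' % os.getpid())
    p = Process(target=f)
    p.start()
    p.join()
'''


def build_archive(directory):
    """Write p.zip containing only __main__.py into `directory`; return its path."""
    path = os.path.join(directory, "p.zip")
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("__main__.py", MAIN_SOURCE)
    return path
```

After calling build_archive("."), the archive can be run with "python p.zip" on each platform to compare the behaviour.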

It seems that the situation described here is similar:
http://bugs.python.org/issue10128

But the patch doesn't work for me.

Does anybody know how to fix this?
Thanks.

--
Best Regards,
Leo Jay

From dirkjan at ochtman.nl  Wed Dec 21 09:55:34 2011
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Wed, 21 Dec 2011 09:55:34 +0100
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <4EF187B6.3080406@simplistix.co.uk>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
Message-ID: <CAKmKYaBGTnt_S0G9=Zpf7k_nqo4YArAOF31ZB03mQpzN9hC49g@mail.gmail.com>

On Wed, Dec 21, 2011 at 08:16, Chris Withers <chris at simplistix.co.uk> wrote:
> What's the general consensus on supporting Python 2.5 nowadays?
>
> Do people still have to use this in commercial environments or is
> everyone on 2.6+ nowadays?

This seems rather off-topic for python-dev.

FWIW, on Gentoo we're just now getting to dropping 2.4, so we'll
support 2.5 quite a bit longer. That's also the tendency I see from
the ecosystem, at least insofar as I notice. On the other hand, we've
had 2.7 as the default python on our stable branch since March 2011. I
also know Mercurial is still supporting 2.4 (they tend to be
conservative about dropping support for old releases).

Cheers,

Dirkjan

From neologix at free.fr  Wed Dec 21 10:42:07 2011
From: neologix at free.fr (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Wed, 21 Dec 2011 10:42:07 +0100
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <4EF187B6.3080406@simplistix.co.uk>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
Message-ID: <CAH_1eM0Eve_=ZkN=nbhs1b_eY3mX5JX=8pBSQzszJ26bVZAibg@mail.gmail.com>

> Do people still have to use this in commercial environments or is
> everyone on 2.6+ nowadays?

RHEL 5.7 ships with Python 2.4.3. So no, not everybody is on 2.6+
today, and that won't happen for a couple of years.

cf

From phd at phdru.name  Wed Dec 21 11:29:15 2011
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 21 Dec 2011 14:29:15 +0400
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <4EF187B6.3080406@simplistix.co.uk>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
Message-ID: <20111221102915.GA18354@iskra.aviel.ru>

On Wed, Dec 21, 2011 at 07:16:06AM +0000, Chris Withers wrote:
> What's the general consensus on supporting Python 2.5 nowadays?
> 
> Do people still have to use this in commercial environments

   I have to use it. There is a rather large and complex intranet site
with both 32- and 64-bit versions of Python and libraries, and there are
about 70 copies of it at client sites so it'd be very hard to recompile
and adapt it to Python 2.6, test and upgrade all clients.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From barry at python.org  Wed Dec 21 13:42:45 2011
From: barry at python.org (Barry Warsaw)
Date: Wed, 21 Dec 2011 07:42:45 -0500
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <4EF187B6.3080406@simplistix.co.uk>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
Message-ID: <20111221074245.6652c314@resist.wooz.org>

On Dec 21, 2011, at 07:16 AM, Chris Withers wrote:

>What's the general consensus on supporting Python 2.5 nowadays?

FWIW, Ubuntu dropped 2.5 quite a while ago.  The next LTS (long term support)
release in April 2012 will have only Python 2.7 (and 3.2).  The next Debian
release, currently in development, has only Python 2.6, 2.7, and 3.2,
with 2.7 as the default.

For my own code, Python 2.6 is the minimum, and I'm seeing more upstream
libraries target 2.6 as a minimum also (e.g. dbus-python).  When projects say
they still need to target older Pythons, RHEL support is usually cited as the
reason.

Cheers,
-Barry

From fuzzyman at voidspace.org.uk  Wed Dec 21 14:07:14 2011
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Wed, 21 Dec 2011 13:07:14 +0000
Subject: [Python-Dev] Anyone still using Python 2.5?
In-Reply-To: <20111221074245.6652c314@resist.wooz.org>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
	<20111221074245.6652c314@resist.wooz.org>
Message-ID: <6E3347AE-961F-4744-82D6-708BAE44A1E4@voidspace.org.uk>


On 21 Dec 2011, at 12:42, Barry Warsaw wrote:

> On Dec 21, 2011, at 07:16 AM, Chris Withers wrote:
> 
>> What's the general consensus on supporting Python 2.5 nowadays?
> 
> FWIW, Ubuntu dropped 2.5 quite a while ago.  The next LTS (long term support)
> release in April 2012 will have only Python 2.7 (and 3.2).  The currently
> in-development next Debian release currently has only Python 2.6, 2.7, and 3.2
> with 2.7 as the default.
> 
> For my own code, Python 2.6 is the minimum, and I'm seeing more upstream
> libraries target 2.6 as a minimum also (e.g. dbus-python).  When projects say
> they still need to target older Pythons, RHEL support is usually cited as the
> reason.


For "production work" I've been on 2.6 for a while and will soon be switching to 2.7 (I do my development on 2.7).

For my libraries I'm still supporting 2.4. The *major* syntax feature you lose by targeting 2.4 is the with statement, so it will be nice to drop 2.4 support. The next releases of mock and unittest2 will still support 2.4, but the ones after that will be 2.5+.
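
To make the cost concrete, here is a small sketch of the same file read written both ways. The function names and the demo helper are invented for illustration. On Python 2.5 the first version additionally needs "from __future__ import with_statement" as the first statement of the module, and on 2.4 only the second version parses.

```python
import os
import tempfile


def read_with_statement(path):
    # Works on Python 2.6+ as written; Python 2.5 also needs
    # `from __future__ import with_statement` at the top of the module.
    with open(path) as handle:
        return handle.read()


def read_try_finally(path):
    # Equivalent spelling that also parses on Python 2.4.
    handle = open(path)
    try:
        return handle.read()
    finally:
        handle.close()


def demo():
    """Write a temporary file and read it back with both helpers."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello\n")
        os.close(fd)
        return read_with_statement(path), read_try_finally(path)
    finally:
        os.remove(path)
```

Both helpers close the file even if read() raises; the with statement just says so in one line instead of four.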

Thankfully tox makes testing across multiple versions (and implementations) easy.

All the best,

Michael Foord

> 
> Cheers,
> -Barry
> 


--
http://www.voidspace.org.uk/


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing 
http://www.sqlite.org/different.html






From solipsis at pitrou.net  Wed Dec 21 14:20:44 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 21 Dec 2011 14:20:44 +0100
Subject: [Python-Dev] Anyone still using Python 2.5?
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
Message-ID: <20111221142044.53382f94@pitrou.net>

On Wed, 21 Dec 2011 07:16:06 +0000
Chris Withers <chris at simplistix.co.uk> wrote:
> What's the python-dev view on this?

Python 2.5 is not supported by *us* anymore (*). Anyone still using it
therefore relies on their OS vendor to apply potential security
patches and other important fixes.

Library authors can of course choose to still support it. I wouldn't
care personally. I'm of the opinion that people who (by their choice
of OS) have a preference for legacy software shouldn't ask for the
latest versions of Python libraries.


(*) From http://www.python.org/download/releases/2.5.6/ :

"This release is the final release of Python 2.5; under the current
release policy, no security issues in Python 2.5 will be fixed anymore."

Regards

Antoine.



From jwzxgo at gmail.com  Wed Dec 21 14:31:25 2011
From: jwzxgo at gmail.com (wang tiezhen)
Date: Wed, 21 Dec 2011 14:31:25 +0100
Subject: [Python-Dev] Anyone still using Python 2.5?
In-Reply-To: <6E3347AE-961F-4744-82D6-708BAE44A1E4@voidspace.org.uk>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
	<20111221074245.6652c314@resist.wooz.org>
	<6E3347AE-961F-4744-82D6-708BAE44A1E4@voidspace.org.uk>
Message-ID: <CAN0E=N8zFoJVecqPSHE6ifOuav+PGVQ74SU9fs8RDPYg1FAkrw@mail.gmail.com>

I am still working on projects based on Python 2.4 in commercial
environments (an OS limitation: Solaris 5.10), and I don't think this will
change soon.

2011/12/21 Michael Foord <fuzzyman at voidspace.org.uk>

>
> On 21 Dec 2011, at 12:42, Barry Warsaw wrote:
>
> > On Dec 21, 2011, at 07:16 AM, Chris Withers wrote:
> >
> >> What's the general consensus on supporting Python 2.5 nowadays?
> >
> > FWIW, Ubuntu dropped 2.5 quite a while ago.  The next LTS (long term
> support)
> > release in April 2012 will have only Python 2.7 (and 3.2).  The currently
> > in-development next Debian release currently has only Python 2.6, 2.7,
> and 3.2
> > with 2.7 as the default.
> >
> > For my own code, Python 2.6 is the minimum, and I'm seeing more upstream
> > libraries target 2.6 as a minimum also (e.g. dbus-python).  When
> projects say
> > they still need to target older Pythons, RHEL support is usually cited
> as the
> > reason.
>
>
> For "production work" I've been on 2.6 for a while and will soon be
> switching to 2.7 (I do my development on 2.7).
>
> For my libraries I'm still supporting 2.4. The *major* syntax feature you
> lose by targeting 2.4 is the with statement, so it will be nice to drop 2.4
> support. The next releases of mock and unittest2 will still support 2.4,
> but the ones after that will be 2.5+.
>
> Thankfully tox makes testing across multiple versions (and
> implementations) easy.
>
> All the best,
>
> Michael Foord
>
> >
> > Cheers,
> > -Barry
> >
>
>
> --
> http://www.voidspace.org.uk/
>
>
> May you do good and not evil
> May you find forgiveness for yourself and forgive others
> May you share freely, never taking more than you give.
> -- the sqlite blessing
> http://www.sqlite.org/different.html
>
>
>
>
>

From charlesc-lists-python-dev2 at pyropus.ca  Wed Dec 21 14:35:34 2011
From: charlesc-lists-python-dev2 at pyropus.ca (Charles Cazabon)
Date: Wed, 21 Dec 2011 07:35:34 -0600
Subject: [Python-Dev] Anyone still using Python 2.5?
In-Reply-To: <6E3347AE-961F-4744-82D6-708BAE44A1E4@voidspace.org.uk>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
	<20111221074245.6652c314@resist.wooz.org>
	<6E3347AE-961F-4744-82D6-708BAE44A1E4@voidspace.org.uk>
Message-ID: <20111221133534.GA27321@pyropus.ca>

Michael Foord <fuzzyman at voidspace.org.uk> wrote:
> On 21 Dec 2011, at 12:42, Barry Warsaw wrote:
> > 
> > FWIW, Ubuntu dropped 2.5 quite a while ago.  The next LTS (long term
> > support) release in April 2012 will have only Python 2.7 (and 3.2). 

True, but 2.5 is still current on Hardy, an LTS release that is officially
supported until April 2013.  Lots of places still use 2.5 on Hardy (or on
Lucid, the LTS release after Hardy, though they have to get it from the
deadsnakes repository as it's not the normal version on Lucid).

My workplace uses 2.5 for a lot of things, but is slowly transitioning to 2.6.

> For "production work" I've been on 2.6 for a while and will soon be
> switching to 2.7 (I do my development on 2.7).
> 
> For my libraries I'm still supporting 2.4.

My own personal software generally tries to stay compatible further back.
getmail is used on lots of little network appliances and such that don't
necessarily run a current OS, so getmail v4 targets 2.3.3 and up.
If I'm writing something new today, I usually assume 2.6 and up.

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at:               http://pyropus.ca/software/
-----------------------------------------------------------------------

From techtonik at gmail.com  Wed Dec 21 15:26:05 2011
From: techtonik at gmail.com (anatoly techtonik)
Date: Wed, 21 Dec 2011 17:26:05 +0300
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <4EF187B6.3080406@simplistix.co.uk>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
Message-ID: <CAPkN8xLkFXOTwjFL+dg2Q+-YN2k39hxnm+_Byes3BJ6kMC9ipQ@mail.gmail.com>

I believe most AppEngine applications in Python are still using the 2.5
runtime. So are the development boxes for these applications. It may take
another year or two for the transition.
-- 
anatoly t.


On Wed, Dec 21, 2011 at 10:16 AM, Chris Withers <chris at simplistix.co.uk> wrote:

> What's the python-dev view on this?
>
> -------- Original Message --------
> Subject: Anyone still using Python 2.5?
> Date: Wed, 21 Dec 2011 07:15:46 +0000
> From: Chris Withers <chris at simplistix.co.uk>
> To: Python List <python-list at python.org>,
> "testing-in-python at lists.idyll.org" <testing-in-python at lists.idyll.org>,
> simplistix at googlegroups.com
>
> Hi All,
>
> What's the general consensus on supporting Python 2.5 nowadays?
>
> Do people still have to use this in commercial environments or is
> everyone on 2.6+ nowadays?
>
> I'm finally getting some continuous integration set up for my packages
> and it's highlighting some 2.5 compatibility issues. I'm wondering
> whether to fix those (lots of ugly "from __future__ import
> with_statement" everywhere) or just to drop Python 2.5 support.
>
> What do people feel?
>
> cheers,
>
> Chris
>
> --
> Simplistix - Content Management, Batch Processing & Python Consulting
>            - http://www.simplistix.co.uk

From ncoghlan at gmail.com  Wed Dec 21 15:28:15 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 22 Dec 2011 00:28:15 +1000
Subject: [Python-Dev] Cannot use multiprocessing and zip together on
	windows
In-Reply-To: <CANMCpS7cpCSp87gRgrLGUjwRh8M0c+TN2oaRSvmM8o_n10iqGQ@mail.gmail.com>
References: <CANMCpS5HxTDc8MnF9m=g6M1RuAHcoWV8CacgKdmJO-CtvRTaFQ@mail.gmail.com>
	<CANMCpS7cpCSp87gRgrLGUjwRh8M0c+TN2oaRSvmM8o_n10iqGQ@mail.gmail.com>
Message-ID: <CADiSq7fd0enUXkEz4j2bGQPFBJj_sVqAreX9guSowyjRvJ9Lsg@mail.gmail.com>

On Wed, Dec 21, 2011 at 6:50 PM, Leo Jay <python.leojay at gmail.com> wrote:
> It seems that the situation described here is similar:
> http://bugs.python.org/issue10128
>
> But the patch doesn't work for me.
>
> Anybody knows how to fix this?

Try the patch from http://bugs.python.org/issue10845 (the one on
#10128 only partially addresses the problem - a similarly incomplete
answer was our first attempt at fixing this for 3.2)

I've added a note to the issue you linked indicating that the change
should also be backported to the 2.7 maintenance branch. (IIRC, the
reason backporting to 2.7 didn't come up originally is that the only
reason we found the bad interaction in 3.2 was because we added
test.__main__, so the regression test suite can be executed via
"python -m test". At the time, it didn't occur to me, or anyone else
involved, that the underlying bug also affected 2.7).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From dmalcolm at redhat.com  Wed Dec 21 18:11:58 2011
From: dmalcolm at redhat.com (David Malcolm)
Date: Wed, 21 Dec 2011 12:11:58 -0500
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <CAH_1eM0Eve_=ZkN=nbhs1b_eY3mX5JX=8pBSQzszJ26bVZAibg@mail.gmail.com>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
	<CAH_1eM0Eve_=ZkN=nbhs1b_eY3mX5JX=8pBSQzszJ26bVZAibg@mail.gmail.com>
Message-ID: <1324487519.2461.1.camel@surprise>

On Wed, 2011-12-21 at 10:42 +0100, Charles-François Natali wrote:
> > Do people still have to use this in commercial environments or is
> > everyone on 2.6+ nowadays?
> 
> RHEL 5.7 ships with Python 2.4.3. So no, not everybody is on 2.6+
> today, and this won't happen before a couple years.

(and RHEL 4.9 with Python 2.3.4, fwiw)



From skippy.hammond at gmail.com  Thu Dec 22 02:25:27 2011
From: skippy.hammond at gmail.com (Mark Hammond)
Date: Thu, 22 Dec 2011 12:25:27 +1100
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <4EF187B6.3080406@simplistix.co.uk>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
Message-ID: <4EF28707.405@gmail.com>

FWIW, the most recent version of pywin32 has the following download 
counts (rounded to the nearest thousand)

Version 32bit       64bit
-------------------------
3.2 -  75,000       9,000
3.1 -   4,000       1,000
2.7 - 126,000      16,000
2.6 -  46,000       6,000
2.5 -  21,000         n/a
2.4 -   3,000         n/a
2.3 -   1,000         n/a

So ISTM that 2.5 isn't hugely popular these days, but also isn't 
insignificant.  It probably means I could "safely" drop 2.3 and 2.4 
support though...

Mark

On 21/12/2011 6:16 PM, Chris Withers wrote:
> What's the python-dev view on this?
>
> -------- Original Message --------
> Subject: Anyone still using Python 2.5?
> Date: Wed, 21 Dec 2011 07:15:46 +0000
> From: Chris Withers <chris at simplistix.co.uk>
> To: Python List <python-list at python.org>,
> "testing-in-python at lists.idyll.org" <testing-in-python at lists.idyll.org>,
> simplistix at googlegroups.com
>
> Hi All,
>
> What's the general consensus on supporting Python 2.5 nowadays?
>
> Do people still have to use this in commercial environments or is
> everyone on 2.6+ nowadays?
>
> I'm finally getting some continuous integration set up for my packages
> and it's highlighting some 2.5 compatibility issues. I'm wondering
> whether to fix those (lots of ugly "from __future__ import
> with_statement" everywhere) or just to drop Python 2.5 support.
>
> What do people feel?
>
> cheers,
>
> Chris
>


From greg at krypto.org  Thu Dec 22 02:41:17 2011
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 21 Dec 2011 17:41:17 -0800
Subject: [Python-Dev] Adding features to 2to3... cpython/default right? can
	I backport to 2.7?
Message-ID: <CAGE7PN+WxJQi+1wrGuA-4zbEZf81ivGXiZLUnZCy4Q3HfjCsew@mail.gmail.com>

I have some features I need to add to lib2to3 to make it more useful for
our purposes at work, supporting our massive code base in a Python 2 to 3
transition. Which tree should I develop these in and check them into?

cpython/default?

Can I backport this to 3.2 and 2.7?  It counts as a feature addition which
is normally a no-no for backports.  But in this case I'm enhancing 2to3
which is a useful tool.

No big deal to me _personally_ if I can't backport from 3.3
(cpython/default) as I'd apply the changes to our copy at work internally
but it seems wise to me for us to keep enhancing and improving 2to3 in a
Python 2.x/3.x release independent manner to make people's conversions
easier.

The features I want to commit (all pretty easy additions) are command line
flag / constructor option support for:
  1) writing output files to a different directory tree instead of
overwriting the input file.
  2) modifying the output filename by altering the suffix (.py -> .py3 for
example)
  3) always writing output files even if there were no changes to make
(useful in combination with the above to effectively act as a "copy library
X to this directory converting it to python 3 syntax along the way").
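
To make the three behaviours concrete, here is a rough sketch of the file handling they imply. The helper names and parameters (output_path, should_write, output_root, add_suffix, write_unchanged) are hypothetical and are not part of lib2to3; this only models where an output file would land and whether it would be written.

```python
import os


def output_path(input_path, input_root, output_root=None, add_suffix=""):
    """Map an input file to the path the converted file would be written to.

    Hypothetical helper, not a lib2to3 API.
    1) output_root, if given, replaces input_root as the base directory.
    2) add_suffix is appended to the filename (".py" + "3" -> ".py3").
    """
    relative = os.path.relpath(input_path, input_root)
    target_root = output_root if output_root is not None else input_root
    return os.path.join(target_root, relative) + add_suffix


def should_write(old_source, new_source, write_unchanged=False):
    """3) Write when the source changed, or always if write_unchanged is set."""
    return write_unchanged or old_source != new_source
```

With no output_root and an empty suffix, the sketch degenerates to overwriting the input file only when something changed.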

The old http://hg.python.org/2to3/ tree exists but it really looks like an
out of date version.

-gps

From fuzzyman at voidspace.org.uk  Thu Dec 22 02:49:37 2011
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Thu, 22 Dec 2011 01:49:37 +0000
Subject: [Python-Dev] Anyone still using Python 2.5?
In-Reply-To: <4EF28707.405@gmail.com>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk> <4EF28707.405@gmail.com>
Message-ID: <5C96DB96-3CE7-4E02-A597-5E5B58669EBE@voidspace.org.uk>


On 22 Dec 2011, at 01:25, Mark Hammond wrote:

> FWIW, the most recent version of pywin32 has the following download counts (rounded to the nearest thousand)
> 
> Version 32bit       64bit
> -------------------------
> 3.2 -  75,000       9,000
> 3.1 -   4,000       1,000
> 2.7 - 126,000      16,000
> 2.6 -  46,000       6,000
> 2.5 -  21,000         n/a
> 2.4 -   3,000         n/a
> 2.3 -   1,000         n/a
> 
> So ISTM that 2.5 isn't hugely popular these days, but also isn't insignificant.  It probably means I could "safely" drop 2.3 and 2.4 support though...
> 


These figures can't possibly be true. No-one is using Python 3 yet. ;-)

FWIW I heard a few days ago about a UK government department, HMGCC (Her Majesty's Government Communication Centre - based in Milton Keynes), who use Python for research projects. They switched to using Python 3 a while ago.

All the best,

Michael Foord

> Mark
> 
> On 21/12/2011 6:16 PM, Chris Withers wrote:
>> What's the python-dev view on this?
>> 
>> -------- Original Message --------
>> Subject: Anyone still using Python 2.5?
>> Date: Wed, 21 Dec 2011 07:15:46 +0000
>> From: Chris Withers <chris at simplistix.co.uk>
>> To: Python List <python-list at python.org>,
>> "testing-in-python at lists.idyll.org" <testing-in-python at lists.idyll.org>,
>> simplistix at googlegroups.com
>> 
>> Hi All,
>> 
>> What's the general consensus on supporting Python 2.5 nowadays?
>> 
>> Do people still have to use this in commercial environments or is
>> everyone on 2.6+ nowadays?
>> 
>> I'm finally getting some continuous integration set up for my packages
>> and it's highlighting some 2.5 compatibility issues. I'm wondering
>> whether to fix those (lots of ugly "from __future__ import
>> with_statement" everywhere) or just to drop Python 2.5 support.
>> 
>> What do people feel?
>> 
>> cheers,
>> 
>> Chris
>> 
> 
> 


--
http://www.voidspace.org.uk/


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing 
http://www.sqlite.org/different.html






From greg at krypto.org  Thu Dec 22 02:51:44 2011
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 21 Dec 2011 17:51:44 -0800
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <20111218235516.741cc14d@pitrou.net>
References: <CA+OGgf6_DdEgVBovmnNLXeBUB_vt08WXiFrC-izhgeV+DBBPsQ@mail.gmail.com>
	<4EEBB8FC.2010405@hotpy.org> <20111218235516.741cc14d@pitrou.net>
Message-ID: <CAGE7PNJU2+rwN2N311DCJNEHHnJvtCBmTbWyDUXb0ovzj8gNdw@mail.gmail.com>

On Sun, Dec 18, 2011 at 2:55 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Fri, 16 Dec 2011 21:32:44 +0000
> Mark Shannon <mark at hotpy.org> wrote:
> >
> > > per-instance attributes, it just forces them all to keep resizing up,
> > > even though individual instances would be small with the current dict.
> > There is a cut-off point, at the moment it's quite unsophisticated about
> > how it does this, but it could easily be improved.
> > Suggestions are welcome.
>
> Can you open an issue on the bug tracker?
> There you can either give your repo URL, or upload a patch.
> Both should allow to start reviewing the code :)
>
> Regards
>
> Antoine.
>

+1 I'm interested in seeing this as well.

Anything that improves the memory overhead in cpython is appreciated as it
decreases the pain when moving an app from 32bit to 64bit. :)

-gps

From victor.stinner at haypocalc.com  Thu Dec 22 02:49:06 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 22 Dec 2011 02:49:06 +0100
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <4EF187B6.3080406@simplistix.co.uk>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
Message-ID: <4EF28C92.1090501@haypocalc.com>

> What's the general consensus on supporting Python 2.5 nowadays?

There is no such consensus :-)

> Do people still have to use this in commercial environments or is
> everyone on 2.6+ nowadays?

At work, we are still using Python 2.5. Six months ago, we started a
project to upgrade to 2.7, but we now have more urgent tasks, so the
upgrade has been postponed. Even if we upgrade new clients to 2.7, we
will have to continue to support 2.5 for some more months (or years?).

In a personal project (the IPy library), I dropped support for Python 2.5
in February 2011. Recently, I got an email asking me where the previous
version of my library (supporting Python 2.4) can be downloaded! Someone
is still using Python 2.4: "I'm stuck with python 2.4 in my work
environment."
> What do people feel?

For a new project, try to support Python 2.5, especially if you would
like to write a portable library. For a new application running on Mac
OS X, Windows and Linux, it is enough to support only Python 2.6.

Victor

From victor.stinner at haypocalc.com  Thu Dec 22 02:55:40 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 22 Dec 2011 02:55:40 +0100
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <CAPkN8xLkFXOTwjFL+dg2Q+-YN2k39hxnm+_Byes3BJ6kMC9ipQ@mail.gmail.com>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
	<CAPkN8xLkFXOTwjFL+dg2Q+-YN2k39hxnm+_Byes3BJ6kMC9ipQ@mail.gmail.com>
Message-ID: <4EF28E1C.6080703@haypocalc.com>

On 21/12/2011 15:26, anatoly techtonik wrote:
> I believe most AppEngine applications in Python are still using 2.5
> run-time. So are development boxes for these applications. It may take
> another year or two for the transition.

App engine 1.6 improved support of Python 2.7, so I hope that -slowly- 
everybody will move to Python 3. Oops, I mean Python 2.7 ;-)

http://code.google.com/appengine/docs/python/python27/

Victor

From benjamin at python.org  Thu Dec 22 03:08:49 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 21 Dec 2011 20:08:49 -0600
Subject: [Python-Dev] Adding features to 2to3... cpython/default right?
 can I backport to 2.7?
In-Reply-To: <CAGE7PN+WxJQi+1wrGuA-4zbEZf81ivGXiZLUnZCy4Q3HfjCsew@mail.gmail.com>
References: <CAGE7PN+WxJQi+1wrGuA-4zbEZf81ivGXiZLUnZCy4Q3HfjCsew@mail.gmail.com>
Message-ID: <CAPZV6o8Q+tkwK4vj+6zE2H8jd+FthUzxQftemWCGt7+v=vQ_Gg@mail.gmail.com>

2011/12/21 Gregory P. Smith <greg at krypto.org>:
> I have some features I need to add to lib2to3 to make it more useful for our
> purposes at work supporting our massive code base in a Python 2 to 3
> transition. Which tree should I develop these and check these into?
>
> cpython/default?
>
> Can I backport this to 3.2 and 2.7?  It counts as a feature addition which
> is normally a no-no for backports.  But in this case I'm enhancing 2to3
> which is a useful tool.

You may backport things for 2to3. It's exempt from feature freeze.

>
> No big deal to me _personally_ if I can't backport from 3.3
> (cpython/default) as I'd apply the changes to our copy at work internally
> but it seems wise to me for us to keep enhancing and improving 2to3 in a
> Python 2.x/3.x release independent manner to make people's conversions
> easier.
>
> The features I want to commit (all pretty easy additions) are command line
> flag / constructor option support for:
>   1) writing output files to a different directory tree instead of
> overwriting the input file.
>   2) modifying the output filename by altering the suffix (.py -> .py3 for
> example)
>   3) always writing output files even if there were no changes to make
> (useful in combination with the above to effectively act as a "copy library
> X to this directory converting it to python 3 syntax along the way").
>
> The old http://hg.python.org/2to3/ tree exists but it really looks like an
> out of date version.

Indeed; I should probably just delete it.



-- 
Regards,
Benjamin

From mwm at mired.org  Thu Dec 22 03:45:50 2011
From: mwm at mired.org (Mike Meyer)
Date: Wed, 21 Dec 2011 18:45:50 -0800
Subject: [Python-Dev] Anyone still using Python 2.5?
In-Reply-To: <5C96DB96-3CE7-4E02-A597-5E5B58669EBE@voidspace.org.uk>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk> <4EF28707.405@gmail.com>
	<5C96DB96-3CE7-4E02-A597-5E5B58669EBE@voidspace.org.uk>
Message-ID: <20111221184550.00dc6b8a@bhuda.mired.org>

On Thu, 22 Dec 2011 01:49:37 +0000
Michael Foord <fuzzyman at voidspace.org.uk> wrote:
> These figures can't possibly be true. No-one is using Python 3 yet. ;-)

Since you brought it up. Is anyone paying people (or trying to hire
people) to write Python 3?

	Thanks,
	<mike
-- 
Mike Meyer <mwm at mired.org>		http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

From anacrolix at gmail.com  Thu Dec 22 04:36:41 2011
From: anacrolix at gmail.com (Matt Joiner)
Date: Thu, 22 Dec 2011 14:36:41 +1100
Subject: [Python-Dev] Anyone still using Python 2.5?
In-Reply-To: <20111221184550.00dc6b8a@bhuda.mired.org>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk> <4EF28707.405@gmail.com>
	<5C96DB96-3CE7-4E02-A597-5E5B58669EBE@voidspace.org.uk>
	<20111221184550.00dc6b8a@bhuda.mired.org>
Message-ID: <CAB4yi1O8xJwP=3pVr_sENCZ5SbP5f660OzPn8yxahztr9NYO1w@mail.gmail.com>

I'm paid to write Python3. I've also been writing Python3 for hobby
projects since mid 2010. I'm on the verge of going back to 2.7 due to
compatibility issues :(

On Thu, Dec 22, 2011 at 1:45 PM, Mike Meyer <mwm at mired.org> wrote:
> On Thu, 22 Dec 2011 01:49:37 +0000
> Michael Foord <fuzzyman at voidspace.org.uk> wrote:
>> These figures can't possibly be true. No-one is using Python 3 yet. ;-)
>
> Since you brought it up. Is anyone paying people (or trying to hire
> people) to write Python 3?
>
>        Thanks,
>        <mike
> --
> Mike Meyer <mwm at mired.org>              http://www.mired.org/
> Independent Software developer/SCM consultant, email for more information.
>
> O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com



-- 
?_?

From a.badger at gmail.com  Thu Dec 22 06:17:32 2011
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Wed, 21 Dec 2011 21:17:32 -0800
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <4EF28C92.1090501@haypocalc.com>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
	<4EF28C92.1090501@haypocalc.com>
Message-ID: <20111222051732.GF24681@unaka.lan>

On Thu, Dec 22, 2011 at 02:49:06AM +0100, Victor Stinner wrote:
> 
> >Do people still have to use this in commercial environments or is
> >everyone on 2.6+ nowadays?
> 
> At work, we are still using Python 2.5. Six months ago, we started a
> project to upgrade to 2.7, but we have now more urgent tasks, so the
> upgrade is delayed to later. Even if we upgrade new clients to 2.7,
> we will have to continue to support 2.5 for some more months (or
> years?).
> 
At my work, I'm on RHEL5 and RHEL6.  So I'm currently supporting python-2.4
and python-2.6.  We're up to 75% RHEL6 (though, not the machines where most
of our deployed, custom written apps are running) so I shouldn't have to
support python-2.4 for much longer.

> In a personal project (the IPy library), I dropped support of Python
> 2.5 in february 2011. Recently, I got a mail asking me where the
> previous version of my library (supporting Python 2.4) can be
> downloaded! Someone is still using Python 2.4: "I'm stuck with python
> 2.4 in my work environment."
> 
As part of work, I package for EPEL5 (addon packages for RHEL5).  Sometimes
we need a new version of a package or a new package for RHEL5 and thus need
to have python-2.4 compatible versions of the package and any of its
dependencies.

When I no longer need to maintain python-2.4 stuff for work, I'm hoping to
not have to do quite so much of this but sometimes I know I'll still get
requests to update an existing package to fix a bug or fix a feature and
that will require updates of dependent libraries.  I'll still be stuck
looking for python-2.4 compatible versions of all of these :-(

> >What do people feel?
> 
> For a new project, try to support Python 2.5, especially if you would
> like to write a portable library. For a new application working on
> Mac OS X, Windows and Linux, you can only support Python 2.6.
> 
I agree that libraries have a need to go farther back than applications.
I have one library that I support on python-2.3 (for RHEL4... I'm counting
down the months on that one :-).  Every other library I maintain, I make sure
I support at least python-2.4.

Application-wise, I currently have to support python-2.4+ but given that
Linux distros seem to all have some version out that supports at least
python-2.6, I don't think I'll be developing any applications that
intentionally support less than that once I get moved away from RHEL-5 at my
workplace.

-Toshio

From techtonik at gmail.com  Thu Dec 22 08:56:47 2011
From: techtonik at gmail.com (anatoly techtonik)
Date: Thu, 22 Dec 2011 09:56:47 +0200
Subject: [Python-Dev] Anyone still using Python 2.5?
In-Reply-To: <5C96DB96-3CE7-4E02-A597-5E5B58669EBE@voidspace.org.uk>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk> <4EF28707.405@gmail.com>
	<5C96DB96-3CE7-4E02-A597-5E5B58669EBE@voidspace.org.uk>
Message-ID: <CAPkN8x+Mtu3-HFyBzgXE2NePgauyH4uHfO7YSiWz--ck=DiJTw@mail.gmail.com>

On Thu, Dec 22, 2011 at 4:49 AM, Michael Foord <fuzzyman at voidspace.org.uk> wrote:

>
> On 22 Dec 2011, at 01:25, Mark Hammond wrote:
>
> > FWIW, the most recent version of pywin32 has the following download
> > counts (rounded to the nearest thousand)
> >
> > Version 32bit       64bit
> > -------------------------
> > 3.2 -  75,000       9,000
> > 3.1 -   4,000       1,000
> > 2.7 - 126,000      16,000
> > 2.6 -  46,000       6,000
> > 2.5 -  21,000         n/a
> > 2.4 -   3,000         n/a
> > 2.3 -   1,000         n/a
> >
> > So ISTM that 2.5 isn't hugely popular these days, but also isn't
> > insignificant.  It probably means I could "safely" drop 2.3 and 2.4 support
> > though...
> >
>
>
> These figures can't possibly be true. No-one is using Python 3 yet. ;-)
>

python.org should have a poll/setting for active python.org accounts to
let people mark when they switch to Python 3.

> FWIW I heard a few days ago about a UK government department, HMGCC (Her
> Majesty's Government Communication Centre - based in Milton Keynes), who
> use Python for research projects. They switched to using Python 3 a while
> ago.
>

 if that == True:
     front_page.response(news_template.render(
         "News About Her Majesty switched to Python 3"))

Can't stand to do a +1 for the news item.

> All the best,
>
> Michael Foord
>

From techtonik at gmail.com  Thu Dec 22 09:05:11 2011
From: techtonik at gmail.com (anatoly techtonik)
Date: Thu, 22 Dec 2011 10:05:11 +0200
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <4EF28E1C.6080703@haypocalc.com>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
	<CAPkN8xLkFXOTwjFL+dg2Q+-YN2k39hxnm+_Byes3BJ6kMC9ipQ@mail.gmail.com>
	<4EF28E1C.6080703@haypocalc.com>
Message-ID: <CAPkN8xJfT9HKiFQeF8AaTCaK+7NMKaajMEK0Xyro4AUCqmbp0w@mail.gmail.com>

On Thu, Dec 22, 2011 at 4:55 AM, Victor Stinner <
victor.stinner at haypocalc.com> wrote:

> On 21/12/2011 15:26, anatoly techtonik wrote:
>
>> I believe most AppEngine applications in Python are still using 2.5
>> run-time. So are development boxes for these applications. It may take
>> another year or two for the transition.
>>
>
> App engine 1.6 improved support of Python 2.7, so I hope that -slowly-
> everybody will move to Python 3. Oops, I mean Python 2.7 ;-)
>
> http://code.google.com/appengine/docs/python/python27/
>

I've just been reminded that Python 2.7 support in AppEngine is still
experimental, so the exodus is unlikely to happen soon.
https://groups.google.com/forum/#!topic/google-appengine-python/tPbDEAHke64

From timwintle at gmail.com  Thu Dec 22 10:44:32 2011
From: timwintle at gmail.com (Tim Wintle)
Date: Thu, 22 Dec 2011 09:44:32 +0000
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <20111221074245.6652c314@resist.wooz.org>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
	<20111221074245.6652c314@resist.wooz.org>
Message-ID: <1324547072.30982.15.camel@tim-laptop>

On Wed, 2011-12-21 at 07:42 -0500, Barry Warsaw wrote:
> On Dec 21, 2011, at 07:16 AM, Chris Withers wrote:
> 
> >What's the general consensus on supporting Python 2.5 nowadays?
> 
> FWIW, Ubuntu dropped 2.5 quite a while ago.

Some servers I deploy to run Ubuntu, but we're installing previous
python versions to support our apps - OS support isn't a factor in which
version we develop for.

I work on applications in 2.4-2.6.

Generally:

2.4 apps are legacy and a migration is planned in the next year (either
to 2.7 or to pypy).

2.5 apps are the speed-critical ones. Our tests showed the performance
was different enough between 2.5 and 2.6 for me to not update. They also
have significant native extensions in them so are potentially the most
difficult to port to python3.

2.6 apps are newish and (mainly) pure python.

I can see myself still using 2.5 for many years, but porting the 2.6 and
2.4 code to either pypy or python3 in the not too distant future. I
believe we're most likely to choose python3 for apps with heavy use of
Unicode (and pick a version released after the changes to the internal
Unicode format have landed).

Tim Wintle


From solipsis at pitrou.net  Thu Dec 22 10:56:38 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 22 Dec 2011 10:56:38 +0100
Subject: [Python-Dev] Anyone still using Python 2.5?
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
	<20111221074245.6652c314@resist.wooz.org>
	<1324547072.30982.15.camel@tim-laptop>
Message-ID: <20111222105638.59498a88@pitrou.net>

On Thu, 22 Dec 2011 09:44:32 +0000
Tim Wintle <timwintle at gmail.com> wrote:
> 
> 2.5 apps are the speed-critical ones. Our tests showed the performance
> was different enough between 2.5 and 2.6 for me to not update.

Really? Where's the regression?

Regards

Antoine.



From stefan_ml at behnel.de  Thu Dec 22 11:12:24 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 22 Dec 2011 11:12:24 +0100
Subject: [Python-Dev] Anyone still using Python 2.5?
In-Reply-To: <20111222105638.59498a88@pitrou.net>
References: <4EF187A2.6070909@simplistix.co.uk>	<4EF187B6.3080406@simplistix.co.uk>	<20111221074245.6652c314@resist.wooz.org>	<1324547072.30982.15.camel@tim-laptop>
	<20111222105638.59498a88@pitrou.net>
Message-ID: <jcuvq8$tm4$1@dough.gmane.org>

Antoine Pitrou, 22.12.2011 10:56:
> On Thu, 22 Dec 2011 09:44:32 +0000
> Tim Wintle wrote:
>>
>> 2.5 apps are the speed-critical ones. Our tests showed the performance
>> was different enough between 2.5 and 2.6 for me to not update.
>
> Really? Where's the regression?

That's not unexpected at least, and matches my own (limited) experience 
here. My gut feeling is that Py2.6 added a lot of "new in Py3.0" overhead, 
but without all the optimisations that went into Py3.x since then. At least 
some of that came back later with Py2.7.

Would be nice to (eventually) see Py2.[567] run in speed.python.org in 
order to get a better idea of the relative performance.

Stefan


From fijall at gmail.com  Thu Dec 22 11:09:53 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 22 Dec 2011 12:09:53 +0200
Subject: [Python-Dev] Anyone still using Python 2.5?
In-Reply-To: <20111222105638.59498a88@pitrou.net>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
	<20111221074245.6652c314@resist.wooz.org>
	<1324547072.30982.15.camel@tim-laptop>
	<20111222105638.59498a88@pitrou.net>
Message-ID: <CAK5idxQ706c5fbKjbPLTVJBQFc36MPw3sQMGxTq+_XPh-RT9Xg@mail.gmail.com>

On Thu, Dec 22, 2011 at 11:56 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Thu, 22 Dec 2011 09:44:32 +0000
> Tim Wintle <timwintle at gmail.com> wrote:
>>
>> 2.5 apps are the speed-critical ones. Our tests showed the performance
>> was different enough between 2.5 and 2.6 for me to not update.
>
> Really? Where's the regression?
>
> Regards
>
> Antoine.

Sounds weird; as far as I know, 2.6 is faster than 2.5, or at least not slower.

From timwintle at gmail.com  Thu Dec 22 12:05:09 2011
From: timwintle at gmail.com (Tim Wintle)
Date: Thu, 22 Dec 2011 11:05:09 +0000
Subject: [Python-Dev] Anyone still using Python 2.5?
In-Reply-To: <20111222105638.59498a88@pitrou.net>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
	<20111221074245.6652c314@resist.wooz.org>
	<1324547072.30982.15.camel@tim-laptop>
	<20111222105638.59498a88@pitrou.net>
Message-ID: <1324551909.32266.58.camel@tim-laptop>

On Thu, 2011-12-22 at 10:56 +0100, Antoine Pitrou wrote:
> On Thu, 22 Dec 2011 09:44:32 +0000
> Tim Wintle <timwintle at gmail.com> wrote:
> > 
> > 2.5 apps are the speed-critical ones. Our tests showed the performance
> > was different enough between 2.5 and 2.6 for me to not update.
> 
> Really? Where's the regression?

I'm not certain - IIRC there were several nice optimisations in 2.6, and
I wasn't expecting that when I first looked.

I was running code designed for 2.5 under 2.6, so it's likely that with
sufficient tweaking for 2.6 I might not have the same result. I tested
this specific code with the python builds we have in production, not
general python code - I don't mean this as a recommendation that anyone
else assume 2.5 is faster for them.

I suspect that Stefan's comments about newly added features without the
optimisation in python3 might be partially true, but having the extra
code to support them (while not using them) might also be part of the
cause - ceval.c had over 1K line changes between r25 and r26, including
cases for new opcodes, and new opcode predictions etc - it's possible
that my code just happens to not follow the most optimal paths.

I'm talking about a slow-down of under 10%, but enough that I couldn't
justify moving these apps to 2.6 at the time for economic reasons, and
pypy would be the main incentive to move this to 2.7.

Tim


From macsmith.us at gmail.com  Thu Dec 22 12:10:32 2011
From: macsmith.us at gmail.com (Mac Smith)
Date: Thu, 22 Dec 2011 16:40:32 +0530
Subject: [Python-Dev] reading multiline output
Message-ID: <79DF8234-67BF-4297-ACE6-8D091D05B3E9@gmail.com>

Hi,


I have started HandBrakeCLI using subprocess.Popen, but the output is multiline and not terminated with \n, so I am not able to read it using readline() while HandBrakeCLI is running. Kindly suggest some alternative. I have attached the output in a file.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: output
Type: application/octet-stream
Size: 4541 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20111222/d81b1a30/attachment.obj>
-------------- next part --------------


--
Thanks

Mac

From phd at phdru.name  Thu Dec 22 12:30:41 2011
From: phd at phdru.name (Oleg Broytman)
Date: Thu, 22 Dec 2011 15:30:41 +0400
Subject: [Python-Dev] reading multiline output
In-Reply-To: <79DF8234-67BF-4297-ACE6-8D091D05B3E9@gmail.com>
References: <79DF8234-67BF-4297-ACE6-8D091D05B3E9@gmail.com>
Message-ID: <20111222113041.GA18753@iskra.aviel.ru>

Hello.

   We are sorry but we cannot help you. This mailing list is to work on
developing Python (adding new features to Python itself and fixing bugs);
if you're having problems learning, understanding or using Python, please
find another forum. Probably python-list/comp.lang.python mailing list/news
group is the best place; there are Python developers who participate in it;
you may get a faster, and probably more complete, answer there. See
http://www.python.org/community/ for other lists/news groups/fora. Thank
you for understanding.

On Thu, Dec 22, 2011 at 04:40:32PM +0530, Mac Smith wrote:
> I have started HandBrakeCLI using subprocess.popen but the output is multiline and not terminated with \n so i am not able to read it using readline() while the HandBrakeCLI is running. kindly suggest some alternative. i have attached the output in a file.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.
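
For readers who find this thread with the same problem: tools that redraw
a progress line usually end each update with a carriage return instead of
a newline, which is why readline() appears to block. The sketch below
assumes that is what HandBrakeCLI does here (the scrubbed attachment is
not available to confirm it). iter_updates(), run_and_collect() and the
stand-in CHILD command are illustrative names, not part of any library.

```python
import os
import subprocess
import sys


def iter_updates(pipe, chunk_size=4096):
    """Yield text pieces from a pipe, splitting on CR as well as LF."""
    fd = pipe.fileno()
    pending = b""
    while True:
        # os.read returns as soon as some data is available.
        chunk = os.read(fd, chunk_size)
        if not chunk:  # EOF: the child closed its end of the pipe
            break
        pending += chunk
        # Treat a carriage return like a newline.
        parts = pending.replace(b"\r", b"\n").split(b"\n")
        pending = parts.pop()  # the last piece may still be incomplete
        for part in parts:
            if part:
                yield part.decode("utf-8", "replace")
    if pending:
        yield pending.decode("utf-8", "replace")


def run_and_collect(argv):
    """Run argv and return the list of updates it wrote to stdout."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    try:
        return list(iter_updates(proc.stdout))
    finally:
        proc.stdout.close()
        proc.wait()


# Stand-in for the real tool: prints CR-terminated progress updates.
CHILD = [
    sys.executable, "-c",
    "import sys; sys.stdout.write('10%\\r50%\\r100%\\n'); sys.stdout.flush()",
]

if __name__ == "__main__":
    for update in run_and_collect(CHILD):
        print("update:", update)
```

In a real program you would loop over iter_updates(proc.stdout) directly
so that each update is handled while the child is still running, rather
than collecting everything into a list first.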

From martin at v.loewis.de  Thu Dec 22 14:34:00 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 22 Dec 2011 14:34:00 +0100
Subject: [Python-Dev] Adding GNU conditional execution in the Makefile?
In-Reply-To: <4EE3EBA9.2050600@jcea.es>
References: <4EE3EBA9.2050600@jcea.es>
Message-ID: <4EF331C8.9000307@v.loewis.de>

> If this is a policy, I would like to know.

As Guido says, Python should work with "traditional make"; I think
this is particularly relevant for the BSDs and Solaris.

> And if somebody has a suggestion to cope with this difficulty...

Why don't you use some @FOO@ replacement? Have something expand
to either the object file name, or nothing.

Regards,
Martin

From martin at v.loewis.de  Thu Dec 22 15:09:25 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 22 Dec 2011 15:09:25 +0100
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EEA722A.10403@hotpy.org>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>	<j5k46t$dc6$1@dough.gmane.org>	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>
	<4EEA722A.10403@hotpy.org>
Message-ID: <4EF33A15.1040707@v.loewis.de>

> The current dict implementation is getting pretty old,
> isn't it time we had a new one (for xmas)?

I like the approach, and I think something should be done indeed.
If you don't contribute your approach, I'd like to drop at least
ma_smalltable for 3.3.

A number of things about your branch came to my mind:
- it would be useful to have a specialized representation for
  all-keys-are-strings. In that case, me_hash could be dropped
  from the representation. You would get savings compared to
  the status quo even in the non-shared case.
- why does _dictkeys need to be a full-blown Python object?
  We need refcounting and the size, but not the type slot.
- I wonder whether the shared keys could be computed at compile
  time, considering all attribute names that get assigned for
  self. The compiler could list those in the code object, and
  class creation could iterate over all methods (taking base
  classes into account).

Regards,
Martin

From solipsis at pitrou.net  Thu Dec 22 17:49:31 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 22 Dec 2011 17:49:31 +0100
Subject: [Python-Dev] hg.python.org mod_wsgi changes
Message-ID: <20111222174931.20c1c693@pitrou.net>


Hello,

Today I've modified the WSGI configuration at hg.python.org. If you
notice anything wrong (e.g. when cloning a repository), please tell me.

For the curious: http://mercurial.selenic.com/bts/issue2595

Regards

Antoine.



From fijall at gmail.com  Thu Dec 22 19:15:04 2011
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 22 Dec 2011 20:15:04 +0200
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EF33A15.1040707@v.loewis.de>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
	<j5k46t$dc6$1@dough.gmane.org>
	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>
	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>
	<4EEA722A.10403@hotpy.org> <4EF33A15.1040707@v.loewis.de>
Message-ID: <CAK5idxQt4i0-w4_z8+hECUAnySs2bODtx6QwOAvWs5Di5cTX5g@mail.gmail.com>

> - I wonder whether the shared keys could be computed at compile
>  time, considering all attribute names that get assigned for
>  self. The compiler could list those in the code object, and
>  class creation could iterate over all methods (taking base
>  classes into account).

This is hard, because sometimes you don't even know what self
*is*, especially if __init__ calls some methods or there is any
sort of control flow. You can, however, track what gets assigned at
runtime and have shapes associated with objects.

From jafo at tummy.com  Fri Dec 23 01:15:33 2011
From: jafo at tummy.com (Sean Reifschneider)
Date: Thu, 22 Dec 2011 17:15:33 -0700
Subject: [Python-Dev] Fwd: Anyone still using Python 2.5?
In-Reply-To: <20111221074245.6652c314@resist.wooz.org>
References: <4EF187A2.6070909@simplistix.co.uk>
	<4EF187B6.3080406@simplistix.co.uk>
	<20111221074245.6652c314@resist.wooz.org>
Message-ID: <20111223001533.GA2061@tummy.com>

On Wed, Dec 21, 2011 at 07:42:45AM -0500, Barry Warsaw wrote:
>FWIW, Ubuntu dropped 2.5 quite a while ago.  The next LTS (long term support)

That's true for *CURRENT* releases, however Ubuntu still supports Python
2.5 via 8.04 LTS (end of life in April 2013).  Lucid is 2.6 and goes EOL in
2015.

Red Hat Enterprise is a bit more difficult a situation.  They currently
still have active support for Python 2.3 in RHEL 4, but that comes up to
EOL in just a couple of months (Feb 2012).  But they have this "extended
life cycle" that ends in Feb 2015.

RHEL 5 has python 2.4.3 and has an EOL of April 2014 (April 2017 for
extended life cycle).

There was a fairly large lag between RHEL 5 and RHEL 6 (almost 4 years), so
there are a *LOT* of RHEL 5 systems out there.

RHEL 6 has Python 2.6.6, BTW.

This is why I recently released the "ineedpy2" package so that your program
can request and search for specific versions of Python on a multi-python
system.  We have a number of systems that have Python 2.3 and older on
them, but many of those systems have newer Pythons also available as
alternate names.
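
The idea of searching a multi-python system for an acceptable interpreter
can be sketched as follows. This is not how ineedpy2 itself is
implemented (its code is not shown in this thread); find_python() and the
"pythonX.Y" naming convention for alternate interpreters are assumptions
made for illustration.

```python
import shutil


def find_python(versions, which=shutil.which):
    """Return (version, path) for the first 'pythonX.Y' on PATH, else None.

    `versions` is an ordered list of acceptable versions, most preferred
    first. `which` is injectable so the lookup can be tested without
    depending on what is installed.
    """
    for version in versions:
        path = which("python" + version)
        if path:
            return version, path
    return None


if __name__ == "__main__":
    # Prefer 2.7, then fall back to older interpreters if present.
    print(find_python(["2.7", "2.6", "2.5", "2.4"]))
```

A program could then re-execute itself under the interpreter that was
found, or report a clear error when none of the acceptable versions is
installed.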

We recommend that whenever possible customers target deploying against the
system python, meaning version 2.4.3 if they are deploying on CentOS 5.
Because otherwise security updates of Python and *all the libraries they
depend on* need to be tracked manually.  Some customers decide to go one
route, some to go the other, but that is our recommendation.

Ideally, you are building your apps to target a production environment, not
just using the latest and greatest Python without compelling reasons.

So, yes, people are still using Python 2.5 and 2.4.  Mostly this is people
who have already deployed apps and are either fixing/updating them, or are
adding new applications that they want to target the same production
environment rather than setting up a new environment.

Sean
-- 
 Linux, because eventually you grow up enough to be trusted with a fork().
Sean Reifschneider, Member of Technical Staff <jafo at tummy.com>
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability


From martin at v.loewis.de  Fri Dec 23 09:57:01 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Fri, 23 Dec 2011 09:57:01 +0100
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <CAK5idxQt4i0-w4_z8+hECUAnySs2bODtx6QwOAvWs5Di5cTX5g@mail.gmail.com>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>
	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>
	<j5k46t$dc6$1@dough.gmane.org>
	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>
	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>
	<4EEA722A.10403@hotpy.org> <4EF33A15.1040707@v.loewis.de>
	<CAK5idxQt4i0-w4_z8+hECUAnySs2bODtx6QwOAvWs5Di5cTX5g@mail.gmail.com>
Message-ID: <4EF4425D.4060409@v.loewis.de>

Am 22.12.2011 19:15, schrieb Maciej Fijalkowski:
>> - I wonder whether the shared keys could be computed at compile
>>  time, considering all attribute names that get assigned for
>>  self. The compiler could list those in the code object, and
>>  class creation could iterate over all methods (taking base
>>  classes into account).
> 
> This is hard, because sometimes you don't quite know what the self
> *is* even, especially if __init__ calls some methods or there is any
> sort of control flow. You can however track what gets assigned at
> runtime at have shapes associated with objects.

Actually, it's fairly easy, as it only needs to be a heuristic.
I am proposing exactly the heuristic specified above ("attribute
names that get assigned for self").

I don't think that __init__ calling methods is much of an issue here,
since these methods then still have attributes assigned to self.

Regards,
Martin
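
The heuristic described above ("attribute names that get assigned for
self") can be approximated at source level with the ast module. This is
only a sketch of the idea, not the compiler change being proposed:
assigned_self_attrs() is a hypothetical helper, it treats the first
parameter of every method as self, and it ignores base classes.

```python
import ast


def assigned_self_attrs(source):
    """Map each class name to the sorted attribute names assigned on self."""
    result = {}
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        names = set()
        for func in cls.body:
            if not isinstance(func, ast.FunctionDef) or not func.args.args:
                continue
            self_name = func.args.args[0].arg  # conventionally "self"
            for node in ast.walk(func):
                # An Attribute node in Store context is an assignment
                # target, e.g. "self.x = ..." or "self.x += ...".
                if (isinstance(node, ast.Attribute)
                        and isinstance(node.ctx, ast.Store)
                        and isinstance(node.value, ast.Name)
                        and node.value.id == self_name):
                    names.add(node.attr)
        result[cls.name] = sorted(names)
    return result


SOURCE = '''
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self._reset()

    def _reset(self):
        self.cache = None
'''

if __name__ == "__main__":
    print(assigned_self_attrs(SOURCE))
```

Because every method is scanned, the attribute set in _reset() is picked
up even though it is only reached through a call from __init__.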

From mark at hotpy.org  Fri Dec 23 10:51:47 2011
From: mark at hotpy.org (Mark Shannon)
Date: Fri, 23 Dec 2011 09:51:47 +0000
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EF33A15.1040707@v.loewis.de>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>	<j5k46t$dc6$1@dough.gmane.org>	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>
	<4EEA722A.10403@hotpy.org> <4EF33A15.1040707@v.loewis.de>
Message-ID: <4EF44F33.4000508@hotpy.org>

Martin v. Löwis wrote:
>> The current dict implementation is getting pretty old,
>> isn't it time we had a new one (for xmas)?
> 
> I like the approach, and I think something should be done indeed.
> If you don't contribute your approach, I'd like to drop at least
> ma_smalltable for 3.3.
> 
> A number of things about your branch came to my mind:
> - it would be useful to have a specialized representation for
>   all-keys-are-strings. In that case, me_hash could be dropped
>   from the representation. You would get savings compared to
>   the status quo even in the non-shared case.
It might be tricky switching key tables, and I don't think it would save much 
memory, as keys that are widely shared take up very little memory anyway,
and not many other dicts are long-lived.

(It might improve performance for dicts used for keyword arguments)

> - why does _dictkeys need to be a full-blown Python object?
>   We need refcounting and the size, but not the type slot.
It doesn't. It's just a hangover from my original HotPy implementation 
where all objects needed a type for the GC.
So yes, the type slot could be removed.

> - I wonder whether the shared keys could be computed at compile
>   time, considering all attribute names that get assigned for
>   self. The compiler could list those in the code object, and
>   class creation could iterate over all methods (taking base
>   classes into account).
> 

It probably wouldn't be that hard to make a guess at compile time as to 
what the shared keys would be, but it doesn't really matter.
The generation of intermediate shared keys will only happen once per 
class, so the overhead would be negligible.

To cut down on that overhead, we could use a ref-count trick: If the 
instance being updated and its class hold the only two refs to an 
immutable keys(-set -table -vector?) then just treat it as mutable.

I'll modify the repo to incorporate these changes when I have a chance.

Cheers,
Mark.

From martin at v.loewis.de  Fri Dec 23 11:33:59 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 23 Dec 2011 11:33:59 +0100
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EF44F33.4000508@hotpy.org>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>	<j5k46t$dc6$1@dough.gmane.org>	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>	<4EEA722A.10403@hotpy.org>
	<4EF33A15.1040707@v.loewis.de> <4EF44F33.4000508@hotpy.org>
Message-ID: <4EF45917.10605@v.loewis.de>

>> - it would be useful to have a specialized representation for
>>   all-keys-are-strings. In that case, me_hash could be dropped
>>   from the representation. You would get savings compared to
>>   the status quo even in the non-shared case.
> It might be tricky switching key tables and I don't think it would save much
> memory as keys that are widely shared take up very little memory anyway,
> and not many other dicts are long-lived.

Why do you say that? In a plain 3.3 interpreter, I counted 595 dict
objects (see script below). Of these, 563 (so nearly all of them) had
only strings as keys. Among those, I found 286 different key sets,
where 231 key sets occurred only once (i.e. wouldn't be shared).

Together, the string dictionaries had 13282 keys, and you could save
as many pointers (actually more, because there will be more key slots
than keys).
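
As a rough sense of scale for that figure, the arithmetic can be sketched
as below. This assumes that dropping me_hash saves exactly one machine
word per key, and that a word is 4 or 8 bytes depending on the build. The
helper name is made up for the illustration, and unused key slots (which
would add to the saving) are ignored.

```python
# Key count quoted in the message above.
STRING_DICT_KEYS = 13282


def bytes_saved(n_keys, word_size):
    """Bytes saved if each key gives up one word (assumed: the me_hash field)."""
    return n_keys * word_size


for word_size in (4, 8):  # assumed 32-bit and 64-bit word sizes
    saved = bytes_saved(STRING_DICT_KEYS, word_size)
    print(f"{word_size}-byte words: {saved} bytes (about {saved / 1024:.0f} KiB)")
```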

I'm not sure why you think the string dicts with unshared keys would be
short-lived. But even if they were, what matters is the steady-state
number of dictionaries - if for every short-lived dictionary that
gets released another one is created, any memory savings from reducing
the dict size would still materialize.

>> - I wonder whether the shared keys could be computed at compile
>>   time, considering all attribute names that get assigned for
>>   self. The compiler could list those in the code object, and
>>   class creation could iterate over all methods (taking base
>>   classes into account).
>>
> 
> It probably wouldn't be that hard to make a guess at compile time as to
> what the shared keys would be, but it doesn't really matter.
> The generation of intermediate shared keys will only happen once per
> class, so the overhead would be negligible.

I'm not so much concerned about overhead, but about correctness/
effectiveness of the heuristics. For a class with dynamic attributes,
you may well come up with a very large key set. With source analysis,
you wouldn't attempt to grow the keyset beyond what likely is being
shared.
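
A minimal sketch of the kind of source analysis meant here, using the
standard ast module. It only looks at direct stores to attributes of the
first parameter (conventionally "self") in methods written in the class
body. It ignores base classes, setattr() calls and attributes assigned
from outside the class, so the result is only a guess at the likely key
set. The function name and the sample class are hypothetical.

```python
import ast


def self_attribute_names(class_source):
    """Return the sorted names stored as ``self.<name>`` in class methods."""
    tree = ast.parse(class_source)
    names = set()
    for cls in ast.walk(tree):
        if not isinstance(cls, ast.ClassDef):
            continue
        for func in cls.body:
            if not isinstance(func, ast.FunctionDef) or not func.args.args:
                continue
            self_name = func.args.args[0].arg  # usually "self"
            for node in ast.walk(func):
                # A store context covers both "self.x = ..." and "self.x += ...".
                if (isinstance(node, ast.Attribute)
                        and isinstance(node.ctx, ast.Store)
                        and isinstance(node.value, ast.Name)
                        and node.value.id == self_name):
                    names.add(node.attr)
    return sorted(names)


SOURCE = '''
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def move(self, dx):
        self.x += dx
        self.moved = True
'''

print(self_attribute_names(SOURCE))
```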

Regards,
Martin

# Needs a build with Py_TRACE_REFS defined; sys.getobjects() does not
# exist otherwise.
import sys
d = sys.getobjects(0, dict)          # all live dict objects
print(len(d), "dicts")
d2 = []
for o in d:
    keys = o.keys()
    if not keys:
        continue                     # skip empty dicts
    types = tuple(set(type(k) for k in keys))
    if types != (str,):
        continue                     # keep only dicts whose keys are all str
    d2.append(tuple(sorted(keys)))
print(len(d2), "str dicts")
freq = {}                            # key set -> number of dicts using it
for keys in d2:
    freq[keys] = freq.get(keys, 0) + 1
print(len(freq), "different key sets")
freq = sorted(freq.items(), key=lambda t: t[1])
print(len([o for o in freq if o[1] == 1]), "unsharable")
print(sum(len(o[0]) for o in freq), "keys")
print(freq[-10:])                    # the ten most frequent key sets
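
For readers without a trace-refs build, a rough variant of the same count
can be sketched with gc.get_objects(). This is an approximation, not a
replacement for the script above: the gc module only reports
container objects the collector tracks, so some dicts may be missed and
the numbers will differ. The function name is hypothetical.

```python
import gc
from collections import Counter


def string_key_set_counts():
    """Map each all-str key set to the number of tracked dicts using it."""
    counts = Counter()
    for obj in gc.get_objects():
        if type(obj) is not dict:
            continue  # exact dicts only, as with sys.getobjects(0, dict)
        keys = tuple(obj)  # snapshot of the keys
        if keys and all(type(k) is str for k in keys):
            counts[tuple(sorted(keys))] += 1
    return counts


counts = string_key_set_counts()
print(sum(counts.values()), "str dicts")
print(len(counts), "different key sets")
print(sum(1 for n in counts.values() if n == 1), "unsharable")
print(sum(len(keys) for keys in counts), "keys in distinct key sets")
```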

From mark at hotpy.org  Fri Dec 23 12:21:26 2011
From: mark at hotpy.org (Mark Shannon)
Date: Fri, 23 Dec 2011 11:21:26 +0000
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EF45917.10605@v.loewis.de>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>	<j5k46t$dc6$1@dough.gmane.org>	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>	<4EEA722A.10403@hotpy.org>
	<4EF33A15.1040707@v.loewis.de> <4EF44F33.4000508@hotpy.org>
	<4EF45917.10605@v.loewis.de>
Message-ID: <4EF46436.2080904@hotpy.org>

Martin v. Löwis wrote:
>>> - it would be useful to have a specialized representation for
>>>   all-keys-are-strings. In that case, me_hash could be dropped
>>>   from the representation. You would get savings compared to
>>>   the status quo even in the non-shared case.
>> It might be tricky switching key tables and I don't think it would save much
>> memory as keys that are widely shared take up very little memory anyway,
>> and not many other dicts are long-lived.
> 
> Why do you say that? In a plain 3.3 interpreter, I counted 595 dict
> objects (see script below). Of these, 563 (so nearly all of them) had
> only strings as keys. Among those, I found 286 different key sets,
> where 231 key sets occurred only once (i.e. wouldn't be shared).
> 
> Together, the string dictionaries had 13282 keys, and you could save
> as many pointers (actually more, because there will be more key slots
> than keys).

The question is how much memory needs to be saved to be worth adding the 
complexity: 10kB, no; 100MB, yes.
So data from "real" benchmarks would be useful.

Also, I'm assuming that it would be tricky to implement correctly due to 
implicit assumptions in the rest of the code.
If I'm wrong and it's easy to implement then please do.

> 
> I'm not sure why you think the string dicts with unshared keys would be
> short-lived. 
Not all, but most. Most dicts with unshared keys would most likely be
for keyword parameters. Explicit dicts tend to be few in number.
(When I say few I mean up to 1k, not 100k or 1M).

Module dicts are very likely to have unshared keys; they number in the 
10s or 100s, but they do tend to be large.

> But even if they were, what matters is the steady-state
> number of dictionaries - if for every short-lived dictionary that
> gets released another one is created, any memory savings from reducing
> the dict size would still materialize. 
But only a few kb?

> 
>>> - I wonder whether the shared keys could be computed at compile
>>>   time, considering all attribute names that get assigned for
>>>   self. The compiler could list those in the code object, and
>>>   class creation could iterate over all methods (taking base
>>>   classes into account).
>>>
>> It probably wouldn't be that hard to make a guess at compile time as to
>> what the shared keys would be, but it doesn't really matter.
>> The generation of intermediate shared keys will only happen once per
>> class, so the overhead would be negligible.
> 
> I'm not so much concerned about overhead, but about correctness/
> effectiveness of the heuristics. For a class with dynamic attributes,
> you may well come up with a very large key set. With source analysis,
> you wouldn't attempt to grow the keyset beyond what likely is being
> shared.
I agree some sort of heuristic is required to limit excessive growth
and prevent pathological behaviour.
The current implementation just has a cut off at a certain size;
it could definitely be improved.

As I said, I'll update the code soon and then, well what's the phrase...
Oh yes, "patches welcome" ;)

Thanks for the feedback.

Cheers,
Mark.

> 
> Regards,
> Martin
> 
> import sys
> d = sys.getobjects(0,dict)
> print(len(d), "dicts")
> d2 = []
> for o in d:
>     keys = o.keys()
>     if not keys:continue
>     types = tuple(set(type(k) for k in keys))
>     if types != (str,):
>         continue
>     d2.append(tuple(sorted(keys)))
> print(len(d2), "str dicts")
> freq = {}
> for keys in d2:
>     freq[keys] = freq.get(keys,0)+1
> print(len(freq), "different key sets")
> freq = sorted(freq.items(), key=lambda t:t[1])
> print(len([o for o in freq if o[1]==1]), "unsharable")
> print(sum(len(o[0]) for o in freq), "keys")
> print(freq[-10:])





From stefan_ml at behnel.de  Fri Dec 23 13:03:17 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 23 Dec 2011 13:03:17 +0100
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EF46436.2080904@hotpy.org>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>	<j5k46t$dc6$1@dough.gmane.org>	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>	<4EEA722A.10403@hotpy.org>	<4EF33A15.1040707@v.loewis.de>
	<4EF44F33.4000508@hotpy.org>	<4EF45917.10605@v.loewis.de>
	<4EF46436.2080904@hotpy.org>
Message-ID: <jd1qm6$4g7$1@dough.gmane.org>

Mark Shannon, 23.12.2011 12:21:
> Martin v. Löwis wrote:
>>>> - it would be useful to have a specialized representation for
>>>> all-keys-are-strings. In that case, me_hash could be dropped
>>>> from the representation. You would get savings compared to
>>>> the status quo even in the non-shared case.
>>> It might be tricky switching key tables and I don't think it would save much
>>> memory as keys that are widely shared take up very little memory anyway,
>>> and not many other dicts are long-lived.
>>
>> Why do you say that? In a plain 3.3 interpreter, I counted 595 dict
>> objects (see script below). Of these, 563 (so nearly all of them) had
>> only strings as keys. Among those, I found 286 different key sets,
>> where 231 key sets occurred only once (i.e. wouldn't be shared).
>>
>> Together, the string dictionaries had 13282 keys, and you could save
>> as many pointers (actually more, because there will be more key slots
>> than keys).
>
> The question is how much memory needs to be saved to be worth adding the
> complexity, 10kb: No, 100Mb: yes.
> So data from "real" benchmarks would be useful.

Consider taking a parsed MiniDOM tree as a benchmark. It contains so many 
instances of just a couple of different classes that it just has to make a 
huge difference if each of those instances is even just a bit smaller. It 
should also make a clear difference for plain Python ElementTree.

I attached a benchmark script that measures the parsing speed as well as 
the total memory usage of the in-memory tree. You can get data files from 
the following places, just download them and pass their file names on the 
command line:

http://gnosis.cx/download/hamlet.xml

http://www.ibiblio.org/xml/examples/religion/ot/ot.xml

Here are some results from my own machine for comparison:

http://blog.behnel.de/index.php?p=197

Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: etbenchmark.py
Type: text/x-python
Size: 4760 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20111223/cfc8eb7d/attachment.py>

From martin at v.loewis.de  Fri Dec 23 14:05:22 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 23 Dec 2011 14:05:22 +0100
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EF46436.2080904@hotpy.org>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>	<j5k46t$dc6$1@dough.gmane.org>	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>	<4EEA722A.10403@hotpy.org>	<4EF33A15.1040707@v.loewis.de>
	<4EF44F33.4000508@hotpy.org>	<4EF45917.10605@v.loewis.de>
	<4EF46436.2080904@hotpy.org>
Message-ID: <4EF47C92.1020603@v.loewis.de>

> If I'm wrong and it's easy to implement then please do.

Ok, so I take it that you are not interested in the idea. No problem.

Regards,
Martin

From martin at v.loewis.de  Fri Dec 23 14:07:57 2011
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Fri, 23 Dec 2011 14:07:57 +0100
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <jd1qm6$4g7$1@dough.gmane.org>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>	<j5k46t$dc6$1@dough.gmane.org>	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>	<4EEA722A.10403@hotpy.org>	<4EF33A15.1040707@v.loewis.de>	<4EF44F33.4000508@hotpy.org>	<4EF45917.10605@v.loewis.de>	<4EF46436.2080904@hotpy.org>
	<jd1qm6$4g7$1@dough.gmane.org>
Message-ID: <4EF47D2D.2080707@v.loewis.de>

> Consider taking a parsed MiniDOM tree as a benchmark. It contains so
> many instances of just a couple of different classes that it just has to
> make a huge difference if each of those instances is even just a bit
> smaller. It should also make a clear difference for plain Python
> ElementTree.

Of course, for minidom, Mark's current implementation should already
save quite a lot of memory, since all elements and text nodes have the
same attributes.

Still, it would be good to see how Mark's implementation deals with
that.

Regards,
Martin

From mark at hotpy.org  Fri Dec 23 16:08:44 2011
From: mark at hotpy.org (Mark Shannon)
Date: Fri, 23 Dec 2011 15:08:44 +0000
Subject: [Python-Dev] A new dict for Xmas?
In-Reply-To: <4EF47C92.1020603@v.loewis.de>
References: <CAPkN8x+AcLJUQEvWVc8WtR8+MecfiNRQoA-6vAcBbuX2_BM0DQ@mail.gmail.com>	<CAP7+vJKS4WgyqRNrtWQCfVeGa0d7A2TUxM=JOhxLJGfHJDW39g@mail.gmail.com>	<j5k46t$dc6$1@dough.gmane.org>	<CAPkN8x+OEU9_npBfRwScgfFzShvEbFLxMwZTRq63-hLre3MYFA@mail.gmail.com>	<CAFYqXL-x3d78uOvTj=U7WscJu9pTMCkRwsKe58aepcGWy6tzFQ@mail.gmail.com>	<4EEA722A.10403@hotpy.org>	<4EF33A15.1040707@v.loewis.de>
	<4EF44F33.4000508@hotpy.org>	<4EF45917.10605@v.loewis.de>
	<4EF46436.2080904@hotpy.org> <4EF47C92.1020603@v.loewis.de>
Message-ID: <4EF4997C.9010808@hotpy.org>

Martin v. Löwis wrote:
>> If I'm wrong and it's easy to implement then please do.
> 
> Ok, so I take it that you are not interested in the idea. No problem.

It's just that I don't think it would yield results commensurate with the
effort.
Also I think it's worth keeping the initial version as simple as
reasonably possible. Refinements can be added later.

Cheers,
Mark.
> 
> Regards,
> Martin



From status at bugs.python.org  Fri Dec 23 18:07:32 2011
From: status at bugs.python.org (Python tracker)
Date: Fri, 23 Dec 2011 18:07:32 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20111223170732.0F1B81CEEE@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2011-12-16 - 2011-12-23)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    3168 ( -7)
  closed 22272 (+52)
  total  25440 (+45)

Open issues with patches: 1358 


Issues opened (27)
==================

#13614: `setup.py register` fails if long_description contains RST
http://bugs.python.org/issue13614  opened by techtonik

#13615: `setup.py register` fails with -r argument
http://bugs.python.org/issue13615  opened by techtonik

#13617: Reject embedded null characters in wchar* strings
http://bugs.python.org/issue13617  opened by haypo

#13619: Add a new codec: "locale", the current locale encoding
http://bugs.python.org/issue13619  opened by haypo

#13621: Unicode performance regression in python3.3 vs python3.2
http://bugs.python.org/issue13621  opened by Boris.FELD

#13629: _PyParser_TokenNames does not match up with the token.h number
http://bugs.python.org/issue13629  opened by meador.inge

#13630: IDLE: Find(ed) text is not highlighted while dialog box is ope
http://bugs.python.org/issue13630  opened by marco

#13631: readline fails to parse some forms of .editrc under editline (
http://bugs.python.org/issue13631  opened by zvezdan

#13632: Update token documentation to reflect actual token types
http://bugs.python.org/issue13632  opened by meador.inge

#13633: Handling of hex character references in HTMLParser.handle_char
http://bugs.python.org/issue13633  opened by ezio.melotti

#13636: Python SSL Stack doesn't have a Secure Default set of ciphers
http://bugs.python.org/issue13636  opened by naif

#13638: PyErr_SetFromErrnoWithFilenameObject is undocumented
http://bugs.python.org/issue13638  opened by pitrou

#13639: UnicodeDecodeError when creating tar.gz with unicode name
http://bugs.python.org/issue13639  opened by jason.coombs

#13640: add mimetype for application/vnd.apple.mpegurl
http://bugs.python.org/issue13640  opened by Hiroaki.Kawai

#13641: decoding functions in the base64 module could accept unicode s
http://bugs.python.org/issue13641  opened by pitrou

#13642: urllib incorrectly quotes username and password in https basic
http://bugs.python.org/issue13642  opened by joneskoo

#13643: 'ascii' is a bad filesystem default encoding
http://bugs.python.org/issue13643  opened by gz

#13644: Python 3 crashes (segfaults) with this code.
http://bugs.python.org/issue13644  opened by maniram.maniram

#13645: test_import fails after test_coding
http://bugs.python.org/issue13645  opened by pitrou

#13646: Document poor interaction between multiprocessing and -m on Wi
http://bugs.python.org/issue13646  opened by ncoghlan

#13647: Python SSL stack doesn't securely validate certificate (as cli
http://bugs.python.org/issue13647  opened by naif

#13649: termios.ICANON is not documented
http://bugs.python.org/issue13649  opened by techtonik

#13651: Improve redirection in urllib
http://bugs.python.org/issue13651  opened by tom.kel

#13653: reorder set.intersection parameters for better performance
http://bugs.python.org/issue13653  opened by dalke

#13655: Python SSL stack doesn't have a default CA Store
http://bugs.python.org/issue13655  opened by naif

#13657: IDLE doesn't support sys.ps1 and sys.ps2.
http://bugs.python.org/issue13657  opened by maniram.maniram

#13658: Extra clause in class grammar documentation
http://bugs.python.org/issue13658  opened by Joshua.Landau



Most recent 15 issues with no replies (15)
==========================================

#13658: Extra clause in class grammar documentation
http://bugs.python.org/issue13658

#13657: IDLE doesn't support sys.ps1 and sys.ps2.
http://bugs.python.org/issue13657

#13649: termios.ICANON is not documented
http://bugs.python.org/issue13649

#13642: urllib incorrectly quotes username and password in https basic
http://bugs.python.org/issue13642

#13641: decoding functions in the base64 module could accept unicode s
http://bugs.python.org/issue13641

#13640: add mimetype for application/vnd.apple.mpegurl
http://bugs.python.org/issue13640

#13638: PyErr_SetFromErrnoWithFilenameObject is undocumented
http://bugs.python.org/issue13638

#13633: Handling of hex character references in HTMLParser.handle_char
http://bugs.python.org/issue13633

#13632: Update token documentation to reflect actual token types
http://bugs.python.org/issue13632

#13631: readline fails to parse some forms of .editrc under editline (
http://bugs.python.org/issue13631

#13608: remove born-deprecated PyUnicode_AsUnicodeAndSize
http://bugs.python.org/issue13608

#13605: document argparse's nargs=REMAINDER
http://bugs.python.org/issue13605

#13594: Aifc markers write fix
http://bugs.python.org/issue13594

#13590: Prebuilt python-2.7.2 binaries for macosx can not compile c ex
http://bugs.python.org/issue13590

#13574: refresh example in doc for Extending and Embedding
http://bugs.python.org/issue13574



Most recent 15 issues waiting for review (15)
=============================================

#13651: Improve redirection in urllib
http://bugs.python.org/issue13651

#13645: test_import fails after test_coding
http://bugs.python.org/issue13645

#13643: 'ascii' is a bad filesystem default encoding
http://bugs.python.org/issue13643

#13640: add mimetype for application/vnd.apple.mpegurl
http://bugs.python.org/issue13640

#13639: UnicodeDecodeError when creating tar.gz with unicode name
http://bugs.python.org/issue13639

#13636: Python SSL Stack doesn't have a Secure Default set of ciphers
http://bugs.python.org/issue13636

#13632: Update token documentation to reflect actual token types
http://bugs.python.org/issue13632

#13631: readline fails to parse some forms of .editrc under editline (
http://bugs.python.org/issue13631

#13629: _PyParser_TokenNames does not match up with the token.h number
http://bugs.python.org/issue13629

#13619: Add a new codec: "locale", the current locale encoding
http://bugs.python.org/issue13619

#13617: Reject embedded null characters in wchar* strings
http://bugs.python.org/issue13617

#13609: Add "os.get_terminal_size()" function
http://bugs.python.org/issue13609

#13607: Move generator specific sections out of ceval.
http://bugs.python.org/issue13607

#13604: update PEP 393 (match implementation)
http://bugs.python.org/issue13604

#13598: string.Formatter doesn't support empty curly braces "{}"
http://bugs.python.org/issue13598



Top 10 most discussed issues (10)
=================================

#13643: 'ascii' is a bad filesystem default encoding
http://bugs.python.org/issue13643  31 msgs

#13636: Python SSL Stack doesn't have a Secure Default set of ciphers
http://bugs.python.org/issue13636  29 msgs

#8604: Adding an atomic FS write API
http://bugs.python.org/issue8604  12 msgs

#8828: Atomic function to rename a file
http://bugs.python.org/issue8828  12 msgs

#13585: Add contextlib.ContextStack
http://bugs.python.org/issue13585  11 msgs

#5689: Support xz compression in tarfile module
http://bugs.python.org/issue5689   8 msgs

#11638: python setup.py sdist --formats tar* crashes if version is uni
http://bugs.python.org/issue11638   8 msgs

#13555: cPickle MemoryError when loading large file (while pickle work
http://bugs.python.org/issue13555   8 msgs

#13621: Unicode performance regression in python3.3 vs python3.2
http://bugs.python.org/issue13621   8 msgs

#13647: Python SSL stack doesn't securely validate certificate (as cli
http://bugs.python.org/issue13647   8 msgs



Issues closed (49)
==================

#1785: "inspect" gets broken by some descriptors
http://bugs.python.org/issue1785  closed by pitrou

#3932: HTMLParser cannot handle '&' and non-ascii characters in attri
http://bugs.python.org/issue3932  closed by ezio.melotti

#5424: Packed IPaddr conversion tests should be extended
http://bugs.python.org/issue5424  closed by pitrou

#6321: Reload Python modules when running programs
http://bugs.python.org/issue6321  closed by samwyse

#7502: All DocTestCase instances compare and hash equal to each other
http://bugs.python.org/issue7502  closed by pitrou

#8035: urllib.request.urlretrieve hangs waiting for connection close 
http://bugs.python.org/issue8035  closed by neologix

#8093: IDLE processes don't close
http://bugs.python.org/issue8093  closed by ned.deily

#9039: IDLE and module Doc
http://bugs.python.org/issue9039  closed by terry.reedy

#11006: warnings with subprocess and pipe2
http://bugs.python.org/issue11006  closed by rosslagerwall

#11178: Running tests inside a package by module name fails
http://bugs.python.org/issue11178  closed by michael.foord

#11231: bytes() constructor is not correctly documented
http://bugs.python.org/issue11231  closed by haypo

#11764: inspect.getattr_static code execution w/ class body as non dic
http://bugs.python.org/issue11764  closed by michael.foord

#11813: inspect.getattr_static doesn't get module attributes
http://bugs.python.org/issue11813  closed by python-dev

#11829: inspect.getattr_static code execution with meta-metaclasses
http://bugs.python.org/issue11829  closed by python-dev

#11867: Make test_mailbox deterministic
http://bugs.python.org/issue11867  closed by neologix

#11870: test_3_join_in_forked_from_thread() of test_threading hangs 1 
http://bugs.python.org/issue11870  closed by neologix

#12231: regrtest: add -k and -K options to filter tests by function/fi
http://bugs.python.org/issue12231  closed by pitrou

#12708: multiprocessing.Pool is missing a starmap[_async]() method.
http://bugs.python.org/issue12708  closed by pitrou

#12798: Update mimetypes documentation
http://bugs.python.org/issue12798  closed by orsenthil

#12809: Missing new setsockopts in Linux (eg: IP_TRANSPARENT)
http://bugs.python.org/issue12809  closed by neologix

#13294: http.server: minor code style changes.
http://bugs.python.org/issue13294  closed by orsenthil

#13443: wrong links and examples in the functional HOWTO
http://bugs.python.org/issue13443  closed by orsenthil

#13522: Document error return values for PyFloat_* and PyComplex_*
http://bugs.python.org/issue13522  closed by pitrou

#13530: Docs for os.lseek neglect to mention what it returns
http://bugs.python.org/issue13530  closed by haypo

#13560: Add PyUnicode_DecodeLocale and PyUnicode_DecodeLocaleAndSize
http://bugs.python.org/issue13560  closed by haypo

#13571: Backup files support in IDLE
http://bugs.python.org/issue13571  closed by terry.reedy

#13576: Handling of broken condcoms in HTMLParser
http://bugs.python.org/issue13576  closed by ezio.melotti

#13577: __qualname__ is not present on builtin methods and functions
http://bugs.python.org/issue13577  closed by pitrou

#13581: help() appears to be broken; doesn't display __doc__ for class
http://bugs.python.org/issue13581  closed by pitrou

#13610: On Python parsing numbers.
http://bugs.python.org/issue13610  closed by ezio.melotti

#13613: Small error in regular expression poker hand example
http://bugs.python.org/issue13613  closed by ezio.melotti

#13616: Never ending loop in in update_refs Modules/gcmodule.c
http://bugs.python.org/issue13616  closed by David.Butler

#13618: bytes.decode() UnicodeEncodeError on Apple iOS (>16-bit) chara
http://bugs.python.org/issue13618  closed by silverbacknet

#13620: Support Chrome in webbrowser.py
http://bugs.python.org/issue13620  closed by orsenthil

#13622: Bytes performance regression in python3.3 vs python3.2
http://bugs.python.org/issue13622  closed by haypo

#13623: Bytes performance regression in python3.3 vs python3.2
http://bugs.python.org/issue13623  closed by haypo

#13624: UTF-8 encoder performance regression in python3.3
http://bugs.python.org/issue13624  closed by haypo

#13625: multiprocessing.reduction gives OSError: [Errno 9] in 2.7.2
http://bugs.python.org/issue13625  closed by neologix

#13626: Python SSL stack doesn't support DH ciphers
http://bugs.python.org/issue13626  closed by pitrou

#13627: Python SSL stack doesn't support Elliptic Curve ciphers
http://bugs.python.org/issue13627  closed by pitrou

#13628: python-gdb.py: patch to improve support of optimized Python
http://bugs.python.org/issue13628  closed by haypo

#13634: Python SSL stack doesn't support Compression configuration
http://bugs.python.org/issue13634  closed by pitrou

#13635: Python SSL stack doesn't support ordering of Ciphers
http://bugs.python.org/issue13635  closed by pitrou

#13637: binascii.a2b_* functions could accept unicode strings
http://bugs.python.org/issue13637  closed by pitrou

#13648: xml.sax.saxutils.escape does not escapes \x00
http://bugs.python.org/issue13648  closed by loewis

#13650: urllib HTTPRedirectHandler does not implement documented behav
http://bugs.python.org/issue13650  closed by tom.kel

#13652: Creating lambda functions in a loop has unexpected results whe
http://bugs.python.org/issue13652  closed by benjamin.peterson

#13654: IDLE: Freezes and/or crash on SyntaxWarning... is used prior t
http://bugs.python.org/issue13654  closed by ned.deily

#13656: Document ctypes.util and ctypes.wintypes.
http://bugs.python.org/issue13656  closed by maniram.maniram

From fuzzyman at voidspace.org.uk  Thu Dec 29 02:28:45 2011
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Thu, 29 Dec 2011 01:28:45 +0000
Subject: [Python-Dev] Hash collision security issue (now public)
Message-ID: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>

Hello all,

A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages, Python included:

	 http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

Although it's a security issue I'm posting it here because it is now public and seems important.

The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:

	reasonable-sized attack strings only for 32 bits
	Plone has max. POST size of 1 MB
	7 minutes of CPU usage for a 1 MB request
	~20 kbits/s → keep one Core Duo core busy
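
The effect is easy to reproduce in miniature. The sketch below is not the
attack from the presentation, which constructs colliding strings. It
forces collisions with a hypothetical key class whose __hash__ is
constant, and counts the equality comparisons needed to fill a dict.
Doubling the number of keys should roughly quadruple the work.

```python
class Colliding:
    """Hypothetical key type: every instance has the same hash value."""

    eq_calls = 0  # class-wide counter of __eq__ invocations

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # constant, so all keys collide

    def __eq__(self, other):
        Colliding.eq_calls += 1
        return isinstance(other, Colliding) and self.value == other.value


def comparisons_for(n):
    """Insert n colliding keys into a dict and return the __eq__ call count."""
    Colliding.eq_calls = 0
    table = {}
    for i in range(n):
        table[Colliding(i)] = None
    return Colliding.eq_calls


for n in (100, 200, 400):
    print(n, "keys:", comparisons_for(n), "comparisons")
```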

This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). 

The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.

Their recommended fix is to randomize the hash function.
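
One way to picture what randomizing means: mix a secret, per-process
value into the hash so that colliding inputs cannot be computed in
advance. The sketch below uses the keyed mode of hashlib.blake2b (Python
3.6 and later) purely as a stand-in. It is not what the presentation or
any Python patch prescribes, and the names are hypothetical.

```python
import hashlib
import os

# Assumed: a secret chosen once per process and never exposed to clients.
_PROCESS_KEY = os.urandom(16)


def keyed_hash(text, key=_PROCESS_KEY):
    """Return a 64-bit hash of text that depends on a secret key."""
    digest = hashlib.blake2b(
        text.encode("utf-8"), digest_size=8, key=key
    ).digest()
    return int.from_bytes(digest, "little")


# Same key: stable within a process. Different key: unrelated values.
print(keyed_hash("spam") == keyed_hash("spam"))
print(keyed_hash("spam", b"key-one") == keyed_hash("spam", b"key-two"))
```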

All the best,

Michael


--
http://www.voidspace.org.uk/


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing 
http://www.sqlite.org/different.html






From jnoller at gmail.com  Thu Dec 29 02:37:56 2011
From: jnoller at gmail.com (Jesse Noller)
Date: Wed, 28 Dec 2011 20:37:56 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
Message-ID: <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>



On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote:

> Hello all,
>  
> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
>  
> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>  
> Although it's a security issue I'm posting it here because it is now public and seems important.
>  
> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
>  
> reasonable-sized attack strings only for 32 bits
> Plone has max. POST size of 1 MB
> 7 minutes of CPU usage for a 1 MB request
> ~20 kbits/s → keep one Core Duo core busy
>  
> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).  
>  
> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
>  
> Their recommended fix is to randomize the hash function.
>  
> All the best,
>  
> Michael
>  
Back up link for the PDF:
http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

Ocert disclosure:
http://www.ocert.org/advisories/ocert-2011-003.html

jesse  



From jnoller at gmail.com  Thu Dec 29 02:48:00 2011
From: jnoller at gmail.com (Jesse Noller)
Date: Wed, 28 Dec 2011 20:48:00 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
Message-ID: <ED281B37CFE54FA5BF0DABD8721BE2AC@gmail.com>



On Wednesday, December 28, 2011 at 8:37 PM, Jesse Noller wrote:

>  
>  
> On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote:
>  
> > Hello all,
> >  
> > A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
> >  
> > http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
> >  
> > Although it's a security issue I'm posting it here because it is now public and seems important.
> >  
> > The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
> >  
> > reasonable-sized attack strings only for 32 bits
> > Plone has max. POST size of 1 MB
> > 7 minutes of CPU usage for a 1 MB request
> > ~20 kbits/s → keep one Core Duo core busy
> >  
> > This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).  
> >  
> > The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
> >  
> > Their recommended fix is to randomize the hash function.
> >  
> > All the best,
> >  
> > Michael
>  
> Back up link for the PDF:
> http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>  
> Ocert disclosure:
> http://www.ocert.org/advisories/ocert-2011-003.html

And more analysis/information:

http://cryptanalysis.eu/blog/2011/12/28/effective-dos-attacks-against-web-application-plattforms-hashdos/  



From ericsnowcurrently at gmail.com  Thu Dec 29 02:49:08 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 28 Dec 2011 18:49:08 -0700
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
Message-ID: <CALFfu7BPXx2AFhE2oae1fYBrx_7zX7QbsXEXzjPH9R6Yry87Ow@mail.gmail.com>

On Wed, Dec 28, 2011 at 6:28 PM, Michael Foord
<fuzzyman at voidspace.org.uk> wrote:
> Hello all,
>
> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
>
>         http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>
> Although it's a security issue I'm posting it here because it is now public and seems important.
>
> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
>
>        reasonable-sized attack strings only for 32 bits
>        Plone has max. POST size of 1 MB
>        7 minutes of CPU usage for a 1 MB request
>        ~20 kbits/s -> keep one Core Duo core busy
>
> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
>
> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
>
> Their recommended fix is to randomize the hash function.

Ironically, this morning I ran across a discussion from about 8 years
ago on basically the same thing:

http://mail.python.org/pipermail/python-dev/2003-May/035874.html

 From what I read in the thread, it didn't seem like anyone here was
too worried about it.  Does this new research change anything?

-eric

From alex.gaynor at gmail.com  Thu Dec 29 02:51:21 2011
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Thu, 29 Dec 2011 01:51:21 +0000 (UTC)
Subject: [Python-Dev] Hash collision security issue (now public)
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
Message-ID: <loom.20111229T024633-735@post.gmane.org>

A few thoughts on this:

a) This is not a new issue; I'm curious what is driving the new interest in it.

b) Whatever the solution to this is, it is *not* CPython specific; any decision
should be reflected in the Python language spec IMO. If CPython guarantees that
dicts aren't vulnerable to hash collisions, then users *will* rely on this, and
another implementation with a different (valid) behavior exposes its users to
security issues.

c) I'm not convinced a randomized hash is appropriate for the default dict, for
a number of reasons: it's a performance hit on every dict operation; a
per-process seed means you can't precompute the hash of an object at Python's
compile time; and a per-dict seed inhibits a bunch of other optimizations.  These
may not be relevant to CPython, but they are to PyPy and probably to the
invokedynamic work on Jython (pursuant to point b).

Therefore I think these should be considered application issues. Since request
limiting is difficult and error prone, I'd recommend that the Python stdlib
include a non-hash-based map (such as a binary tree).
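
A minimal sketch of what a non-hash-based map could look like, assuming keys
are mutually comparable (as strings are). It uses a sorted list with binary
search instead of a real balanced tree, so lookups take O(log n) comparisons
but inserts still shift list elements; the SortedMap name and its API are
made up for illustration and are not an existing stdlib type.

```python
import bisect


class SortedMap:
    """Tiny ordered map: lookups use binary search and never call hash()."""

    def __init__(self):
        self._keys = []    # kept sorted
        self._values = []  # parallel to self._keys

    def _find(self, key):
        """Return (index, found) for key using binary search."""
        i = bisect.bisect_left(self._keys, key)
        return i, i < len(self._keys) and self._keys[i] == key

    def __setitem__(self, key, value):
        i, found = self._find(key)
        if found:
            self._values[i] = value
        else:
            # list.insert shifts elements, so this step is O(n).
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def __getitem__(self, key):
        i, found = self._find(key)
        if not found:
            raise KeyError(key)
        return self._values[i]

    def __contains__(self, key):
        return self._find(key)[1]

    def __len__(self):
        return len(self._keys)


params = SortedMap()
params["b"] = 2
params["a"] = 1
params["b"] = 3  # overwrites the earlier value
```

Because no hash is involved, crafted colliding keys cost no more than any
other keys of the same length.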

Alex


From raymond.hettinger at gmail.com  Thu Dec 29 03:09:21 2011
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Wed, 28 Dec 2011 18:09:21 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
Message-ID: <CC775199-9257-4DCB-A6B4-19497F22C39F@gmail.com>

FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue.
It is believed that they give us better-than-random results for commonly
encountered datasets.  A change to randomized hashes would have a
negative performance impact on those cases.

Also, randomizing the hash wreaks havoc on doctests, book examples
not matching actual dict reprs, and on efforts by users to optimize
the insertion order into dicts with frequent lookups.


Raymond





On Dec 28, 2011, at 5:28 PM, Michael Foord wrote:

> Hello all,
> 
> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
> 
> 	 http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
> 
> Although it's a security issue I'm posting it here because it is now public and seems important.
> 
> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
> 
> 	reasonable-sized attack strings only for 32 bits
> 	Plone has max. POST size of 1 MB
> 	7 minutes of CPU usage for a 1 MB request
> 	~20 kbits/s -> keep one Core Duo core busy
> 
> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). 
> 
> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
> 
> Their recommended fix is to randomize the hash function.
> 
> All the best,
> 
> Michael


From lists at cheimes.de  Thu Dec 29 04:04:17 2011
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 29 Dec 2011 04:04:17 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
Message-ID: <4EFBD8B1.4020207@cheimes.de>

On 29.12.2011 02:37, Jesse Noller wrote:
> Back up link for the PDF:
> http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
> 
> Ocert disclosure:
> http://www.ocert.org/advisories/ocert-2011-003.html

>From http://www.nruns.com/_downloads/advisory28122011.pdf

---
Python uses a hash function which is very similar to DJBX33X, which can
be broken using a meet-in-the-middle attack. It operates on register size
and is thus different for 64 and 32 bit machines. While generating
multi-collisions efficiently is also possible for the 64 bit version of
the function, the resulting colliding strings are too large to be
relevant for anything more than an academic attack.

Plone as the most prominent Python web framework accepts 1 MB of POST
data, which it parses in about 7 minutes of CPU time in the worst case.
This gives an attacker with about 20 kbit/s the possibility to keep one
Core Duo core constantly busy. If the attacker is in the position to have
a Gigabit line available, he can keep about 50.000 Core Duo cores busy.
---

If I remember correctly, CPython uses a C long for its hash values, so
64-bit Windows (where long is 32 bits) uses a 32-bit hash.
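
To make the cost described in the advisory concrete, here is a small sketch
that simulates colliding keys. It does not construct real colliding strings;
the Colliding class is a stand-in whose hash is constant, and the helper
counts how often the dict has to fall back on __eq__ while inserting.

```python
class Colliding:
    """Key whose hash is always the same, standing in for crafted strings."""

    eq_calls = 0  # class-wide counter of __eq__ invocations

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # every instance follows the same probe sequence

    def __eq__(self, other):
        Colliding.eq_calls += 1
        return self.value == other.value


def count_comparisons(n):
    """Insert n colliding keys into a dict and return the __eq__ call count."""
    Colliding.eq_calls = 0
    d = {}
    for i in range(n):
        d[Colliding(i)] = None
    return Colliding.eq_calls


small = count_comparisons(200)
large = count_comparisons(400)
```

Doubling the number of keys should roughly quadruple the comparison count,
which is the quadratic behaviour the attack relies on.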

From lists at cheimes.de  Thu Dec 29 03:55:22 2011
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 29 Dec 2011 03:55:22 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <CC775199-9257-4DCB-A6B4-19497F22C39F@gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<CC775199-9257-4DCB-A6B4-19497F22C39F@gmail.com>
Message-ID: <4EFBD69A.9030903@cheimes.de>

On 29.12.2011 03:09, Raymond Hettinger wrote:
> FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue.
> It is believed that they give us better-than-random results for commonly
> encountered datasets.  A change to randomized hashes would have a
> negative performance impact on those cases.
> 
> Also, randomizing the hash wreaks havoc on doctests, book examples
> not matching actual dict reprs, and on efforts by users to optimize
> the insertion order into dicts with frequent lookups.

My five cents on the topic:

I totally concur with Raymond. He, Tim and all the others did a
fantastic job with the dict implementation and optimization. We
shouldn't overreact and mess with the current hashing and dict code just
because a well-known and old attack vector pops up again. The dict code
is far too crucial for Python's overall performance. However, the issue
should be documented in our docs.

I've been dealing with web stuff and security for almost a decade. I've
seen far worse attack vectors. This one can easily be solved with a
couple of lines of Python code. For example, application developers can
limit the maximum number of POST parameters to a sensible value and
limit the length of each key, too. The issue is less severe on platforms
with 64-bit hashes, so it won't affect most people. I think only 32-bit
Unix and Windows in general (32-bit long) are in trouble.
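
A rough sketch of the kind of limit meant here, assuming the request body is
already available as a decoded urlencoded string. The function name and the
default limits are made up for illustration; a real framework would pick its
own values and error handling.

```python
from urllib.parse import unquote_plus


def parse_form(body, max_fields=100, max_key_len=64):
    """Parse an urlencoded body into a dict, rejecting oversized input."""
    # Splitting builds a list, not a hash table, so counting first is cheap.
    fields = [f for f in body.split("&") if f]
    if len(fields) > max_fields:
        raise ValueError("too many form fields")
    result = {}
    for field in fields:
        raw_key, _, raw_value = field.partition("=")
        key = unquote_plus(raw_key)
        if len(key) > max_key_len:
            raise ValueError("form field name too long")
        result[key] = unquote_plus(raw_value)
    return result
```

The checks run before anything is stored in the dict, so a request with
thousands of colliding keys is rejected without paying for the collisions.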

CPython could aid developers with a special subclass of dict. The
crucial lookup function can already be overridden per dict instance and
on subclasses of dict through PyDictObject's struct member PyDictEntry
*(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash). For example, a
specialized subclass could limit the search for a free slot to n
probes or choose to ignore the hash argument and calculate its own
hash of the key.

Christian

From brian at python.org  Thu Dec 29 04:41:22 2011
From: brian at python.org (Brian Curtin)
Date: Wed, 28 Dec 2011 21:41:22 -0600
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <loom.20111229T024633-735@post.gmane.org>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
	<loom.20111229T024633-735@post.gmane.org>
Message-ID: <CAD+XWwr0aR2MwKQcodzEM7P8wm6XsJ3fo0SktJgnugwUeVPF=A@mail.gmail.com>

On Wed, Dec 28, 2011 at 19:51, Alex Gaynor <alex.gaynor at gmail.com> wrote:
> A few thoughts on this:
>
> a) This is not a new issue, I'm curious what the new interest is in it.

Well they (the presenters of the report) had to be accepted to that
conference for *something*, otherwise we wouldn't know they exist.

From solipsis at pitrou.net  Thu Dec 29 11:32:44 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 29 Dec 2011 11:32:44 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
	<4EFBD8B1.4020207@cheimes.de>
Message-ID: <20111229113244.58cb739c@pitrou.net>

On Thu, 29 Dec 2011 04:04:17 +0100
Christian Heimes <lists at cheimes.de> wrote:
> Am 29.12.2011 02:37, schrieb Jesse Noller:
> > Back up link for the PDF:
> > http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
> > 
> > Ocert disclosure:
> > http://www.ocert.org/advisories/ocert-2011-003.html
> 
> >From http://www.nruns.com/_downloads/advisory28122011.pdf
> 
> ---
> Python uses a hash function which is very similar to DJBX33X, which can
> be broken using a
> meet-in-the-middle attack. It operates on register size and is thus
> different for 64 and 32 bit
> machines. While generating multi-collisions efficiently is also possible
> for the 64 bit version
> of the function, the resulting colliding strings are too large to be
> relevant for anything more
> than an academic attack.
> 
> Plone as the most prominent Python web framework accepts 1 MB of POST
> data, which it
> parses in about 7 minutes of CPU time in the worst case.
> This gives an attacker with about 20 kbit/s the possibility to keep one
> Core Duo core
> constantly busy. If the attacker is in the position to have a Gigabit
> line available, he can keep
> about 50.000 Core Duo cores busy.
> ---
> 
> If I remember correctly CPython uses the long for its hash function so
> 64bit Windows uses a 32bit hash.

Not anymore, Py_hash_t is currently aligned with Py_ssize_t.

Regards

Antoine.



From solipsis at pitrou.net  Thu Dec 29 12:10:00 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 29 Dec 2011 12:10:00 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<CC775199-9257-4DCB-A6B4-19497F22C39F@gmail.com>
	<4EFBD69A.9030903@cheimes.de>
Message-ID: <20111229121000.582e8f31@pitrou.net>

On Thu, 29 Dec 2011 03:55:22 +0100
Christian Heimes <lists at cheimes.de> wrote:
> 
> I've been dealing with web stuff and security for almost a decade. I've
> seen far worse attack vectors. This one can easily be solved with a
> couple of lines of Python code. For example Application developers can
> limit the maximum amount of POST parameters to a sensible amount and
> limit the length of each key, too.

Shouldn't the setting be implemented by frameworks?

> CPython could aid developers with a special subclass of dict. The
> crucial lookup function is already overwrite-able per dict instance and
> on subclasses of dict through PyDictObj's struct member PyDictEntry
> *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash). For example
> specialized subclass could limit the seach for a free slot to n
> recursions or choose to ignore the hash argument and calculate its own
> hash of the key.

Or, rather, the specialized subclass could implement hash randomization.
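
As a pure-Python illustration of that idea (not the C-level ma_lookup
approach discussed above), the following sketch wraps string keys in an
object whose hash is a keyed digest. SaltedDict and _SaltedKey are
hypothetical names, keyed BLAKE2 from hashlib is just one possible keyed
function, and the per-instance salt from os.urandom is an assumption of this
sketch.

```python
import hashlib
import os
from collections.abc import MutableMapping


class _SaltedKey:
    """Wraps a str key; its hash depends on a secret salt."""

    __slots__ = ("key", "_hash")

    def __init__(self, key, salt):
        self.key = key
        digest = hashlib.blake2b(
            key.encode("utf-8"), key=salt, digest_size=8
        ).digest()
        self._hash = int.from_bytes(digest, "little")

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        return isinstance(other, _SaltedKey) and self.key == other.key


class SaltedDict(MutableMapping):
    """Mapping for str keys whose slot placement depends on a random salt."""

    def __init__(self, salt=None):
        self._salt = os.urandom(16) if salt is None else salt
        self._data = {}

    def __getitem__(self, key):
        try:
            return self._data[_SaltedKey(key, self._salt)]
        except KeyError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        self._data[_SaltedKey(key, self._salt)] = value

    def __delitem__(self, key):
        try:
            del self._data[_SaltedKey(key, self._salt)]
        except KeyError:
            raise KeyError(key) from None

    def __iter__(self):
        return (wrapped.key for wrapped in self._data)

    def __len__(self):
        return len(self._data)


headers = SaltedDict()
headers["content-type"] = "text/plain"
```

This is far slower than a built-in dict; it only shows that randomization can
be confined to a dedicated container instead of changing every dict.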

Regards

Antoine.



From mark at hotpy.org  Thu Dec 29 12:13:26 2011
From: mark at hotpy.org (Mark Shannon)
Date: Thu, 29 Dec 2011 11:13:26 +0000
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
Message-ID: <4EFC4B56.90709@hotpy.org>

Michael Foord wrote:
> Hello all,
> 
> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
> 
> 	 http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
> 
> Although it's a security issue I'm posting it here because it is now public and seems important.
> 
> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
> 
> 	reasonable-sized attack strings only for 32 bits
> 	Plone has max. POST size of 1 MB
> 	7 minutes of CPU usage for a 1 MB request
> 	~20 kbits/s -> keep one Core Duo core busy
> 
> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). 
> 
> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
> 
> Their recommended fix is to randomize the hash function.
> 

The attack relies on being able to predict the hash value for a given 
string. Randomising the string hash function is quite straightforward.
There is no need to change the dictionary code.

A possible (*untested*) patch is attached. I'll leave it for those more 
familiar with unicodeobject.c to do properly.
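
Since the attachment is scrubbed in the archive, the following is not the
patch itself but a hedged Python sketch of the general idea: mix a
per-process secret into the initial state of a multiply-and-xor string hash.
The loop is modelled loosely on the string hash CPython used at the time;
HASH_SEED and seeded_hash are names invented for this sketch, and whether
such a simple seed is strong enough is not established here.

```python
import os

MASK = (1 << 64) - 1  # emulate 64-bit unsigned wraparound

# Hypothetical per-process seed, drawn once at startup.
HASH_SEED = int.from_bytes(os.urandom(8), "little")


def seeded_hash(text, seed=HASH_SEED):
    """Multiply-and-xor string hash with a secret seed mixed into the state."""
    if not text:
        return 0
    x = (seed ^ (ord(text[0]) << 7)) & MASK
    for ch in text:
        x = ((1000003 * x) & MASK) ^ ord(ch)
    x ^= len(text)
    return x
```

With a fixed seed the function is deterministic, so a dict built on it works
as before within one process; only the values differ between processes that
drew different seeds.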

Cheers,
Mark
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hash.patch
Type: text/x-diff
Size: 1367 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20111229/df516875/attachment.patch>

From mark at hotpy.org  Thu Dec 29 12:25:03 2011
From: mark at hotpy.org (Mark Shannon)
Date: Thu, 29 Dec 2011 11:25:03 +0000
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <CC775199-9257-4DCB-A6B4-19497F22C39F@gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<CC775199-9257-4DCB-A6B4-19497F22C39F@gmail.com>
Message-ID: <4EFC4E0F.2070201@hotpy.org>

Raymond Hettinger wrote:
> FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue.
> It is believed that they give us better-than-random results for commonly
> encountered datasets.  A change to randomized hashes would have a
> negative performance impact on those cases.

Tim Peters's analysis applies mainly to ints, which would be unchanged.

A change to the hash function for strings would make no difference to 
the performance of the dict, as the ordering of the hash values is 
already quite different from the ordering of the strings for any string 
of more than 3 characters.

> 
> Also, randomizing the hash wreaks havoc on doctests, book examples
> not matching actual dict reprs, and on efforts by users to optimize
> the insertion order into dicts with frequent lookups.

The docs clearly state that the ordering of iteration over dicts is 
arbitrary. Perhaps changing it once in a while might be a good thing :)
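
For doctests, one way to avoid depending on dict ordering is to compare a
sorted view instead of the raw repr. A small sketch (word_counts is just an
example function):

```python
def word_counts(words):
    """Count occurrences of each word.

    Comparing sorted items keeps the example independent of dict order:

    >>> sorted(word_counts(["b", "a", "b"]).items())
    [('a', 1), ('b', 2)]
    """
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts
```

Running the module through `python -m doctest` then gives the same result
whatever order the dict happens to iterate in.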


Cheers,
Mark.

> 
> 
> Raymond
> 
> 
> 
> 
> 
> On Dec 28, 2011, at 5:28 PM, Michael Foord wrote:
> 
>> Hello all,
>>
>> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
>>
>> 	 http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>>
>> Although it's a security issue I'm posting it here because it is now public and seems important.
>>
>> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
>>
>> 	reasonable-sized attack strings only for 32 bits
>> 	Plone has max. POST size of 1 MB
>> 	7 minutes of CPU usage for a 1 MB request
>> 	~20 kbits/s -> keep one Core Duo core busy
>>
>> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue). 
>>
>> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
>>
>> Their recommended fix is to randomize the hash function.
>>
>> All the best,
>>
>> Michael


From solipsis at pitrou.net  Thu Dec 29 12:42:18 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 29 Dec 2011 12:42:18 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<CC775199-9257-4DCB-A6B4-19497F22C39F@gmail.com>
	<4EFC4E0F.2070201@hotpy.org>
Message-ID: <20111229124218.6affbebd@pitrou.net>

On Thu, 29 Dec 2011 11:25:03 +0000
Mark Shannon <mark at hotpy.org> wrote:
> > 
> > Also, randomizing the hash wreaks havoc on doctests, book examples
> > not matching actual dict reprs, and on efforts by users to optimize
> > the insertion order into dicts with frequent lookups.
> 
> The docs clearly state that the ordering of iteration over dicts is 
> arbitrary. Perhaps changing it once in a while might be a good thing :)

We already change it once in a while.
http://twistedmatrix.com/trac/ticket/5352
;)

Regards

Antoine.



From armin.ronacher at active-4.com  Thu Dec 29 12:29:53 2011
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Thu, 29 Dec 2011 12:29:53 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
Message-ID: <4EFC4F31.3090703@active-4.com>

Hi,

Just some extra thoughts about the whole topic in the light of web
applications (since this was hinted in the talk) running on Python:

   Yes, you can limit the maximum number of parameters allowed for post
   data, but really there are so many places where data is parsed into
   hashing containers that it's quite a worthless task.  Here is a very
   brief list of things usually parsed into a dict or set and where it
   happens:

   - URL parameters and url encoded form data
     Generally this happens somewhere in a framework but typically also
     in utility libraries that deal with URLs.  For instance the
     stdlib's cgi.parse_qs or urllib.parse.parse_qs on Python 3 do
     just that and that code is used left and right.

     Even if a framework started limiting its own URL parsing, there
     is still a lot of code that does not; the stdlib does not either.

     With form data it's worse because you have multipart headers that
     need parsing, and that is usually abstracted so far away from the
     user that they do not apply any limit.  Many frameworks just use the
     cgi module's parsing functions, which also feed directly into a
     dictionary.

   - HTTP headers.
     There is nothing a WSGI framework can do about that since the headers
     are parsed into a dictionary by the WSGI server.

   - Incoming JSON data.
     Again, outside of what the framework can do for the most part.
     simplejson can be configured to stop parsing through its hooks,
     but nobody does that, and since users invoke simplejson's parsing
     routines themselves, most webapps would still be vulnerable even
     if all frameworks fixed the problem.

   - Hidden dict parameters.
     Things like the parameter part of the content-type or
     content-disposition headers are generally also just parsed into a
     dictionary.  Likewise, many frameworks parse some headers into sets
     (for instance incoming etags).  The cookie header is usually parsed
     into a dictionary as well.

The issue is nothing new, and at least my POV on this topic so far was
that your server should be guarded by a watchdog that shoots handlers of
requests going rogue.  Dictionaries are not the only thing that has a
worst case performance that could be triggered by user input.

That said, considering that there are so many different places where
input of close to arbitrary length is parsed into a dictionary or other
hashing structure, it's hard for a web application developer or
framework to protect itself against this.

In case the watchdog is not a viable solution as I had assumed it was, I
think it's more reasonable to indeed consider adding a flag to Python
that allows randomization of hashes optionally before startup.

However, as was said earlier, the attack is so much more complex to
carry out in a 64-bit environment that it's probably (as it stands right
now!) safe to ignore there.

The main problem there however is not that it's a new attack but that
some dickheads could now make prebaked attacks against websites to
disrupt them that might cause some negative publicity.  In general
though there are so many more ways to DDOS a website than this that I
would rate the whole issue very low.


Regards,
Armin


From mal at egenix.com  Thu Dec 29 13:49:44 2011
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 29 Dec 2011 13:49:44 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFC4B56.90709@hotpy.org>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4B56.90709@hotpy.org>
Message-ID: <4EFC61E8.7090100@egenix.com>

Mark Shannon wrote:
> Michael Foord wrote:
>> Hello all,
>>
>> A paper (well, presentation) has been published highlighting security problems with the hashing
>> algorithm (exploiting collisions) in many programming languages Python included:
>>
>>     
>> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>>
>>
>> Although it's a security issue I'm posting it here because it is now public and seems important.
>>
>> The issue they report can cause (for example) handling an http post to consume horrible amounts of
>> cpu. For Python the figures they quoted:
>>
>>     reasonable-sized attack strings only for 32 bits
>>     Plone has max. POST size of 1 MB
>>     7 minutes of CPU usage for a 1 MB request
>>     ~20 kbits/s -> keep one Core Duo core busy
>>
>> This was apparently reported to the security list, but hasn't been responded to beyond an
>> acknowledgement on November 24th (the original report didn't make it onto the security list
>> because it was held in a moderation queue).
>> The same vulnerability was reported against various languages and web frameworks, and is already
>> fixed in some of them.
>>
>> Their recommended fix is to randomize the hash function.
>>
> 
> The attack relies on being able to predict the hash value for a given string. Randomising the string
> hash function is quite straightforward.
> There is no need to change the dictionary code.
> 
> A possible (*untested*) patch is attached. I'll leave it for those more familiar with
> unicodeobject.c to do properly.

The paper mentions that several web frameworks work around this by
limiting the number of parameters per GET/POST/HEAD request.

This sounds like a better alternative than randomizing the hash
function of strings.

Uncontrollable randomization has issues when you work with
multi-process setups, since the processes would each use different
hash values for identical strings. Putting the base_hash value
under application control could be done to solve this problem,
making sure that all processes use the same random base value.

BTW: Since your randomization trick uses the current time, it would
also be rather easy to tune an attack to find the currently
used base_hash. To make this safe, you'd have to use a more
random source for initializing the base_hash.
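
To illustrate both points, here is a sketch that contrasts a time-based seed
with one drawn from the operating system's random source, and that lets an
application pin a shared value for multi-process setups. APP_HASH_SEED is a
hypothetical variable name used only in this sketch, as are the function
names.

```python
import os
import time


def time_based_seed():
    """Guessable: an attacker who knows roughly when the process started
    only has to try a small range of candidate values."""
    return int(time.time())


def random_seed():
    """64-bit seed drawn from the operating system via os.urandom."""
    return int.from_bytes(os.urandom(8), "little")


def base_hash(environ=os.environ):
    """Return a seed shared by all processes if APP_HASH_SEED is set.

    APP_HASH_SEED is a made-up name for this sketch; without it, each
    process draws its own random seed.
    """
    configured = environ.get("APP_HASH_SEED")
    if configured is not None:
        return int(configured)
    return random_seed()
```

Worker processes started with the same APP_HASH_SEED value would then agree
on hash values, at the price of having to keep that value secret.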

Note that the same hash collision attack can be used for
other key types as well, e.g. integers (where it's very easy
to find hash collisions), so this kind of randomization
would have to be applied to other basic types too.
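
A short sketch of how easy integer collisions are, assuming CPython 3.2 or
later, where hashes of integers are reduced modulo sys.hash_info.modulus
(older versions hash integers differently). The colliding_ints helper is made
up for this example.

```python
import sys

# CPython 3.2+ reduces integer hashes modulo this prime.
M = sys.hash_info.modulus


def colliding_ints(count, base=1):
    """Return `count` distinct non-negative ints that all share one hash."""
    return [base + i * M for i in range(count)]


keys = colliding_ints(5)
hashes = {hash(k) for k in keys}
```

All five keys are different numbers, yet `hashes` contains a single value,
so no search is needed to produce colliding integer keys.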

-- 
Marc-Andre Lemburg
eGenix.com


From armin.ronacher at active-4.com  Thu Dec 29 13:57:07 2011
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Thu, 29 Dec 2011 13:57:07 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFC4F31.3090703@active-4.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com>
Message-ID: <4EFC63A3.5010008@active-4.com>

Hi,

Something I should add to this now that I thought about it a bit more:

Assuming this should be fixed on a language level the solution would
probably be to salt hashes.  The most common hash to salt here is the
PyUnicode hash for obvious reasons.

- Option a: Compiled in Salt
  + Easy to implement
  - Breaks unittests most likely (those were broken in the first place
    but that's still a very annoying change to make)
  - Might cause problems with interoperability of Pythons compiled with
    different hash salts
  - You're not really solving the problem because each Linux
    distribution (besides Gentoo I guess) would have just one salt
    compiled in, and that salt would be widespread enough to be
    attacked in the same way.

- Option b: Environment variable for the salt
  + Easy-ish to implement
  + Easy to synchronize over different machines
  - initialization for base types happens early and unpredictably, which
    makes it hard for embedded Python interpreters (think mod_wsgi and
    other things) to specify the salt

- Option c: Random salt at runtime
  + Easy to implement
  - impossible to synchronize
  - breaks unittests in the same way as a compiled in salt would do

Where to add the salt?  Unicode strings and bytestrings (bytes
objects) I guess, since those are the most common offenders.  Sometimes
tuples are keys of dictionaries, but in that case a contributing factor
to the hash is the string in the tuple anyway.

Also related: since this is a security related issue, would this be
something that goes into Python 2?  Does that affect what a fix would
look like?


Regards,
Armin

From lists at cheimes.de  Thu Dec 29 14:04:05 2011
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 29 Dec 2011 14:04:05 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <20111229121000.582e8f31@pitrou.net>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<CC775199-9257-4DCB-A6B4-19497F22C39F@gmail.com>
	<4EFBD69A.9030903@cheimes.de> <20111229121000.582e8f31@pitrou.net>
Message-ID: <4EFC6545.5070904@cheimes.de>

On 29.12.2011 12:10, Antoine Pitrou wrote:
>> I've been dealing with web stuff and security for almost a decade. I've
>> seen far worse attack vectors. This one can easily be solved with a
>> couple of lines of Python code. For example Application developers can
>> limit the maximum amount of POST parameters to a sensible amount and
>> limit the length of each key, too.
> 
> Shouldn't the setting be implemented by frameworks?

A web framework like Django or CherryPy can be considered an application
from the CPython core's point of view. ;)
You are right. "Framework" is the better term.

>> CPython could aid developers with a special subclass of dict. The
>> crucial lookup function is already overwrite-able per dict instance and
>> on subclasses of dict through PyDictObj's struct member PyDictEntry
>> *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash). For example
>> specialized subclass could limit the seach for a free slot to n
>> recursions or choose to ignore the hash argument and calculate its own
>> hash of the key.
> 
> Or, rather, the specialized subclass could implement hash randomization.

Yeah! I was thinking about the same when I wrote "calculate its own
hash" but I was too sloppy to carry on my argument. Please take 3am as
my excuse.

From hs at ox.cx  Thu Dec 29 14:11:49 2011
From: hs at ox.cx (Hynek Schlawack)
Date: Thu, 29 Dec 2011 14:11:49 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFC63A3.5010008@active-4.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com>
Message-ID: <B9DE63E5FFCB4B9C903AFDF0CB14E4D5@gmail.com>

Hi,  

how about

Option d: Host based salt
 + Easy-ish to implement: how about basing it on the hostname, for example?
 + transparent for all processes on the same host
 - probably unit test breakage

In fact, we could use a host-based salt as the default, with the option to specify your own, which would solve the sync problems.

That said, I agree with Armin that fixing this in the frameworks isn't an option.

Regards,
Hynek


On Thursday, 29 December 2011 at 13:57, Armin Ronacher wrote:

> Hi,
>  
> Something I should add to this now that I thought about it a bit more:
>  
> Assuming this should be fixed on a language level the solution would
> probably be to salt hashes. The most common hash to salt here is the
> PyUnicode hash for obvious reasons.
>  
> - Option a: Compiled in Salt
> + Easy to implement
> - Breaks unittests most likely (those were broken in the first place
> but that's still a very annoying change to make)
> - Might cause problems with interoperability of Pythons compiled with
> different hash salts
> - You're not really solving the problem because each linux
> distribution (besides Gentoo I guess) would have just one salt
> compiled in and that would be popular enough to have the same
> issue.
>  
> - Option b: Environment variable for the salt
> + Easy-ish to implement
> + Easy to synchronize over different machines
> - initialization for base types happens early and unpredictive which
> makes it hard for embedded Python interpreters (think mod_wsgi and
> other things) to specify the salt
>  
> - Option c: Random salt at runtime
> + Easy to implement
> - impossible to synchronize
> - breaks unittests in the same way as a compiled in salt would do
>  
> Where to add the salt to? Unicode strings and bytestrings (byte
> objects) I guess since those are the most common offenders. Sometimes
> tuples are keys of dictionaries but in that case a contributing factor
> to the hash is the string in the tuple anyways.
>  
> Also related: since this is a security related issue, would this be
> something that goes into Python 2? Does that affect how a fix would
> look like?
>  
>  
> Regards,
> Armin




From lists at cheimes.de  Thu Dec 29 14:19:28 2011
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 29 Dec 2011 14:19:28 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFC4B56.90709@hotpy.org>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4B56.90709@hotpy.org>
Message-ID: <4EFC68E0.4000606@cheimes.de>

On 29.12.2011 12:13, Mark Shannon wrote:
> The attack relies on being able to predict the hash value for a given
> string. Randomising the string hash function is quite straightforward.
> There is no need to change the dictionary code.
> 
> A possible (*untested*) patch is attached. I'll leave it for those more 
> familiar with unicodeobject.c to do properly.

I'm worried that hash randomization of str is going to break 3rd party
software that relies on a stable hash across multiple Python instances.
Persistence layers like ZODB and cross interpreter communication
channels used by multiprocessing may (!) rely on the fact that the hash
of a string is fixed.

Perhaps the dict code is a better place for randomization. The code in
lookdict() and lookdict_unicode() could add a value to the hash. My
approach is less intrusive and also closes the attack vector for all
possible objects including str, bytes, int and so on. I also like Armin's
idea of an optional hash randomization.

Christian

From solipsis at pitrou.net  Thu Dec 29 14:21:19 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 29 Dec 2011 14:21:19 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com>
Message-ID: <20111229142119.02e1ab50@pitrou.net>

On Thu, 29 Dec 2011 13:57:07 +0100
Armin Ronacher <armin.ronacher at active-4.com> wrote:
> 
> - Option c: Random salt at runtime
>   + Easy to implement
>   - impossible to synchronize
>   - breaks unittests in the same way as a compiled in salt would do

This option would have my preference. I don't think hash() was ever
meant to be "synchronizable". Already using a 32-bit Python will give
you different results from a 64-bit Python.

As for breaking unittests, these tests were broken in the first place.
hash() does change from time to time.

> Where to add the salt to?  Unicode strings and bytestrings (byte
> objects) I guess since those are the most common offenders.  Sometimes
> tuples are keys of dictionaries but in that case a contributing factor
> to the hash is the string in the tuple anyways.

Or it could be a process-wide constant for all dicts. If the constant
is additive as proposed by Mark the impact should be negligible.
(but the randomness must be good enough)

Regards

Antoine.



From fdrake at acm.org  Thu Dec 29 14:30:55 2011
From: fdrake at acm.org (Fred Drake)
Date: Thu, 29 Dec 2011 08:30:55 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFC68E0.4000606@cheimes.de>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de>
Message-ID: <CAFT4OTGeuestYJs7yyFHGN0JEOdEZVpj3CxHVf_hwfHZwMVXYg@mail.gmail.com>

On Thu, Dec 29, 2011 at 8:19 AM, Christian Heimes <lists at cheimes.de> wrote:
> Persistence layers like ZODB and cross interpreter communication
> channels used by multiprocessing may (!) rely on the fact that the hash
> of a string is fixed.

ZODB does not rely on a fixed hash function for strings; for any application
to rely on a stable hash would cause problems when updating Python versions.
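
A short sketch of the alternative for code that needs a value that stays
the same across processes, builds and Python versions: use a
cryptographic digest instead of hash(). The helper name stable_key is
hypothetical and is not how ZODB does it.

```python
import hashlib
import sys


def stable_key(text):
    """Return a digest that is identical across runs, builds and versions."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# hash() is only meaningful inside the current process; its width alone
# already differs between 32-bit and 64-bit builds.
print("hash width in bits:", sys.hash_info.width)
print("stable key:", stable_key("some string"))
```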


  -Fred

-- 
Fred L. Drake, Jr.    <fdrake at acm.org>
"A person who won't read has no advantage over one who can't read."
   --Samuel Langhorne Clemens

From lists at cheimes.de  Thu Dec 29 14:32:21 2011
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 29 Dec 2011 14:32:21 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFC63A3.5010008@active-4.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com>
Message-ID: <4EFC6BE5.6000400@cheimes.de>

On 29.12.2011 13:57, Armin Ronacher wrote:
> Hi,
> 
> Something I should add to this now that I thought about it a bit more:
> 
> Assuming this should be fixed on a language level the solution would
> probably be to salt hashes.  The most common hash to salt here is the
> PyUnicode hash for obvious reasons.
> 
> - Option a: Compiled in Salt
>   + Easy to implement
>   - Breaks unittests most likely (those were broken in the first place
>     but that's still a very annoying change to make)
>   - Might cause problems with interoperability of Pythons compiled with
>     different hash salts
>   - You're not really solving the problem because each linux
>     distribution (besides Gentoo I guess) would have just one salt
>     compiled in and that would be popular enough to have the same
>     issue.
> 
> - Option b: Environment variable for the salt
>   + Easy-ish to implement
>   + Easy to synchronize over different machines
>   - initialization for base types happens early and unpredictive which
>     makes it hard for embedded Python interpreters (think mod_wsgi and
>     other things) to specify the salt
> 
> - Option c: Random salt at runtime
>   + Easy to implement
>   - impossible to synchronize
>   - breaks unittests in the same way as a compiled in salt would do

- Option d: Don't change __hash__ but only use a randomized hash for
PyDictEntry lookup
  + Easy to implement
  - breaks only software that relies on a fixed order of dict keys
  - breaks only a few to no unit tests

IMHO we don't have to alter the outcome of hash("some string"), hash(1)
and all other related types. We just need to reduce the chance that an
attacker can produce collisions in the dict (and set?) code that looks
up the slot (PyDictEntry). How about adding the random value in
Objects/dictobject.c:lookdict() and lookdict_str() (Python 2.x) /
lookdict_unicode() (Python 3.x)? With this approach the hash of all our
objects stays the same and just the dict code needs to be altered. The
approach also has the benefit that all possible objects gain a
randomized hash.

> Also related: since this is a security related issue, would this be
> something that goes into Python 2?  Does that affect how a fix would
> look like?

IMHO it does affect the fix. A changed and randomized hash function may
break software that relies on a stable hash() function.

Christian

From lists at cheimes.de  Thu Dec 29 14:34:59 2011
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 29 Dec 2011 14:34:59 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <20111229113244.58cb739c@pitrou.net>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
	<4EFBD8B1.4020207@cheimes.de> <20111229113244.58cb739c@pitrou.net>
Message-ID: <4EFC6C83.40207@cheimes.de>

On 29.12.2011 11:32, Antoine Pitrou wrote:
>> If I remember correctly CPython uses the long for its hash function so
>> 64bit Windows uses a 32bit hash.
> 
> Not anymore, Py_hash_t is currently aligned with Py_ssize_t.

Thanks for the update. Python 2.x still uses long and several large
frameworks like Zope/Plone require 2.x.

Christian

From solipsis at pitrou.net  Thu Dec 29 15:14:33 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 29 Dec 2011 15:14:33 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com>
	<4EFC6BE5.6000400@cheimes.de>
Message-ID: <20111229151433.17a14ee0@pitrou.net>

On Thu, 29 Dec 2011 14:32:21 +0100
Christian Heimes <lists at cheimes.de> wrote:
> On 29.12.2011 13:57, Armin Ronacher wrote:
> > Hi,
> > 
> > Something I should add to this now that I thought about it a bit more:
> > 
> > Assuming this should be fixed on a language level the solution would
> > probably be to salt hashes.  The most common hash to salt here is the
> > PyUnicode hash for obvious reasons.
> > 
> > - Option a: Compiled in Salt
> >   + Easy to implement
> >   - Breaks unittests most likely (those were broken in the first place
> >     but that's still a very annoying change to make)
> >   - Might cause problems with interoperability of Pythons compiled with
> >     different hash salts
> >   - You're not really solving the problem because each linux
> >     distribution (besides Gentoo I guess) would have just one salt
> >     compiled in and that would be popular enough to have the same
> >     issue.
> > 
> > - Option b: Environment variable for the salt
> >   + Easy-ish to implement
> >   + Easy to synchronize over different machines
> >   - initialization for base types happens early and unpredictive which
> >     makes it hard for embedded Python interpreters (think mod_wsgi and
> >     other things) to specify the salt
> > 
> > - Option c: Random salt at runtime
> >   + Easy to implement
> >   - impossible to synchronize
> >   - breaks unittests in the same way as a compiled in salt would do
> 
> - Option d: Don't change __hash__ but only use randomized hash for
> PyDictEntry lookup
>   + Easy to implement
>   - breaks only software that relies on a fixed order of dict keys
>   - breaks only a few to no unit tests
> 
> IMHO we don't have to alter the outcome of hash("some string"), hash(1)
> and all other related types. We just need to reduce the chance that an
> attacker can produce collisions in the dict (and set?) code that looks
> up the slot (PyDictEntry). How about adding the random value in
> Objects/dictobject.c:lookdict() and lookdict_str() (Python 2.x) /
> lookdict_unicode() (Python 3.x)? With this approach the hash of all our
> objects stays the same and just the dict code needs to be altered. The
> approach also has the benefit that all possible objects gain a
> randomized hash.

I basically agree with your proposal. The only downside is that custom
hashed containers (such as _pickle.c's memotable) don't
automatically benefit. That said, I think it would be difficult to
craft an attack against the aforementioned memotable (you would have
to basically choose the addresses of pickled objects).

Regards

Antoine.



From debatem1 at gmail.com  Thu Dec 29 16:41:49 2011
From: debatem1 at gmail.com (geremy condra)
Date: Thu, 29 Dec 2011 10:41:49 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <CALFfu7BPXx2AFhE2oae1fYBrx_7zX7QbsXEXzjPH9R6Yry87Ow@mail.gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<CALFfu7BPXx2AFhE2oae1fYBrx_7zX7QbsXEXzjPH9R6Yry87Ow@mail.gmail.com>
Message-ID: <CAJ=m_n7AXppRq8=e9eztHdCoCV_i3cFzLSeyFvkGoJ+jQ_73Bg@mail.gmail.com>

On Wed, Dec 28, 2011 at 8:49 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> On Wed, Dec 28, 2011 at 6:28 PM, Michael Foord
> <fuzzyman at voidspace.org.uk> wrote:
>> Hello all,
>>
>> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
>>
>>         http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>>
>> Although it's a security issue I'm posting it here because it is now public and seems important.
>>
>> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
>>
>>        reasonable-sized attack strings only for 32 bits Plone has max. POST size of 1 MB
>>        7 minutes of CPU usage for a 1 MB request
>>        ~20 kbits/s -> keep one Core Duo core busy
>>
>> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
>>
>> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
>>
>> Their recommended fix is to randomize the hash function.
>
> Ironically, this morning I ran across a discussion from about 8 years
> ago on basically the same thing:
>
> http://mail.python.org/pipermail/python-dev/2003-May/035874.html
>
> From what I read in the thread, it didn't seem like anyone here was
> too worried about it.  Does this new research change anything?

Not really. It's actually somewhat behind previous work in that it
doesn't exploit the timing deltas, just generates very large ones.

Geremy Condra

From ned at nedbatchelder.com  Thu Dec 29 17:25:37 2011
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Thu, 29 Dec 2011 11:25:37 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <CC775199-9257-4DCB-A6B4-19497F22C39F@gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<CC775199-9257-4DCB-A6B4-19497F22C39F@gmail.com>
Message-ID: <4EFC9481.6010309@nedbatchelder.com>

On 12/28/2011 9:09 PM, Raymond Hettinger wrote:
> Also, randomizing the hash wreaks havoc on doctests, book examples
> not matching actual dict reprs, and on efforts by users to optimize
> the insertion order into dicts with frequent lookups.
I don't have a strong opinion about what to do about this vulnerability, 
but I know that none of these three reasons are a good reason to not 
change anything.  Dictionary key order has never been guaranteed, and 
changes from time to time.  Any code relying on it is broken to begin 
with. This is one of the reasons not to use doctests in the first place: 
comparing dicts textually has always been silly.
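
To illustrate the point about textual comparison, here is a minimal
sketch of order-independent checks. The helper name canonical is made up
for this example.

```python
first = {"spam": 1, "eggs": 2}
second = {"eggs": 2, "spam": 1}

# Equality ignores ordering, so this holds on every Python version.
assert first == second


# A textual comparison depends on iteration order; sort the items if a
# printable, order-independent form is needed (for example in a doctest).
def canonical(mapping):
    return sorted(mapping.items())


assert canonical(first) == canonical(second)
print(canonical(first))
```

A doctest that prints canonical(d) instead of d keeps passing no matter
how the dict happens to order its keys.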

--Ned.

From timothy.c.delaney at gmail.com  Thu Dec 29 20:59:47 2011
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Fri, 30 Dec 2011 06:59:47 +1100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <B9DE63E5FFCB4B9C903AFDF0CB14E4D5@gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com>
	<B9DE63E5FFCB4B9C903AFDF0CB14E4D5@gmail.com>
Message-ID: <CAN8CLgkSk3C_yOvgV8RASkX51=BK-M6JvuWbK-sgTeyOVCerbw@mail.gmail.com>

+1 to option d (host-based salt) but would need to consistently order the
hostnames/addresses to guarantee that all processes on the same machine got
the same salt by default.

+1 to option c (environment variable) as an override. And/or maybe an
override on the command line.

+1 to implementing the salt in the dictionary hash as an additive value.

+0 to exposing the salt as a constant (3.3+ only) - or alternatively expose
a hash function that just takes an existing hash and returns the salted
hash. That would make it very easy for anything that wanted a salted hash
to get one.

For choosing the default salt, I think something like:

a. If IPv6 is enabled, take the link-local address of the interface with
the default route. Pretty much guaranteed not to change, can't be
determined externally (salting doesn't need a secret, but it doesn't hurt),
large number so probably a good salt. (If it is likely to change, a salt
override should be being used instead). Don't use any other IPv6 address.
In particular, never use a "temporary" IPv6 address like Windows assigns -
multiprocessing could end up with instances with different salts.

b. Take the FQDN of the machine.
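
A rough sketch of variant b, assuming the salt is simply a digest of the
host name. host_salt is a hypothetical helper, the choice of SHA-256 is
an assumption, and a salt derived from a guessable name offers little
secrecy.

```python
import hashlib
import socket


def host_salt(hostname=None, bits=64):
    """Derive a salt that is the same for every process on one host."""
    if hostname is None:
        hostname = socket.getfqdn()
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Keep only as many bits as the hash type can hold.
    return int.from_bytes(digest, "big") % (1 << bits)


print(hex(host_salt("build-host.example.invalid")))
```

Every process that sees the same name computes the same salt, which is
the property wanted for multiprocessing.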

Tim Delaney

From timothy.c.delaney at gmail.com  Thu Dec 29 21:00:33 2011
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Fri, 30 Dec 2011 07:00:33 +1100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <CAN8CLgkSk3C_yOvgV8RASkX51=BK-M6JvuWbK-sgTeyOVCerbw@mail.gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com>
	<B9DE63E5FFCB4B9C903AFDF0CB14E4D5@gmail.com>
	<CAN8CLgkSk3C_yOvgV8RASkX51=BK-M6JvuWbK-sgTeyOVCerbw@mail.gmail.com>
Message-ID: <CAN8CLgnB5bvBTMVwX5eVsP2oZgLJMoocX5hg+ubcYDL32nZkOA@mail.gmail.com>

>
> +1 to option c (environment variable) as an override. And/or maybe an
> override on the command line.
>

That obviously should have said option b (environment variable) ...

Tim Delaney

From pje at telecommunity.com  Thu Dec 29 21:07:59 2011
From: pje at telecommunity.com (PJ Eby)
Date: Thu, 29 Dec 2011 15:07:59 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFC6BE5.6000400@cheimes.de>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com>
	<4EFC6BE5.6000400@cheimes.de>
Message-ID: <CALeMXf5W27uyEx7Kmg7CetubOSLnGqQYdt=gvqf5c9zJSNqy6g@mail.gmail.com>

On Thu, Dec 29, 2011 at 8:32 AM, Christian Heimes <lists at cheimes.de> wrote:

> IMHO we don't have to alter the outcome of hash("some string"), hash(1)
> and all other related types. We just need to reduce the chance that an
> attacker can produce collisions in the dict (and set?) code that looks
> up the slot (PyDictEntry). How about adding the random value in
> Objects/dictobject.c:lookdict() and lookdict_str() (Python 2.x) /
> lookdict_unicode() (Python 3.x)? With this approach the hash of all our
> objects stays the same and just the dict code needs to be altered.


I don't understand how that helps a collision attack.  If you can still
generate two strings with the same (pre-randomized) hash, what difference
does it make that the dict adds a random number?  The post-randomized
number will still be the same, no?

Or does this attack just rely on the hash *remainders* being the same?  If
so, I can see how hashing the hash would help.  But since the attacker
doesn't know the modulus, and it can change as the dictionary grows, I
would expect the attack to require matching hashes, not just matching hash
remainders...  unless I'm just completely off base here.

From paul at mcmillan.ws  Thu Dec 29 22:28:23 2011
From: paul at mcmillan.ws (Paul McMillan)
Date: Thu, 29 Dec 2011 13:28:23 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <CAN8CLgkSk3C_yOvgV8RASkX51=BK-M6JvuWbK-sgTeyOVCerbw@mail.gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com>
	<B9DE63E5FFCB4B9C903AFDF0CB14E4D5@gmail.com>
	<CAN8CLgkSk3C_yOvgV8RASkX51=BK-M6JvuWbK-sgTeyOVCerbw@mail.gmail.com>
Message-ID: <CAO_YWRVEYHj9xc5_XBrD3Rf+MYnJHZi_FGdqGSebLAnrNpXmdw@mail.gmail.com>

It's worth pointing out that if the salt is somehow exposed to an
attacker, or is guessable, much of the benefit goes away. It's likely
that a timing attack could be used to discover the salt if it is fixed
per machine or process over a long period of time.

If a salt is generally fixed per machine, but varies from
machine-to-machine, I think we'll see an influx of frustrated devs who
have something that works perfectly on their machine but not for
others. It doesn't matter that they're doing it wrong, we'll still
have to deal with them as a community. This seems like an argument in
favor of randomizing it at runtime by default, so it fails early for
them.

Allowing an environment and command line override makes sense, as it
allows users to rotate the salt as frequently as their needs dictate.

-Paul

From lists at cheimes.de  Thu Dec 29 22:31:05 2011
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 29 Dec 2011 22:31:05 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <CALeMXf5W27uyEx7Kmg7CetubOSLnGqQYdt=gvqf5c9zJSNqy6g@mail.gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com>
	<4EFC6BE5.6000400@cheimes.de>
	<CALeMXf5W27uyEx7Kmg7CetubOSLnGqQYdt=gvqf5c9zJSNqy6g@mail.gmail.com>
Message-ID: <4EFCDC19.2070703@cheimes.de>

On 29.12.2011 21:07, PJ Eby wrote:
> I don't understand how that helps a collision attack.  If you can still
> generate two strings with the same (pre-randomized) hash, what
> difference does it make that the dict adds a random number?  The
> post-randomized number will still be the same, no?
> 
> Or does this attack just rely on the hash *remainders* being the same?
>  If so, I can see how hashing the hash would help.  But since the
> attacker doesn't know the modulus, and it can change as the dictionary
> grows, I would expect the attack to require matching hashes, not just
> matching hash remainders...  unless I'm just completely off base here.

The attack doesn't need perfect collisions. The attacker calculates
strings in such a way that their hashes result in as many collisions as
possible in the dict code. An attacker succeeds when the initial slot
for a hash is filled, and as many subsequent slots of the perturbed
masked hash as possible, too. Also, an attacker can easily predict the
size and therefore the mask for the hash remainder. A POST request parser
usually starts with an empty dict and the growth rate of Python's dicts is
well documented. The changing mask makes the attack just a tiny bit more
challenging.
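
A small sketch of why matching remainders are enough for the first probe.
It relies on the CPython detail that small non-negative ints hash to
themselves; initial_slot is a hypothetical helper and ignores the later,
perturbed probes.

```python
def initial_slot(key, table_size):
    """First slot probed for key in a table of table_size slots (power of 2)."""
    mask = table_size - 1
    return hash(key) & mask


# Small non-negative ints hash to themselves in CPython, which makes them
# convenient stand-ins for attacker-chosen hash values.
keys = [0, 8, 16, 24, 32]
print([initial_slot(k, 8) for k in keys])   # every key starts in slot 0
print([initial_slot(k, 64) for k in keys])  # a larger table spreads them out
```

The hashes are all different, yet with 8 slots they share the same
starting position; the attacker only has to know the table size.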

The hash randomization idea adds a salt to throw the attacker off course.
Instead of

  position = hash & mask

it's now

  hash = salt + hash
  position = hash & mask

where salt is a random, process-global value that is fixed for the
lifetime of the program. The salt also affects the perturbation during the
search for new slots. As you already stated, this salt won't be effective
against full hash collisions.
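
The two lines of pseudocode above can be modelled in Python as follows.
This is a sketch only: the 64-bit wrap-around, the use of
random.getrandbits and the name salted_slot are assumptions, and the real
change would live in the C lookup functions.

```python
import random

MASK = (1 << 64) - 1              # assumed 64-bit hash arithmetic
SALT = random.getrandbits(64)     # chosen once per process in this sketch


def salted_slot(key_hash, table_size, salt=SALT):
    """Initial slot for key_hash after adding a process-wide salt."""
    salted = (salt + key_hash) & MASK
    return salted & (table_size - 1)


# Two keys with identical hashes still share a slot, whatever the salt is.
h1 = h2 = 0x1234ABCD
print(salted_slot(h1, 1024) == salted_slot(h2, 1024))  # True
```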

The attack needs A LOT of problematic strings to become an issue,
possibly hundreds of thousands or even millions of keys in a very large
POST request. In reality an attacker won't reach the full theoretical
O(n^2) performance degradation for a hash table. But even more than O(n)
instead of O(1) for a million keys in each request has some impact on
your servers' CPUs. Some vendors have limited POST requests to 1 MB or
the number of keys to 10,000 to work around the issue. One paper also
states that attacks on 64-bit Python are just academic for now.

Christian

From tjreedy at udel.edu  Thu Dec 29 23:19:58 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 29 Dec 2011 17:19:58 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFC4F31.3090703@active-4.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com>
Message-ID: <jdip2l$71b$1@dough.gmane.org>

The talk was a presentation yesterday by Alexander Klink and Julian
Wälde at the Chaos Communication Congress in Germany (hashDoS at alech.de).
I read the non-technical summary at
http://arstechnica.com/business/news/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack.ars 

and watched the video of the talk at
https://www.youtube.com/watch?feature=player_embedded&v=_EEhviEO1Vo#

My summary: hash table creation with N keys changes from amortized O(N)
to O(N**2) time if the hash values of all the keys are the same. This 
should only happen for large N if done intentionally. This is easy to 
accomplish with a linear multiply and add hash function, such as used in 
PHP4 (but nowhere else that the authors found). A nonlinear multiply and 
xor hash function, used in one form or another by everything else, is 
much harder to break. It is *theoretically* vulnerable to brute-force 
search and this has been known for years. With a more clever 
meet-in-the-middle strategy that builds a dict of suffixes and then 
searches for matching prefixes, 32-bit hashes are *practically* 
vulnerable. The attack depends on, for instance, 2**16 (64K) being 1/64K 
of 2**32. (I did not hear when this strategy was developed, but it is 
certainly more practical on a desktop now than even 8 years ago.)
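
The O(N) versus O(N**2) behaviour is easy to observe without real attack
strings. The sketch below uses a made-up key class, SameHash, whose
instances all share one hash value, and counts the equality comparisons
needed to build a dict.

```python
class SameHash:
    """Key type whose instances all collide, standing in for attack strings."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # every instance follows the same probe sequence

    def __eq__(self, other):
        SameHash.comparisons += 1
        return self.value == other.value


def count_comparisons(n):
    """Build a dict of n colliding keys; return the number of __eq__ calls."""
    SameHash.comparisons = 0
    table = {}
    for i in range(n):
        table[SameHash(i)] = None
    return SameHash.comparisons


for n in (100, 200, 400):
    print(n, count_comparisons(n))
```

Each new key has to be compared against the keys already stored, so
doubling n roughly quadruples the count.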

[64-bit hashes are much, much less vulnerable to attack, at least for 
now. So it seems to me that anyone who hashes potential attack data can 
avoid the problem by using 64-bit Python with 64-bit hash values. If I 
understood Antoine, that should be all 64-bit builds.]

More summary: years ago, Perl added a #define option to start the hash 
calculation with a non-zero value instead of 0, to "avoid algorithmic 
complexity attacks". The patch is at 47:20 in the video. The authors 
believe all should do similarly.

[The change amounts to adding a char, unknown to attackers, to the 
beginning of every string before hashing. So it adds a small bit of 
time. The code patch shown did not show the source of the non-zero seed 
or the timing and scope of any randomization. As the discussion here has 
shown, this is an important issue to applications. So 'do the same' is 
inadequate and over-simplified advice. I believe Armin's patch is 
similar to the Perl patch.]

Since the authors sent out a CERT alert about Nov 1, PHP has added to PHP5 
a new function to limit the number of vars hashed. Microsoft will do 
something similar now with hash randomization to follow (maybe?). JRuby 
is going to do something. Java does not think it needs to change Java 
itself, but will leave all to the frameworks.

[The discussion here suggests that this is an inadequate response for 
32-bit systems like Java since one person/group may not control all the 
pieces of a server system. However, a person or group can run all pieces 
on a system Python with an option turned on.]

-- 
Terry Jan Reedy



From tjreedy at udel.edu  Thu Dec 29 23:28:22 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 29 Dec 2011 17:28:22 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFCDC19.2070703@cheimes.de>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com>
	<4EFC6BE5.6000400@cheimes.de>
	<CALeMXf5W27uyEx7Kmg7CetubOSLnGqQYdt=gvqf5c9zJSNqy6g@mail.gmail.com>
	<4EFCDC19.2070703@cheimes.de>
Message-ID: <jdipid$9t6$1@dough.gmane.org>

On 12/29/2011 4:31 PM, Christian Heimes wrote:

> The hash randomization idea adds a salt to throw the attacker off course.
> Instead of
>
>    position = hash & mask
>
> it's now
>
>    hash = salt + hash

As I understood the talk (actually, the bit of Perl interpreter C code 
shown), the randomization is to change hash(s) to hash(salt+s) so that 
the salt is completely mixed into the hash from the beginning, rather 
than just tacked on at the end.
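
The difference between the two approaches can be sketched with a toy
multiply-and-xor hash. The function below is only loosely modelled on
that style of hash and is neither CPython's nor Perl's code; toy_hash,
salted_afterwards and the constants are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit hash


def toy_hash(text, seed=0):
    """Multiply-and-xor string hash with the seed as the starting value."""
    x = seed & MASK32
    for ch in text:
        x = ((x * 1000003) ^ ord(ch)) & MASK32
    return x ^ len(text)


def salted_afterwards(text, salt):
    """Alternative: compute the unseeded hash, then add the salt."""
    return (toy_hash(text) + salt) & MASK32


print(hex(toy_hash("key", seed=0)))
print(hex(toy_hash("key", seed=0x5EED)))      # seed takes part in every round
print(hex(salted_afterwards("key", 0x5EED)))  # salt only shifts the result
```

With the additive form, two strings that collide without a salt collide
with every salt; with the seeded form the seed is fed through each
multiply-and-xor step, so collisions found for one seed generally do not
carry over to another.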

-- 
Terry Jan Reedy


From lists at cheimes.de  Thu Dec 29 23:50:16 2011
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 29 Dec 2011 23:50:16 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <jdipid$9t6$1@dough.gmane.org>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com>
	<4EFC6BE5.6000400@cheimes.de>
	<CALeMXf5W27uyEx7Kmg7CetubOSLnGqQYdt=gvqf5c9zJSNqy6g@mail.gmail.com>
	<4EFCDC19.2070703@cheimes.de> <jdipid$9t6$1@dough.gmane.org>
Message-ID: <4EFCEEA8.8010206@cheimes.de>

On 29.12.2011 23:28, Terry Reedy wrote:
> As I understood the talk (actually, the bit of Perl interpreter C code 
> shown), the randomization is to change hash(s) to hash(salt+s) so that 
> the salt is completely mixed into the hash from the beginning, rather 
> than just tacked on at the end.

Yes, the Perl and Ruby code uses a random seed as IV for hash
generation. It's the best way to create randomized hashes but it might
not be a feasible fix for Python 2.x. I'm worried that it might break
applications that rely on stable hash values.

From timothy.c.delaney at gmail.com  Fri Dec 30 01:55:45 2011
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Fri, 30 Dec 2011 11:55:45 +1100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <CAN8CLgkSk3C_yOvgV8RASkX51=BK-M6JvuWbK-sgTeyOVCerbw@mail.gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com> <4EFC63A3.5010008@active-4.com>
	<B9DE63E5FFCB4B9C903AFDF0CB14E4D5@gmail.com>
	<CAN8CLgkSk3C_yOvgV8RASkX51=BK-M6JvuWbK-sgTeyOVCerbw@mail.gmail.com>
Message-ID: <CAN8CLgkVc2Mdg6vG+8nj3QtJQtQh7DaRv2fk+Ax4FAkbpEf5Zg@mail.gmail.com>

On 30 December 2011 06:59, Tim Delaney <timothy.c.delaney at gmail.com> wrote:

> +0 to exposing the salt as a constant (3.3+ only) - or alternatively
> expose a hash function that just takes an existing hash and returns the
> salted hash. That would make it very easy for anything that wanted a salted
> hash to get one.
>

Sorry - brain fart on my part there - the salt needs to be included right
from the start.

Tim Delaney

From julien at tayon.net  Fri Dec 30 15:26:29 2011
From: julien at tayon.net (julien tayon)
Date: Fri, 30 Dec 2011 15:26:29 +0100
Subject: [Python-Dev] hello, new dict addition for new eve ?
Message-ID: <CAFpLVkwzE0e8XTFLpc7BvdF3sSaeCYJJfTRVqHZsdw=FXSiEfA@mail.gmail.com>

Hello,
Sorry to bother the very busy core devs out of the blue :)

I noticed that people were
1) wanting a new dict for Xmas
2) strongly resenting dict addition.

Even though I am not a good developer, I have come to a definition of
addition that would follow algebraic rules, and not something of a
Dutch logic. (It is a jest, not a troll.)

I propose the following code as a proof of concept to support my point
of view regarding dictionary addition:
https://github.com/jul/ADictAdd_iction/blob/master/test.py

It follows all my dusty math books regarding addition, and it has the
courtesy to obey rules of conservation.

I see a real advantage in this behaviour for functional programming
(map/reduce; see demonstrate.py), and it makes sense if dicts can be
seen as vectors.
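
To make the "dicts as vectors" idea concrete without opening the repository, here is a minimal sketch in which addition is applied key by key. It is only an assumption about the intended semantics, not the code in the linked repository; `add_dicts` and the sample data are hypothetical.

```python
from functools import reduce


def add_dicts(a, b):
    """Key-wise sum of two dicts; a key missing on one side is copied as-is."""
    result = dict(a)
    for key, value in b.items():
        # Add when both sides have the key, otherwise just take the value.
        result[key] = result[key] + value if key in result else value
    return result


# map/reduce style: partial counts are merged with the same operator.
partial_counts = [{"spam": 1, "eggs": 2}, {"eggs": 3}, {"spam": 4, "ham": 1}]
total = reduce(add_dicts, partial_counts)
print(total)
```

With numeric values this behaves like vector addition over the union of the keys, which is what makes it usable as the reduce step.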

I have been told I am a troll, but I am quite serious.

Since I coded with luck, no internet, intuition, and complete
ignorance of the real meaning of the magic methods most of the time,
the actual implementation of the addition surely needs a complete
refactoring.

Cheers,
Happy holidays
Julien

From blendmaster1024 at gmail.com  Fri Dec 30 15:49:11 2011
From: blendmaster1024 at gmail.com (lahwran)
Date: Fri, 30 Dec 2011 07:49:11 -0700
Subject: [Python-Dev] Your email to the mailing list
In-Reply-To: <CAFa2EvUkpLX9VhvPfCk+yAJ5MOiZjfijaxr4GNYcjYPwWbLoyw@mail.gmail.com>
References: <CAFa2EvUkpLX9VhvPfCk+yAJ5MOiZjfijaxr4GNYcjYPwWbLoyw@mail.gmail.com>
Message-ID: <CAFa2EvWTs3b7NXVsJOx_au2bwQ7fQr1yvOeR1=0+0gg-uGO_1A@mail.gmail.com>

...oops, I did not intend to send this to the mailing list. I
apologize for the accidental off topic.

On Fri, Dec 30, 2011 at 7:40 AM, lahwran <blendmaster1024 at gmail.com> wrote:
> I don't want to post to the mailing list about this, but I must say, I
> found your email very entertaining. You have a good sense of humor.

From blendmaster1024 at gmail.com  Fri Dec 30 15:40:38 2011
From: blendmaster1024 at gmail.com (lahwran)
Date: Fri, 30 Dec 2011 07:40:38 -0700
Subject: [Python-Dev] Your email to the mailing list
Message-ID: <CAFa2EvUkpLX9VhvPfCk+yAJ5MOiZjfijaxr4GNYcjYPwWbLoyw@mail.gmail.com>

I don't want to post to the mailing list about this, but I must say, I
found your email very entertaining. You have a good sense of humor.

From guido at python.org  Fri Dec 30 17:40:06 2011
From: guido at python.org (Guido van Rossum)
Date: Fri, 30 Dec 2011 09:40:06 -0700
Subject: [Python-Dev] hello, new dict addition for new eve ?
In-Reply-To: <CAFpLVkwzE0e8XTFLpc7BvdF3sSaeCYJJfTRVqHZsdw=FXSiEfA@mail.gmail.com>
References: <CAFpLVkwzE0e8XTFLpc7BvdF3sSaeCYJJfTRVqHZsdw=FXSiEfA@mail.gmail.com>
Message-ID: <CAP7+vJ+9M_CiepXVPyJRX3rDtk4wi+=gux5Ts3m62H-aTR=OwQ@mail.gmail.com>

Hi Julien,

Don't despair! I have tried to get people to warm up to dict addition too
-- in fact it was my counter-proposal at the time when we were considering
adding sets to the language. I will look at your proposal, but I have a
point of order first: this should be discussed on python-ideas, not on
python-dev. I have added python-ideas to the thread and moved python-dev to
Bcc, so followups will hopefully all go to python-ideas.

--Guido

On Fri, Dec 30, 2011 at 7:26 AM, julien tayon <julien at tayon.net> wrote:

> Hello,
> Sorry to bother the very busy core devs out of the blue :)
>
> I noticed that people were
> 1) wanting a new dict for Xmas
> 2) strongly resenting dict addition.
>
> Even though I am not a good developer, I have come to a definition of
> addition that would follow algebraic rules, and not something of a
> Dutch logic. (It is a jest, not a troll.)
>
> I propose the following code as a proof of concept to support my point
> of view regarding dictionary addition:
> https://github.com/jul/ADictAdd_iction/blob/master/test.py
>
> It follows all my dusty math books regarding addition, and it has the
> courtesy to obey rules of conservation.
>
> I see a real advantage in this behaviour for functional programming
> (map/reduce; see demonstrate.py), and it makes sense if dicts can be
> seen as vectors.
>
> I have been told I am a troll, but I am quite serious.
>
> Since I coded with luck, no internet, intuition, and complete
> ignorance of the real meaning of the magic methods most of the time,
> the actual implementation of the addition surely needs a complete
> refactoring.
>
> Cheers,
> Happy holidays
> Julien



-- 
--Guido van Rossum (python.org/~guido)

From status at bugs.python.org  Fri Dec 30 18:07:34 2011
From: status at bugs.python.org (Python tracker)
Date: Fri, 30 Dec 2011 18:07:34 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20111230170734.025381CCBF@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2011-12-23 - 2011-12-30)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    3178 (+10)
  closed 22288 (+16)
  total  25466 (+26)

Open issues with patches: 1365 


Issues opened (21)
==================

#12760: Add create mode to open()
http://bugs.python.org/issue12760  reopened by pitrou

#13294: http.server: minor code style changes.
http://bugs.python.org/issue13294  reopened by ezio.melotti

#13659: Add a help() viewer for IDLE's Shell.
http://bugs.python.org/issue13659  opened by ramchandra.apte

#13663: pootle.python.org is outdated.
http://bugs.python.org/issue13663  opened by naoki

#13664: UnicodeEncodeError in gzip when filename contains non-ascii
http://bugs.python.org/issue13664  opened by jason.coombs

#13665: TypeError: string or integer address expected instead of str i
http://bugs.python.org/issue13665  opened by jason.coombs

#13666: datetime documentation typos
http://bugs.python.org/issue13666  opened by steveire

#13668: mute ImportError in __del__ of _threading_local module
http://bugs.python.org/issue13668  opened by Zhiping.Deng

#13669: XATTR_SIZE_MAX and XATTR_LIST_MAX undefined on kfreebsd/debian
http://bugs.python.org/issue13669  opened by zbysz

#13670: Increase test coverage for pstats.py
http://bugs.python.org/issue13670  opened by andrea.crotti

#13672: Add co_qualname attribute in code objects
http://bugs.python.org/issue13672  opened by Arfrever

#13673: PyTraceBack_Print() fails if signal received but PyErr_CheckSi
http://bugs.python.org/issue13673  opened by sbt

#13674: crash in datetime.strftime
http://bugs.python.org/issue13674  opened by patrick.vrijlandt

#13676: sqlite3: Zero byte truncates string contents
http://bugs.python.org/issue13676  opened by petri.lehtinen

#13677: correct docstring for builtin compile
http://bugs.python.org/issue13677  opened by Jim.Jewett

#13679: Multiprocessing system crash
http://bugs.python.org/issue13679  opened by Rock.Achu

#13680: Aifc comptype write fix
http://bugs.python.org/issue13680  opened by Oleg.Plakhotnyuk

#13681: Aifc read compressed frames fix
http://bugs.python.org/issue13681  opened by Oleg.Plakhotnyuk

#13682: Documentation of os.fdopen() refers to non-existing bufsize ar
http://bugs.python.org/issue13682  opened by petri.lehtinen

#13683: Docs in Python 3:raise statement mistake
http://bugs.python.org/issue13683  opened by ramchandra.apte

#13684: httplib tunnel infinite loop
http://bugs.python.org/issue13684  opened by luzakiru



Most recent 15 issues with no replies (15)
==========================================

#13684: httplib tunnel infinite loop
http://bugs.python.org/issue13684

#13683: Docs in Python 3:raise statement mistake
http://bugs.python.org/issue13683

#13682: Documentation of os.fdopen() refers to non-existing bufsize ar
http://bugs.python.org/issue13682

#13680: Aifc comptype write fix
http://bugs.python.org/issue13680

#13677: correct docstring for builtin compile
http://bugs.python.org/issue13677

#13668: mute ImportError in __del__ of _threading_local module
http://bugs.python.org/issue13668

#13666: datetime documentation typos
http://bugs.python.org/issue13666

#13665: TypeError: string or integer address expected instead of str i
http://bugs.python.org/issue13665

#13664: UnicodeEncodeError in gzip when filename contains non-ascii
http://bugs.python.org/issue13664

#13649: termios.ICANON is not documented
http://bugs.python.org/issue13649

#13641: decoding functions in the base64 module could accept unicode s
http://bugs.python.org/issue13641

#13640: add mimetype for application/vnd.apple.mpegurl
http://bugs.python.org/issue13640

#13638: PyErr_SetFromErrnoWithFilenameObject is undocumented
http://bugs.python.org/issue13638

#13633: Handling of hex character references in HTMLParser.handle_char
http://bugs.python.org/issue13633

#13631: readline fails to parse some forms of .editrc under editline (
http://bugs.python.org/issue13631



Most recent 15 issues waiting for review (15)
=============================================

#13684: httplib tunnel infinite loop
http://bugs.python.org/issue13684

#13681: Aifc read compressed frames fix
http://bugs.python.org/issue13681

#13680: Aifc comptype write fix
http://bugs.python.org/issue13680

#13677: correct docstring for builtin compile
http://bugs.python.org/issue13677

#13676: sqlite3: Zero byte truncates string contents
http://bugs.python.org/issue13676

#13673: PyTraceBack_Print() fails if signal received but PyErr_CheckSi
http://bugs.python.org/issue13673

#13670: Increase test coverage for pstats.py
http://bugs.python.org/issue13670

#13668: mute ImportError in __del__ of _threading_local module
http://bugs.python.org/issue13668

#13651: Improve redirection in urllib
http://bugs.python.org/issue13651

#13645: import machinery vulnerable to timestamp collisions
http://bugs.python.org/issue13645

#13640: add mimetype for application/vnd.apple.mpegurl
http://bugs.python.org/issue13640

#13636: Python SSL Stack doesn't have a Secure Default set of ciphers
http://bugs.python.org/issue13636

#13631: readline fails to parse some forms of .editrc under editline (
http://bugs.python.org/issue13631

#13629: _PyParser_TokenNames does not match up with the token.h number
http://bugs.python.org/issue13629

#13619: Add a new codec: "locale", the current locale encoding
http://bugs.python.org/issue13619



Top 10 most discussed issues (10)
=================================

#13679: Multiprocessing system crash
http://bugs.python.org/issue13679  10 msgs

#13674: crash in datetime.strftime
http://bugs.python.org/issue13674   9 msgs

#13669: XATTR_SIZE_MAX and XATTR_LIST_MAX undefined on kfreebsd/debian
http://bugs.python.org/issue13669   8 msgs

#8828: Atomic function to rename a file
http://bugs.python.org/issue8828   7 msgs

#9260: A finer grained import lock
http://bugs.python.org/issue9260   5 msgs

#13673: PyTraceBack_Print() fails if signal received but PyErr_CheckSi
http://bugs.python.org/issue13673   5 msgs

#6028: Interpreter aborts when chaining an infinite number of excepti
http://bugs.python.org/issue6028   3 msgs

#13565: test_multiprocessing.test_notify_all() hangs on "AMD64 Snow Le
http://bugs.python.org/issue13565   3 msgs

#13657: IDLE doesn't support sys.ps1 and sys.ps2.
http://bugs.python.org/issue13657   3 msgs

#13672: Add co_qualname attribute in code objects
http://bugs.python.org/issue13672   3 msgs



Issues closed (17)
==================

#3555: Regression: nested exceptions crash (Cannot recover from stack
http://bugs.python.org/issue3555  closed by terry.reedy

#7338: recursive __attribute__ -> Fatal Python error: Cannot recover 
http://bugs.python.org/issue7338  closed by terry.reedy

#11638: python setup.py sdist --formats tar* crashes if version is uni
http://bugs.python.org/issue11638  closed by jason.coombs

#11812: transient socket failure to connect to 'localhost'
http://bugs.python.org/issue11812  closed by neologix

#13632: Update token documentation to reflect actual token types
http://bugs.python.org/issue13632  closed by meador.inge

#13639: UnicodeDecodeError when creating tar.gz with unicode name
http://bugs.python.org/issue13639  closed by terry.reedy

#13643: 'ascii' is a bad filesystem default encoding
http://bugs.python.org/issue13643  closed by terry.reedy

#13644: Python 3 aborts with this code.
http://bugs.python.org/issue13644  closed by terry.reedy

#13658: Extra clause in class grammar documentation
http://bugs.python.org/issue13658  closed by python-dev

#13660: maniandram maniandram wants to chat
http://bugs.python.org/issue13660  closed by pitrou

#13661: maniandram maniandram wants to chat
http://bugs.python.org/issue13661  closed by pitrou

#13662: os.listdir bug
http://bugs.python.org/issue13662  closed by ezio.melotti

#13667: __contains__ method behavior
http://bugs.python.org/issue13667  closed by benjamin.peterson

#13671: double comma cant be parsed in config module
http://bugs.python.org/issue13671  closed by lukasz.langa

#13675: IDLE won't open if it can't read recent-files.lst
http://bugs.python.org/issue13675  closed by michael.foord

#13678: way to prevent accidental variable overriding
http://bugs.python.org/issue13678  closed by benjamin.peterson

#12715: Add symlink support to shutil functions
http://bugs.python.org/issue12715  closed by pitrou

From brian at python.org  Fri Dec 30 20:29:36 2011
From: brian at python.org (Brian Curtin)
Date: Fri, 30 Dec 2011 13:29:36 -0600
Subject: [Python-Dev] [Python-checkins] cpython: Issue #12715: Add an
 optional symlinks argument to shutil functions (copyfile, 
In-Reply-To: <E1RgKC8-0006BJ-9N@dinsdale.python.org>
References: <E1RgKC8-0006BJ-9N@dinsdale.python.org>
Message-ID: <CAD+XWwpSMA0E6euhMZO5eZvWsYJCN6-KikMa9ryLt8cd3OkJ3Q@mail.gmail.com>

On Thu, Dec 29, 2011 at 11:55, antoine.pitrou
<python-checkins at python.org> wrote:
> http://hg.python.org/cpython/rev/cf57ef65bcd0
> changeset:   74194:cf57ef65bcd0
> user:        Antoine Pitrou <solipsis at pitrou.net>
> date:        Thu Dec 29 18:54:15 2011 +0100
> summary:
>  Issue #12715: Add an optional symlinks argument to shutil functions (copyfile, copymode, copystat, copy, copy2).
> When that parameter is true, symlinks aren't dereferenced and the operation
> instead acts on the symlink itself (or creates one, if relevant).
>
> Patch by Hynek Schlawack.
>
> files:
>  Doc/library/shutil.rst  |   46 ++++-
>  Lib/shutil.py           |  101 +++++++++---
>  Lib/test/test_shutil.py |  219 ++++++++++++++++++++++++++++
>  Misc/NEWS               |    5 +
>  4 files changed, 333 insertions(+), 38 deletions(-)
>
>
> diff --git a/Doc/library/shutil.rst b/Doc/library/shutil.rst
> --- a/Doc/library/shutil.rst
> +++ b/Doc/library/shutil.rst
> @@ -45,7 +45,7 @@
>    be copied.
>
>
> -.. function:: copyfile(src, dst)
> +.. function:: copyfile(src, dst[, symlinks=False])
>
>    Copy the contents (no metadata) of the file named *src* to a file named *dst*.
>    *dst* must be the complete target file name; look at :func:`copy` for a copy that
> @@ -56,37 +56,56 @@
>    such as character or block devices and pipes cannot be copied with this
>    function.  *src* and *dst* are path names given as strings.
>
> +   If *symlinks* is true and *src* is a symbolic link, a new symbolic link will
> +   be created instead of copying the file *src* points to.
> +
>    .. versionchanged:: 3.3
>       :exc:`IOError` used to be raised instead of :exc:`OSError`.
> +      Added *symlinks* argument.

Can we expect that readers on Windows know how os.symlink works, or
should the stipulations of os.symlink usage also be laid out or at
least linked to from there?

Basically, almost everyone is going to get an OSError if they call
this on Windows. You have to be on Windows Vista or beyond *and* the
calling process has to have the proper privileges (typically gained
through elevation - "Run as Administrator").
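
A minimal sketch of how a caller could probe for this before relying on symlink creation. It only uses `os.symlink`, `os.path.islink` and `tempfile`; the helper name `can_create_symlinks` is made up for illustration, and the exact exception raised on an unprivileged Windows process is assumed to be an `OSError` (or `NotImplementedError` on platforms without symlink support).

```python
import os
import tempfile


def can_create_symlinks():
    """Return True if this process can create a symlink in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w") as f:
            f.write("data")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # Missing privilege or no symlink support on this platform.
            return False
        return os.path.islink(link)


print("symlinks available:", can_create_symlinks())
```

Documentation for the new argument could point readers at a check like this, or simply link to the os.symlink notes.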

From solipsis at pitrou.net  Fri Dec 30 20:39:20 2011
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 30 Dec 2011 20:39:20 +0100
Subject: [Python-Dev] cpython: Issue #12715: Add an optional symlinks
 argument to shutil functions (copyfile, 
References: <E1RgKC8-0006BJ-9N@dinsdale.python.org>
	<CAD+XWwpSMA0E6euhMZO5eZvWsYJCN6-KikMa9ryLt8cd3OkJ3Q@mail.gmail.com>
Message-ID: <20111230203920.0cf28b1e@pitrou.net>

On Fri, 30 Dec 2011 13:29:36 -0600
Brian Curtin <brian at python.org> wrote:
> 
> Can we expect that readers on Windows know how os.symlink works, or
> should the stipulations of os.symlink usage also be laid out or at
> least linked to from there?

I assume it won't make a difference in real life, since symlinks are
quite rare under Windows.

> Basically, almost everyone is going to get an OSError if they call
> this on Windows. You have to be on Windows Vista or beyond *and* the
> calling process has to have the proper privileges (typically gained
> through elevation - "Run as Administrator").

I still haven't managed to use symlinks under Windows 7, myself.
The recipes I've tried didn't work.

Regards

Antoine.




From brian at python.org  Fri Dec 30 20:51:33 2011
From: brian at python.org (Brian Curtin)
Date: Fri, 30 Dec 2011 13:51:33 -0600
Subject: [Python-Dev] cpython: Issue #12715: Add an optional symlinks
 argument to shutil functions (copyfile, 
In-Reply-To: <20111230203920.0cf28b1e@pitrou.net>
References: <E1RgKC8-0006BJ-9N@dinsdale.python.org>
	<CAD+XWwpSMA0E6euhMZO5eZvWsYJCN6-KikMa9ryLt8cd3OkJ3Q@mail.gmail.com>
	<20111230203920.0cf28b1e@pitrou.net>
Message-ID: <CAD+XWwqk47WezsFuSjhMAP-Ju4y_Dv2R9QSjXhDuB1oSE53fLQ@mail.gmail.com>

On Fri, Dec 30, 2011 at 13:39, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Fri, 30 Dec 2011 13:29:36 -0600
> Brian Curtin <brian at python.org> wrote:
>>
>> Can we expect that readers on Windows know how os.symlink works, or
>> should the stipulations of os.symlink usage also be laid out or at
>> least linked to from there?
>
> I assume it won't make a difference in real life, since symlinks are
> quite rare under Windows.
>
>> Basically, almost everyone is going to get an OSError if they call
>> this on Windows. You have to be on Windows Vista or beyond *and* the
>> calling process has to have the proper privileges (typically gained
>> through elevation - "Run as Administrator").
>
> I still haven't managed to use symlinks under Windows 7, myself.
> The recipes I've tried didn't work.

This might be a place where an image in the documentation would be
helpful. I don't think we do that anywhere else, but maybe I could add
it to the (sorely out of date and in need of a rebuild) Windows FAQ?

What you need to do on Win7 is go to Start > All Programs >
Accessories > Command Prompt, but right click on it instead of left
click. Choose "Run as Administrator", then it'll make you choose yes
or no to elevate privileges. At that point, deep in the heart of
everyone's favorite operating system, it should acquire the
SeCreateSymbolicLink user privilege. After that, os.symlink should
work fine.

From jimjjewett at gmail.com  Sat Dec 31 02:04:39 2011
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 30 Dec 2011 20:04:39 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
Message-ID: <CA+OGgf6aaqz8e7t0cdznXjFfdNNEYMpOYvJ3_QC6S18DU7TXnQ@mail.gmail.com>

In http://mail.python.org/pipermail/python-dev/2011-December/115138.html,
Christian Heimes pointed out that

> ... we don't have to alter the outcome of hash ... We just need to reduce the chance that
> an attacker can produce collisions in the dict (and set?)

I'll state it more strongly.  hash probably should not change (at
least for this), but we may want to consider a different conflict
resolution strategy when the first slot is already filled.

Remember that there was a fair amount of thought and timing effort put
into selecting the current strategy; it is deliberately sub-optimal for
random input, in order to do better with typical input.
<http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictnotes.txt>


If there is a change, it would currently be needed in three places for
each of set and dict (the lookdict functions and insertdict_clean).  It
may be worth adding some macros just to keep those six in sync.  Once
those macros are in place, that allows a compile-time switch.

My personal opinion is that accepting *and parsing* enough data for
this to be a problem is enough of an edge case that I don't want normal
dicts slowed down at all for this; I would therefore prefer that the
change be restricted to such a compile-time switch, with current
behavior the default.


http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictobject.c#l571

    for (perturb = hash; ep->me_key != NULL; perturb >>= PERTURB_SHIFT) {
        i = (i << 2) + i + perturb + 1;

PERTURB_SHIFT is already a private #define to 5; per dictnotes, 4 and 6
perform almost as well.  Someone worried can easily make that change
today, and be protected from "generic" anti-python attacks.

I believe the salt suggestions are equivalent to replacing
    perturb = hash;
with something like
    perturb = hash + salt;

Changing     i = (i << 2) + i + perturb + 1;    would allow
effectively replacing the initial hash, but risks spoiling performance
in the non-adversary case.

Would there be objections to replacing those two lines with something like:

    for (perturb = FIRST_PERTURB(hash, key);
         ep->me_key != NULL;
         perturb = NEXT_PERTURB(hash, key, perturb)) {
        i = NEXT_SLOT(i, perturb);


The default macro definitions should keep things as they are

    #define FIRST_PERTURB(hash, key)    (hash)
    #define NEXT_PERTURB(hash, key, perturb)    ((perturb) >> PERTURB_SHIFT)
    #define NEXT_SLOT(i, perturb)    (((i) << 2) + (i) + (perturb) + 1)

while allowing #ifdefs for (slower but) safer things like adding a
salt, or even using alternative hashes.
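
For anyone who wants to play with the probe order outside of C, the loop above can be modelled roughly as follows. This is a sketch and not CPython's code: `probe_sequence`, its `salt` parameter and the tiny table size are assumptions made for illustration, and only non-negative hashes are handled.

```python
PERTURB_SHIFT = 5  # same value as the private #define quoted above


def probe_sequence(hash_value, mask, salt=0, limit=8):
    """Return the first `limit` slots visited for a non-negative hash.

    `mask` is table_size - 1, with table_size a power of two.
    """
    slots = []
    i = hash_value & mask            # first slot, unaffected by the salt
    perturb = hash_value + salt      # FIRST_PERTURB, optionally salted
    for _ in range(limit):
        slots.append(i)
        i = ((i << 2) + i + perturb + 1) & mask  # NEXT_SLOT
        perturb >>= PERTURB_SHIFT                # NEXT_PERTURB
    return slots


print(probe_sequence(5, mask=7))
print(probe_sequence(5, mask=7, salt=3))
```

In this model the salt leaves the first slot alone and only changes the later ones. Two keys whose full hash values are equal still walk exactly the same sequence, whatever the salt.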

-jJ

From victor.stinner at haypocalc.com  Sat Dec 31 03:22:24 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sat, 31 Dec 2011 03:22:24 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
Message-ID: <4EFE71E0.2000505@haypocalc.com>

On 29/12/2011 02:28, Michael Foord wrote:
> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages, Python included:
>
> 	 http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

This PDF doesn't explain exactly what the problem is or how it can be
solved. Let's try to summarize this "vulnerability".


The creation of a Python dictionary has a complexity of O(n) in most
cases, but O(n^2) in the *worst* case. The attack tries to trigger the
worst case. It requires computing a set of N keys that all have the same
hash value (hash(key1) == hash(key2) == ... == hash(keyN)). The attacker
only has to compute these keys once. It looks like it is now cheap
enough in practice to compute this dataset for Python (and other
languages).
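
The worst case is easy to observe from pure Python. The sketch below does not reproduce the attack on string hashes; it fakes the collisions with a hypothetical `Colliding` key class whose hash is constant, and counts how many key comparisons the dict performs while it is being filled.

```python
class Colliding:
    """Key whose hash is constant, so every instance collides."""

    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42

    def __eq__(self, other):
        Colliding.eq_calls += 1
        return self.value == other.value


def comparisons_for(n):
    """Fill a dict with n colliding keys; return the number of __eq__ calls."""
    Colliding.eq_calls = 0
    d = {}
    for i in range(n):
        d[Colliding(i)] = None
    return Colliding.eq_calls


for n in (100, 200, 400):
    print(n, comparisons_for(n))
```

Doubling the number of keys roughly quadruples the number of comparisons, which is the O(n^2) behaviour described above.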

A countermeasure would be to check that we don't have more than X keys 
with the same hash value. But in practice, we don't know in advance how 
data are processed, and there are too many input vectors in various formats.

If we want to fix something, it should be done in the implementation of 
the dict type or in the hash algorithm. We can implement dict 
differently to avoid this issue, using a binary tree for example. 
Because dict is a fundamental type in Python, I don't think that we can 
change its implementation (without breaking backward compatibility and 
so applications in production). A possibility would be to add a *new* 
type, but all libraries and applications would need to be changed to fix 
the vulnerability.

The last choice is to change the hash algorithm. The *idea* is the same
as adding salt to hashed passwords (in practice it will be a little bit
different): if a pseudo-random salt is added, the attacker cannot
prepare a single dataset; he/she will have to generate a new dataset
for each possible salt value. If the salt is big enough (size in bits),
the attacker will need too much CPU to generate the dataset (compute N
keys with the same hash value). Basically, it slows down the attack by
a factor of 2^(size of the salt in bits).

--

Another possibility would be to replace our fast hash function with a
better hash function like MD5 or SHA1 (so the creation of the dataset
would be too slow in practice = too expensive), but cryptographic hash
functions are much slower (and so would slow down Python too much).

Limiting the size of the POST data doesn't solve the problem because
there are many other input vectors and data formats. It may block the
simplest attacks because the attacker cannot inject enough keys to
slow down your CPU.

Victor

From steve at pearwood.info  Sat Dec 31 03:19:01 2011
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 31 Dec 2011 13:19:01 +1100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <CA+OGgf6aaqz8e7t0cdznXjFfdNNEYMpOYvJ3_QC6S18DU7TXnQ@mail.gmail.com>
References: <CA+OGgf6aaqz8e7t0cdznXjFfdNNEYMpOYvJ3_QC6S18DU7TXnQ@mail.gmail.com>
Message-ID: <4EFE7115.9070502@pearwood.info>

Jim Jewett wrote:

> My personal opinion is that accepting *and parsing* enough data for
> this to be a problem
> is enough of an edge case that I don't want normal dicts slowed down
> at all for this; I would
> therefore prefer that the change be restricted to such a compile-time
> switch, with current behavior the default.

By compile-time, do you mean when the byte-code is compiled, i.e. just 
before runtime, rather than a switch when compiling the Python executable from 
source? I will assume so.

I'm not a big fan of compile-time (runtime) switches. It makes it too hard to 
compare before-and-after behaviour within a single session, and impossible to 
have fine control over which objects have which behaviour. I don't like 
all-or-nothing settings. (E.g. I'd love to be able to turn -O optimization on 
and off on a per-function basis, but can't.)

How about using a similar strategy to the current dict behaviour with 
__missing__ and defaultdict? Here's my suggestion:


- If a dict subclass defines __salt__, then it is called to salt the hash
   value before lookups. If __salt__ is undefined or None, the current
   behaviour remains unchanged.

- Add a dict subclass (saltdict, for lack of a better name) that defines
   __salt__ appropriately to the collections module. In this case, I don't
   know enough to suggest what is an appropriate salt. I leave that to the
   security experts to argue about.

- Update the relevant standard library modules to use saltdict where needed.


This allows a single application or framework to use saltdict where necessary, 
without slowing down all dict accesses. Dicts which never see user-generated 
input (e.g. globals) can remain full-speed.

If there is no consensus about the best salting strategy, then apps can choose 
their own by subclassing dict.

Responsibility for doing the right thing falls onto the library author, rather 
than Python itself. Some people may consider that a minus.




-- 
Steven


From victor.stinner at haypocalc.com  Sat Dec 31 03:31:03 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sat, 31 Dec 2011 03:31:03 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFC68E0.4000606@cheimes.de>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de>
Message-ID: <4EFE73E7.3070500@haypocalc.com>

On 29/12/2011 14:19, Christian Heimes wrote:
> Perhaps the dict code is a better place for randomization.

The problem is the creation of a dict with keys all having the same hash
value. The current implementation of dict uses open addressing: colliding
keys end up along the same probe sequence in the table. Adding a
new item requires comparing the new key to all existing keys with that
hash (comparing the values, not the hashes, which is much slower).

We would have to completely change how dict is implemented to be able to
fix this issue. I don't think that we can change the dict implementation
without breaking backward compatibility or breaking applications. Changing
the implementation would change dict properties, and applications rely
on the properties of the current implementation.

Tell me if I am wrong.

Victor

From victor.stinner at haypocalc.com  Sat Dec 31 03:39:45 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sat, 31 Dec 2011 03:39:45 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFC4F31.3090703@active-4.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4F31.3090703@active-4.com>
Message-ID: <4EFE75F1.9030305@haypocalc.com>

> In case the watchdog is not a viable solution as I had assumed it was, I
> think it's more reasonable to indeed consider adding a flag to Python
> that allows randomization of hashes optionally before startup.

A flag will only be needed if the overhead of the fix is too high.

> However as it was said earlier, the attack is a lot more complex to
> carry out on a 64bit environment that it's probably (as it stands right
> now!) safe to ignore.

I suppose that there are still servers running 32-bit Python.

> The main problem there however is not that it's a new attack but that
> some dickheads could now make prebaked attacks against websites to
> disrupt them that might cause some negative publicity.  In general
> though there are so many more ways to DDOS a website than this that I
> would rate the whole issue very low.

There are countermeasures for low level DDOS (ICMP ping flood, TCP syn 
flood, etc.). An application (or a firewall) cannot implement a 
countermeasure for this high level issue. It can only be fixed in Python 
directly (by changing the implementation of the dict type or of the hash 
function).

Victor

From lists at cheimes.de  Sat Dec 31 04:24:15 2011
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 31 Dec 2011 04:24:15 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFE73E7.3070500@haypocalc.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de>
	<4EFE73E7.3070500@haypocalc.com>
Message-ID: <4EFE805F.6000302@cheimes.de>

On 31.12.2011 03:31, Victor Stinner wrote:
> On 29/12/2011 14:19, Christian Heimes wrote:
>> Perhaps the dict code is a better place for randomization.
> 
> The problem is the creation of a dict with keys all having the same hash
> value. The current implementation of dict uses open addressing: colliding
> keys end up along the same probe sequence in the table. Adding a
> new item requires comparing the new key to all existing keys with that
> hash (comparing the values, not the hashes, which is much slower).
> 
> We would have to completely change how dict is implemented to be able to
> fix this issue. I don't think that we can change the dict implementation
> without breaking backward compatibility or breaking applications. Changing
> the implementation would change dict properties, and applications rely
> on the properties of the current implementation.
> 
> Tell me if I am wrong.

You are right and I was wrong. We can't do the randomization inside the
dict code. The randomization factor must be used as the initialization
vector (IV) for the hashing algorithm. At first I thought the attack used
the unique property of Python's dict implementation (perturbed hash instead
of buckets for equal hashes) but I was wrong. It took me several hours
of reading and digging into the math until I figured out my mistake.
Sorry! :)

This means we can't implement a salted dict unless the salted dict
re-implements the hash algorithm for unicode, bytes and memoryview. I
doubt that's a wise idea.

I've checked in my first draft of a possible solution:
http://hg.python.org/features/randomhash/ . The pseudo RNG has to be
replaced with something better and it's missing an option to feed a
seed, too.

Christian

From lists at cheimes.de  Sat Dec 31 04:28:18 2011
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 31 Dec 2011 04:28:18 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFE7115.9070502@pearwood.info>
References: <CA+OGgf6aaqz8e7t0cdznXjFfdNNEYMpOYvJ3_QC6S18DU7TXnQ@mail.gmail.com>
	<4EFE7115.9070502@pearwood.info>
Message-ID: <4EFE8152.20109@cheimes.de>

On 31.12.2011 03:19, Steven D'Aprano wrote:
> How about using a similar strategy to the current dict behaviour with 
> __missing__ and defaultdict? Here's my suggestion:
> 
> 
> - If a dict subclass defines __salt__, then it is called to salt the hash
>    value before lookups. If __salt__ is undefined or None, the current
>    behaviour remains unchanged.

This was my initial proposal, too. It took me a while to figure out that
it won't work. Post-salting won't fix the issue. The random seed must be
used as IV inside the hashing algorithm. My brain was still in holiday mode
and it took me a while to figure out the math. Sorry for any confusion!
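
The difference between post-salting and seeding can be seen in a toy
sketch. This is not CPython's real string hash: the function name, the
multiplier 33 and the 8-bit width are illustrative assumptions, chosen
only so that a collision is easy to find by hand.

```python
def toy_hash(data: bytes, seed: int = 0) -> int:
    """Toy 8-bit multiply-and-XOR hash; `seed` is the start value (IV)."""
    h = seed & 0xFF
    for byte in data:
        h = ((h * 33) ^ byte) & 0xFF
    return h


a, b = b"aA", b"bb"

# With the default seed the two keys collide.
collide_unseeded = toy_hash(a) == toy_hash(b)

# Post-salting: XOR-ing the finished hash with a salt cannot separate
# the keys, because equal inputs to the XOR give equal outputs.
salt = 0x5A
collide_post_salted = (toy_hash(a) ^ salt) == (toy_hash(b) ^ salt)

# Seed as IV: the seed takes part in every round, so the collision
# found for seed 0 does not carry over to seed 1.
collide_seeded = toy_hash(a, seed=1) == toy_hash(b, seed=1)

print(collide_unseeded, collide_post_salted, collide_seeded)
```

A pre-computed set of colliding keys therefore survives any salt applied
after hashing, while a seed mixed in from the first round forces the
attacker to compute a new set per seed.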

Christian

From lists at cheimes.de  Sat Dec 31 04:59:41 2011
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 31 Dec 2011 04:59:41 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFE71E0.2000505@haypocalc.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFE71E0.2000505@haypocalc.com>
Message-ID: <4EFE88AD.2060505@cheimes.de>

On 31.12.2011 03:22, Victor Stinner wrote:
> The creation of a Python dictionary has a complexity of O(n) in most 
> cases, but O(n^2) in the *worst* case. The attack tries to go into the 
> worst case. It requires to compute a set of N keys having the same hash 
> value (hash(key1) == hash(key2) == ... hash(keyN)). It only has to 
> compute these keys once. It looks like it is now cheap enough in 
> practice to compute this dataset for Python (and other languages).

Correct. The meet-in-the-middle attack and the unique properties of
algorithms that are similar to DJBX33A and DJBX33X make the attack easy
on platforms with 32-bit hashes.
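
The worst case can be reproduced without real attack strings. The
following is a minimal sketch, assuming a hypothetical `Key` class whose
hash value is chosen by the caller; it counts how often the dict has to
fall back to `__eq__` while the keys are inserted.

```python
class Key:
    """Dictionary key with a caller-chosen hash that counts __eq__ calls."""

    eq_calls = 0

    def __init__(self, value, forced_hash):
        self.value = value
        self.forced_hash = forced_hash

    def __hash__(self):
        return self.forced_hash

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.value == other.value


def count_comparisons(keys):
    """Insert `keys` into a fresh dict and return the number of __eq__ calls."""
    Key.eq_calls = 0
    d = {}
    for key in keys:
        d[key] = None
    return Key.eq_calls


N = 500
distinct = count_comparisons([Key(i, i) for i in range(N)])
colliding = count_comparisons([Key(i, 42) for i in range(N)])
print("distinct hashes:", distinct, "identical hashes:", colliding)
```

With identical hashes every new key has to be compared against all keys
inserted before it, so the number of comparisons grows roughly like
N^2/2 instead of staying near zero.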

> A countermeasure would be to check that we don't have more than X keys 
> with the same hash value. But in practice, we don't know in advance how 
> data are processed, and there are too many input vectors in various formats.
> 
> If we want to fix something, it should be done in the implementation of 
> the dict type or in the hash algorithm. We can implement dict 
> differently to avoid this issue, using a binary tree for example. 
> Because dict is a fundamental type in Python, I don't think that we can 
> change its implementation (without breaking backward compatibility and 
> so applications in production). A possibility would be to add a *new* 
> type, but all libraries and applications would need to be changed to fix 
> the vulnerability.

A BTree is too slow for common operations: it's O(log n) instead of O(1)
on average. We can't replace our dict with a btree type. A new btree
type is a lot of work, too.

The unique structure of CPython's dict implementation makes it harder to
get the number of values with equal hash. The academic hash map (the one
I learnt about at university) uses a bucket to store all elements with
equal hash (more precisely: hash mod mask). Python's dict, however,
perturbs the hash until it finds a free slot in its array. The second,
third ... collision can be caused by a legit and completely different
(!) hash.

> The last choice is to change the hash algorithm. The *idea* is the same 
> as adding salt to a hashed password (in practice it will be a little bit 
> different): if a pseudo-random salt is added, the attacker cannot 
> prepare a single dataset, he/she will have to regenerate a new dataset 
> for each possible salt value. If the salt is big enough (size in bits), 
> the attacker will need too much CPU to generate the dataset (compute N 
> keys with the same hash value). Basically, it slows down the attack by 
> 2^(size of the salt).

That's the idea of randomized hashing functions as implemented by Ruby
1.8, Perl and others. The random seed is used as IV. Multiple rounds of
multiply, XOR and MOD (integer overflows) cause a deviation. In your
other posting you were worried about the performance implication. A
randomized hash function just adds a single ADD operation, that's all.

Downside: With randomization all hashes are unpredictable and change
after every restart of the interpreter. This has some subtle side
effects like a different outcome of {a:1, b:1, c:1}.keys() after a
restart of the interpreter.

> Another possibility would be to replace our fast hash function by a 
> better hash function like MD5 or SHA1 (so the creation of the dataset 
> would be too slow in practice = too expensive), but cryptographic hash 
> functions are much slower (and so would slow down Python too much).

I agree with your analysis. Cryptographic hash functions are far too
slow for our use case. During my research I found another hash function
that claims to be fast and that may not be vulnerable to this kind of
attack: http://isthe.com/chongo/tech/comp/fnv/

Christian

From tjreedy at udel.edu  Sat Dec 31 06:02:43 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 31 Dec 2011 00:02:43 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <CA+OGgf6aaqz8e7t0cdznXjFfdNNEYMpOYvJ3_QC6S18DU7TXnQ@mail.gmail.com>
References: <CA+OGgf6aaqz8e7t0cdznXjFfdNNEYMpOYvJ3_QC6S18DU7TXnQ@mail.gmail.com>
Message-ID: <jdm51r$c4h$1@dough.gmane.org>

On 12/30/2011 8:04 PM, Jim Jewett wrote:

> I'll state it more strongly.  hash probably should not change (at
> least for this),

I agree, especially since the vulnerability can be avoided by using 64-bit 
servers and will generally abate as more sites switch anyway.

> but we may
> want to consider a different conflict resolution strategy when the
> first slot is already filled.
>
> Remember that there was a fair amount of thought and timing effort put
> into selecting the current strategy; it is deliberately sub-optimal for
> random input, in order to do better with typical input.
> <http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictnotes.txt>

It would be good to have a set of attack strings to see how vulnerable 
Py dicts actually are (Python may not have been actually tested with 
data) and the effect of any change. I gave the project email of the 2 
presenters in my first post. They apparently want to work with language 
developers to improve defenses against attack.

-- 
Terry Jan Reedy


From stephen at xemacs.org  Sat Dec 31 13:03:22 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 31 Dec 2011 21:03:22 +0900
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFE71E0.2000505@haypocalc.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFE71E0.2000505@haypocalc.com>
Message-ID: <87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp>

Victor Stinner writes:

 > Let's try to summarize this "vulnerability".
 > 
 > The creation of a Python dictionary has a complexity of O(n) in most 
 > cases, but O(n^2) in the *worst* case. The attack tries to go into the 
 > worst case. It requires to compute a set of N keys having the same hash 
 > value (hash(key1) == hash(key2) == ... hash(keyN)). It only has to 
 > compute these keys once. It looks like it is now cheap enough in 
 > practice to compute this dataset for Python (and other languages).

I don't know the implementation issues well enough to claim it is a
solution, but this hasn't been mentioned before AFAICS:

While the dictionary probe has to start with a hash for backward
compatibility reasons, is there a reason the overflow strategy for
insertion has to be buckets containing lists?  How about
double-hashing, etc?

From lists at cheimes.de  Sat Dec 31 15:16:24 2011
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 31 Dec 2011 15:16:24 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFE71E0.2000505@haypocalc.com>
	<87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4EFF1938.6080809@cheimes.de>

On 31.12.2011 13:03, Stephen J. Turnbull wrote:
> I don't know the implementation issues well enough to claim it is a
> solution, but this hasn't been mentioned before AFAICS:
> 
> While the dictionary probe has to start with a hash for backward
> compatibility reasons, is there a reason the overflow strategy for
> insertion has to be buckets containing lists?  How about
> double-hashing, etc?

Python's dict implementation doesn't use buckets but open addressing (aka
closed hashing). The algorithm for conflict resolution doesn't use
double hashing. Instead it takes the original and (in most cases) cached
hash and perturbs the hash with a series of add, multiply and bit shift ops.
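
A rough sketch of that probing scheme, modelled on the loop in
Objects/dictobject.c. The function name is made up for illustration, and
details such as PERTURB_SHIFT = 5 and the exact order of operations are
assumptions that should be checked against the C source.

```python
PERTURB_SHIFT = 5  # assumed value, see lead-in


def probe_sequence(hash_value, table_size, count):
    """Return the first `count` slot indices probed for `hash_value`.

    `table_size` is assumed to be a power of two.
    """
    mask = table_size - 1
    perturb = hash_value & 0xFFFFFFFFFFFFFFFF  # treat the hash as unsigned
    i = perturb & mask
    slots = []
    for _ in range(count):
        slots.append(i)
        # Mix in the remaining high bits of the hash, then shift them away.
        i = (5 * i + perturb + 1) & mask
        perturb >>= PERTURB_SHIFT
    return slots


print(probe_sequence(0, 8, 8))
print(probe_sequence(12345, 8, 8) == probe_sequence(12345, 8, 8))
```

The sequence depends only on the hash value and the table size, so keys
with equal hashes walk exactly the same slots in the same order.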

From martin at v.loewis.de  Sat Dec 31 15:40:34 2011
From: martin at v.loewis.de (martin at v.loewis.de)
Date: Sat, 31 Dec 2011 15:40:34 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <4EFE73E7.3070500@haypocalc.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de>
	<4EFE73E7.3070500@haypocalc.com>
Message-ID: <20111231154034.Horde.Cc8gWML8999O-x7iTQRBreA@webmail.df.eu>


Quoting Victor Stinner <victor.stinner at haypocalc.com>:

> The current implementation of dict uses a linked-list.
[...]
> Tell me if I am wrong.

At least with this statement, you are wrong: the current
implementation does *not* use a linked-list. Instead, it
uses open addressing.

Regards,
Martin




From pje at telecommunity.com  Sat Dec 31 19:04:28 2011
From: pje at telecommunity.com (PJ Eby)
Date: Sat, 31 Dec 2011 13:04:28 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<4EFE71E0.2000505@haypocalc.com>
	<87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CALeMXf5uGzzOCskFy11xRMzLBOsQ-ykfg7xwj1hV2P83szc3_A@mail.gmail.com>

On Sat, Dec 31, 2011 at 7:03 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> While the dictionary probe has to start with a hash for backward
> compatibility reasons, is there a reason the overflow strategy for
> insertion has to be buckets containing lists?  How about
> double-hashing, etc?
>

This won't help, because the keys still have the same hash value. ANYTHING
you do to them after they're generated will result in them still colliding.

The *only* thing that works is to change the hash function in such a way
that the strings end up with different hashes in the first place.
Otherwise, you'll still end up with (deliberate) collisions.

(Well, technically, you could use trees or some other O(log n) data
structure as a fallback once you have too many collisions, for some value
of "too many".  Seems a bit wasteful for the purpose, though.)

From jyasskin at gmail.com  Sat Dec 31 22:04:02 2011
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Sat, 31 Dec 2011 13:04:02 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
Message-ID: <CA+6j2ggPu0nEk-d2DyZAnCFxXjvbU9sVEtDj1MV=CAv8XKHqLA@mail.gmail.com>

On Wed, Dec 28, 2011 at 5:37 PM, Jesse Noller <jnoller at gmail.com> wrote:
>
>
> On Wednesday, December 28, 2011 at 8:28 PM, Michael Foord wrote:
>
>> Hello all,
>>
>> A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included:
>>
>> http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>>
>> Although it's a security issue I'm posting it here because it is now public and seems important.
>>
>> The issue they report can cause (for example) handling an http post to consume horrible amounts of cpu. For Python the figures they quoted:
>>
>> - reasonable-sized attack strings only for 32 bits
>> - Plone has max. POST size of 1 MB
>> - 7 minutes of CPU usage for a 1 MB request
>> - ~20 kbits/s → keep one Core Duo core busy
>>
>> This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
>>
>> The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
>>
>> Their recommended fix is to randomize the hash function.
>>
>> All the best,
>>
>> Michael
>>
> Back up link for the PDF:
> http://dl.dropbox.com/u/1374/2007_28C3_Effective_DoS_on_web_application_platforms.pdf
>
> Ocert disclosure:
> http://www.ocert.org/advisories/ocert-2011-003.html

Discussion of hash functions in general:
http://burtleburtle.net/bob/hash/doobs.html
Two of the best hash functions that currently exist:
http://code.google.com/p/cityhash/ and
http://code.google.com/p/smhasher/wiki/MurmurHash.

I'm not sure exactly what problem the paper is primarily complaining about:
1. Multiply+add and multiply+xor hashes are weak: this would be solved
by changing to either of the better-and-faster hashes I linked to
above. On the other hand:
http://mail.python.org/pipermail/python-3000/2007-September/010327.html
2. It's possible to find collisions in any hash function in a 32-bit
space: only solved by picking a varying seed at startup or compile
time.

If you decide to change to a stronger hash function overall, it might
also be useful to change the advice "to somehow mix together (e.g.
using exclusive or) the hash values for the components" in
http://docs.python.org/py3k/reference/datamodel.html#object.__hash__.
hash(tuple(components)) will likely be better if tuple's hash is
improved.
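
A small sketch of the two styles of `__hash__`, assuming two hypothetical
point classes. XOR is symmetric, so swapping the components always gives
the same hash; hashing a tuple of the components takes their order into
account.

```python
class PointXor:
    """Mixes the component hashes with XOR, as the docs currently suggest."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash(self.x) ^ hash(self.y)


class PointTuple(PointXor):
    """Delegates to the tuple hash instead."""

    def __hash__(self):
        return hash((self.x, self.y))


xor_collides = hash(PointXor(1, 2)) == hash(PointXor(2, 1))
tuple_collides = hash(PointTuple(1, 2)) == hash(PointTuple(2, 1))
print("xor:", xor_collides, "tuple:", tuple_collides)
```

With the XOR variant every point with x == y also hashes to 0, which is
another collision class that the tuple-based variant avoids.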

Hash functions are already unstable across Python versions. Making
them unstable across interpreter processes (multiprocessing doesn't
share dicts, right?) doesn't sound like a big additional problem.
Users who want a distributed hash table will need to pull their own
hash function out of hashlib or re-implement a non-cryptographic hash
instead of using the built-in one, but they probably need to do that
already to allow themselves to upgrade Python.

Jeffrey

From pje at telecommunity.com  Sat Dec 31 22:43:00 2011
From: pje at telecommunity.com (PJ Eby)
Date: Sat, 31 Dec 2011 16:43:00 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <CA+6j2ggPu0nEk-d2DyZAnCFxXjvbU9sVEtDj1MV=CAv8XKHqLA@mail.gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
	<CA+6j2ggPu0nEk-d2DyZAnCFxXjvbU9sVEtDj1MV=CAv8XKHqLA@mail.gmail.com>
Message-ID: <CALeMXf4uC3RZx44iQHK+NXLACYtXOhyiWCEbMtP-j_y1gpZ4RA@mail.gmail.com>

On Sat, Dec 31, 2011 at 4:04 PM, Jeffrey Yasskin <jyasskin at gmail.com> wrote:

> Hash functions are already unstable across Python versions. Making
> them unstable across interpreter processes (multiprocessing doesn't
> share dicts, right?) doesn't sound like a big additional problem.
> Users who want a distributed hash table will need to pull their own
> hash function out of hashlib or re-implement a non-cryptographic hash
> instead of using the built-in one, but they probably need to do that
> already to allow themselves to upgrade Python.
>

Here's an idea.  Suppose we add a sys.hash_seed or some such, that's
settable to an int, and defaults to whatever we're using now.  Then
programs that want a fix can just set it to a random number, and on Python
versions that support it, it takes effect.  Everywhere else it's a silent
no-op.

Downside: sys has to have slots for this to work; does sys actually have
slots?  My memory's hazy on that.  I guess actually it'd have to be
sys.set_hash_seed().  But same basic idea.

Anyway, this would make fixing the problem *possible*, while still pushing
off the hard decisions to the app/framework developers.  ;-)

(Downside: every hash operation includes one extra memory access, but
strings only compute their hash once anyway.)

Given that changing dict won't help, and changing the default hash is a
non-starter, an option to set the seed is probably the way to go.  (Maybe
with an environment variable and/or command line option so users can work
around old code.)

From tjreedy at udel.edu  Sat Dec 31 23:38:48 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 31 Dec 2011 17:38:48 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <CALeMXf4uC3RZx44iQHK+NXLACYtXOhyiWCEbMtP-j_y1gpZ4RA@mail.gmail.com>
References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk>
	<0F70678AC2164512A7E6FCADB2F37EA8@gmail.com>
	<CA+6j2ggPu0nEk-d2DyZAnCFxXjvbU9sVEtDj1MV=CAv8XKHqLA@mail.gmail.com>
	<CALeMXf4uC3RZx44iQHK+NXLACYtXOhyiWCEbMtP-j_y1gpZ4RA@mail.gmail.com>
Message-ID: <jdo2u1$k0u$1@dough.gmane.org>

On 12/31/2011 4:43 PM, PJ Eby wrote:

> Here's an idea.  Suppose we add a sys.hash_seed or some such, that's
> settable to an int, and defaults to whatever we're using now.  Then
> programs that want a fix can just set it to a random number,

I do not think we can allow that to change once there are hashed 
dictionaries existing.
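
A minimal sketch of the problem, assuming a hypothetical module-level
`SEED` and a key class whose hash depends on it. It only imitates a
process-wide seed; it is not the proposed sys API.

```python
SEED = 0  # hypothetical process-wide hash seed


class SeededKey:
    """Key whose hash depends on the current value of SEED."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __hash__(self):
        return self.value + SEED


table = {SeededKey(10): "stored"}
found_before = SeededKey(10) in table

SEED = 1  # change the seed while the dict is alive
found_after = SeededKey(10) in table

print(found_before, found_after)
```

The entry is still in the dict, but it was filed under the old hash, so
an equal key hashed with the new seed no longer finds it.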

-- 
Terry Jan Reedy