[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Bengt Richter bokr at oz.net
Sat Feb 11 09:20:27 CET 2006


On Fri, 10 Feb 2006 21:35:26 -0800, Guido van Rossum <guido at python.org> wrote:

>> On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer <nas at arctrix.com>
>> >The backwards compatibility problems *seem* to be relatively minor.
>> >I only found one instance of breakage in the standard library.  Note
>> >that my patch does not change PyObject_Str(); that would break
>> >massive amounts of code.  Instead, I introduce a new function:
>> >PyString_New().  I'm not crazy about the name but I couldn't think
>> >of anything better.
>
>On 2/10/06, Bengt Richter <bokr at oz.net> wrote:
>> Should this not be coordinated with PEP 332?
>
>Probably. But that PEP is rather incomplete. Wanna work on fixing that?
>
I'd be glad to add my thoughts, but first of course it's Skip's PEP,
and Martin casts a long shadow when it comes to character coding issues
that I suspect will have to be considered.

(E.g., if there is a b'...' literal for bytes, the actual characters of
the source code itself that the literal is being expressed in could be ascii
or latin-1 or utf-8 or utf-16le a la Microsoft, etc. UIAM, I read that the source
is at least temporarily normalized to Unicode, and then re-encoded (except now
for string literals?) per coding cookie or other encoding inference. I may be
out of date, gotta catch up.)
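
To make the concern concrete, here is a rough sketch. It assumes a Python in
which str is Unicode and .encode() returns a bytes object, which is not what
2.x does, so treat it as an illustration only. The same text comes out as
different byte sequences depending on which source encoding is assumed:

```python
# Illustration only: assumes str is Unicode and .encode() returns bytes.
# The non-ASCII character is written as an escape so that this file's own
# encoding does not affect the result.
text = "caf\u00e9"  # the four characters c, a, f, e-acute

encodings = ["latin-1", "utf-8", "utf-16-le"]
encoded = {name: text.encode(name) for name in encodings}

for name, raw in encoded.items():
    # One logical string, three different byte sequences.
    print(name, len(raw), raw)

# Plain ascii cannot represent the last character at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ascii ok:", ascii_ok)
```

So a rule for b'...' would have to say which of these byte sequences, if any,
a non-ascii character inside the literal stands for.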

If one way or the other a string literal is in Unicode, then presumably so is
a byte string b'...' literal -- i.e. internally u"b'...'" just before
being turned into bytes.

Should that then be an internal straight u"b'...'".encode('byte') with default ascii + escapes
for non-ascii and non-printables, to define the full 8 bits without encoding error?
Should unicode be encodable into byte via a specific encoding? E.g., u'abc'.encode('byte','latin1'),
to distinguish producing a mutable byte string vs an immutable str type as with u'abc'.encode('latin1').
(but how does this play with str being able to produce unicode? And when do these changes happen?)
I guess I'm getting ahead of myself ;-)
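
A minimal sketch of the "ascii + escapes" idea, again assuming a Python where
str is Unicode and bytes/bytearray exist. The helper name literal_body_to_bytes
is hypothetical, made up for this illustration, and is not part of any PEP or
patch. It takes the text between the quotes of a b'...' literal, refuses raw
non-ascii characters, and lets backslash escapes supply the remaining byte
values:

```python
def literal_body_to_bytes(body):
    """Hypothetical helper: turn the text inside b'...' into bytes.

    Raw characters must be ascii; other byte values are written as
    backslash escapes such as \\xff.
    """
    # Raises UnicodeEncodeError if the body holds a raw non-ascii character.
    ascii_body = body.encode("ascii")
    # Interpret the backslash escapes, giving code points 0..255 for \xNN.
    unescaped = ascii_body.decode("unicode_escape")
    # latin-1 maps code points 0..255 one-to-one onto byte values, and
    # raises UnicodeEncodeError for anything larger (e.g. from \uNNNN).
    return unescaped.encode("latin-1")


# Every one of the 256 byte values is reachable through escapes alone.
all_escapes = "".join("\\x%02x" % i for i in range(256))
all_bytes = literal_body_to_bytes(all_escapes)

# Immutable vs mutable results of encoding the same text.
frozen = "abc".encode("latin-1")   # immutable byte string
mutable = bytearray(frozen)        # mutable copy of the same bytes
mutable[0] = 0x41                  # allowed; frozen is unchanged
```

Whether the mutable form should come from something like an 'byte' codec
argument or from a separate constructor is exactly the open question above.
The sketch uses a separate constructor only because that is what runs.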

So I would first ask Skip what he'd like to do, and Martin for some hints on reading, to avoid
going down paths he already knows lead to brick walls ;-) And I need to think more about PEP 349.

I would propose to do the reading they suggest, and edit up a new version of pep-0332.txt
that anyone could then improve further. I don't know about an early deadline. I don't want
to over-commit, as time and energies vary. OTOH, as you've noticed, I could be spending my
time more effectively ;-)

I changed the thread title, and will wait for some signs from you, Skip, Martin, Neil, and I don't
know who else might be interested...

Regards,
Bengt Richter


