[Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py ))
Guido van Rossum
guido at python.org
Tue May 8 16:10:51 CEST 2007
On 5/8/07, Jason Orendorff <jason.orendorff at gmail.com> wrote:
> On 5/7/07, Guido van Rossum <guido at python.org> wrote:
> > I don't know how this will work out yet. I'm not convinced that having
> > both mutable and immutable bytes is the right thing to do; but I'm
> > also not convinced of the opposite. I am slowly working on the
> > string/unicode unification, and so far, unfortunately, it is quite
> > daunting to get rid of 8-bit strings even at the Python level let
> > alone at the C level.
>
> Guido, if 3.x had an immutable bytes type, could 2to3 provide a
> better guarantee? Namely, "Set your default encoding to None
> in your 2.x code today, and 2to3 will not introduce bugs around
> str/unicode."
I don't know. I may be able to tell you when I'm further into the
process of unifying str and unicode.
> 2to3 could produce 3.x code that preserves the 2.x meaning by
> using 2.x-ish types, including immutable byte strings.
This sounds dangerously close to crippling 3.0 with backwards
compatibility. I want to reserve this option as a last resort.
> Without this, my understanding is that 2to3 will introduce bugs.
> Am I wrong?
No -- 2to3 cannot guarantee that your code will work correctly,
because it doesn't do any data flow analysis or type inferencing. This
is not limited to strings.
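A minimal sketch of the kind of bug meant here, assuming the bytes semantics of Python 3 as it was eventually released (iterating over bytes yields integers). The checksum function and its inputs are hypothetical, not taken from any real code base. A purely syntactic translator sees nothing to rewrite in it, because it cannot tell whether data holds text or raw bytes:

```python
def checksum(data):
    # Written with 2.x 8-bit strings in mind: each item is a 1-character string.
    return sum(ord(ch) for ch in data) % 256


def fails_on_bytes():
    # Hypothetical check: the same source applied to a bytes object.
    try:
        checksum(b"abc")
    except TypeError:
        # Iterating over bytes yields ints, and ord() rejects an int.
        return True
    return False


text_result = checksum("abc")    # text input still works
bytes_broken = fails_on_bytes()  # bytes input raises TypeError
```

Only knowledge of the types flowing into checksum would tell a tool which call sites need attention.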
> This might be worth doing even if you decide an immutable 8-bit
> type is wrong for the core language. The type could be hidden
> away in an "upgradelib" module somewhere. Surely people will
> prefer correctness over "producing nice, idiomatic 3.x code"
> in the 2to3 tool.
With that I agree, at least in general (e.g. d.keys() gets translated
to list(d.keys()) and d.iterkeys() to iter(d.keys())). In the current
py3k-struni branch I have temporarily kept the 8-bit string type
around, renamed to str8. I am hoping I will be able to get rid of it
eventually but I may not succeed and then we'll have it available as a
backup.
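To illustrate why the list(d.keys()) translation favors correctness over idiom, here is a small sketch assuming Python 3 semantics in which d.keys() returns a live view rather than a list. The helper name mutate_while_iterating_view is made up for this example:

```python
d = {"a": 1, "b": 2}

# Conservative translation of 2.x `d.keys()`: take a snapshot list first.
snapshot = list(d.keys())
for key in snapshot:
    # Safe: the loop runs over the snapshot, so the dict may change freely.
    d[key.upper()] = d.pop(key)


def mutate_while_iterating_view(mapping):
    # Hypothetical helper: what the unwrapped form does in the same situation.
    try:
        for key in mapping.keys():
            mapping[key + "_copy"] = mapping[key]
    except RuntimeError as exc:
        # Adding a key while iterating over the live view is rejected.
        return str(exc)
    return None


error = mutate_while_iterating_view({"a": 1})
```

The wrapped form is less idiomatic but keeps the 2.x meaning, where keys() handed back an independent list.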
For anyone who wants to discuss this more -- please come and help out
in the py3k-struni branch first. It is simply too soon to be able to
make decisions based on the evidence available so far, and I won't be
forced.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)