Python's 8-bit cleanness deprecated?
Jeff Epler
jepler at unpythonic.net
Tue Feb 4 12:46:40 EST 2003
On Tue, Feb 04, 2003 at 01:36:04PM +0100, Just wrote:
> In article <Z9M%9.174650$AA2.6989950 at news2.tin.it>,
> Alex Martelli <aleax at aleax.it> wrote:
>
> > > I think raw 8bit must be set by default without any warnings.
> >
> > I disagree, but not hotly -- I'll be quite content with
> > whatever warning strategy ends up being adopted; say
> > I'm a +0 on the choice made for 2.3alpha. But be warned
> > that you'll have to argue against hotly +1 people --
> > check the python-dev archives to hone your arguments.
> > (Arguing here is not much use of course, since Guido
> > doesn't read c.l.py currently).
>
> Here's a possible compromise (which I'm not sure is implementable at
> all): Python could only issue warnings if 8-bit chars are used in string
> literals, and not if they only occur in comments.
What makes you believe that Python can tell what is a comment and what
is a string without knowing the encoding?
I think the only requirement on the source file encoding is that it must
be an ASCII superset. So, for instance, I could have a perverse encoding
where 0x81 decodes to u'\n' and 0x83 is another valid character in the
encoding. Then this byte string
'#\x81"\x83"\x81'
actually decodes to
u'#\n"\uXXXX"\n'
which means the file contains a string literal with a high-bit-set char
in it, even though the raw bytes look like a single comment line.
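A minimal sketch of that example in present-day Python, assuming a made-up decoding table: the names TABLE and decode_perverse are hypothetical, the table starts from a Latin-1 identity mapping only for convenience, and U+00E9 is an arbitrary stand-in for the unspecified \uXXXX above.

```python
import io
import tokenize

# Hypothetical "perverse" encoding: bytes 0x00-0x7F stay ASCII,
# but 0x81 also decodes to a newline. All of this is an assumption
# made for illustration, not a real codec.
TABLE = [chr(i) for i in range(256)]
TABLE[0x81] = "\n"
TABLE[0x83] = "\u00e9"  # arbitrary stand-in for \uXXXX


def decode_perverse(data: bytes) -> str:
    """Decode bytes one at a time through the table."""
    return "".join(TABLE[b] for b in data)


raw = b'#\x81"\x83"\x81'

# Byte-level view: starts with '#', contains no 0x0A, so it looks
# like one comment line.
looks_like_comment = raw.startswith(b"#") and b"\n" not in raw

# Decoded view: the comment ends at the first 0x81 and a string
# literal follows on the next line.
text = decode_perverse(raw)
tokens = list(tokenize.generate_tokens(io.StringIO(text).readline))
has_string = any(tok.type == tokenize.STRING for tok in tokens)

print(looks_like_comment, has_string)
```

Both flags come out true here: the bytes look like a comment, yet the decoded source contains a string literal.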
If there is also a requirement that the encoding be capable of doing a
round-trip unchanged (e.g. s.decode("perverse").encode("perverse") == s
with s = "".join([chr(x) for x in range(256)])) then perhaps your idea
is a "safe" one. In that case the encoding can't map two byte values
onto \n, which is the key to my example.
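The round-trip condition can be checked with a sketch like the following, again in present-day Python and under my own assumptions: make_table and round_trips are hypothetical helpers, and the encoder simply picks the first byte that decodes to each character.

```python
def make_table(overrides):
    """Build a 256-entry decoding table, Latin-1 identity plus overrides."""
    table = [chr(i) for i in range(256)]
    for byte, char in overrides.items():
        table[byte] = char
    return table


def round_trips(table):
    """True if decoding all 256 byte values and re-encoding gives them back."""
    encoder = {}
    for byte, char in enumerate(table):
        # If two bytes decode to the same character, only the first
        # one can be recovered when encoding.
        encoder.setdefault(char, byte)
    s = bytes(range(256))
    decoded = [table[b] for b in s]
    reencoded = bytes(encoder[c] for c in decoded)
    return reencoded == s


perverse = make_table({0x81: "\n"})  # 0x0A and 0x81 both decode to newline
sane = make_table({})                # every byte decodes to a distinct char

print(round_trips(perverse), round_trips(sane))
```

The perverse table fails the check because 0x81 comes back as 0x0A, while the one-to-one table passes.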
Jeff