Python's 8-bit cleanness deprecated?

Jeff Epler jepler at unpythonic.net
Sat Feb 8 21:56:29 EST 2003


On Sat, Feb 08, 2003 at 10:00:47PM +0100, Chris Liechti wrote:
> (strip comments before feeding it to the codec. dropping the rest of a line 
> with '#' shouldn't be that hard to do, is it?)

But where does the line end?  For instance, what if my encoding uses '\x81'
to be a synonym for '\n'?  Or what if my encoding has a two-byte mode where
'\x82\n' decodes to whitespace?

In the former case, the fragment '#\x81y' is not all comment, but in the
latter case the fragment '#\x82\nz' is.
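
A minimal sketch of the two cases above, assuming a made-up codec (the hypothetical `toy_decode` below is invented for illustration, not a real Python codec): it applies the same naive "drop from # to end of line" rule once on raw bytes before decoding and once on text after decoding, so the two results can be compared.

```python
# HYPOTHETICAL 8-bit encoding, invented only to illustrate the point above:
#   byte 0x81        -> '\n'  (a synonym for newline)
#   bytes 0x82 0x0A  -> ' '   (a two-byte whitespace character)
# Every other byte is decoded as Latin-1 (byte value == code point).
def toy_decode(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x81:
            out.append("\n")
            i += 1
        elif b == 0x82 and i + 1 < len(data) and data[i + 1] == 0x0A:
            out.append(" ")
            i += 2
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)


def strip_comments_bytes(data: bytes) -> bytes:
    # Naive rule applied BEFORE decoding: drop from b"#" to the next b"\n".
    return b"\n".join(line.split(b"#", 1)[0] for line in data.split(b"\n"))


def strip_comments_text(text: str) -> str:
    # The same naive rule applied AFTER decoding (ignores '#' inside strings).
    return "\n".join(line.split("#", 1)[0] for line in text.split("\n"))


for src in (b"x=1#\x81y", b"x=1#\x82\nz"):
    before = toy_decode(strip_comments_bytes(src))
    after = strip_comments_text(toy_decode(src))
    print(repr(before), repr(after))
```

With this toy codec, stripping before decoding throws away the code `y` in the first fragment and keeps the comment text `z` as code in the second; stripping after decoding gets both right.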

It seems like there might be a third kind of problematic encoding, where a
sequence like '\x83#' is a single two-byte character and so wouldn't actually
start a comment.  However, the '\x83' would not be legal there, since it's not
part of any valid token unless it is inside a string (in which case the '#'
would be inside the string too).

No, I don't know of any encoding that has these characteristics, with the
possible exception of a Japanese encoding which, it turns out, can use '%' and
'\' as the second byte of double-byte characters.
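
For the backslash half of that remark, a minimal sketch, assuming Shift_JIS is the kind of encoding meant (the post does not name it): in Shift_JIS the character 表 encodes as 0x95 0x5C, and 0x5C is the ASCII backslash. The helper `find_string_end` is hypothetical, written only for this illustration; it scans a string literal the way a simple tokenizer might, treating backslash as an escape.

```python
# Assumption: Shift_JIS (Python codec name "shift_jis") stands in for the
# Japanese encoding mentioned above. 表 encodes as 0x95 0x5C in Shift_JIS.
def find_string_end(src: str, start: int) -> int:
    """Return the index of the '"' closing the literal opened at src[start],
    treating backslash as an escape, or -1 if the literal never closes.
    Hypothetical helper for illustration only."""
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif ch == '"':
            return i
        else:
            i += 1
    return -1


raw = 's = "表"\n'.encode("shift_jis")
print(raw)

# An encoding-unaware tool sees one character per byte (modeled as Latin-1).
byte_view = raw.decode("latin-1")
# A tool that decodes first sees the real characters.
text_view = raw.decode("shift_jis")

print(find_string_end(byte_view, byte_view.index('"')))
print(find_string_end(text_view, text_view.index('"')))
```

Scanning the raw bytes, the trailing 0x5C of 表 looks like an escape for the closing quote, so the literal appears unterminated (-1); scanning the decoded text finds the closing quote normally.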

Jeff

More information about the Python-list mailing list