Python's 8-bit cleanness deprecated?
Jeff Epler
jepler at unpythonic.net
Sat Feb 8 21:56:29 EST 2003
On Sat, Feb 08, 2003 at 10:00:47PM +0100, Chris Liechti wrote:
> (strip comments before feeding it to the codec. dropping the rest of a line
> with '#' shouldn't be that hard to do, is it?)
But where does the line end? For instance, what if my encoding uses '\x81'
to be a synonym for '\n'? Or what if my encoding has a two-byte mode where
'\x82\n' decodes to whitespace?
In the former case, the fragment '#\x81y' is not all comment (the 'y' is
on a new line), but the fragment '#\x82\nz' is (the 'z' is still on the
comment line).
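To make the two cases concrete, here is a rough sketch. The encoding is made up purely for illustration (toy_decode and both strip helpers are hypothetical names, not anything in Python), and the comment stripping is deliberately naive: it ignores '#' inside string literals.

```python
def toy_decode(data: bytes) -> str:
    # Hypothetical encoding: 0x81 means newline, 0x82 followed by
    # a raw newline byte means a single space; all else is Latin-1.
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x81:
            out.append("\n")
            i += 1
        elif b == 0x82 and i + 1 < len(data) and data[i + 1] == 0x0A:
            out.append(" ")
            i += 2
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)


def strip_comments_bytes(data: bytes) -> bytes:
    # Naive approach: treat the raw byte 0x0A as the only line ending.
    return b"\n".join(line.split(b"#", 1)[0] for line in data.split(b"\n"))


def strip_comments_text(text: str) -> str:
    # Same stripping, but applied after decoding.
    return "\n".join(line.split("#", 1)[0] for line in text.split("\n"))


case1 = b"x = 1 #\x81y = 2"    # 0x81 ends the comment line
case2 = b"x = 1 #\x82\nz = 3"  # 0x82 + newline is only whitespace

for raw in (case1, case2):
    strip_first = toy_decode(strip_comments_bytes(raw))
    decode_first = strip_comments_text(toy_decode(raw))
    print(repr(strip_first), repr(decode_first))
```

In the first case stripping before decoding throws away 'y = 2', which is real code; in the second it keeps 'z = 3', which is part of the comment. Decoding first gets both right, but only because the decoder knows where lines end.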
It seems like there might be a third kind of problematic encoding, where a
sequence like '\x83#' is a single two-byte character and so wouldn't
actually start a comment. However, the character would not be legal there,
since it's not part of a valid token unless it is inside a string (in which
case the '#' would be inside the string too).
No, I don't know of any encoding that has these characteristics, with the
possible exception of a Japanese encoding which, it turns out, can use % and
\ as the second byte of a double-byte character.
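A quick check of the backslash case, as a sketch only: the post does not name the encoding, so using Shift_JIS here is my assumption, and the sample character is just one that happens to show the effect. This does not test the '%' case.

```python
# Assumption: Shift_JIS stands in for the unnamed Japanese encoding.
raw = "\u8868".encode("shift_jis")  # one character, encoded as two bytes

print(raw)                               # the second byte is 0x5C
print(b"\\" in raw)                      # a byte-level scan sees a backslash
print("\\" in raw.decode("shift_jis"))   # the decoded text contains none
```

So anything that scans the undecoded bytes for '\' would find one in the middle of this character.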
Jeff