PEP 263 comments
Jason Orendorff
jason at jorendorff.com
Thu Feb 28 22:35:56 EST 2002
Piet van Oostrum wrote:
> JO> Advantage: simple, universal, easy, similar to what Java does.
>
> Java does accept iso-latin-1 files as input. In fact on my machine (Mac
> OSX) it doesn't even accept utf-8 files with the utf-8 signature. And
> strings containing utf-8 are interpreted as just 8-bit characters, meaning
> every byte is a character.
Oh! Yes, it works this way on Windows, too. javac assumes source
files are latin-1, and System.out.println() encodes output in latin-1.
Odd, because Java assumes UTF-8 everywhere else. Somehow I missed
this before (I guess because UTF-8 data slips through intact).
Scratch that, then.
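
A minimal sketch of why UTF-8 data "slips through intact" when a tool assumes latin-1 on both input and output. It uses Python 3 rather than Java, and the sample string is just an illustration, not anything from javac itself:

```python
# Mimic a tool that assumes Latin-1 on input and on output.
text = "na\u00efve \u2200"            # sample string with non-ASCII characters
utf8_bytes = text.encode("utf-8")     # what a UTF-8 source file would contain

# Latin-1 maps every byte 0x00-0xFF to exactly one code point, so decoding
# never fails, but each UTF-8 byte becomes its own character.
as_latin1 = utf8_bytes.decode("latin-1")

# Encoding back to Latin-1 reproduces the original bytes exactly.
round_tripped = as_latin1.encode("latin-1")

print(len(text), len(as_latin1))      # the misdecoded string is longer
print(round_tripped == utf8_bytes)
```

The string is wrong while it is inside the program (one character per byte), but the bytes written out match the bytes read in, which is why the problem is easy to miss.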
> JO> No confusion about embedded 0x22 bytes in strings. Also,
> JO> stylistically I prefer not to have a document specify its own
> JO> encoding, or for comments to affect the meaning of a source
> JO> file.
>
> Which 0x22 bytes?
I'm referring to this paragraph in Martin's original post:
    The only problem with this approach is that encodings where " or '
    could be the second byte of a multi-byte character cannot be
    supported as a source encoding. Python supports no such encoding
    in the standard library at the moment, anyway, so this should not
    be a problem.
\x22 is a double-quote mark. Martin is a little off on the last
bit, though; the standard library already includes UTF-16, which
can produce \x22 bytes.
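
A quick way to check this, sketched in Python 3. The helper name and the two sample characters are my own choices for illustration:

```python
QUOTE = 0x22  # byte value of the ASCII double-quote mark

def encodings_with_quote_byte(ch, encodings=("utf-8", "utf-16-be", "utf-16-le")):
    """Return the encodings in which `ch` encodes to bytes containing 0x22."""
    return [enc for enc in encodings if QUOTE in ch.encode(enc)]

# U+2200 (FOR ALL) and U+0122 (LATIN CAPITAL LETTER G WITH CEDILLA)
# both have 0x22 as one of their two UTF-16 bytes.
for ch in ("\u2200", "\u0122"):
    print(f"U+{ord(ch):04X}", encodings_with_quote_byte(ch))
```

Neither character contains a 0x22 byte in UTF-8, since every byte of a multi-byte UTF-8 sequence has its high bit set; only the UTF-16 forms do.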
## Jason Orendorff http://www.jorendorff.com/