PEP 263 comments

Thu Feb 28 22:35:56 EST 2002

Piet van Oostrum wrote:
> JO> Advantage: simple, universal, easy, similar to what Java does.
>
> Java does accept iso-latin-1 files as input. In fact on my machine (Mac
> OSX) it doesn't even accept utf-8 files with the utf-8 signature. And
> strings containing utf-8 are interpreted as just 8-bit characters, meaning
> every byte is a character.

Oh!  Yes, it works this way on Windows, too.  javac assumes source
files are latin-1, and System.out.println() encodes output in latin-1.
Odd, because Java assumes UTF-8 everywhere else.  Somehow I missed
this before (I guess because UTF-8 data slips through intact).
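
For what it's worth, the "slips through intact" effect is easy to reproduce. A rough sketch in present-day Python 3 (not javac itself; the sample string is just something I made up): read UTF-8 bytes as latin-1, write them back as latin-1, and you get the original bytes again, even though the string in between has one character per byte.

```python
# Hypothetical sample text; any string with non-ASCII characters works.
text = "na\u00efve caf\u00e9"

utf8_bytes = text.encode("utf-8")

# Misread the UTF-8 bytes as latin-1: every byte becomes one character.
misread = utf8_bytes.decode("latin-1")

# Write the misread string back out as latin-1.
written = misread.encode("latin-1")

# The bytes survive the round trip unchanged...
print(written == utf8_bytes)

# ...but in between, the string is longer than the real text,
# because each multi-byte sequence was split into separate characters.
print(len(text), len(misread))
```

This only works because latin-1 maps each of the 256 byte values to exactly one character, so nothing is lost in either direction.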

Scratch that, then.

> JO> No confusion about embedded 0x22 bytes in strings.  Also,
> JO> stylistically I prefer not to have a document specify its own
> JO> encoding, or for comments to affect the meaning of a source
> JO> file.
>
> Which 0x22 bytes?

I'm referring to this paragraph in Martin's original post:

  The only problem with this approach is that encodings where " or '
  could be the second byte of a multi-byte character cannot be
  supported as a source encoding. Python supports no such encoding
  in the standard library at the moment, anyway, so this should not
  be a problem.

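\x22 is a double-quote mark.  Martin is a little off on the last
bit, though: the standard library does include a UTF-16 codec, and
UTF-16 can produce \x22 bytes (U+0122, for example, is the byte
pair 01 22 in big-endian order).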
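
To make the \x22 point concrete, here is a small sketch in present-day Python 3. The sample characters are my own picks, not anything from Martin's post; they were chosen only because their code points contain 0x22.

```python
QUOTE = 0x22  # the byte value of the ASCII double-quote mark

# U+0122 in big-endian UTF-16 is the bytes 01 22,
# so the second byte of the pair is a double-quote.
big_endian = "\u0122".encode("utf-16-be")
print(big_endian, big_endian[1] == QUOTE)

# U+2200 in little-endian UTF-16 is the bytes 00 22,
# again with a double-quote as the second byte.
little_endian = "\u2200".encode("utf-16-le")
print(little_endian, little_endian[1] == QUOTE)

# For contrast, UTF-8 never does this: every byte of a
# multi-byte sequence has the high bit set.
utf8 = "\u0122".encode("utf-8")
print(all(b >= 0x80 for b in utf8))
```

A tokenizer that scans raw bytes for the closing quote would stop in the middle of either UTF-16 character above, which is the problem Martin's paragraph describes.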

## Jason Orendorff    http://www.jorendorff.com/