Proposal: require 7-bit source str's

Neil Hodgson nhodgson at bigpond.net.au
Fri Aug 6 21:27:50 EDT 2004


Martin v. Löwis:

> For some source encodings (namely the CJK ones), conversion to UTF-8
> is absolutely necessary even for proper lexical analysis, as the
> byte that represents a backslash in ASCII might be the first byte
> of a two-byte sequence.

   Do you have a link to such an encoding? I understand that 0x5c ('\') is
often displayed as a yen sign, but I haven't seen it as the start byte of a
multibyte character.
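
   For illustration, here is a minimal sketch of where the byte 0x5c can
turn up in one CJK encoding. It assumes Python 3 and its built-in
"shift_jis" codec; the helper name backslash_positions is made up for this
example. In Shift_JIS the byte appears as the second (trail) byte of some
two-byte characters, which is enough to confuse a lexer that scans raw
bytes for backslashes, though it is not a case of 0x5c being the first
byte.

```python
# Sketch only: assumes Python 3 and its built-in "shift_jis" codec.
BACKSLASH = 0x5C  # the byte that means '\' in ASCII


def backslash_positions(text, encoding="shift_jis"):
    """Encode text and return (bytes, indexes at which byte 0x5c occurs).

    Hypothetical helper, written for this illustration.
    """
    data = text.encode(encoding)
    return data, [i for i, b in enumerate(data) if b == BACKSLASH]


# U+30BD is KATAKANA LETTER SO; in Shift_JIS it is the two bytes 0x83 0x5c.
data, hits = backslash_positions("\u30bd")
print(data.hex(), hits)

# Decoding first and scanning characters finds no backslash at all.
print("\\" in data.decode("shift_jis"))
```

   A byte-level scan reports a backslash at index 1, while the decoded
text contains none, so decoding has to happen before tokenizing.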

   Regarding the 's' string prefix in the proposal: adding more prefixes
makes code harder to understand, particularly when prefixes are combined.
There should be a very strong need before another is introduced. I'd really
hate to be trying to work out the meaning of:

r$tu"/Raw/ $interpolated, translated Unicode string"

   Neil




