Proposal: require 7-bit source str's
"Martin v. Löwis"
martin at v.loewis.de
Fri Aug 6 22:17:25 CEST 2004
Hallvard B Furuseth wrote:
> That sounds like it could have a severe performance impact. However,
> maybe the compiler can set a flag if there are any such strings when it
> converts parsed strings from Unicode back to the file's encoding.
Yes. Unfortunately, line information is gone by that time, so you can't
point to the place of the error anymore.
> I can't say I like the idea, though. It assumes Python retains the
> internal implementations of 'coding:' which is described in PEP 263:
> Convert the source code to Unicode, then convert string literals back
> to the source character set.
It's a pretty safe assumption, though. It is the only reasonable
implementation strategy.
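The PEP 263 mechanism under discussion can be sketched as a simple round
trip (the function name here is illustrative, not a CPython internal):
the source bytes are decoded to Unicode for parsing, and each plain str
literal is then encoded back to the declared source encoding.

```python
# Minimal sketch of the PEP 263 round trip (Python 3 bytes/str shown;
# in 2004 the "literal" result would be a plain str):

def round_trip_literal(source_bytes, declared_coding):
    # Step 1: decode the whole source file to Unicode for parsing.
    text = source_bytes.decode(declared_coding)
    # ... parsing happens on the Unicode text ...
    # Step 2: a str literal is converted back to the source encoding.
    return text.encode(declared_coding)

# A Latin-1 source byte survives the round trip unchanged:
src = b"caf\xe9"
assert round_trip_literal(src, "latin-1") == src
```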
>>Notice that your approach only works for languages with single-byte
>>character sets anyway. Many multi-byte character sets use only
>>bytes < 128, and still they should get the warning you want to produce.
> They will. That's why I specified to do this after conversion to
> Unicode. But I notice my spec was unclear about that point.
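The multi-byte point above can be demonstrated concretely: ISO-2022-JP
encodes Japanese text using only bytes below 128, so a byte-level scan of
the raw source would miss it, while a check on the decoded Unicode text
catches it (a hedged sketch, not the proposed implementation):

```python
# ISO-2022-JP encoding of the hiragana "a": every byte is 7-bit.
raw = b'\x1b$B$"\x1b(B'
assert all(byte < 128 for byte in raw)

# A byte-level scan would pass this literal, but after decoding
# the text is clearly non-ASCII:
decoded = raw.decode('iso2022_jp')
assert not decoded.isascii()
```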
> None of this properly addresses encodings that are not ASCII supersets
> (or subsets), like EBCDIC. Both Python and many Python programs seem to
> make the assumption that the character set is ASCII-based, so plain
> strings (with type str) can be output without conversion, while Unicode
> strings must be converted to the output device's character set.
Yes, Python assumes ASCII. There is some code for EBCDIC support,
but on those platforms, Unicode is not supported.
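To illustrate why EBCDIC breaks the ASCII-superset assumption: even plain
letters occupy different byte values, so str bytes written straight to an
ASCII device would come out as garbage (cp500 is one EBCDIC variant that
ships with Python's codec set):

```python
# EBCDIC is not an ASCII superset: the same character maps to
# different byte values in the two encodings.
assert 'A'.encode('ascii') == b'\x41'
assert 'A'.encode('cp500') == b'\xc1'   # EBCDIC 'A'
```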
> Sure. I wasn't protesting against people using escape sequences.
> I was protesting against requiring that people use them.
But isn't that the idea of the str7bit feature? How else would you
put non-ASCII bytes into a string literal while simultaneously turning
on the 7-bit feature?
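The point is that escape sequences keep the *source text* of a literal
pure 7-bit while the resulting string still carries bytes >= 128, which
is exactly what a str7bit-style restriction would require (sketch in
Python 3 bytes notation; the 2004 discussion concerns plain str):

```python
# The literal as written in the source file: every byte is 7-bit.
literal_source = rb'"caf\xe9"'
assert all(byte < 128 for byte in literal_source)

# The value the literal denotes still contains a non-ASCII byte.
value = b"caf\xe9"
assert 0xE9 in value
```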