Proposal: require 7-bit source str's

"Martin v. Löwis" martin at
Fri Aug 6 22:17:25 CEST 2004

Hallvard B Furuseth wrote:
> That sounds like it could have a severe performance impact.  However,
> maybe the compiler can set a flag if there are any such strings when it
> converts parsed strings from Unicode back to the file's encoding.  

Yes. Unfortunately, line information is gone by that time, so you can't
point to the place of the error anymore.

> I can't say I like the idea, though.  It assumes Python retains the
> internal implementations of 'coding:' which is described in PEP 263:
> Convert the source code to Unicode, then convert string literals back
> to the source character set.

It's a pretty safe assumption, though. It is the only reasonable
implementation strategy.
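(A minimal sketch of that strategy, in modern Python 3 for runnability; the
function names are illustrative, not CPython internals: decode the whole file
with the declared coding, parse, then re-encode plain string literals back to
the source encoding.)

```python
# Illustrative sketch of the PEP 263 pipeline discussed above; the helper
# names are hypothetical, not CPython's actual implementation.

def decode_source(raw_bytes, coding):
    """Step 1: convert the entire source file to Unicode."""
    return raw_bytes.decode(coding)

def reencode_literal(literal_text, coding):
    """Step 2: convert a parsed str literal back to the source charset."""
    return literal_text.encode(coding)

source = b"s = '\xe9'\n"                 # Latin-1 source containing an e-acute
text = decode_source(source, "latin-1")   # whole file is now Unicode
literal = text[5]                         # the character inside the quotes
assert reencode_literal(literal, "latin-1") == b"\xe9"
```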

>>Notice that your approach only works for languages with single-byte
>>character sets anyway. Many multi-byte character sets use only
>>bytes < 128, and still they should get the warning you want to produce.
> They will.  That's why I specified to do this after conversion to
> Unicode.  But I notice my spec was unclear about that point.

Ah, ok.
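(To make the point concrete: some multi-byte encodings, such as ISO-2022-JP,
encode non-ASCII characters using only bytes below 128, so a byte-level scan
of the source would miss them. A check on the decoded Unicode text catches
them. A small Python 3 sketch, with an illustrative helper name:)

```python
# Hypothetical check as described above: run it on the decoded Unicode
# text of a literal, not on the raw source bytes.

def has_non_ascii(literal_text):
    """True if a parsed literal contains any code point >= 128."""
    return any(ord(ch) > 127 for ch in literal_text)

# ISO-2022-JP encodes Japanese text entirely with 7-bit bytes...
raw = "\u3042".encode("iso2022_jp")       # HIRAGANA LETTER A
assert all(b < 128 for b in raw)          # every source byte is < 128

# ...yet the decoded text is clearly non-ASCII, so the check fires:
assert has_non_ascii(raw.decode("iso2022_jp"))
assert not has_non_ascii("plain ascii")
```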

> None of this properly addresses encodings that are not ASCII supersets
> (or subsets), like EBCDIC.  Both Python and many Python programs seem to
> make the assumption that the character set is ASCII-based, so plain
> strings (with type str) can be output without conversion, while Unicode
> strings must be converted to the output device's character set.

Yes, Python assumes ASCII. There is some code for EBCDIC support,
but on those platforms, Unicode is not supported.

> Sure.  I wasn't protesting against people using escape sequences.
> I was protesting against requiring that people use them.

But isn't that the idea of the str7bit feature? How else would you
put non-ASCII bytes into a string literal while simultaneously turning
on the 7-bit feature?


More information about the Python-list mailing list