Proposal: require 7-bit source str's

Hallvard B Furuseth h.b.furuseth at
Fri Aug 6 22:35:42 CEST 2004

Martin v. Löwis wrote:
>Hallvard B Furuseth wrote:
>> That sounds like it could have a severe performance impact.  However,
>> maybe the compiler can set a flag if there are any such strings when it
>> converts parsed strings from Unicode back to the file's encoding.  
> Yes. Unfortunately, line information is gone by that time, so you can't
> point to the place of the error anymore.

True.  One could recompile with a str7bit option to catch it earlier,
or one could make str7bit a compiler directive - then the string that
cannot be pinpointed will in practice be the doc string above the
directive.
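For illustration, a check run on the parse tree rather than on the
already-converted literals would still have line numbers available.
The sketch below uses the ast module; the function name and the whole
arrangement are my own illustration of the idea, not an existing
compiler option:

```python
import ast

def find_non_ascii_str_literals(source, filename="<source>"):
    """Return (lineno, literal) for every string literal containing
    non-ASCII characters.  Sketch only: a real str7bit check would
    presumably live inside the compiler itself."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if any(ord(ch) > 127 for ch in node.value):
                hits.append((node.lineno, node.value))
    return hits

# Unlike a check done after literals are converted back to the file's
# encoding, the parse tree still knows where each literal came from:
print(find_non_ascii_str_literals("x = 'ascii'\ny = 'caf\u00e9'\n"))
# -> [(2, 'café')]
```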

>> I can't say I like the idea, though.  It assumes Python retains the
>> internal implementations of 'coding:' which is described in PEP 263:
>> Convert the source code to Unicode, then convert string literals back
>> to the source character set.
> It's a pretty safe assumption, though. It is the only reasonable
> implementation strategy.

I disagree:

- For a number of source encodings (like utf-8:-) it should be easy
  to parse and charset-convert in the same step, and only convert
  selected parts of the source to Unicode.

- I think the spec is buggy anyway.  Converting to Unicode and back
  can change the string representation.  But I'll file a separate
  bug report for that.
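The round-trip problem is easy to demonstrate with any encoding that
allows more than one spelling of the same character.  UTF-7 is merely
a convenient example of the principle - whether it makes sense as a
Python source encoding is beside the point:

```python
# In UTF-7 the character '!' can be written directly or as the
# base64-style escape '+ACE-'.  Decoding to Unicode and encoding
# back therefore need not reproduce the original bytes.
src = b'+ACE-'
text = src.decode('utf-7')    # -> '!'
back = text.encode('utf-7')   # the encoder picks the direct form
assert text == '!'
assert back != src            # the round trip changed the bytes
```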

>> Sure.  I wasn't protesting against people using escape sequences.
>> I was protesting against requiring that people use them.
> But isn't that the idea of the str7bit feature? How else would you
> put non-ASCII bytes into a string literal while simultaneously turning
> on the 7-bit feature?

Sorry, I thought you were speaking of promising a __future__ in which
all string literals are required to be 7-bit or u'' literals.

To use non-ASCII str literals with the str7bit feature turned on:
- insert a 'str7bit:False' declaration in the file, or
- use the s'8-bit str literal' syntax I suggested.
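The per-file declaration could be recognized the same way the PEP 263
coding cookie is: a magic comment near the top of the file.  The
regular expression and function below are my own sketch of that idea,
modeled on the coding: cookie; 'str7bit:' is proposed, not existing,
syntax:

```python
import re

# Hypothetical str7bit declaration, modeled on PEP 263's "coding:"
# cookie: a magic comment in one of the first two lines.
_STR7BIT_RE = re.compile(r'^[ \t]*#.*\bstr7bit[:=][ \t]*(True|False)\b')

def str7bit_declaration(source, default=True):
    """Return the file's str7bit setting, or `default` when the
    first two lines carry no declaration.  Sketch only."""
    for line in source.splitlines()[:2]:
        m = _STR7BIT_RE.match(line)
        if m:
            return m.group(1) == 'True'
    return default

src = "# -*- coding: utf-8; str7bit: False -*-\nx = 'bl\u00e5b\u00e6r'\n"
print(str7bit_declaration(src))   # -> False
```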
