Proposal: require 7-bit source str's
Hallvard B Furuseth
h.b.furuseth at usit.uio.no
Fri Aug 6 22:35:42 CEST 2004
Martin v. Löwis wrote:
>Hallvard B Furuseth wrote:
>> That sounds like it could have a severe performance impact. However,
>> maybe the compiler can set a flag if there are any such strings when it
>> converts parsed strings from Unicode back to the file's encoding.
> Yes. Unfortunately, line information is gone by that time, so you can't
> point to the place of the error anymore.
True. One could recompile with a str7bit option to catch it earlier,
or one could make str7bit a compiler directive - then the unidentified
string will in practice be the doc string above the directive.
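
For illustration only, here is a rough sketch of a checker that keeps the line information. It is written against modern Python 3 and the standard `ast` module, so it is not how the compiler discussed above works. The name `find_non_ascii_literals` is made up for this example, and because Python 3 has no separate 8-bit str type, every string literal is treated as subject to the check. A literal spelled with escape sequences passes, since only its source text is inspected.

```python
import ast


def find_non_ascii_literals(source):
    """Return sorted line numbers of string literals whose source text
    contains non-ASCII characters (hypothetical helper, Python 3.8+)."""
    lines = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            # Look at the literal as written, so '\xe9' escapes are allowed.
            segment = ast.get_source_segment(source, node)
            if segment is not None and any(ord(ch) > 127 for ch in segment):
                lines.add(node.lineno)
    return sorted(lines)


# Line 2 holds a raw non-ASCII character; line 3 spells it as an escape.
sample = 'a = "plain"\nb = "caf\u00e9"\nc = "caf\\xe9"\n'
print(find_non_ascii_literals(sample))
```

Working on the parse tree rather than on the converted strings is what makes the line number available here.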
>> I can't say I like the idea, though. It assumes Python retains the
>> internal implementation of 'coding:' which is described in PEP 263:
>> Convert the source code to Unicode, then convert string literals back
>> to the source character set.
> It's a pretty safe assumption, though. It is the only reasonable
> implementation strategy.
- For a number of source encodings (like utf-8 :-) it should be easy
  to parse and charset-convert in the same step, and only convert
  selected parts of the source to Unicode.
- I think the spec is buggy anyway. Converting to Unicode and back
  can change the string representation. But I'll file a separate
  bug report for that.
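
One way the round trip can change the bytes, sketched in modern Python 3: any encoding in which several byte sequences decode to the same character. UTF-7 is used below purely because it makes the effect easy to show; it is an assumption of this example, not necessarily the case the bug report is about, and `round_trip` is a made-up helper name.

```python
def round_trip(raw, encoding):
    """Decode bytes to Unicode, then encode back with the same codec."""
    return raw.decode(encoding).encode(encoding)


# In UTF-7 the letter "a" may be written directly or as the base64
# run "+AGE-". Both decode to the same character, but the encoder
# only ever produces one of the two spellings.
original = b"+AGE-"
converted = round_trip(original, "utf-7")
print(original, converted, original == converted)
```

For a single-byte encoding with a one-to-one mapping the round trip is lossless, so the problem only shows up with encodings that have redundant representations.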
>> Sure. I wasn't protesting against people using escape sequences.
>> I was protesting against requiring that people use them.
> But isn't that the idea of the str7bit feature? How else would you
> put non-ASCII bytes into a string literal while simultaneously turning
> on the 7-bit feature?
Sorry, I thought you were speaking of promising a __future__ when all
string literals are required to be 7-bit or u'' literals.
To use non-ASCII str literals with the str7bit feature turned on:
- insert a 'str7bit:False' declaration in the file, or
- use the s'8-bit str literal' syntax I suggested.