Proposal: require 7-bit source str's
Hallvard B Furuseth
h.b.furuseth at usit.uio.no
Thu Aug 5 23:38:20 CEST 2004
John Roth wrote:
>"Hallvard B Furuseth" <h.b.furuseth at usit.uio.no> wrote in message
>news:HBF.20040805p736 at bombur.uio.no...
>> Now that the '-*- coding: <charset> -*-' feature has arrived,
>> I'd like to see an addition:
>> # -*- str7bit:True -*-
>> After the source file has been converted to Unicode, cause a parse
>> error if a non-u'' string contains a non-7bit source character.
>> It can be used to ensure that the source file doesn't contain national
>> characters that the program will treat as characters in the current
>> locale's character set instead of in the source file's character set.
> Is this even an issue? If you specify utf-8 as the character
> set, I can't see how non-unicode strings could have
> anything other than 7-bit ascii, for the simple reason that
> the interpreter wouldn't know which encoding to use.
Sorry, I should have included an example.
# -*- coding:iso-8859-1; str7bit:True; -*-
A = u'hør' # ok
B = 'hør' # error because of str7bit.
The 'coding' directive ensures this source code is translated correctly
to Unicode. However, string B is then translated back to the source
character set so it can be stored as a str object and not a unicode object.
The print statement just outputs the bytes in B; it doesn't do any
character set handling. So if your terminal uses latin-2, it will
output the 'ø' as Latin small letter r with caron.
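
To make the byte-level effect concrete, here is a small sketch. It is written for Python 3, where a bytes object plays the role of the plain str object discussed above, so it only imitates what the interpreter and terminal do; the variable names are mine.

```python
import unicodedata

text = "h\u00f8r"  # 'hør' as it appears in the source file

# The bytes that string B holds when the source file is iso-8859-1.
raw = text.encode("iso-8859-1")

# print writes those bytes unchanged; a latin-2 terminal then
# interprets them in its own character set.
shown = raw.decode("iso-8859-2")

print(raw)                         # b'h\xf8r'
print(unicodedata.name(shown[1]))  # LATIN SMALL LETTER R WITH CARON
```

The single byte 0xF8 is 'ø' in iso-8859-1 but r with caron in iso-8859-2, which is exactly the mismatch str7bit is meant to catch.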
coding:utf-8 wouldn't help. B would remain a plain string, not a
Unicode string. The raw utf-8 bytes would be output.
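
For illustration, the proposed check could be approximated outside the interpreter roughly as follows. This is only a sketch under my own assumptions: it runs on Python 3, it works on source text that has already been decoded to Unicode, and find_8bit_plain_strings is a hypothetical helper name, not an existing Python feature.

```python
import io
import tokenize


def find_8bit_plain_strings(source):
    """Return (line, literal) pairs for non-u'' strings with non-7bit chars.

    Hypothetical helper that imitates the proposed str7bit check.
    """
    problems = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        literal = tok.string
        # The prefix is everything before the opening quote (u, r, b, ...).
        quote_pos = min(i for i in (literal.find("'"), literal.find('"'))
                        if i != -1)
        prefix = literal[:quote_pos].lower()
        if "u" not in prefix and any(ord(ch) > 127 for ch in literal):
            problems.append((tok.start[0], literal))
    return problems


# The two assignments from the example above, already decoded to Unicode.
source = "A = u'h\u00f8r'  # ok\nB = 'h\u00f8r'   # error because of str7bit\n"
for line, literal in find_8bit_plain_strings(source):
    print("line %d: non-7bit character in %s" % (line, ascii(literal)))
```

Only line 2 is reported, since the u'' literal on line 1 is allowed to carry non-7bit characters. A real implementation would live in the parser and raise a parse error instead of returning a list.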