Proposal: require 7-bit source str's
Hallvard B Furuseth
h.b.furuseth at usit.uio.no
Fri Aug 6 03:06:33 EDT 2004
Martin v. Löwis wrote:
>Hallvard B Furuseth wrote:
>> Now that the '-*- coding: <charset> -*-' feature has arrived,
>> I'd like to see an addition:
>>
>> # -*- str7bit:True -*-
>>
>> After the source file has been converted to Unicode, cause a parse
>> error if a non-u'' string contains a non-7bit source character.
>>
>> It can be used to ensure that the source file doesn't contain national
>> characters that the program will treat as characters in the current
>> locale's character set instead of in the source file's character set.
>
> I doubt this helps as much as you'd like. You will need to change every
> source file with that annotation.
perl -i.bak -pe '
/\bstr7bit\b/ or
s/^(\s*#.*?-\*-.*?coding[=:]\s*[\w.-]+)(?=[;\s])/$1;str7bit:True/
' `find . -name '*.py' | xargs grep -l 'coding[=:]'`
> While you are at it, you could just
> as well check every source file directly.
True at first pass, but if Python catches it, a file will stay
clean once it has been cleaned up and marked as str7bit. That's
particularly useful when several people are working on the source.
A fix to your objection would be to instead warn about the
offending strings _unless_ the file is marked with str7bit:False,
but I figure that's a bit too drastic for the time being:-)
> So if anything, I think this should be a global option.
-W::str7bitWarning?
Come to think of it, that would also make it possible for a Python
program to reject add-ons (modules, execfile etc) which contain
unmarked 8-bit strings.
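
A rough sketch of how a host program could do that, assuming the compiler issued such a warning. Str7bitWarning, load_strictly and fake_addon are made-up names for illustration; Python defines no such warning category, and fake_addon only stands in for an import or execfile call that would trigger it. The filter call itself is the standard warnings module.

```python
import warnings


class Str7bitWarning(Warning):
    """Hypothetical warning category; Python itself defines no such class."""


def load_strictly(loader):
    """Run loader() with Str7bitWarning escalated to an exception."""
    with warnings.catch_warnings():
        # Turn only this category into an error; other warnings are untouched.
        warnings.simplefilter("error", category=Str7bitWarning)
        return loader()


def fake_addon():
    # Stand-in for loading an add-on that contains an unmarked 8-bit string.
    warnings.warn("8-bit character in plain string", Str7bitWarning)
    return "loaded"
```

A caller would wrap the real import in load_strictly and catch Str7bitWarning to reject the add-on. The filter is restored when the with block exits, so the rest of the program keeps its normal warning behaviour.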
> Or, better yet,
> external checkers like pychecker could check for that.
Well, I don't think that's better, but if it's rejected for Python
that'll be my next stop.
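
For what it's worth, an external check along those lines might look roughly like the sketch below. It is not pychecker code; find_8bit_strings is a made-up name, and it assumes the source has already been decoded to Unicode text and that "non-u'' string" means any string literal whose prefix has no u in it. It uses the standard tokenize module of a present-day Python to pick out the string literals.

```python
import io
import tokenize


def find_8bit_strings(source):
    """Return (line, column, literal) for each string literal that has no
    u prefix and contains a character outside 7-bit ASCII."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        literal = tok.string
        # The prefix is everything before the first quote character.
        quote_pos = min(
            i for i in (literal.find("'"), literal.find('"')) if i >= 0
        )
        prefix = literal[:quote_pos].lower()
        if "u" in prefix:
            continue  # explicitly marked as a Unicode string
        if any(ord(ch) > 127 for ch in literal):
            hits.append((tok.start[0], tok.start[1], literal))
    return hits
```

Comments and u'' strings with national characters are left alone; only plain string literals are reported, which is the case the str7bit marker is meant to catch.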
--
Hallvard