[Python-3000] Support for PEP 3131

Jim Jewett jimjjewett at gmail.com
Fri May 25 01:12:27 CEST 2007


On 5/24/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Ka-Ping Yee schrieb:
> > On Thu, 24 May 2007, Jim Jewett wrote:
> >> So how about

> >> (1)  By default, python allows only ASCII.
> >> (2)  Additional characters are permitted if they appear in a table
> >> named on the command line.

> >> These additional characters should be restricted to code
> >> points larger than ASCII (so you can't easily turn "!" into
> >> an ID char), but beyond that, anything goes.  If you want to
> >> include punctuation or undefined characters, so be it.

> > +1!  This is a fine solution.  It is better than the "python -U"
> > option I proposed

> -2. Any solution found must also accommodate users which are
> unaware of the security issue, and just want to use their native
> language for identifiers. So requiring them to change their
> environment or pass additional command line parameters is
> unacceptable.

There is no hope of explaining security; therefore, the defaults
should be relatively safe.  If the default is "anything goes", that
isn't safe.  If the default is "ASCII", that is safe, but possibly
inconvenient.  It depends on how hard it is to make the switch.

Is your concern just that it should be possible to do once (perhaps at
install), rather than on each run?  That would probably be OK too, so
long as the default install was ASCII-only, so that *someone* had to
make a decision about what to allow.

I assume that large communities will standardize on a tailored table,
but a first-pass slightly-too-inclusive table is easy enough to
create.

Here are the Thaana lines from the Unicode Consortium file Scripts.txt:

0780..07A5    ; Thaana # Lo  [38] THAANA LETTER HAA..THAANA LETTER WAAVU
07A6..07B0    ; Thaana # Mn  [11] THAANA ABAFILI..THAANA SUKUN
07B1          ; Thaana # Lo       THAANA LETTER NAA

Though if it were me, I would probably simplify that to

0780..07B1    ; Thaana
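
That collapse is mechanical whenever the ranges touch. A rough sketch,
assuming the table keeps the "first..last ; name # comment" layout of
Scripts.txt; parse_ranges and merge_adjacent are names I made up for
illustration, not anything that exists in Python:

```python
import re

# Matches "XXXX ; ..." or "XXXX..YYYY ; ..." (hex code points, as in Scripts.txt).
LINE = re.compile(r"^\s*([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;")

def parse_ranges(text):
    """Return a list of (first, last) code point pairs from table text."""
    ranges = []
    for line in text.splitlines():
        m = LINE.match(line.split("#", 1)[0])  # ignore the trailing comment
        if not m:
            continue  # blank or comment-only line
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        ranges.append((first, last))
    return ranges

def merge_adjacent(ranges):
    """Merge ranges that overlap or sit directly next to each other."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged

THAANA = """\
0780..07A5    ; Thaana # Lo  [38] THAANA LETTER HAA..THAANA LETTER WAAVU
07A6..07B0    ; Thaana # Mn  [11] THAANA ABAFILI..THAANA SUKUN
07B1          ; Thaana # Lo       THAANA LETTER NAA
"""

for first, last in merge_adjacent(parse_ranges(THAANA)):
    print("%04X..%04X    ; Thaana" % (first, last))
```

Ranges with gaps between them stay separate, so widening across
undefined code points remains a deliberate choice rather than a side
effect of the merge.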

Similarly, Devanagari has 15 lines in Scripts.txt, but you could simplify it to

0901..0939    ; Devanagari
093C..094D    ; Devanagari
0950..0954    ; Devanagari
0958..0963    ; Devanagari
0966..096F    ; Devanagari
097B..097F    ; Devanagari

or even

0901..097F    ; Devanagari and some undefined characters

if you (as a Devanagari speaker) were confident that none of your
characters would be confused with ASCII.  (In practice, you might well
want to exclude the Devanagari digits for looking too similar to ASCII
digits with different values, but ... that is a judgment call for
Devanagari speakers to make, so long as they make it explicitly.)
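
To make the proposal concrete, here is a minimal sketch of the check
itself: ASCII-only by default, extra characters only if a table lists
them, and the table may not touch ASCII. The names load_table and
allowed are hypothetical, the table layout is assumed to match the
excerpts above, and the identifier rule is deliberately simplified
(any listed character may appear anywhere in a name).

```python
import re

# Matches "XXXX ; ..." or "XXXX..YYYY ; ..." (hex code points).
LINE = re.compile(r"^\s*([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;")

def load_table(text):
    """Parse table text into (first, last) pairs; refuse ASCII entries."""
    ranges = []
    for line in text.splitlines():
        m = LINE.match(line.split("#", 1)[0])
        if not m:
            continue
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        if first < 0x80:
            # So nobody can turn "!" into an identifier character.
            raise ValueError("table may only add code points above ASCII")
        ranges.append((first, last))
    return ranges

def allowed(name, table=()):
    """True if every character is an ASCII id character or is in the table."""
    if not name or name[0] in "0123456789":
        return False
    for ch in name:
        cp = ord(ch)
        if cp < 0x80:
            if not (ch.isalnum() or ch == "_"):
                return False
        elif not any(first <= cp <= last for first, last in table):
            return False
    return True

DEVANAGARI = """\
0901..0939    ; Devanagari
093C..094D    ; Devanagari
0950..0954    ; Devanagari
0958..0963    ; Devanagari
0966..096F    ; Devanagari
097B..097F    ; Devanagari
"""

WITH_DIGITS = load_table(DEVANAGARI)
# The judgment call: drop the digit range U+0966..U+096F explicitly.
NO_DIGITS = [r for r in WITH_DIGITS if r != (0x0966, 0x096F)]

word = "\u0928\u093e\u092e"      # three Devanagari characters
with_digit = word + "\u0967"     # plus DEVANAGARI DIGIT ONE

print(allowed(word))                    # default table is empty
print(allowed(word, WITH_DIGITS))
print(allowed(with_digit, NO_DIGITS))
```

With no table the Devanagari name is rejected; with the six-line table
it is accepted; and once the digit line is removed, a name containing
a Devanagari digit is rejected again.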

-jJ

