[Python-3000] Support for PEP 3131

Tue May 22 01:29:36 CEST 2007

On 5/21/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Would it be acceptable to create an encoding such that you could read
> > and write

> >    Löwis

> > in your editor, but upon import, python treated it as though you had
> > writtten

> >    LU_246wis

> > Other modules would see LU_246wis, unless they also used that encoding
> > -- in which case the user should also see Löwis while editing.

> What problem would that solve? You could type the identifier that
> way - but you would need to know already that this is the identifier
> you want to type; how do you know?

(1)
If I am using the module based on its documentation, or based on
opening it up and reading it, then I can use the same encoding, and I
can write  Löwis.

(2)
If I do arbitrary introspection, such as

    import sys
    for k, v in sys.modules:
        if v:
            print dir(v)

then I will get something usable, though perhaps not easily readable.

(3)
The mapping is reversible, so I can work interactively with the
arbitrary characters by setting my console/idle preferences to the
special encoding.

> Again, I'm uncertain what the use case here would be. For "proper"
> transliteration, users can memorize easily what the transliterated
> name would be, and visually identify the two representations.

For two latin-based alphabets, yes.  I'm not so sure for non-western scripts.

As you pointed out, the correct transliteration may depend on the
natural language (instead of just the character code point), which
means we probably can't do it automatically.

It also has to be a one-way transliteration; if ö -> o (or oe) then an
o (or oe) in the result can't always be transliterated back.

> With a "numeric transliteration", users would *not* normally be
> able to tell what a transliterated character means, or how to
> transliterate a given character.

(1)  They shouldn't ever need to see the numeric version unless
they're intentionally peeking under the covers, or their site doesn't
have the appropriate encoding installed.  One advantage of this method
is that a single transliteration method could work for any language,
so it probably would be installed already.

(2)  Even if users did somehow see the numeric version, it wouldn't be
that awful.  For the langauges close enough to ASCII that a
transliteration is straightforward, the number of extra characters to
memorize is fairly small.

> > If the above is not acceptable, and even the internal representation
> > has to be readable, then would it be acceptable to make the
> > transliteration strategy something the user could set, similar to
> > today's coding: directive?

> Then I don't understand your above proposal. I thought you were
> proposing to replace all non-ASCII characters with some ASCII form
> on import of the module. What do you mean by "readable internal
> representation"?

This alternative would let an individual user say "I'm writing
Swedish; turn my ö into an o."   The actual identifiers used by Python
itself would be more readable, but the downside is that users would
have to read them more often, instead of using/editing/viewing
strictly in the untransliterated version.

-jJ