[Python-3000] Conservative Defaults (was: Re: Support for PEP 3131)

Jim Jewett jimjjewett at gmail.com
Mon Jun 4 03:27:06 CEST 2007


On 6/2/07, Rauli Ruohonen <rauli.ruohonen at gmail.com> wrote:

> and the whole issue of defaults is quite minor.

I disagree; the defaults are the most important issue.

Those most eager for unicode identifiers are afraid that people
(particularly beginning students) won't be able to use local-script
identifiers unless those identifiers are allowed by default.  My
feeling is that the teacher (or the person who pointed them to
python) can change the default on a per-install basis, since it is a
one-time change.

Those of us most nervous about unicode identifiers are concerned
precisely because "anything goes" may become a default.

If national characters become the default in Sweden or Japan, that is
OK.  These national divisions are already there, and probably
unavoidable.

On the other hand, if "anything from *any* script" becomes the
default, even on a single widespread distribution, then the community
starts to splinter in a new way.  It splits into people who
distribute source code (generally ASCII) and people who are
effectively distributing binaries (not meant for human end-users to
read).

That is bad enough on its own, but it is even worse because the
distinction isn't clearly marked.  As the misleading examples have
shown, these (effective) binaries can masquerade as regular source
code that appears to do one thing while actually doing something
different.
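To make that concern concrete, here is a tiny illustration of my own
(not one of the examples posted earlier): two assignments that render
identically in most fonts, yet bind different names, because the
second uses CYRILLIC SMALL LETTER A (U+0430) in place of the Latin
"a".  Under the proposed rules both names are legal identifiers:

    value = 1
    vаlue = 2      # the "а" here is CYRILLIC SMALL LETTER A (U+0430)
    print(value)   # prints 1 -- the look-alike bound a *different* name

A reader skimming that code would reasonably conclude that value is 2.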

> On 6/2/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > Adding a tool to an arbitrarily large or small previously existing
> > toolchain, so that the majority of users can verify that their code
> > doesn't contain characters that shouldn't be allowed in the first
> > place, isn't a very good solution.

> I doubt the majority of users care, so the verifiers would be
> a minority.

Agreed, because the majority of users don't care about security at
all.  Outside the python context, this is one reason we have so much
spam (from compromised computers).  To protect the group at large,
security has to be the default.

Of course, security also has to be non-intrusive, or people will turn
it off.  A one-time decision to allow your own national characters --
one that could be rolled into the initial install, or even into a
local distribution -- is fairly non-intrusive.

> You're exaggerating the amount of work caused [by adding to the toolchain]

No, he isn't.

My own process is often exactly:

(1)  Read or skim the code.
(2)
    (a)  Download it/save it as text, or
    (b)  Cut and paste the snippet from the webpage
(3)  Run it.

There is no external automated tool in the middle; forcing me to add
one would move python from the "things just work, and you can test
immediately" category into a compile/build/wait/test language.  I have
used python this way (when developing for a machine I could not access
directly), and ... I don't recommend it.

Hopefully, I will be able to set my own python to enforce ASCII
identifiers (without also restricting strings and comments).  But if
too many people start to assume that distributed code can freely mix
other scripts, I'll start to get random failures.  I'll probably
allow Latin-1.  I might end up allowing a few other scripts -- but
then how should I say "script X or script Y; not both"?  Keeping the
default at ASCII for another release or two would give us that much
time to answer the question.
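To be concrete about what "enforce" might mean in practice, here is a
rough sketch of the kind of per-user check I have in mind.  Nothing
here is part of any proposal: the policy constant is a placeholder,
and the script test is a crude heuristic based on Unicode character
names, using only the stdlib tokenize and unicodedata modules.

    import sys
    import tokenize
    import unicodedata

    # Example policy: ASCII always allowed; otherwise only Latin letters.
    ALLOWED_NAME_PREFIXES = ("LATIN",)

    def char_ok(ch):
        if ord(ch) < 128:
            return True
        return unicodedata.name(ch, "").startswith(ALLOWED_NAME_PREFIXES)

    def check_identifiers(path):
        """Return (line, identifier) pairs that violate the policy."""
        bad = []
        with open(path, "rb") as f:
            for tok in tokenize.tokenize(f.readline):
                if tok.type == tokenize.NAME:
                    if not all(char_ok(c) for c in tok.string):
                        bad.append((tok.start[0], tok.string))
        return bad

    if __name__ == "__main__":
        for line, name in check_identifiers(sys.argv[1]):
            print("line %d: identifier %r uses a disallowed script"
                  % (line, name))

Even something like this only answers "which characters are allowed";
it does not answer the harder "script X or script Y; not both"
question, which is exactly why I would like the extra time.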

> > Only because it is so rarely used that no one really runs into
> > unicode identifiers.

> It doesn't really matter why they're not a problem in practice,
> just that they aren't. A non-issue is a non-issue, no matter why.

Of course it matters.  If it isn't a problem only because of something
that wouldn't apply to python, then we still have to worry.

> ... Java, ... don't hear constant complaints

They aren't actually a problem because they aren't used; they aren't
used because almost no one knows about them.  Python would presumably
advertise the feature, and see more use.  (We shouldn't add it at all
*unless* we expect much more usage than unicode IDs have seen in other
programming languages.)

Also note that Java in particular already has static type checking
(which would resolve many of the objections) and is already a
compile/build/wait/test language (so the cost of additional tools is
less).  (I believe that C# is in this category too, but won't swear to
it.)

Not seeing problems in Lisp would be a valid argument -- except that
the internationalized IDs are explicitly marked.  Not just the files;
the individual IDs.  You have to wrap the name in vertical bars, as
in |lowercase|, to get an ID made of unexpected characters (where
even explicitly lower-case letters count as unexpected).

JavaScript would provide a legitimate example of a dynamic language
where unicode IDs caused no problem.  On the other hand, broken
JavaScript is already so common that I doubt anyone would have
noticed; python should (and currently does) meet a higher standard for
cross-platform interoperability.

In other words, python will be going out on a limb.  That doesn't mean
we shouldn't allow such identifiers, but it does mean that we should
be cautious.

As an analogy, remember that function decorators were added to python
in version 2.4.  The initial patch would also have handled class
decorators.  No one came up with a single reason to disallow them that
didn't also apply to function decoration -- except one.  Guido wasn't
*sure* they were needed, and it would be easier to add them later (in
2.6) than it would have been to pull them back out.

The same one-step-at-a-time reasoning applies to unicode identifiers.
Allowing IDs in your native language (or others that you explicitly
approve) is probably a good step.  Allowing IDs in *any* language by
default is probably going too far.

-jJ

