[Python-3000] Support for PEP 3131
Stephen J. Turnbull
turnbull at sk.tsukuba.ac.jp
Sun Jun 3 15:42:23 CEST 2007
Rauli Ruohonen writes:
> He did not say that such files or command-line options would be
> scalable either. They are fine tools for auditing, but not for using
> finished products. One should provide both auditing tools and ease
> of use of already audited code.
Ease of use of audited code is trivial; turn the checks off.
The question is how to do that.
> (1) Add a mandatory ASCII-only special comment at the beginning of
> each module. The comment would continue until the first empty
> line and would contain only valid directives matching some
> regular expression. Only whitespace is allowed before the
> comment. Anything else is a syntax error.
-1
You still need command-line options or local configuration files to
decide *what* to audit. We *don't* trust the file! A file that audits
as containing only the character sets it claims may still use
constructs we want to prohibit. Merely to define those is
non-trivial, and it is absolutely out of the question to expect that
the average Python user will know what the character set
"strictly-conforms-to-UTR39-restrictions-allows-confusables" is. So
those character sets are basically meaningless for ease of use; ease
of use is "globally restrict to what my students can read = ASCII +
Japanese".
Now, the same code that would be needed to audit the declarations you
propose could easily be generalized to *generate* them. Once you've
got that, who needs the auditing code in the Python translator? As I
understand the implementation of PEP 263, you could just substitute an
auditing UTF-8 codec based on that code for the standard UTF-8 codec
that PEP 263 uses.
This codec is Python code, and thus could be configured using a file,
which could be generated by the codec and compared with the old
version; the possibilities are endless ... and in no way need to be
defined in the language if I'm correct about the implementation.[1]
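A minimal sketch of the "generate rather than declare" idea, written as
an external utility rather than as the codec itself. It assumes the
standard tokenize module is an acceptable stand-in for the parser, and
it approximates a character's script by the first word of its Unicode
name, which is not the real Unicode Script property. The names
script_of and summarize_identifiers are hypothetical.

```python
import io
import tokenize
import unicodedata
from collections import defaultdict


def script_of(ch):
    """Rough script label: the first word of the Unicode character name."""
    # Assumption: the name prefix (e.g. 'CJK', 'HIRAGANA', 'GREEK') is a
    # good-enough stand-in for the real Unicode Script property.
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        return "UNKNOWN"


def summarize_identifiers(source):
    """Map each script label to the sorted non-ASCII identifiers using it."""
    found = defaultdict(set)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue  # strings and comments are not audited here
        for ch in tok.string:
            if ord(ch) > 127:
                found[script_of(ch)].add(tok.string)
    return {script: sorted(names) for script, names in found.items()}
```

The returned dictionary could be written to a file and compared with
the summary generated from the previous version of the module; any
difference is what a human would then review.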
I favor the single command-line flag (perhaps even restricted to the
binary choice of compatibility ASCII vs. PEP 3131 Unicode) as a
transition strategy. I do not agree with Ka-Ping, among others, that
there are bogeymen under the bed, but then I live in
Japan, and there *is* no "under the bed" (we sleep on mats on the
floor<wink>). I think it's quite reasonable to provide a
non-invasive, *simple* auditing facility for those who want it. When
you're talking about security holes, the burden of proof should *not*
be on the paranoid, especially when the backward-compatibility cost of
security is *zero* (there are *no* Python programs containing
non-ASCII identifiers in the wild yet!)
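For concreteness, the binary check could look roughly like the sketch
below. This is an illustration only, not a proposed interpreter flag:
the function names and the ascii_only parameter are hypothetical, and
the standard tokenize module stands in for the real parser. Only
identifiers are checked; non-ASCII strings and comments pass.

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Return (line, name) pairs for identifiers with non-ASCII characters."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            hits.append((tok.start[0], tok.string))
    return hits


def audit(source, ascii_only=True):
    """Hypothetical binary switch: True is the compatibility ASCII mode."""
    return non_ascii_identifiers(source) if ascii_only else []
```

With ascii_only=True an empty result means the module passes; with
ascii_only=False nothing is checked at all, which is the "turn the
checks off" case for already audited code.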
As James Knight says, the "configure the world in one file" strategy
that jJ and I were batting around is a bit nuts, but it might not be a
bad strategy for configuring a loadable auditing codec or external
utility; I don't think that's wasted mental effort at all.
Footnotes:
[1] Caveat: the implementation will be much more heavyweight than a
standard codec since it must contain a Python parser.