"a better input"

Gareth McCaughan Gareth.McCaughan at pobox.com
Wed May 8 21:56:51 EDT 2002


Alex Martelli wrote:

[I said:]
> >> To expand: we could really do with something that lets the user
> >> enter "46" or "0x1234" or "'walrus'" or "-2+6j" or "[1,'a',{3:4}]"
> >> and returns the same as |input| does for those, but that doesn't
> >> permit "f(123)" or "2**2002" or "[x for x in [1,2,3]]".
> 
> You'll have to pin that down more closely, as I can't see any
> easily definable difference between operators used as in:
>         -2+6j
> and operators used as in:
>         2**6
> yet apparently you want to allow the former but forbid the latter
> (why? what is gained in forcing people to do this in their head?).

I don't actually mind if an input() replacement evaluates 2+3.
I do mind if it is able to do arbitrary computation, where
"arbitrary" is fuzzily defined to cover things like

  1 anything that could take a large amount of time or
    memory to compute;

    (rationale: we don't want to facilitate DoS attacks;
    users will find it counterintuitive if what they think
    of as reading a value can consume unbounded resources.)

  2 anything that could compromise security if an attacker
    were allowed to decide what got read by input();

    (rationale: obvious.)

  3 anything that can't be explained quite accurately and
    quite quickly to a non-expert.

    (rationale: whatever input() does, we want to be able
    to explain it; one of Python's big attractions is that
    it's easy for beginners to get their heads round.)

In some sense, we want read() to be able to read "literals"
of any type. Unfortunately, lots of things that *look* like
literals aren't; for instance, -4 and [1,2,3]. For the most
part, Python users can ignore this and pretend -4 and [1,2,3]
are literals, though they may occasionally get confused if
they expect *identity* between (say) different instances
of [1,2,3] created by the same code.[1] I think { real, honest
Python literals } is too narrow a category to allow for the
results of new input().
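
As a rough illustration of the category meant here, the following is a
minimal sketch that assumes a Python 3 interpreter whose standard library
provides ast.literal_eval. The name read_value is hypothetical, and the
sketch is not a proposal for how input() itself should be implemented.
It accepts the "looks like a literal" examples quoted at the top and
refuses the others:

```python
import ast


def read_value(text):
    """Hypothetical input() replacement: parse one literal-like value.

    Assumption: ast.literal_eval is available (Python 3 standard library).
    It accepts numbers, strings, tuples, lists, dicts, sets, booleans and
    None, plus a unary sign and the +/- needed to write complex numbers.
    """
    return ast.literal_eval(text.strip())


accepted = ["46", "0x1234", "'walrus'", "-2+6j", "[1,'a',{3:4}]"]
rejected = ["f(123)", "2**2002", "[x for x in [1,2,3]]", "x"]

for text in accepted:
    print(repr(text), "->", repr(read_value(text)))

for text in rejected:
    try:
        read_value(text)
    except (ValueError, SyntaxError) as exc:
        # Calls, general operators, comprehensions and names are refused.
        print(repr(text), "rejected:", type(exc).__name__)
```

This does not settle point 1 above by itself: very long or deeply nested
text can still cost a good deal of memory to parse, so a caller would
probably want to limit the length of what it reads.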

In some other dynamic languages -- Common Lisp, for instance --
there's a sharper distinction (sharper from the user's point of
view, I mean) between reading and evaluating; the READ function
will read a single object but not do any evaluating. Well, almost.
There is an evil way to make evaluation happen at read time,
but you can arrange for it to be forbidden.

In fact, a fairly decent idea of what I think a sane input()
would be able to do is roughly the same as READ in Common Lisp
when you have *READ-EVAL* bound to NIL. I don't expect that to
be easy to do in Python.

I used the term "compromise security" above, and left it
undefined. I mean, for instance, that input() shouldn't
try to evaluate "2**(2**100)" in case that runs out of memory
and crashes the program; nor to evaluate "x" in case that
leaks information (the value of x) into what's thought to
be a safe place (the thing the user just typed in), possibly
exposing what ought to be kept secret; nor to evaluate "f(123)"
in case the function "f" does something the user oughtn't to
be able to provoke. I am still leaving "compromise security"
undefined, because I am not a security expert and if I try to
pin it down I might leave some loopholes.
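
To make the second and third of those cases concrete, here is a small
sketch of an eval-based reader. The names secret, f and read_with_eval
are hypothetical stand-ins for real program state, and the first case
is deliberately not run:

```python
# Hypothetical program state that whoever types the input should not see.
secret = "hunter2"
calls = []


def f(n):
    # Stands in for any function with a side effect.
    calls.append(n)
    return n


def read_with_eval(text):
    # Evaluates the text as an expression in the program's own namespace.
    return eval(text)


leaked = read_with_eval("secret")   # typing a name hands back its value
read_with_eval("f(123)")            # typing a call runs the function

print("leaked:", leaked)
print("calls made on the user's behalf:", calls)
```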

> To me, it seems that taking the input string and applying restricted
> evaluation (carefully pruning what builtins we want to allow or
> disallow -- indeed perhaps _enriching_ the set of normal builtins
> with e.g. functions from math...) would be satisfactory.  But that
> wouldn't meet your examples -- not only 2**22, but also list
> comprehensions would then surely be allowed.

I have difficulty seeing how that can be done without
incurring most of the disadvantages of input().
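
A minimal sketch of the difficulty, assuming that "restricted
evaluation" means no more than calling eval with an empty builtins
table (restricted_eval is a hypothetical name, and a real restricted
environment might do more than this):

```python
def restricted_eval(text):
    # Assumption: the only restriction is an empty builtins table and
    # empty global and local namespaces.
    return eval(text, {"__builtins__": {}}, {})


# Names from the program and from builtins are no longer reachable...
try:
    restricted_eval("open('somefile')")
    blocked = False
except NameError:
    blocked = True

# ...but operators and comprehensions still run, so arbitrary
# computation is still possible.
power = restricted_eval("2**10")
squares = restricted_eval("[x*x for x in [1,2,3]]")

print(blocked, power, squares)
```

Hiding names addresses the "x" and "f(123)" cases, but "2**(2**100)"
would be evaluated just as before.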

> If you can give better specs of what you want to allow and disallow
> (and ideally WHY...) then we may think about implementation (e.g.
> by compiling then perusing the bytecodes to see if anything that
> must not be allowed has slipped in).

Does the above help?


[1] In practice, I believe this mistake is rare. The opposite
    mistake is commoner, as e.g. in the following usually-wrong
    construction:

        def f(x,y,z=[]):
            ... [stuff] ...
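
    A short sketch of both mistakes, with hypothetical function names:
    a list display builds a new object every time it is evaluated,
    whereas a default value is built once, when the def statement runs.

```python
def make_list():
    # The display [1, 2, 3] is evaluated afresh on every call.
    return [1, 2, 3]


a = make_list()
b = make_list()
equal_but_distinct = (a == b) and (a is not b)


def append_to(item, bucket=[]):
    # The default list is created once and shared between calls.
    bucket.append(item)
    return bucket


first = append_to(1)
second = append_to(2)
shared = first is second

print(equal_but_distinct, shared, second)
```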

-- 
Gareth McCaughan  Gareth.McCaughan at pobox.com
.sig under construc


