[ python-Bugs-215126 ] Over restricted type checking on eval() function

Sun Jun 27 14:08:20 EDT 2004

Bugs item #215126, was opened at 2000-09-22 14:36
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=215126&group_id=5470

Category: Python Interpreter Core
Group: Feature Request
Status: Open
Resolution: None
Priority: 6
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Raymond Hettinger (rhettinger)
Summary: Over restricted type checking on eval() function

Initial Comment:
The built-in function eval() takes a string argument and a dictionary.  The second argument should allow any instance which defines __getitem__ as opposed to just dictionaries.

The following example creates a type error:
  eval, argument 2: expected dictionary,   instance found

class SpreadSheet:
    _cells = {}
    def __setitem__( self, key, formula ):
        self._cells[key] = formula
    def __getitem__( self, key ):
        return eval( self._cells[key], self )

ss = SpreadSheet()
ss['a1'] = '5'
ss['a2'] = 'a1*5'
ss['a2']

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-27 13:08

Message:
Logged In: YES 
user_id=80475

Nix, that last comment.  Have examples that call setitem().

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-27 12:36

Message:
Logged In: YES 
user_id=80475

Changed to PyDict_CheckExact().

Are you sure about having to change the sets and dels.  I've
tried several things at the interactive prompt and can't get
it to fail:

  >>> [locals() for i in (2,3)]

Do you have any examples (in part, so I can use them as test
cases)?

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-06-27 06:33

Message:
Logged In: YES 
user_id=4771

eval() can set items in the locals, e.g. by using list comprehension.  Moreover the ability to exec in custom dicts would be extremely useful too, e.g. to catch definitions while they appear.

If you really want to avoid any performance impact, using PyDict_CheckExact() in all critical parts looks good (do not use PyDict_Check(), it's both slower and not what we want because it prevents subclasses of dicts to override __xxxitem__).

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-27 03:37

Message:
Logged In: YES 
user_id=80475

Okay, the patch is ready for second review.
Attaching the revision with unittests.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-27 02:50

Message:
Logged In: YES 
user_id=80475

Attaching my minimal version of the patch.  It runs the
attached demo without exception and does not measurably show
on any of my timings.

The approach is to restrict the generalization to eval()
instead of exec().  Since eval() can't set values in the
locals dict, no changes  are needed to the setitem and
delitem calls.  Instead of using PyObject_GetItem()
directly, I do a regular lookup and fallback to the
generalizaiton if necessary -- this is why the normal case
doesn't get slowed down (the cost is a PyDict_Check which
uses values already in cache, and a branch predicatable
comparison)..

While the demo script runs, and the test_suite passes, it is
slightly too simple and doesn't yet handle eval('dir()',
globals(), M()) where M is a non-dict mapping.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-23 11:59

Message:
Logged In: YES 
user_id=80475

The quick patch works fine.

Do change the PyArg_ParseTuple() into the faster
PyArg_UnpackTuple().

Does this patch show any changes to pystone or other key
metrics?  Would the PyDict_GetItem trick have better
performance?  My guess is that it would.

+1 on using PyMapping_Check() for checking the locals
argument to eval().  That is as good as you can get without
actually running it.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-06-22 16:07

Message:
Logged In: YES 
user_id=4771

Quick patch attached.  I didn't try to use the PyDict_GetItem trick described, but just systematically use PyObject_GetItem/SetItem/DelItem when working with f_locals.

This might confuse some extension modules that expect PyEval_GetLocals() to return a dict object.

The eval trick is now: eval(code, nondict) --> eval(code, globals(), nondict).

Besides eval() I removed the relevant typecheck from execfile() and the exec statement.  Any other place I am missing?

We might want to still somehow check the type of the locals, to avoid strange errors caused by e.g. eval("a", "b").  PyMapping_Check() is the obvious candidate, but it looks like a hack.

More testing is needed.  test_descrtut.py line 84 now succeeds, unexpectedly, which is interpreted as a test failure.

Needs some docs, too.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-12 02:27

Message:
Logged In: YES 
user_id=80475

Armin, can you whip-up a quick patch so that we can explore
the implications of your suggestion.  Ideally, I would like
to see something like this go in for Py2.4.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-05-22 11:53

Message:
Logged In: YES 
user_id=80475

+1

Amrin's idea provides most of the needed functionality with
zero performance impact.  

Also, using a local dictionary for the application variables
is likely to be just exactly what a programmer would want to do.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-05-19 13:21

Message:
Logged In: YES 
user_id=4771

With minimal work and performance impact, we could allow
frame->f_locals to be something else than a real dictionary.
 It looks like it would be easily possible as long as we
don't touch the time-critical frame->f_globals.  Code
compiled by eval() always uses LOAD_NAME and STORE_NAME,
which is anyway allowed to be a bit slower than LOAD_FAST or
LOAD_GLOBAL.

Note that PyDict_GetItem() & co., as called by
LOAD/STORE_NAME, do a PyDict_Check() anyway.  We could just
replace it with PyDict_CheckExact() and if it fails fall
back to calling the appropriate ob_type->tp_as_mapping
method.  This would turn some of the PyDict_Xxx() API into a
general mapping API, half-way between its current dict-only
usage and the fully abstract PyObject_Xxx() API.

This is maybe a bit strange from the user's point of view,
because eval("expr", mydict) still wouldn't work: only
eval("expr", {}, mydict) would.
which does which does 
Now, there is no way I can think of that code compiled by
eval() could contain LOAD_GLOBAL or STORE_GLOBAL.  The only
way to tell the difference between eval("expr", mydict) and
eval("expr", {}, mydict) appears to be by calling globals()
or somehow inspecting the frame directly.  Therefore it
might be acceptable to redefine the two-argument eval(expr,
dict) to be equivalent not to eval(expr, dict, dict) but
eval(expr, {}, dict).  This hack might be useful enough even
if it won't work with the exec statement.

----------------------------------------------------------------------

Comment By: Philippe Fremy (pfremy)
Date: 2004-03-03 04:40

Message:
Logged In: YES 
user_id=233844

I have exactly the same need as the original poster. The
only difference in my case is that I inhreted a dictionnary.

I want to use the eval() function as a simple expression
evaluator. I have the
follwing dictionnary:
d['a']='1'
d['b']='2'
d['c']='a+b'

I want the following results:
d[a] -> 1 
d[b] -> 2 
d[c] -> 3

To do that, I was planning to use the eval() function and
overloading the __getitem__ of the global or local dictionnary:

class MyDict( dict ) : 
	def __getitem__( self, key ):
		print "__getitem__", key
		val = dict.__getitem__( self, key )
		print "val = '%s'" % val
		return eval( val , self )

But it does not work:
d[a]:
__getitem__ a
val = '1'
-> 1

d[b]:
__getitem__ b
val = '2'
-> 2

d[c]:
__getitem__ c
val = 'e+1'
ERROR
Traceback (most recent call last):
  File "test_parse_jaycos_config.py", line 83, in testMyDict
    self.assertEquals( d['c'], 2 )
  File "parse_config_file.py", line 10, in __getitem__
    return eval( val , self )
  File "<string>", line 0, in ?
TypeError: cannot concatenate 'str' and 'int' objects

d['c'] did fetch the 'a+1' value, which was passed to eval.
However, eval()
tried to evaluate the expression using the content of the
dictionnary instead
of using __getitem__. So it fetched the string '1' for 'a'
instead of the value 1, so the result is not suitable for
the addition.

----------------------------------------------------------------------

Comment By: Michael Chermside (mcherm)
Date: 2004-01-12 17:01

Message:
Logged In: YES 
user_id=99874

Hmm... I like this!

Of course, I am wary of adding *yet another* special double-
underscore function to satisfy a single special purpose, but 
this would satisfy all of *my* needs, and without any 
performance impact for lookups that are FOUND. Lookups that 
are NOT found would have a slight performance degrade (I 
know better than to speculate about the size of the effect 
without measuring it). I'm not really sure what percentage of 
dict lookups succeed.

At any rate, what are these "other contexts" you mention in 
which a __keyerror__ would "also be useful"? Because if we 
can find other places where it is useful, that significantly 
improves the usefulness.

OTOH, if the tests can be done ONLY for eval (I don't really 
think that's possible), then I'm certain no one cares about the 
performance of eval.

----------------------------------------------------------------------

Comment By: Gregory Smith (gregsmith)
Date: 2004-01-12 14:38

Message:
Logged In: YES 
user_id=292741

This may be  a compromise solution, which could also be
useful in other contexts: What if the object passed
is derived from dict - presumably that doesn't help
because the point is to do low-level calls to the
actual dict's lookup functions.
 Now, suppose we modify the basic dict type, so
that before throwing
a KeyError, it checks to see if it is really a derived
object with a method __keyerror__, and if so, calls
that and returns its result (or lets it throw)?
Now you can make objects that look like dicts, and
act like them at the low level, but can automatically
populate themselves when non-existent keys are requested.
Of course, __keyerror__('x') does not have to make
an 'x' entry in the dict; it could make no change, or
add several entries, depending on the desired semantics
regarding future lookups. It could be set up so that
every lookup fails and is forwarded by __keyerror__ to
the __getitem__ of another object, for instance.

The cost of this to the 'normal' dict lookup is that
the need to do PyDict_CheckExact() each time
a lookup fails.

----------------------------------------------------------------------

Comment By: Michael Chermside (mcherm)
Date: 2003-12-16 10:37

Message:
Logged In: YES 
user_id=99874

I'll just add my voice as somebody who would find this to 
be "darn handy" if it could ever done without noticably 
impacting the speed of python code that DOESN'T use eval().

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2000-09-22 23:18

Message:
Added to PEP 42.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2000-09-22 14:42

Message:
Changed Group to Feature Request.  Should be added to PEP 42 (I'll do that if nobody beats me to it).

CPython requires a genuine dict now for speed.  I believe JPython allows any mapping object.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=215126&group_id=5470