Simple and safe evaluator
Simon Forman
sajmikins at gmail.com
Thu Jun 19 19:10:09 EDT 2008
On Jun 16, 8:32 pm, bvdp <b... at mellowood.ca> wrote:
> sween... at acm.org wrote:
> > On Jun 17, 8:02 am, bvdp <b... at mellowood.ca> wrote:
>
> >> Thanks. That was easy :)
>
> >>> The change to the _ast version is left as an exercise to the reader ;)
> >> And I have absolutely no idea on how to do this. I can't even find the
> >> _ast import file on my system. I'm assuming that the _ast definitions
> >> are buried in the C part of python, but that is just a silly guess.
>
> >> Bob.
>
> > If you just need numeric expressions with a small number of functions,
> > I would suggest checking the expression string first with a simple
> > regular expression, then using the standard eval() to evaluate the
> > result. This blocks the attacks mentioned above, and is simple to
> > implement. This will not work if you want to allow string values in
> > expressions though.
>
> > import re
> > def safe_eval( expr, safe_cmds=[] ):
> >     toks = re.split( r'([a-zA-Z_\.]+|.)', expr )
> >     bad = [t for t in toks if len(t)>1 and t not in safe_cmds]
> >     if not bad:
> >         return eval( expr )
>
> Yes, this appears to be about as good (better?) an idea as any.
> Certainly beats writing my own recursive descent parser for this :)
>
> And it is not dependent on python versions. Cool.
>
> I've run a few tests with your code and it appears to work just fine.
> Just a matter of populating the safe_cmds[] array and putting in some
> error traps. Piece of cake. And should be fast as well.
>
> Thanks!!!
>
> Bob.
FWIW, I got around to implementing a function that checks if a string
is safe to evaluate (that it consists only of numbers, operators, and
"(" and ")"). Here it is. :)

import cStringIO, tokenize

def evalSafe(source):
    '''
    Return True if a source string is composed only of numbers, operators
    or parentheses, otherwise return False.
    '''
    try:
        src = cStringIO.StringIO(source).readline
        src = tokenize.generate_tokens(src)
        src = (token for token in src if token[0] is not tokenize.NL)
        for token in src:
            ttype, tstr = token[:2]
            if (
                tstr in "()" or
                ttype in (tokenize.NUMBER, tokenize.OP)
                and not tstr == ','  # comma is an OP.
                ):
                continue
            raise SyntaxError("unsafe token: %r" % tstr)
    except (tokenize.TokenError, SyntaxError):
        return False
    return True

for s in (
    '(1 2)',  # Works, but isn't math..
    '1001 * 99 / (73.8 ^ 88 % (88 + 23e-10 ))',  # Works
    '1001 * 99 / (73.8 ^ 88 % (88 + 23e-10 )',
    # Raises TokenError due to missing close parenthesis.
    '(1, 2)',  # Raises SyntaxError due to comma.
    'a * 21',  # Raises SyntaxError due to identifier.
    'import sys',  # Raises SyntaxError.
    ):
    print evalSafe(s), '<--', repr(s)
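
The code above targets Python 2 (cStringIO, print statement). For anyone on Python 3, here is a rough sketch of the same token check. It is not part of the original post: the name is_safe_expression is made up for illustration, io.StringIO stands in for cStringIO, and the structural tokens (NL, NEWLINE, ENDMARKER) are skipped by type rather than by the empty-string test. Like the original, it only inspects tokens, so it accepts anything tokenized as an OP other than the comma.

```python
import io
import tokenize


def is_safe_expression(source):
    """Return True if source holds only numbers, operators and parentheses.

    Hypothetical Python 3 adaptation of evalSafe(); a sketch, not a
    vetted security boundary.
    """
    # Token types that carry no content of their own and can be skipped.
    structural = (tokenize.NL, tokenize.NEWLINE, tokenize.ENDMARKER)
    try:
        readline = io.StringIO(source).readline
        for tok in tokenize.generate_tokens(readline):
            if tok.type in structural:
                continue
            if tok.type == tokenize.NUMBER:
                continue
            # Parentheses and operators are OP tokens; so is the comma,
            # which is rejected explicitly.
            if tok.type == tokenize.OP and tok.string != ",":
                continue
            return False  # name, string, keyword, comment, comma, ...
    except (tokenize.TokenError, SyntaxError):
        # Raised, for example, on an unclosed parenthesis.
        return False
    return True


for s in (
    "(1 2)",
    "1001 * 99 / (73.8 ^ 88 % (88 + 23e-10 ))",
    "1001 * 99 / (73.8 ^ 88 % (88 + 23e-10 )",
    "(1, 2)",
    "a * 21",
    "import sys",
):
    print(is_safe_expression(s), "<--", repr(s))
```

As with the original, a True result only says the string is built from numbers and operators. It does not mean eval() will succeed: '(1 2)' passes the check but is not a valid expression.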