Simple eval
George Sakkis
george.sakkis at gmail.com
Sun Nov 18 22:48:50 EST 2007
On Nov 18, 8:24 pm, greg <g... at cosc.canterbury.ac.nz> wrote:
> Tor Erik Sønvisen wrote:
> > Comments, speedups, improvements in general, etc are appreciated.
>
> You're doing a lot of repeated indexing of token[0]
> and token[1] in your elif branches. You might gain some
> speed by fetching these into locals before entering the
> elif chain.
>
> Also you could try ordering the branches so that the
> most frequent cases come first. Probably strings and
> numbers first, then the various kinds of bracket.
> This would also give you a chance to avoid pulling out
> token[1] until you need it.
>
> token[1].startswith('u'): It's probably faster to
> use an index to get the first character, if you know
> that the string is not empty.
I tried several of these micro optimizations but there was very little
improvement; eval() remains practically 5 times faster. The major
bottleneck is generate_tokens(); replacing simple_eval() with the
following is still 3 times slower than eval():
def simple_eval(source):
for _ in generate_tokens(StringIO(source).readline): pass
That's not very surprising since generate_tokens() is quite general
and yields more information than necessary. Clearly if performance is
critical you should write your own simple_generate_tokens(), possibly
as a cut down version of the generic one.
Leaving performance aside, below is a slightly more compact version.
The almost identical code for handling lists and tuples is factored
out in _iter_sequence(). The 'token' parameter here is the actual
token, not the 5-tuple yielded by generate_tokens(). Finally this
version handles negative and long numbers (which the original didn't):
from string import digits
from cStringIO import StringIO
from tokenize import generate_tokens, NL
_consts = {'None': None, 'False': False, 'True': True}
def simple_eval(source):
itertokens = generate_tokens(StringIO(source).readline)
next = (token[1] for token in itertokens
if token[0] is not NL).next
res = atom(next, next())
if next():
raise SyntaxError("bogus data after expression")
return res
def atom(next, token):
def _iter_sequence(end):
token = next()
while token != end:
yield atom(next, token)
token = next()
if token == ',':
token = next()
firstchar = token[0]
if token in _consts:
return _consts[token]
elif token[-1] == 'L':
return long(token)
elif firstchar in digits:
return float(token) if '.' in token else int(token)
elif firstchar in '"\'':
return token[1:-1].decode('string-escape')
elif firstchar == 'u':
return token[2:-1].decode('unicode-escape')
elif token == '-':
return -atom(next, next())
elif token == '(':
return tuple(_iter_sequence(')'))
elif token == '[':
return list(_iter_sequence(']'))
elif token == '{':
out = {}
token = next()
while token != '}':
key = atom(next, token)
next() # Skip key-value delimiter (':')
token = next()
out[key] = atom(next, token)
token = next()
if token == ',':
token = next()
return out
raise SyntaxError('malformed expression (%r)' % token)
Regards,
George
More information about the Python-list
mailing list