Extracting attributes from compiled python code or parse trees
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Mon Jul 23 18:43:33 EDT 2007
On Mon, 23 Jul 2007 18:13:05 -0300, Matteo <mahall at ncsa.uiuc.edu> wrote:
> I am trying to get Python to extract attributes in full dotted form
> from compiled expression. For instance, if I have the following:
>
> param = compile('a.x + a.y','','single')
>
> then I would like to retrieve the list consisting of ['a.x','a.y'].
>
> The reason I am attempting this is to try and automatically determine
> data dependencies in a user-supplied formula (in order to build a
> dataflow network). I would prefer not to have to write my own parser
> just yet.
If it is an expression, I think you should use "eval" instead of "single"
as the third argument to compile.
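A minimal sketch of what "eval" mode gives you, written for a current Python 3 interpreter rather than the 2.5 used in this thread. The '<formula>' filename and the SimpleNamespace object standing in for `a` are illustrative assumptions, not anything from the original question:

```python
from types import SimpleNamespace

# Compile the formula as an expression so that eval() returns its value.
code = compile('a.x + a.y', '<formula>', 'eval')

# Hypothetical stand-in for the user's object 'a'.
a = SimpleNamespace(x=1, y=2)
result = eval(code, {'a': a})
print(result)

# co_names lists every name and attribute the code refers to, but flat:
# it does not record that x and y were reached through a.
print(sorted(code.co_names))
```

Because co_names is flat, the code object does not directly yield 'a.x' and 'a.y', which is why the rest of this reply looks at parse trees and tokens.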
> Alternatively, I've looked at the parser module, but I am experiencing
> some difficulties in that the symbol list does not seem to match that
> listed in the python grammar reference (not surprising, since I am
> using python2.5, and the docs seem a bit dated)
Yes, the grammar.txt in the docs is a bit outdated (or perhaps it's a
simplified one); see the Grammar/Grammar file in the Python source
distribution.
> In particular:
>
> >>> import parser
> >>> import pprint
> >>> import symbol
> >>> tl=parser.expr("a.x").tolist()
> >>> pprint.pprint(tl)
>
> [258,
>  [326,
>   [303,
>    [304,
>     [305,
>      [306,
>       [307,
>        [309,
>         [310,
>          [311,
>           [312,
>            [313,
>             [314,
>              [315,
>               [316, [317, [1, 'a']], [321, [23, '.'], [1, 'x']]]]]]]]]]]]]]]]],
>  [4, ''],
>  [0, '']]
>
> >>> print symbol.sym_name[316]
> power
>
> Thus, for some reason, 'a.x' seems to be interpreted as a power
> expression, and not an 'attributeref' as I would have anticipated (in
> fact, the symbol module does not seem to contain an 'attributeref'
> symbol)
Using this little helper function to translate symbols and tokens:
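import symbol, token

# Map both grammar symbol numbers and token numbers to their names.
names = symbol.sym_name.copy()
names.update(token.tok_name)

def human_readable(lst):
    # Replace the numeric code at the head of each node, in place.
    lst[0] = names[lst[0]]
    for item in lst[1:]:
        if isinstance(item, list):
            human_readable(item)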
the same tree becomes:
['eval_input',
 ['testlist',
  ['test',
   ['or_test',
    ['and_test',
     ['not_test',
      ['comparison',
       ['expr',
        ['xor_expr',
         ['and_expr',
          ['shift_expr',
           ['arith_expr',
            ['term',
             ['factor',
              ['power',
               ['atom', ['NAME', 'a']],
               ['trailer', ['DOT', '.'], ['NAME', 'x']]]]]]]]]]]]]]]],
 ['NEWLINE', ''],
 ['ENDMARKER', '']]
which is correct if you look at the symbols in the (right) Grammar file.
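Given a tree in that readable form, dotted references could be collected by looking for 'power' nodes whose 'atom' is a plain NAME followed by one or more '.' NAME trailers. A rough sketch under that assumption: `attribute_refs` is a made-up helper name, and the tree is rebuilt by hand here so the example runs without the parser module:

```python
# Rebuild the tree shown above: a chain of single-child nodes around 'power'.
power = ['power',
         ['atom', ['NAME', 'a']],
         ['trailer', ['DOT', '.'], ['NAME', 'x']]]
chain = ['testlist', 'test', 'or_test', 'and_test', 'not_test', 'comparison',
         'expr', 'xor_expr', 'and_expr', 'shift_expr', 'arith_expr', 'term',
         'factor']
node = power
for sym in reversed(chain):
    node = [sym, node]
tree = ['eval_input', node, ['NEWLINE', ''], ['ENDMARKER', '']]

def attribute_refs(node, found=None):
    """Collect dotted names such as 'a.x' from a human-readable tree."""
    if found is None:
        found = []
    if node[0] == 'power':
        atom = node[1]
        # Only handle the simple case: the atom is a single NAME token.
        if len(atom) == 2 and atom[1][0] == 'NAME':
            parts = [atom[1][1]]
            for trailer in node[2:]:
                # Stop at the first trailer that is not '.' NAME
                # (a call, a subscript, or the '**' operator).
                if trailer[0] != 'trailer' or trailer[1][0] != 'DOT':
                    break
                parts.append(trailer[2][1])
            if len(parts) > 1:
                found.append('.'.join(parts))
    # Recurse into child nodes; token leaves hold strings, not lists.
    for child in node[1:]:
        if isinstance(child, list):
            attribute_refs(child, found)
    return found

print(attribute_refs(tree))  # ['a.x']
```

This only covers names reached directly through dots; anything after a call or subscript is ignored.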
But if you are only interested in things like a.x, it may be a lot
simpler to use the tokenize module and look for the NAME and OP tokens as
they appear in the source expression.
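A minimal sketch of that token-based approach, written for Python 3 (the thread itself used Python 2.5, where the tokenize API differs slightly). The function name `dotted_names` and the choice to report only names containing at least one dot are assumptions made for illustration:

```python
import io
import keyword
import tokenize

def dotted_names(source):
    """Return dotted names such as 'a.x', in order of first appearance."""
    found = []
    current = []       # parts of the name being assembled, e.g. ['a', 'x']
    after_dot = False  # True when the previous token was a '.'

    def flush():
        # Record the finished name if it contains at least one dot.
        name = '.'.join(current)
        if len(current) > 1 and name not in found:
            found.append(name)
        del current[:]

    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        is_name = (tok.type == tokenize.NAME
                   and not keyword.iskeyword(tok.string))
        is_dot = tok.type == tokenize.OP and tok.string == '.'
        if is_name and after_dot:
            # Extend the current name. A dot that follows ')' or ']' has
            # no current name, so that attribute is skipped.
            if current:
                current.append(tok.string)
            after_dot = False
        elif is_name:
            flush()
            current.append(tok.string)
        elif is_dot:
            after_dot = True
        else:
            flush()
            after_dot = False
    flush()
    return found

print(dotted_names('a.x + a.y'))  # ['a.x', 'a.y']
```

Working on tokens avoids the deep parse tree entirely, at the cost of not understanding structure: an attribute taken from a call result, as in f(x).y, is simply not reported.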
--
Gabriel Genellina