simple string parsing ?

Peter Abel PeterAbel at gmx.net
Thu Sep 9 17:06:36 EDT 2004


TAG <tonino.greco at gmail.com> wrote in message news:<mailman.3110.1094742044.5135.python-list at python.org>...
> Thanks for this  -I will lookit TPG :)
> 
> Tonino
> 
> 
> On Thu, 9 Sep 2004 16:20:15 +0200, Marc Boeren <m.boeren at guidance.nl> wrote:
> > 
> > 
> > 
> > > =+GC142*(GC94+0.5*sum(GC96:GC101))
> > >
> > > and I want to get :
> > >
> > > ['=', '+', 'GC142', '*', '(', 'GC94', '+', '0.5', '*', 'sum', '(',
> > > 'GC96', ':', 'GC101', ')', ')']
> > >
> > > how can I get this ??????
> > 
> > The quick and dirty way: you have a formula containing a lot of
> > delimiters. Any part of the string that is not a delimiter is grouped
> > into a substring. So:
> > 
> > >>> formula = '=+GC142*(GC94+0.5*sum(GC96:GC101))'
> > >>> delimiters = '=+*():'
> > >>> parts = []
> > >>> appending = False
> > >>> for char in formula:
> > ...   if char in delimiters:
> > ...     parts+= [char]
> > ...     appending = False
> > ...   else:
> > ...     if appending:
> > ...       parts[-1]+= char
> > ...     else:
> > ...       parts+= [char]
> > ...       appending = True
> > ...
> > >>> parts
> > ['=', '+', 'GC142', '*', '(', 'GC94', '+', '0.5', '*', 'sum', '(',
> > 'GC96', ':', 'GC101', ')', ')']
> > 
> > This is simply to get you what you want, if you wish to use this formula
> > to actually compute something, it may be wise to dive into the various
> > parser packages, I found TPG (Toy Parser Generator) easy to use for
> > simple things...
> > 
> > Cheerio, Marc.
> >

Among other things, I have to code geometric algorithms in assembler, e.g.
     xnew=x*cos(phi)-y*sin(phi)
     ynew=y*cos(phi)+x*sin(phi)

This is done by macro calls that feed an arithmetic coprocessor
with a 4-register-deep stack, which is organized in RPN (reverse
Polish notation, "UPN" in German).
The idea was to write the formula as a comment and let Python
generate the assembler macro calls.
Here is my first approach to the first step, which
corresponds to your problem:

>>> import shlex, StringIO
>>> formula='y=+GC142*(GC94+0.5*sum(GC96:GC101))'
>>> infile = StringIO.StringIO(formula)
>>> x = shlex.shlex(infile)
>>> input=[x.get_token()]
>>> while input[-1]:
... 	if input[-1]=='.':
... 		input[-2]=input[-2]+'.'+x.get_token()
... 		del input[-1]
... 	input.append(x.get_token())
... 
>>> input
['y', '=', '+', 'GC142', '*', '(', 'GC94', '+', '0.5', '*', 'sum',
'(', 'GC96', ':', 'GC101', ')', ')', '']
>>> 
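
For readers on Python 3, where the StringIO module no longer exists, roughly the same tokenizing can be sketched as below. The function name tokenize is hypothetical. Instead of gluing '0', '.', '5' back together afterwards, this variant adds '.' to the lexer's wordchars, which is a different trick from the loop above and would also merge things like 'a.b' into one token.

```python
import shlex

def tokenize(formula):
    """Tokenize a formula with shlex, treating '.' as part of a word."""
    lexer = shlex.shlex(formula)   # shlex accepts a plain string as input
    lexer.wordchars += '.'         # keeps '0.5' together as one token
    return list(lexer)             # iteration stops at end of input

tokens = tokenize('=+GC142*(GC94+0.5*sum(GC96:GC101))')
print(tokens)
```

Iterating over the lexer also avoids the trailing empty string that get_token() returns at the end of the input.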

Quite another approach was to use the Python compiler,
with the restriction that my formula had to follow Python
syntax, which was no problem for me.
So if you replace the divide sign ":" with "/", this works too.

>>> import compiler
>>> formula='y=+GC142*(GC94+0.5*sum(GC96/GC101))'
>>> compiler.parse(formula)
Module(None, Stmt([Assign([AssName('y', 'OP_ASSIGN')],
Mul((UnaryAdd(Name('GC142')), Add((Name('GC94'), Mul((Const(0.5),
CallFunc(Name('sum'), [Div((Name('GC96'), Name('GC101')))], None,
None))))))))]))
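
The compiler module is gone in Python 3. As far as I know, the closest replacement is the standard ast module, so a rough equivalent of the session above might look like this. The node names differ from the compiler.ast ones shown above, and the exact dump text varies between Python versions, so no output is shown here.

```python
import ast

formula = 'y=+GC142*(GC94+0.5*sum(GC96/GC101))'

# ast.parse returns a Module node; its body holds one Assign statement.
tree = ast.parse(formula)
assign = tree.body[0]

# ast.dump gives a nested, text-only view of the tree.
print(ast.dump(assign))
```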

The result represents function calls that process the formula. The nesting
of the function calls is arithmetically correct. So if you code a
function for each function name, you can do whatever you want
in those functions.
BTW, the nesting is organized in RPN ("UPN") order, which helped me very much.
Maybe this approach is a little bit silly, because Python gurus
will tell me that the result of compiler.parse(...) is a
compiler.ast instance with which you can do things in a more Pythonic
manner, but I'm not that deep into the compiler.ast philosophy, and
the results of what I did were fully satisfying for me.
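
To illustrate the "one function per node name" idea with today's ast module, here is a small sketch that walks the tree operands first and emits an RPN-style list. It assumes Python 3.8 or later (for ast.Constant). The class name RPNEmitter and the 'CALL', 'NEG' and 'STORE' markers are invented for this example; they are not my real macro names.

```python
import ast

class RPNEmitter(ast.NodeVisitor):
    """Emit operands first, then the operator (postfix order)."""
    OPS = {ast.Add: '+', ast.Sub: '-', ast.Mult: '*', ast.Div: '/'}

    def __init__(self):
        self.out = []

    def visit_Assign(self, node):
        self.visit(node.value)                    # compute the right side first
        self.out.append('STORE ' + node.targets[0].id)

    def visit_BinOp(self, node):
        self.visit(node.left)
        self.visit(node.right)
        self.out.append(self.OPS[type(node.op)])  # KeyError on other operators

    def visit_UnaryOp(self, node):
        self.visit(node.operand)
        if isinstance(node.op, ast.USub):         # unary plus emits nothing
            self.out.append('NEG')

    def visit_Call(self, node):
        for arg in node.args:
            self.visit(arg)
        self.out.append('CALL ' + node.func.id)

    def visit_Name(self, node):
        self.out.append(node.id)

    def visit_Constant(self, node):
        self.out.append(repr(node.value))

def to_rpn(formula):
    emitter = RPNEmitter()
    emitter.visit(ast.parse(formula))
    return emitter.out

rpn = to_rpn('xnew=x*cos(phi)-y*sin(phi)')
print(rpn)
# ['x', 'phi', 'CALL cos', '*', 'y', 'phi', 'CALL sin', '*', '-', 'STORE xnew']
```

Each emitted item would then map to one macro call for the stack machine. The sketch ignores the 4-register stack depth limit entirely.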

Hope I could give you some ideas.

Regards
Peter


