parsing an Excel formula with the re module

Tim Chase python.list at tim.thechases.com
Tue Jan 5 14:49:13 EST 2010


vsoler wrote:
> Hence, I need to parse Excel formulas. Can I do it by means only of re
> (regular expressions)?
> 
> I know that for simple formulas such as "=3*A7+5" it is indeed
> possible. What about complex formulas that include functions,
> sheet names and possibly other *.xls files?

Where things start getting ugly is when you have nested function 
calls, such as

   =if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14, (Min(C1:C25)+3)*18,Max(B1:B25)))

Regular expressions don't cope well with nested parens (especially 
when the nesting depth is arbitrary, as it can be in a formula), 
so I'd suggest going for a full-blown parsing solution like 
pyparsing.
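
To illustrate what a real parser buys you over a single regexp, 
here is a minimal sketch of a hand-written recursive-descent 
parser. It is not pyparsing and not a complete Excel grammar: the 
token pattern, the names tokenize, Parser and parse_formula, and 
the tuple-based tree are all assumptions made for this example. 
It handles only numbers, cell references and ranges, function 
calls, parentheses, and binary operators (no sheet names, 
strings, or unary minus). The re module is still used, but only 
to split the text into tokens; the nesting is handled by 
recursion.

```python
import re

# Hypothetical token pattern for a small subset of formula syntax.
TOKEN = re.compile(r"""
    \s*(?:
        (?P<num>\d+(?:\.\d+)?)
      | (?P<ref>[A-Za-z]+\d+(?::[A-Za-z]+\d+)?)
      | (?P<name>[A-Za-z_]\w*)
      | (?P<op><=|>=|<>|[-+*/<>=(),])
    )""", re.VERBOSE)


def tokenize(text):
    """Split a formula into (kind, text) pairs."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self, value=None):
        kind, text = self.peek()
        if kind is None or (value is not None and text != value):
            raise ValueError(f"expected {value!r}, got {text!r}")
        self.i += 1
        return kind, text

    def binary(self, operand, ops):
        # Left-associative chain: operand (op operand)*
        node = operand()
        while self.peek()[0] == "op" and self.peek()[1] in ops:
            op = self.take()[1]
            node = (op, node, operand())
        return node

    def comparison(self):
        return self.binary(self.sum, ("<", ">", "<=", ">=", "=", "<>"))

    def sum(self):
        return self.binary(self.product, ("+", "-"))

    def product(self):
        return self.binary(self.atom, ("*", "/"))

    def atom(self):
        kind, text = self.take()
        if kind == "num":
            return float(text)
        if kind == "ref":
            return ("ref", text.upper())
        if kind == "name":
            # Function call: arguments recurse back into comparison().
            self.take("(")
            args = []
            if self.peek()[1] != ")":
                args.append(self.comparison())
                while self.peek()[1] == ",":
                    self.take(",")
                    args.append(self.comparison())
            self.take(")")
            return ("call", text.upper(), args)
        if text == "(":
            node = self.comparison()
            self.take(")")
            return node
        raise ValueError(f"unexpected token {text!r}")


def parse_formula(text):
    """Parse '=...' into a nested tuple tree."""
    parser = Parser(tokenize(text))
    parser.take("=")
    tree = parser.comparison()
    if parser.peek()[0] is not None:
        raise ValueError("trailing tokens")
    return tree
```

Calling parse_formula on the nested example above should give a 
("call", "IF", [...]) node whose third argument is itself an IF 
call, to whatever depth the formula nests. A library such as 
pyparsing lets you declare this kind of grammar instead of 
writing the recursion by hand.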

If you have fair control over what can be contained in the 
formulas and you know they won't contain nested parens/functions, 
you might be able to formulate some sort of "kinda, sorta, maybe 
parses some forms of formulas" regexp.
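
As a sketch of what such a limited regexp might look like, the 
following accepts only "flat" formulas: numbers and single-cell 
references joined by +, -, * and /, with no parentheses, 
functions, ranges, or sheet names. The names OPERAND, 
FLAT_FORMULA and split_flat_formula are made up for this 
example, and the pattern is an assumption about what "simple" 
means, not a description of Excel's actual syntax.

```python
import re

# A number (e.g. 3 or 3.14) or a single cell reference (e.g. A7).
OPERAND = r"(?:\d+(?:\.\d+)?|[A-Za-z]+\d+)"

# "=" followed by operands separated by one arithmetic operator each.
FLAT_FORMULA = re.compile(rf"=\s*{OPERAND}(?:\s*[-+*/]\s*{OPERAND})*\s*")


def split_flat_formula(formula):
    """Return operands and operators in order, or None if not flat."""
    if not FLAT_FORMULA.fullmatch(formula):
        return None
    return re.findall(rf"{OPERAND}|[-+*/]", formula)


print(split_flat_formula("=3*A7+5"))          # ['3', '*', 'A7', '+', '5']
print(split_flat_formula("=Sum(A1:A25)>42"))  # None
```

Anything with a function call or parentheses is rejected rather 
than half-parsed, which makes it easy to tell when a formula 
falls outside what the regexp can handle.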

-tkc




