parsing an Excel formula with the re module

MRAB python at
Tue Jan 5 23:37:08 CET 2010

Mensanator wrote:
> On Jan 5, 12:35 pm, MRAB <pyt... at> wrote:
>> vsoler wrote:
>>> Hello,
>>> I am accessing an Excel file by means of Win32 COM technology.
>>> For a given cell, I am able to read its formula. I want to make a map
>>> of how cells reference one another, how different sheets reference one
>>> another, how workbooks reference one another, etc.
>>> Hence, I need to parse Excel formulas. Can I do it by means only of re
>>> (regular expressions)?
>>> I know that for simple formulas such as "=3*A7+5" it is indeed
>>> possible. What about more complex formulas that include functions,
>>> sheet names and possibly other *.xls files?
>>> For example    "=Book1!A5+8" should be parsed into ["=","Book1", "!",
>>> "A5","+","8"]
>>> Can anybody help? Any suggestions?
>> Do you mean "how" or do you really mean "whether", i.e., get a list of the
>> other cells that are referred to by a certain cell? For example,
>> "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5"].
> Ok, although "Book1" would be the default name of a workbook, with default
> worksheets labeled "Sheet1", "Sheet2", etc.
> If I had a worksheet named "Sheety" that wanted to reference a cell on
> "Sheetx" OF THE SAME WORKBOOK, it would be =Sheetx!A7. If the reference was
> to a completely different workbook (say Book1 with worksheets labeled
> "Sheet1", "Sheet2") then the cell might have =[Book1]Sheet1!A7.
> And don't forget the $'s! You may see =[Book1]Sheet1!$A$7.

I forgot about the dollars! In that case, the regex is:

     import re
     references = re.findall(r"((?:\w+!)?\$?\b[A-Za-z]+\$?\d+)\b", formula)
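The findall call returns each reference as raw text such as "Sheet1!$A$7", and it also treats "Book1" inside "[Book1]" as a cell address, as well as function names such as LOG10. For a reference map it may help to split each match into workbook, sheet and cell and drop the "$" markers. Below is a minimal sketch under stated assumptions: unquoted sheet names only, A1-style references, and columns of one to three letters (Excel 2007-style A to XFD). REF_RE and cell_references are hypothetical names, not from the thread or any library.

```python
import re

# Assumed reference syntax (simplified, not exhaustive):
#   optional [Workbook] prefix, optional unquoted Sheet! prefix,
#   then a cell address whose column and/or row may carry a "$".
REF_RE = re.compile(
    r"""
    (?:\[(?P<book>[^\]]+)\])?    # optional external workbook, e.g. [Book1]
    (?:(?P<sheet>\w+)!)?         # optional sheet name, e.g. Sheet1!
    \$?\b(?P<col>[A-Za-z]{1,3})  # column letters, optionally absolute ($A)
    \$?(?P<row>\d+)\b            # row number, optionally absolute ($7)
    (?!\()                       # reject function names such as LOG10(
    """,
    re.VERBOSE,
)


def cell_references(formula):
    """Return a (workbook, sheet, cell) tuple for each reference in formula.

    Missing parts are None, and "$" markers are dropped so that
    A7, $A7, A$7 and $A$7 all map to the same cell "A7".
    """
    return [
        (m.group("book"), m.group("sheet"), m.group("col").upper() + m.group("row"))
        for m in REF_RE.finditer(formula)
    ]


print(cell_references("=[Book1]Sheet1!$A$7"))  # [('Book1', 'Sheet1', 'A7')]
```

Applied to each cell's formula, the tuples could be collected in a dict keyed by the source cell to build the reference map described at the top of the thread. Quoted sheet names such as 'My Sheet'!A1, R1C1-style formulas and defined names are not handled, and a range like A1:B2 comes back as its two corner cells.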

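The original question asked for a full token list, for example "=Book1!A5+8" split into ["=", "Book1", "!", "A5", "+", "8"], rather than only the referenced cells. A minimal sketch of a regex tokenizer follows, assuming a simplified token grammar (string literals, [Workbook] prefixes, names and cell addresses, numbers, and one- or two-character operators) rather than Excel's full formula grammar. TOKEN_RE and tokenize are hypothetical names. It only splits the text; it does not tell a cell address from a sheet or function name, so the reference pattern is still needed for that.

```python
import re

# A simplified token grammar (an assumption, not Excel's full grammar).
# Alternatives are tried in order, so multi-character forms come first.
TOKEN_RE = re.compile(
    r'''
      "(?:[^"]|"")*"              # string literal; Excel writes a literal " as ""
    | \[[^\]]*\]                  # external-workbook prefix, e.g. [Book1]
    | \$?[A-Za-z_]\w*\$?\d*       # names, functions and cells such as A5 or $A$7
    | \d+(?:\.\d+)?               # numeric literal
    | <=|>=|<>                    # two-character comparison operators
    | \S                          # any other single character: = + - * / ! ( ) ,
    ''',
    re.VERBOSE,
)


def tokenize(formula):
    """Split a formula string into tokens; whitespace is skipped."""
    return TOKEN_RE.findall(formula)


print(tokenize("=Book1!A5+8"))  # ['=', 'Book1', '!', 'A5', '+', '8']
```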
