reverse engineering Excel spreadsheet
Laurent Pointal
laurent.pointal at wanadoo.fr
Sun Apr 1 13:43:04 EDT 2007
Duncan Smith wrote:
> Hello,
> I am currently implementing (mainly in Python) 'models' that come
> to me as Excel spreadsheets, with little additional information. I am
> expected to use these models in a web application. Some contain many
> worksheets and various macros.
>
> What I'd like to do is extract the data and business logic so that I can
> figure out exactly what these models actually do and code it up. An
> obvious (I think) idea is to generate an acyclic graph of the cell
> dependencies so that I can identify which cells contain only data (no
> parents) and those that depend on other cells. If I could also extract
> the relationships (functions), then I could feasibly produce something
> in pure Python that would mirror the functionality of the original
> spreadsheet (using e.g. Matplotlib for plots and more reliable RNGs /
> statistical functions).
>
> The final application will be running on a Linux server, but I can use a
> Windows box (i.e. win32all) for processing the spreadsheets (hopefully
> not manually). Any advice on the feasibility of this, and how I might
> achieve it would be appreciated.
>
> I assume there are plenty of people who have a better knowledge of e.g.
> COM than I do. I suppose an alternative would be to convert to Open
> Office and use PyUNO, but I have no experience with PyUNO and am not
> sure how much more reliable the statistical functions of Open Office
> are. At the end of the day, the business logic will not generally be
> complex, it's extracting it from the spreadsheet that's awkward. Any
> advice appreciated. TIA. Cheers.
>
> Duncan
As far as I remember, the xlrd package includes documentation on the
Excel file format. With xlrd, you don't need to drive Excel via COM to
get at the data in the document:
http://www.lexicon.net/sjmachin/xlrd.htm
You may also want to look at pyExcelerator:
http://sourceforge.net/projects/pyexcelerator/
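
On the dependency-graph idea from the quoted message, here is a rough sketch of how it could look once the cell contents have been pulled out of the workbook by whatever means. It assumes the formulas are available as plain strings starting with "=", and it only recognises simple A1-style references: no sheet names, a range like A1:A3 contributes only its two endpoints, and a function name such as LOG10 would be mistaken for a cell. The names build_graph, evaluation_order and the sample cells dict are made up for illustration and are not part of xlrd or pyExcelerator.

```python
import re

# Matches simple A1-style references such as B2 or $C$10.
# Assumption: no sheet prefixes and no special handling of ranges.
CELL_REF = re.compile(r"\$?([A-Z]{1,3})\$?([0-9]+)")


def build_graph(cells):
    """Map each cell name to the set of cells its formula refers to."""
    graph = {}
    for name, content in cells.items():
        if isinstance(content, str) and content.startswith("="):
            graph[name] = {col + row for col, row in CELL_REF.findall(content)}
        else:
            graph[name] = set()  # plain data: no parents
    return graph


def evaluation_order(graph):
    """Return the cells ordered so that each one follows its parents."""
    order = []
    state = {}

    def visit(cell):
        if state.get(cell) == "done":
            return
        if state.get(cell) == "active":
            raise ValueError("circular reference at " + cell)
        state[cell] = "active"
        # Referenced cells missing from the graph are treated as empty data.
        for parent in sorted(graph.get(cell, ())):
            visit(parent)
        state[cell] = "done"
        order.append(cell)

    for cell in sorted(graph):
        visit(cell)
    return order


# Hypothetical sheet contents, keyed by cell name.
cells = {"A1": 10, "A2": 32, "B1": "=A1+A2", "C1": "=B1*2+$A$1"}

graph = build_graph(cells)
data_cells = sorted(c for c, parents in graph.items() if not parents)
order = evaluation_order(graph)

print("data only:", data_cells)
print("evaluation order:", order)
```

Cells with an empty parent set are the pure data inputs, and the evaluation order is the sequence in which a hand-written Python version of the model would have to compute things. The recursive walk is fine for small sheets; a very deep dependency chain would need an iterative version.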