[Tutor] Parsing Word Docs
sanelson at gmail.com
Thu Mar 8 15:40:22 CET 2007
I have a directory containing a load of Word documents, say 100 or so,
which is updated every hour.
I want a CGI script that effectively does a grep on the Word docs and
returns each doc that matches the search term.
I've had a look at doing this by looking at each binary file and
reimplementing strings(1) to capture useful info. I've also read that
one can treat a Word doc as a COM object. Am I right in thinking that
I can't do this in Python under Unix?
What other ways are there? Or is binary parsing the way to go?
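
For reference, here is a rough sketch of the strings(1)-style approach described above. It is only an illustration under some assumptions: the files are binary .doc files, the text of interest is stored either as plain ASCII or as UTF-16LE (Word commonly uses both), and a case-insensitive substring match is good enough. The names extract_strings and grep_docs are made up for this example, and it is written for Python 3. It will miss text that Word stores in other encodings and will pick up some non-text noise.

```python
import re
from pathlib import Path

# Runs of 4 or more printable ASCII bytes, like the default of strings(1).
ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# The same idea for UTF-16LE text: a printable byte followed by a NUL byte.
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def extract_strings(data):
    """Return the printable runs found in raw bytes (ASCII, then UTF-16LE)."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return found


def grep_docs(directory, term):
    """Return the names of .doc files whose extracted text contains term."""
    needle = term.lower()
    hits = []
    for path in sorted(Path(directory).glob("*.doc")):
        text = "\n".join(extract_strings(path.read_bytes())).lower()
        if needle in text:
            hits.append(path.name)
    return hits
```

A CGI script could call grep_docs(some_directory, search_term) and print the returned file names. Since the directory only changes hourly, the extracted text could also be cached rather than re-read on every request.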