[Tutor] extracting text from word files (.doc, .docx) and pdf
Emile van Sebille
emile at fenx.com
Wed Jan 26 00:59:38 CET 2011
On 1/25/2011 1:52 PM Juan Jose Del Toro said...
> Dear List;
>
> I am looking for a way to extract parts of a text from word (.doc,.docx)
I recently did a project extracting data from Word documents. I used
antiword (http://www.winfield.demon.nl/) and called it like this:
    def setContent(self):
        self.content = [
            ii.strip().replace("Ëš", "")
            for ii in commands.getoutput(
                '/usr/local/bin/antiword "%s"' % doc).split("\n")
            if ii
        ]
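
The commands module only exists in Python 2 and was removed in Python 3. For anyone on Python 3.7 or later, a rough equivalent of the snippet above might look like the sketch below. The names ANTIWORD, clean_lines and read_doc are made up for illustration, and the antiword path is simply the one used above, so adjust it for your system. Note one difference: commands.getoutput also captured error output and never raised, while this sketch raises if antiword fails.

```python
import subprocess

# Path taken from the snippet above; an assumption about where antiword lives.
ANTIWORD = "/usr/local/bin/antiword"


def clean_lines(raw_text, unwanted="Ëš"):
    """Skip empty lines, then strip whitespace and remove the unwanted marker."""
    return [
        line.strip().replace(unwanted, "")
        for line in raw_text.split("\n")
        if line  # same filter as the original: drops only truly empty lines
    ]


def read_doc(path):
    """Run antiword on a .doc file and return its cleaned lines."""
    result = subprocess.run(
        [ANTIWORD, path],     # argument list, so no shell quoting is needed
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError if antiword fails
    )
    return clean_lines(result.stdout)
```

Passing the command as a list rather than a formatted string means file names containing quotes or spaces do not need special handling. Calling read_doc("report.doc") (a hypothetical file name) would return the list of lines that setContent stores in self.content.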
Emile