[Tutor] extracting text from word files (.doc, .docx) and pdf

Juan Jose Del Toro jdeltoro1973 at gmail.com
Tue Jan 25 22:52:56 CET 2011


Dear List;

I am looking for a way to extract parts of the text from Word (.doc, .docx)
files as well as PDF files; the idea is to walk through a whole directory tree
and populate a CSV file with an excerpt from each file.
For PDF I found pyPdf <http://pybrary.net/pyPdf/>, but I have found nothing to
read .doc or .docx files.
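
A minimal sketch of the idea described above, assuming Python 3 and only the
standard library. It relies on the fact that a .docx file is a zip archive
whose main text lives in word/document.xml. The names docx_text,
write_excerpts, EXTRACTORS and max_chars are made up for illustration. The old
binary .doc format is not handled, and a PDF reader such as pyPdf would have to
be plugged in separately.

```python
import csv
import os
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml of a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the plain text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; each w:t element holds a text piece.
        pieces = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


# Hypothetical registry: lowercase extension -> function returning text.
# A PDF extractor (for example one built on pyPdf) could be added for ".pdf".
EXTRACTORS = {".docx": docx_text}


def write_excerpts(top_dir, csv_path, max_chars=200):
    """Walk top_dir and write one (path, excerpt) row per supported file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "excerpt"])
        for folder, _subdirs, filenames in os.walk(top_dir):
            for name in sorted(filenames):
                extension = os.path.splitext(name)[1].lower()
                extractor = EXTRACTORS.get(extension)
                if extractor is None:
                    continue  # unsupported type, e.g. binary .doc
                full_path = os.path.join(folder, name)
                try:
                    text = extractor(full_path)
                except Exception as exc:  # keep going past unreadable files
                    text = "ERROR: %s" % exc
                # Collapse whitespace so the excerpt fits on one CSV line.
                excerpt = " ".join(text.split())[:max_chars]
                writer.writerow([full_path, excerpt])
```

Calling write_excerpts("some/folder", "excerpts.csv") would then produce one
row per .docx file found under that folder; both arguments here are
placeholder paths.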
-- 
¡Saludos! / Greetings!
Juan José Del Toro M.
jdeltoro1973 at gmail.com
Guadalajara, Jalisco MEXICO