[Tutor] Convert doc to txt on Ubuntu
Emad Nawfal (عماد نوفل)
emadnawfal at gmail.com
Wed Sep 16 21:14:30 CEST 2009
On Wed, Sep 16, 2009 at 3:03 PM, Carnell, James E <
jecarnell at saintfrancis.com> wrote:
>
> I need to access the text in hundreds of Microsoft .doc files on an
> Ubuntu OS. I looked at win32, but only saw support for Windows. I am going
> through all of these files to create a fairly simple delimited text file for
> a spreadsheet.
>
> A) Batch convert to text files so I can access them
> B) import some module that allows me to decode this format
> C) OpenOffice allows batch conversion to .odc, but I still don't know how to
> access them
> D) Buy a 24 pack, some Twinkies, and go watch David Hasselhoff reruns
>
> Opening .txt documents works fine.
>
> Currently get:
>
> inFile = open("myTestFile.doc", "r")
> testRead = inFile.read()
>
> Traceback (most recent call last):
>   File "<pyshell#11>", line 1, in <module>
>     test = inFile.read()
>   File "/usr/lib/python3.0/io.py", line 1728, in read
>     decoder.decode(self.buffer.read(), final=True))
>   File "/usr/lib/python3.0/io.py", line 1299, in decode
>     output = self.decoder.decode(input, final=final)
>   File "/usr/lib/python3.0/codecs.py", line 300, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data
>
> Any help greatly appreciated. Thanks bunches.
>
Ubuntu comes with antiword, a program that does exactly this. I usually use
it through the commands module in Python.
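
For Python 3, where the commands module no longer exists, the same idea can be sketched with subprocess. This is only a rough sketch under a few assumptions: antiword is installed and on the PATH, running "antiword somefile.doc" prints the document text to standard output, and that output is UTF-8. The function names doc_to_text and convert_folder and the folder names in the usage note are made up for illustration.

```python
import subprocess
from pathlib import Path


def doc_to_text(doc_path, command=("antiword",)):
    """Return the text that `command <doc_path>` writes to standard output."""
    result = subprocess.run(
        [*command, str(doc_path)],
        capture_output=True,  # collect stdout and stderr instead of printing them
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    # Assumption: the converter emits UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")


def convert_folder(src_dir, out_dir, command=("antiword",)):
    """Write one .txt file into out_dir for every .doc file found in src_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for doc_path in sorted(Path(src_dir).glob("*.doc")):
        txt_path = out_dir / (doc_path.stem + ".txt")
        txt_path.write_text(doc_to_text(doc_path, command), encoding="utf-8")
        written.append(txt_path)
    return written
```

Calling convert_folder("docs", "txt") would then leave one .txt file per .doc file in the hypothetical txt folder, and those can be opened with a plain open() as usual. The command parameter is there only so a different converter can be swapped in without touching the loop.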
--
لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد
الغزالي
"No victim has ever been more repressed and alienated than the truth"
Emad Soliman Nawfal
Indiana University, Bloomington
--------------------------------------------------------