Converting .doc to .txt in Linux
pavlovevidence at gmail.com
Fri Sep 5 06:31:07 CEST 2008
On Sep 4, 4:18 pm, Tommy Nordgren <tommy.nordg... at comhem.se> wrote:
> On Sep 4, 2008, at 9:54 PM, patrick.wa... at gmail.com wrote:
> > Hi Everyone,
> > I had previously asked a similar question,
> > but at that point I was using Windows and now I am using Linux.
> > Basically, I have some .doc files that I need to convert into txt
> > files encoded in utf-8. However, win32com.client doesn't work in
> > Linux.
> > It's been giving me quite a headache all day. Any ideas would be
> > greatly appreciated.
> > Best,
> > Patrick
> > #Windows Code:
> > import glob, os, codecs, shutil, win32com.client
> > from win32com.client import Dispatch
> > input = '/home/pwaldo2/work/workbench/current_documents/*.doc'
> > input_dir = '/home/pwaldo2/work/workbench/current_documents/'
> > outpath = '/home/pwaldo2/work/workbench/current_documents/TXT/'
> > # Open each .doc in Word and re-save it using SaveAs format 7.
> > for doc in glob.glob(input):
> >     WordApp = Dispatch("Word.Application")
> >     WordApp.Visible = 1
> >     WordApp.Documents.Open(doc)
> >     WordApp.ActiveDocument.SaveAs(doc, 7)
> >     WordApp.ActiveDocument.Close()
> >     WordApp.Quit()
> > # Copy each re-saved file into the TXT directory under a .txt name.
> > for doc in glob.glob(input):
> >     doc_name = os.path.basename(doc)
> >     txt_doc = os.path.splitext(doc_name)[0] + '.txt'
> >     txt_doc_path = os.path.join(outpath, txt_doc)
> >     doc_path = os.path.join(input_dir, doc_name)
> >     shutil.copy(doc_path, txt_doc_path)
> > --
> You can do it manually with Open Office <http://www.openoffice.org/>,
> a free office suite.
On Debian there is a package called "unoconv"--written in Python--that
can do the conversions from the command line. It requires a running
instance of Open Office. However, the doc-to-txt conversion of Open
Office isn't that good. (It wasn't as good as Word's formatted text
converter, last time I used it.)
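
For what it's worth, here is a rough sketch of how the unoconv route
might be driven from Python. It assumes the "unoconv" command is on the
PATH and that its "-f txt" option selects plain-text output; the helper
names build_unoconv_command and convert_all are made up for
illustration, and nothing here guarantees the result is utf-8, so check
the encoding of the output before relying on it.

```python
import glob
import subprocess


def build_unoconv_command(doc_path):
    """Return the argument list for converting one file to plain text."""
    # Assumption: "-f txt" asks unoconv for its plain-text output format.
    return ["unoconv", "-f", "txt", doc_path]


def convert_all(pattern, runner=subprocess.check_call):
    """Run unoconv on every file matching pattern; return the files handled.

    runner is a hypothetical hook so the command can be swapped out
    (for example, to print the commands instead of executing them).
    """
    converted = []
    for doc_path in sorted(glob.glob(pattern)):
        # check_call raises CalledProcessError if unoconv exits non-zero.
        runner(build_unoconv_command(doc_path))
        converted.append(doc_path)
    return converted
```

Calling convert_all('/home/pwaldo2/work/workbench/current_documents/*.doc')
would then run unoconv once per file; passing runner=print instead shows
the commands without executing anything.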