MS Word parser

kenicheema at gmail.com
Wed Jun 13 12:47:48 EDT 2007


On Jun 13, 1:28 am, Tim Golden <m... at timgolden.me.uk> wrote:
> keniche... at gmail.com wrote:
> > Hi all,
> > I'm currently using antiword to extract content from MS Word files.
> > Is there another way to do this without relying on any command prompt
> > application?
>
> Well you haven't given your environment, but is there
> anything to stop you from controlling Word itself via
> COM? I'm no Word expert, but looking around, this
> seems to work:
>
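> <code>
> import win32com.client
>
> # Drive Word through COM (needs Windows with Word and pywin32 installed)
> word = win32com.client.Dispatch ("Word.Application")
> doc = word.Documents.Open ("c:/temp/temp.doc")
> text = doc.Range ().Text
> doc.Close (False)  # close without saving changes
> word.Quit ()       # don't leave a hidden Word process running
>
> # Open in binary mode because encode () returns bytes
> open ("c:/temp/temp.txt", "wb").write (text.encode ("UTF-8"))
> </code>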
>
> TJG

Tim,
I'm on Linux (Red Hat), so using Word is not an option for me. Any
other suggestions?
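
For context, here is a minimal sketch of the antiword approach mentioned
in the original question. It shells out through Python's subprocess
module, which is exactly the command-line dependency being asked about.
The function names and the injectable `runner` parameter are
illustrative assumptions, not part of antiword or of anyone's actual
script.

```python
import subprocess


def build_antiword_command(doc_path, executable="antiword"):
    """Return the argument list for one antiword call (hypothetical helper)."""
    # "-m UTF-8.txt" selects antiword's UTF-8 character mapping file.
    return [executable, "-m", "UTF-8.txt", doc_path]


def extract_text(doc_path, runner=subprocess.check_output):
    """Run antiword on doc_path and return its output as text.

    `runner` is an assumption added for illustration: any callable that
    takes an argument list and returns bytes, so the external call can
    be swapped out or exercised without antiword installed.
    """
    raw = runner(build_antiword_command(doc_path))
    return raw.decode("utf-8")
```

With antiword on the PATH, `extract_text("/tmp/report.doc")` would
return the document text as a string (the path is only an example).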



