Python word to text

Tim Golden mail at timgolden.me.uk
Tue Sep 1 10:05:14 EDT 2009


BJörn Lindqvist wrote:
> 2009/9/1 Nitebirdz <nitebirdz at sacredchaos.com>:
>> On Tue, Sep 01, 2009 at 11:38:30AM +0200, BJörn Lindqvist wrote:
>>> Hello everybody,
>>>
>>> I'm looking for a pure Python solution for converting Word documents
>>> to text. App Engine doesn't allow external programs, which means that
>>> tools like catdoc and antiword can't be used. Does anyone know
>>> of one?
>>>
>> A quick search returned this:
>>
>> http://code.activestate.com/recipes/279003/
> 
> It requires Windows.

I'm moderately confident that no (published) solution exists
for this without relying on an installed Word or an external
program of the kind you mentioned. Obviously, there's nothing
to stop someone creating a Python module which does the
equivalent, possibly by wrapping the core of the catdoc/antiword
code in a Python module or by recoding its functionality in
Python. But I imagine you knew that :)

If you were talking Excel, you'd be in luck thanks to the
sterling work done by John Machin and others. But I imagine
that the market for Word doc interchange / conversion is
considerably smaller, especially within restricted environments.

Depending on the source of your docs, it might be possible to
save them as, e.g., XML or some other format for which a
converter is available in Python, or even as plain text. But I
assume you're asking because that's not a possibility?
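
For what it's worth, here is a rough sketch of the save-as-XML
route. It assumes the documents can be saved as .docx (Office
Open XML) rather than the older binary .doc: a .docx file is a
zip archive whose main body lives in word/document.xml, so the
standard library alone can read it. The function name
docx_to_text is made up for illustration, and the sketch only
looks at the main body (no headers, footnotes or comments).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the body text of a .docx file, one line per paragraph.

    `source` is a filename or a binary file-like object.
    Hypothetical helper: it ignores headers, footnotes and comments,
    and text inside text boxes may appear more than once.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = (node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Since it needs nothing beyond zipfile and ElementTree, something
along these lines ought to work in a restricted environment, but
it does nothing for binary .doc files.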

TJG


