How to Read/Write RTF and Word Files?

Alex Martelli aleaxit at yahoo.com
Mon May 21 04:28:23 EDT 2001


"Greg Jorgensen" <greg at pdxperts.com> wrote in message
news:Pine.LNX.4.33.0105201926390.6410-100000 at C800000-A.potlnd1.or.home.com...
> On Sat, 19 May 2001, Dry Ice wrote:
>
> > How to Read/Write RTF and Word Files?
    ...
> Microsoft Word uses a proprietary and undocumented format for .doc files.
> The Word file format has changed significantly across versions of Word.
> Whether by reverse-engineering or licensing the spec from Microsoft, quite
> a few companies have implemented at least some Word import/export
> capabilities.

Yep.  And there are open-source programs that attempt the same
feat (presumably after reverse-engineering, in this case).

http://www.fe.msk.ru/~vitus/catdoc/ offers such tools that manage
to extract some text from MS Word (and Excel) files most of the
time, and also points to other such tools.  These tools aren't
in Python, but you might use them from Python, or maybe recode in
Python the algorithms & heuristics they embody.
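
As a rough illustration of the "use them from Python" route, here is a
minimal sketch.  It assumes catdoc is installed and on your PATH, and
that it writes the extracted text to standard output when given a file
name.  The function name extract_text and its command parameter are
made up for this example; they are not part of catdoc or of any library.

```python
import subprocess


def extract_text(path, command=("catdoc",)):
    """Run an external converter on `path` and return what it prints.

    Assumption: the tool (catdoc by default) takes the file name as its
    last argument and writes the extracted text to standard output.
    """
    result = subprocess.run(
        [*command, path],
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode bytes to str using the locale encoding
        errors="replace",     # don't fail on bytes that cannot be decoded
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Calling extract_text("report.doc") would then return whatever text the
tool managed to recover; expect FileNotFoundError if the tool is not
installed, and subprocess.CalledProcessError if it exits with an error.
Passing a different command tuple lets you try one of the other
converters without changing the function.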


Alex





