character-filtering and Word (& company)

John Machin sjmachin at lexicon.net
Sat Mar 26 19:55:14 EST 2005


Charles Hartman wrote:
> I'm working on text-handling programs that want plain-text files as
> input. It's fine to tell users to feed the programs with plain-text
> only, but not all users know what this means, even after you explain
> it, or they forget. So it would be nice to be able to handle
> gracefully the stuff that MS Word (or any word-processor) puts into
> a file. Inserting a 0-127 filter is easy but not very friendly.
> Typically, the w.p. file loads OK (into a wx.StyledTextCtrl, a.k.a.
> Scintilla editing pane), and is mostly readable. Just a few
> characters will be wrong: "smart" quotation marks and the like.
>
> Is there some well-known way to filter or translate this w.p.
> garbage? I don't know whether encodings are relevant; I don't know
> what encoding an MSW file uses. I don't see how to use s.translate()
> because I don't know how to predict what the incoming format will be.
>
> Any hints welcome.

This may help: http://wvware.sourceforge.net/

[not a recommendation, I've never used it]
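
For the narrower case described above, where the file loads as readable
text and only the "smart" punctuation comes out wrong, here is a minimal
sketch using a translation table. It rests on an assumption that may not
hold: that the bytes are text in Windows code page 1252 (a common
default for text saved from Word on Windows), not a binary .doc file,
which would need a real converter. The names SMART_TO_ASCII and
plain_ascii are made up for illustration, and the table covers only a
handful of characters.

```python
# Assumption: input bytes are cp1252-encoded text, not a binary .doc file.
# SMART_TO_ASCII and plain_ascii are hypothetical names for this sketch.
SMART_TO_ASCII = {
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2013: "-",    # en dash
    0x2014: "--",   # em dash
    0x2026: "...",  # horizontal ellipsis
    0x00A0: " ",    # no-break space
}


def plain_ascii(raw: bytes) -> str:
    """Decode cp1252 bytes and map common 'smart' punctuation to ASCII."""
    # Bytes that cp1252 leaves undefined become U+FFFD instead of raising.
    text = raw.decode("cp1252", errors="replace")
    # str.translate accepts a dict keyed by Unicode code point.
    text = text.translate(SMART_TO_ASCII)
    # Anything still outside 0-127 is replaced with "?".
    return text.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    sample = b"\x93Smart\x94 quotes, it\x92s easy \x96 mostly\x85"
    print(plain_ascii(sample))
```

Characters not in the table (accented letters, for example) still fall
through to the 0-127 filter and come out as "?", so the table would need
extending for anything beyond punctuation.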



