character-filtering and Word (& company)

John Machin sjmachin at
Sun Mar 27 01:55:14 CET 2005

Charles Hartman wrote:
> I'm working on text-handling programs that want plain-text files as
> input. It's fine to tell users to feed the programs with plain-text
> only, but not all users know what this means, even after you explain
> it, or they forget. So it would be nice to be able to handle
> the stuff that MS Word (or any word-processor) puts into a file.
> Inserting a 0-127 filter is easy but not very friendly. Typically, a
> w.p. file will load OK (into a wx.StyledTextCtrl, a.k.a. Scintilla editing
> pane) and mostly be readable. Just a few characters will be wrong:
> "smart" quotation marks and the like.
> Is there some well-known way to filter or translate this w.p. stuff?
> I don't know whether encodings are relevant; I don't know what
> an MSW file uses. I don't see how to use s.translate() because I
> don't know how to predict what the incoming format will be.
> Any hints welcome.

This may help:

[not a recommendation, I've never used it]
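
On the "smart" quotation marks specifically, here is a minimal sketch of the
translate() approach. It rests on assumptions the original post does not
establish: that the text was saved from Word as plain text in the Windows-1252
(cp1252) encoding, and that Python 3 is available. The names SMART_TO_ASCII
and plainify are hypothetical, made up for illustration, and the table only
covers a handful of common characters.

```python
# Hypothetical table: Unicode code points of common word-processor
# punctuation, mapped to plain ASCII stand-ins. Not exhaustive.
SMART_TO_ASCII = {
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark / apostrophe
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2013: "-",    # en dash
    0x2014: "--",   # em dash
    0x2026: "...",  # horizontal ellipsis
    0x00A0: " ",    # no-break space
}


def plainify(raw, encoding="cp1252"):
    """Decode raw bytes and replace common 'smart' punctuation with ASCII.

    The cp1252 default is an assumption about the incoming file; nothing
    here detects the real encoding.
    """
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    text = raw.decode(encoding, errors="replace")
    # str.translate accepts a dict mapping code points to replacement strings.
    return text.translate(SMART_TO_ASCII)


sample = b"\x93Hello\x94 \x96 it\x92s"
print(plainify(sample))
```

Characters outside the table pass through unchanged, so accented letters
survive, which is friendlier than a blanket 0-127 filter. If the file is
really in some other encoding (or is a binary .doc rather than saved text),
this sketch will not help.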

