character-filtering and Word (& company)

Cameron Laird claird at lairds.us
Sun Mar 27 08:08:06 EST 2005


In article <1111884913.972303.295770 at z14g2000cwz.googlegroups.com>,
John Machin <sjmachin at lexicon.net> wrote:
>
>Charles Hartman wrote:
>> I'm working on text-handling programs that want plain-text files as
>> input. It's fine to tell users to feed the programs with plain-text
>> only, but not all users know what this means, even after you explain
>> it, or they forget. So it would be nice to be able to handle gracefully
>> the stuff that MS Word (or any word-processor) puts into a file.
>> Inserting a 0-127 filter is easy but not very friendly. Typically, the
>> w.p. file loads OK (into a wx.StyledTextCtrl a.k.a Scintilla editing
>> pane), and mostly be readable. Just a few characters will be wrong:
>> "smart" quotation marks and the like.
>>
>> Is there some well-known way to filter or translate this w.p. garbage?
>> I don't know whether encodings are relevant; I don't know what encoding
>> an MSW file uses. I don't see how to use s.translate() because I don't
>> know how to predict what the incoming format will be.
>>
>> Any hints welcome.
>
>This may help: http://wvware.sourceforge.net/
>
>[not a recommendation, I've never used it]
>

As Mike Meyer wrote, there is *no* standardization here.  wvWare is
indeed useful.  Before you go further, though, I want to emphasize
to you what a challenge this is.  While it sounds simple to users to
collect their writings through a Web interface, this turns out to
present difficulties that go on and on.  Anything you can do to
structure the problem helps.

One minor variation that can help is to expose TEXTAREAs or
equivalent, and ask users to cut-and-paste their content into
them.  In some situations, that's surprisingly effective.
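
For the narrower case where pasted text arrives mostly readable and
only a few characters such as "smart" quotes are wrong, here is a
minimal sketch of a friendlier filter than a bare 0-127 cut.  It
assumes the incoming bytes are Windows-1252 (cp1252), which is only a
guess about what Word-derived text tends to contain; the names
PUNCTUATION and to_plain_ascii are hypothetical, and the mapping is
illustrative rather than complete.

```python
# Assumption: input is cp1252-encoded text, not a binary .doc file.
# PUNCTUATION and to_plain_ascii are hypothetical names for illustration.

PUNCTUATION = {
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2013: "-",    # en dash
    0x2014: "--",   # em dash
    0x2026: "...",  # horizontal ellipsis
    0x00A0: " ",    # no-break space
}


def to_plain_ascii(raw: bytes, encoding: str = "cp1252") -> str:
    """Decode raw bytes, map common typographic characters to ASCII
    look-alikes, and replace anything else outside 0-127 with '?'."""
    # Bytes that are undefined in the assumed encoding become U+FFFD.
    text = raw.decode(encoding, errors="replace")
    # Substitute the characters we know how to approximate.
    text = text.translate(PUNCTUATION)
    # Whatever is still non-ASCII falls back to '?'.
    return text.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    print(to_plain_ascii(b"\x93smart\x94 quotes, and so on\x85"))
```

This does nothing for a real binary Word document; that remains a
job for something like wvWare.  If the encoding guess is wrong, the
output will be wrong too, which is the lack of standardization
mentioned above.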


