[python-win32] Re: MS unicode bewilderment

Don Dwiggins dond@advancedmp.com
22 Jan 2003 08:51:24 -0800


Andrew Brown writes:
> My problem is that I don't know what character codes to use to search
> for MS smartquotes within that string. I have tried \x94, which
> doesn't work. I have tried various Unicode tricks, and they don't seem
> to work either. (The point, in case you're wondering, is to find all
> smart-quoted strings longer than about 100 words and wrap them in
> <blockquote> tags. But there is another spinoff: when I write books, I
> often need to check for quoted passages on which copyright fees are
> payable. There is a rule of thumb about the length of "fair comment",
> something like 400 words. A script that could count quotes and show me
> the ones over that limit would be really helpful sometimes.

This doesn't address the Python/Unicode question, but here's a possible
workaround: make a copy of the document, and in the copy use MS Word's
Find/Replace to replace smart quotes with "dumb" ones (caveat: I haven't
tried this, so it may not work).  Save the copy and run your code on it,
looking for normal ASCII quotes.  (If it matters, you might be able to
replace the opening and closing smart quotes with different characters.)
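The same idea can be done in Python once the text is a Unicode string. The curly double quotes are U+201C (opening) and U+201D (closing). \x94 is the closing quote's byte in the Windows-1252 code page, not its Unicode code point, so searching a Unicode string for "\x94" finds nothing. Below is a minimal sketch. It assumes the document text has already been pulled out as a Python string (for example via Word's COM interface, not shown). The function names and the 100-word default are illustrative, not from any library:

```python
import re
from typing import List

# Curly ("smart") double quotes as Unicode code points.
# In Windows-1252 these are the bytes 0x93 and 0x94, which is why
# searching a Unicode string for "\x94" (U+0094) does not match.
LEFT_DQ = "\u201c"
RIGHT_DQ = "\u201d"

# Non-greedy match of an opening curly quote up to the next closing one,
# spanning line breaks.
QUOTE_RE = re.compile(LEFT_DQ + "(.*?)" + RIGHT_DQ, re.DOTALL)


def dumb_quotes(text: str) -> str:
    """Replace curly double quotes with plain ASCII double quotes."""
    return text.replace(LEFT_DQ, '"').replace(RIGHT_DQ, '"')


def long_quotes(text: str, min_words: int = 100) -> List[str]:
    """Return the curly-quoted passages that contain at least min_words words."""
    return [m.group(1) for m in QUOTE_RE.finditer(text)
            if len(m.group(1).split()) >= min_words]


def wrap_long_quotes(text: str, min_words: int = 100) -> str:
    """Wrap each sufficiently long curly-quoted passage in <blockquote> tags."""
    def repl(m):
        if len(m.group(1).split()) >= min_words:
            return "<blockquote>" + m.group(0) + "</blockquote>"
        return m.group(0)
    return QUOTE_RE.sub(repl, text)
```

Matching on the distinct opening and closing characters, rather than replacing both with the same ASCII quote, avoids having to guess which straight quote starts a passage and which one ends it. Word counts here are a rough whitespace split.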

For what it's worth,
-- 

Don Dwiggins				"Solvitur Ambulando"
d.l.dwiggins@computer.org