[Tutor] Trouble in dealing with special characters.

Mats Wichmann mats at wichmann.us
Fri Dec 7 10:28:31 EST 2018


On 12/7/18 3:20 AM, Steven D'Aprano wrote:

>> How to know whether in a given string(sentence) is there any that is not
>> ASCII character and how to replace?
> 
> That's usually the wrong solution. That's like saying, "My program can't 
> add numbers greater than 100. How do I tell if a number is greater than 
> 100, and turn it into a number smaller than 100?"

Yes, it's usually the wrong solution, but in the case of quote marks it
is *possible* it is the wanted solution: certain text editing products
(cough cough Microsoft Word) are really prone to putting in typographic
quote marks.  Everyone knows not to use Word for editing your code, but
that doesn't mean some stuff doesn't make it into a data set we are
forced to process, if someone exports some text from an editor, etc.
There are more quoting styles in the world than the English style, e.g.
this one is used in many languages: „quoted text“  (I don't know if that
will survive the email system, but it starts with a descended
double-quote mark).

It's completely up to what the application needs; it *might*, as I say,
be appropriate to normalize text so that only a single double-quote
style and only a single single-quote (or apostrophe) style is used.  Or
it might not.
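
If normalizing does turn out to be what the application wants, here is a
minimal sketch of one way to do it in Python 3, using str.translate.
The names QUOTE_MAP, normalize_quotes and non_ascii_chars are made up
for illustration, and the table covers only a handful of common quote
characters; which ones belong in it depends on your data.

```python
# Hypothetical mapping from typographic quotes to plain ASCII quotes.
# Not exhaustive: extend it to the quote styles that occur in your data.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u201e": '"',   # double low-9 quotation mark (the "descended" one)
    "\u00ab": '"',   # left-pointing double angle quotation mark
    "\u00bb": '"',   # right-pointing double angle quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (typographic apostrophe)
    "\u201a": "'",   # single low-9 quotation mark
})


def normalize_quotes(text):
    """Return text with the quote characters in QUOTE_MAP made ASCII."""
    return text.translate(QUOTE_MAP)


def non_ascii_chars(text):
    """Return the distinct non-ASCII characters in text, sorted."""
    return sorted({ch for ch in text if ord(ch) > 127})


sample = "\u201equoted text\u201c"
print(non_ascii_chars(sample))   # what is in there that isn't ASCII?
print(normalize_quotes(sample))  # same text with plain double quotes
```

Listing the non-ASCII characters first is a cheap way to see what is
actually in the data before deciding whether to replace anything; every
character not in the table passes through untouched.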



