Looking for UNICODE to ASCII Conversion Example Code
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Sat Oct 19 12:16:02 EDT 2013
On Sat, 19 Oct 2013 11:14:30 -0300, Zero Piraeus wrote:
> :
>
> On Sat, Oct 19, 2013 at 09:19:12AM +0000, Steven D'Aprano wrote:
>> Make no mistake, this sort of simple-minded stripping of accents and
>> diacritics is an extremely ham-fisted thing to do.
[...]
> Joking aside, there is a legitimate use for asciifying text in this way:
> creating unambiguous identifiers.
>
> For example, a miscreant may create the username 'míguel' in order to
> pose as another user 'miguel', relying on other users' inattentiveness.
> Asciifying is one way of reducing the risk of that.
I'm pretty sure that Oliver and 0liver may not agree. Neither will
Megal33tHaxor and Mega133tHaxor.
It's true that there are *more* opportunities for this sort of
shenanigans with Unicode, so I guess your comment about "reducing" the
risk (rather than eliminating it) is strictly correct. But there are
other (better?) ways to do so, e.g. you could generate an identicon for
the user to act as a visual checksum:
http://en.wikipedia.org/wiki/Identicon
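To make the Oliver/0liver point concrete, here is a minimal sketch of the usual asciifying idiom with the standard library's unicodedata module. The function names asciify and same_identifier are made up for illustration, and this is only one way to do it, not a recommended username policy:

```python
import unicodedata

def asciify(text):
    """Decompose accented characters (NFKD), then drop anything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def same_identifier(a, b):
    """Hypothetical check: do two usernames collide after asciifying?"""
    return asciify(a) == asciify(b)

# An accent-only difference is caught...
print(same_identifier("míguel", "miguel"))
# ...but pure-ASCII lookalikes sail straight through.
print(same_identifier("Oliver", "0liver"))
print(same_identifier("Megal33tHaxor", "Mega133tHaxor"))
```

Note that characters with no ASCII decomposition are silently dropped by the "ignore" error handler, which is part of what makes this approach ham-fisted.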
Another reasonable use for accent-stripping is searches. If I'm searching
for music by the Blue Öyster Cult, it would be good to see results for
Blue Oyster Cult as well. And vice versa. (A good search engine should
consider *adding* accents as well as removing them.)
On the other hand, if you name your band ▼□■□■□■, you deserve to wallow
in obscurity :-)
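For the search case, a rough sketch of accent-insensitive matching might look like the following. It assumes Python 3.3 or later (for str.casefold), and search_key, search and the catalogue list are hypothetical names invented for the example. Folding both the query and the candidates to the same key has the same effect as "adding" accents to the query, since either spelling finds both:

```python
import unicodedata

def search_key(text):
    """Case-fold, decompose (NFKD), and drop combining marks such as the umlaut."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def search(query, titles):
    """Return the titles whose folded form contains the folded query."""
    key = search_key(query)
    return [title for title in titles if key in search_key(title)]

# Made-up catalogue for illustration only.
catalogue = ["Blue Öyster Cult", "Blue Oyster Cult", "Some Other Band"]

print(search("blue oyster", catalogue))
print(search("Öyster", catalogue))
```

Unlike encoding to ASCII, this keeps characters that have no decomposition, so a name made of box-drawing symbols is still searchable as itself.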
--
Steven