convert unicode characters to visibly similar ascii characters
peter.bulychev at gmail.com
Tue Jul 1 20:31:15 CEST 2008
I want to convert Unicode characters into ASCII ones.
The method ".encode('ASCII')" can convert only those Unicode characters
that fit into the 0..127 range.
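
To illustrate the limitation, here is a minimal sketch, assuming Python 3 (where str is a Unicode string). The helper name try_ascii is made up for this example. It shows that the strict encoder raises UnicodeEncodeError on anything outside ASCII, and that the built-in "replace" and "ignore" error handlers only substitute or drop such characters rather than finding a look-alike.

```python
# Assumption: Python 3, where str is Unicode and encode() returns bytes.
text = "\u201cquoted\u201d"  # a word wrapped in curly double quotes


def try_ascii(s):
    """Return the ASCII bytes for s, or None if s contains non-ASCII characters."""
    try:
        return s.encode("ascii")
    except UnicodeEncodeError:
        return None


plain = try_ascii("quoted")                 # pure ASCII input encodes fine
curly = try_ascii(text)                     # curly quotes make strict encoding fail
replaced = text.encode("ascii", "replace")  # each unencodable character becomes "?"
ignored = text.encode("ascii", "ignore")    # each unencodable character is dropped
```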
But there are still lots of characters beyond this range that can be
manually converted to visually similar ASCII characters. For instance,
Unicode has several quotation marks, all of which can be converted into
the ASCII quotation mark.
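
The manual conversion could look roughly like the sketch below, assuming Python 3. The table QUOTE_TABLE and the function straighten_quotes are hypothetical names, and the table covers only four common quotation marks, so it is an illustration of the hand-written approach rather than a complete mapping.

```python
# Hand-written mapping from a few Unicode quotation marks to ASCII ones.
# Assumption: only these four code points are covered; Unicode has many more.
QUOTE_TABLE = str.maketrans({
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
})


def straighten_quotes(text):
    """Replace the quotation marks listed in QUOTE_TABLE with ASCII quotes."""
    return text.translate(QUOTE_TABLE)
```

The obvious drawback is that every character has to be listed by hand.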
Can this conversion be performed automatically? After googling I've
only found that there is a Unicode database, which stores human-readable
names for all Unicode characters (
ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt). There is also
a Python interface to this database (
http://docs.python.org/lib/module-unicodedata.html). Using this database I
can do something like `if notation.find('QUOTATION')!=-1:\n\treturn "'"`. I
believe there is a more elegant way. Am I right?
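
For concreteness, here is a rough sketch of what the database-driven idea might look like, assuming Python 3 and the standard unicodedata module. It combines two things: compatibility decomposition via unicodedata.normalize("NFKD", ...), which handles accented letters and similar composed characters, and a fallback on the character name from unicodedata.name, which is the name-matching trick described above. The NAME_HINTS table, the function names, and the "?" default are all made up for this example, and the keyword list is far from complete.

```python
import unicodedata

# Hypothetical keyword table: the first keyword found in the character's
# Unicode name decides the ASCII replacement. Order matters, so the more
# specific "DOUBLE QUOTATION MARK" is checked before "QUOTATION MARK".
NAME_HINTS = [
    ("DOUBLE QUOTATION MARK", '"'),
    ("QUOTATION MARK", "'"),
    ("DASH", "-"),
    ("HYPHEN", "-"),
]


def to_ascii_lookalike(ch, default="?"):
    """Return an ASCII approximation of a single character."""
    if ord(ch) < 128:
        return ch  # already ASCII
    # Step 1: decompose (e.g. an accented letter into base letter plus
    # combining mark) and keep only the ASCII parts.
    decomposed = unicodedata.normalize("NFKD", ch)
    stripped = "".join(c for c in decomposed if ord(c) < 128)
    if stripped:
        return stripped
    # Step 2: fall back to matching keywords in the character's name.
    name = unicodedata.name(ch, "")
    for keyword, replacement in NAME_HINTS:
        if keyword in name:
            return replacement
    return default  # nothing suitable found


def asciify(text):
    """Apply to_ascii_lookalike to every character of text."""
    return "".join(to_ascii_lookalike(ch) for ch in text)
```

With this sketch, asciify("\u201cna\u00efve\u201d \u2014 caf\u00e9") gives '"naive" - cafe'. Characters with no decomposition and no matching keyword come out as "?".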