convert unicode characters to visibly similar ascii characters

M.-A. Lemburg mal at egenix.com
Wed Jul 2 10:39:13 CEST 2008


On 2008-07-01 20:31, Peter Bulychev wrote:
> Hello.
> 
> I want to convert Unicode characters into ASCII ones.
> The method ".encode('ASCII')" can convert only those Unicode characters
> that fit into the 0..127 range.
> 
> But there are still lots of characters beyond this range, which can be
> manually converted to some visibly similar ascii characters. For instance,
> there are several quotation marks in unicode, which can be converted into
> ascii quotation mark.
> 
> Can this conversion be performed automatically? After googling I've
> only found that there exists a Unicode database, which stores human-readable
> information on the notation of all Unicode characters (
> ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt). And there also exists
> a Python adapter for this database (
> http://docs.python.org/lib/module-unicodedata.html). Using this database I
> can do something like `if notation.find('QUOTATION')!=-1:\n\treturn "'"`. I
> believe there is a more elegant way. Am I right?

You could write a codec which translates Unicode into ASCII
lookalike characters, but AFAIK there is no standard for doing
this.

I guess the best choice is to use the Unicode code point names
as a basis. These can be accessed via unicodedata.name(). You can
then create a mapping which can be processed by the character
map codec.
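
A minimal sketch of the name-based approach, written for Python 3. The
keyword rules in NAME_RULES and the helper names build_lookalike_map()
and to_ascii_lookalike() are hypothetical and only illustrate the idea;
str.translate() is used here as a simpler stand-in for the character map
codec, and any character without a rule falls back to "?".

```python
import unicodedata

# Hypothetical rules: a keyword looked up in the Unicode character name,
# and the ASCII replacement to use. The first matching rule wins, so the
# more specific keyword must come first.
NAME_RULES = [
    ("DOUBLE QUOTATION MARK", '"'),
    ("QUOTATION MARK", "'"),
    ("DASH", "-"),
    ("HYPHEN", "-"),
]


def build_lookalike_map(text):
    """Return {code point: ASCII string} for the non-ASCII characters in text."""
    mapping = {}
    for ch in set(text):
        if ord(ch) < 128:
            continue  # already ASCII, nothing to map
        name = unicodedata.name(ch, "")  # "" if the character has no name
        for keyword, replacement in NAME_RULES:
            if keyword in name:
                mapping[ord(ch)] = replacement
                break
    return mapping


def to_ascii_lookalike(text):
    """Replace known lookalikes, then turn anything still non-ASCII into '?'."""
    mapped = text.translate(build_lookalike_map(text))
    return mapped.encode("ascii", "replace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_lookalike("\u201cHi\u201d \u2013 it\u2019s"))
```

The rule list is deliberately small; a real mapping would need many more
keywords and probably some per-character exceptions, since there is no
standard table to rely on.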

-- 
Marc-Andre Lemburg
eGenix.com
