Normalize a Polish L

Thorsten Kampe thorsten at
Mon Oct 15 20:20:30 CEST 2007

* Peter Bengtsson (Mon, 15 Oct 2007 16:33:26 -0000)
> In UTF8, \u0141 is a capital L with a little dash through it as can be
> seen in this image:
> I tried this:
> >>> import unicodedata
> >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> ''
> I was hoping it would convert it to 'L' because that's what it
> visually looks like. And I've seen it becoming a normal ascii L before
> in other programs such as Thunderbird.

The 'L' is actually pronounced like the English "w"...

> I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> none of them helped.

>>> unicodedata.decomposition(u'\N{LATIN CAPITAL LETTER C WITH CEDILLA}')
'0043 0327'

>>> unicodedata.normalize('NFKD', u'\N{LATIN CAPITAL LETTER C WITH CEDILLA}').encode('ascii','ignore')
'C'

>>> unicodedata.decomposition(u'\N{LATIN CAPITAL LETTER L WITH STROKE}')
''

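The "L with stroke" has no decomposition at all, so no normalization form can split it into 'L' plus a combining mark, and the encode step simply drops it. One possible workaround is to map such letters by hand before normalizing. This is only a minimal sketch, assuming Python 3 (where every str is Unicode); the table FALLBACK and the helper to_ascii are made-up names for illustration, not part of unicodedata, and the table is far from complete:

```python
import unicodedata

# Hypothetical hand-made table for letters that have no Unicode
# decomposition. Illustrative only; extend it as needed.
FALLBACK = {
    "\u0141": "L",  # LATIN CAPITAL LETTER L WITH STROKE
    "\u0142": "l",  # LATIN SMALL LETTER L WITH STROKE
    "\u00d8": "O",  # LATIN CAPITAL LETTER O WITH STROKE
    "\u00f8": "o",  # LATIN SMALL LETTER O WITH STROKE
}


def to_ascii(text):
    """Return an ASCII approximation of text.

    Letters listed in FALLBACK are replaced first; everything else is
    decomposed with NFKD so that combining marks can be dropped.
    """
    text = text.translate(str.maketrans(FALLBACK))
    decomposed = unicodedata.normalize("NFKD", text)
    # Anything that is still non-ASCII (e.g. combining marks) is discarded.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("\u0141\u00f3d\u017a"))  # the city name written with L-stroke, o-acute, z-acute
```

With the table in place the L with stroke comes out as a plain 'L', while letters such as C with cedilla still go through the normal NFKD route. Characters that are neither in the table nor decomposable are still silently dropped.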