Normalize a Polish L

Peter Bengtsson peterbe at gmail.com
Tue Oct 16 10:50:53 EDT 2007


On Oct 15, 10:57 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Oct 16, 2:33 am, Peter Bengtsson <pete... at gmail.com> wrote:
>
>
>
> > In Unicode, \u0141 is a capital L with a little dash through it, as can be
> > seen in this image: http://static.peterbe.com/lukasz.png
>
> > I tried this:
> >
> > >>> import unicodedata
> > >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> > ''
>
> > I was hoping it would convert it to 'L' because that's what it
> > visually looks like. And I've seen it becoming a normal ASCII L before
> > in other programs such as Thunderbird.
>
> > I also tried the other forms ('NFC', 'NFKC', and 'NFD') but
> > none of them helped.
>
> > What am I doing wrong?
>
> The character in question is NOT composed (in the way that Unicode
> means) of an 'L' and a little slash; hence the concepts of
> "normalization" and "decomposition" don't apply.
>
> To "asciify" such text, you need to build a look-up table that suits
> your purpose. unicodedata.decomposition() is (accidentally) useful in
> providing *some* of the entries for such a table.

Thank you! That explains it.
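
A minimal sketch of the look-up-table approach suggested above, assuming
Python 3. The EXTRA_ASCII table and the asciify() helper are hypothetical
names used only for illustration, and the table holds a few example
entries rather than a complete mapping:

```python
import unicodedata

# Hypothetical hand-built table for characters that have no Unicode
# decomposition. The entries are examples only; extend to suit your data.
EXTRA_ASCII = {
    '\u0141': 'L',  # LATIN CAPITAL LETTER L WITH STROKE
    '\u0142': 'l',  # LATIN SMALL LETTER L WITH STROKE
    '\u00d8': 'O',  # LATIN CAPITAL LETTER O WITH STROKE
    '\u00f8': 'o',  # LATIN SMALL LETTER O WITH STROKE
}


def asciify(text):
    """Return a best-effort ASCII approximation of text (illustrative helper)."""
    # Step 1: substitute characters listed in the hand-built table.
    mapped = ''.join(EXTRA_ASCII.get(ch, ch) for ch in text)
    # Step 2: decompose what Unicode can decompose (e.g. e + combining acute).
    decomposed = unicodedata.normalize('NFKD', mapped)
    # Step 3: drop whatever is still non-ASCII, such as combining marks.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


# \u0141 has no decomposition, which is why normalize() alone cannot help.
assert unicodedata.decomposition('\u0141') == ''
print(asciify('\u0141ukasz'))
```

Characters that do have a decomposition (an e with an acute accent, for
instance) are handled by the NFKD step. The table only needs entries for
characters such as \u0141, for which unicodedata.decomposition() returns
an empty string.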



