Reading special Danish letters in Python

Vlastimil Brom vlastimil.brom at gmail.com
Sun Dec 19 17:53:48 EST 2010


2010/12/19 Martin Hvidberg <Martin at hvidberg.net>:
> Dear list
>
> I have to read some data from an ASCII text file, filter it, and then export
> it to a .dbf file. Basically a straightforward task...
> My problem is that the input files contain some special national (Danish)
> characters, and it appears that I have to do something special to handle
> these in Python.
> The Danish language contains three letters not in the English alphabet: æ, ø
> and å.
> E.g. the Danish city name 'SOLRØD' is read by Python as 'SOLR\xc3\x98D'
> The three letters, in lower and upper case, seem to get translated as
> follows:
>
> æ = \xc3\xa6
> ø = \xc3\xb8
> å = \xc3\xa5
> Æ = \xc3\x86
> Ø = \xc3\x98
> Å = \xc3\x85
>
> Question:
> What is this, how do I get my Danish letters back?
>
> Best Regards
> Martin
Hi,
it seems your data is UTF-8 encoded text;
cf.
>>> u"æ".encode("utf-8")
'\xc3\xa6'
>>> u"ø".encode("utf-8")
'\xc3\xb8'
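For reference, here is a small sketch that reproduces Martin's whole table by encoding each of the six Danish letters as UTF-8. It assumes Python 3, where `str` is already Unicode and `.encode()` returns `bytes` (the session above is Python 2, where the `u"..."` prefix is needed instead). The helper name `utf8_escapes` is purely illustrative.

```python
# Python 3 sketch: show the UTF-8 bytes of each Danish letter as \xNN escapes.
DANISH_LETTERS = "æøåÆØÅ"


def utf8_escapes(ch: str) -> str:
    """Return the UTF-8 encoding of ch written as \\xNN escapes (hypothetical helper)."""
    # Iterating over a bytes object in Python 3 yields ints, one per byte.
    return "".join("\\x{:02x}".format(b) for b in ch.encode("utf-8"))


for ch in DANISH_LETTERS:
    # Prints lines such as: æ = \xc3\xa6
    print(ch, "=", utf8_escapes(ch))
```

Every letter comes out as two bytes starting with `\xc3`, which is exactly the pattern in the original question and a strong hint that the file is UTF-8 rather than plain ASCII.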
You can decode the file content with this encoding if it has already been
read into a string, or open the file with codecs.open using the same encoding:
>>> print '\xc3\xa6'.decode("utf-8")
æ
>>>
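Both approaches can be sketched as follows, again assuming Python 3. The sample bytes are the ones from the question (`SOLR\xc3\x98D`); the temporary file only stands in for Martin's real input file, whose name we don't know. In Python 3, the built-in `open()` also accepts an `encoding` argument directly and is the usual choice there.

```python
import codecs
import os
import tempfile

# Sample data: the raw bytes Python showed for 'SOLRØD', plus a newline.
raw = b"SOLR\xc3\x98D\n"

# Write the sample to a temporary file so the example is self-contained
# (a stand-in for the real input file).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

try:
    # 1) Data already read as bytes: decode it explicitly.
    decoded = raw.decode("utf-8")

    # 2) Let codecs.open decode while reading, line by line.
    with codecs.open(path, "r", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

    # 3) Python 3's built-in open() with an explicit encoding.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
finally:
    os.remove(path)

print(decoded.strip())  # SOLRØD
print(lines)            # ['SOLRØD']
```

Decoding at the point where the file is read means everything downstream (filtering, exporting) works on real Danish letters instead of byte sequences.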

hth,
  vbr


