[Python-porting] conversion of bytes to str
Resul Cetin
Resul-Cetin at gmx.net
Sat Dec 13 22:21:12 CET 2008
Hi,
I am currently porting a program which reads some XML data over HTTPS using
urllib and parses it with minidom to create some other data for us. I expected
to get a str, but all the data I currently get is bytes (which makes sense, of
course, as it is more common). So my idea was to do a bytes.decode("utf-8"), but
that gives me an ASCII error when it tries to decode UTF-8 data (German umlauts
in this case).
What I do is:
import urllib.request
a = urllib.request.urlopen('http://www.example.com/')
b = a.read()
b now holds
b'<asd>\n\t<p>aa\xc3\xa4aa</p>\n</asd>\n'
When I try to decode that with
str(b'<asd>\n\t<p>aa\xc3\xa4aa</p>\n</asd>\n', encoding="utf-8")
or
b.decode("utf-8")
I get
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.0/io.py", line 1491, in write
    b = encoder.encode(s)
  File "/usr/lib/python3.0/encodings/ascii.py", line 22, in encode
    return codecs.ascii_encode(input, self.errors)[0]
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 15: ordinal not in range(128)
b.decode("utf-8", "ignore") does not work either. I am extremely clueless about
how to get a string to feed to minidom.
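
A minimal sketch of one way to check this, under an assumption that is not confirmed in the message: the traceback above passes through io.py's write and the ASCII encoder and reports a UnicodeEncodeError, which suggests the decode itself succeeded and the failure happened while the interactive prompt echoed the result to a terminal configured for ASCII. Position 15 is also where the umlaut falls in the repr of the decoded string, not in the string itself. The names raw, text, doc_from_str, doc_from_bytes and p_text are illustrative only.

```python
import xml.dom.minidom

# The bytes quoted in the message, as returned by a.read().
raw = b'<asd>\n\t<p>aa\xc3\xa4aa</p>\n</asd>\n'

# Assigning the result to a name avoids the interactive echo,
# so nothing is written to the terminal at this point.
text = raw.decode("utf-8")

# ascii() escapes every non-ASCII character, so this output is safe
# even when the terminal encoding is ASCII.
print(ascii(text))

# minidom's parseString accepts the decoded str as well as the raw bytes.
doc_from_str = xml.dom.minidom.parseString(text)
doc_from_bytes = xml.dom.minidom.parseString(raw)

# Extract the text of the <p> element and show it in escaped form.
p_text = doc_from_str.getElementsByTagName("p")[0].firstChild.data
print(ascii(p_text))
```

If this runs without an exception, the data is being decoded correctly and only the display of non-ASCII characters is affected; in that case the terminal locale is the more likely place to look.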
PS: I got my version from Ubuntu Jaunty today.
Regards,
Resul Cetin