[Python-porting] conversion of byte to str

Resul Cetin Resul-Cetin at gmx.net
Sat Dec 13 22:21:12 CET 2008


Hi,
I am currently porting a program that fetches some XML data over HTTPS using
urllib and parses it with minidom to create some other data for us. I expected
to get a str, but everything I get is bytes (which makes sense, of course,
as it is more common). So my idea was to call bytes.decode("utf-8"), but that
gives me an ASCII error when it tries to decode UTF-8 data (German umlauts in
this case).

What I do is:
 import urllib.request
 a = urllib.request.urlopen('http://www.example.com/')
 b = a.read()


b now holds
 b'<asd>\n\t<p>aa\xc3\xa4aa</p>\n</asd>\n'

When I try to decode that with
 str(b'<asd>\n\t<p>aa\xc3\xa4aa</p>\n</asd>\n', encoding="utf-8")
or
 b.decode("utf-8")

I get
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/lib/python3.0/io.py", line 1491, in write
     b = encoder.encode(s)
   File "/usr/lib/python3.0/encodings/ascii.py", line 22, in encode
     return codecs.ascii_encode(input, self.errors)[0]
 UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 15: ordinal not in range(128)


b.decode("utf-8", "ignore") doesn't work either... I am extremely clueless
about how to get a string to feed to minidom.
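
A minimal sketch of what may be going on, for reference. The traceback ends in
io.py write and the ascii encoder, so the decode step itself probably succeeded
and the failure happened when the interactive session tried to print the
result. The sketch assumes an ASCII-only output stream, which is simulated here
with an in-memory TextIOWrapper; the variable names are illustrative only.

```python
import io
from xml.dom import minidom

# The bytes from the example above.
raw = b'<asd>\n\t<p>aa\xc3\xa4aa</p>\n</asd>\n'

# Decoding on its own works and yields a str containing U+00E4.
text = raw.decode("utf-8")

# Assumption: stdout is ASCII-only. Simulate that with an in-memory stream.
ascii_out = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    ascii_out.write(text)
    ascii_out.flush()
    write_error = None
except UnicodeEncodeError as exc:
    # Same exception class as in the traceback: raised while encoding for output.
    write_error = exc

# ascii() escapes non-ASCII characters, so this is safe on any terminal.
print(ascii(text))

# minidom accepts the raw bytes directly, so no manual decode is needed.
doc = minidom.parseString(raw)
p_text = doc.getElementsByTagName("p")[0].firstChild.data
```

If that assumption holds, passing the bytes from a.read() straight to
minidom.parseString() avoids the problem, and only printing the result needs
care.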

PS: I got my Python version from Ubuntu Jaunty today

Regards,
	Resul Cetin



