[Tutor] output not in ANSI, converting char set to locale.getpreferredencoding()

Peter Otten __peter__ at web.de
Tue Aug 14 16:03:46 CEST 2012


leon zaat wrote:

> I get the error:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7:
> ordinal not in range(128) for the openbareruimtenaam=u'' +
> (openbareruimtenaam1.encode(chartype)) line.


The error message means that database.select() returns a byte string (a plain str in Python 2), not a unicode object.

bytestring.encode(encoding)

implicitly attempts

bytestring.decode("ascii").encode(encoding)

and will fail for non-ascii bytestrings no matter what encoding you pass to 
the encode() method.
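The failure can be reproduced with a minimal Python 3 sketch. Python 3 bytes objects have no encode() method, so the ASCII decode that Python 2 performs silently is written out explicitly here. The sample value is a hypothetical stand-in for what database.select() returns, assuming the database stores UTF-8.

```python
# Python 3 sketch: bytes has no .encode(), so the implicit ASCII decode
# that Python 2's str.encode() performs is spelled out explicitly.
raw = "Café".encode("utf-8")     # b'Caf\xc3\xa9', hypothetical database value

try:
    raw.decode("ascii")          # the hidden step in Python 2
except UnicodeDecodeError as exc:
    bad_pos = exc.start          # index of the first non-ASCII byte
    bad_byte = exc.object[exc.start]
    print(f"{exc.encoding!r} fails at position {bad_pos}: byte {bad_byte:#x}")
```

It prints `'ascii' fails at position 3: byte 0xc3`. Any byte at or above 0x80 stops the ASCII codec, which is why the target encoding passed to encode() never matters.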
 
> I know that the default system codec is ascii and chartype=b'cp1252'.
> But how can I bypass the ascii encoding?

You have to find out the database encoding -- then you can change the 
failing line to

database_encoding = ... # you need to find out yourself, but many databases
                        # use UTF-8 -- IMO the only sensible choice these days
file_encoding = "cp1252"

openbareruimtenaam = openbareruimtenaam1.decode(
    database_encoding).encode(file_encoding)
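The same decode-then-encode step, wrapped in a small helper, is sketched below for Python 3. The function name transcode and its defaults (UTF-8 for the database, cp1252 for the file) are assumptions for illustration only; substitute the encoding your database actually uses.

```python
def transcode(raw: bytes, source: str = "utf-8", target: str = "cp1252",
              errors: str = "strict") -> bytes:
    """Hypothetical helper: bytes in `source` encoding -> bytes in `target`.

    Defaults are assumptions: UTF-8 database, cp1252 output file.
    """
    text = raw.decode(source)          # bytes -> text, using the database encoding
    return text.encode(target, errors) # text -> bytes, using the file encoding


value = "Café".encode("utf-8")         # hypothetical value from database.select()
print(transcode(value))                # b'Caf\xe9'
```

cp1252 cannot represent every character a UTF-8 database may hold; with the default errors="strict" such values raise UnicodeEncodeError, while errors="replace" substitutes "?" instead.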

As you now have a bytestring again, you can forget about codecs.open(), which
won't work anyway because the csv module doesn't support unicode properly in
Python 2.x (the csv documentation has the details).
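For comparison only (this is not part of the Python 2 advice above), in Python 3 the csv module works with text, so the file object can do the encoding. The sketch below assumes cp1252 as the output encoding and uses hypothetical sample rows and a temporary path; newline="" follows the csv documentation's recommendation for files passed to csv.writer.

```python
import csv
import os
import tempfile

rows = [["openbareruimtenaam"], ["Café"]]   # hypothetical sample rows

# Python 3 sketch: open in text mode with the target encoding and let the
# file object encode each row; no manual .encode() calls are needed.
path = os.path.join(tempfile.mkdtemp(), "out.csv")
with open(path, "w", encoding="cp1252", newline="") as fh:
    csv.writer(fh).writerows(rows)

with open(path, "rb") as fh:
    data = fh.read()
print(data)   # b'openbareruimtenaam\r\nCaf\xe9\r\n'
```

The "\r\n" line endings are the csv writer's default lineterminator, left untouched because of newline="".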

PS: the u"..." prefix is a way to write unicode literals in Python
source code; you cannot turn a variable into unicode by tucking the prefix in
front of it.

u"" + bytestring

will trigger an implicit ASCII decode, as if you had written

u"" + bytestring.decode("ascii")

and is thus an obscure way to spell

bytestring.decode("ascii")
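Python 3 dropped this implicit coercion: concatenating str and bytes raises TypeError, which forces you to name the encoding. A minimal sketch, assuming a hypothetical UTF-8 byte string from the database:

```python
raw = "Café".encode("utf-8")   # hypothetical byte string from the database

try:
    "" + raw                   # Python 3 refuses to mix str and bytes
    mixed_failed = False
except TypeError:
    mixed_failed = True

text = raw.decode("utf-8")     # state the encoding explicitly instead
print(mixed_failed, text)      # True Café
```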
