Unicode conversion problem (codec can't decode)

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Fri Apr 4 08:25:29 CEST 2008

On Fri, 04 Apr 2008 01:35:08 -0300, Eric S. Johansson <esj at harvee.org> wrote:

> I'm having a problem (Python 2.4) converting strings with random 8-bit
> characters into an escaped form which is 7-bit clean for storage in a
> database. Here's an example:
> body = meta['mini_body'].encode('unicode-escape')
> When given an 8-bit string (in meta['mini_body']), the code fragment
> above yields the error below:
> 'ascii' codec can't decode byte 0xe1 in position 13: ordinal not in range(128)

That's because unicode-escape expects a unicode object as input; if you
pass a byte string, Python 2 first tries to convert it to unicode using the
default encoding (ascii), and that implicit decode fails on byte 0xe1.
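
The implicit step can be made visible. What follows is only a minimal
sketch, written for Python 3, where bytes objects have no .encode() method
and the decode has to be spelled out by hand. The sample value and the
helper name implicit_decode_error are made up for illustration; they are
not from the original code.

```python
# Hypothetical sample: a byte string with 0xe1 at index 13,
# mirroring the position reported in the error message.
raw = b"thirteen char\xe1 and more"


def implicit_decode_error(data):
    """Return the error text from decoding `data` as ASCII, or None."""
    try:
        # This is the decode that Python 2 performed silently when
        # .encode() was called on a byte string.
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


message = implicit_decode_error(raw)
print(message)
```

A byte string that contains only ASCII passes the decode without error,
which is why the problem only shows up once real 8-bit data arrives.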

> I've read a lot of stuff about Unicode and Python and I'm pretty
> comfortable with how you can convert between different encoding types.
> What I don't understand is how to go from a byte string with 8-bit
> characters to an encoded string where 8-bit characters are turned into
> two character hexadecimal sequences.

Almost there: use string-escape instead; it takes a byte string and
returns another byte string that contains only ASCII characters.

> I really don't care about the character set used.  I'm looking for a
> matched set of operations that converts the string to a seven-bit form
> and back to its original form.

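Ok, string-escape should work: s.encode('string-escape') escapes the
string and .decode('string-escape') gives the original back. But which
database are you using that can't handle 8-bit strings?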
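
For readers on Python 3, where the string-escape codec no longer exists, a
matched pair can be sketched with latin-1 plus unicode_escape. This is a
swapped-in technique, not string-escape itself, and its output may differ
from string-escape in small details. The function names to_seven_bit and
from_seven_bit are hypothetical, chosen just for this example.

```python
def to_seven_bit(data):
    """Escape arbitrary bytes into an ASCII-only byte string."""
    # latin-1 maps every byte 0-255 to the code point with the same
    # number, so this decode cannot fail and loses no information.
    text = data.decode("latin-1")
    # unicode_escape writes backslashes, control characters and anything
    # above 0x7f as backslash escapes such as \xe1.
    return text.encode("unicode_escape")


def from_seven_bit(escaped):
    """Reverse to_seven_bit and return the original bytes."""
    return escaped.decode("unicode_escape").encode("latin-1")


# Hypothetical sample containing an 8-bit byte, a backslash and a newline.
original = b"caf\xe1 \\ line\n"
escaped = to_seven_bit(original)
restored = from_seven_bit(escaped)
print(escaped)
print(restored == original)
```

Since the character set does not matter here, latin-1 is used purely as a
lossless byte-to-code-point mapping, not as a claim about what the data
really is.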

Gabriel Genellina
