UTF-8 problem encoding and decoding in Python3

MRAB python at mrabarnett.plus.com
Tue Oct 12 12:04:56 EDT 2010


On 12/10/2010 15:45, Hidura wrote:
> It doesn't work; this is the error it gives me: TypeError: sequence item 0:
> expected bytes, str found. I'm still trying to figure out how to resolve
> it. If you have another idea please tell me, but thanks anyway!!!
>
> On Mon, Oct 11, 2010 at 4:27 AM, Almar Klein <almar.klein at gmail.com> wrote:
>
>
>     On 10 October 2010 23:01, Hidura <hidura at gmail.com> wrote:
>
>         I'm trying to encode a binary file that was uploaded to a server,
>         extracted from the environ's wsgi.input, and arrives as a Unicode
>         string.
>
>
>     Firstly, UTF-8 is not meant to encode arbitrary binary data. But I
>     guess you could have a Unicode string in which each character's code
>     point represents a byte value. (But it's ugly!)
>
>     So, if you can, make sure the file is sent as plain bytes or, if it
>     must be a string, base64-encoded. If that is not possible, you can
>     try the code below to obtain the bytes. It is not a very fast
>     solution, but it should work (Python 3):
>
>
>     MAP = {}
>     for i in range(256):
>          MAP[tmp] = eval("'\\u%04i'" % i)
>
>     # Let's say 'a' is your string
>     b''.join([MAP[c] for c in a])
>

I don't know what you're trying to do here.

1. 'tmp' is the same for every iteration of the 'for' loop.

2. A Unicode escape sequence expects 4 hexadecimal digits; the 'i'
format gives a decimal number.

3. Using 'eval' to make a string this way is the long (and wrong) way
to do it; chr(i) would have the same effect.

4. The result of the eval is a string, but you're performing a join
with a bytestring, hence the exception.
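To make points 2 and 4 concrete, here is a short Python 3 sketch that reproduces both problems for a single value. The choice i = 10 is arbitrary, and eval is used only to mimic the quoted snippet, not as a recommendation:

```python
# Point 2: '%04i' formats i in decimal, but a \u escape reads hex digits.
i = 10
escape = "'\\u%04i'" % i      # the source text '\u0010'
s = eval(escape)              # eval used here only to reproduce the bug
assert s == chr(16)           # U+0010, not the intended U+000A

# Point 3: chr() builds the intended character directly.
assert chr(i) == '\n'

# Point 4: the mapped values are str, and bytes.join() rejects str items.
try:
    b''.join([chr(i)])
    join_failed = False
except TypeError as exc:
    join_failed = True
    print(exc)  # exact wording varies between Python versions
```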

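A corrected version of the quoted snippet might look like the minimal sketch below. The sample string `a` is hypothetical, and the code assumes every character in it is at most U+00FF. Because the intended mapping sends code point i to byte i, it is exactly what the Latin-1 codec does, which the assertions check. Encoding with UTF-8 instead produces different bytes for anything above U+007F, which is why UTF-8 is the wrong tool for this job.

```python
# Hypothetical input: a str whose characters' code points (0-255)
# each stand for one byte value, as described in the quoted message.
a = "GIF89a\x00\x01\xff"

# Key is the 1-character string chr(i) (point 3); value is the
# integer byte value i, so nothing depends on a stray 'tmp' (points 1, 2).
MAP = {chr(i): i for i in range(256)}

# bytes() accepts an iterable of ints, so there is no b''.join()
# over str items (point 4). A character above U+00FF raises KeyError.
data = bytes(MAP[c] for c in a)

# Latin-1 maps code points 0-255 to bytes 0-255 one-to-one,
# so the codec gives the same result in a single call.
assert data == a.encode('latin-1')

# UTF-8 encodes U+00FF as two bytes (b'\xc3\xbf'), so it differs.
assert a.encode('utf-8') != data

print(data)  # b'GIF89a\x00\x01\xff'
```

In practice `a.encode('latin-1')` is the simpler and faster choice; the dictionary version is shown only because it mirrors the structure of the original attempt.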

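If the file really has to travel as text, the base64 route suggested in the quoted message could look like this sketch. The payload is made-up test data covering every byte value:

```python
import base64

# Hypothetical binary payload: every byte value 0-255 once.
payload = bytes(range(256))

# Sender side: base64 output is pure ASCII, so it survives any text channel.
text = base64.b64encode(payload).decode('ascii')

# Receiver side: decode the ASCII text back into the original bytes.
restored = base64.b64decode(text)

assert restored == payload
```

The cost is roughly a one-third size increase, in exchange for never having to guess how a text layer treated the raw bytes.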

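The conversion can also be avoided altogether. The sketch below assumes a server that follows PEP 3333 (the Python 3 revision of WSGI), under which wsgi.input is a byte stream, so reading from it yields bytes directly. The function name `application` and the fake environ are hypothetical, and a real browser upload is usually multipart/form-data, whose parsing is not shown here:

```python
import io

def application(environ, start_response):
    # Assumes PEP 3333: wsgi.input is a byte stream, so read() returns bytes.
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0
    body = environ['wsgi.input'].read(length)
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    # WSGI response bodies must be an iterable of bytes.
    return [('received %d bytes' % len(body)).encode('utf-8')]

# Usage with a hand-built environ (hypothetical test data).
raw = b'\x00\x01\xff binary upload'
environ = {'CONTENT_LENGTH': str(len(raw)), 'wsgi.input': io.BytesIO(raw)}
captured = {}

def start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers

result = application(environ, start_response)
```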