UTF-8 problem encoding and decoding in Python3

Almar Klein almar.klein at gmail.com
Mon Oct 11 04:27:35 EDT 2010


On 10 October 2010 23:01, Hidura <hidura at gmail.com> wrote:

> I am trying to encode a binary file that was uploaded to a server. It is
> extracted from the wsgi.input of the environ and comes as a Unicode
> string.
>

Firstly, UTF-8 is not meant to encode arbitrary binary data. But I guess you
could have a Unicode string in which each character's code point represents a
byte value. (But it's ugly!)

So if you can, make sure the file is sent as plain bytes or, if it must be a
string, base64 encoded. If that is not possible, you can try the code below to
obtain the bytes. It is not a very fast solution, but it should work as long
as every character has a code point below 256 (Python 3):


MAP = {}
for i in range(256):
    # Map the character with code point i to the single byte with value i
    MAP[chr(i)] = bytes([i])

# Let's say 'a' is your string
b''.join([MAP[c] for c in a])
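
For what it's worth, the same lookup can probably be had without building a
table: Python's latin-1 codec maps code points 0-255 straight to the bytes
0-255. The sketch below assumes every character in the string is below
U+0100. The sample string and the names raw, text and restored are made up
for illustration. It also shows the base64 route mentioned above.

```python
import base64

# Hypothetical sample: a str whose code points stand in for byte values
a = "\x89PNG\r\n\x1a\n\x00\xff"

# Same idea as the MAP lookup: code point i becomes the byte with value i.
# Raises UnicodeEncodeError if any character is above U+00FF.
raw = a.encode("latin-1")

# The base64 alternative: bytes become ASCII-only text and back again.
text = base64.b64encode(raw).decode("ascii")
restored = base64.b64decode(text)
```

If the string really is UTF-8 decoded text and not byte values in disguise,
neither approach will recover the original file.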


Cheers,
  Almar