Unicode problems, yet again

Fredrik Lundh fredrik at pythonware.com
Sun Apr 24 03:07:51 EDT 2005


Ivan Voras wrote:

> I have a string fetched from database, in iso8859-2, with 8bit
> characters, and I'm trying to send it over the network, via a socket:
>
>    File "E:\Python24\lib\socket.py", line 249, in write
>      data = str(data) # XXX Should really reject non-string non-buffers
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
> position 123: ordinal not in range(128)
>
> The other end knows it should expect this encoding, so how to send it?
>
> (Does anyone else feel that python's unicode handling is, well...
> suboptimal at least?)

you mean it should be able to automagically infer that you want your
Unicode strings to be shipped in ISO-8859-2 when you write them
to a socket?  wouldn't that annoy everyone using more common
encodings, such as ISO-8859-1, UTF-8, and EUC-JP?
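
In practice, the fix for the traceback above is to encode the Unicode string yourself, with the codec the other end expects, before it reaches the socket. The following is a minimal sketch in Python 3, assuming the peer expects ISO-8859-2 as in the question. The thread predates Python 3, but in Python 2.4 the same `.encode()` call works on a `unicode` object. `send_text`, `recv_exact` and `roundtrip` are hypothetical helper names, and `socket.socketpair()` stands in for a real network connection.

```python
import socket


def send_text(sock, text, encoding="iso-8859-2"):
    # A socket carries bytes, not text: encode explicitly with the codec
    # the receiver expects instead of relying on an implicit default.
    data = text.encode(encoding)
    sock.sendall(data)
    return len(data)


def recv_exact(sock, nbytes):
    # recv() may return fewer bytes than requested, so loop until done.
    chunks = []
    while nbytes > 0:
        chunk = sock.recv(nbytes)
        if not chunk:
            raise ConnectionError("peer closed before all data arrived")
        chunks.append(chunk)
        nbytes -= len(chunk)
    return b"".join(chunks)


def roundtrip(text, encoding="iso-8859-2"):
    # socketpair() returns two connected local sockets; they stand in
    # for the real network connection in this sketch.
    a, b = socket.socketpair()
    try:
        sent = send_text(a, text, encoding)
        return recv_exact(b, sent)
    finally:
        a.close()
        b.close()


received = roundtrip("\u0161koda")  # U+0161 is LATIN SMALL LETTER S WITH CARON
print(received)                                       # b'\xb9koda'
print(received.decode("iso-8859-2") == "\u0161koda")  # True
```

The receiver knows the byte count here only because both ends live in the same process; a real protocol would need its own framing (a length prefix or a delimiter), which is outside the scope of this sketch.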

(the only "suboptimal" thing with Python's Unicode system is that it
forces you to learn that text is not just a bunch of bytes.  for some
reason, some programmers find that extremely hard -- and
for some reason, the same programmers usually have no problems
understanding that python integers, floats, and other objects are not
just a bunch of bytes.  go figure...)
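
To make "text is not just a bunch of bytes" concrete, here is a small Python 3 sketch (`encode_all` is a hypothetical helper). It shows that one character maps to different byte sequences under different encodings, that decoding with the wrong codec silently yields different text, and that ASCII cannot represent the character at all. ASCII is the default codec behind the `str(data)` call in the quoted traceback.

```python
def encode_all(text, encodings):
    # Map each encoding name to the bytes it produces for the same text.
    return {enc: text.encode(enc) for enc in encodings}


s_caron = "\u0161"  # LATIN SMALL LETTER S WITH CARON
table = encode_all(s_caron, ["iso-8859-2", "utf-8", "utf-16-le"])
for enc, data in table.items():
    print(enc, data)
# iso-8859-2 b'\xb9'
# utf-8 b'\xc5\xa1'
# utf-16-le b'a\x01'

# Decoding with the wrong codec does not fail; it just gives other text.
wrong = table["iso-8859-2"].decode("iso-8859-1")
print(wrong == s_caron, hex(ord(wrong)))  # False 0xb9 (SUPERSCRIPT ONE)

# ASCII has no byte for U+0161, which is the error from the original post.
try:
    s_caron.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)
```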

</F>


More information about the Python-list mailing list