[Python-bugs-list] [Bug #114830] Writing of Unicode data not portable

noreply@sourceforge.net noreply@sourceforge.net
Thu, 21 Sep 2000 14:31:14 -0700


Bug #114830, was updated on 2000-Sep-19 13:46
Here is a current snapshot of the bug.

Project: Python
Category: Core
Status: Closed
Resolution: Fixed
Bug Group: None
Priority: 7
Summary: Writing of Unicode data not portable

Details: In 2.0b1, writing a Unicode object to a socket has an outcome that depends on the endianness of the processor. This is highly undesirable, and is not documented in the reference manuals either. It also breaks backwards compatibility in hard-to-analyse ways, as the other end of the socket may react strangely when confronted with Python's internal representation of a Unicode object.
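
To illustrate why the byte order matters, here is a minimal sketch. It assumes a modern Python 3 interpreter rather than 2.0b1, and it uses the explicit "utf-16-le" and "utf-16-be" codecs to stand in for what a little-endian and a big-endian machine would hold in memory; the sample string is arbitrary.

```python
import sys

# 'A' (U+0041) followed by the euro sign (U+20AC); an arbitrary sample.
text = "A\u20ac"

# The same two characters, serialised in each byte order.
little = text.encode("utf-16-le")
big = text.encode("utf-16-be")

# What a raw dump of 16-bit code units would look like on this machine.
native = little if sys.byteorder == "little" else big

print(little.hex())  # 4100ac20
print(big.hex())     # 004120ac
print(native == little, native == big)
```

A receiver that is not told which byte order was used cannot tell these two byte strings apart from two different texts, which is the portability problem described above.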

Since determination of an encoding must be an application decision, it seems best if attempts to write Unicode objects to sockets produce an exception. To some degree, the same criticism applies to binary files.
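
The idea that the application should pick the encoding can be sketched as follows. This again assumes modern Python 3, where sockets accept only bytes; send_text is a hypothetical helper written for this example, not part of the socket module, and UTF-8 is just one possible choice of encoding.

```python
import socket


def send_text(sock, text, encoding="utf-8"):
    """Hypothetical helper: encode explicitly, then write the bytes."""
    if not isinstance(text, str):
        raise TypeError("send_text() expects a str")
    sock.sendall(text.encode(encoding))


a, b = socket.socketpair()
try:
    # Passing the text object directly is refused in Python 3.
    try:
        a.sendall("caf\u00e9")
        rejected = False
    except TypeError:
        rejected = True

    # With an explicit encoding the bytes on the wire are well defined.
    send_text(a, "caf\u00e9")
    received = b.recv(1024)
finally:
    a.close()
    b.close()

print(rejected)
print(received.decode("utf-8"))
```

Because the encoding is named at the call site, both ends of the connection can agree on it regardless of processor endianness.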

Follow-Ups:

Date: 2000-Sep-21 14:31
By: lemburg

Comment:
Fixed by special-casing "s#" for Unicode objects: "s#" no longer
returns the UTF-16 data, but instead the default-encoded string
version of the Unicode object.

To access the internal data, use either one of the access macros
or PyObject_AsReadBuffer(). The read buffer interface still exists
and returns the internal data representation, as the buffer interface
defines.

-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=114830&group_id=5470