
"Martin v. Loewis" wrote:
[Over to python-dev. Jaco noticed that writing Unicode objects to a StringIO object stopped working in 2.2, see http://mail.python.org/pipermail/xml-sig/2001-December/006891.html ]
Marc-Andre writes:
Actually, I think that this is a bug in the documentation, not the code. StringIO and cStringIO were never meant to work on anything but strings and memory buffers.
IMO, "strings" should include both byte strings and Unicode strings. Mixing them may not be allowed, but that is a different story.
In fact, there is an open bug (#216388) that cStringIO rejects Unicode objects. If that gets fixed, we get the funny scenario that StringIO rejects Unicode objects, whereas cStringIO accepts them.
StringIO and cStringIO use different methods for storing the snippets: StringIO makes use of a buffer list which gets compressed every now and then, while cStringIO uses a raw memory buffer for this purpose. Both of these implementations are targeted at providing a file-IO-like interface to in-memory "files". Since Python file objects don't magically support Unicode, I wonder where the idea came from that StringIO/cStringIO should. The patch I applied to StringIO/cStringIO for 2.2 was aimed at making these two more compatible with the standard Python file object. The latter uses the "s#" parser marker for .write() and thus can also accept memory buffers. This was previously not possible with either of the two StringIO implementations (StringIO.py failed when trying to join different buffer-compatible objects, cStringIO only accepted real string objects).
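For readers unfamiliar with the buffer-list approach, here is a minimal sketch of the idea. It is not the actual StringIO.py code: the class name SnippetBuffer, the compress_every threshold, and the write-only interface are all assumptions made for illustration.

```python
class SnippetBuffer(object):
    """Hypothetical write-only in-memory "file" (illustration only)."""

    def __init__(self, compress_every=64):
        # compress_every is an assumed threshold, not a StringIO setting.
        self._snippets = []
        self._compress_every = compress_every

    def write(self, data):
        # Appending to a list is cheap; the join is deferred.
        self._snippets.append(data)
        if len(self._snippets) >= self._compress_every:
            self._compress()

    def _compress(self):
        # Collapse all pending snippets into a single string.
        if len(self._snippets) > 1:
            self._snippets = ["".join(self._snippets)]

    def getvalue(self):
        self._compress()
        return self._snippets[0] if self._snippets else ""


buf = SnippetBuffer(compress_every=3)
for piece in ("a", "b", "c", "d"):
    buf.write(piece)
value = buf.getvalue()
```

Deferring the join keeps each write cheap, at the price of requiring every snippet to be joinable with the others, which is where mixed object types cause trouble.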
The note that Fred added to the docs about StringIO's capability of storing Unicode in its buffer list is simply an artifact of the implementation.
There are many developers who take this note literally. Claiming that this was not intentional is a mistake.
Please use the .encode() method on Unicode objects before writing them to a StringIO object.
If you want to end up with a byte string, this is a good idea.
That's the idea behind StringIO objects... they are in-memory file object emulators.
But I think it is pointless to require encoding them when you want to end up with a Unicode string; you'd have to invoke unicode() on the result, for no apparent reason but a bug in the StringIO implementation.
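The round trip in question can be sketched as follows. The sketch uses io.BytesIO, which appeared in later Python versions, as a stand-in for a byte-only StringIO; the choice of UTF-8 and the sample text are assumptions as well.

```python
import io

text = u"caf\u00e9"  # sample Unicode string (assumption)

# A byte-only buffer, standing in for a StringIO that rejects Unicode.
buf = io.BytesIO()

# Step 1: encode before writing, as suggested above.
buf.write(text.encode("utf-8"))

# Step 2: decode again to get back the Unicode string that was wanted.
data = buf.getvalue()
roundtrip = data.decode("utf-8")
```

Both steps exist only to get through a byte-oriented buffer; the value that comes out is the same Unicode string that went in.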
This is a different application. It should be easy enough to subclass StringIO as a UnicodeIO class and then have this class implement fast Unicode snippet joining. I'm not sure whether the same can be done with cStringIO's type.
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/