
- Since we added a note to the docs that StringIO supports Unicode, we clearly should continue to support that, and it's a bug if it doesn't.
I still believe that the docs are wrong, but never mind. I'll fix StringIO.py to continue to support Unicode in addition to strings and buffer objects. It's basically only a matter of special-casing Unicode in the .write() method.
Thanks.
BTW, I was never aware of the doc changes in this area and the test suite didn't bring up the issues either.
Can you please add something to the test suite that makes sure this feature works?
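A test along these lines might do. This is only a sketch written against modern Python, where io.StringIO stands in for StringIO.StringIO and every str is already Unicode; the class and method names (UnicodeWriteTest and its test methods) are made up for illustration and are not taken from the actual test suite.

```python
import io
import unittest


class UnicodeWriteTest(unittest.TestCase):
    """Hypothetical regression test: Unicode written to the buffer survives."""

    def test_mixed_ascii_and_non_ascii(self):
        buf = io.StringIO()
        buf.write("abc")        # plain ASCII text
        buf.write("\u20ac")     # a non-ASCII character (euro sign)
        buf.write("def")
        self.assertEqual(buf.getvalue(), "abc\u20acdef")

    def test_seek_and_read(self):
        buf = io.StringIO()
        buf.write("\xe9t\xe9")  # Latin-1 range characters
        buf.seek(0)
        self.assertEqual(buf.read(), "\xe9t\xe9")
```

It can be run with "python -m unittest" on the file that contains it.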
- OTOH, Unicode for cStringIO should be considered at best a feature request. I don't mind if cStringIO doesn't support Unicode -- it never has, AFAIK, so it won't break much code. I don't believe it's much faster than StringIO, unless you use the C API (like cPickle does).
Unicode support in cStringIO would require a new implementation since the machinery uses raw byte buffers.
That's why I don't care much about it. :-)
- Of course, when Unicode is supported, mixing ASCII and Unicode should be supported too. (But not necessarily mixing 8-bit strings containing characters in the range \200-\377, since there's no default encoding for this range.)
In StringIO.py this is not much of a problem since it uses a list of snippets. Note that this is also why StringIO.py "supported" Unicode in the first place (and that's why I think it was more an artifact of the implementation than true intent).
But it was useful! :-)
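To make the "list of snippets" point concrete, here is a rough sketch in modern Python, where bytes plays the role of the 8-bit string and str the role of Unicode. SnippetBuffer is a hypothetical name, and this is not the actual StringIO.py code; it only illustrates why a snippet list handles mixed ASCII and Unicode writes almost for free, and why bytes in the range \200-\377 have to be rejected.

```python
class SnippetBuffer:
    """Hypothetical write-only buffer that keeps its data as a list of snippets."""

    def __init__(self):
        self._snippets = []

    def write(self, data):
        if isinstance(data, bytes):
            # Only pure-ASCII byte strings can be mixed with Unicode safely;
            # bytes in the range \200-\377 raise UnicodeDecodeError here
            # because there is no default encoding for them.
            data = data.decode("ascii")
        elif not isinstance(data, str):
            raise TypeError("expected str or bytes, got %s" % type(data).__name__)
        self._snippets.append(data)

    def getvalue(self):
        # Join on demand, then collapse the list so repeated calls stay cheap.
        value = "".join(self._snippets)
        self._snippets = [value]
        return value


buf = SnippetBuffer()
buf.write(b"abc")       # ASCII-only 8-bit string
buf.write("\u20ac")     # Unicode
print(buf.getvalue() == "abc\u20ac")
```

A raw byte buffer, as in cStringIO, has no such cheap path: it would need to pick an encoding at write time.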
- Since this changed from 2.1 to 2.2, we should restore this capability in 2.2.1; I would say that 2.2.1 can't go out until this is fixed.
Try to mark the checkin messages as "2.2.1 bugfix", for the 2.2.1 patch czar.

--Guido van Rossum (home page: http://www.python.org/~guido/)