[Python-Dev] Printing and __unicode__

M.-A. Lemburg mal@lemburg.com
Thu, 14 Nov 2002 10:10:20 +0100

Martin v. Loewis wrote:
> "M.-A. Lemburg" <mal@lemburg.com> writes:
>>The fact that StringIO works with Unicode (and then only in the
>>case where you *only* pass Unicode to it) is more an implementation
>>detail than a true feature. 
> It's a true feature. You explicitly fixed that feature in
> revision 1.20
> date: 2002/01/06 17:15:05;  author: lemburg;  state: Exp;  lines: +8 -5
> Restore Python 2.1 StringIO.py behaviour: support concatenating
> Unicode string snippets to larger Unicode strings.
> This fix should also go into Python 2.2.1.
> after you broke it in
> revision 1.19
> date: 2001/09/24 17:34:52;  author: lemburg;  state: Exp;  lines: +4 -1
> branches:  1.19.12;
> StringIO patch #462596: let's [c]StringIO accept read buffers on
> input to .write() too.

I doubt that it's a true feature. The fact that I broke it
in the above patch by introducing the str(data) call in
StringIO.py suggests that whoever complained about this change
was using an implementation detail rather than a documented
and originally intended feature of StringIO.
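
To illustrate why the str(data) call broke the Unicode case, here
is a rough model in present-day Python 3 syntax. It is a sketch,
not the StringIO.py code: the function name is made up, and the
explicit .encode("ascii") stands in for what Python 2's str() did
implicitly to a Unicode object under the default ASCII encoding.

```python
def write_with_str_coercion(chunks, data):
    """Hypothetical model of a .write() that coerces its input via str().

    In Python 2, str(u"...") encoded the text with the default
    encoding (ASCII unless reconfigured). Python 3 has no such
    implicit step, so it is spelled out here.
    """
    if isinstance(data, str):
        # Raises UnicodeEncodeError for any non-ASCII character.
        data = data.encode("ascii")
    chunks.append(data)


chunks = []
write_with_str_coercion(chunks, "abc")          # plain ASCII text survives

try:
    write_with_str_coercion(chunks, "\u00fc")   # non-ASCII text does not
    failed = False
except UnicodeEncodeError:
    failed = True
```

ASCII-only text passes through unchanged, which is why the
breakage only showed up for callers writing non-ASCII Unicode.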

If you need something like StringIO for Unicode, then I would
suggest creating a similar object that deals only with
Unicode, e.g. UnicodeIO.
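
A minimal sketch of what such an object might look like, written
in present-day Python 3 syntax where str plays the role of the
2002 unicode type. Only the name UnicodeIO comes from this post;
the methods and the decision to reject byte strings with a
TypeError are assumptions made for illustration.

```python
class UnicodeIO:
    """Hypothetical in-memory stream that accepts Unicode text only."""

    def __init__(self):
        self._chunks = []

    def write(self, data):
        # Assumption: anything that is not Unicode text is refused
        # instead of being silently coerced.
        if not isinstance(data, str):
            raise TypeError(
                "UnicodeIO accepts only Unicode text, got %s"
                % type(data).__name__
            )
        self._chunks.append(data)

    def getvalue(self):
        # Concatenate the Unicode snippets into one Unicode string.
        return "".join(self._chunks)


buf = UnicodeIO()
buf.write("gr\u00fc")
buf.write("n")
```

Refusing mixed input up front avoids the situation discussed
above, where the result depends on whether *only* Unicode was
ever written to the stream.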

cStringIO could then be extended to also support such an object
by using the same trick as SRE does to support two native
types (putting the code into a .h file and then including
it twice).

Back to the original question. I don't have a problem with
leaving the Unicode support in StringIO's .write() method,
but the introduction of Unicode print support should not
rely on this detail. Instead, anyone who wants to write only
Unicode to a StringIO-like object should be directed to UnicodeIO.

Now, to satisfy the request of the poster who wanted support for
__unicode__ in PyFile_WriteObject(), we need to add something
which lets PyFile_WriteObject() determine whether to look
for __unicode__ or not (by default, it passes Unicode objects
through as-is and applies str() to all other objects).

I like the idea of using the .encoding attribute as the flag
for this. What I don't like is using .encoding = None to
mark Unicode-only streams (ones that take Unicode on input
and use Unicode on output). To me, .encoding = None would
signal: this stream doesn't do anything to the input data
and passes it to the output stream as-is.

Much better, IMHO, would be to use .encoding = 'unicode'
on Unicode-only streams, such as the UnicodeIO mentioned above.

In summary, StringIO objects should not implement .encoding
while a new Unicode-only stream-like object UnicodeIO
should have .encoding = 'unicode'.
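
The dispatch this implies can be modelled in a few lines of
Python. This is only a sketch of the idea, not the C code of
PyFile_WriteObject(): write_object, PlainStream,
UnicodeOnlyStream and Name are made-up names, and Python 3's str
stands in for the unicode type.

```python
class PlainStream:
    """Stand-in for a StringIO-like stream without .encoding."""

    def __init__(self):
        self.parts = []

    def write(self, data):
        self.parts.append(data)


class UnicodeOnlyStream(PlainStream):
    """Stand-in for UnicodeIO, carrying the proposed flag value."""

    encoding = "unicode"


class Name:
    """Example object offering both a str() and a Unicode form."""

    def __str__(self):
        return "Loewis"

    def __unicode__(self):
        return "L\u00f6wis"


def write_object(obj, stream):
    """Hypothetical model of the proposed lookup rule."""
    if isinstance(obj, str):
        # Unicode objects are passed through as-is.
        text = obj
    elif (getattr(stream, "encoding", None) == "unicode"
          and hasattr(obj, "__unicode__")):
        # Only Unicode-only streams trigger the __unicode__ lookup.
        text = obj.__unicode__()
    else:
        # Everything else gets the existing str() treatment.
        text = str(obj)
    stream.write(text)


plain = PlainStream()
uni = UnicodeOnlyStream()
write_object(Name(), plain)
write_object(Name(), uni)
```

With this rule the stream without .encoding receives the str()
form, and only the stream flagged with .encoding = 'unicode'
receives the result of __unicode__.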

The same could then be done with the corresponding cStringIO
objects.

PS: Some may not know, but the obvious way of fixing printing
of Unicode by adding a tp_print slot implementation does not
work, since that slot takes a FILE* pointer as file "object"
which, of course, cannot include any additional information
such as the encoding.

Marc-Andre Lemburg
CEO eGenix.com Software GmbH
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/