[Python-Dev] Printing and __unicode__

Guido van Rossum guido@python.org
Thu, 14 Nov 2002 08:48:58 -0500


> Martin v. Loewis wrote:
> > "M.-A. Lemburg" <mal@lemburg.com> writes:
> > 
> > 
> >>The fact that StringIO works with Unicode (and then only in the
> >>case where you *only* pass Unicode to it) is more an implementation
> >>detail than a true feature. 
> > 
> > It's a true feature. You explicitly fixed that feature in
> > 
> > revision 1.20
> > date: 2002/01/06 17:15:05;  author: lemburg;  state: Exp;  lines: +8 -5
> > Restore Python 2.1 StringIO.py behaviour: support concatenating
> > Unicode string snippets to larger Unicode strings.
> > 
> > This fix should also go into Python 2.2.1.
> > 
> > after you broke it in
> > 
> > revision 1.19
> > date: 2001/09/24 17:34:52;  author: lemburg;  state: Exp;  lines: +4 -1
> > branches:  1.19.12;
> > StringIO patch #462596: let's [c]StringIO accept read buffers on
> > input to .write() too.
> 
> I doubt that it's a true feature. The fact that I broke it
> in the above patch by introducing the str(data) call in
> StringIO.py suggests that whoever complained about this change
> was using an implementation detail rather than a documented
> and originally intended feature of StringIO.
> 
> If you need something like StringIO for Unicode, then I would
> suggest creating a similar object that deals only with
> Unicode, e.g. UnicodeIO.

But since StringIO already works for Unicode, why bother?
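
For readers following along, here is a minimal sketch of what the
proposed Unicode-only object might look like. Only the name UnicodeIO
comes from the message above; the list buffer, the TypeError on
non-text input, and getvalue() are assumptions made for illustration.
It is written in modern Python, where str is the Unicode type.

```python
class UnicodeIO:
    """Hypothetical Unicode-only, StringIO-like stream (illustrative sketch)."""

    def __init__(self):
        self._chunks = []  # text snippets written so far

    def write(self, data):
        # Accept text only. There is deliberately no str(data) coercion here,
        # since that kind of call is what broke Unicode writes in revision 1.19.
        if not isinstance(data, str):
            raise TypeError("UnicodeIO.write() expects text, got %s"
                            % type(data).__name__)
        self._chunks.append(data)

    def getvalue(self):
        # Concatenate the snippets into one larger Unicode string.
        return "".join(self._chunks)
```

Rejecting non-text input outright is one possible design; the current
StringIO is more permissive, which is the point of the question above.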

> cStringIO could then be extended to also support such an object
> by using the same trick as SRE does to support two native
> types (putting the code into a .h file and then including
> it twice).

(Off-topic: each time I fix a bug twice, once in stringobject.c and
once in unicodeobject.c, I wish we'd done that for string and unicode
objects.  But it's too late now, and also may not be realistic given
some different implementation choices.)

> Back to the original question. I don't have a problem with
> leaving in the Unicode support in StringIO's .write() method,
> but the introduction of the Unicode print support should not
> rely on this detail.

Agreed.

>                      Instead, someone wanting to write only Unicode
> to a StringIO-like object should be directed to UnicodeIO.
> 
> Now, to satisfy the request of the poster who wanted support for
> __unicode__ in PyFile_WriteObject() we need to add something
> which lets PyFile_WriteObject() determine whether to look
> for __unicode__ or not (by default, it passes through
> Unicode objects as-is and applies str() to all other objects).
> 
> I like the idea of using the .encoding attribute as a flag
> for this. What I don't like is that setting it to None
> should be used for Unicode-only streams (ones that take
> Unicode on input and use Unicode on output). To me,
> .encoding = None would signal: this stream doesn't do anything
> to the input data and passes it to the output stream as-is.

But I'm not sure that's a useful feature.  Maybe encoding=None could
mean the current StringIO behavior. <0.5 wink>

> Much better, IMHO, would be to use .encoding = 'unicode'
> on Unicode-only streams such as the mentioned UnicodeIO
> object.

Yes.  (Except 'unicode' is not an encoding name, right?  Maybe it
should be?)
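
The dispatch being discussed can be sketched in Python roughly as
follows. This is not the C code of PyFile_WriteObject(); write_object,
UnicodeOnlyStream, PlainStream, and Name are hypothetical names
invented for the example, and the __unicode__ lookup is done by hand
because modern Python has no such protocol. The only ideas taken from
the thread are the .encoding = 'unicode' flag and the fallback to
str().

```python
class UnicodeOnlyStream:
    """Hypothetical Unicode-only stream, marked via its .encoding attribute."""
    encoding = "unicode"  # flag value proposed in the thread

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


class PlainStream:
    """Stand-in for a StringIO-like stream that defines no .encoding."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


def write_object(obj, stream):
    """Rough analogue of the check proposed for PyFile_WriteObject()."""
    # Only streams flagged as Unicode-only get the __unicode__ treatment.
    wants_unicode = getattr(stream, "encoding", None) == "unicode"
    to_unicode = getattr(type(obj), "__unicode__", None)
    if wants_unicode and to_unicode is not None:
        text = to_unicode(obj)
    else:
        # Default path: text passes through unchanged, everything else
        # goes through str().
        text = str(obj)
    stream.write(text)


class Name:
    """Example object with both a plain and a Unicode representation."""

    def __str__(self):
        return "Loewis"

    def __unicode__(self):
        return "L\u00f6wis"
```

With these definitions, writing a Name() to a UnicodeOnlyStream stores
the __unicode__ result, while writing it to a PlainStream stores the
str() result.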

> In summary, StringIO objects should not implement .encoding
> while a new Unicode-only stream-like object UnicodeIO
> should have .encoding = 'unicode'.
> 
> The same could then be done with the corresponding cStringIO
> objects.
> 
> PS: Some may not know, but the obvious way of fixing printing
> of Unicode by adding a tp_print slot implementation does not
> work, since that slot takes a FILE* pointer as file "object"
> which, of course, cannot include any additional information
> such as the encoding.

Yes, tp_print is only an optimization for tp_repr and tp_str when
writing to a "real" file object.

--Guido van Rossum (home page: http://www.python.org/~guido/)