__unicode__ vs. __str__: not quite parallel?

Tue Nov 12 04:29:14 EST 2002

loewis at informatik.hu-berlin.de (Martin v. Löwis) writes:

> ht at cogsci.ed.ac.uk (Henry S. Thompson) writes:
> 
> > If you print an object to a normal stream, and the object's class has a
> >  __str__ method, what appears is the result of the __str__ method.
> 
> What is "a normal stream"?

One without a codecs.getwriter wrapped around it.

> >>> f=open("/tmp/bla","w")
> >>> class X:
> ...   def __str__(self):
> ...     print "STR"
> ...     return "str"
> ... 
> >>> x=X()
> >>> str(x)
> STR
> 'str'
> >>> f.write(x)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: argument 1 must be string or read-only character buffer, not instance
> 
> So it is not at all common that you can write arbitrary things into a
> byte stream.

Sorry I wasn't clearer, but I did say 'print', not 'write', and if
you use print your example works fine:

>>> f=open("/tmp/bla","w")
>>> class X:
...   def __str__(self):
...     print "STR"
...     return "str"
... 
>>> x=X()
>>> print x
STR
str
>>> print >>f,x
STR
>>> f.close()
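
For readers on Python 3, the same distinction can be sketched as follows. This is a modern restatement rather than code from the thread: the class X mirrors the example above, while the in-memory stream and the write_failed flag are assumptions made for illustration. The point is that print converts its argument with str() before writing, whereas write does no conversion at all.

```python
import io


class X:
    def __str__(self):
        return "str"


x = X()
stream = io.StringIO()  # in-memory text stream, standing in for the file

# print() calls str() on its argument, so X.__str__ is used.
print(x, file=stream)

# write() performs no conversion and rejects anything that is not a string.
try:
    stream.write(x)
    write_failed = False
except TypeError:
    write_failed = True
```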

> > I've searched the archives but found no joy for this one -- any help
> > out there?
> 

> I can't offer help, but I will instead ask for help.
> 
> This looks like a bug. PyUnicode_FromObject does not consider invoking
> __unicode__, but I think it should. In fact, I cannot understand why
> PyObject_Unicode and PyUnicode_FromObject are different functions.
> There is already a comment in this function suggesting that.
> 
> So please either submit a bug report, or, better yet, a patch (I
> *will* forget about this if there is no reminder on SF).

Will do.

> In return, I can offer a work-around: When you look up a stream writer,
> don't use that directly. Instead, do
> 
>  basewriter = codecs.getwriter(encodingname)
>  class writer(basewriter):
>    def write(self, data):
>      data = unicode(data)
>      return basewriter.write(self, data)

This certainly points in a useful direction, but I don't think it's
quite right -- it's not the .write method that is the problem here,
it's whatever print does...  I think it's correct for .write to
complain if it doesn't get PyString or PyUnicode.
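
For reference, the shape of the suggested work-around can be sketched in Python 3 terms. This is an illustration under stated assumptions, not the code from the thread: str() stands in for the Python 2 unicode() call, and the names CoercingWriter, raw and encoded are hypothetical. It only shows a writer subclass that coerces its argument before handing it to the codec; it does not change what print itself does.

```python
import codecs
import io

# Look up the stream writer class for an encoding (UTF-8 chosen as an example).
base_writer = codecs.getwriter("utf-8")


class CoercingWriter(base_writer):
    """Stream writer that converts its argument to text before encoding."""

    def write(self, data):
        # Call the base class explicitly; instances have no __bases__ attribute.
        return base_writer.write(self, str(data))


class X:
    def __str__(self):
        return "caf\u00e9"


raw = io.BytesIO()            # byte stream the writer encodes into
writer = CoercingWriter(raw)
writer.write(X())             # X.__str__ supplies the text to encode
encoded = raw.getvalue()
```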

ht
-- 
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
          W3C Fellow 1999--2002, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht at cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/
 [mail really from me _always_ has this .sig -- mail without it is forged spam]
