unicode bit me

Terry Reedy tjreedy at udel.edu
Mon May 11 01:47:06 EDT 2009


anuraguniyal at yahoo.com wrote:

> so unicode(obj) calls __unicode__ on that object

It will look for the existence of type(ob).__unicode__ ...

 > and if it isn't there __repr__ is used

According to the below, type(ob).__str__ is tried first.

> __repr__ of list by default return a str even if __repr__ of element
> is unicode

From the fine library manual, built-in functions section:
(I recommend using it, along with interactive experiments.)

"repr( object)
Return a string ..."

"str( [object])
Return a string ..."

"unicode( [object[, encoding [, errors]]])

Return the Unicode string version of object using one of the following 
modes:

If encoding and/or errors are given, ...

If no optional parameters are given, unicode() will mimic the behaviour 
of str() except that it returns Unicode strings instead of 8-bit 
strings. More precisely, if object is a Unicode string or subclass it 
will return that Unicode string without any additional decoding applied.

For objects which provide a __unicode__() method, it will call this 
method without arguments to create a Unicode string. For all other 
objects, the 8-bit string version or representation is requested and 
then converted to a Unicode string using the codec for the default 
encoding in 'strict' mode.
"

'unicode(somelist)' has no optional parameters, so skip to third 
paragraph.  Somelist is not a unicode instance, so skip to the last 
paragraph.  If you do dir(list) I presume you will *not* see 
'__unicode__' listed.  So skip to the last sentence.
unicode(somelist) == str(somelist).decode(default,'strict').
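
The lookup order traced above can be modelled roughly as follows. This is 
only a sketch of the documented behaviour, not how CPython implements it; 
unicode_like, WithUnicode and the default_encoding parameter are made-up 
names, and 'ascii' is assumed to be the default encoding.

```python
def unicode_like(obj, default_encoding="ascii"):
    """Simplified model of one-argument unicode(obj) as the 2.x docs describe it."""
    text_type = type(u"")  # unicode on 2.x, str on 3.x
    # Step 1: a Unicode string (or subclass) is returned without decoding.
    if isinstance(obj, text_type):
        return obj
    # Step 2: a __unicode__ method found on the type is called without arguments.
    method = getattr(type(obj), "__unicode__", None)
    if method is not None:
        return method(obj)
    # Step 3: ask for the 8-bit string version, then decode it strictly.
    raw = str(obj)
    if isinstance(raw, bytes):
        return raw.decode(default_encoding, "strict")
    return raw  # on 3.x str() already returned text, so nothing to decode


class WithUnicode(object):
    """Hypothetical class that provides __unicode__."""
    def __unicode__(self):
        return u"caf\u00e9"


from_text = unicode_like(u"abc")          # step 1
from_method = unicode_like(WithUnicode())  # step 2
from_list = unicode_like([1, 2])           # step 3: list has no __unicode__
```

A list of plain integers goes through step 3 without trouble; the 
trouble starts only when the 8-bit string contains non-ASCII bytes.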

I do not believe str() and repr() are specifically documented for 
builtin classes other than the general description, but you can figure 
that str(collection) or repr(collection) will call str or repr on the 
members of the collection in order to return a str, as the doc says. 
(Details are available by experiment; for a builtin list, both str and 
repr use repr on the members.)  str(uni_string) encodes with the 
default encoding, which is 'ascii' in 2.x unless it has been changed.  
I am sure it uses 'strict' errors.
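
A small experiment along these lines shows what 'strict' means in 
practice. The explicit .encode() call stands in for what 
str(uni_string) does implicitly in 2.x, assuming the default encoding is 
'ascii'; encode_default is a made-up helper name.

```python
def encode_default(value, encoding="ascii"):
    """Encode text the way str(some_unicode) would in 2.x (assumed 'ascii', 'strict')."""
    return value.encode(encoding, "strict")


# Pure-ASCII text encodes without complaint.
ascii_ok = encode_default(u"cafe")

# A single non-ASCII character makes the strict encoder raise.
try:
    encode_default(u"caf\u00e9")
    failed = False
except UnicodeEncodeError:
    failed = True
```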

I would agree that str(some_unicode) could be better documented, like 
unicode(some_str) is.

> so my only solution looks like to use my own list class everywhere i
> use list
> class mylist(list):
>     def __unicode__(self):
>         return u"["+u''.join(map(unicode,self))+u"]"

Or write a function and use that instead, or, if and when you can, 
switch to 3.x where str and repr accept and produce unicode.
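
The function route might look something like this. It is a sketch only: 
list_to_text is a made-up name, and the ", " separator is my assumption 
about the wanted output, not something taken from the class above.

```python
def list_to_text(items):
    """Return a bracketed, comma-separated Unicode rendering of items."""
    text_type = type(u"")  # unicode on 2.x, str on 3.x
    # Convert each member to text directly instead of going through repr().
    return u"[" + u", ".join(text_type(item) for item in items) + u"]"


rendered = list_to_text([u"caf\u00e9", 42])
```

Call list_to_text(somelist) wherever unicode(somelist) was used; plain 
lists then need no subclass.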

tjr



