unicode bit me

anuraguniyal at yahoo.com anuraguniyal at yahoo.com
Mon May 11 14:14:18 CEST 2009


On May 11, 10:47 am, Terry Reedy <tjre... at udel.edu> wrote:
> anuraguni... at yahoo.com wrote:
> > so unicode(obj) calls __unicode__ on that object
>
> It will look for the existence of type(ob).__unicode__ ...
>
>  > and if it isn't there __repr__ is used
>
> According to the below, type(ob).__str__ is tried first.
>
> > __repr__ of list by default return a str even if __repr__ of element
> > is unicode
>
>  From the fine library manual, built-in functions section:
> (I recommend using it, along with interactive experiments.)
>
> "repr( object)
> Return a string ..."
>
> "str( [object])
> Return a string ..."
>
> "unicode( [object[, encoding [, errors]]])
>
> Return the Unicode string version of object using one of the following
> modes:
>
> If encoding and/or errors are given, ...
>
> If no optional parameters are given, unicode() will mimic the behaviour
> of str() except that it returns Unicode strings instead of 8-bit
> strings. More precisely, if object is a Unicode string or subclass it
> will return that Unicode string without any additional decoding applied.
>
> For objects which provide a __unicode__() method, it will call this
> method without arguments to create a Unicode string. For all other
> objects, the 8-bit string version or representation is requested and
> then converted to a Unicode string using the codec for the default
> encoding in 'strict' mode.
> "
>
> 'unicode(somelist)' has no optional parameters, so skip to third
> paragraph.  Somelist is not a unicode instance, so skip to the last
> paragraph.  If you do dir(list) I presume you will *not* see
> '__unicode__' listed.  So skip to the last sentence.
> unicode(somelist) == str(somelist).decode(default,'strict').
>
> I do not believe str() and repr() are specifically documented for
> builtin classes other than the general description, but you can figure
> that str(collection) or repr(collection) will call str or repr on the
> members of the collection in order to return a str, as the doc says.
Thanks for the explanation.
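
To check that I have the lookup order right, here is a rough sketch of it. The name to_text is my own invention, not a built-in, and it only approximates what unicode(obj) does with no optional parameters: return unicode objects as they are, use type(obj).__unicode__ if it exists, and otherwise decode the str() result with the default codec. It is written so that it also runs on 3.x, where str is already unicode and no decode step is needed.

```python
import sys

PY2 = sys.version_info[0] == 2
text_type = unicode if PY2 else str  # the name 'unicode' only exists in 2.x


def to_text(obj):
    """Hypothetical helper approximating unicode(obj) without encoding/errors."""
    # A unicode string is returned without any additional decoding.
    if isinstance(obj, text_type):
        return obj
    # The method is looked up on the type, not on the instance.
    method = getattr(type(obj), '__unicode__', None)
    if method is not None:
        return method(obj)
    # Fall back to the 8-bit string version.
    s = str(obj)
    if PY2:
        # 2.x: convert with the default encoding in 'strict' mode.
        return s.decode(sys.getdefaultencoding(), 'strict')
    return s  # 3.x: str() already returned text


class Name(object):
    """Illustrative class with both __unicode__ and __repr__."""

    def __init__(self, value):
        self.value = value

    def __unicode__(self):
        return self.value

    def __repr__(self):
        return 'Name(...)'
```

With this, to_text(Name(u'\xe9')) goes through __unicode__, but to_text([Name(u'\xe9')]) does not: list has no __unicode__, so the fallback is str() of the list, which uses __repr__ of each element.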

> (Details are available by experiment.)  str(uni_string) encodes with the
> default encoding, which seems to be 'ascii' in 2.x.  I am sure it uses
> 'strict' errors.
>
> I would agree that str(some_unicode) could be better documented, like
> unicode(some_str) is.
>
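
The implicit step can also be spelled out explicitly. A small sketch, assuming the default encoding is 'ascii' as described above; the helper names to_bytes_strict and encodes_cleanly are made up for illustration. The explicit .encode call behaves the same way on 2.x and 3.x.

```python
def to_bytes_strict(text, encoding='ascii'):
    """Hypothetical helper: the explicit form of what str(some_unicode)
    appears to do implicitly in 2.x (assumed default encoding: 'ascii')."""
    return text.encode(encoding, 'strict')


def encodes_cleanly(text, encoding='ascii'):
    """Return True if text survives a strict encode, False otherwise."""
    try:
        to_bytes_strict(text, encoding)
    except UnicodeEncodeError:
        return False
    return True
```

So encodes_cleanly(u'abc') is true, while any non-ASCII character such as u'\xe9' raises UnicodeEncodeError under 'strict' and gives false.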
> > so my only solution looks like to use my own list class everywhere i
> > use list
> > class mylist(list):
> >     def __unicode__(self):
> >         return u"["+u', '.join(map(unicode,self))+u"]"
>
> Or write a function and use that instead, or, if and when you can,
> switch to 3.x where str and repr accept and produce unicode.
>
> tjr
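
The function route might look something like the sketch below. unicode_list is a hypothetical name of mine, and the text_type switch is only there so the same code runs on both 2.x and 3.x. Note that, unlike repr of a list, it shows string elements without quotes.

```python
import sys

# 'unicode' only exists in 2.x; on 3.x str is already the text type.
text_type = unicode if sys.version_info[0] == 2 else str


def unicode_list(items):
    """Hypothetical helper: format a sequence using the text form of
    each element instead of its repr."""
    return u'[' + u', '.join(text_type(item) for item in items) + u']'
```

For example, unicode_list([u'\xe9', 1]) gives u'[\xe9, 1]', and elements that define __unicode__ are converted through it on 2.x.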
