unicode bit me
Terry Reedy
tjreedy at udel.edu
Mon May 11 01:47:06 EDT 2009
anuraguniyal at yahoo.com wrote:
> so unicode(obj) calls __unicode__ on that object
It will look for the existence of type(ob).__unicode__ ...
> and if it isn't there __repr__ is used
According to the below, type(ob).__str__ is tried first.
> __repr__ of list by default return a str even if __repr__ of element
> is unicode
From the fine library manual, built-in functions section:
(I recommend using it, along with interactive experiments.)
"repr( object)
Return a string ..."
"str( [object])
Return a string ..."
"unicode( [object[, encoding [, errors]]])
Return the Unicode string version of object using one of the following
modes:
If encoding and/or errors are given, ...
If no optional parameters are given, unicode() will mimic the behaviour
of str() except that it returns Unicode strings instead of 8-bit
strings. More precisely, if object is a Unicode string or subclass it
will return that Unicode string without any additional decoding applied.
For objects which provide a __unicode__() method, it will call this
method without arguments to create a Unicode string. For all other
objects, the 8-bit string version or representation is requested and
then converted to a Unicode string using the codec for the default
encoding in 'strict' mode."
'unicode(somelist)' has no optional parameters, so skip to the third
paragraph. somelist is not a unicode instance, so skip past the 'More
precisely' sentence. If you do dir(list) I presume you will *not* see
'__unicode__' listed. So skip to the last sentence.
unicode(somelist) == str(somelist).decode(default,'strict'),
where default is whatever sys.getdefaultencoding() returns.
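Here is a rough model of that lookup order, as I read the quoted doc. It is
only a sketch, not what the interpreter actually does internally: to_text and
Tagged are made-up names, and I wrote it so that it also runs on 3.x, where
unicode() no longer exists.

```python
import sys

text_type = type(u"")  # unicode on 2.x, str on 3.x


def to_text(obj, default_encoding=None):
    """Rough model of 2.x unicode(obj) called with no optional parameters."""
    if default_encoding is None:
        default_encoding = sys.getdefaultencoding()
    # A Unicode string (or subclass) comes back without any decoding.
    if isinstance(obj, text_type):
        return obj
    # A __unicode__ method defined on the type is used when it exists.
    method = getattr(type(obj), "__unicode__", None)
    if method is not None:
        return method(obj)
    # Otherwise fall back to the str() text ...
    raw = str(obj)
    # ... which is an 8-bit string on 2.x and must be decoded strictly.
    if isinstance(raw, bytes):
        raw = raw.decode(default_encoding, "strict")
    return raw


class Tagged(object):  # hypothetical class that provides __unicode__
    def __unicode__(self):
        return u"tagged"


has_hook = hasattr(list, "__unicode__")  # list defines no such method
from_hook = to_text(Tagged())            # goes through __unicode__
from_list = to_text([1, 2])              # falls back to str()
```

The list case lands in the last branch, which is the point of the paragraph
above.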
I do not believe str() and repr() are specifically documented for
builtin classes other than the general description, but you can figure
that str(collection) or repr(collection) will call str or repr on the
members of the collection in order to return a str, as the doc says.
(Details are available by experiment.) str(uni_string) encodes with the
default encoding, which seems to be 'ascii' in 2.x. I am sure it uses
'strict' errors.
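One such experiment, as a small sketch (Item is a made-up class): give a class
different __str__ and __repr__ results and see which one a list picks up for
its members.

```python
class Item(object):  # hypothetical class, for illustration only
    def __str__(self):
        return "str-form"

    def __repr__(self):
        return "repr-form"


items = [Item(), Item()]
single = str(Item())   # a bare object uses __str__
as_str = str(items)    # the list formats each member with repr
as_repr = repr(items)  # same text as str(items)
```

When I try this, both as_str and as_repr come out as
'[repr-form, repr-form]', so for a list it is the repr of the members that
ends up in the result either way.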
I would agree that str(some_unicode) could be better documented, like
unicode(some_str) is.
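In the meantime, the encode step described above can be spelled out by hand.
This is a sketch that assumes the 'ascii' default; it uses explicit .encode()
calls rather than str() so that it behaves the same on 2.x and 3.x.

```python
text = u"caf\xe9"  # contains one non-ASCII character

# What a strict ASCII encode does with it.
try:
    text.encode("ascii", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

lossy = text.encode("ascii", "replace")  # '?' substituted for the e-acute
exact = text.encode("utf-8")             # explicit codec, nothing lost
```

Naming the codec yourself avoids depending on the default encoding at all.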
> so my only solution looks like to use my own list class everywhere i
> use list
> class mylist(list):
>     def __unicode__(self):
>         return u"["+u''.join(map(unicode,self))+u"]"
Or write a function and use that instead, or, if and when you can,
switch to 3.x where str and repr accept and produce unicode.
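A minimal sketch of the function approach (list_to_text and its sep parameter
are names I made up; the u", " separator is my choice, where your class joins
with u''):

```python
text_type = type(u"")  # unicode on 2.x, str on 3.x


def list_to_text(items, sep=u", "):
    """Format a list as text without going through an 8-bit repr."""
    return u"[" + sep.join(text_type(item) for item in items) + u"]"


shown = list_to_text([u"caf\xe9", 1])
```

This leaves ordinary lists alone, so nothing else in the program has to change
to a subclass.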
tjr