[ python-Bugs-1001011 ] str.join([ str-subtype-instance ]) misbehaves

SourceForge.net noreply at sourceforge.net
Sat Aug 7 17:48:58 CEST 2004


Bugs item #1001011, was opened at 2004-07-31 00:08
Message generated for change (Comment added) made by niemeyer
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1001011&group_id=5470

Category: Type/class unification
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Thomas Wouters (twouters)
Assigned to: Nobody/Anonymous (nobody)
Summary: str.join([ str-subtype-instance ]) misbehaves

Initial Comment:
Joining a list of string subtype instances usually
results in a single string instance:

  >>> class mystr(str): pass
  >>> type("".join([mystr("a"), mystr("b")]))
  <type 'str'>

But if the list only contains one object that is a
string subtype instance, that instance is returned
unchanged:

  >>> type("".join([mystr("a")]))
  <class '__main__.mystr'>

This can have odd effects, for instance when the result
of "".join(lst) is used as the return value of a __str__
hook. "".join should perhaps return the type of the
joining string, but it definitely should not vary its type
based on the *number* of items it's joining.



----------------------------------------------------------------------

>Comment By: Gustavo Niemeyer (niemeyer)
Date: 2004-08-07 15:48

Message:
Logged In: YES 
user_id=7887

If this was considered a bug: 
 
>>> type(ms("a")+ms("b")) 
<type 'str'> 
 
>>> type(ms("a")[:]) 
<type 'str'> 
 
Are these bugs as well? 
 
I believe this is how the implementation was intended to be, even if not
optimal for subclasses.
 
I suggest closing this bug as invalid, and writing a PEP about the possible new 
subclass support change (for all classes), if there's enough interest. 
 

----------------------------------------------------------------------

Comment By: Terry J. Reedy (tjreedy)
Date: 2004-08-05 16:10

Message:
Logged In: YES 
user_id=593130

Duh, my turn to forget. For any beginners reading this ...
>>> class ms(str): pass
...
>>> a=ms('a')
>>> type(''.join((a,)))
<class '__main__.ms'>
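
The trailing comma in (a,) above is what makes the argument a one-element tuple; (a) is just a parenthesized a, so join iterates over the characters of the string itself. A minimal sketch of that difference, assuming a current Python 3 interpreter (the class name ms is taken from the session above; the other names are illustrative only):

```python
class ms(str):
    """Trivial str subclass, as in the session above."""
    pass

a = ms('a')

# Parentheses alone do not create a tuple; the comma does.
parenthesized = (a)
one_tuple = (a,)

print(parenthesized is a)        # (a) is the very same object as a
print(type(one_tuple).__name__)  # the comma makes this a tuple

# Iterating a str subclass instance yields plain one-character strings,
# so joining (a) joins those characters, not the ms instance itself.
chars = list(parenthesized)
print([type(c).__name__ for c in chars])
```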

Expanding mwh's second point:

>>> e=ms()
>>> type(e)
<class '__main__.ms'>
>>> import copy
>>> e2=copy.copy(e)
>>> type(e2)
<class '__main__.ms'>
>>> e3=e[:]
>>> type(e3)
<type 'str'>
>>> id(e),id(e2),id(e3)
(9494608, 9009936, 8577440)

so [:] is not exactly an abbreviated synonym for copy().  Is
this a bug?  (I haven't rechecked the respective docs yet.)
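
The comparison above can be rerun as a small script. This is only a sketch, assuming a current Python 3 interpreter; it prints whatever types the interpreter reports at run time, and id values are left out because they differ on every run.

```python
import copy

class ms(str):
    """Trivial str subclass, as in the session above."""
    pass

e = ms()
e2 = copy.copy(e)   # shallow copy through the copy module
e3 = e[:]           # full slice of the same object

# Report the concrete type produced by each operation.
for label, obj in (("e", e), ("copy.copy(e)", e2), ("e[:]", e3)):
    print(label, type(obj).__name__)
```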

One reason I hesitate to call the OP's original observation a 
bug is that the whole subject of operations on subtype 
instances seems not completely baked.  Knowing the result 
types in all cases may require experiments as well as doc 
reading.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2004-08-05 12:04

Message:
Logged In: YES 
user_id=6656

A clue for Terry: think about what "(a)" isn't :-)

I initially agreed that this was a bug because, e.g.
str_subclass()[:] returns a str.  Isn't this the same sort
of thing?



----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2004-08-04 20:28

Message:
Logged In: YES 
user_id=38388

I agree with Terry. The result type is defined by the
semantics of the list elements and the length of the list:

len(list) > 1:
sep.join(list) := list[0] + sep + ... + sep + list[n]

len(list) == 1:
sep.join(list) := list[0]

len(list) == 0:
sep.join(list) := sep[:0]
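
The three cases above can be written out literally to see how the result type follows from them. This is a sketch of the definition only, not of how str.join is actually implemented; join_by_definition is a hypothetical helper name, and ms is the trivial str subclass used elsewhere in this thread.

```python
def join_by_definition(sep, items):
    """Hypothetical helper: apply the three cases above literally."""
    items = list(items)
    if len(items) == 0:
        # Empty input: an empty slice of the separator.
        return sep[:0]
    if len(items) == 1:
        # Single item: returned unchanged, so its type is preserved.
        return items[0]
    # Several items: plain concatenation with the separator in between.
    result = items[0]
    for item in items[1:]:
        result = result + sep + item
    return result

class ms(str):
    pass

for seq in ([], [ms("a")], [ms("a"), ms("b")]):
    print(len(seq), type(join_by_definition("", seq)).__name__)
```

Under this reading the single-item case is the only one that hands back a subclass instance, because it is the only case that performs no string operation at all.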


----------------------------------------------------------------------

Comment By: Terry J. Reedy (tjreedy)
Date: 2004-08-04 19:39

Message:
Logged In: YES 
user_id=593130

This behavior does not, to me, clearly violate the current doc:
"Return a string which is the concatenation of the strings in 
the sequence seq"
where string is bytestring or Unicodestring.  If one takes
'string' narrowly, then your subclass instances should be 
rejected as input.  If one takes 'string' more broadly as 
isinstance(s,basestring) then your subclass should be equally 
acceptable as input or output.  If neither consistent 
interpretation of 'string' is meant, then there is a doc bug, or 
at least an underspecification.

Workaround 0: if len(seq) == 1: ...
Workaround 1: map(str, seq) to force str out.

*However*, in playing around (in 2.2), I discovered:

>>> type(''.join((a)))
<type 'str'>
>>> type(''.join([a]))
<class '__main__.ms'>
>>> type(''.join({a:None}))
<class '__main__.ms'>

Having the type of the join of a singleton depend on the type 
(mutability?) of the singleton wrapper is definitely disquieting.

Workaround 2: tuple(seq)
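
Workaround 1 could be wrapped up roughly as follows. This is a minimal sketch, assuming a current Python 3 interpreter; join_as_str is a hypothetical helper name, not something from the standard library.

```python
class ms(str):
    """Trivial str subclass, as in the examples above."""
    pass

def join_as_str(sep, seq):
    """Hypothetical helper: coerce every item with str() before joining."""
    return sep.join(map(str, seq))

# The result type no longer depends on how many items are joined.
for seq in ([], [ms("a")], [ms("a"), ms("b")]):
    result = join_as_str("", seq)
    print(repr(result), type(result).__name__)
```

One trade-off to keep in mind: str() also converts non-string items such as integers, which a bare join would reject with a TypeError.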


----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2004-08-02 14:25

Message:
Logged In: YES 
user_id=6656

What are you asking?  I agree it's a bug.  I'm sure you're 
competent to write a patch :-)

----------------------------------------------------------------------
