String.join revisited (URGENT for 1.6)

Johannes Stezenbach yawyi at gmx.de
Mon May 29 17:41:16 EDT 2000


Fredrik Lundh <effbot at telia.com> wrote:
>now, would your modified version be a great improvement over the
>current 1.6 release?  that pretty much depends on your answers to
>these two questions:
>
>-- does "join" really make sense on things that are not strings?
>   (hint: does "+" always mean concatenation?  does "join" make
>   sense if it doesn't?  should join([1, 2, 3], 10) really return 26?)

The overloaded meaning of '+' makes nonsensical uses of join() possible,
but that is not the fault of the join() definition.

It might be useful if join([[1], [2], [3]], [10]) returned [1, 10, 2, 10, 3].
User-defined classes can define their own __add__, which could make
sense with join() (e.g. joining a list of HTML paragraph objects with a
separator such as <hr>, <img>, <li>, etc.).
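A minimal sketch of this type-agnostic definition in plain Python, assuming nothing except that '+' means concatenation for the element and separator types. `generic_join` and the `Html` class are hypothetical names used only for illustration, and raising on an empty sequence is one possible choice, since the result type cannot be inferred from nothing.

```python
def generic_join(seq, separator):
    """Concatenate the elements of seq with separator interspersed,
    using only "+" (so it works for any type whose + concatenates)."""
    if not seq:
        # the result type cannot be inferred from an empty sequence
        raise ValueError("cannot join an empty sequence generically")
    result = seq[0]
    for item in seq[1:]:
        result = result + separator + item
    return result


class Html:
    """Hypothetical HTML fragment whose + concatenates markup."""

    def __init__(self, markup):
        self.markup = markup

    def __add__(self, other):
        return Html(self.markup + other.markup)

    def __repr__(self):
        return "Html(%r)" % self.markup


print(generic_join([[1], [2], [3]], [10]))  # [1, 10, 2, 10, 3]
print(generic_join(["foo", "bar"], " "))    # foo bar
print(generic_join([1, 2, 3], 10))          # 26: here "+" is addition
paragraphs = [Html("<p>one</p>"), Html("<p>two</p>")]
print(generic_join(paragraphs, Html("<hr>")))  # Html('<p>one</p><hr><p>two</p>')
```

The integer case shows exactly the objection quoted above: the definition is happy to run, but the answer only "means" join when '+' means concatenation.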

You might argue that this is somewhat contrived, and that in those rare
cases where one needs to join non-strings, an explicit loop would
be easy to write.
OTOH, the definition of join as "concatenating the elements of a
sequence with an interspersed separator" is IMHO simple, elegant
and useful, regardless of the type of the separator and sequence
elements.

>-- does it really matter if the implementation hook happens to be
>   called "join"?

No.
But wait: If what you mean by "implementation hook" is that Python
programmers should keep using string.join() exclusively and never
use "".join() directly, then please rename "".join to "".__join__ to
make this clear. Otherwise keep on reading, please.

>in the current release, the answers are "no" (hypergeneralization)
>and "no" (no matter what it's called, people will find it and use it).
>
>if you have better answers, please motivate.

Well, generalizing join() was not my goal; it was just
the vehicle I used to try to convince you to make join() a method
of sequences instead of strings.

IIRC your point was: "There's more than one string type, each of
which needs its own join() -> it makes most sense to make join()
a method of the string types."

So why the hell do I want to convince you to change this?
Because I think that this strategy is good only for the implementor
of stringmodule.c and unicodemodule.c, but inconvenient for
Python programmers.

In my very humble opinion
  words = ["foo", "bar", "baz"]
  print words.join(" ")
looks right and
  print " ".join(words)
looks wrong. It will confuse people and provoke programming errors.
It's not a nice thing to have in a CP4E (Computer Programming for
Everybody) language.

(Not that it is that difficult to get used to -- humans can adapt
to hostile environments like case-insensitive languages or even
Perl <wink>. Mastering "".join() should be a piece of cake...)

Since Python doesn't have a tangible Sequence base class and, by its
dynamically typed nature, no list-of-strings (etc.) class, there is
no natural place to put the string-join implementation.
But as Eric Jacobs showed, it is easy to define a generalized
sequence.join(); the only catch is that it would perform poorly in the
common case of joining sequences of strings.
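To make the performance concern concrete, here is a rough cost model (an assumption, not a measurement): if every '+' builds a fresh string, as a naive generic join would, the number of characters copied grows quadratically with the number of pieces, while a single-pass string join copies each character about once. `chars_copied_by_plus_join` is a hypothetical helper, and a real interpreter may optimize some concatenations in place.

```python
def chars_copied_by_plus_join(pieces, separator):
    """Model the work done by `result = result + separator + x`,
    assuming each "+" copies both operands into a new string."""
    if not pieces:
        return 0
    result_len = len(pieces[0])
    copied = 0
    for piece in pieces[1:]:
        # result + separator: one new string
        partial_len = result_len + len(separator)
        copied += partial_len
        # (result + separator) + piece: another new string
        result_len = partial_len + len(piece)
        copied += result_len
    return copied


for n in (10, 100, 1000):
    words = ["a"] * n
    # characters copied by repeated "+" versus length of the final result
    print(n, chars_copied_by_plus_join(words, " "), len(" ".join(words)))
```

For 1000 one-letter words this model gives 1,998,999 characters copied by repeated '+', against a final result of only 1,999 characters.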

So I say: implement the easy generalized join() and fiddle in the
optimization for string joins. Python programmers won't
mind some hidden ugliness in the C code if they are rewarded
with better-looking Python code.

Two possible implementations of join() with the "right"
look and feel would be (shown in Python, but of course to be
written in C and repeated for each sequence type):

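    def join(self, separator):
        if type(separator) == type(""):
            # fast path: the efficient implementation of " ".join
            return separator.join(self)
        elif type(separator) == type(u""):
            # fast path: the efficient implementation of u" ".join
            return separator.join(self)
        else:
            # generic path: works for any type whose "+" concatenates
            if not self:
                raise ValueError("cannot join an empty sequence generically")
            result = self[0]    # start with the first element
            for x in self[1:]:
                result = result + separator + x
            return result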

or, more generally but a little less efficiently:

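    def join(self, separator):
        if hasattr(separator, "__join__"):
            # join() implementation stays in the string/unicode type
            return separator.__join__(self)
        else:
            # generic path: works for any type whose "+" concatenates
            if not self:
                raise ValueError("cannot join an empty sequence generically")
            result = self[0]    # start with the first element
            for x in self[1:]:
                result = result + separator + x
            return result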
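The hook-based variant can be tried out in today's Python with a list subclass. This is only a sketch: `JoinableList` and `Sep` are hypothetical names, and because real strings have no `__join__` method, `Sep` supplies the proposed hook by delegating to the built-in `str.join`.

```python
class Sep(str):
    """Hypothetical string type that provides the proposed __join__ hook."""

    def __join__(self, seq):
        # the fast, type-specific join stays with the string type
        return str.join(self, seq)


class JoinableList(list):
    """List offering the proposed seq.join(separator) spelling."""

    def join(self, separator):
        if hasattr(separator, "__join__"):
            return separator.__join__(self)
        # generic path: works for any type whose "+" concatenates
        if not self:
            raise ValueError("cannot join an empty sequence generically")
        result = self[0]
        for x in self[1:]:
            result = result + separator + x
        return result


words = JoinableList(["foo", "bar", "baz"])
print(words.join(Sep(" ")))                      # foo bar baz
print(JoinableList([[1], [2], [3]]).join([10]))  # [1, 10, 2, 10, 3]
```

Passing a plain " " also works here, but it takes the slower generic '+' path, because `str` itself has no `__join__`.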

Johannes




More information about the Python-list mailing list