I come to praise .join, not to bury it...

Alex Martelli aleaxit at yahoo.com
Sat Mar 3 15:29:55 EST 2001


"Russell E. Owen" <owen at astrono.junkwashington.emu> wrote in message
news:97p7iq$kk2$1 at nntp6.u.washington.edu...
    [snip]
> However, I do agree that join is not intuitive. I believe the problem
> (at least for the way I look at it) is that join should be a list
> method, not a string method. I.e.:
>
>   joined_string = ['a', 'b'].join(', ')
>
> makes a lot of sense to me. The current join string method does not.

This seems to be a widespread opinion, and I've already tried
to explain why my take on it differs, but that was a few months
ago, and apparently the current crop of discussants hasn't read
those discussions, so, let's give it one more spin.

Python does not have multi-methods: it's a single-dispatch
language.  Thus, quite apart from minor issues of syntax sugar,
there is ONE important difference between:

    paak.join(klop)                # form 1
    klop.join(paak)                # form 2
    amodule.join(paak, klop)  # form 3

Form 1 states: this .join call CAN be directly polymorphic on
(==directly dispatch on) paak, but NOT on klop -- if it's at
all polymorphic on klop, it must be by accessing it through
some standard interface ('interfaces' are not formalized in
Python, or Smalltalk, but, informally, they do exist and are
widely used -- "this argument must be a file-like object
and implement method write", "this must be a sequence",
etc).

Form 2 states the reverse: this .join call CAN be directly
polymorphic on (==directly dispatch on) klop, but NOT on
paak -- if it's at all polymorphic on paak, it must be by
accessing it through some standard interface.

Form 3 states that this call (on a module function) is not
directly polymorphic on _either_ klop or paak, though it
may access either or both through standard interfaces and
thus attain a (constrained) degree of polymorphism on
either or both of them.

It would be different in a multi-dispatch language, of
course, but Python isn't one (nor is Smalltalk).
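
To make the three forms concrete, here is a minimal sketch in
present-day Python.  Only form 1 is what the language actually
provides; JoinableList and join_function are hypothetical names
invented for this illustration, not part of any library.

```python
# Form 1: the joiner is the receiver, so dispatch happens on its type.
form1 = ', '.join(['a', 'b'])

# Form 2 (hypothetical): the sequence is the receiver.  JoinableList is
# an invented class; built-in lists have no join method.
class JoinableList(list):
    def join(self, joiner):
        return joiner.join(self)    # delegates back to form 1

form2 = JoinableList(['a', 'b']).join(', ')

# Form 3 (hypothetical): a plain function that dispatches on neither
# argument and reaches both only through standard interfaces
# (iteration on the sequence, concatenation on the items and joiner).
def join_function(joiner, sequence):
    result = ''
    for index, item in enumerate(sequence):
        if index:
            result = result + joiner
        result = result + item
    return result

form3 = join_function(', ', ['a', 'b'])

print(form1)
print(form2)
print(form3)
```

All three spell the same operation; what differs is which
argument, if any, gets to choose the implementation.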

So, when one states "join should be a method of klop, not
of paak" ('form 2 is better than form 1'), one is really
stating something like...: "this .join may more easily need
to be generally-polymorphic on klop, while, if it needs any
polymorphism on paak, it will more reasonably be able to
obtain it indirectly by accessing paak through some standard
interface".

If one expresses a preference for form 3 over either 1 or 2,
then one is stating that general polymorphism is not very
necessary on either argument, that standard interfaces
will do the job satisfactorily enough if any polymorphism at
all is needed.

I hope we do agree so far -- it doesn't seem to me that,
so far, there is really anything contentious or debatable
in what I've written.  If one considers syntax sugar in se
and per se more important than polymorphic semantics,
of course, then one _might_ think my assertions so far
are contentious and debatable -- but I wouldn't be very
interested in debating them: for me, the importance of
substance always dominates that of mere form... I'd
rather read a great novel in a poor, cheap binding, than
trash hand-bound in the best-quality leather.  Those who
disagree are welcome to their opinions (and particularly
their expensively-bound trash literature!-) but we'll never
manage to convince each other of anything anyway.


So, back to the specific discussion: Python 2 implicitly
asserts "when a string ``joiner'' is used to join the items
in a sequence of strings, it's more important that the
method be polymorphic on the joiner-object; polymorphism
on the sequence of strings can be obtained indirectly by
accessing it through a standard interface".

You guys who disagree are implicitly asserting that
full-fledged polymorphism over the sequence of strings being
joined is more important -- and that either no polymorphism
at all is needed on the joiner object, or what is needed
can be obtained by accessing it through a standard interface.

So, let's compare these implicit underlying assertions --
what kind of polymorphism is needed and where.


Say that I'm implementing a sequence-like object.  Would
having .join as a part of the set of methods I must write
be an _advantage_ to me?  As things stand now, to "be a
sequence" I must just implement __getitem__ in the
appropriate manner -- accepting an index from 0 to N-1,
and raising IndexError for indices outside the range -- and
maybe implement __len__ to return N.  With this small
amount of effort, I gain the ability to be used as a sequence
in any existing context -- from the for statement onwards.
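
As a minimal sketch of that small amount of effort, in
present-day Python: Countdown is a hypothetical class made up
for this example, and it implements nothing beyond __getitem__
and __len__.

```python
class Countdown:
    """Hypothetical sequence-like class: only __getitem__ and __len__."""
    def __init__(self, n):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)    # tells the caller there are no more items
        return str(self.n - index)

# The for statement asks for items 0, 1, 2, ... until IndexError is raised.
items = []
for item in Countdown(3):
    items.append(item)

print(items)
```

The class never learns that it is being looped over; it only
answers requests for individual items.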

For example, if my object gets passed to a file-like object's
writelines method, then, voila, my string items, which I
return in sequence from __getitem__, get 'written' to the
file-like object one after another, in order -- I don't have
to do anything peculiar to allow that, nor am I given any
chance to interfere with the procedure in case of need --
that's what it means to get polymorphism by being accessed
through a standard interface, rather than direct, generalized
polymorphism.  The standard interface makes you work much
less (you implement it once, and it comes in handy for many
kinds of client code), but it gives you no say on what is to
happen in _specific_ situations -- you're not even informed
for what exact purpose / in what exact context your standard
interface's methods are being called.

I'm pretty happy not having to worry about "how are my
items to be joined [written one after another] when I am
passed to writelines", etc -- I implement the sequence
standard interface, and I get 'for free' the ability of being
polymorphically used as a sequence "everywhere".  Cool!

At least in all use cases I've ever found, my object is
quite happy NOT being told 'your items are to be written
one after another to this file-like object, do whatever is
appropriate' -- rather, it's asked for its items one after
the other, and the asker does whatever is appropriate --
the sequence object's task is JUST to be a sequence, to
present the items appropriately and let it be known when
there are no more.

Is the case 'your items are to be made into a single
string by being joined by this joiner-object' -- the join
method -- all that different from the writelines method
case?  I don't think so -- why would I _want_ to do
something different and peculiar when what is to be
done with my items (one after another in sequence)
is to be concatenated into a string rather than written
into a file?  In either case I'll just present my items
in sequence, one after the other, and whatever object
is asking for them will do whatever operation it needs
to do.  So, it's quite appropriate for the sequence to be
passed as an argument to .writelines, .join, etc; all the
polymorphism needed on the sequence object is well
handled by accessing it through the standard sequence
interface.
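
A minimal sketch of that parallel, in present-day Python and
using io.StringIO as the file-like object: Lines is a
hypothetical class invented for this example, and the same
instance is handed unchanged to both writelines and join.

```python
import io

class Lines:
    """Hypothetical sequence-like class exposing only the sequence interface."""
    def __init__(self, *words):
        self.words = words
    def __len__(self):
        return len(self.words)
    def __getitem__(self, index):
        return self.words[index]    # IndexError propagates past the end

lines = Lines('spam', 'eggs', 'bacon')

buffer = io.StringIO()
buffer.writelines(lines)        # the file-like object asks for the items
written = buffer.getvalue()

joined = ', '.join(lines)       # the joiner asks for the items

print(written)
print(joined)
```

In neither call does Lines get to know, or need to know, what
is being done with its items.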


What about the joiner object?  Could IT be accessed
through a standard interface, rather than provide the
general polymorphism via dispatch?
Well, WHAT standard interface would that be?  What
methods would it provide, and for what purposes?
I can't think of any appropriate interface except one
with a .join method -- which would rather beg the
question, since a .join method is exactly what we're
trying to design...!  We could _name_ it differently
in a standard interface, but, what for?  We may as
well accept the name 'join'.

So, IF we need any polymorphism at all on the part
of the joiner-object, it had better be general dispatch
polymorphism -- either that, or no polymorphism at
all.  Well, IS any polymorphism at all useful on the
joiner object...?


As it happens, it is.  As for other string methods, for
example, the fact that .join is a method of string
objects (and, polymorphically, Unicode strings) saves
the type-switching that a module-level function would
have to do, each and every time, to distinguish whether
single-byte character strings or Unicode ones are
involved.  But, there are other opportunities -- consider,
for example:

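class MixedJoiner:
    def __init__(self, joiner1, joiner2):
        self.joiner1 = joiner1    # joins all items but the last
        self.joiner2 = joiner2    # joins that result to the last item
    def join(self, sequence):
        # sequence must support slicing and negative indexing
        return self.joiner2.join((
            self.joiner1.join(sequence[:-1]),
            sequence[-1]))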

Now, if we get sequences of strings and want to join
them in such wise that the LAST joining uses a different
joiner than the others, we just have to instantiate the
appropriate MixedJoiner instance and use it:

>>> comma_and = MixedJoiner(', ',' and ')
>>> print comma_and.join(('wine','women','song'))
wine, women and song
>>> print comma_and.join(('spam','butter','bacon','eggs'))
spam, butter, bacon and eggs

Now, any piece of client code written to use a joiner
object polymorphically through its join method will
work just as well with our MixedJoiner instance:

def print_joined(joiner, *sequence):
    print joiner.join(sequence)

we can now call
>>> print_joined(comma_and, 'wine', 'women', 'song')
to get the output of:
wine, women and song
just as we call it to get the printing to use space-joining:
>>> print_joined(' ','wine', 'women', 'song')
the latter emitting, of course:
wine women song


So, since we want general polymorphism on the joiner
object, but are quite content with polymorphism through
the standard sequence interface on the sequence object,
it is _just right_ that .join be a method on the joiner
object (e.g., a string) and that it take the sequence of
strings to be joined as its argument.
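
The type-switching point made earlier can be sketched in
present-day Python, where the two string types are str and
bytes rather than the single-byte and Unicode strings of
Python 2.  join_function is a hypothetical module-level
function written only for this comparison.

```python
def join_function(joiner, sequence):
    # Hypothetical module-level join: it has to inspect the joiner's
    # type itself, and it rejects any joiner type it was not written for.
    if isinstance(joiner, bytes):
        return bytes.join(joiner, sequence)
    elif isinstance(joiner, str):
        return str.join(joiner, sequence)
    raise TypeError('unsupported joiner type: %r' % type(joiner))

# With join as a method, the same selection falls out of ordinary dispatch.
text = ', '.join(['a', 'b'])
data = b', '.join([b'a', b'b'])

print(text == join_function(', ', ['a', 'b']))
print(data == join_function(b', ', [b'a', b'b']))
```

A user-defined joiner such as MixedJoiner works with the
method form as it stands, whereas a function like this one
would have to be edited to admit it.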

I did not hear ANY _technical_ arguments opposed to
this the last time the discussion went around -- nothing but
vague aesthetic, "it should be the other way", "I find it
ugly" kind of complaints.  Unless and until further notice,
then, I class these together with the complaints of people
who dislike 0-based indexing, on similar vague bases -- as
0-based indexing _works_ better than 1-based, then, for
me, it is a superior choice.  Aesthetics is in the eye of the
beholder, and the beholder _can_ be retrained and grow
a 'technically better and more appropriate' view of the
world -- one that will let him or her 'see' as 'beautiful'
those approaches which offer maximal effectiveness and
simplicity.  Technical issues such as where we want general
polymorphism versus standardized-interface access, just
like ones about arithmetic, tend to be more persistent, as
well as applying more generally, and impersonally, while
aesthetics are more subjective, individual, and fickle.


Alex





