I come to praise .join, not to bury it...

Tue Mar 6 04:02:08 EST 2001

"Greg Ewing" <greg at cosc.canterbury.ac.nz> wrote in message
news:3AA46134.D385208B at cosc.canterbury.ac.nz...
    [snip]
> > indeed, the string module supplies ready-made versions
> > of such calls
>
> For the time being, yes, but the intention has been
> hinted at on python-dev to deprecate and eventually
> remove the string module. That worries me.

It doesn't worry _me_, but I see where you're coming
from.  Still, I would pass on the issue.  The advantage
of having "one obvious way to do it" IS big, but so is
the problem of hurting the feelings of joiner.join-haters.
I'm glad I have no say on that particular decision!

> > if the string.join function had to typeswitch on the
> > joiner-object to distinguish Unicode from single-byte --
>
> I don't get this business about avoiding type switches
> by dispatching on the joiner, because you still have to
> typeswitch on all the OTHER strings that you're joining.

No!  string_join (in stringobject.c) currently just _calls_
PyString_Check on each item -- if any item fails that, it
delegates to PyUnicode_Join (provided PyUnicode_Check OK's
that item).

The latter is (IMHO) better-coded -- it uses the 'access
through standard interface' paradigm, i.e., it calls
PyUnicode_FromObject to let the items being joined have
a chance to make themselves into Unicode.

Unfortunately, there is (yet) no standard __unicode__
(or whatever) magic method, so the "make yourself into
Unicode" has to go through __str__.

Still, despite these signs of 'early design stages, still
needs to mature', the Unicode-joiner behavior is quite a
bit better already, because it accesses the OTHER stuff
through standard interfaces.  Take a typical case:

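# An object that is not a string, but knows how to make
# itself into one through the standard __str__ interface.
class stringable:
    def __init__(self, x):
        self.x = x
    def __str__(self):
        return str(self.x)

# A minimal sequence of N stringable items: just __len__
# and __getitem__, raising IndexError past the end.
class singlebyte:
    def __init__(self, N):
        self.N = N
    def __len__(self):
        return self.N
    def __getitem__(self, index):
        if index >= self.N:
            raise IndexError(index)
        return stringable(index)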

Now, with something such as:

seq = singlebyte(7)

print '+'.join(seq)

print u'/'.join(seq)

the first print will raise an exception (as the joiner wants
the items to BE string objects already -- it does NOT use the
'access through standard interface' paradigm to give them a
chance to MAKE themselves into strings); the second one will
work just fine (as, IMHO, should the first one).
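
For anybody trying this on a present-day interpreter: here is a
minimal sketch in Python 3 syntax (an assumption on my part; the
discussion above is about Python 2.0).  There, str.join still
insists that the items already BE strings, so the plain call
fails, while converting each item through the standard str()
interface by hand gives the behavior argued for.  The names
plain_join_failed and joined are just illustrative.

```python
class stringable:
    def __init__(self, x):
        self.x = x
    def __str__(self):
        return str(self.x)

class singlebyte:
    def __init__(self, N):
        self.N = N
    def __len__(self):
        return self.N
    def __getitem__(self, index):
        if index >= self.N:
            raise IndexError(index)
        return stringable(index)

seq = singlebyte(7)

# The joiner does not ask the items to make themselves into strings.
try:
    '+'.join(seq)
    plain_join_failed = False
except TypeError:
    plain_join_failed = True

# Going through the standard interface explicitly does work.
joined = '+'.join(map(str, seq))
```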

But anyway, the call to PyUnicode_FromObject is NOT a
typeswitch -- it's an "access through standard interface",
CONCEPTUALLY.  There should be a slot in the type's method
table that gets directly probed to see if the object knows
how to make a Unicode object out of itself.  Of course, there
should also be a fallback to the _existing_ slot, the one
through which the object says it knows how to make a
singlebyte-string object out of itself.
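
The probe-then-fallback idea can be sketched at the Python level
(Python 3 syntax, purely illustrative).  Everything here is
hypothetical: no __unicode__ hook is standard as of this writing,
and to_text, Plain, Rich and join_through_interface are names
invented for the example, not anything in the implementation.

```python
def to_text(obj):
    """Probe a hypothetical __unicode__ hook, else fall back to __str__."""
    hook = getattr(type(obj), '__unicode__', None)
    if hook is not None:
        return hook(obj)
    return str(obj)

class Plain:
    # Only knows the existing 'make yourself into a string' interface.
    def __str__(self):
        return 'plain'

class Rich:
    # Also offers the hypothetical richer conversion, which wins.
    def __str__(self):
        return 'fallback'
    def __unicode__(self):
        return 'caf\u00e9'

def join_through_interface(joiner, items):
    # No switch on the items' types: each item is just asked, through
    # the interface, to make itself into text.
    return joiner.join([to_text(item) for item in items])

result = join_through_interface('/', [Plain(), Rich(), 42])
```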

The implementation may not (yet) EXPLOIT the design's strengths,
but that is no basis by which to judge the DESIGN itself.

> So you save one (very fast C-level) typeswitch out of
> quite a lot. Nowhere near a big enough gain to justify
> anything, to my mind.

You gain a clean approach, which the implementation could
(and should) exploit to access the items in the sequence
through the _appropriate_ standard interface (also, there
should be no mandate on the sequence to define 'length',
I think, just as there isn't for a for-loop -- but I do
understand there are tradeoffs here).  Some gain today,
and an excellent architectural/design basis for further
gains tomorrow when the implementation catches up with
the design's potential for excellence.
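
On the parenthetical about 'length': a minimal sketch (Python 3
syntax, with the hypothetical names join_any and countdown) of a
join that asks its argument for nothing beyond what a for-loop
needs, so that something with no len() at all can be joined.

```python
def join_any(joiner, iterable):
    """Join the str() of each item; only iteration is used, never len()."""
    pieces = []
    for item in iterable:
        pieces.append(str(item))
    return joiner.join(pieces)

def countdown(n):
    # A generator: it can be looped over, but has no length.
    while n > 0:
        yield n
        n -= 1

result = join_any('-', countdown(3))
```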

You further gain, as I have already shown, the advantage
of polymorphism when you want a joiner object to do
something special -- at NO COST, as usual for uses of
polymorphism -- even code you wrote months ago, with no
idea that 'special joining' would be needed, will get
this extra flexibility if it accepts the joiner object
to use as an argument and just calls its join method.
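
To make that concrete, a small sketch (Python 3 syntax;
EnglishListJoiner and describe are hypothetical names made up for
the illustration).  The describe function knows nothing about
special joining; it just calls the join method of whatever joiner
it receives, so an ordinary string and a custom object both work.

```python
class EnglishListJoiner:
    """Hypothetical joiner: commas between items, 'and' before the last."""
    def join(self, items):
        items = list(items)
        if len(items) <= 1:
            return ''.join(items)
        if len(items) == 2:
            return items[0] + ' and ' + items[1]
        return ', '.join(items[:-1]) + ', and ' + items[-1]

def describe(names, joiner):
    # Written with no idea that 'special joining' would ever be needed.
    return 'Attending: ' + joiner.join(names)

plain = describe(['Ann', 'Bob', 'Cy'], ', ')
fancy = describe(['Ann', 'Bob', 'Cy'], EnglishListJoiner())
```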

There are NO technical costs associated with these
advantages.  Your purely aesthetic, technically
unmotivated dislike for joiner.join, on a strictly
syntax-sugar level, is clearly just about balanced
by the just-as-aesthetic, just-as-arbitrary liking
others have for it.  Only technical advantages and costs
can be fairly put on the balance -- and, here, we
have ZERO technical costs to weigh against clear
technical advantages; it does not matter, therefore,
if one judges those technical advantages large (as
I do) or small (as others might) -- they're >0, and
so the joiner.join approach is a 100% winner on
technical grounds (the only rationally debatable ones).

If a single approach should be allowed, then it would
therefore HAVE to be the one presenting technical
pluses, rather than the one presenting NO pluses
on that level.

Note that I've seen NO arguments AT ALL for having
.join be a method on the SEQUENCE object (rather than
on the joiner), despite this being often proposed
on an irrational, purely aesthetic basis.  Any
takers...?

> (And if ordinary strings and Unicode strings are ever
> unified, this argument will go away completely.)

If it should ever happen that NO polymorphism can
ever appear on the joiner object, then there would
be no technical advantage in allowing polymorphic
behavior on it -- a tautology.  In fact, special kinds
of quasi-string objects might still emerge in the same
kind of far-away-hypothetical-future frame as the
one in which single-byte/Unicode distinctions can
disappear, and then, again, the polymorphic aspects
of joiner.join will show to advantage (as they always
will for any special-joining needs).

Alex