I come to praise .join, not to bury it...

Alex Martelli aleaxit at yahoo.com
Fri Mar 9 10:27:54 EST 2001


"Andrew MacIntyre" <andymac at bullseye.apana.org.au> wrote in message
news:mailman.984074812.27842.python-list at python.org...
> On Wed, 7 Mar 2001, Alex Martelli wrote:
>
> > Suppose I'm building a line of a CSV file, a popular
> > textual format where fields are separated (aka joined)
> > with commas -- the acronym stands for 'comma-separated
> > values'.  So, what I want to do is
> >     'comma-separate these values'
> > and, since 'separate' equates to join, this equates to:
> >     'comma-join these values'
> > so, I code:
> >     comma.join(these_values)
> > *WHAT'S BACKWARDS ABOUT IT*?!
>
> Well that's fine from your point of view, but to me you've just argued
> against your point of basing programming around natural language
> (although I may have misconstrued your argument).

I'm definitely not arguing that design should be based on
natural language; rather, I'm taking exception to people
who claim some programming-language expression is 'backwards'
*when it even happens to be basically homomorphic to some
expressions in THEIR OWN native natural language*!


> From my POV, a method is something that logically "sends a message" to its
> object.

...requesting the object to perform some action (execute
some method).  Fine with me.

> In this context, having the joiner be the object is backwards,

No, it isn't.  We send to the joiner the request to execute
its method, thus performing the joining-action.
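
To make that concrete, here is a minimal sketch in which the joiner
is the object receiving the request.  The names comma, tab and
these_values are just illustrative; any string can play the joiner
role, and the sequence is passed as the argument:

```python
# Illustrative data; the names here are assumptions, not from any real program.
these_values = ["spam", "eggs", "42"]

# Two different joiner objects.
comma = ","
tab = "\t"

# The same request ("join these values") sent to each joiner in turn.
csv_line = comma.join(these_values)
tsv_line = tab.join(these_values)

print(csv_line)
print(repr(tsv_line))
```

The sequence is untouched by the choice of joiner; only the object
receiving the request changes.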

> because it makes more sense to me to think in terms of:
> "send message <join members with separator parameter and return string> to
> a sequence"

But that's exactly what I'm arguing against!  That's *NOT*
optimal.  We do NOT want a sequence to have code for the
purpose of 'joining its items' -- it's not a special task
that every sequence should somehow perform or delegate;
rather, the sequence's task is just to *ENUMERATE* its
items, for whatever purpose that enumeration happens to
be needed this time.  Further, we want the enumeration to
happen *identically*, whether it's for the purpose of
joining items one after the other inside some joiner
object's join-method, or for the purpose of joining items
one after the other inside a file-object's writelines
method, or for any of a zillion other purposes.
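
A small sketch of the idea, assuming a Python recent enough to have
the iterator protocol.  The Countdown class is hypothetical; it knows
nothing about joining or writing, and only enumerates its items:

```python
import io


class Countdown:
    """Hypothetical sequence-like object whose only job is to enumerate items."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Yield the items one after the other, as strings.
        for n in range(self.start, 0, -1):
            yield str(n)


values = Countdown(3)

# A joiner object's join method consumes the enumeration...
csv_line = ",".join(values)

# ...and a file object's writelines method consumes it identically.
out = io.StringIO()
out.writelines(values)
written = out.getvalue()

print(csv_line)
print(written)
```

Countdown has no join-specific or file-specific code at all; both
consumers rely on nothing but its ability to enumerate.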

Consider .writelines again, for example.  I have never
heard objections to it being a method of the file object,
taking the sequence as its argument and accessing it via
the sequence's standard interface (i.e. just asking the
sequence to enumerate its items one after the other).

And, indeed, it's an excellent architecture -- some
file-like objects may implement .writelines as just a loop
over their own .write method; others may take advantage
of performance shortcuts (e.g., if they hold their data
internally as a list of pieces, they'll be able to directly
implement .writelines by delegating to .extend).
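
The two strategies might be sketched roughly as follows.  Both class
names (LoopingWriter, PiecesWriter) and the getvalue method are
hypothetical, chosen for illustration rather than taken from any
real library:

```python
class LoopingWriter:
    """Hypothetical file-like object: writelines is just a loop over write."""

    def __init__(self):
        self.data = ""

    def write(self, s):
        self.data += s

    def writelines(self, lines):
        # Ask the argument to enumerate its items, one after the other.
        for line in lines:
            self.write(line)

    def getvalue(self):
        return self.data


class PiecesWriter:
    """Hypothetical file-like object holding its data as a list of pieces."""

    def __init__(self):
        self.pieces = []

    def write(self, s):
        self.pieces.append(s)

    def writelines(self, lines):
        # Shortcut: list.extend performs the enumeration for us.
        self.pieces.extend(lines)

    def getvalue(self):
        return "".join(self.pieces)


lines = ["first\n", "second\n"]
a = LoopingWriter()
a.writelines(lines)
b = PiecesWriter()
b.writelines(lines)
print(a.getvalue() == b.getvalue())
```

Callers cannot tell the two apart: each accepts anything that can
enumerate its items, and each is free to pick its own implementation.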

But the .join/.writelines parallel is VERY close, so why
would the SAME architectural choice (to have the sequence
object as an argument, accessing it through its ability
to enumerate its items, only) cause problems in one case
and not the other?!

I think it's a misperception due to thinking of strings
as somehow 'passive', 'data', while a file-object is
thought of as 'active', 'a truly object-y object', or
something.  I further opine that this misperception
can be corrected _in individuals who are
more interested in furthering the effectiveness of their
perceptive mechanisms than in trying to win a point in
a Usenet discussion_ -- which, I hope, describes a vast
majority of Pythonistas:-).


Alex