String.join revisited (URGENT for 1.6)
Johannes Stezenbach
yawyi at gmx.de
Mon May 29 17:41:16 EDT 2000
Fredrik Lundh <effbot at telia.com> wrote:
>now, would your modified version be a great improvement over the
>current 1.6 release? that pretty much depends on your answers to
>these two questions:
>
>-- does "join" really make sense on things that are not strings?
> (hint: does "+" always mean concatenation? does "join" make
> sense if it doesn't? should join([1, 2, 3], 10) really return 26?)
The overloaded meaning of '+' makes nonsensical use of join() possible,
but this is not the fault of the join() definition.
It might be useful if join([[1], [2], [3]], [10]) returned [1, 10, 2, 10, 3].
User-defined classes can define their own __add__ which could make
sense with join() (e.g. join a list of HTML paragraph objects with a
separator of type <hr>, <img>, <li> etc.)
You might argue that this is somewhat contrived, and in those rare
cases where one needs to join non-strings an explicit loop would
be easy to write.
OTOH, the definition of join as "concatenating the elements of a
sequence with an interspersed separator" is IMHO simple, elegant
and useful, regardless of the type of the separator and sequence
elements.
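To make that definition concrete, here is a minimal sketch of a type-agnostic join that relies on nothing but '+'. The name generic_join is hypothetical (it is not part of the string module or any library), the code is written for a current Python interpreter, and refusing empty sequences is my own assumption, since without a first element there is nothing to tell us the result type.

```python
def generic_join(sequence, separator):
    """Concatenate the items of `sequence`, placing `separator` between them.

    Hypothetical helper: it only requires that items and separator support '+'.
    """
    items = iter(sequence)
    try:
        result = next(items)  # start from the first element itself
    except StopIteration:
        # Assumption: an empty sequence has no well-defined result type.
        raise ValueError("cannot join an empty sequence")
    for item in items:
        result = result + separator + item
    return result


as_string = generic_join(["foo", "bar", "baz"], " ")  # 'foo bar baz'
as_list = generic_join([[1], [2], [3]], [10])         # [1, 10, 2, 10, 3]
as_number = generic_join([1, 2, 3], 10)               # 26
```

The integer case is the "nonsensical" one: it works only because '+' happens to mean addition there, not because join() was defined badly.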
>-- does it really matter if the implementation hook happens to be
> called "join"?
No.
But wait: If what you mean by "implementation hook" is that Python
programmers should keep using string.join() exclusively and never
use "".join() directly, then please rename "".join to "".__join__ to
make this clear. Otherwise keep on reading, please.
>in the current release, the answers are "no" (hypergeneralization)
>and "no" (no matter what it's called, people will find it and use it).
>
>if you have better answers, please motivate.
Well, generalization of join() was not my goal, it was just
the vehicle I used to try to convince you to make join() a method
of sequences instead of strings.
IIRC your point was: "There's more than one string type, each of
which needs its own join() -> it makes most sense to make join()
a method of the string types."
So why the hell do I want to convince you to change this?
Because I think that this strategy is good for the implementor
of stringmodule.c and unicodemodule.c only, but inconvenient for
Python programmers.
In my very humble opinion
    words = ["foo", "bar", "baz"]
    print words.join(" ")
looks right and
    print " ".join(words)
looks wrong. It will confuse people and provoke programming errors.
It's not a nice thing to have in a CP4E language.
(Not that it is that difficult to get used to -- humans can adapt
to hostile environments like case-insensitive languages or even
Perl <wink>. Mastering "".join() should be a piece of cake...)
Since Python doesn't have a tangible Sequence base class and by its
dynamically typed nature no list-of-strings (etc.) class, there is
no natural place to stick the string-join implementation in.
But as Eric Jacobs showed, it is easy to define a generalized
sequence.join(), except that this would have bad performance for the
common case of joining sequences of strings.
So I say: Implement the easy generalized join() and fiddle the
optimization for the string-join in. Python programmers won't
mind some hidden ugliness in the C code if they are rewarded
with better looking Python code.
Two possible implementations for the join() with the "right"
look-and-feel would be (of course in C, to be repeated for
each sequence type):
    def join(self, separator):
        if type(separator) == type(""):
            # insert efficient implementation of " ".join
            pass
        elif type(separator) == type(u""):
            # insert efficient implementation of u" ".join
            pass
        else:
            # generic fallback; assumes a non-empty sequence
            result = self[0]
            for x in self[1:]:
                result = result + separator + x
            return result
or more general but a little less efficient:
    def join(self, separator):
        if hasattr(separator, "__join__"):
            # join() implementation stays in the string/unicode type
            return separator.__join__(self)
        else:
            # generic fallback; assumes a non-empty sequence
            result = self[0]
            for x in self[1:]:
                result = result + separator + x
            return result
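For anyone who wants to try the look-and-feel today, here is a rough pure-Python sketch of the second variant. Everything named here is an assumption for illustration only: Words is a hypothetical list subclass, Rule is a hypothetical separator class, and __join__ is the proposed hook that no built-in type actually defines. Because of that, the sketch falls back to the existing separator.join() for plain strings, and it starts the generic loop from the first element rather than from a one-element slice. It is written for a current Python interpreter.

```python
class Words(list):
    """Hypothetical list subclass that offers join() on the sequence side."""

    def join(self, separator):
        # Proposed hook: let the separator's type supply its own implementation.
        hook = getattr(separator, "__join__", None)
        if hook is not None:
            return hook(self)
        # Built-in strings have no __join__, so reuse their existing fast join.
        if isinstance(separator, (str, bytes)):
            return separator.join(self)
        # Generic fallback: needs at least one element to know the result type.
        if not self:
            raise ValueError("cannot join an empty sequence")
        result = self[0]
        for item in self[1:]:
            result = result + separator + item
        return result


class Rule:
    """Hypothetical separator object that implements the __join__ hook."""

    def __join__(self, sequence):
        return "\n<hr>\n".join(sequence)


sentence = Words(["foo", "bar", "baz"]).join(" ")          # 'foo bar baz'
flattened = Words([[1], [2], [3]]).join([10])              # [1, 10, 2, 10, 3]
page = Words(["<p>one</p>", "<p>two</p>"]).join(Rule())    # '<p>one</p>\n<hr>\n<p>two</p>'
```

The point of the sketch is only the call shape: the sequence is the receiver and the separator is the argument, while the type-specific work still lives with the separator's type.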
Johannes