Python's simplicity philosophy
aleax at aleax.it
Mon Nov 17 10:47:25 CET 2003
SUZUKI Hisao wrote:
> As to sum(), when learning string addition (concatenation),
> one may wonder why sum() does not handle it:
I originally had sum try to detect the "summing strings" case and
delegate under the covers to ''.join -- Guido vetoed that as too
"clever" (which has a BAD connotation in Python) and had me forbid
the "summing strings" case instead, for simplicity.
> >>> sum(['a', 'b', 'c'])
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> TypeError: unsupported operand type(s) for +: 'int' and 'str'
whereupon, one hopes, the user checks sum's doc:
>>> print sum.__doc__
sum(sequence, start=0) -> value
Returns the sum of a sequence of numbers (NOT strings) plus the value
of parameter 'start'. When the sequence is empty, returns start.
and if one tries anyway, passing '' as the start value:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: sum() can't sum strings [use ''.join(seq) instead]
the error message should direct the user to the proper way to sum
strings. Having more than one "obvious way to do it" was avoided.
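A minimal sketch of the same behaviour, assuming a current Python 3 interpreter rather than the 2.3 session shown above. The helper name try_sum is hypothetical and exists only for illustration:

```python
def try_sum(items, start=0):
    """Return ('ok', result) or ('error', message) for sum(items, start)."""
    try:
        return ("ok", sum(items, start))
    except TypeError as exc:
        return ("error", str(exc))


words = ["a", "b", "c"]

# Default start=0: fails when adding an int to a str, as in the quoted traceback.
print(try_sum(words))

# Explicit string start: sum() refuses and its message points at ''.join.
print(try_sum(words, ""))

# The one obvious way to concatenate a sequence of strings.
print("".join(words))
```

Both calls to sum() come back as errors, and only the second message mentions join, which is the point: the refusal itself names the right tool.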
> while general reduce() does it just as _expected_:
> >>> reduce(str.__add__, ['a', 'b', 'c'])
Well, if what you "expect" is the following performance...:
[alex at lancelot tmp]$ timeit.py -c -s'x=map(str,range(999))' 'reduce(str.__add__, x)'
1000 loops, best of 3: 1.82e+03 usec per loop
[alex at lancelot tmp]$ timeit.py -c -s'x=map(str,range(999))' "''.join(x)"
10000 loops, best of 3: 68 usec per loop
i.e., a slowdown by about 2700% for a 999-item sequence,
[alex at lancelot tmp]$ timeit.py -c -s'x=map(str,range(1999))' 'reduce(str.__add__, x)'
100 loops, best of 3: 5e+03 usec per loop
[alex at lancelot tmp]$ timeit.py -c -s'x=map(str,range(1999))' "''.join(x)"
10000 loops, best of 3: 143 usec per loop
growing to 3500% for a 1999-item sequence, and so on without bound,
then no doubt ``reduce does it just as expected'' by YOU.
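The measurements above can be repeated on a current interpreter with a sketch along these lines. It assumes Python 3, where reduce lives in functools and map returns an iterator (hence the list comprehension); the function name compare and its repeats/number parameters are hypothetical. Absolute timings and ratios will differ from the 2003 figures and from machine to machine.

```python
import timeit
from functools import reduce


def compare(n, repeats=3, number=20):
    """Return (seconds per reduce call, seconds per join call) for n strings."""
    x = [str(i) for i in range(n)]
    # Both approaches must build the same string before timing them.
    assert reduce(str.__add__, x) == "".join(x)
    t_reduce = min(timeit.repeat(lambda: reduce(str.__add__, x),
                                 repeat=repeats, number=number)) / number
    t_join = min(timeit.repeat(lambda: "".join(x),
                               repeat=repeats, number=number)) / number
    return t_reduce, t_join


for n in (999, 1999):
    t_reduce, t_join = compare(n)
    print(f"n={n}: reduce {t_reduce * 1e6:.0f} usec, join {t_join * 1e6:.0f} usec")
```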
Most people, however, EXPECT sensible performance, not slow-downs by
factors of tens or hundreds of times, when they use constructs that are
considered "normal and supported" in the language and its built-ins.
This makes reduce a terrible performance trap just waiting to catch the
unwary. It SEEMS to work all right, but in fact it's doing nothing of
the kind, nor can it -- it's defined to iteratively run N repetitions
of whatever function you pass as the first argument, therefore it can
never have O(N) performance when used to add up a sequence of strings,
but always, necessarily O(N squared). It's _conceivable_ (although it
currently appears unlikely that Guido will ever countenance it) that
'sum' can be special-cased (to use faster approaches to summation when it
is dealing with sequences, not numbers) to give the O(N) performance
most people (sensibly) DO expect; that just depends on loosening its
current specs, while maintaining the concept of "sum of a sequence".
No such avenue is open for 'reduce': it will always be a terrible
performance trap just waiting to pounce on the unwary.
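The O(N) versus O(N squared) claim can be illustrated with a simple cost model. This is a sketch under a stated assumption: every '+' allocates a brand-new string and copies both operands into it, with no in-place shortcut. The function names are hypothetical, and the model counts characters copied rather than measuring time.

```python
def chars_copied_pairwise(parts):
    """Characters copied when concatenating left to right, one '+' at a time.

    Assumes each '+' builds a fresh string holding the whole accumulator
    plus the next item.
    """
    copied = 0
    acc_len = 0
    for i, p in enumerate(parts):
        if i == 0:
            acc_len = len(p)  # reduce() starts from the first item as-is
            continue
        acc_len += len(p)
        copied += acc_len     # the new string holds old accumulator + p
    return copied


def chars_copied_join(parts):
    """Characters copied by a join-style approach: each item copied once."""
    return sum(len(p) for p in parts)


for n in (1000, 2000, 4000):
    parts = ["x"] * n
    print(n, chars_copied_pairwise(parts), chars_copied_join(parts))
```

Under this model, doubling N roughly quadruples the first count and only doubles the second, which is the gap that keeps widening in the timings.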
> It may be sum() that is more difficult to learn...
I have enough experience teaching both built-in functions, by now,
that I can rule this hypothesis out entirely.
> For this particular problem, it is better to use
> ''.join(['a', 'b', 'c']), you know.
Yes, and sum's error message tells you so, so, if you DON'T know,
you learn immediately.
> However, it is important for Python to have an easy and generic
> way to do things. If we have to read the manual through to solve
> anything, what is the point to use Python instead of Perl (or Ruby,
> to some extent)?
Why do you need to read the manual after getting an error message
that tells you to """ use ''.join(seq) instead """? As for the
importance of "easy and generic", I would agree -- I'd FAR rather
have 'sum' be able to handle _well_ sums of any kinds -- but Guido
doesn't, so far. If you have arguments that might convince him, make
them. But 'reduce' just isn't a candidate, as the "easy" part simply
isn't there.
> However, It may be better to give reduce() some nice notation.
"Syntax sugar causes cancer of the semicolon". The nicer the
notation, the more superficially attractive you make it to use
a construct with unacceptable performance implications, the
crazier and more terrible the performance trap you're building
for the poor unwary users. I think it's an appalling direction
to want to move the language in.