[Python-ideas] All ideas together: Fast sum() for non-numbers
Sergey
sergemp at mail.ru
Fri Jul 5 09:41:03 CEST 2013
On Jul 4, 2013 Joshua Landau wrote:
> One thing I didn't understand from the original post was that this
> was a "special case".
That's because it wasn't.
There were too many different ideas during this thread, I'll try to
collect them all together. So...
Sum() is a great function. Why? Because it makes the code simple!
You can always live without sum(), you can use itertools, reduce,
lambdas, etc. But using sum() always makes the code easier to read.
Simple Code is the goal! Faster sum() is just a tool to reach it!
Sum() was designed to support different types from the beginning.
For example, Tim Peters suggested to use it for the list of
datetime.timedelta [1]. Unfortunately for some common types sum()
is too slow. This encourages people to fallback to other solutions,
write a code that is hard to read and maintain.
That's why I suggest making sum a little faster. Still not the
fastest thing ever, but fast enough to have a choice between
"a little faster" and "easier to read".
Faster sum() at least for something (e.g. lists) is already a good
thing, but even the great "fast sum() for everything" goal is easy
to reach.
I already wrote some patches. So now there 4 suggestions/patches
pending:
1. Fast sum() for lists and tuples. [2]
This patch adds a special case optimisation. I wasn't going to do
it that way, but sum() already have 2 special cases for ints and
floats (yes, sum() already has special cases, and it has them for
the last 6 years), one more special case to reach the "fast for
everything" goal is not that bad. Practicality beats purity.
This patch introduces no behavior changes, and may go even for
python2 bugfix release.
2. Fast sum() for strings. [3]
This patch is a small enhancement of #1, that makes sum() O(N) for
strings. Obviously it introduces a behavior change and I do NOT
suggest it for inclusion. I believe that ''.join() alternative is
simple enough and don't have to be replaces with sum() for normal
use cases. It is just to shows that sum() can be O(N) for strings.
3. Fast sum() for everything else. [4]
This is the original patch that I suggested. It rearranges existing
code so that sum() uses __iadd__ if available, making sum() fast
for all built-in and custom types with fast __iadd__.
Even with #1 this patch also introduces no behavior changes, and
may go even for python2 bugfix release.
4. Explicit documentation of sum() complexity.
If patches #1 or #3 are accepted it may be a good idea to
explicitly state what sum() needs to be O(N) for user types.
E.g.:
"""
function:: sum(iterable[, start])
Sums *start* and the items of an *iterable* from left to right and
returns the total. *start* defaults to ``0``.
To make advantage of faster __iadd__ implementation sum() uses it
if possible. sum() has linear time complexity for built-in types
and all types providing constant-time `__iadd__` or `__add__`.
For some use cases, there are good alternatives to sum(). The
preferred, fast way to concatenate a sequence of string is by
[...]
"""
Patches #1 and #2 were written as an answer to "you can't make sum()
fast for everything, e.g. strings and tuples". Yes, I can. :)
Together these patches achieve the "fast sum() for everything" goal
and make sure, that people know what is required to keep sum() fast.
That's all the suggestions I have for now.
Well, there's one more, but let's first decide about these. :)
--
[1] http://mail.python.org/pipermail/python-dev/2003-April/034837.html
[2] http://bugs.python.org/file30769/fastsum-special.patch
without the string part
[3] the string part of
http://bugs.python.org/file30769/fastsum-special.patch
[4] http://bugs.python.org/file30705/fastsum.patch
More information about the Python-ideas
mailing list