[Python-Dev] Re: [Python-checkins] python/dist/src/Python
bltinmodule.c, 2.292.10.1, 2.292.10.2
Alex Martelli
aleaxit at yahoo.com
Sun Oct 26 13:20:52 EST 2003
On Sunday 26 October 2003 05:23 pm, Armin Rigo wrote:
...
> I must admit I was a bit surprized when I first tested sum(), without first
> reading its doc because I thought I knew what it should do. I expected it
> to be a fast equivalent to:
>
> def sum(seq, start=0):
> for item in seq:
> start = start + seq
> return start
It IS equivalent to that -- plus an explicit typetest to raise if start is an
instance of str or unicode. I had originally tried forwarding to ''.join for
strings, but Guido preferred to forbid them, and it still doesn't look like
a problem to me.
Alas, "fast" is debatable:-).
> reduce(operator.add, seq, start)
sum doesn't reproduce reduce's quirk of using the first item of seq if start
is not given. So, the def version is closer.
> I immediately tried it with strings and lists. I immediately thought about
> lists because of their use of "+" for concatenation.
>
> So it seems that neither strings nor lists are properly supported, neither
Strings are explicitly disallowed, so that should take care of the surprise
factor for that specific case.
As for lists, the semantics are right, the speed is not (could be made way
faster with modest effort). Same for other mutable sequences.
As for tuples and other immutable sequences, they ARE treated exactly
like your 'def' above (roughly like your reduce) would treat them -- not
very fast, but if all you know about something is that it's an immutable
sequence, there's nothing more you can do. The use case of making a
huge tuple from many smaller ones seems weird enough that I don't
see specialcasing tuples specifically as particularly worthwhile (other
immutable sequences that aren't exactly tuples would still suffer, after
all).
> tuples by the way, and my opinion on this is that it strongly contradicts
> the principle of least surprize.
For mutable sequences, I agree. For immutable ones, I don't see the
performance trap as being a practical problem for tuples (and weirder
things) -- it WOULD be for strings, but as we specifically disallow them
with a message reminding the user of ''.join, in practice the problem
seems minor.
Maybe I'm coming to this with a too-pragmatical stance...?
> I would not object to an implementation of sum() that special-case lists,
> tuples and strings for efficiency. (by which I mean I can contribute a
> patch)
I think all mutable sequences (that aren't especially weird in their + vs
+= behavior) might be handled correctly, without specialcasing, roughly
as follows (Phillip Eby's idea):
def sum(seq, start=0):
it = iter(seq)
try: result = start + it.next()
except StopIteration: return start
for item in it:
result += item
return result
my original idea was perhaps a bit goofier, something like:
def sum(seq, start=0):
try: start = copy.copy(start)
except TypeError:
for item in seq:
start = start + item
else:
for item in seq:
start += item
return start
Admittedly the latter version may accept a few more cases, e.g.
both versions would accept:
sum([ range(3), 'foo' ], [])
because [] is copyable, []+range(3) is fine, and list.__iadd__ is
more permissive than list.__add__; however, the first version
would fail on:
sum([ 'foo', range(3) ], [])
because []+'foo' fails, while the second version would be fine
because [] is _still_ copyable and __iadd__ is still permissive:-).
So, perhaps, implementing by Phillip's idea would still not reduce
the suprise factor enough. Hmmm...
> > language or library feature. The problem of the += loop on strings is
> > essentially solved by psyco, which has tricks to catch that and make
> > it almost as fast as ''.join; but psyco can't get into a built-in
> > function such as sum, and thus can't help us with the performance trap
> > there.
>
> Supporing sum() in Psyco is no big issue, and it could help the same way as
> it does for str.__add__. (It is not explicitely supported yet, but it
> could be added.) Still I believe that getting the complexity right in
> CPython is important, when it can be done.
Absolutely -- we can't rely on psyco for everything, particularly not for
getting the big-O right as opposed to speeding things up by constant
multipliers (in part, for example, because psyco doesn't work on Mac's,
which are going to be a HUGE part of Python's installed base...).
However, I would be happy to "leave it to psyco" for a sum of a large
sequence of tuples or other immutable sequences...:-). I just don't
think that people in practice _will_ fall into that performance-trap...
Alex
More information about the Python-Dev
mailing list