[Python-ideas] Intermediate Summary: Fast sum() for non-numbers

Ron Adam ron3200 at gmail.com
Tue Jul 16 04:12:07 CEST 2013



On 07/15/2013 05:40 PM, Joshua Landau wrote:
> On 15 July 2013 23:27, Ron Adam<ron3200 at gmail.com>  wrote:
>> >So ...
>> >
>> >      + 1   Add specialised sum_iters(),  (based on the sum() patch.)
> You mean chain.from_iterable?

No, I mean a new function written in C, which writes (appends) the values 
directly into a new (or start) sequence.  chain.from_iterable builds an 
iterator and yields the items one at a time.  That's not going to be as 
fast, but it does use much less memory in many situations.
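
To make the difference concrete, here is a rough pure-Python sketch.  The 
name sum_iters() is only the working name from this thread, and the body is 
an assumed stand-in for the proposed C function, not the actual patch.  It 
only shows the eager "append into one sequence" behaviour next to the lazy 
chain.from_iterable.

```python
from itertools import chain


def sum_iters(iterables, start=None):
    """Hypothetical pure-Python stand-in for the proposed C function.

    Appends every item of every iterable directly into one list.
    If a start list is given, it is extended in place (an assumption
    about how "a new (or start) sequence" would behave).
    """
    result = [] if start is None else start
    for it in iterables:
        result.extend(it)  # write the values straight into the result
    return result


data = [[1, 2], (3, 4), range(5, 7)]

eager = sum_iters(data)            # whole result is built in memory at once
lazy = chain.from_iterable(data)   # iterator; items are produced on demand
lazy_items = list(lazy)            # same items once the iterator is consumed
```

Both give the same items; the trade-off is speed of building the full 
sequence versus the memory saved by never building it.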

> That's the fastest we're getting for
> iterables. Maybe sum_lists() could be faster, but then we're into "if
> you need that niche and you need it that fast, write it yourself"
> territory.

If it were a Python function written in Python, this would be true, but as 
a builtin C function, it could be faster.   Common built-in types could be 
optimised in sum_iters(), just as Sergey has done in the patch for sum().

One of the main sticking points in the discussion is whether or not sum() 
should be a recommended way of summing non-number types.  Adding a new 
function supports the (current) view that sum() shouldn't be recommended 
for summing non-number types.  (Although it would still work, for backwards 
compatibility reasons.)
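
For reference, this is how sum() treats non-number types today.  The 
example only uses documented behaviour; the variable names are just for 
illustration.

```python
lists = [[1, 2], [3], [4, 5]]

# Works, but each + builds a new list, so the cost grows quadratically
# with the total number of items.
flat = sum(lists, [])

# Strings are rejected outright; sum() points the caller at str.join().
str_error = None
try:
    sum(["a", "b"], "")
except TypeError as exc:
    str_error = exc

joined = "".join(["a", "b"])  # the recommended way for strings
```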


>> >      + .5  increase other options speed.. (not easy or even possible)
>> >      + .25  Patch sum, and document its use for sequences. [1]
>> >
>> >[1] Assuming this can be done with no backwards compatibility or new side
>> >effects.  Lots of new tests should be added for this.
> But then it doesn't duck-type well ⇒ people should avoid using it ⇒
> the original change just becomes an attractive nuisance

That's why the tests are needed, and why it's not my first choice.
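
As an illustration of the kind of test I have in mind (this is an assumed 
example, not something taken from the patch): a user-defined type that 
supports only + must keep working with sum(), whatever fast paths are 
added for built-in types.  The Vec class here is hypothetical.

```python
class Vec:
    """Hypothetical user type that defines only __add__ (no in-place add)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return isinstance(other, Vec) and (self.x, self.y) == (other.x, other.y)


start = Vec(0, 0)
total = sum([Vec(1, 2), Vec(3, 4)], start)
```

A test along these lines would check both the result and that the start 
value is left unchanged.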


> chain.from_iterable doesn't have this problem.

I agree.

It's about using the right tool for the right job, not whether one is 
better than the other.


>> >By far, the easiest and least problematic choice is to add a new function.

How about a decision tree?


<Recommend sum() for summing iterables?>   # <<<< Are we still stuck here?

     (YES)
          patch sum.
          Add more tests to cover questionable cases.
          <Does it pass all tests?>

             (YES)
                [GOTO B]

             (NO)
                [GOTO A]

A:   (NO)                                  # I think we should be here.
          Create new function based on the sum() patch.
          Create many tests.
          <Does it pass tests?>

             (YES)
                <Is it significantly faster or better
                 in some way than other alternatives?>
                {AND}
                <Are there enough use cases to support adding it? [*1]>

                   (YES)
                      [GOTO B]

                   (NO)
                      [DONE]   # Good idea, but not worth doing.

             (NO)
                [DONE]         # Something wrong with idea.


B:   Add docs to patch.
      <Ask for inclusion?>

          (YES)        # Accepted!
             Add news entry if needed to patch.
             apply patch
             [DONE]    # Yay

          (NO)
             rejected  # Goto "A" if changes are needed.
             [DONE]    # We tried. [*2]


[*1] A preferred way to verify this is to find places in Python's library 
where it helps make the code better in some way.  Finding enough of these 
examples is a good indication it's on the right track, and it helps 
significantly in convincing others it's worth doing.

[*2] It might be determined that the patch would cause problems down the 
road, be difficult to maintain, or there might be some other competing idea 
that would be preferred.

Of course there would be feedback cycles in most of these steps, and others 
might put things in a different order, but it pretty much follows the 
standard path most patches take on the tracker.  Let's help Sergey 
through this process and not be too quick to reject his ideas.

Cheers,
    Ron


