[Python-ideas] Intermediate Summary: Fast sum() for non-numbers

Ron Adam ron3200 at gmail.com
Tue Jul 16 00:27:04 CEST 2013


On 07/15/2013 04:36 PM, David Mertz wrote:
> On Mon, Jul 15, 2013 at 12:24 PM, Joshua Landau <joshua at landau.ws
> <mailto:joshua at landau.ws>> wrote:
>
>     But it is (sort of). I asked my brother (under 20, above 10, not sure
>     how much more I should say on a mailing list), who is about as
>     not-programmer-techy as any computer user could reasonably be. I asked
>     him to add two lists. He *concatenated them*¹.
>
>
> Here's my experimental contribution.  I cut and drew on three pieces of
> paper similar to the below ASCII art:
>
> ______________________
> |  4   5   6   2   1
>
>        ______________________
>        |  6   12   13   19   100
>
> ______________________
> |  100   200   300
>
> In particular, I put small integers on them, but for generality made the
> numbers sometimes out of natural sort order.  I also made the lists of
> different lengths so that elementwise addition would pose a problem (a
> subject *could* decide to fill in the additive identity zero for the
> "missing" elements if she wanted to, but this would have to be a
> decision).  I also placed the papers deliberately so that the left edges
> were not aligned (as pictured) so the notion of columns would not be forced
> on an informant (but not prohibited either).
>
> I found as a subject a "programming-naive" but well-educated subject in her
> 40s (a friend, no kidnapping of strangers off the street).  I asked
> something worded close to the following:
>
> "Can you sum these lists? An acceptable answer would be that the question
> does not make sense.  If it does make sense, what result do you get?"
>
> As a possible aid, I had a notepad placed nearby, in case some sort of
> copying operation was felt relevant (but I just made sure the notepad was
> on the table, I didn't say anything about whether it should or should not
> be used).
>
> Her answer was to write the additive sum of *each* slip of paper (list).
> I.e. three numbers: 18, 150, 600.
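
For reference, the three readings of "sum these lists" that have come up so far can be sketched in Python using David's numbers. The variable names are illustrative only, and zero-filling the shorter lists is just one possible decision for the elementwise case:

```python
from itertools import zip_longest

# The three slips of paper from the experiment.
a = [4, 5, 6, 2, 1]
b = [6, 12, 13, 19, 100]
c = [100, 200, 300]

# Reading 1: concatenation (what Joshua's brother did with two lists).
concatenated = a + b + c

# Reading 2: elementwise addition, padding the shorter lists with 0.
elementwise = [sum(column) for column in zip_longest(a, b, c, fillvalue=0)]

# Reading 3: one total per list (what David's subject did).
per_list = [sum(lst) for lst in (a, b, c)]

print(per_list)  # [18, 150, 600]
```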

Nice test.  So what the discussion is trying to determine is whether it is 
better to treat sequences as fundamentally different from values in Python.

Often in human language, we need to give more information to get the 
desired point across.  One way is to use a common, simpler expression 
together with a simple hint.  The other way is to use more specific 
language that doesn't require a hint.

The 'hint' can be subtle, and is usually not included in examples we use 
when comparing human language to computer language.  That makes the 
argument for a simpler term a bit stronger.


So we have these two approaches...

(1)  In the case of sum(x, start), the hint would be the variable name of 
either x, or start.  Without that hint, you need to scan forward or 
backwards in the source code to figure out what sum() will actually do.
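
A small illustration of the point, using only the existing builtin. The same call shape does different things depending on what x and start hold, and the names below are made up for the example:

```python
numbers = [1, 2, 3]
lists = [[1, 2], [3], [4, 5]]

# With numbers, the default start value of 0 is all that is needed.
total = sum(numbers)

# With lists, the start value is what turns sum() into concatenation.
flat = sum(lists, [])

# Without that start value, sum() tries 0 + [1, 2] and fails.
try:
    sum(lists)
    failed = False
except TypeError:
    failed = True

print(total, flat, failed)  # 6 [1, 2, 3, 4, 5] True
```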


(2)  The more specific approach would be to have two functions that don't 
need hints because their purpose is more limited.

     sum_values(x)    # Not really needed as sum() is good at this already.
     sum_iters(x)     # Fast sum of iters..

Because they are more limited in scope, they can be optimised to a greater 
degree, and made to work without a start value.
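
As a rough sketch of what a start-less sum_iters() could look like: this is not Sergey's patch, just a pure-Python stand-in built on itertools.chain, and returning a list is an assumption made for the example:

```python
from itertools import chain


def sum_iters(iterables):
    """Hypothetical helper: join any number of iterables into one list.

    No start value is needed, and each item is copied once, unlike
    sum(lists, []) which builds a new intermediate list at every step.
    """
    return list(chain.from_iterable(iterables))


joined = sum_iters([[1, 2], (3,), [], range(4, 6)])
print(joined)  # [1, 2, 3, 4, 5]
```

Because the function only ever concatenates, a reader does not need to look up what x or start are to know what it does.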


Sergey's patch does increase the speed quite a bit (compared with the other 
suggestions), and combining lists is fairly common, so I do think it should 
be used either in a new function or in sum(), depending on how the 
developers feel about whether or not it is better to create more separation 
between how sequences and values are treated.

Although, if one of the other alternatives can be made as fast, then that 
would be good too.

So ...

      + 1    Add a specialised sum_iters() (based on the sum() patch).
      + .5   Increase the speed of the other options (not easy, and may
             not even be possible).
      + .25  Patch sum(), and document its use for sequences. [1]


[1] Assuming this can be done with no backwards compatibility problems or 
new side effects.  Lots of new tests should be added for this.

By far, the easiest and least problematic choice is to add a new function.

Cheers,
    Ron































More information about the Python-ideas mailing list