restriction on sum: intentional bug?

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Sun Oct 18 23:15:41 EDT 2009


On Sun, 18 Oct 2009 19:52:41 -0700, Ethan Furman wrote:

> This is obviously slow on strings, but mention of that is already in the
> docs, and profiling will also turn up such bottlenecks.  Get the code
> working first, then optimize, yes?  

Well, maybe. Premature optimization and all, but sometimes you just 
*know* something is going to be slow, so you avoid it.

And it's amazing how O(N**2) algorithms can hide for years. Within the 
last month or two, there was a bug reported for httplib involving 
repeated string concatenation:

http://bugs.python.org/issue6838

I can only imagine that the hidden O(N**2) behaviour was there in the 
code for years before somebody noticed it, reported it, spent hours 
debugging it, and finally discovered the cause and produced a patch.

The amazing thing is, if you look in the httplib.py module, you see this 
comment:

# XXX This accumulates chunks by repeated string concatenation,
# which is not efficient as the number or size of chunks gets big.
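For anyone who hasn't been bitten by this, here's a minimal sketch of the
two patterns (my own toy functions, not httplib's actual code). The
accumulator version re-copies everything collected so far on every
iteration, so N chunks cost O(N**2) character copies in total; collecting
the chunks in a list and joining once at the end is linear:

def read_all_concat(chunks):
    # Each += builds a brand new string, recopying all the
    # data accumulated so far: O(N**2) overall.
    data = ''
    for chunk in chunks:
        data += chunk
    return data

def read_all_join(chunks):
    # Collect the pieces, then copy everything exactly once.
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return ''.join(parts)

(CPython can sometimes resize the string in place for the += case, but
that's an implementation detail you can't rely on, and evidently it
didn't save httplib here.)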

> We've all seen questions on this
> list with folk using the accumulator method for joining strings, and
> then wondering why it's so slow -- the answer given is the same as we
> would give for sum()ing a list of strings -- use join instead.  Then we
> have Python following the same advice we give out -- don't break
> duck-typing, any ensuing errors are the responsibility of the caller.

I'd be happy for sum() to raise a warning rather than an exception, and 
to do so for both strings and lists. Python, after all, is happy to let 
people shoot themselves in the foot, but it's only fair to warn them 
that the gun is loaded :)
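
Something along these lines, a hypothetical wrapper rather than a patch
to the builtin, is roughly what I have in mind:

import warnings

def warning_sum(iterable, start=0):
    # Like the builtin sum(), but instead of refusing strings
    # outright, warn whenever the start value suggests the
    # caller is heading for O(N**2) behaviour.
    if isinstance(start, (str, list)):
        warnings.warn("summing %ss is O(N**2); use join() or "
                      "itertools.chain() instead"
                      % type(start).__name__,
                      RuntimeWarning, stacklevel=2)
        result = start
        for item in iterable:
            result = result + item
        return result
    return sum(iterable, start)

The isinstance test is crude, and a real version would live in C, but
the principle stands: let it run, just make sure the gun rattles first.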



-- 
Steven


