[Chicago] Common mistakes that can slow down Python

Sat Mar 1 17:50:53 CET 2014

On Sat, Mar 01, 2014 at 09:14:46AM -0600, Tathagata Dasgupta wrote:
> I keep going back to Raymond Hettinger's "Transforming Code into Beautiful,
> Idiomatic Python" (1,2) talk from last PyCon, and wonder if there is more
> to it. It's a fun exercise doing cProfile/timeit-ing them ...

Premature optimization...  ;-)

I'm just skipping through the card deck because I'm sure I've seen this
talk before, and a few of the examples present beautiful, idiomatic
code that doesn't perform the same computation as the code it replaces.
Slide 13, for example; it may not be a common case, but...

 >>> data
 {'a': 1, 'r': 4, 'b': 3, 'ra': 2}
 >>> d = dict(data)
 >>> e = d
 >>> d = {k:d[k] for k in d if not k.startswith('r')}
 >>> d
 {'a': 1, 'b': 3}
 >>> e
 {'a': 1, 'r': 4, 'b': 3, 'ra': 2}

The dict comprehension rebinds d rather than modifying the dict it was
bound to at the start.  Sure, in a lot of uses this won't matter, but
the binding to e that I added shows how it may.  This is like the
difference between l.sort() and sorted(l), and there are times and
places when either of them will be the right thing and the other not.
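
For when the in-place behaviour is the one you need, here is a minimal
sketch of both variants side by side.  The names d2 and e2 are mine, not
from the slides, and the data is the same toy dict as above.

```python
data = {'a': 1, 'r': 4, 'b': 3, 'ra': 2}

# Rebinding: the comprehension builds a NEW dict and binds d to it.
# e still refers to the original, unfiltered dict.
d = dict(data)
e = d
d = {k: d[k] for k in d if not k.startswith('r')}

# In place: delete from the dict object itself, so every name bound
# to it sees the change.  list(d2) snapshots the keys first, because
# a dict must not change size while it is being iterated.
d2 = dict(data)
e2 = d2
for k in list(d2):
    if k.startswith('r'):
        del d2[k]

print(d is e)    # False: two different dict objects
print(e2 is d2)  # True: still one object, now filtered
```

Which one is right depends on whether anything else holds a reference
to the original dict, same as with l.sort() versus sorted(l).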

Maybe he covered that in the talk?

Another one, with another dict issue.  Slide 16, on the glories of
defaultdict.  And that's truly great, but once that dict has been
filled with counts it still has that default behaviour, which may
surprise code that uses it later.  Granted, it seems more likely to
surprise it by NOT throwing an exception when a bug leads to an access
with a key that isn't supposed to be there... and perhaps blowing up at
some remove when the automagically defaulted 0 gets used where a
positive non-zero is expected?  Like Slide 13's, it's a subtle, tricky
difference that often will not be noticed.
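
A small sketch of that surprise, and of two ways to switch the
defaulting off once the counting is done.  The colour list and the
misspelled key 'gren' are made up for illustration; default_factory is
the documented attribute on collections.defaultdict.

```python
from collections import defaultdict

# Hypothetical input, counted the defaultdict way.
counts = defaultdict(int)
for color in ['red', 'green', 'red']:
    counts[color] += 1

# A buggy lookup with a misspelled key raises nothing: it returns
# the default 0 AND quietly inserts the bad key into the dict.
typo_value = counts['gren']
typo_was_inserted = 'gren' in counts
del counts['gren']  # clean up the accidental entry

# Option 1: hand later code a plain dict copy.
plain = dict(counts)

# Option 2: disable the default in place; missing keys now raise.
counts.default_factory = None
try:
    counts['gren']
    raised = False
except KeyError:
    raised = True

print(typo_value, typo_was_inserted, raised)
```

Either option gives the later code the KeyError it would have got from
an ordinary dict.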

Repeat above general issue for every use of defaultdict.  To be fair,
in the Grouping With Dictionaries example I could see where getting an
empty list for uninitialized lengths/key values could be a good thing,
and since the natural thing to do with an empty list is to bail out of
the «for x in list_of_x» without doing anything it's less likely to go
awry.  Probably.
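
The "Probably" can be sketched like this.  The list of names is a
made-up stand-in for whatever the slide groups; the point is only what
happens on a lookup for a length nobody has.

```python
from collections import defaultdict

# Hypothetical data, grouped by length as in the grouping example.
names = ['raymond', 'rachel', 'matthew', 'roger']
by_len = defaultdict(list)
for name in names:
    by_len[len(name)].append(name)

# No name has 3 letters, so this loop body never runs...
seen = []
for name in by_len[3]:
    seen.append(name)

# ...but the lookup itself still added an empty entry for key 3.
lookup_added_key = 3 in by_len

# .get() reads without triggering the default, so nothing is added.
nothing = by_len.get(4, [])
get_added_key = 4 in by_len

print(seen, lookup_added_key, get_added_key)
```

So the loop is harmless, but anything that later counts or iterates
over the keys will see the phantom empty group.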

Hmmm, do I still need to worry about running code in a 2.5 environment? 
NamedTuples are nice but as with so much else of the constant churn
they've been irrelevant for me because I don't have time to update
every piece of code (and environment) to every latest new release.  To
be honest, as much as I like the new features, few if any of them are
more compelling than not "fixing" what isn't broken.  Sometimes I
wonder if this means I'm not a True Pythonista.  ;-/

&tc...

-- 
To be alive, is that not to be
again and again surprised?  -- Nicholas van Rijn