Another win for profiling.
roy at panix.com
Fri Jul 29 07:46:32 EDT 2011
It's often said that you shouldn't try to guess what's slow, but use
profiling tools to measure what's slow. I had a great example of that
yesterday. We have some web server code that does a big database
(MongoDB) query and some post-processing of the data in python. It
worked fine in testing, but in production we started seeing slowness.
We had one query which was taking about 15 seconds to return 8000
values. We looked at the post-processing code and convinced ourselves
everything was O(n log n) or better. We looked at the database and
convinced ourselves that all the required indexes existed.
So, I pulled out cProfile. It turned out that fully 50% of the time was
being spent in the gettz() calls in this routine:
"""Convert d into a unix timestamp. d is assumed to be in UTC, as
the pymongo module returns datetimes based in UTC, but without
actual timezone information attached."""
d = d.replace(tzinfo = dateutil.tz.gettz('UTC'))
d = d.astimezone(dateutil.tz.gettz())
return time.mktime(d.timetuple()) + (d.microsecond / 1e6)
which was being called once per returned value from the database.
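The post doesn't show the profiler invocation, but a hotspot like this can be surfaced with cProfile and pstats along these lines (the handler and workload here are stand-ins, not our actual code):

```python
import cProfile
import io
import pstats

def per_row_conversion(n):
    # Stand-in for an expensive per-row call; illustrative only.
    total = 0.0
    for i in range(n):
        total += i ** 0.5
    return total

def handler():
    # Stand-in for a request handler looping over query results.
    return [per_row_conversion(1000) for _ in range(200)]

profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

# Sorting by cumulative time shows which callees dominate the request.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue())
```

The cumulative-time view is what makes a cheap-looking call that runs once per row stand out.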
Factoring out the gettz() calls got us a 2x speedup in page load time.
Never would have found that without profiling.
Profiling also showed that most of the rest of the time was spent in the
BSON deserialization routines. We found a way to reduce the amount of
data being returned, and eliminated most of that too. If we had stared at
the code long enough, we probably would have realized that on our own,
but we never would have guessed that the gettz() calls were so expensive.