Delaney, Timothy C (Timothy)
tdelaney at avaya.com
Mon Jan 26 16:50:04 EST 2004
> From: Walter Dörwald
> The biggest problem I had with hotshot is the filesize. I was
> using the above script to profile a script which normally
> runs for about 10-15 minutes. After ca. 20 minutes the size
> of hotshot.prof was over 1 gig. Is there any possibility to
> reduce the filesize?
In my coverage tool (which I *still* haven't managed to get permission to release!!!) I was storing arrays of file numbers (indexes into a list), line numbers (into the file) and types (line, exception, etc). Every thousand or so entries, I would convert the arrays to strings with `tostring()`, then compress each string. Then the 3 sets of data were written to file as a binary pickle.
At the end of the run, a tuple was pickled for each file consisting of things like the fully-qualified module name, the filename, number of lines, the compressed source code and info about the members in the module (e.g. type, starting line, end line, etc - useful for finding in the source code).
I also included the (compressed) stacktrace of any exception which ended the coverage run.
This resulted in multi-megabyte files being reduced to a few kilobytes, without a major memory or performance penalty, and with all the data I needed to display the results - i.e. there was no need for the coverage to be displayed on the same computer. The compression wasn't as good as if I had used larger chunks, but the difference wasn't particularly significant - maybe a few percentage points. Also, since the chunks being compressed were fairly small, they didn't hurt performance much.
It does make the extractor much more complex, but I thought it was worth it.