RE: [Python-Dev] Hotshot

Walter Dörwald wrote:
The biggest problem I had with hotshot is the filesize. I was using the above script to profile a script which normally runs for about 10-15 minutes. After about 20 minutes, the size of hotshot.prof was over 1 GB. Is there any way to reduce the filesize?
In my coverage tool (which I *still* haven't managed to get permission to release!!!) I was storing arrays of file numbers (indexes into a list), line numbers (within the file) and event types (line, exception, etc.). Every thousand or so entries, I would convert the arrays with `tostring()`, then compress the resulting strings. The three sets of data were then written to the file as a binary pickle.

At the end of the run, a tuple was pickled for each file, consisting of things like the fully-qualified module name, the filename, the number of lines, the compressed source code, and info about the members of the module (e.g. type, starting line, ending line - useful for finding them in the source code). I also included the (compressed) stack trace of any exception which ended the coverage run.

This resulted in multi-megabyte files being reduced to a few kilobytes, without a major memory or performance penalty, and with all the data I needed to display the results - i.e. no need for the coverage to be displayed on the same computer. The compression wasn't as good as if I had compressed larger chunks, but the difference wasn't particularly significant - maybe a few percentage points. And since the chunks being compressed were fairly small, they didn't hurt performance much. It does make the extractor much more complex, but I thought it was worth it.

Tim Delaney
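To make the chunking scheme concrete, here is a minimal sketch of that approach. The tool itself was never released, so all names here are hypothetical; it also assumes `zlib` for the compression step and uses `tobytes()`, the modern spelling of `tostring()`:

```python
import pickle
import zlib
from array import array

CHUNK_SIZE = 1000  # flush after roughly this many trace events

class ChunkedTraceLog:
    """Buffer three parallel arrays of small integers, then periodically
    compress and pickle them as one chunk, as Tim describes."""

    def __init__(self, path):
        self.out = open(path, "wb")
        self.files = array("H")  # index into a separate file-name list
        self.lines = array("H")  # line number within that file
        self.types = array("B")  # event type: line, exception, ...

    def record(self, fileno, lineno, event_type):
        self.files.append(fileno)
        self.lines.append(lineno)
        self.types.append(event_type)
        if len(self.types) >= CHUNK_SIZE:
            self.flush()

    def flush(self):
        if not self.types:
            return
        # Serialize each array to raw bytes, compress the bytes, and
        # write the three compressed blobs as one binary pickle record.
        chunk = tuple(zlib.compress(a.tobytes())
                      for a in (self.files, self.lines, self.types))
        pickle.dump(chunk, self.out, protocol=pickle.HIGHEST_PROTOCOL)
        del self.files[:], self.lines[:], self.types[:]

    def close(self):
        self.flush()
        self.out.close()
```

Reading the log back is a loop of `pickle.load()` calls until `EOFError`, decompressing each chunk as it arrives, which is where the extra extractor complexity comes from.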

>> The biggest problem I had with hotshot is the filesize.

Tim> In my coverage tool (which I *still* haven't managed to get
Tim> permission to release!!!) I was storing arrays of file numbers
Tim> (indexes into a list), line numbers (within the file) and event
Tim> types (line, exception, etc.). Every thousand or so entries, I
Tim> would convert the arrays with `tostring()`, then compress the
Tim> resulting strings. The three sets of data were then written to
Tim> the file as a binary pickle.

It seems to me it might be simpler to just write the profile file through the gzip module and teach the hotshot.stats.load() function to recognize such files and uncompress accordingly.

Skip
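For what it's worth, the detection half of that suggestion is only a few lines. This is just a sketch of the sniffing idea, not the actual hotshot API (whose reader is C code, as Fred notes below): check for the two gzip magic bytes and pick the right opener.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream

def open_profile(path):
    """Open a profile file, transparently handling gzip compression."""
    with open(path, "rb") as f:
        magic = f.read(2)
    opener = gzip.open if magic == GZIP_MAGIC else open
    return opener(path, "rb")
```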

Skip Montanaro writes:
It seems to me it might be simpler to just write the profile file through the gzip module and teach the hotshot.stats.load() function to recognize such files and uncompress accordingly.
The problem with this is that the low-level log reader and writer are in C rather than Python (using fopen(), fwrite(), etc.). We could probably work out a reader that gets input buffers from an arbitrary Python file object, and maybe we could handle writing that way, but that does change the cost of each write. HotShot tries to compensate for that, but it's unclear how successful that is.

-Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Zope Corporation
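Purely to illustrate the reader-side idea (nothing like this exists in hotshot; the real reader does its own fread()-style I/O in C), pulling buffers from an arbitrary Python file object is trivial on the Python side. The cost Fred mentions is the callback from C into code like this for every buffer, and potentially for every write:

```python
def iter_buffers(fileobj, bufsize=8192):
    """Yield fixed-size byte buffers from any file-like object,
    including a gzip.GzipFile, until it is exhausted."""
    while True:
        buf = fileobj.read(bufsize)
        if not buf:
            break
        yield buf
```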

Fred L. Drake, Jr. wrote:
Skip Montanaro writes:
It seems to me it might be simpler to just write the profile file through the gzip module and teach the hotshot.stats.load() function to recognize such files and uncompress accordingly.
The problem with this is that the low-level log reader and writer are in C rather than Python (using fopen(), fwrite(), etc.). We could probably work out a reader that gets input buffers from an arbitrary Python file object, and maybe we could handle writing that way, but that does change the cost of each write. HotShot tries to compensate for that, but it's unclear how successful that is.
OK, I've checked both the old profile.py and the new hotshotmain.py (thanks for checking it in, Skip) with one of my scripts. The script runs standalone in:

    real 0m6.192s
    user 0m6.010s
    sys  0m0.160s

Using the old profile.py I get the following:

    real 0m46.892s
    user 0m43.430s
    sys  0m0.970s

Running with the new hotshotmain.py gives the following run times:

    real 1m6.873s
    user 1m5.220s
    sys  0m0.840s

The size of hotshot.prof is 5104590 bytes. After gzipping it with "gzip -9", the size of hotshot.prof drops to 562467 bytes. So gzipping might help, but dropping the filesize from 1 GB to about 100 MB still doesn't sound so convincing. And I wonder what it would do to the run time.

Bye,
Walter Dörwald
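A quick back-of-the-envelope check of those numbers: the measured ratio is roughly 9x, which is where the 1 GB to about 100 MB estimate comes from.

```python
raw, compressed = 5104590, 562467   # Walter's measured sizes in bytes
ratio = raw / compressed
print(f"compression ratio: {ratio:.1f}x")        # about 9.1x
print(f"1 GB profile -> {1024 / ratio:.0f} MB")  # roughly 113 MB
```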
participants (4)
- Delaney, Timothy C (Timothy)
- Fred L. Drake, Jr.
- Skip Montanaro
- Walter Dörwald