[Python-bugs-list] [ python-Bugs-670816 ] pickles are way slow

SourceForge.net noreply@sourceforge.net
Sun, 19 Jan 2003 13:48:27 -0800


Bugs item #670816, was opened at 2003-01-19 16:37
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=670816&group_id=5470

Category: Python Library
Group: Python 2.2.1
Status: Open
Resolution: None
Priority: 5
Submitted By: L. Peter Deutsch (lpd)
Assigned to: Nobody/Anonymous (nobody)
Summary: pickles are way slow

Initial Comment:
I'm using Python pickles as the external representation
for passing structured data from a C app to a Python
app. (Integrating the C app through Python's C API is
not an option.) The data consists of many relatively
small objects organized into large, moderately deep trees.

What I've found is that a good 1/3 of the time in the
Python app is spent loading the pickles, even with
cPickle. As far as I can tell, the main reason is the
inordinate amount of time spent looking up class names:
classes, unlike all other objects in the pickle file,
cannot be "memoized" and must be looked up from strings
each time.

Currently, instances are represented by the following
sequence of pickle tags:
    MARK
    << possible constructor arguments >>
    INST modulename \n classname \n
    ...
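
The repeated lookup described above can be seen in a hand-written stream. This is a minimal sketch, assuming a current Python 3 `pickle` module (which still loads the INST opcode) and using `fractions.Fraction` purely as a convenient stand-in class; the helper name `inst_tuple_pickle` is hypothetical.

```python
import pickle
from fractions import Fraction

# Hand-written protocol-0 fragment: MARK, two INT arguments, then INST
# followed by the module name and class name as newline-terminated text.
ONE_INST = b"(I1\nI2\nifractions\nFraction\n"


def inst_tuple_pickle(count):
    """Return a pickle of a tuple of `count` instances, each built by INST."""
    # The outer MARK ... TUPLE collects the instances; "." is STOP.
    return b"(" + ONE_INST * count + b"t."


data = inst_tuple_pickle(3)
values = pickle.loads(data)

# Every instance carries its own copy of the module and class names,
# so the loader resolves the class from strings once per instance.
name_count = data.count(b"fractions\nFraction\n")
```

Here `name_count` equals the number of instances, which is the per-object lookup cost the report is concerned with.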

I propose adding two new tags, CLASS and CONSTRUCT.
    CLASS modulename \n classname \n
does the first part of INST -- look up the class name.
However, it then pushes the class object on the stack
rather than instantiating it.
    CONSTRUCT
does the second part of INST, but takes the class
object from the stack rather than reading and looking
up the name.

Using these two tags with PUT and GET allows classes to
be memoized efficiently just like other objects.
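
The memoization idea can be approximated with opcodes that today's `pickle` module already understands. This is only a sketch under stated assumptions: GLOBAL (push a class object) and OBJ (instantiate from a class on the stack) are used here as stand-ins for the proposed CLASS and CONSTRUCT tags, which is an assumption of this example rather than something the report states. `fractions.Fraction` and the helper name `memoized_tuple_pickle` are likewise illustrative.

```python
import pickle
from fractions import Fraction

# GLOBAL pushes the class object, PUT 0 stores it in the memo,
# and POP removes it from the stack again.
LOOKUP_ONCE = b"cfractions\nFraction\np0\n0"

# MARK, GET 0 (fetch the class from the memo), two INT arguments,
# then OBJ, which instantiates the class found just above the mark.
ONE_OBJ = b"(g0\nI1\nI2\no"


def memoized_tuple_pickle(count):
    """Return a pickle of `count` instances that names the class only once."""
    # The outer MARK ... TUPLE collects the instances; "." is STOP.
    return LOOKUP_ONCE + b"(" + ONE_OBJ * count + b"t."


data = memoized_tuple_pickle(3)
values = pickle.loads(data)

# The class name appears a single time, however many instances follow.
name_count = data.count(b"Fraction")
```

Each instance now costs a short GET instead of a module and class name lookup, which is the saving the proposal is after.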

I believe no security issues are introduced: CONSTRUCT
would do the same checks as INST.

I estimate the total amount of new code to be under 100
lines of C and Python combined (assuming that the
pickle writing code is also changed to take advantage
of the new tags).

I'm willing to write and test the code, for both
pickle.py and cPickle.c, but I don't know how to submit
it for approval and release. If someone can tell me the
process, I don't think it would take me very long to do
the implementation.

TIA.


----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-19 16:48

Message:
Logged In: YES 
user_id=33168

Below are some links which describe the process.  You should
probably verify your approach is acceptable on python-dev.

http://www.python.org/dev/
http://www.python.org/dev/process.html
http://www.python.org/patches/

Good Luck!

----------------------------------------------------------------------
