[Python-Dev] pickling of large arrays
Ralf W. Grosse-Kunstleve
rwgk@yahoo.com
Thu, 20 Feb 2003 04:38:58 -0800 (PST)
This question is related to PEP 307, "Extensions to the pickle protocol",
http://www.python.org/peps/pep-0307.html .
Apparently the new Pickle "protocol 2" provides a mechanism for
avoiding large temporaries, but only for lists and dicts (section
"Pickling of large lists and dicts" near the end). I am wondering if
the new protocol could also help us to eliminate large temporaries when
pickling Boost.Python extension classes.
We wrote an open source C++ array library with Boost.Python bindings.
For pickling we use the __getstate__, __setstate__ protocol. As it
stands pickling involves converting the arrays to Python strings,
similar to what is done in Numpy. There are two mechanisms, both sketched
in code further below:
1. "single buffered":
For numeric types (int, long, double, etc.) a Python string is
allocated based on an upper estimate for the required size
(PyString_FromStringAndSize). The entire numeric array is converted
directly to that string. Finally the Python string is resized
(_PyString_Resize).
With this mechanism there are 2 copies of the array in memory:
- the original array and
- the Python string.
2. "double buffered":
For some user-defined element types it is very difficult to estimate
an upper limit for the size of the string representation. Therefore
the array is first converted to a dynamically growing C++
std::string, which is then copied to a Python string.
With this mechanism there are 3 copies of the array in memory:
- the original array,
- the std::string, and
- the Python string.
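To make the above concrete, here is a rough sketch of the "single
buffered" path. The function name and the 32-characters-per-value
estimate are only illustrative, not our actual library code; it assumes
the Python 2.x C string API and a plain double array:

    #include <Python.h>
    #include <cstdio>

    /* Illustrative only: serialize a double array into a single Python
       string, writing directly into the string's buffer. */
    static PyObject*
    doubles_as_pystring(double const* data, int n)
    {
      /* upper estimate: at most 32 formatted characters per value */
      int max_size = 32 * n;
      PyObject* result = PyString_FromStringAndSize(0, max_size);
      if (result == 0) return 0;
      char* buf = PyString_AS_STRING(result);
      int used = 0;
      for (int i = 0; i < n; i++) {
        /* format each value in place; only data and result are in memory */
        used += std::sprintf(buf + used, "%.17g,", data[i]);
      }
      /* shrink the Python string to the size actually used */
      if (_PyString_Resize(&result, used) != 0) return 0;
      return result;
    }

The point of the upper estimate is that the conversion can write straight
into the Python string's buffer, so no intermediate C++ buffer is needed.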
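And a corresponding sketch of the "double buffered" path, again with
illustrative names only; the element type is merely assumed to support
operator<<:

    #include <Python.h>
    #include <sstream>
    #include <string>

    /* Illustrative only: serialize elements whose formatted size cannot
       be bounded cheaply, via an intermediate std::string. */
    template <typename ElementType>
    PyObject*
    elements_as_pystring(ElementType const* data, std::size_t n)
    {
      std::string buffer;  /* second copy; grows dynamically */
      for (std::size_t i = 0; i < n; i++) {
        std::ostringstream os;
        os << data[i] << ',';  /* per-element size known only after formatting */
        buffer += os.str();
      }
      /* third copy: the Python string */
      return PyString_FromStringAndSize(buffer.data(),
                                        static_cast<int>(buffer.size()));
    }

In both sketches the original array stays alive for the whole conversion,
which is why the peak memory is two resp. three copies of the data.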
For very large arrays the memory overhead can be a limiting factor.
Could the new protocol 2 help us in some way?
Thank you in advance,
Ralf