[Python-ideas] o(n**2) problem with marshal.dumps for large objects, with patch
Aaron Watters
aaron.watters at gmail.com
Thu Jan 10 19:37:32 CET 2008
Hi folks. Much to my surprise I found that one of
my applications seemed to be running slow as a result
of marshal.dumps. I think the culprit is the w_more(...)
function, which grows the marshal buffer in 1k units.
This means that a marshal of size 100k will have 100
reallocations and string copies. Other parts of Python
(and java etc) have a proportional reallocation strategy
which reallocates a new size based on the existing size.
This mean a 100k marshal requires just 5 or so
reallocations and string copies (n log n versus n**2
asymptotic performance).
I humbly submit the following patch (based on python 2.6a0
source). I think it is a strict improvement on the existing
code, but I've been wrong before (twice ;)).
-- Aaron Watters
PATCH FOLLOWS
*** marshal.c.original 2008-01-10 10:15:40.686838800 -0500
--- marshal.c 2008-01-10 11:32:01.838654000 -0500
***************
*** 61,75 ****
static void
w_more(int c, WFILE *p)
{
Py_ssize_t size, newsize;
if (p->str == NULL)
return; /* An error already occurred */
size = PyString_Size(p->str);
! newsize = size + 1024;
if (_PyString_Resize(&p->str, newsize) != 0) {
p->ptr = p->end = NULL;
}
else {
p->ptr = PyString_AS_STRING((PyStringObject *)p->str) +
size;
p->end =
PyString_AS_STRING((PyStringObject *)p->str) +
newsize;
--- 61,76 ----
static void
w_more(int c, WFILE *p)
{
Py_ssize_t size, newsize;
if (p->str == NULL)
return; /* An error already occurred */
size = PyString_Size(p->str);
! newsize = size + size + 1024;
! /* printf("new size %d\n", newsize); */
if (_PyString_Resize(&p->str, newsize) != 0) {
p->ptr = p->end = NULL;
}
else {
p->ptr = PyString_AS_STRING((PyStringObject *)p->str) +
size;
p->end =
PyString_AS_STRING((PyStringObject *)p->str) +
newsize;
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20080110/9fca7a06/attachment.html>
More information about the Python-ideas
mailing list