o(n**2) problem with marshal.dumps for large objects, with patch

Hi folks. Much to my surprise I found that one of my applications seemed to be running slow as a result of marshal.dumps. I think the culprit is the w_more(...) function, which grows the marshal buffer in 1k units. This means that a marshal of size 100k will have 100 reallocations and string copies. Other parts of Python (and java etc) have a proportional reallocation strategy which reallocates a new size based on the existing size. This mean a 100k marshal requires just 5 or so reallocations and string copies (n log n versus n**2 asymptotic performance). I humbly submit the following patch (based on python 2.6a0 source). I think it is a strict improvement on the existing code, but I've been wrong before (twice ;)). -- Aaron Watters PATCH FOLLOWS *** marshal.c.original 2008-01-10 10:15:40.686838800 -0500 --- marshal.c 2008-01-10 11:32:01.838654000 -0500 *************** *** 61,75 **** static void w_more(int c, WFILE *p) { Py_ssize_t size, newsize; if (p->str == NULL) return; /* An error already occurred */ size = PyString_Size(p->str); ! newsize = size + 1024; if (_PyString_Resize(&p->str, newsize) != 0) { p->ptr = p->end = NULL; } else { p->ptr = PyString_AS_STRING((PyStringObject *)p->str) + size; p->end = PyString_AS_STRING((PyStringObject *)p->str) + newsize; --- 61,76 ---- static void w_more(int c, WFILE *p) { Py_ssize_t size, newsize; if (p->str == NULL) return; /* An error already occurred */ size = PyString_Size(p->str); ! newsize = size + size + 1024; ! /* printf("new size %d\n", newsize); */ if (_PyString_Resize(&p->str, newsize) != 0) { p->ptr = p->end = NULL; } else { p->ptr = PyString_AS_STRING((PyStringObject *)p->str) + size; p->end = PyString_AS_STRING((PyStringObject *)p->str) + newsize;

Aaron Watters wrote:
Could you please submit the patch at http://bugs.python.org/ ? Patches in mailing list posts have a tendency to get lost. An unified diff (diff -u) is preferred. Thanks Christian

2008/1/10, Christian Heimes <lists@cheimes.de>:
Please, include also an small example that shows this improvement, so it can actually be measured (some lines of Python code that takes long at first, but small time after the patch). Thank you!! -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/

Aaron Watters wrote:
Could you please submit the patch at http://bugs.python.org/ ? Patches in mailing list posts have a tendency to get lost. An unified diff (diff -u) is preferred. Thanks Christian

2008/1/10, Christian Heimes <lists@cheimes.de>:
Please, include also an small example that shows this improvement, so it can actually be measured (some lines of Python code that takes long at first, but small time after the patch). Thank you!! -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/
participants (3)
-
Aaron Watters
-
Christian Heimes
-
Facundo Batista