[Python-ideas] o(n**2) problem with marshal.dumps for large objects, with patch

Aaron Watters aaron.watters at gmail.com
Thu Jan 10 19:37:32 CET 2008


Hi  folks.  Much to my surprise I found that one of
my applications seemed to be running slow as a result
of marshal.dumps.  I think the culprit is the w_more(...)
function, which grows the marshal buffer in 1k units.
This means that a marshal of size 100k will have 100
reallocations and string copies.  Other parts of Python
(and java etc) have a proportional reallocation strategy
which reallocates a new size based on the existing size.
This mean a 100k marshal requires just 5 or so
reallocations and string copies (n log n versus n**2
asymptotic performance).

I humbly submit the following patch (based on python 2.6a0
source).  I think it is a strict improvement on the existing
code, but I've been wrong before (twice ;)).
   -- Aaron Watters

PATCH FOLLOWS

*** marshal.c.original  2008-01-10 10:15:40.686838800 -0500
--- marshal.c   2008-01-10 11:32:01.838654000 -0500
***************
*** 61,75 ****
  static void
  w_more(int c, WFILE *p)
  {
        Py_ssize_t size, newsize;
        if (p->str == NULL)
                return; /* An error already occurred */
        size = PyString_Size(p->str);
!       newsize = size + 1024;
        if (_PyString_Resize(&p->str, newsize) != 0) {
                p->ptr = p->end = NULL;
        }
        else {
                p->ptr = PyString_AS_STRING((PyStringObject *)p->str) +
size;
                p->end =
                        PyString_AS_STRING((PyStringObject *)p->str) +
newsize;
--- 61,76 ----
  static void
  w_more(int c, WFILE *p)
  {
        Py_ssize_t size, newsize;
        if (p->str == NULL)
                return; /* An error already occurred */
        size = PyString_Size(p->str);
!       newsize = size + size + 1024;
!       /* printf("new size %d\n", newsize); */
        if (_PyString_Resize(&p->str, newsize) != 0) {
                p->ptr = p->end = NULL;
        }
        else {
                p->ptr = PyString_AS_STRING((PyStringObject *)p->str) +
size;
                p->end =
                        PyString_AS_STRING((PyStringObject *)p->str) +
newsize;
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20080110/9fca7a06/attachment.html>


More information about the Python-ideas mailing list