[Python-Dev] collections module

Tim Peters tim.one at comcast.net
Mon Jan 12 19:46:58 EST 2004


[Guido]
> Just in case nobody had thought of it before, couldn't the realloc()
> call be avoided when roundupsize(oldsize) == roundupsize(newsize)???

Followup:  it's not quite that easy, because (at least) PyList_New(int size)
can create a list whose allocated size hasn't been rounded up.

If I "fix" that, then it's possible to get away with just one roundupsize()
call on most list.insert() calls (including list.append()), while avoiding
the realloc() call too.  Patch attached.

I timed it like so:

def punchit():
    from time import clock as now
    a = []
    push = a.append
    start = now()
    for i in xrange(1000000):
        push(i)
    finish = now()
    return finish - start

for i in range(10):
    print punchit()

and got these elapsed times (this is Windows, so this is elapsed wall-clock
time):

       before           after
-------------  --------------
1.05227710823  1.02203188119
1.05532442716  0.569660068053
1.05461539751  0.568627533147
1.0564449622   0.569336562799
1.05964146231  0.573247959235
1.05679528655  0.568503494862
1.05569402772  0.569745553898
1.05383177727  0.569499991619
1.05528000805  0.569163914916
1.05636618113  0.570356526258

So pretty consistent here, apart from the glitch on the first "after" run,
and about an 80% speedup.

Yawn <wink>.

If someone wants to run with this, be my guest.  There may be other holes
(c.f. PyList_New above), and it could sure use a comment about why it's
correct (if it in fact is <0.9 wink>).
-------------- next part --------------
Index: Objects/listobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/listobject.c,v
retrieving revision 2.175
diff -u -r2.175 listobject.c
--- Objects/listobject.c	4 Jan 2004 06:08:16 -0000	2.175
+++ Objects/listobject.c	13 Jan 2004 00:31:54 -0000
@@ -57,13 +57,15 @@
 {
 	PyListObject *op;
 	size_t nbytes;
+	int allocated_size = roundupsize(size);
+
 	if (size < 0) {
 		PyErr_BadInternalCall();
 		return NULL;
 	}
-	nbytes = size * sizeof(PyObject *);
+	nbytes = allocated_size * sizeof(PyObject *);
 	/* Check for overflow */
-	if (nbytes / sizeof(PyObject *) != (size_t)size) {
+	if (nbytes / sizeof(PyObject *) != (size_t)allocated_size) {
 		return PyErr_NoMemory();
 	}
 	op = PyObject_GC_New(PyListObject, &PyList_Type);
@@ -78,7 +80,7 @@
 		if (op->ob_item == NULL) {
 			return PyErr_NoMemory();
 		}
-		memset(op->ob_item, 0, sizeof(*op->ob_item) * size);
+		memset(op->ob_item, 0, sizeof(*op->ob_item) * allocated_size);
 	}
 	op->ob_size = size;
 	_PyObject_GC_TRACK(op);
@@ -154,7 +156,8 @@
 		return -1;
 	}
 	items = self->ob_item;
-	NRESIZE(items, PyObject *, self->ob_size+1);
+	if (roundupsize(self->ob_size) - 1 == self->ob_size || items == NULL)
+		NRESIZE(items, PyObject *, self->ob_size+1);
 	if (items == NULL) {
 		PyErr_NoMemory();
 		return -1;


More information about the Python-Dev mailing list