[Python-checkins] CVS: python/dist/src/Objects dictobject.c,2.90,2.91

Tim Peters tim_one@users.sourceforge.net
Tue, 22 May 2001 13:40:25 -0700


Update of /cvsroot/python/python/dist/src/Objects
In directory usw-pr-cvs1:/tmp/cvs-serv30669/python/dist/src/Objects

Modified Files:
	dictobject.c 
Log Message:
SF patch #425242:  Patch which "inlines" small dictionaries.
The idea is Marc-Andre Lemburg's, the implementation is Tim's.
Add a new ma_smalltable member to dictobjects, an embedded vector of
MINSIZE (8) dictentry structs.  Short course is that this lets us avoid
additional malloc(s) for dicts with no more than 5 entries.

The changes are widespread but mostly small.

Long course:  WRT speed, all scalar operations (getitem, setitem, delitem)
on non-empty dicts benefit from no longer needing NULL-pointer checks
(ma_table is never NULL anymore).  Bulk operations (copy, update, resize,
clearing slots during dealloc) benefit in some cases from now looping
on the ma_fill count rather than on ma_size, but that was an unexpected
benefit:  the original reason to loop on ma_fill was to let bulk
operations on empty dicts end quickly (since the NULL-pointer checks
went away, empty dicts aren't special-cased any more).

Special considerations:

For dicts that remain empty, this change is a lose on two counts:
the dict object contains 8 new dictentry slots now that weren't
needed before, and dict object creation also spends time memset'ing
these doomed-to-be-unused slots to NULLs.

For dicts with one or two entries that never get larger than 2, it's
a mix:  a malloc()/free() pair is no longer needed, and the 2-entry case
gets to use 8 slots (instead of 4) thus decreasing the chance of
collision.  Against that, dict object creation spends time memset'ing
4 slots that aren't strictly needed in this case.

For dicts with 3 through 5 entries that never get larger than 5, it's a
pure win:  the dict is created with all the space they need, and they
never need to resize.  Before, they suffered two malloc()/free() calls,
plus 1 dict resize, to get enough space.  In addition, the 8-slot
table they ended with consumed more memory overall, because of the
hidden overhead due to the additional malloc.

For dicts with 6 or more entries, the ma_smalltable member is wasted
space, but then these are large(r) dicts so 8 slots more or less doesn't
make much difference.  They still benefit all the time from removing
ubiquitous dynamic null-pointer checks, and get a small benefit (but
relatively smaller the larger the dict) from not having to do two
mallocs, two frees, and a resize on the way *to* getting their sixth
entry.

All in all it appears a small but definite general win, with larger
benefits in specific cases.  It's especially nice that it allowed us to
get rid of several branches, gotos and labels, and overall made the
code smaller.


Index: dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.90
retrieving revision 2.91
diff -C2 -r2.90 -r2.91
*** dictobject.c	2001/05/19 07:04:38	2.90
--- dictobject.c	2001/05/22 20:40:22	2.91
***************
*** 6,14 ****
  
  /*
!  * MINSIZE is the minimum size of a dictionary.
   */
  
- #define MINSIZE 4
- 
  /* define this out if you don't want conversion statistics on exit */
  #undef SHOW_CONVERSION_COUNTS
--- 6,16 ----
  
  /*
!  * MINSIZE is the minimum size of a dictionary.  This many slots are
!  * allocated directly in the dict object (in the ma_smalltable member).
!  * This must be a power of 2, and the first entry in the polys[] vector must
!  * match.
   */
+ #define MINSIZE 8
  
  /* define this out if you don't want conversion statistics on exit */
  #undef SHOW_CONVERSION_COUNTS
***************
*** 17,24 ****
  Table of irreducible polynomials to efficiently cycle through
  GF(2^n)-{0}, 2<=n<=30.  A table size is always a power of 2.
  */
  static long polys[] = {
! 	4 + 3,
! 	8 + 3,
  	16 + 3,
  	32 + 5,
--- 19,40 ----
  Table of irreducible polynomials to efficiently cycle through
  GF(2^n)-{0}, 2<=n<=30.  A table size is always a power of 2.
+ For a table size of 2**i, the polys entry is 2**i + j for some j in 1 thru
+ 2**i-1 inclusive.  The polys[] entries here happen to add in the smallest j
+ values "that work".  Work means this:  given any integer k in 1 thru 2**i-1
+ inclusive, a poly works if & only if repeating this code:
+ 	print k
+ 	k <<= 1
+ 	if k >= 2**i:
+ 		k ^= poly
+ prints every integer in 1 thru 2**i-1 inclusive exactly once before printing 
+ k a second time.  Theory can be used to find such polys efficiently, but the 
+ operational defn. of "works" is sufficient to find them in reasonable time 
+ via brute force program (hint:  any poly that has an even number of 1 bits 
+ cannot work; ditto any poly with low bit 0; exploit those).
  */
+ 
  static long polys[] = {
! /*	4 + 3, */	/* first active entry if MINSIZE == 4 */
! 	8 + 3,		/* first active entry if MINSIZE == 8 */
  	16 + 3,
  	32 + 5,
***************
*** 47,51 ****
  	268435456 + 9,
  	536870912 + 5,
! 	1073741824 + 83,
  };
  
--- 63,68 ----
  	268435456 + 9,
  	536870912 + 5,
! 	1073741824 + 83
! 	/* 2147483648 + 9 -- if we ever boost this to unsigned long */
  };
  
***************
*** 101,106 ****
--- 118,129 ----
  	int ma_size;  /* total # slots in ma_table */
  	int ma_poly;  /* appopriate entry from polys vector */
+ 	/* ma_table points to ma_smalltable for small tables, else to
+ 	 * additional malloc'ed memory.  ma_table is never NULL!  This rule
+ 	 * saves repeated runtime null-tests in the workhorse getitem and
+ 	 * setitem calls.
+ 	 */
  	dictentry *ma_table;
  	dictentry *(*ma_lookup)(dictobject *mp, PyObject *key, long hash);
+ 	dictentry ma_smalltable[MINSIZE];
  };
  
***************
*** 122,125 ****
--- 145,158 ----
  #endif
  
+ /* Set dictobject* mp to empty but w/ MINSIZE slots, using ma_smalltable. */
+ #define empty_to_minsize(mp) do { 					\
+ 	memset((mp)->ma_smalltable, 0, sizeof((mp)->ma_smalltable));	\
+ 	(mp)->ma_table = (mp)->ma_smalltable;				\
+ 	(mp)->ma_size = MINSIZE;					\
+ 	(mp)->ma_used = (mp)->ma_fill = 0;				\
+ 	(mp)->ma_poly = polys[0];					\
+ 	assert(MINSIZE < (mp)->ma_poly && (mp)->ma_poly < MINSIZE*2);	\
+     } while(0)
+ 
  PyObject *
  PyDict_New(void)
***************
*** 137,145 ****
  	if (mp == NULL)
  		return NULL;
! 	mp->ma_size = 0;
! 	mp->ma_poly = 0;
! 	mp->ma_table = NULL;
! 	mp->ma_fill = 0;
! 	mp->ma_used = 0;
  	mp->ma_lookup = lookdict_string;
  #ifdef SHOW_CONVERSION_COUNTS
--- 170,174 ----
  	if (mp == NULL)
  		return NULL;
! 	empty_to_minsize(mp);
  	mp->ma_lookup = lookdict_string;
  #ifdef SHOW_CONVERSION_COUNTS
***************
*** 321,325 ****
  			&& compare(ep->me_key, key) == 0))
  			return ep;
! 		else if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
  		/* Cycle through GF(2^n)-{0} */
--- 350,354 ----
  			&& compare(ep->me_key, key) == 0))
  			return ep;
! 		if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
  		/* Cycle through GF(2^n)-{0} */
***************
*** 375,408 ****
  
  	assert(minused >= 0);
! 	for (i = 0, newsize = MINSIZE; ; i++, newsize <<= 1) {
! 		if (i >= sizeof(polys)/sizeof(polys[0])) {
! 			/* Ran out of polynomials */
! 			PyErr_NoMemory();
! 			return -1;
! 		}
  		if (newsize > minused) {
  			newpoly = polys[i];
  			break;
  		}
  	}
! 	newtable = PyMem_NEW(dictentry, newsize);
! 	if (newtable == NULL) {
  		PyErr_NoMemory();
  		return -1;
  	}
! 	memset(newtable, '\0', sizeof(dictentry) * newsize);
  	mp->ma_size = newsize;
  	mp->ma_poly = newpoly;
- 	mp->ma_table = newtable;
- 	mp->ma_fill = 0;
  	mp->ma_used = 0;
  
  	/* Copy the data over; this is refcount-neutral for active entries;
  	   dummy entries aren't copied over, of course */
! 	for (i = 0, ep = oldtable; i < oldsize; i++, ep++) {
! 		if (ep->me_value != NULL)	/* active entry */
  			insertdict(mp, ep->me_key, ep->me_hash, ep->me_value);
! 
  		else if (ep->me_key != NULL) {	/* dummy entry */
  			assert(ep->me_key == dummy);
  			Py_DECREF(ep->me_key);
--- 404,454 ----
  
  	assert(minused >= 0);
! 	assert(oldtable != NULL);
! 	newpoly = 0;
! 	newsize = MINSIZE;
! 	for (i = 0; i < sizeof(polys)/sizeof(polys[0]); ++i) {
  		if (newsize > minused) {
  			newpoly = polys[i];
  			break;
  		}
+ 		newsize <<= 1;
+ 		if (newsize < 0)   /* overflow */
+ 			break;
  	}
! 	if (newpoly == 0) {
! 		/* Ran out of polynomials or newsize overflowed. */
  		PyErr_NoMemory();
  		return -1;
+ 	}
+ 	if (newsize == MINSIZE) {
+ 		newtable = mp->ma_smalltable;
+ 		if (newtable == oldtable)
+ 			return 0;
+ 	}
+ 	else {
+ 		newtable = PyMem_NEW(dictentry, newsize);
+ 		if (newtable == NULL) {
+ 			PyErr_NoMemory();
+ 			return -1;
+ 		}
  	}
! 	assert(newtable != oldtable);
! 	mp->ma_table = newtable;
  	mp->ma_size = newsize;
+ 	memset(newtable, 0, sizeof(dictentry) * newsize);
  	mp->ma_poly = newpoly;
  	mp->ma_used = 0;
+ 	i = mp->ma_fill;
+ 	mp->ma_fill = 0;
  
  	/* Copy the data over; this is refcount-neutral for active entries;
  	   dummy entries aren't copied over, of course */
! 	for (ep = oldtable; i > 0; ep++) {
! 		if (ep->me_value != NULL) {	/* active entry */
! 			--i;
  			insertdict(mp, ep->me_key, ep->me_hash, ep->me_value);
! 		}
  		else if (ep->me_key != NULL) {	/* dummy entry */
+ 			--i;
  			assert(ep->me_key == dummy);
  			Py_DECREF(ep->me_key);
***************
*** 411,415 ****
  	}
  
! 	if (oldtable != NULL)
  		PyMem_DEL(oldtable);
  	return 0;
--- 457,461 ----
  	}
  
! 	if (oldtable != mp->ma_smalltable)
  		PyMem_DEL(oldtable);
  	return 0;
***************
*** 424,429 ****
  		return NULL;
  	}
- 	if (mp->ma_table == NULL)
- 		return NULL;
  #ifdef CACHE_HASH
  	if (!PyString_Check(key) ||
--- 470,473 ----
***************
*** 480,493 ****
  			return -1;
  	}
! 	if (mp->ma_fill >= mp->ma_size) {
! 		/* No room for a new key.
! 		 * This only happens when the dict is empty.
! 		 * Let dictresize() create a minimal dict.
! 		 */
! 		assert(mp->ma_used == 0);
! 		if (dictresize(mp, 0) != 0)
! 			return -1;
! 		assert(mp->ma_fill < mp->ma_size);
! 	}
  	n_used = mp->ma_used;
  	Py_INCREF(value);
--- 524,528 ----
  			return -1;
  	}
! 	assert(mp->ma_fill < mp->ma_size);
  	n_used = mp->ma_used;
  	Py_INCREF(value);
***************
*** 529,537 ****
  	}
  	mp = (dictobject *)op;
- 	if (((dictobject *)op)->ma_table == NULL)
- 		goto empty;
  	ep = (mp->ma_lookup)(mp, key, hash);
  	if (ep->me_value == NULL) {
- 	empty:
  		PyErr_SetObject(PyExc_KeyError, key);
  		return -1;
--- 564,569 ----
***************
*** 551,571 ****
  PyDict_Clear(PyObject *op)
  {
- 	int i, n;
- 	register dictentry *table;
  	dictobject *mp;
  	if (!PyDict_Check(op))
  		return;
  	mp = (dictobject *)op;
! 	table = mp->ma_table;
! 	if (table == NULL)
! 		return;
  	n = mp->ma_size;
! 	mp->ma_size = mp->ma_used = mp->ma_fill = 0;
! 	mp->ma_table = NULL;
! 	for (i = 0; i < n; i++) {
! 		Py_XDECREF(table[i].me_key);
! 		Py_XDECREF(table[i].me_value);
  	}
! 	PyMem_DEL(table);
  }
  
--- 583,650 ----
  PyDict_Clear(PyObject *op)
  {
  	dictobject *mp;
+ 	dictentry *ep, *table;
+ 	int table_is_malloced;
+ 	int fill;
+ 	dictentry small_copy[MINSIZE];
+ #ifdef Py_DEBUG
+ 	int i, n;
+ #endif
+ 
  	if (!PyDict_Check(op))
  		return;
  	mp = (dictobject *)op;
! #ifdef Py_DEBUG
  	n = mp->ma_size;
! 	i = 0;
! #endif
! 
! 	table = mp->ma_table;
! 	assert(table != NULL);
! 	table_is_malloced = table != mp->ma_smalltable;
! 
! 	/* This is delicate.  During the process of clearing the dict,
! 	 * decrefs can cause the dict to mutate.  To avoid fatal confusion
! 	 * (voice of experience), we have to make the dict empty before
! 	 * clearing the slots, and never refer to anything via mp->xxx while
! 	 * clearing.
! 	 */
! 	fill = mp->ma_fill;
! 	if (table_is_malloced)
! 		empty_to_minsize(mp);
! 
! 	else if (fill > 0) {
! 		/* It's a small table with something that needs to be cleared.
! 		 * Afraid the only safe way is to copy the dict entries into
! 		 * another small table first.
! 		 */
! 		memcpy(small_copy, table, sizeof(small_copy));
! 		table = small_copy;
! 		empty_to_minsize(mp);
! 	}
! 	/* else it's a small table that's already empty */
! 
! 	/* Now we can finally clear things.  If C had refcounts, we could
! 	 * assert that the refcount on table is 1 now, i.e. that this function
! 	 * has unique access to it, so decref side-effects can't alter it.
! 	 */
! 	for (ep = table; fill > 0; ++ep) {
! #ifdef Py_DEBUG
! 		assert(i < n);
! 		++i;
! #endif
! 		if (ep->me_key) {
! 			--fill;
! 			Py_DECREF(ep->me_key);
! 			Py_XDECREF(ep->me_value);
! 		}
! #ifdef Py_DEBUG
! 		else
! 			assert(ep->me_value == NULL);
! #endif
  	}
! 
! 	if (table_is_malloced)
! 		PyMem_DEL(table);
  }
  
***************
*** 603,619 ****
  dict_dealloc(register dictobject *mp)
  {
- 	register int i;
  	register dictentry *ep;
  	Py_TRASHCAN_SAFE_BEGIN(mp)
  	PyObject_GC_Fini(mp);
! 	for (i = 0, ep = mp->ma_table; i < mp->ma_size; i++, ep++) {
! 		if (ep->me_key != NULL) {
  			Py_DECREF(ep->me_key);
  		}
- 		if (ep->me_value != NULL) {
- 			Py_DECREF(ep->me_value);
- 		}
  	}
! 	if (mp->ma_table != NULL)
  		PyMem_DEL(mp->ma_table);
  	mp = (dictobject *) PyObject_AS_GC(mp);
--- 682,697 ----
  dict_dealloc(register dictobject *mp)
  {
  	register dictentry *ep;
+ 	int fill = mp->ma_fill;
  	Py_TRASHCAN_SAFE_BEGIN(mp)
  	PyObject_GC_Fini(mp);
! 	for (ep = mp->ma_table; fill > 0; ep++) {
! 		if (ep->me_key) {
! 			--fill;
  			Py_DECREF(ep->me_key);
+ 			Py_XDECREF(ep->me_value);
  		}
  	}
! 	if (mp->ma_table != mp->ma_smalltable)
  		PyMem_DEL(mp->ma_table);
  	mp = (dictobject *) PyObject_AS_GC(mp);
***************
*** 706,713 ****
  	PyObject *v;
  	long hash;
! 	if (mp->ma_table == NULL) {
! 		PyErr_SetObject(PyExc_KeyError, key);
! 		return NULL;
! 	}
  #ifdef CACHE_HASH
  	if (!PyString_Check(key) ||
--- 784,788 ----
  	PyObject *v;
  	long hash;
! 	assert(mp->ma_table != NULL);
  #ifdef CACHE_HASH
  	if (!PyString_Check(key) ||
***************
*** 1169,1174 ****
  			return NULL;
  	}
! 	ok = (mp->ma_size != 0
! 	      && (mp->ma_lookup)(mp, key, hash)->me_value != NULL);
  	return PyInt_FromLong(ok);
  }
--- 1244,1248 ----
  			return NULL;
  	}
! 	ok = (mp->ma_lookup)(mp, key, hash)->me_value != NULL;
  	return PyInt_FromLong(ok);
  }
***************
*** 1184,1189 ****
  	if (!PyArg_ParseTuple(args, "O|O:get", &key, &failobj))
  		return NULL;
- 	if (mp->ma_table == NULL)
- 		goto finally;
  
  #ifdef CACHE_HASH
--- 1258,1261 ----
***************
*** 1198,1202 ****
  	val = (mp->ma_lookup)(mp, key, hash)->me_value;
  
-   finally:
  	if (val == NULL)
  		val = failobj;
--- 1270,1273 ----
***************
*** 1216,1221 ****
  	if (!PyArg_ParseTuple(args, "O|O:setdefault", &key, &failobj))
  		return NULL;
- 	if (mp->ma_table == NULL)
- 		goto finally;
  
  #ifdef CACHE_HASH
--- 1287,1290 ----
***************
*** 1229,1234 ****
  	}
  	val = (mp->ma_lookup)(mp, key, hash)->me_value;
- 
-   finally:
  	if (val == NULL) {
  		val = failobj;
--- 1298,1301 ----
***************
*** 1284,1293 ****
  	if (ep->me_value == NULL) {
  		i = (int)ep->me_hash;
! 		/* The hash field may be uninitialized trash, or it
! 		 * may be a real hash value, or it may be a legit
! 		 * search finger, or it may be a once-legit search
! 		 * finger that's out of bounds now because it
! 		 * wrapped around or the table shrunk -- simply
! 		 * make sure it's in bounds now.
  		 */
  		if (i >= mp->ma_size || i < 1)
--- 1351,1358 ----
  	if (ep->me_value == NULL) {
  		i = (int)ep->me_hash;
! 		/* The hash field may be a real hash value, or it may be a
! 		 * legit search finger, or it may be a once-legit search
! 		 * finger that's out of bounds now because it wrapped around
! 		 * or the table shrunk -- simply make sure it's in bounds now.
  		 */
  		if (i >= mp->ma_size || i < 1)
***************
*** 1481,1486 ****
  			return -1;
  	}
! 	return (mp->ma_size != 0
! 		&& (mp->ma_lookup)(mp, key, hash)->me_value != NULL);
  }
  
--- 1546,1550 ----
  			return -1;
  	}
! 	return (mp->ma_lookup)(mp, key, hash)->me_value != NULL;
  }