[Python-Dev] pymalloc and overallocation (unicodeobject.c,2.139,2.140 checkin)

Sat, 27 Apr 2002 01:01:03 -0400

[martin@v.loewis.de]
> That's what I mean (I'm *really* confused about memory family APIs,
> ever since everything changed :-)

Here's the in-depth course:

    PyMem_xyz calls the platform malloc/realloc/free (fiddled for
        cross-platform uniformity in how NULL pointers and zero-byte
        requests are handled)

    PyObject_xyz calls pymalloc's malloc/realloc/free

and instead of a dozen layers of indirection we've now got crushingly
straightforward WYSIWYG preprocessor blocks like:

#ifdef WITH_PYMALLOC
#ifdef PYMALLOC_DEBUG
#define PyObject_MALLOC		_PyObject_DebugMalloc
#define PyObject_Malloc		_PyObject_DebugMalloc
#define PyObject_REALLOC	_PyObject_DebugRealloc
#define PyObject_Realloc	_PyObject_DebugRealloc
#define PyObject_FREE		_PyObject_DebugFree
#define PyObject_Free		_PyObject_DebugFree

#else	/* WITH_PYMALLOC && ! PYMALLOC_DEBUG */
#define PyObject_MALLOC		PyObject_Malloc
#define PyObject_REALLOC	PyObject_Realloc
#define PyObject_FREE		PyObject_Free
#endif

#else	/* ! WITH_PYMALLOC */
#define PyObject_MALLOC		PyMem_MALLOC
#define PyObject_REALLOC	PyMem_REALLOC
#define PyObject_FREE		PyMem_FREE
#endif	/* WITH_PYMALLOC */

#define PyObject_Del PyObject_Free
#define PyObject_DEL PyObject_FREE

/* for source compatibility with 2.2 */
#define _PyObject_Del PyObject_Free

All the names you love are still there, it's just that most of them are
redundant now <wink>.
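
To see how little is left once the preprocessor is done, here's a toy
version of the same selection logic.  Everything with a toy_/TOY_ prefix is
made up for illustration (none of it is in the real headers), and the three
"allocators" just record which one was reached before deferring to the C
library:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the three possible backends.  Each records
   that it was called and then defers to the platform malloc. */
enum toy_backend { TOY_NONE, TOY_PLATFORM, TOY_PYMALLOC, TOY_DEBUG };
enum toy_backend toy_last = TOY_NONE;

void *toy_platform_malloc(size_t n) { toy_last = TOY_PLATFORM; return malloc(n); }
void *toy_pymalloc_malloc(size_t n) { toy_last = TOY_PYMALLOC; return malloc(n); }
void *toy_debug_malloc(size_t n)    { toy_last = TOY_DEBUG;    return malloc(n); }

/* Same shape as the real block: exactly one definition survives,
   depending on which symbols are defined at compile time. */
#ifdef TOY_WITH_PYMALLOC
#ifdef TOY_PYMALLOC_DEBUG
#define TOY_OBJECT_MALLOC   toy_debug_malloc
#else
#define TOY_OBJECT_MALLOC   toy_pymalloc_malloc
#endif
#else   /* ! TOY_WITH_PYMALLOC */
#define TOY_OBJECT_MALLOC   toy_platform_malloc
#endif

/* Allocate through the macro and report which backend it resolved to. */
enum toy_backend toy_which_backend(void)
{
    void *p = TOY_OBJECT_MALLOC(16);
    assert(p != NULL);      /* toy code: treat failure as a bug */
    free(p);                /* every toy backend used malloc() */
    return toy_last;
}
```

Compiled with no -D flags this lands in the platform backend; adding
-DTOY_WITH_PYMALLOC, or that plus -DTOY_PYMALLOC_DEBUG, switches it to the
other two.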

> ...
> I do think that the Unicode data should be managed by pymalloc as
> well.

Well, that largely depends on how big these suckers are.  Calling
PyObject_XYZ adds real overhead if pymalloc can't handle the requested size:
you pay all the overhead of the system routines, plus the overhead of
pymalloc figuring out it can't handle the request.  I expect it's also not
good to mix pymalloc with custom free lists:  hold on to one object from a
pymalloc pool, and it prevents the entire pool from getting recycled for
another size class.  So
if you want to investigate using pymalloc more heavily for Unicode objects,
I suggest two things:

1. Get rid of the Unicode-specific free list.

2. Change the object layout to embed the str member storage, just as
   PyStringObject does.

#1 is pretty localized, but #2 would require changing a lot of code.
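
For #2, a rough sketch of the two layouts, using made-up toy_ names rather
than the real Unicode object (toy_unichar stands in for the character type,
and toy_alloc is only a counting wrapper around malloc).  The point is that
the split layout costs two allocator calls per object and the embedded one
costs one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef unsigned short toy_unichar;   /* hypothetical character type */

int toy_alloc_count = 0;              /* allocator calls made so far */
void *toy_alloc(size_t n) { toy_alloc_count++; return malloc(n); }

/* Layout A: header plus a separately allocated buffer behind a pointer. */
typedef struct {
    size_t length;
    toy_unichar *str;
} toy_split_obj;

/* Layout B: the buffer is embedded at the end of the header (C99
   flexible array member), in the style of PyStringObject. */
typedef struct {
    size_t length;
    toy_unichar str[];
} toy_embedded_obj;

/* Toy code: an absurd length is treated as a programming error. */
static void toy_check_length(size_t length)
{
    assert(length < (SIZE_MAX - sizeof(toy_embedded_obj)) / sizeof(toy_unichar));
}

toy_split_obj *toy_split_new(size_t length)
{
    toy_split_obj *op;
    toy_check_length(length);
    op = toy_alloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->str = toy_alloc((length + 1) * sizeof(toy_unichar));
    if (op->str == NULL) {
        free(op);
        return NULL;
    }
    op->length = length;
    op->str[length] = 0;              /* trailing terminator */
    return op;
}

toy_embedded_obj *toy_embedded_new(size_t length)
{
    toy_embedded_obj *op;
    toy_check_length(length);
    op = toy_alloc(sizeof *op + (length + 1) * sizeof(toy_unichar));
    if (op == NULL)
        return NULL;
    op->length = length;
    op->str[length] = 0;              /* trailing terminator */
    return op;
}
```

One consequence of the embedded form:  growing the data means reallocating
the whole object, so the object's address can change, and every caller that
resizes has to be prepared for that.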

> Of course, DecodeUTF8 would then raise the same issue: decoding
> UTF-8 doesn't know how many characters you'll get, either. This
> currently does not try to be clever, but allocates enough memory for
> the worst case.

I just put a patch up on SourceForge that's *less* clever, but shouldn't
waste any memory in the end.  I expect you'll be happy with it, or rant
inconsolably.  It's all the same to me <wink>.
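
For reference, the worst-case strategy described in the quoted text looks
roughly like the sketch below.  This is not the SourceForge patch and not
the real decoder:  the toy_ names are invented, and the decoding is
simplified (overlong forms and surrogates aren't rejected).  It allocates
one slot per input byte, which can never be too few, then hands the unused
tail back with realloc:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int toy_codepoint;   /* hypothetical code point type */

/* Decode `size` bytes of UTF-8 into a freshly allocated array.  On
   success, returns the array and stores the code point count in
   *out_len.  Returns NULL on malformed input or allocation failure. */
toy_codepoint *toy_decode_utf8(const unsigned char *s, size_t size,
                               size_t *out_len)
{
    toy_codepoint *buf, *shrunk;
    size_t i = 0, n = 0;

    assert(out_len != NULL && (s != NULL || size == 0));

    /* Worst case: every input byte is a one-byte character. */
    buf = malloc((size ? size : 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;

    while (i < size) {
        unsigned char c = s[i];
        size_t extra;                 /* continuation bytes expected */
        toy_codepoint cp;

        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else { free(buf); return NULL; }          /* bad lead byte */

        if (size - i - 1 < extra) { free(buf); return NULL; }  /* truncated */
        i++;
        while (extra-- > 0) {
            if ((s[i] & 0xC0) != 0x80) { free(buf); return NULL; }
            cp = (cp << 6) | (s[i] & 0x3F);
            i++;
        }
        buf[n++] = cp;                /* n never exceeds size */
    }

    /* Give back the unused tail; keep the big block if realloc fails. */
    shrunk = realloc(buf, (n ? n : 1) * sizeof *buf);
    if (shrunk != NULL)
        buf = shrunk;
    *out_len = n;
    return buf;
}
```

Whether the final realloc actually returns memory to anyone depends on the
allocator underneath, which is the crux of the overallocation question.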