[Python-Dev] stackable ints [stupid idea (ignore) :v]

Tim Peters tim_one at email.msn.com
Sat Jun 12 23:37:08 CEST 1999


[Aaron Watters]
> ...
> I thought it would be good to be able to do the following loop
> with Numeric arrays
>
>     for x in array1:
>          array2[x] = array3[x] + array4[x]
>
> without any memory management being involved.  Right now, I think the
> for loop has to continually dynamically allocate each new x

Actually not: it just binds x to each of the PyObject*'s already in
array1, one at a time.  It does bump & drop the refcount on that object a
lot.  Also irksome is that it keeps allocating/deallocating a little integer
on each trip, for the under-the-covers loop index!  Marc-Andre (I think)
had/has a patch to worm around that, but IIRC it didn't make much difference
(wouldn't expect it to, though -- not if the loop body does any real work).
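
To make that concrete, here's a rough Python-level sketch of what the
1.5-era for loop does under the covers (the function name is made up;
the real work happens in C, inside ceval.c):

    def old_style_for_loop(array1, array2, array3, array4):
        # The old sequence protocol: ask for element 0, 1, 2, ... until
        # IndexError.  x is merely re-bound to objects already sitting
        # in array1; nothing new is allocated for x itself.
        i = 0
        while 1:
            try:
                x = array1[i]    # re-binds x; bumps that element's refcount
            except IndexError:
                break
            array2[x] = array3[x] + array4[x]
            i = i + 1            # the under-the-covers loop index mentioned
                                 # above, rebuilt on every trip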

One thing a smarter Python compiler could do is notice the obvious <snort>:
the *internal* incref/decref operations on the object denoted by x in the
loop above must cancel out, so there's no need to do any of them.
"internal" == those due to the routine actions of the PVM itself, while
pushing and popping the eval stack.  Exploiting that is tedious; e.g.,
inventing a pile of opcode variants that do the same thing as today's except
skip an incref here and a decref there.
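
For the curious, the dis module shows where that internal refcount
traffic comes from; every time the PVM loads x onto the eval stack it
takes a new reference, and every opcode that consumes it drops one.
The exact opcodes have changed since 1999, but the push/pop shape is
the same:

    import dis

    def body(array1, array2, array3, array4):
        for x in array1:
            array2[x] = array3[x] + array4[x]

    # Each load of x in the disassembly is an incref; each opcode that
    # consumes it is a matching decref -- the pairs that must cancel out.
    dis.dis(body)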

> and an intermediate sum (and immediately deallocate them)

The intermediate sum is allocated each time, but not deallocated (the
pre-existing object at array2[x] *may* be deallocated, though).

> and that makes the loop piteously slow.

A lot of things conspire to make it slow.  David is certainly right that, in
this particular case, array2[array1] = array3[array1] + etc worms around the
worst of them.
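
For reference, the vectorized spelling looks like this (written with
modern NumPy purely for illustration; 1999's Numeric spelled parts of
it differently).  The indexing, the add, and the store all run in C
over whole arrays, so the per-element trips through the eval loop
disappear:

    import numpy as np    # standing in for the Numeric module here

    array1 = np.array([0, 2, 1])              # integer index array
    array2 = np.zeros(3)
    array3 = np.array([1.0, 2.0, 3.0])
    array4 = np.array([10.0, 20.0, 30.0])

    # One statement, no Python-level loop over elements:
    array2[array1] = array3[array1] + array4[array1]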

> The idea of replacing PyObject *'s with a struct [typedescr *, data *]
> was a space/time tradeoff to speed up operations like the above
> by eliminating any need for mallocs or other memory management.

Fleshing out details may make it look less attractive.  For machines where
ints are no wider than pointers, the "data *" can be replaced with the int
directly and then there's real potential.  If for a float the "data*" really
*is* a pointer, though, what does it point *at*?  Some dynamically allocated
memory to hold the float appears to be the only answer, and you're right
back at the problem you were hoping to avoid.

Make the "data*" field big enough to hold a Python float directly, and the
descriptor likely zooms to 128 bits (assuming float is IEEE double and the
machine requires natural alignment).
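
To see where the 128 bits comes from, here's the layout modeled with
ctypes (the field names are invented; the point is only the size
arithmetic -- a type pointer next to a variant field wide enough to
hold an IEEE double in-line, padded to natural alignment):

    import ctypes

    class Data(ctypes.Union):             # the "variant half"
        _fields_ = [("as_int", ctypes.c_long),
                    ("as_double", ctypes.c_double),
                    ("as_ptr", ctypes.c_void_p)]

    class Descriptor(ctypes.Structure):
        _fields_ = [("typedescr", ctypes.c_void_p),
                    ("data", Data)]

    # 16 bytes (128 bits) wherever doubles are 8 bytes and the struct
    # gets padded to natural alignment.
    print(ctypes.sizeof(Descriptor))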

Let's say we do that.  Where does the "+" implementation get the 16 bytes it
needs to store its result?  The space presumably already exists in the
array2[x] slot, but the "+" implementation has no way to *know* that.
Figuring it out requires non-local analysis, which is quite a few steps
beyond what Python's compiler can do today.  Easiest:  internal functions
all grow a new PyDescriptor* argument into which they are to write their
result's descriptor.  The PVM passes "+" the address of the array2[x] slot
if it's smart enough; or, if it's not, the address of the stack slot
descriptor into which today's PVM *would* push the result.  In the latter
case the PVM would need to copy those 16 bytes into the array2[x] slot
later.

Neither of those is as simple as it sounds, though, at least because if
array2[x] holds a descriptor with a real pointer in its variant half, the
thing to which it points needs to get decref'ed iff the add succeeds.  It
can get very messy!
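
A toy Python model of that calling convention, and of the ordering
constraint, might look like the following.  Every name here is
invented: descriptors are faked as (typedescr, value) tuples, and
"release" stands in for the decref of whatever heap thing the old
occupant's variant half pointed at:

    def add_into(left, right, slot, index, release):
        # Write the sum's descriptor straight into slot[index] instead
        # of returning a freshly allocated result object.
        old = slot[index]
        result = ("float", left[1] + right[1])   # the step that can fail
        slot[index] = result                     # reuse the existing slot
        release(old)         # drop the old payload only after the add
                             # is known to have succeeded

    array2 = [("float", 0.0)] * 3
    add_into(("float", 1.0), ("float", 2.0), array2, 1,
             release=lambda d: None)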

> I really can't say whether it'd be worth it or not without some sort of
> real testing.  Just a thought.

It's a good thought!  Just hard to make real.

but-if-michael-hudson-keeps-hacking-at-bytecodes-and-christian-
    keeps-trying-to-prove-he's-crazier-than-michael-by-2001-
    we'll-be-able-to-generate-optimized-vector-assembler-for-
    it<wink>-ly y'rs  - tim





