RE: [Python-Dev] stackable ints [stupid idea (ignore) :v]
From: "Tim Peters" <tim_one@email.msn.com>
Jumping in to opine that mixing tag/type bits with native pointers is a Really Bad Idea. Put the bits on the low end and word-addressed machines are screwed. Put the bits on the high end and you've made severe assumptions about how the platform parcels out address space. In any case you're stuck with ugly macros everywhere.
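Tim's objection can be made concrete with a toy model of low-bit tagging (pure Python standing in for the bit manipulation; all names here are invented for illustration). The scheme only works if every real pointer keeps its low bits clear, i.e. the platform byte-addresses memory and aligns objects — exactly the assumption being objected to:

```python
# Toy model of low-bit pointer tagging: steal the low 2 bits of a word.
# Works only if real pointers are 4-byte aligned (low bits always 00).
TAG_MASK = 0b11
TAG_PTR  = 0b00   # an aligned address, usable as-is
TAG_INT  = 0b01   # a small integer, stored shifted left

def box_int(v):
    """Pack a small int into a tagged word."""
    return (v << 2) | TAG_INT

def is_int(word):
    return word & TAG_MASK == TAG_INT

def unbox_int(word):
    assert is_int(word)
    return word >> 2

word = box_int(42)
assert is_int(word) and unbox_int(word) == 42

# On a word-addressed machine, a "pointer" can be an odd word number
# like 7, which this scheme would misclassify as a tagged non-pointer:
assert (7 & TAG_MASK) != TAG_PTR
```

Every load and store then has to mask and shift through macros like these, which is the "ugly macros everywhere" cost even on platforms where the alignment assumption holds.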
Agreed. Never, ever mess with pointers. This mistake has been made over and over again by each new generation of computer hardware and software, and it's still a mistake.

I thought it would be good to be able to do the following loop with Numeric arrays

    for x in array1:
        array2[x] = array3[x] + array4[x]

without any memory management being involved. Right now, I think the for loop has to continually dynamically allocate each new x and intermediate sum (and immediately deallocate them), and that makes the loop piteously slow.

The idea of replacing PyObject *'s with a struct [typedescr *, data *] was a space/time tradeoff to speed up operations like the above by eliminating any need for mallocs or other memory management. I really can't say whether it'd be worth it or not without some sort of real testing. Just a thought.

-- Aaron Watters
On Fri, 11 Jun 1999, Aaron Watters wrote:

> I thought it would be good to be able to do the following loop with
> Numeric arrays
>
>     for x in array1:
>         array2[x] = array3[x] + array4[x]
>
> without any memory management being involved. Right now, I think the
FYI, I think it should be done by writing

    array2[array1] = array3[array1] + array4[array1]

and doing "the right thing" in NumPy. In other words, I don't think the core needs to be involved.

--david

PS: I'm in the process of making the NumPy array objects ExtensionClasses, which will make the above much easier to do.
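David's spelling relies on NumPy's "fancy" indexing, where an array of indices selects many elements at once so the whole operation runs below the Python loop. A minimal sketch of those semantics using plain Python lists (no NumPy dependency; the `take`/`put` helper names are invented for illustration, echoing NumPy's functions of the same names):

```python
def take(arr, idx):
    """Emulate array3[array1]: gather the element at each index."""
    return [arr[i] for i in idx]

def put(arr, idx, values):
    """Emulate array2[array1] = ...: scatter values back to the indices."""
    for i, v in zip(idx, values):
        arr[i] = v

array1 = [0, 2, 3]              # the index array
array3 = [10, 20, 30, 40]
array4 = [1, 2, 3, 4]
array2 = [0, 0, 0, 0]

# array2[array1] = array3[array1] + array4[array1]
sums = [a + b for a, b in zip(take(array3, array1), take(array4, array1))]
put(array2, array1, sums)
assert array2 == [11, 0, 33, 44]
```

In real NumPy the gather, add, and scatter each run as single C-level loops over the whole index array, which is why this spelling sidesteps the per-element allocation cost Aaron is worried about.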
David Ascher wrote:

> On Fri, 11 Jun 1999, Aaron Watters wrote:
>
>> I thought it would be good to be able to do the following loop with
>> Numeric arrays
>>
>>     for x in array1:
>>         array2[x] = array3[x] + array4[x]
>>
>> without any memory management being involved. Right now, I think the
>
> FYI, I think it should be done by writing
>
>     array2[array1] = array3[array1] + array4[array1]
>
> and doing "the right thing" in NumPy. In other words, I don't think the
> core needs to be involved.
For NumPy this is fine: arrays dealing with arrays, in an array world. Without trying to repeat myself, I'd like to say that I still consider it an unsolved problem, one worth solving or worth proving unsolvable: how to do simple things efficiently with many tiny Python objects, without writing an extension, without rethinking the problem into an APL-like style, and without changing the language.

ciao - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
[Aaron Watters]
> ... I thought it would be good to be able to do the following loop with
> Numeric arrays
>
>     for x in array1:
>         array2[x] = array3[x] + array4[x]
>
> without any memory management being involved. Right now, I think the
> for loop has to continually dynamically allocate each new x
Actually not: it just binds x to the sequence of PyObject *'s already in array1, one at a time. It does bump & drop the refcount on that object a lot. Also irksome is that it keeps allocating/deallocating a little integer on each trip, for the under-the-covers loop index! Marc-Andre (I think) had/has a patch to worm around that, but IIRC it didn't make much difference (wouldn't expect it to, though -- not if the loop body does any real work).

One thing a smarter Python compiler could do is notice the obvious <snort>: the *internal* incref/decref operations on the object denoted by x in the loop above must cancel out, so there's no need to do any of them. "internal" == those due to the routine actions of the PVM itself, while pushing and popping the eval stack. Exploiting that is tedious; e.g., inventing a pile of opcode variants that do the same thing as today's except skip an incref here and a decref there.
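Tim's correction -- that the loop allocates nothing for x itself, only binding it to objects already in the list -- is observable from pure Python. A CPython-specific sketch (it relies on `sys.getrefcount`, and the cancelling incref/decref pairs it checks are an implementation detail):

```python
import sys

# A plain list of distinct objects, standing in for array1.
items = [object(), object(), object()]
before = [sys.getrefcount(o) for o in items]

for i, x in enumerate(items):
    # The loop allocates nothing for x: it is bound to the very object
    # already stored in the list.
    assert x is items[i]

del i, x          # drop the loop's leftover bindings
after = [sys.getrefcount(o) for o in items]
# Every incref the eval loop did on these objects was cancelled by a decref.
assert after == before
```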
> and intermediate sum (and immediately deallocate them)
The intermediate sum is allocated each time, but not deallocated (the pre-existing object at array2[x] *may* be deallocated, though).
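The "*may* be deallocated" half is the store side: assigning the new sum drops the last reference to whatever array2[x] held before, and under CPython's refcounting that old object dies on the spot. A small sketch using weakref to observe this (the `Cell` class is invented for illustration):

```python
import weakref

class Cell:
    """Placeholder for whatever object array2[x] held before the store."""
    pass

array2 = [Cell()]
ghost = weakref.ref(array2[0])   # watch the pre-existing occupant

array2[0] = Cell()               # the store that "+"'s result feeds into
# Under CPython refcounting, the old occupant is deallocated immediately:
assert ghost() is None
```

If some other name still referenced the old occupant, the store would of course not free it -- hence Tim's "*may*".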
> and that makes the loop piteously slow.
A lot of things conspire to make it slow. David is certainly right that, in this particular case, array2[array1] = array3[array1] + etc worms around the worst of them.
> The idea of replacing PyObject *'s with a struct [typedescr *, data *]
> was a space/time tradeoff to speed up operations like the above by
> eliminating any need for mallocs or other memory management.
Fleshing out details may make it look less attractive. For machines where ints are no wider than pointers, the "data *" can be replaced with the int directly, and then there's real potential. If for a float the "data *" really *is* a pointer, though, what does it point *at*? Some dynamically allocated memory to hold the float appears to be the only answer, and you're right back at the problem you were hoping to avoid. Make the "data *" field big enough to hold a Python float directly, and the descriptor likely zooms to 128 bits (assuming float is IEEE double and the machine requires natural alignment).

Let's say we do that. Where does the "+" implementation get the 16 bytes it needs to store its result? The space presumably already exists in the slot indexed by array2[x], but the "+" implementation has no way to *know* that. Figuring it out requires non-local analysis, which is quite a few steps beyond what Python's compiler can do today. Easiest: internal functions all grow a new PyDescriptor* argument into which they are to write their result's descriptor. The PVM passes "+" the address of the slot indexed by array2[x] if it's smart enough; or, if it's not, the address of the stack-slot descriptor into which today's PVM *would* push the result. In the latter case the PVM would need to copy those 16 bytes into the slot indexed by array2[x] later.

Neither of those is as simple as it sounds, though, at least because if array2[x] holds a descriptor with a real pointer in its variant half, the thing to which it points needs to get decref'ed iff the add succeeds. It can get very messy!
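Tim's size arithmetic can be checked with the standard `struct` module: a descriptor holding a type pointer plus an in-line IEEE double comes to pointer-size + 8 bytes under natural alignment, i.e. 16 bytes (128 bits) on a typical 64-bit machine. A sketch of the layout only (the `DESCRIPTOR` name is invented for illustration):

```python
import struct

# Proposed descriptor: [typedescr *, data] with the float stored in-line.
# "P" = native pointer, "d" = IEEE double; native mode applies the
# platform's natural alignment, matching Tim's assumption.
DESCRIPTOR = "Pd"

size = struct.calcsize(DESCRIPTOR)
print(size)   # 16 on a typical 64-bit platform

# The double payload alone is 8 bytes wherever float is IEEE double:
assert struct.calcsize("d") == 8
# ...so the descriptor is at least pointer-size + payload:
assert size >= struct.calcsize("P") + 8
```

Compare the 8 bytes of a bare PyObject * slot: the descriptor doubles the per-slot footprint on a 64-bit machine, which is the "space" half of the space/time tradeoff Aaron mentioned.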
> I really can't say whether it'd be worth it or not without some sort of
> real testing. Just a thought.
It's a good thought! Just hard to make real. but-if-michael-hudson-keeps-hacking-at-bytecodes-and-christian- keeps-trying-to-prove-he's-crazier-than-michael-by-2001- we'll-be-able-to-generate-optimized-vector-assembler-for- it<wink>-ly y'rs - tim
participants (4): Aaron Watters, Christian Tismer, David Ascher, Tim Peters