Associative arrays in hardware

Full-on associative memory wouldn't be necessary. If there were a machine-level instruction in current chips that did a dict-style lookup almost as fast as a base + index*size address calculation, Python could *scream*. It might work like the 80x86's lea (load effective address):

    hashaddr base, recsize, dictsize, value, destreg

It would be very CISCy, and there are a whole host of practical problems to deal with -- hashing functions, good collision resolution, etc. But any pointer- or index-based (e.g. interned string) lookup could go almost as fast as C++'s "this->".

The mainstream is moving toward dynamic languages, and it makes sense to support that in hardware. Heck, I hear routers and switches already have things like this. Does anybody have the ear of an Intel chip designer?

Neil
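For what it's worth, here is a rough Python sketch of the semantics such an instruction would have to implement. The operand names follow the hashaddr line above; everything else (the record layout, the naive probing scheme, the use of Python's built-in hash) is assumed purely for illustration and is not part of the proposal:

    # Hypothetical semantics of the proposed hashaddr instruction, in plain
    # Python.  `mem` stands in for main memory, `base` is the table's start
    # address, `recsize` the record stride, `dictsize` a power-of-two slot
    # count, and `value` the (already interned) key being looked up.
    def hashaddr(mem, base, recsize, dictsize, value):
        mask = dictsize - 1
        slot = hash(value) & mask
        while True:
            addr = base + slot * recsize      # the cheap, lea-like part
            key = mem[addr]                   # first word of a record: the key
            if key is None:                   # empty slot -> miss
                return None
            if key == value:                  # hit -> hand back the address
                return addr
            slot = (slot + 1) & mask          # naive linear probing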

On Apr 15, 2008, at 1:24 AM, Neil Toronto wrote:

Given how modern hardware is implemented, with a large set of instructions being translated into even lower-level micro-operations, I don't know that there would actually be a huge benefit. Just because there is an instruction for something doesn't mean it's absolutely blazing. Most likely such an instruction would take several cycles to execute, just like the equivalent instructions currently implementing a dict lookup.

This also can't handle the fact that most (useful) keys in a dictionary are not atomic (they are not a single int, float, etc.) and require custom __hash__ functions. How do you propose a hardware solution would solve those hurdles?
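To make that concrete, here is the kind of key the point above is about -- an ordinary user-defined class (made up here for illustration) whose hash is computed by running Python code, which a fixed-function instruction couldn't do without calling back into the interpreter:

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y
        def __hash__(self):
            # Arbitrary user code: the "instruction" would have to run this
            # just to find out which slot to probe.
            return hash((self.x, self.y))
        def __eq__(self, other):
            return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    table = {Point(1, 2): "a point"}
    print(table[Point(1, 2)])   # the lookup runs __hash__ and __eq__ in Python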

Neil Toronto wrote:
But how would you make it that fast without some kind of associative lookup hardware? Without that, you'd just be implementing the Python dict lookup algorithm in microcode, which might be a bit faster, but not spectacularly so -- the same number of memory accesses would still be needed.

-- Greg
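For reference, "the Python dict lookup algorithm" boils down to an open-addressing probe loop along these lines -- a simplified sketch of what Objects/dictobject.c does, not the real code. Each iteration is a dependent memory access, and that stays true whether the loop runs as bytecode, C, or microcode:

    def dict_lookup(table, key):
        # `table` is a power-of-two-sized list of (key, value) pairs or None.
        mask = len(table) - 1
        h = hash(key) & 0xFFFFFFFFFFFFFFFF   # treat the hash as unsigned
        i = h & mask
        perturb = h
        while True:
            entry = table[i]                 # one memory access per probe
            if entry is None:
                return None                  # empty slot: key not present
            entry_key, value = entry
            if entry_key is key or entry_key == key:
                return value
            perturb >>= 5                    # CPython-style probe sequence
            i = (5 * i + perturb + 1) & mask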

Greg Ewing wrote:
Good point. They'd have to add that to the RAM. :D Or at least the cache...
Right, it's memory access that's the bottleneck. A hardware dict lookup could know its own memory access characteristics and plan for them. In particular, it would be much better at predicting what to bring into cache next than current CPUs are when they have to guess from a software algorithm's access pattern -- if they predict at all. On the other side of it, chip designers could optimize the lookups for their own hardware, particularly the collision resolution methods.

Neil
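A toy illustration of why that prediction is easy for a dedicated lookup engine: in an open-addressed table the next few candidate addresses depend only on the hash and the probe counter, not on any data fetched so far, so they could all be handed to the memory system up front. The function and parameter values below are made up for illustration:

    def probe_addresses(h, base, recsize, mask, nprobes=4):
        """Yield the first few candidate record addresses for hash value h."""
        i = h & mask
        perturb = h
        for _ in range(nprobes):
            yield base + i * recsize         # known before any memory is read
            perturb >>= 5
            i = (5 * i + perturb + 1) & mask

    # A hardware lookup engine could prefetch all of these at once:
    addrs = list(probe_addresses(hash("spam") & 0xFFFFFFFF,
                                 base=0x1000, recsize=16, mask=0xFF))
    print([hex(a) for a in addrs])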

participants (3)

- Calvin Spealman
- Greg Ewing
- Neil Toronto