[Cython] Hash-based vtables
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Sat Jun 30 13:01:07 CEST 2012
On 06/30/2012 12:57 PM, Dag Sverre Seljebotn wrote:
> My time is rather limited but I'm slowly trying to get another SEP 200
> in place.
>
> Something that hit me, when I tried to make up my mind about whether to
> have (key, ptr) entries or (key, flags, ptr), is that the fast hash
> table entries can actually be arbitrary size. So we could make the table
> itself
>
> void *table[n]
>
> and then n would be a power of 2 (TBD: benchmark cost of allowing other
> sizes). Since we have the d[i] displacements, it's no problem at all to
> construct displacements to account for variable-size entries.
>
> Proposal:
>
> C-source for an un-initialized table (signature string is placeholder
> and not up for discussion now):
>
> { "3:method:foo:i4i4->i4", (void*)EXCEPT_STAR_FLAG, &foo_method,
> "2:numpy:SHAPE", &get_shape_method,
> "2:fieldoffset:barfield", (void*)5, 0 /*padding to n=2^k*/ }
>
> I.e. all keys are prepended by the number of slots they use. So methods
> get to use 3 sizeof(void*) slots since they need the flags, but entries
> that don't need flags use only 2 slots. (In this case, "numpy:SHAPE" is
> a protocol defined by NumPy and so doesn't need any flags; or the flags
> are stored under "numpy:FLAGS" by that protocol.)
>
> Then, PyExtensibleType_Ready parses this and rearranges the table to a
> perfect hash-table. As part of that, it parses the string literal keys
> and interns them, so that the number of slots becomes available in a
> more coder-friendly manner:
>
> typedef {
> uint64_t hash; /* lower-64 bit of md5 */
> uint32_t strlen; /* we allow \0 in key */
> uint8_t nslots; /* set to 3 for first example */
> char *key; /* set to "method:foo:i4i4->i4" */
> } fasttable_key_t;
>
> Then, the interned keys for the table is the fasttable_key_t*.
>
> Storing the hash inside the key has two pros:
> - Caching the md5 work (provided the interner uses a faster hash
> function to go from string to
Sorry.
Storing the hash inside the key has two pros:
- Caching the md5 work (provided the interner uses a faster hash
function to go from string to key "object")
- You don't always have to store both key and hash in global variables
(see below), you can dereference the key for the hash if you want to.
Dag
>
> Lookup would happen like this:
>
> typedef {
> fasttable_key_t *key;
> uintptr_t flags;
> void *funcptr
> } method_t;
>
> (method_t*)PyCustomSlots_Find(mykey, mykey->hash);
> /* or, faster: */
> (method_t*)PyCustomSlots_Find(mykey, 0x45343453453fabaULL);
>
> If you want to scan the table linearly (to avoid having to bother with
> getting an interned key), you would scan a table of void*, and for every
> entry cast the key to fasttable_key_t* and check nslots for how much to
> skip to get to the next entry.
>
> Too complicated?
>
> Dag
More information about the cython-devel
mailing list