Speed: bytecode vs C API calls

Peter Otten __peter__ at web.de
Thu Dec 11 09:56:14 EST 2003


Jacek Generowicz wrote:

> One key feature of this is that I can use it to replace any[*]
> function with a functionally equivalent but faster one, without having
> to change any of the call sites of the original.

Understood, but not until this post.
Strictly speaking, all optimizations that fulfill the above criterion should
be built into the compiler, whereas memoizing cannot be, because of its
memory tradeoffs and its dependency on the input data.
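
For anyone joining the thread late, here is a minimal sketch of the idea
(the names are mine, for illustration only): rebinding the name is enough,
so none of the call sites need to change.

import time

def expensive(n):
    time.sleep(0.01)          # stand-in for real work
    return n * n

def memoize(func):
    cache = {}
    def memoized(*args):
        try:
            return cache[args]
        except KeyError:
            result = cache[args] = func(*args)
            return result
    return memoized

expensive = memoize(expensive)
print expensive(3)  # slow: computed and cached
print expensive(3)  # fast: served from the cache; the call site is unchanged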

> IIUC, Aahz suggested to replace the proxy function with a
> dictionary. This sounds reasonable, as, after all, the proxy maps its
> arguments to its return values, while a dictionary maps keys to
> values. The whole point of this is that function call overhead is
> large compared to dictionary lookup, which is (almost) all that the
> proxy does. However, it's the "almost" that creates the problem. If
> I'm prepared to mess around with the calls to the original functions
> and replace them with your suggestion, then, of course the problem is
> solved ... however, I _really_ do not want to replace the calls to the
> original ...

See below for my ideas towards a generalized memoizing framework. I've
addressed the problem by writing a map() variant that knows about result
caches (essentially Aahz' innerloop(), I see). You could use it to shadow
the builtin.
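
To get a feeling for the overhead involved, here is a rough timeit
comparison (my own measurement sketch, not from the thread) of a bare
dictionary lookup versus the same lookup wrapped in a function call:

import timeit

setup = '''
cache = {1: "one"}
def lookup(key):
    return cache[key]
'''

print timeit.Timer('cache[1]', setup).timeit()   # bare dictionary lookup
print timeit.Timer('lookup(1)', setup).timeit()  # same lookup behind a call

The wrapped version is typically several times slower; the function call
overhead, not the lookup itself, dominates.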

>   def foo(some_type):
>       ...
>       return something
> 
> 
>   foo = memoize(foo)
> 
>   data = [int, float, my_class, his_class, str, other_class] * 10**6
> 
>   map(foo, data)  [+]

Do you really discard the result?

> [+] Can you see why I don't want to mess with the call site ?

No.

Now to my code. The idea is to make memoized functions available to
different modules from a central registry, and to expose the cached results
for hand-optimized hotspots. Everything else was already mentioned in this
thread, if not in the original post.

<memoize.py>
import __builtin__

class RegistryItem:
    """ provide the necessary infrastructure to transparently speed up
        one function
    """
    def __init__(self, func, argc):
        cache = self.cache = {}
        def memoized(*args):
            try:
                return cache[args]
            except KeyError:
                result = cache[args] = func(*args)
            return result

        assert argc > 0
        if argc == 1:
            def callseq(seq):
                result = []
                append = result.append
                for arg in seq:
                    try:
                        append(cache[arg])
                    except KeyError:
                        value = cache[arg] = func(arg)
                        append(value)
                return result
        else:
            def callseq(*seqs):
                result = []
                append = result.append
                if len(seqs) == 1:
                    # multiple args as one tuple in the sequence
                    for args in seqs[0]:
                        try:
                            append(cache[args])
                        except KeyError:
                            value = cache[args] = func(*args)
                            append(value)
                else:
                    # multiple args in multiple sequences
                    # XXX special case (too lazy for now)
                    return map(memoized, *seqs)
                return result
        self.wrapped = memoized
        self.callseq = callseq
        self.original = func

class Registry:
    """ Registry for function variants using a result cache
        fast = register(slow, <number of args>)
    """
    def __init__(self):
        self.data = {}
    def __call__(self, func, argc):
        assert argc > 0
        try:
            # return the existing wrapper for both original and wrapped keys
            return self.data[func].wrapped
        except KeyError:
            ri = RegistryItem(func, argc)
            self.data[func] = ri
            self.data[ri.wrapped] = ri
        return ri.wrapped

    def __getitem__(self, func):
        return self.data[func]

    def getmap(self):
        """ provide a cache-enhanced version of map
        """
        data = self.data
        def memomap(func, *seqs):
            try:
                callseq = data[func].callseq
            except KeyError:
                return __builtin__.map(func, *seqs)
            return callseq(*seqs)
        return memomap

registry = register = Registry()
</memoize.py>

<example.py>
# for improved clarity of this example I have omitted
# rebinding the original names and instead used fastXXX
import memoize, time

def slow(arg):
    print "no cache or cache miss with %r" % arg
    time.sleep(0.1)
    return "<%s>" % arg

sample = range(10) * 100

# preparation code

fast = memoize.register(slow, 1)
# typically used like so:
# slow = memoize.register(slow, 1)

fastmap = memoize.registry.getmap()
# could also be used like so:
# map = memoize.registry.getmap()


# some tests
assert fast is memoize.registry[fast].wrapped
assert fast is memoize.registry[slow].wrapped
assert slow is memoize.registry[slow].original


# stray function call
print slow(10)

# stray function call using cache
print fast(20)

# built-in loop support
fastmap(slow, sample)
fastmap(fast, sample)

# hand-crafted code using the cache
cache = memoize.registry[slow].cache
for arg in sample:
    cache[arg]  # bare lookup, result discarded; shows the raw cache cost

</example.py>

Feel free to play with the above, but be warned that there will be bugs.
However, before working further on the code I will have to profile it, to
make sure it at least does not slow things down... maybe this weekend.
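
If anyone wants to beat me to the profiling, a crude sketch along these
lines would do (it assumes memoize.py is importable; f is deliberately
cheap, so the wrapper overhead shows up instead of being hidden by real
work):

import timeit

setup = '''
import memoize
def f(x):
    return 2 * x
fast = memoize.register(f, 1)
fastmap = memoize.registry.getmap()
sample = range(10) * 1000
'''

print timeit.Timer('map(f, sample)', setup).timeit(100)      # plain map
print timeit.Timer('map(fast, sample)', setup).timeit(100)   # memoized wrapper
print timeit.Timer('fastmap(f, sample)', setup).timeit(100)  # cache-aware map

With a function this cheap the cached versions may well lose, which is
exactly why the profiling has to come first.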

Peter



