Ethan Furman wrote:
re-posting to list

Craig Yoshioka wrote:
Here is what the context might look like:
import errno
import os
import time

class Uncached(object):
    def __init__(self,path):
        self.path = path
        self.lock = path + '.locked'
    def __enter__(self):
        if os.path.exists(self.path):
            return SkipWithBlock # skips body goes straight to __exit__
        try:
            # atomic create: fails with EEXIST if another process holds the lock
            os.close(os.open(self.lock,os.O_CREAT|os.O_EXCL|os.O_RDWR))
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
            # someone else is building the file: wait, then re-check
            while os.path.exists(self.lock):
                time.sleep(0.1)
            return self.__enter__()
        return self.path
    def __exit__(self,et,ev,st):
        if os.path.exists(self.lock):
            os.unlink(self.lock)

class Cache(object):
    def __init__(self,*args,**kwargs):
        self.base = os.path.join(CACHE_DIR,hashon(args,kwargs))
    #.....
    def create(self,path):
        return Uncached(os.path.join(self.base,path))
    #.....

def cached(func):
    def wrapper(*args,**kwargs):
        cache = Cache(*args,**kwargs)
        return func(cache,*args,**kwargs)
    return wrapper
---------------------------------------------------------------------
Person using code:
---------------------------------------------------------------------

@cached
def createdata(cache,x):
    path = cache.pathfor('output.data')
    with cache.create(path) as cpath:
        with open(cpath,'wb') as cfile:
            cfile.write(x*10000)
    return path

pool.map(createdata,['x','x','t','x','t'])

---------------------------------------------------------------------
So separate processes return the path to the cached data, create it if it doesn't exist, and even wait if another process is already working on it.
My collaborators could hopefully wrap their programs with minimal effort, using the cleanest syntax possible. Since inputs get hashed to consistent output paths for each wrapped function, the wrapped functions can be easily combined, chained, etc., and behind the scenes they reuse as much work as possible.
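The "inputs get hashed to consistent output paths" step could look roughly like the sketch below. The body of hashon is not shown in this post, so this version is an assumption: it hashes the repr() of the arguments, which only gives stable results for arguments with a deterministic repr (strings, numbers, tuples of those). The CACHE_DIR value and the cache_base helper are hypothetical as well.

```python
import hashlib
import os

CACHE_DIR = "/tmp/cache"  # hypothetical location, not from the original post


def hashon(args, kwargs):
    """Return a stable hex digest for a call's arguments.

    Assumption: every argument has a deterministic repr(). Objects that
    fall back to the default repr (which embeds a memory address) would
    hash differently on every run.
    """
    # Sort kwargs so f(a=1, b=2) and f(b=2, a=1) map to the same digest.
    key = repr((tuple(args), sorted(kwargs.items())))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def cache_base(*args, **kwargs):
    """Directory that all outputs for this set of inputs would share."""
    return os.path.join(CACHE_DIR, hashon(args, kwargs))
```

To keep two different wrapped functions with identical inputs from sharing a directory, the function name would also have to go into the key; that is left out here.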
Here are the current possible alternatives:
1. Use the passed variable as a flag. Users must insert the 'if' test for every use of the context; if they forget it, cached results get recomputed:
@cached
def createdata(cache,x):
    path = cache.pathfor('output.data')
    with cache.create(path) as cpath:
        if not cpath: return
        with open(cpath,'wb') as cfile:
            cfile.write(x*10000)
    return path
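A minimal runnable sketch of this flag-based alternative, which works in current Python because the context manager hands back None instead of skipping the body. The name uncached and the poll parameter are hypothetical. Unlike the class above, it re-checks for the file after taking the lock, on the assumption that another process may have finished in between.

```python
import contextlib
import errno
import os
import time


@contextlib.contextmanager
def uncached(path, poll=0.1):
    """Yield `path` if the caller must create it, or None if it exists."""
    lock = path + '.locked'
    while not os.path.exists(path):
        try:
            # O_EXCL makes creation atomic: only one process gets the lock.
            os.close(os.open(lock, os.O_CREAT | os.O_EXCL | os.O_RDWR))
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
            # Another process holds the lock: wait, then re-check `path`.
            while os.path.exists(lock):
                time.sleep(poll)
            continue
        try:
            # Re-check under the lock in case the file just appeared.
            yield None if os.path.exists(path) else path
        finally:
            os.unlink(lock)
        return
    # The file was already there (or appeared while we waited).
    yield None
```

The caller still has to write "if cpath is None" by hand, which is exactly the weakness described above. A writer that crashes mid-write also leaves a partial file behind that later callers will treat as cached.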
2. Use a for loop and an iterator instead of a context manager. This is more foolproof, but a bit confusing?
@cached
def createdata(cache,x):
    path = cache.pathfor('output.data')
    for cpath in cache.create(path):
        if not cpath: return
        with open(cpath,'wb') as cfile:
            cfile.write(x*10000)
    return path
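One way the iterator behind this for-loop alternative might be written is as a generator that yields the path at most once, so the loop body simply never runs when the file is cached. This is a sketch under assumptions, not the actual cache.create: the name create_once and the poll parameter are hypothetical.

```python
import errno
import os
import time


def create_once(path, poll=0.1):
    """Yield `path` once if it must be created; yield nothing if cached."""
    lock = path + '.locked'
    while not os.path.exists(path):
        try:
            # Atomic lock-file creation; EEXIST means someone else has it.
            os.close(os.open(lock, os.O_CREAT | os.O_EXCL | os.O_RDWR))
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
            # Wait for the other process, then re-check `path`.
            while os.path.exists(lock):
                time.sleep(poll)
            continue
        try:
            if not os.path.exists(path):
                yield path      # the loop body runs exactly once
        finally:
            os.unlink(lock)     # runs when the loop asks for the next item
        return
```

With a generator like this the "if not cpath" test in the loop body becomes unnecessary. The catch is cleanup: if the body raises or returns early, the lock is only removed when the generator is finalized, which CPython normally does promptly but which is not guaranteed on other implementations.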
3. Use a class. The outputs and the caching function need to be specified separately so that calls can be scripted together, and there is a lot more boilerplate:
class createdata(CachedWrapper):
    def outputs(self,x):
        self.outputs += [self.cache.pathfor('output.data')]
    def tocache(self,x):
        with open(self.outputs[0],'wb') as cfile:
            cfile.write(x*10000)