creating garbage collectable objects (caching objects)

Dave Angel davea at ieee.org
Sun Jun 28 14:51:31 EDT 2009


News123 wrote:
> Hi.
>
> I started playing with PIL.
>
> I'm performing operations on multiple images and would like a
> compromise between speed and memory requirements.
>
> The fast approach would load all images upfront and then create the
> multiple result files. The problem is that I do not have enough
> memory to load all files.
>
> The slow approach is to load each potential source file only when it
> is needed and to release it immediately afterwards (leaving it up to
> the gc to free memory when needed).
>
>
>
> The question I have is whether there is any way to tell Python that
> certain objects could be garbage collected if needed, and then to ask
> Python at a later time whether the object has been collected so far
> (image has to be reloaded) or not (image would not have to be reloaded).
>
>
> # Fastest approach:
> imgs = {}
> for fname in all_image_files:
>     imgs[fname] = Image.open(fname)
> for creation_rule in all_creation_rules():
>     img = Image.new(...)
>     for img_file in creation_rule.input_files():
>         img = do_somethingwith(img, imgs[img_file])
>     img.save()
>
>
> # Slowest approach:
> for creation_rule in all_creation_rules():
>     img = Image.new(...)
>     for img_file in creation_rule.input_files():
>         src_img = Image.open(img_file)
>         img = do_somethingwith(img, src_img)
>     img.save()
>
>
>
> # What I'd like to do is something like:
> imgs = GarbageCollectable_dict()
> for creation_rule in all_creation_rules():
>     img = Image.new(...)
>     for img_file in creation_rule.input_files():
>         if img_file in imgs:  # if I'm lucky, the object is still there
>             src_img = imgs[img_file]
>         else:
>             src_img = Image.open(img_file)
>             imgs[img_file] = src_img
>         img = do_somethingwith(img, src_img)
>     img.save()
>
>
>
> Is this possible?
>
> Thanks in advance for an answer or any other ideas of
> how I could do smart caching without hogging all the system's
> memory
>
>
>   
You don't say which implementation of Python you're using, nor what OS 
platform, yet you're asking how to influence that implementation.

In CPython, version 2.6 (and probably most other versions, but somebody 
else would have to chime in) an object is freed as soon as its reference 
count goes to zero.  So the garbage collector is only there to catch 
cycles, and it runs relatively infrequently.
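
You can watch both mechanisms in action (a minimal sketch; Thing is 
just a throwaway class for illustration):

import gc
import weakref

class Thing(object):
    pass

obj = Thing()
ref = weakref.ref(obj)
del obj                  # refcount hits zero: CPython frees it right away
print(ref() is None)     # True -- already gone, no gc run needed

obj = Thing()
obj.me = obj             # a reference cycle: refcount can't reach zero
ref = weakref.ref(obj)
del obj
print(ref() is None)     # False -- the cycle keeps it alive
gc.collect()             # the cycle collector reclaims it
print(ref() is None)     # True now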

So, if you keep a normal (strong) reference to an object, it won't be 
freed.  The weakref module, though, lets you keep a reference without 
inhibiting garbage collection.  I don't have any experience with the 
module, so you could start by studying its documentation, but most 
likely what you want is a weakref.WeakValueDictionary.  Use that in 
your third approach to store the cache.
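
Something along these lines, reusing the placeholder names from your 
pseudocode (a sketch, not a tested program):

import weakref

imgs = weakref.WeakValueDictionary()

for creation_rule in all_creation_rules():
    img = Image.new(...)
    for img_file in creation_rule.input_files():
        src_img = imgs.get(img_file)     # None if already collected
        if src_img is None:
            src_img = Image.open(img_file)
            imgs[img_file] = src_img     # weak entry: won't keep it alive
        img = do_somethingwith(img, src_img)
    img.save()

One caveat: since CPython frees an object the moment its last strong 
reference disappears, a WeakValueDictionary entry vanishes as soon as 
nothing else holds the image.  So this cache only helps while an image 
happens to still be strongly referenced elsewhere; if you need images 
to linger, you'd have to combine it with some bounded strong-reference 
cache.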

If you're using IronPython or Jython, or one of many other 
implementations, the rules will be different.

The real key to efficiency is usually managing locality of reference.  
If a given image is going to be used for many output files, you might 
try to do all the work with it before going on to the next image.  In 
that case, it might mean searching all_creation_rules for rules which 
reference the file you've currently loaded.  As always, measurement is 
key.
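
For instance, you could invert the rule list into a file-to-rules index 
up front (a hypothetical sketch, again using your placeholder names, 
and assuming rule objects are hashable and that do_somethingwith can be 
applied to a rule's inputs in any order):

import collections

# Index which rules need each source file.
rules_by_file = collections.defaultdict(list)
for rule in all_creation_rules():
    for img_file in rule.input_files():
        rules_by_file[img_file].append(rule)

# Visit one source image at a time.  A rule with several inputs is
# visited several times, so its partially-built output is kept in
# `pending` until its last input has been applied.
pending = {}
for img_file, rules in rules_by_file.items():
    src_img = Image.open(img_file)
    for rule in rules:
        if rule not in pending:
            pending[rule] = Image.new(...)
        pending[rule] = do_somethingwith(pending[rule], src_img)

for out in pending.values():
    out.save()

Whether that actually wins depends on how many partial outputs you end 
up holding at once (you've traded source-image memory for output-image 
memory), which is why measuring comes first.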




