creating garbage collectable objects (caching objects)

Dave Angel davea at ieee.org
Mon Jun 29 07:01:20 EDT 2009


News123 wrote:
> Dave Angel wrote:
>   
>> News123 wrote:
>>     
>>> Hi.
>>>
>>> I started playing with PIL.
>>>
>>> I'm performing operations on multiple images and would like a compromise
>>> between speed and memory requirements.
>>> . . .
>>>
>>> The question I have is whether there is any way to tell Python that
>>> certain objects could be garbage collected if needed, and to ask Python
>>> at a later time whether the object has been collected so far (the image
>>> has to be reloaded) or not (the image would not have to be reloaded).
>>>
>> You don't say what implementation of Python, nor on what OS platform. 
>> Yet you're asking how to influence that implementation.
>>     
>
> Sorry, my fault. I'm using CPython under Windows and under Linux.
>   
>> In CPython, version 2.6 (and probably most other versions, but somebody
>> else would have to chime in) an object is freed as soon as its reference
>> count goes to zero.  So the garbage collector is only there to catch
>> cycles, and it runs relatively infrequently.
>>     
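
To make that concrete, here is a tiny (untested) sketch; Fake_image is just
a stand-in for whatever object you would really load:

import sys

class Fake_image(object):
    def __del__(self):
        sys.stdout.write("freed\n")

img = Fake_image()
also_img = img      # refcount is now 2
del img             # one reference still left, nothing happens
del also_img        # refcount hits 0: "freed" is printed immediately,
                    # without waiting for the cyclic garbage collector
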
>
> If CPython frees objects as early as possible (as soon as the refcount is
> 0), then weakref will not really help me.
> In this case I'd have to work out a cache-like structure instead.
>   
>> So, if you keep a reference to an object, it'll not be freed. 
>> Theoretically, you can use the weakref module to keep a reference
>> without inhibiting the garbage collection, but I don't have any
>> experience with the module.  You could start by studying its
>> documentation.  But probably you want a weakref.WeakValueDictionary. 
>> Use that in your third approach to store the cache.
>>
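
For example (untested; Thing is just a placeholder class), entries stay in
a WeakValueDictionary only as long as something else keeps them alive:

import weakref

class Thing(object):
    pass

cache = weakref.WeakValueDictionary()
t = Thing()
cache["key"] = t
assert "key" in cache      # still there: t holds an ordinary reference
del t                      # last ordinary reference gone
assert "key" not in cache  # the entry silently disappeared
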
>> If you're using Cython or Jython, or one of many other implementations,
>> the rules will be different.
>>
>> The real key to efficiency is usually managing locality of reference. 
>> If a given image is going to be used for many output files, you might
>> try to do all the work with it before going on to the next image.  In
>> that case, it might mean searching all_creation_rules for rules which
>> reference the file you've currently loaded; measurement is key.
>>     
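
For instance (a rough, untested sketch reusing the names from your
pseudo-code below), you could group the rules by the input files they
need, so all the work for one input image is done while it is loaded:

from collections import defaultdict

rules_by_file = defaultdict(list)
for rule in all_creation_rules():
    for img_file in rule.input_files():
        rules_by_file[img_file].append(rule)
# each input image now has to be opened only once, for its whole group
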
>
> Changing the order in which the images are calculated is key, and I'm
> working on that.
>
> As a first step I can reorder the image creation such that all output
> images that depend on only one input image are calculated one after
> the other.
>
> so for this case I can transform:
> # Slowest approach:
> for creation_rule in all_creation_rules():
>     img = Image.new(...)
>     for img_file in creation_rule.input_files():
>         src_img = Image.open(img_file)
>         img = do_somethingwith(img, src_img)
>     img.save()
>
>
> into
> src_img = Image.open(img_file)
> for creation_rule in all_creation_rules_with_on_src_img():
>     img = Image.new(...)
>     img = do_somethingwith(img,src_img)
>     img.save()
>
>
> What I was more concerned about is a group of output images depending on
> TWO or more input images.
>
> Depending on the platform (and the images) I might not be able to
> preload all two (or more) images.
>
> So, as CPython frees objects immediately once they are no longer
> referenced, I'd like to pursue something else.
> I could create a cache which keeps input images loaded as long as Python
> leaves at least n MB available for the rest of the system.
>
> For this I have to know how much RAM is still available on a system.
>
> I'll start looking into this.
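
One portable way to get at that number (a sketch, assuming the third-party
psutil package; min_free_mb is an arbitrary threshold, and on Linux you
could instead parse /proc/meminfo):

import psutil

def enough_memory_left(min_free_mb=200):
    # let the cache grow only while at least min_free_mb MB of
    # physical memory remain available to the rest of the system
    free_mb = psutil.virtual_memory().available // (1024 * 1024)
    return free_mb > min_free_mb
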
>
> thanks again
>
>
>
> N
>
>
>   
As I said earlier, I think weakref is probably what you need.  A weakref
lets you keep a handle on an object without keeping it alive: it does not
add to the reference count, so the object can still be freed, and the
weakref then simply tells you it is gone.  Have you read the help on the
weakref module?  In particular, did you read PEP 205?
http://www.python.org/dev/peps/pep-0205/

An object cache is one of the two main use cases the weakref module was
designed for.
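
Something along these lines (untested; load_image and the module-level
cache are only a sketch) is what I have in mind:

import weakref
import Image   # PIL

_image_cache = weakref.WeakValueDictionary()

def load_image(img_file):
    # reuse the already-loaded image if it is still alive somewhere,
    # otherwise load it again from disk
    img = _image_cache.get(img_file)
    if img is None:
        img = Image.open(img_file)
        _image_cache[img_file] = img
    return img

Note that an entry vanishes as soon as the last ordinary reference to that
image goes away, so this only saves reloads while some other part of the
program is still holding the image.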



