creating garbage collectable objects (caching objects)

News123 news123 at free.fr
Mon Jun 29 03:59:09 EDT 2009


Dave Angel wrote:
> News123 wrote:
>> Hi.
>>
>> I started playing with PIL.
>>
>> I'm performing operations on multiple images and would like a compromise
>> between speed and memory requirements.
>> . . .
>>
>> My question is whether there is any way to tell Python that certain
>> objects may be garbage collected if needed, and to ask Python at a later
>> time whether the object has been collected (the image has to be
>> reloaded) or not (the image does not have to be reloaded).
>>
>>

>>   
> You don't say what implementation of Python, nor on what OS platform. 
> Yet you're asking how to influence that implementation.

Sorry, my fault. I'm using CPython under Windows and under Linux.
> 
> In CPython, version 2.6 (and probably most other versions, but somebody
> else would have to chime in) an object is freed as soon as its reference
> count goes to zero.  So the garbage collector is only there to catch
> cycles, and it runs relatively infrequently.

If CPython frees objects as early as possible (as soon as the refcount is
0), then weakref will not really help me.
In this case I'd have to build a cache-like structure myself.
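
To illustrate the point, here is a minimal sketch of what happens with a
weakref.WeakValueDictionary under CPython. FakeImage is a hypothetical
stand-in for a loaded PIL image, used only so the example runs without PIL.
The immediate disappearance of the entry is CPython behaviour (reference
counting); other implementations may free the object later.

```python
import weakref


class FakeImage(object):
    """Hypothetical stand-in for a loaded PIL image."""

    def __init__(self, name):
        self.name = name


cache = weakref.WeakValueDictionary()

img = FakeImage("a.png")
cache["a.png"] = img

# While the strong reference 'img' exists, the cache still returns the object.
still_cached = cache.get("a.png") is img

# Dropping the last strong reference frees the object right away in CPython,
# so the weak cache entry disappears as well.
del img
gone = cache.get("a.png") is None
```

So the weak dictionary only keeps an image for as long as something else
already holds it, which is not the "keep it until memory gets tight"
behaviour I am after.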
> 
> So, if you keep a reference to an object, it'll not be freed. 
> Theoretically, you can use the weakref module to keep a reference
> without inhibiting the garbage collection, but I don't have any
> experience with the module.  You could start by studying its
> documentation.  But probably you want a weakref.WeakValueDictionary. 
> Use that in your third approach to store the cache.
> 
> If you're using Cython or Jython, or one of many other implementations,
> the rules will be different.
> 
> The real key to efficiency is usually managing locality of reference. 
> If a given image is going to be used for many output files, you might
> try to do all the work with it before going on to the next image.  In
> that case, it might mean searching all_creation_rules for rules which
> reference the file you've currently loaded.  Measurement is key.

Changing the order in which the images are calculated is key, and I'm
working on that.

As a first step I can reorder the image creation so that all output
images that depend on only one input image are calculated one after
the other.

So for this case I can transform:
# Slowest approach:
for creation_rule in all_creation_rules():
    img = Image.new(...)
    for img_file in creation_rule.input_files():
        src_img = Image.open(img_file)
        img = do_somethingwith(img,src_img) # wrong indentation in OP
    img.save()


into:
# Faster: open the shared source image only once
src_img = Image.open(img_file)
for creation_rule in all_creation_rules_with_one_src_img():
    img = Image.new(...)
    img = do_somethingwith(img,src_img)
    img.save()
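
The reordering could be sketched roughly as below. This is only an
illustration under assumptions: the rules are modelled as plain dicts with
hypothetical "output" and "inputs" keys, and group_rules_by_inputs is a
made-up helper name, not part of PIL or of my real code.

```python
from collections import defaultdict


def group_rules_by_inputs(rules):
    """Map each tuple of input files to the rules needing exactly those files."""
    groups = defaultdict(list)
    for rule in rules:
        key = tuple(sorted(rule["inputs"]))
        groups[key].append(rule)
    return groups


# Hypothetical rules: plain dicts standing in for the creation_rule objects.
rules = [
    {"output": "out1.png", "inputs": ["a.png"]},
    {"output": "out2.png", "inputs": ["b.png"]},
    {"output": "out3.png", "inputs": ["a.png"]},
    {"output": "out4.png", "inputs": ["a.png", "b.png"]},
]

groups = group_rules_by_inputs(rules)

# Handle groups with fewer inputs first; within a group every rule shares
# the same source images, so each of them has to be opened only once.
order = sorted(groups, key=len)
```

Each key in order can then be processed by opening its input images once
and applying all rules of that group before moving on.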


What I'm more concerned about is a group of output images depending on
TWO or more input images.

Depending on the platform (and the images) I might not be able to
preload both (or all) of the input images.

So, since CPython always frees objects immediately, I'd like to pursue
something else.
I can create a cache which caches input files as long as Python leaves
at least n MB available for the rest of the system.

For this I have to know how much RAM is still available on the system.

I'll start looking into this.
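
A first rough sketch of such a cache, under stated assumptions: the class
name MemoryAwareCache and its parameters are made up for illustration, and
the free-memory probe is passed in as a callable because I have not yet
settled on how to measure free RAM. One candidate might be the third-party
psutil package (psutil.virtual_memory().available), but that is an
assumption and not something I have tried here.

```python
from collections import OrderedDict


class MemoryAwareCache(object):
    """LRU cache that evicts entries while free memory is below a floor."""

    def __init__(self, loader, free_mb, min_free_mb):
        self._loader = loader            # callable: key -> object, e.g. Image.open
        self._free_mb = free_mb          # callable: () -> free RAM in MB (assumed probe)
        self._min_free_mb = min_free_mb  # the "n MB" to leave for the system
        self._items = OrderedDict()      # least recently used entry comes first

    def get(self, key):
        if key in self._items:
            value = self._items.pop(key)  # re-inserted below as most recent
        else:
            value = self._loader(key)     # not cached (any more): reload
        self._items[key] = value
        self._shrink(keep=key)
        return value

    def _shrink(self, keep):
        # Drop least recently used entries until enough memory is free again,
        # but never the entry that was just requested.
        while self._free_mb() < self._min_free_mb:
            victims = [k for k in self._items if k != keep]
            if not victims:
                break
            del self._items[victims[0]]

    def __contains__(self, key):
        return key in self._items
```

Usage would be something like cache = MemoryAwareCache(Image.open, probe, 200)
followed by cache.get(img_file) wherever Image.open(img_file) was called
before. Dropping an entry only releases memory if nothing else still
references the image.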

thanks again



N




