creating garbage collectable objects (caching objects)
Dave Angel
davea at ieee.org
Mon Jun 29 07:01:20 EDT 2009
News123 wrote:
> Dave Angel wrote:
>
>> News123 wrote:
>>
>>> Hi.
>>>
>>> I started playing with PIL.
>>>
>>> I'm performing operations on multiple images and would like a compromise
>>> between speed and memory requirements.
>>> . . .
>>>
>>> The question I have is whether there is any way to tell Python that
>>> certain objects could be garbage collected if needed, and to ask Python
>>> at a later time whether the object has been collected so far (the image
>>> has to be reloaded) or not (the image does not have to be reloaded).
>> You don't say which implementation of Python you're using, nor on what
>> OS platform. Yet you're asking how to influence that implementation.
>>
>
> Sorry, my fault. I'm using CPython under Windows and under Linux.
>
>> In CPython version 2.6 (and probably most other versions, but somebody
>> else would have to chime in), an object is freed as soon as its reference
>> count goes to zero. So the garbage collector is only there to catch
>> cycles, and it runs relatively infrequently.
>>
>
> If CPython frees objects as early as possible (as soon as the refcount is
> 0), then weakref will not really help me.
> In this case I'd have to build a cache-like structure.
>
>> So, if you keep a reference to an object, it won't be freed.
>> Theoretically, you can use the weakref module to keep a reference
>> without inhibiting garbage collection, but I don't have any
>> experience with the module. You could start by studying its
>> documentation. But probably you want a weakref.WeakValueDictionary.
>> Use that in your third approach to store the cache.
>>
>> If you're using Cython or Jython, or one of many other implementations,
>> the rules will be different.
>>
>> The real key to efficiency is usually managing locality of reference.
>> If a given image is going to be used for many output files, you might
>> try to do all the work with it before going on to the next image. In
>> that case, it might mean searching all_creation_rules for rules which
>> reference the file you've currently loaded. Measurement is key.
>>
>
> Changing the order in which the images are calculated is key, and I'm
> working on that.
>
> As a first step, I can reorder the image creation so that all output
> images that depend on only one input image are calculated one after
> the other.
>
> So for this case I can transform:
> # Slowest approach:
> for creation_rule in all_creation_rules():
>     img = Image.new(...)
>     for img_file in creation_rule.input_files():
>         src_img = Image.open(img_file)
>         img = do_somethingwith(img, src_img)  # wrong indentation in OP
>     img.save()
>
>
> into
> src_img = Image.open(img_file)
> for creation_rule in all_creation_rules_with_on_src_img():
>     img = Image.new(...)
>     img = do_somethingwith(img, src_img)
>     img.save()
>
>
> What I was more concerned about is a group of output images depending on
> TWO or more input images.
>
> Depending on the platform (and the images), I might not be able to
> preload both (or all) of the input images.
>
> So, as CPython's garbage collection always takes place immediately,
> I'd like to pursue something else.
> I can create a cache that holds input files as long as Python leaves
> at least n MB available for the rest of the system.
>
> For this I have to know how much RAM is still available on a system.
>
> I'll start looking into this.
>
> Thanks again
>
> N
>
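The cache News123 describes above, which keeps input images only while
memory allows, can be sketched as a size-bounded LRU cache. This is a
minimal sketch under stated assumptions, written for Python 3: a fixed
byte budget (max_bytes) stands in for "free system RAM", since the
standard library has no portable call for that, and the caller supplies
a loader and a size estimate (both hypothetical here). For 8-bit PIL
images a rough estimate might be im.size[0] * im.size[1] * len(im.getbands()).

```python
from collections import OrderedDict


class ByteBudgetCache:
    """Keep recently used objects until their estimated total size exceeds a budget.

    Assumptions: `loader(key)` loads the object (e.g. opens an image file),
    `size_of(obj)` estimates its memory footprint in bytes, and a fixed
    `max_bytes` budget stands in for "free RAM".
    """

    def __init__(self, loader, size_of, max_bytes):
        self._loader = loader
        self._size_of = size_of
        self._max_bytes = max_bytes
        self._items = OrderedDict()  # key -> (obj, size), least recently used first
        self._total = 0

    def get(self, key):
        if key in self._items:
            # Cache hit: mark as most recently used and skip the reload.
            self._items.move_to_end(key)
            return self._items[key][0]
        # Cache miss: load the object and account for its size.
        obj = self._loader(key)
        size = self._size_of(obj)
        self._items[key] = (obj, size)
        self._total += size
        self._evict()
        return obj

    def _evict(self):
        # Drop least recently used entries until the total fits the budget,
        # always keeping the newest entry.
        while self._total > self._max_bytes and len(self._items) > 1:
            _, (_, size) = self._items.popitem(last=False)
            self._total -= size

    def __contains__(self, key):
        return key in self._items


# Usage with a stand-in loader (hypothetical; a real one might call Image.open).
if __name__ == "__main__":
    cache = ByteBudgetCache(lambda name: b"x" * 40, len, max_bytes=100)
    for name in ["a.png", "b.png", "a.png", "c.png"]:
        cache.get(name)
    print("b.png" in cache)  # False: least recently used, so it was evicted
```

Unlike a weakref-based cache, this one keeps images alive after the caller
drops them, which is what lets it avoid reloads in CPython; the price is
choosing the budget by hand.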
As I said earlier, I think weakref is probably what you need. A weakref
lets you reach an object without keeping it alive: it does not count
toward the object's reference count, so in CPython the object is freed
as soon as its last ordinary (strong) reference goes away, and the
weakref then returns None. Have you read the help on the weakref module?
In particular, did you read PEP 205?
http://www.python.org/dev/peps/pep-0205/
Object caches are one of the two motivations for the weakref module.
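The WeakValueDictionary approach can be sketched as follows. This is a
minimal sketch that assumes a hypothetical stand-in class FakeImage in
place of PIL.Image.open (instances of ordinary user-defined classes can
be weakly referenced). It also shows the point News123 raised: in CPython
the cache only saves a reload while some other code still holds a strong
reference; once the last one is dropped, the object is freed at once and
the next lookup reloads it.

```python
import weakref


class FakeImage:
    """Hypothetical stand-in for a loaded PIL image."""

    def __init__(self, path):
        self.path = path


class WeakImageCache:
    """Hand out the same object while anyone still holds a strong reference to it."""

    def __init__(self, loader):
        self._loader = loader
        self._cache = weakref.WeakValueDictionary()  # entries vanish when objects die
        self.loads = 0  # counts how often the loader had to run

    def get(self, path):
        img = self._cache.get(path)
        if img is None:
            # Never loaded, or already freed: (re)load it.
            img = self._loader(path)
            self._cache[path] = img
            self.loads += 1
        return img


if __name__ == "__main__":
    cache = WeakImageCache(FakeImage)
    first = cache.get("in1.png")
    print(cache.get("in1.png") is first)  # True: reused, no reload
    del first                             # last strong reference dropped
    cache.get("in1.png")                  # CPython freed it, so this reloads
    print(cache.loads)                    # 2
```

Combining the two ideas (a small strong-reference cache in front of a
weak one) is a common design choice when the memory budget is tight.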