High memory usage - program mistake or Python feature?
Gerald Klix
Gerald.Klix at klix.ch
Mon May 26 15:28:29 EDT 2003
The clean (OO) solution is to build a hierarchy of factory classes.

class AbstractLineIteratorFactory:
    def __init__( self, file ):
        self.file = file

    def getIterator( self ):
        raise NotImplementedError

# This is not exactly a factory
class InMemoryIteratorFactory( AbstractLineIteratorFactory ):
    def __init__( self, file ):
        AbstractLineIteratorFactory.__init__( self, file )
        self.lines = None

    def getIterator( self ):
        """I don't answer a real iterator"""
        # Read the file once; later calls answer the cached list.
        if self.lines is None:
            self.lines = self.file.readlines()
        return self.lines

class RealIteratorFactory( AbstractLineIteratorFactory ):
    def getIterator( self ):
        # Call xreadlines; returning the bound method itself was a slip.
        return self.file.xreadlines()

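For readers on Python 3, where xreadlines no longer exists, the same idea might be sketched roughly as below. This is an illustration under stated assumptions, not the code from this post: io.StringIO stands in for a real file, the seek(0) rewind in RealIteratorFactory is an added assumption so that repeated calls start at the first line, and the body of GetLinesThatMatch (including its word parameter) is hypothetical.

```python
import io


class AbstractLineIteratorFactory:
    def __init__(self, file):
        self.file = file

    def getIterator(self):
        raise NotImplementedError


class InMemoryIteratorFactory(AbstractLineIteratorFactory):
    def __init__(self, file):
        super().__init__(file)
        self.lines = None

    def getIterator(self):
        # Read the file once, then keep answering the cached list.
        if self.lines is None:
            self.lines = self.file.readlines()
        return self.lines


class RealIteratorFactory(AbstractLineIteratorFactory):
    def getIterator(self):
        # Assumption: rewind so every call starts at the first line.
        self.file.seek(0)
        return iter(self.file)


def GetLinesThatMatch(allLines, word="b"):
    # Hypothetical filter; only the name comes from the thread.
    return [line for line in allLines if word in line]


for cls in (InMemoryIteratorFactory, RealIteratorFactory):
    factory = cls(io.StringIO("a\nb\nab\n"))
    first = GetLinesThatMatch(factory.getIterator())
    second = GetLinesThatMatch(factory.getIterator())
    print(cls.__name__, first == second, first)
```

The calling code is the same for both factories, so switching between the in-memory and the on-disk variant only means changing which class is instantiated.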
This solution has one drawback:
Everywhere you pass the file object you have to add a second parameter
with the IteratorFactory. And you have to replace every call to
xreadlines with a call to iteratorFactory.getIterator.
To avoid the second parameter, you can add a __getattr__ method to
AbstractLineIteratorFactory as follows:

    def __getattr__( self, name ):
        return getattr( self.file, name )

This delegates every attribute access except getIterator (and, in turn,
every method call) to your file object, so the factory can be passed
around in place of the file.
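A minimal Python 3 sketch of this delegation, assuming io.StringIO as a stand-in for a real file. The class name DelegatingLineFactory is hypothetical, and the seek(0) rewind is an added assumption rather than part of the code in this post.

```python
import io


class DelegatingLineFactory:
    """Hypothetical stand-in for the factory base class."""

    def __init__(self, file):
        self.file = file

    def getIterator(self):
        # Assumption: rewind so each call starts at the first line.
        self.file.seek(0)
        return iter(self.file)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so 'file' and
        # 'getIterator' are found first and never reach this point.
        return getattr(self.file, name)


wrapped = DelegatingLineFactory(io.StringIO("one\ntwo\n"))
first_line = wrapped.readline()      # delegated to the StringIO object
lines = list(wrapped.getIterator())  # handled by the factory itself
wrapped.close()                      # delegated method call
is_closed = wrapped.closed           # delegated attribute read
```

Because __getattr__ is consulted only for names the wrapper does not define, anything the wrapper defines itself takes precedence over the file's attribute of the same name.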
So much for OO theory; now the Pythonic solution
(this needs at least Python 2.2):
Derive a (new style) class from the builtin file type.

class OptionalLineCachingFile( file ):
    def cacheOn( self ):
        """Activate line caching"""
        self.lines = self.readlines()
        self.xreadlines = self.getLines

    def cacheOff( self ):
        """Deactivate the cache"""
        del self.lines
        del self.xreadlines

    def getLines( self ):
        return self.lines

Then replace every call to open or file with a call to
OptionalLineCachingFile.
Please note:
Neither solution answers a real iterator in the caching case.
Turning them into iterators is left as an exercise to the gentle reader ;-)
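One possible take on that exercise, sketched for Python 3 and not taken from this post: an object whose __iter__ answers a fresh iterator on every loop, whether the lines are cached or not. The class name ReiterableLines and its cache parameter are hypothetical, io.StringIO stands in for a real file, and the seek(0) rewind is an assumption about how the uncached case should restart.

```python
import io


class ReiterableLines:
    """Hypothetical helper: can be looped over any number of times."""

    def __init__(self, file, cache=False):
        self.file = file
        self.lines = None
        if cache:
            self.cacheOn()

    def cacheOn(self):
        """Activate line caching"""
        self.file.seek(0)
        self.lines = self.file.readlines()

    def cacheOff(self):
        """Deactivate the cache"""
        self.lines = None

    def __iter__(self):
        if self.lines is not None:
            return iter(self.lines)  # fresh iterator over the cached list
        self.file.seek(0)            # rewind the underlying file
        return iter(self.file)


allLines = ReiterableLines(io.StringIO("spam\neggs\nspam and eggs\n"))
uncached = [line for line in allLines if "spam" in line]
allLines.cacheOn()
cached = [line for line in allLines if "spam" in line]
again = [line for line in allLines if "spam" in line]
```

Code that loops over allLines does not need to know which mode is active. One caveat of this sketch: in the uncached case two loops running at the same time share a single file position, so nested iteration over the same object would not work.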
HTH,
Gerald
PS: None of the code above has been syntax checked or even tested :-]
Ben S wrote:
> Gerald Klix wrote:
>
>>Simply call xreadlines again, that gives a new iterator.
>
>
> How would I do it transparently, though? I mean, I want to have several
> bits of code such as the following:
>
> selectedLines = GetLinesThatMatch(allLines)
>
> And I don't want to change them all if I need to switch between an
> in-memory copy of the file (for speed) and the on-disk version from
> xreadlines() (for memory conservation). Just using xreadlines again
> means I have to change all these lines of code, as I can't just say
> allLines = file.xreadlines() as it won't reset it each time.
>
> I know this may sound like a trivial problem but I am trying to learn
> how I can use Python to isolate myself from such changes.
>
> --
> Ben Sizer
> http://pages.eidosnet.co.uk/kylotan
>
>
>
>>Ben S wrote:
>>
>>>Hmm, in quick experiments with using xreadlines instead of readlines,
>>>there is obviously the problem that while a single iteration over
>>>either container works the same way, in order to repeat iterations
>>>over xreadlines I need to somehow reset the iterator, which a quick
>>>look at the documentation doesn't show me how to do. How do I do
>>>this, so that my functions can take a list of lines without caring
>>>whether those lines are in memory or coming from xreadlines?
>>>
>>>--
>>>Ben Sizer
>>>http://pages.eidosnet.co.uk/kylotan
>>
>
>