[Tutor] Iterable Understanding

spir denis.spir at free.fr
Fri Nov 13 20:48:30 CET 2009


On Fri, 13 Nov 2009 17:58:30 +0000,
Stephen Nelson-Smith <sanelson at gmail.com> wrote:

> I think I'm having a major understanding failure.
> 
> So having discovered that my Unix sort breaks on the last day of the
> month, I've gone ahead and implemented a per log search, using heapq.
> 
> I've tested it with various data, and it produces a sorted logfile, per log.
> 
> So in essence this:
> 
> logs = [ LogFile( "/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009" ),
>          LogFile( "/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009" ),
>          LogFile( "/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009" ) ]
> 
> Gives me a list of LogFiles - each of which has a getline() method,
> which returns a tuple.
> 
> I thought I could merge iterables using Kent's recipe, or just with
> heapq.merge()
> 
> But how do I get from a method that can produce a tuple, to some
> mergeable iterables?
> 
> for log in logs:
>   l = log.getline()
>   print l
> 
> This gives me three loglines.  How do I get more?  Other than while True:

I'm not 100% sure I understand your needs and intention, but here is a try. Maybe what you actually want is:

for log in logs:
  for line in log:
    print line

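Meaning your log objects need to be iterable. To do this, the class must have an __iter__ method that returns an iterator. The simplest way is to write __iter__ as a generator that calls getline repeatedly and yields each result until the file is exhausted (or rename getline to next and have __iter__ return self). Then, when walking the log with for...in, Python will silently keep fetching lines until StopIteration is raised. This means the log must raise StopIteration when it is "empty" (a generator does this automatically when it returns), and __iter__ must "reset" it if you want to walk it more than once.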
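A minimal sketch of that idea, written for Python 3 (so `print()` is a function). To keep it self-contained, it wraps any iterable of lines rather than a gzip file. The class name `IterableLog` and the Apache-style `[...]` timestamp are assumptions. Because `__iter__` is a generator, each `for` loop gets a fresh iterator, and Python raises StopIteration for you when the generator returns:

```python
import re


class IterableLog:
    """Hypothetical stand-in for LogFile that wraps any iterable of log lines."""

    STAMP = re.compile(r'\[(.*?)\]')

    def __init__(self, lines):
        self.lines = lines

    def __iter__(self):
        # Generator method: every call returns a new iterator, and returning
        # from it ends the for loop (Python raises StopIteration for us).
        for line in self.lines:
            match = self.STAMP.search(line)
            if match is None:
                continue  # ignore lines without a [timestamp]
            yield (match.group(1), line)


log = IterableLog([
    '1.2.3.4 - - [04/Nov/2009:10:00:00 +0000] "GET / HTTP/1.1" 200 10\n',
    '1.2.3.4 - - [04/Nov/2009:10:00:05 +0000] "GET /a HTTP/1.1" 200 12\n',
])
for stamp, line in log:
    print(stamp)
```

Since `self.lines` is a list here, a second loop starts over from the top. With a real file, you would have to seek back to the start, which is the "reset" mentioned above.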
Another solution may be to subclass "file", since a file is precisely an iterator over lines, and you really do get your data from a file. Some work would still be needed on the timestamp issue (I haven't studied it in detail). Still, I guess this track may be worth a little study.
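Python 3 has no `file` type to subclass, but the same idea can be sketched by subclassing `gzip.GzipFile` and overriding `__iter__`. The class name `StampedGzipLog` and the UTF-8 decoding are assumptions. The sketch also uses the two-argument `iter(callable, sentinel)` form, which is one standard way to "get more" without a `while True:` loop: it keeps calling `readline()` until the sentinel `b''` comes back at end of file.

```python
import gzip
import io
import re


class StampedGzipLog(gzip.GzipFile):
    """Hypothetical GzipFile subclass whose iteration yields (stamp, line) pairs."""

    STAMP = re.compile(r'\[(.*?)\]')

    def __iter__(self):
        # iter(callable, sentinel) calls readline() until it returns b'' (end of file).
        for raw in iter(self.readline, b''):
            line = raw.decode('utf-8')  # assumes UTF-8 (or plain ASCII) logs
            match = self.STAMP.search(line)
            if match is not None:
                yield (match.group(1), line)


# An in-memory gzip stream stands in for a real quick_log.gz file.
data = gzip.compress(b'x [04/Nov/2009:10:00:00 +0000] hello\nno stamp here\n')
with StampedGzipLog(fileobj=io.BytesIO(data), mode='rb') as log:
    for stamp, line in log:
        print(stamp)
```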
Once your logs are iterable, you may subclass list for your overall log collection and give it an __iter__ method like:

    def __iter__(self):
        for log in list.__iter__(self):
            for line in log:
                yield line

(The trick is not from me.)
Then you can write:
    for line in my_log_collection:
        print line
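Here is a runnable Python 3 version of that collection. The class name `LogCollection` is hypothetical, and plain lists stand in for the iterable log objects. It simply walks the logs one after another and does not interleave their lines by timestamp.

```python
class LogCollection(list):
    """A list of iterable logs whose iteration walks every line of every log in turn."""

    def __iter__(self):
        # Call list.__iter__ explicitly: "for log in self" here would invoke
        # this same method again and recurse until RecursionError.
        for log in list.__iter__(self):
            for line in log:
                yield line


# Plain lists stand in for iterable log objects in this example.
my_log_collection = LogCollection([['a1\n', 'a2\n'], ['b1\n']])
for line in my_log_collection:
    print(line, end='')
```

`itertools.chain.from_iterable(logs)` does the same job without a subclass.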

> Of course tuples are iterables, but that doesn't help, as I want to
> sort on timestamp... so a list of tuples would be ok....  But how do I
> construct that, bearing in mind I am trying not to use up too much
> memory?
> 
> I think there's a piece of the jigsaw I just don't get.  Please help!
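On the sorting and memory question: if each log can be walked as `(timestamp, line)` pairs that are already in order, `heapq.merge()` combines them lazily. It keeps only one pending item per input in memory rather than building a list of everything. The following is a Python 3 sketch. `stamped` is a hypothetical helper, and it assumes Apache-style `[dd/Mon/yyyy:HH:MM:SS +zzzz]` stamps with English month names. Parsing the stamp into a `datetime` matters: the raw strings begin with the day of the month, so as plain text `01/Dec/2009` sorts before `30/Nov/2009`.

```python
import heapq
import re
from datetime import datetime

STAMP = re.compile(r'\[([^\]]+)\]')


def stamped(lines):
    """Yield (datetime, line) pairs, skipping lines without a [timestamp]."""
    for line in lines:
        match = STAMP.search(line)
        if match is None:
            continue
        # Assumed format: 30/Nov/2009:23:59:58 +0000. The offset is dropped,
        # so this assumes every log uses the same timezone.
        when = datetime.strptime(match.group(1).split()[0], '%d/%b/%Y:%H:%M:%S')
        yield (when, line)


log_a = ['a [30/Nov/2009:23:59:58 +0000]\n', 'a [01/Dec/2009:00:00:03 +0000]\n']
log_b = ['b [30/Nov/2009:23:59:59 +0000]\n', 'b [01/Dec/2009:00:00:01 +0000]\n']

# Each input must already be in time order; merge() only interleaves them.
for when, line in heapq.merge(stamped(log_a), stamped(log_b)):
    print(line, end='')
```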
> 
> The code in full is here:
> 
> import gzip, heapq, re
> 
> class LogFile:
>    def __init__(self, filename, date):
>        self.logfile = gzip.open(filename, 'r')
>        for logline in self.logfile:
>            self.line = logline
>            self.stamp = self.timestamp(self.line)
>            if self.stamp.startswith(date):
>                break
>        self.initialise_heap()
> 
>    def timestamp(self, line):
>        stamp = re.search(r'\[(.*?)\]', line).group(1)
>        return stamp
> 
>    def initialise_heap(self):
>        initlist=[]
>        self.heap=[]
>        for x in xrange(10):
>            self.line=self.logfile.readline()
>            self.stamp=self.timestamp(self.line)
>            initlist.append((self.stamp,self.line))
>        heapq.heapify(initlist)
>        self.heap=initlist
> 
> 
>    def getline(self):
>        self.line=self.logfile.readline()
>        stamp=self.timestamp(self.line)
>        heapq.heappush(self.heap, (stamp, self.line))
>        pop = heapq.heappop(self.heap)
>        return pop
> 
> logs = [ LogFile( "/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009" ),
>          LogFile( "/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009" ),
>          LogFile( "/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009" ) ]
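The per-file heap in the quoted `LogFile` could itself be written as a generator, which turns it into something `heapq.merge()` can consume directly. This is a Python 3 sketch, not the poster's tested code. The function name `sorted_lines`, the window size of 10 (taken from `initialise_heap`), and the `[...]` timestamp format are assumptions. Unlike the quoted class, it keeps the first line that matches the date (the quoted `__init__` stores it in `self.line`, and then `initialise_heap` overwrites it). It also stops cleanly at end of file instead of calling `re.search(...).group(1)` on an empty string. Like the original, it only fixes local disorder: a line that arrives more than about `window` lines late can still come out of order.

```python
import gzip
import heapq
import io
import itertools
import re
from datetime import datetime

STAMP = re.compile(r'\[([^\]]+)\]')


def parse_stamp(line):
    """Return the assumed [dd/Mon/yyyy:HH:MM:SS +zzzz] stamp as a datetime, or None."""
    match = STAMP.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1).split()[0], '%d/%b/%Y:%H:%M:%S')


def sorted_lines(fileobj, date, window=10):
    """Yield (datetime, line) pairs from a gzipped log, starting at `date`,
    using a small heap to fix lines that are only slightly out of order."""
    heap = []
    with gzip.open(fileobj, 'rt') as f:
        # Skip ahead to the first line for `date`, keeping that line.
        lines = itertools.dropwhile(lambda line: '[' + date not in line, f)
        for line in lines:
            when = parse_stamp(line)
            if when is None:
                continue
            heapq.heappush(heap, (when, line))
            if len(heap) > window:
                yield heapq.heappop(heap)
    # End of file: drain what is left, smallest first.
    while heap:
        yield heapq.heappop(heap)


# An in-memory gzip stream stands in for a real quick_log.gz file.
raw = gzip.compress(
    b'x [03/Nov/2009:23:59:59 +0000] skipped\n'
    b'x [04/Nov/2009:00:00:02 +0000] second\n'
    b'x [04/Nov/2009:00:00:01 +0000] first\n'
    b'x [04/Nov/2009:00:00:03 +0000] third\n'
)
for when, line in sorted_lines(io.BytesIO(raw), '04/Nov/2009'):
    print(line, end='')
```

Because each call returns an iterator of `(datetime, line)` pairs, the three logs from the post could then be combined with something like `heapq.merge(*(sorted_lines(path, '04/Nov/2009') for path in paths))`, where `paths` is a hypothetical list of the log filenames.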


--------------------------------
* la vita e estrany *

http://spir.wikidot.com/