[Tutor] Logfile multiplexing

Stephen Nelson-Smith sanelson at gmail.com
Tue Nov 10 14:25:55 CET 2009


Hi Kent,

> One error is that the initial line will be the same as the first
> response from getline(). So you should call getline() before trying to
> access a line. Also you may need to filter all lines - what if there
> is jitter at midnight, or the log rolls over before the end.

Well, I definitely have to filter two logfiles per day anyway, since the
logs rotate at 0400. Or do you mean something else?
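Because of the 0400 rotation, one calendar day's entries are split
across two consecutive files. Here is a minimal sketch of reading both
in order and checking every line against the date. It assumes
Apache-style lines where whitespace-separated fields 3 and 4 hold the
bracketed timestamp. The function name lines_for_date is my own, not
from any library. Testing each line, instead of stopping at the first
one that doesn't match, should also cope with the midnight jitter Kent
mentions.

```python
import gzip

def lines_for_date(filenames, date_prefix):
    """Yield (stamp, line) for every line, across all the given gzipped
    logs in order, whose timestamp starts with date_prefix."""
    for name in filenames:
        # 'rt' gives str lines in Python 3; each file is closed when done
        with gzip.open(name, "rt") as logfile:
            for line in logfile:
                # assumption: Apache-style line, fields 3-4 are
                # "[dd/Mon/yyyy:hh:mm:ss" and "+zzzz]"
                stamp = " ".join(line.split()[3:5])
                # test every line rather than stopping at the first miss,
                # so out-of-order lines near a boundary are still caught
                if stamp.startswith(date_prefix):
                    yield stamp, line
```

Usage would be something like
lines_for_date(["access_log-20091104.gz", "access_log-20091105.gz"], "[04/Nov/2009"),
where the first file name is only illustrative.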

> More important, though, you are pretty much writing your own iterator
> without using the iterator protocol. I would write this as:
>
> class LogFile:
>   def __init__(self, filename, date):
>       self.logfile = gzip.open(filename, 'r')
>       self.date = date
>
>   def __iter__(self)
>       for logline in self.logfile:
>           stamp = self.timestamp(logline)
>           if stamp.startswith(date):
>               yield (stamp, logline)
>
>   def timestamp(self, line):
>       return " ".join(self.line.split()[3:5])

Right - I think I understand that.
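For my own reference, this is what "the iterator protocol" amounts to
here. A for loop calls iter() on the object, which invokes __iter__.
Because __iter__ contains yield, it is a generator function, so it
returns a generator. The loop then calls next() on that generator until
StopIteration is raised. The toy class below (Countdown is just an
illustration, not the log reader) is written for Python 3. In Python
2.4 you would call it.next(), because the next() builtin only arrived
in 2.6.

```python
class Countdown:
    """Toy iterable: __iter__ is a generator function, so each call
    returns a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1

c = Countdown(3)
it = iter(c)       # calls c.__iter__(), which returns a generator
first = next(it)   # 3
rest = list(it)    # [2, 1]; this generator is now exhausted
again = list(c)    # [3, 2, 1]; a new iter() call starts over
```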

Working from that, I now have:

import gzip

class LogFile:
    def __init__(self, filename, date):
        self.logfile = gzip.open(filename, 'r')
        self.date = date

    def __iter__(self):
        for logline in self.logfile:
            stamp = self.timestamp(logline)
            if stamp.startswith(date):
                yield (stamp, logline)

    def timestamp(self, line):
        return " ".join(self.line.split()[3:5])

l = LogFile("/home/stephen/access_log-20091105.gz", "[04/Nov/2009")

When I import the module and loop over l, I get:

Python 2.4.3 (#1, Jan 21 2009, 01:11:33)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import kent
>>> kent.l
<kent.LogFile instance at 0x2afb05142bd8>
>>> dir(kent.l)
['__doc__', '__init__', '__iter__', '__module__', 'date', 'logfile',
'timestamp']
>>> for line in kent.l:
...   print line
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "kent.py", line 10, in __iter__
    stamp = self.timestamp(logline)
  File "kent.py", line 15, in timestamp
    return " ".join(self.line.split()[3:5])
AttributeError: LogFile instance has no attribute 'line'
>>> for stamp,line in kent.l:
...   print stamp,line
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "kent.py", line 10, in __iter__
    stamp = self.timestamp(logline)
  File "kent.py", line 15, in timestamp
    return " ".join(self.line.split()[3:5])
AttributeError: LogFile instance has no attribute 'line'
>>> for stamp,logline in kent.l:
...   print stamp,logline
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "kent.py", line 10, in __iter__
    stamp = self.timestamp(logline)
  File "kent.py", line 15, in timestamp
    return " ".join(self.line.split()[3:5])
AttributeError: LogFile instance has no attribute 'line'
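The traceback points at timestamp(). That method receives the line as
its line parameter but then reads self.line, an attribute that is never
set. The loop variable names in the for statement make no difference,
which is why all three attempts fail the same way. __iter__ has a
similar slip: it tests the bare name date, which is not defined there.
Once the first bug is fixed, that would raise NameError unless it is
changed to self.date. Below is a sketch with both fixes, written for
Python 3, where gzip.open needs 'rt' to produce str lines. It is my
correction, not Kent's original.

```python
import gzip

class LogFile:
    """Iterate over (stamp, line) pairs for the lines of a gzipped
    access log whose timestamp starts with the given date prefix."""

    def __init__(self, filename, date):
        # 'rt' opens in text mode so lines are str, not bytes (Python 3)
        self.logfile = gzip.open(filename, "rt")
        self.date = date

    def __iter__(self):
        for logline in self.logfile:
            stamp = self.timestamp(logline)
            # use the instance attribute, not an undefined bare name
            if stamp.startswith(self.date):
                yield (stamp, logline)

    def timestamp(self, line):
        # work on the argument, not self.line; in Apache common/combined
        # format, fields 3-4 hold "[dd/Mon/yyyy:hh:mm:ss +zzzz]"
        return " ".join(line.split()[3:5])
```

The instance wraps a single open file, so it can only be looped over
once. To loop again, create a new LogFile. In the original Python 2.4
code, gzip.open(filename, 'r') already yields str lines, so only the
two name fixes are needed there.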


> You are reading through the entire file on load because your timestamp
> check is failing. You are filtering out the whole file and returning
> just the last line. Check the dates you are supplying vs the actual
> data - they don't match.

Yes, I found that out in the end!  Thanks!
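For anyone hitting the same problem: one way to make a mismatch like
this easier to spot is to parse the timestamp into a datetime.date and
compare dates, instead of matching a hand-written prefix such as
"[04/Nov/2009". This sketch assumes the Apache
[dd/Mon/yyyy:hh:mm:ss +zzzz] layout. %b parses English month
abbreviations under the default C locale. The helper name stamp_date is
hypothetical.

```python
from datetime import date, datetime

def stamp_date(stamp):
    """Convert "[04/Nov/2009:10:15:32 +0000]" to date(2009, 11, 4).

    Only the date/time part is parsed; the UTC offset is ignored.
    Raises ValueError if the stamp is not in the expected layout."""
    # drop the leading "[" and the trailing " +zzzz]" offset
    text = stamp.lstrip("[").split()[0]
    return datetime.strptime(text, "%d/%b/%Y:%H:%M:%S").date()

wanted = date(2009, 11, 4)
print(stamp_date("[04/Nov/2009:10:15:32 +0000]") == wanted)  # True
print(stamp_date("[05/Nov/2009:03:59:59 +0000]") == wanted)  # False
```

Printing stamp_date() for the first and last few lines of a file is a
quick way to see which dates that file actually contains.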

S.

