[Tutor] Confused about lists...

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Fri, 23 Feb 2001 02:02:39 -0800 (PST)


Dear Chris,

Commenting is a good thing, but commenting every line is a little bit
of... overkill.  *grin*

Let's take a look at the code itself:

> f = open("/var/log/maillog", "r")
> for i in f.readlines():
>     if i[4] == 'open-sytems':
>         f += 1
>     print f
> f.close()


> I didnt wrap this line so it appears as it does in the log.
> What I was TRYING to do is get the above code to read a maillog and count
> the ammount of times 'open-systems' appears in the 4th field. I am trying

One thing I see is that you're using the name 'f' for two purposes: first,
as a handle to some file, and second, as a counter.  You might want to
separate the usage of these into two variables.  Let's call them 'f' for
the file, and 'c' for the counter.  If you have time, you might want to
think of slightly more descriptive names for your variables to make things
easier to read.


> to write a simple parser for a maillog to count things like total messages
> received/sent, connections/day, total time spent on connections, etc..
> I had 'f +=1' and 'print f' changed to use i instead of f. But that
> printed out TWICE as many lines as it should have. It seems to be just

Can you explain more what you mean by twice?  Oh!  I think I see what you
mean.  This part of the code might be what's causing the duplicate
printing:

>     if i[4] == 'open-sytems':
>         c += 1                 ## [some text changed from the original]
>     print c


In this case, whether or not we see an 'open-sytems', the program will
print the value of c.  This might lead to the following output:

###
0
0
1
1
2
2
2
3
###

which would look like it's doubling up.  You probably want to print out the
value of your counter only when it has just changed.  If so, try this:

     if i[4] == 'open-sytems':
         c += 1                  # let's change it to 'c'
         print c                 # because 'f' sounds like a 'file'
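
To see the difference between the two placements of the print, here is a
small sketch that records what each version would print.  The list of
fields is made-up stand-in data, 'counter_trace' is just a name I picked
for illustration, and I'm assuming the spelling 'open-systems' from your
description.  It is written for current Python, where print is a
function:

```python
# Made-up stand-in data: each item plays the role of the field being tested.
fields = ['other', 'open-systems', 'other', 'open-systems', 'other']

def counter_trace(fields, print_every_line):
    """Return, in order, the counter values that would be printed."""
    printed = []
    c = 0                              # the counter starts at zero
    for field in fields:
        if field == 'open-systems':
            c += 1
            if not print_every_line:
                printed.append(c)      # "print" only when c has changed
        if print_every_line:
            printed.append(c)          # "print" on every line, match or not
    return printed

print(counter_trace(fields, True))     # [0, 1, 1, 2, 2]
print(counter_trace(fields, False))    # [1, 2]
```

The first trace repeats values because it reports on every line; the
second reports each count exactly once.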




The other thing you'll need to check involves this part:

###
for i in f.readlines():
     if i[4] == 'open-sytems':
###

Could you show us an example of what your file would look like?  The only
thing that worries me is that 'i' will be a line, but 'i[4]' is going to
be a single character --- Python will not automatically pull columns out
of a string without some help.  For example, say that '/var/log/maillog'
contains the following line of text:

"Feb 19 04:24:27 c82114-a  sendmail[7269]: EAA07269: \
to=<dyoo@hkn.eecs.berkeley.edu>, delay=00:00:00, xdelay=00:00:00,\
mailer=esmtp, relay=hkn.eecs.berkeley.edu. [128.32.138.117],\
stat=Sent (EAA01739 Message accepted for delivery)"

i[4] looks like the '1' from the date 'Feb 19', and not 'c82114-a'.
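
You can check this at the interpreter.  A minimal sketch, using only the
start of the sample line above (the variable name 'line' is mine, and
the code is written for current Python):

```python
# The beginning of the sample log line, copied as a plain string.
line = "Feb 19 04:24:27 c82114-a  sendmail[7269]: EAA07269:"

# Indexing a string gives back a single character, not a column.
print(repr(line[4]))                 # '1'
print(len(line[4]))                  # 1

# So a comparison against a whole word can never be true.
print(line[4] == 'open-systems')     # False
```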

We need to tell Python how to break up a string into columns.  We could
separate things at commas or at spaces, but we need to give Python a
"delimiter" character that separates the columns.

You might want to play around with string.split():

###
>>> import string
>>> string.split('this is a short string', ' ')
['this', 'is', 'a', 'short', 'string']
>>> string.split('i,could,be a line,from a,,csv file', ',')
['i', 'could', 'be a line', 'from a', '', 'csv file']
###
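
Putting the pieces together, here is a rough sketch of the counter, not
a finished parser.  I'm guessing at several things: that the host name is
the 4th whitespace-separated column (index 3, since Python counts from
0), that 'open-systems' is the spelling you want, and that your log
lines look roughly like mine.  The function name 'count_host' and the
last two sample lines are invented for illustration.  It uses the
split() method on the string itself, written for current Python:

```python
def count_host(lines, host, column=3):
    """Count lines whose whitespace-separated field at `column` equals `host`."""
    count = 0
    for line in lines:
        fields = line.split()          # no argument: split on runs of whitespace
        if len(fields) > column and fields[column] == host:
            count += 1
    return count

# Stand-in data: the first line is shortened from the example above;
# the other two are invented so that there is something to count.
sample = [
    "Feb 19 04:24:27 c82114-a  sendmail[7269]: EAA07269: stat=Sent",
    "Feb 19 04:25:01 open-systems sendmail[7270]: EAA07270: stat=Sent",
    "Feb 19 04:26:13 open-systems sendmail[7271]: EAA07271: stat=Sent",
]

print(count_host(sample, 'open-systems'))   # 2
```

Calling split() with no argument treats the double space after
'c82114-a' as one separator, so no empty column sneaks in.  An open file
can be passed in place of the list, since looping over a file yields its
lines one at a time.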

I think I went a little fast though; if you have any questions, please
feel free to ask the tutor list again.  Good luck to you.