[Tutor] Confused about lists...

Bruce Sass bsass@freenet.edmonton.ab.ca
Fri, 23 Feb 2001 03:23:44 -0700 (MST)


Here is a revised version of your program, with pointers...


# you're gonna need this
import string

# It is a good idea to test your code against a number of contrived
# data sets, so you can make sure it does what you want in all cases,
# e.g. (for this program): no 'open-systems' on any line, on all
# lines, no field 4, etc.
f = open("test/data1", "r")

# Use descriptive variable names and initialize them.
counter = 0
for i in f.readlines():
#  ^^^ i will be a line, i.e., a string of characters
# you want a list of words so you need to split up the string...
    if string.strip(i) and string.split(i)[3] == 'open-systems':
        counter = counter + 1
print counter

The "if" condition needs some explaining.

The first element, string.strip(i), removes any leading and trailing
whitespace from the line; it is there to catch any blank lines.  All
the log files I looked at on my system ended with "\n\n", so the last
line is "\n", and stripping it gives "" (i.e., "false").
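
A quick way to see this for yourself is the sketch below.  It uses the
string-method spelling, i.strip(), rather than string.strip(i), and
present-day print() syntax; the sample lines are made up for
illustration and are not taken from a real log.

```python
# Hypothetical sample lines; the second one stands in for the blank
# line found at the end of a log file.
lines = [
    "Feb 23 00:35:09 open-systems postfix/cleanup[11489]: A11D516C\n",
    "\n",
]

# strip() removes leading and trailing whitespace, so a line that
# holds only a newline becomes the empty string.
stripped = [line.strip() for line in lines]

# An empty string counts as false in a condition; any non-empty
# string counts as true.
truth_values = [bool(s) for s in stripped]
print(truth_values)
```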

The next element is a little more complex: string.split(i) returns a
list of words, and the [3] gets at the fourth item in the list (indices
start at zero).  I could have done...

	wordlist = string.split(i)
	word = wordlist[3]

...then tested for...

	word == 'open-systems'

...and that may even be a better way to handle the situation,
depending on what you think of this next bit...

So, the condition is true if the line is not blank and the fourth word
is 'open-systems'.  It works, but it is a hack!  It is a hack because
it relies on the way Python evaluates the "and" and "or" operators:
the left operand is evaluated first, and the right operand is skipped
whenever the left one already decides the result.  To see the effect,
just reverse the 'split' and 'strip' terms so that string.split(i)[3]
is evaluated first, then run the program on a data set with a blank
line in it.  Keep in mind that, logically, (A and B) is the same as
(B and A).  I'll leave it up to you to find a way to fix the bad code
(homework ;).
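
If you get stuck, here is one possible approach, offered only as a
sketch and not as the one right answer.  It is written with string
methods and print() as in current Python, not the string-module style
used above.  The names fourth_word_matches and fragile_reversed are
made up for this example, and the sample line is a shortened,
illustrative log entry.

```python
def fourth_word_matches(line, target="open-systems"):
    """Return True if the fourth whitespace-separated word equals target."""
    words = line.split()
    # Test the length explicitly, so blank or short lines are handled
    # without depending on the order of terms in a condition.
    if len(words) < 4:
        return False
    return words[3] == target


def fragile_reversed(line):
    # The same two terms as the original condition, but reversed: the
    # index is evaluated first, so a blank line raises IndexError.
    return line.split()[3] == "open-systems" and bool(line.strip())


sample = "Feb 23 00:35:09 open-systems postfix/cleanup[11489]: A11D516C\n"
blank = "\n"

print(fourth_word_matches(sample))
print(fourth_word_matches(blank))

# Show what happens when the reversed condition meets a blank line.
try:
    fragile_reversed(blank)
    reversed_result = "no error"
except IndexError:
    reversed_result = "IndexError"
print(reversed_result)
```

Splitting once and checking the length also covers lines that have
fewer than four fields, which the strip test alone does not catch.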


- Bruce

On Fri, 23 Feb 2001, Chris Watson wrote:
> Feb 23 00:35:09 open-systems postfix/cleanup[11489]: A11D516C: message-id=<foo@foo.com>
> I didn't wrap this line so it appears as it does in the log.
> What I was TRYING to do is get the above code to read a maillog and count
> the amount of times 'open-systems' appears in the 4th field. I am trying
> to write a simple parser for a maillog to count things like total messages
> received/sent, connections/day, total time spent on connections, etc..
> I had 'f +=1' and 'print f' changed to use i instead of f. But that
> printed out TWICE as many lines as it should have. It seems to be just
> counting the lines in the log file ignoring my i[4] == 'open-systems'
> line. 5 lines of code and apparently I only know how open and close work
> for SURE. heh. Any pointers? And any pointers to some good docs on lists
> and loops? I appreciate the advice.