[Tutor] Introduction - log exercise

Dave Angel davea at ieee.org
Tue Nov 17 22:30:43 CET 2009


Antonio de la Fuente wrote:
> Hi everybody,
>
> This is my first post here. I have started learning python and I am new to
> programming, with just some bash scripting, not much.
> Thank you for the kind support and help that you provide in this list.
>
> This is my problem: I've got a log file that is filling up very quickly, this
> log file is made of blocks separated by a blank line, inside these blocks there
> is a line "foo", I want to discard blocks with that line inside it, and create a
> new log file, without those blocks, that will reduce drastically the size of the
> log file. 
>
> The log file is gzipped, so I am going to use the gzip module, and I am going to pass
> the log file as an argument, so sys module is required as well.
>
> I will read lines from file, with the 'for loop', and then I will check them for
> 'foo' matches with a 'while loop', if matches I (somehow) re-initialise the
> list, and if there is no matches for foo, I will append line to the list. When I
> get to a blank line (end of block), write myList to an external file. And start
> with another line.
>
> I am stuck with defining 'blank line', and I can't manage to get through the while
> loop; I would really appreciate any hint here.
> I don't expect the solution, as I think this is a great exercise to get wet
> with python, but if anyone thinks that this is the wrong way of solving the
> problem, please let me know.
>
>
> #!/usr/bin/python
>
> import sys
> import gzip
>
> myList = []
>
> # At the moment not bother with argument part as I am testing it with a
> # testing log file
> #fileIn = gzip.open(sys.argv[1])
>
> fileIn = gzip.open('big_log_file.gz', 'r')
> fileOut = open('outputFile', 'a')
>
> for line in fileIn:
>     while line != 'blank_line':
>         if line == 'foo':
>             Somehow re-initialise myList
>             break
>         else:
>             myList.append(line)
>     fileOut.writelines(myList)
>
>
> Somehow rename outputFile with big_log_file.gz
>
> fileIn.close()
> fileOut.close()
>
> -------------------------------------------------------------
>
> The log file will be filled with:
>
>
> Tue Nov 17 16:11:47 GMT 2009
> 	bladi bladi bla
> 	tarila ri la
> 	patatin pataton
> 	tatati tatata
>
> Tue Nov 17 16:12:58 GMT 2009
> 	bladi bladi bla
> 	tarila ri la
> 	patatin pataton
> 	foo
> 	tatati tatata
>
> Tue Nov 17 16:13:42 GMT 2009
> 	bladi bladi bla
> 	tarila ri la
> 	patatin pataton
> 	tatati tatata
>
>
> etc, etc ,etc
> ..............................................................
>
> Again, thank you.
>
>   
You've got some good ideas, and I'm going to give you hints, rather than 
just writing it for you, as you suggested.

First, let me point out that there are advanced features in Python that 
could make this a very short program, but one that'd be hard for a 
beginner to understand.  I'll give you the words, but recommend that you 
not try it at this time.  If you were to wrap the file in a generator 
that returned you a "paragraph" at a time, the same way as it's now 
returning a line at a time, then the loop would be simply a for-loop on 
that generator, checking each paragraph for whether it contained "foo" 
and if not, writing it to the output.
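
For reference only, here is a minimal sketch of that generator idea. It 
assumes Python 3, and the names paragraphs, keep_blocks and marker are 
made up for illustration; they are not from the original post.

```python
def paragraphs(lines):
    """Yield each run of consecutive non-blank lines as a list of lines."""
    block = []
    for line in lines:
        if line.strip():          # non-blank: still inside the current block
            block.append(line)
        elif block:               # a blank line closes a non-empty block
            yield block
            block = []
    if block:                     # the input may not end with a blank line
        yield block


def keep_blocks(lines, marker="foo"):
    """Return the lines of every block that has no line equal to marker."""
    kept = []
    for block in paragraphs(lines):
        if not any(line.strip() == marker for line in block):
            kept.extend(block)
            kept.append("\n")     # put the separating blank line back
    return kept
```

Any iterable of lines works as input, so an open file object could be 
passed in place of a list.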


But you can also do it without using advanced features, and that's what 
I'm going to try to outline.

Two things you'll be testing each line for:  is it blank, and is it "foo".
   if line.isspace()  will test if a line is whitespace only, as Wayne 
pointed out.
   if line == "foo" will test if a line has exactly "foo" in it.  But since 
you apparently have leading whitespace and trailing newlines, and if 
they're irrelevant, then you might want
   if line.strip() == "foo"
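
The difference between these tests is easy to see on a line shaped like 
the ones in the sample log. This is a small sketch assuming Python 3; 
the helper names is_blank and is_foo are invented for the example.

```python
def is_blank(line):
    """True when the line holds only whitespace (spaces, tabs, newline)."""
    return line.isspace()


def is_foo(line):
    """True when the line is 'foo' once surrounding whitespace is removed."""
    return line.strip() == "foo"


line = "\tfoo\n"                  # a tab, the text, and the trailing newline
print(line == "foo")              # False: the tab and newline get in the way
print(is_foo(line))               # True
print(is_blank("\n"))             # True
print(is_blank("\tfoo\n"))        # False
```

Note that isspace() returns False for a completely empty string, but a 
line read from a file always carries at least its newline, so that case 
should not come up while looping over the file.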

I would start by just testing for blank lines.   Try replacing all blank 
lines with "***** blank line ****"  and print each
line. See whether the output makes sense.  If it does, go on to the next 
step.
   for line in ....
        if line-is-blank
             line-is-fancy-replacement
        print line
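
One way this first step might look, as a sketch assuming Python 3 (where 
print is a function). The function name mark_blank_lines and the short 
sample list are made up; the real program would loop over the file 
instead.

```python
def mark_blank_lines(lines, marker="***** blank line ****"):
    """Return the lines with every whitespace-only line replaced by marker."""
    result = []
    for line in lines:
        if line.isspace():
            line = marker + "\n"  # the fancy replacement
        result.append(line)
    return result


sample = [
    "Tue Nov 17 16:11:47 GMT 2009\n",
    "\tbladi bladi bla\n",
    "\n",
    "Tue Nov 17 16:12:58 GMT 2009\n",
]
for line in mark_blank_lines(sample):
    print(line, end="")           # lines already end with a newline
```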

Now, instead of just printing the line, add it to a list object.  Create 
an object called paragraph (a list, rather than a file) as an empty list 
object, before the for loop.
Inside the for loop, if the line is non-empty, add it to the paragraph.  
If the line is empty, then print the paragraph (with something before 
and after it,
so you can see what came from each print stmt).  Then blank it (paragraph 
= []).
Check whether this result looks good, and if so, continue on.
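
A sketch of this second stage, again assuming Python 3. The function 
name show_paragraphs, the marker text and the returned count are 
illustrative choices, not part of the original advice.

```python
def show_paragraphs(lines):
    """Print each paragraph between marker lines; return how many were printed."""
    paragraph = []                # lines collected for the current block
    count = 0
    for line in lines:
        if not line.isspace():
            paragraph.append(line)
            continue
        if paragraph:             # ignore runs of several blank lines
            print("--- paragraph start ---")
            print("".join(paragraph), end="")
            print("--- paragraph end ---")
            count += 1
        paragraph = []            # blank it, ready for the next block
    return count
```

As written, a final block that is not followed by a blank line is never 
printed, which is worth checking for when looking at the output.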

Next version of the code: whenever you have a non-blank line, in 
addition to adding it to the list, also check it for whether it's  
equal-foo.
If so, set a flag.  When printing the paragraph, skip the printing if the 
flag is set.  Remember that you'll have to clear this flag each time you 
blank the paragraph, both before the loop, and in the middle of the loop.
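
The flag version could be sketched like this, assuming Python 3. It 
collects the surviving lines in a list instead of printing them so the 
result is easy to inspect; filter_blocks, found and kept are made-up 
names.

```python
def filter_blocks(lines, marker="foo"):
    """Return the lines of all blocks that contain no line equal to marker."""
    paragraph = []
    found = False                 # cleared before the loop...
    kept = []
    for line in lines:
        if line.isspace():
            if paragraph and not found:
                kept.extend(paragraph)
                kept.append("\n")
            paragraph = []
            found = False         # ...and again each time the list is blanked
        else:
            paragraph.append(line)
            if line.strip() == marker:
                found = True
    if paragraph and not found:   # last block, if no blank line follows it
        kept.extend(paragraph)
    return kept
```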

Once this makes sense, you can worry about actually writing the output 
to a real file, maybe compressing it, maybe doing deletes and renames
as appropriate.   You probably don't need the shutil module; the os 
module probably has enough functions for this.
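
As a rough sketch of that last step, assuming Python 3.3 or later: the 
kept lines are written to a compressed temporary file, which then takes 
the place of the original. The function name rewrite_log and the ".tmp" 
suffix are assumptions for the example, and os.replace is used here in 
place of a separate delete and rename.

```python
import gzip
import os


def rewrite_log(path, kept_lines):
    """Write kept_lines to a gzip file, then move it over the file at path."""
    tmp_path = path + ".tmp"      # hypothetical temporary file name
    with gzip.open(tmp_path, "wt") as out:   # "wt" = write in text mode
        out.writelines(kept_lines)
    os.replace(tmp_path, path)    # overwrites path if it already exists
```

Writing to a separate file first means the original log stays untouched 
until the new one is complete.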

At any of these stages, if you get stuck, call for help.  But your code 
will be only as complex as that stage needs, so we can find one bug at a 
time.

DaveA


