[Tutor] Introduction - log exercise

bob gailer bgailer at gmail.com
Tue Nov 17 21:26:20 CET 2009


Antonio de la Fuente wrote:
> Hi everybody,
>
> This is my first post here. I have started learning Python and I am new to
> programming; I have done just some bash scripting, not much.
> Thank you for the kind support and help that you provide on this list.
>
> This is my problem: I've got a log file that is filling up very quickly. The
> log file is made of blocks separated by a blank line, and some of these blocks
> contain a line "foo". I want to discard the blocks with that line inside them
> and create a new log file without those blocks, which will drastically reduce
> the size of the log file.
>
> The log file is gzipped, so I am going to use the gzip module, and I am going to
> pass the log file as an argument, so the sys module is required as well.
>
> I will read lines from the file with a 'for loop', and then I will check them for
> 'foo' matches with a 'while loop'. If a line matches, I (somehow) re-initialise
> the list, and if there is no match for foo, I will append the line to the list.
> When I get to a blank line (end of block), I write myList to an external file
> and start with another line.
>
> I am stuck on defining 'blank line', and I can't manage to get through the
> while loop; I would really appreciate any hint here.
> I don't expect the solution, as I think this is a great exercise to get my feet
> wet with Python, but if anyone thinks that this is the wrong way of solving the
> problem, please let me know.
>
>
> #!/usr/bin/python
>
> import sys
> import gzip
>
> myList = []
>
> # At the moment not bother with argument part as I am testing it with a
> # testing log file
> #fileIn = gzip.open(sys.argv[1])
>
> fileIn = gzip.open('big_log_file.gz', 'r')
> fileOut = open('outputFile', 'a')
>
> for line in fileIn:
>     while line != 'blank_line':
>         if line == 'foo':
>             Somehow re-initialise myList
>             break
>         else:
>             myList.append(line)
>     fileOut.writelines(myList)
>   
Observations:
0 - The other responses did not understand your desire to drop any
paragraph containing 'foo'.
1 - The while loop will run forever, as it keeps processing the same line:
nothing inside the loop changes line.
2 - In your sample log file the line with 'foo' starts with a tab, and every
line read from the file also keeps its trailing newline, so line == 'foo'
will always be false.
3 - Is the first line in the file Tue Nov 17 16:11:47 GMT 2009 or blank?
4 - Is the last line blank?
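
A small illustration of observation 2 and of one possible way to define a "blank line". The sample strings below are made up for the example (they are not taken from your log), and they assume the file is read as text:

```python
# Hypothetical sample lines, as they would come back from iterating a text file.
foo_line = '\tfoo\n'      # leading tab, trailing newline
blank_line = '\n'         # a "blank" line is still one character long

print(foo_line == 'foo')           # False: the tab and newline are part of the string
print(foo_line.strip() == 'foo')   # True: strip() removes surrounding whitespace
print(blank_line == '')            # False: the newline is still there
print(blank_line.isspace())        # True: the line holds only whitespace
print(''.isspace())                # False: the empty string has no characters at all
```

So a test such as line.isspace() is probably closer to what you mean by "blank line" than a comparison against a fixed string.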

Better logic:

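# open files (fileIn and fileOut, as in your script)
paragraph = []
keep = True
for line in fileIn:
  if line.isspace(): # blank line: end of paragraph
    if keep:
      fileOut.writelines(paragraph)
      fileOut.write(line) # keep the blank separator between paragraphs
    paragraph = []
    keep = True
  elif keep:
    # strip() drops the leading tab and the trailing newline
    if line.strip() == 'foo':
      keep = False
    else:
      paragraph.append(line)
# anticipating last line not blank, write last paragraph
if keep:
  fileOut.writelines(paragraph)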

# then use shutil.move() to rename the new file
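
The same logic as a self-contained sketch for Python 3, in case it helps to see the pieces together. The names filter_paragraphs and filter_log, the marker parameter and the '.tmp' suffix are invented for this example. It assumes the 'foo' line contains nothing else apart from surrounding whitespace, and that you want to keep the blank separator lines in the output:

```python
import gzip
import shutil


def filter_paragraphs(lines, marker='foo'):
    """Yield the lines of every paragraph that has no line equal to marker."""
    paragraph = []
    keep = True
    for line in lines:
        if line.isspace():              # blank line: end of paragraph
            if keep:
                yield from paragraph
                yield line              # keep the blank separator
            paragraph = []
            keep = True
        elif keep:
            if line.strip() == marker:  # ignore the tab and the newline
                keep = False
            else:
                paragraph.append(line)
    if keep:                            # last paragraph, if no trailing blank line
        yield from paragraph


def filter_log(src, dst, marker='foo'):
    """Filter the gzipped log src into a temporary file, then move it to dst."""
    tmp = dst + '.tmp'                  # hypothetical temporary name
    # 'rt' makes gzip.open return text lines rather than bytes
    with gzip.open(src, 'rt') as file_in, open(tmp, 'w') as file_out:
        file_out.writelines(filter_paragraphs(file_in, marker))
    shutil.move(tmp, dst)
```

Something like filter_log('big_log_file.gz', 'outputFile') would then write the reduced log. Keeping the paragraph logic in its own generator means it can be tried on a plain list of strings before touching the real file.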


-- 
Bob Gailer
Chapel Hill NC
919-636-4239

