[Tutor] Introduction - log exercise

Antonio de la Fuente toni at muybien.org
Thu Nov 19 01:01:19 CET 2009


* Antonio de la Fuente <toni at muybien.org> [2009-11-17 16:58:08 +0000]:

> Date: Tue, 17 Nov 2009 16:58:08 +0000
> From: Antonio de la Fuente <toni at muybien.org>
> To: Python Tutor mailing list <tutor at python.org>
> Subject: [Tutor] Introduction - log exercise
> Organization: (muybien.org)
> User-Agent: Mutt/1.5.20 (2009-06-14)
> Message-ID: <20091117165639.GA3411 at cateto>
> 
> Hi everybody,
> 
> This is my first post here. I have started learning Python and I am new to
> programming; I have only done a little bash scripting.
> Thank you for the kind support and help that you provide in this list.
> 
> This is my problem: I've got a log file that is filling up very quickly. The
> log file is made of blocks separated by a blank line, and some of these blocks
> contain a line "foo". I want to discard the blocks that contain that line and
> create a new log file without them, which will drastically reduce the size of
> the log file.
> 
> The log file is gzipped, so I am going to use the gzip module, and I am going
> to pass the log file as an argument, so the sys module is required as well.
> 
> I will read lines from the file with a 'for loop', and then I will check them
> for 'foo' matches with a 'while loop'. If a line matches, I (somehow)
> re-initialise the list; if there is no match for foo, I append the line to the
> list. When I get to a blank line (end of block), I write myList to an external
> file and start again with the next line.
> 
> I am stuck on defining 'blank line': I can't manage to get through the while
> loop. I would really appreciate any hint here.
> I don't expect the solution, as I think this is a great exercise to get my feet
> wet with Python, but if anyone thinks that this is the wrong way of solving the
> problem, please let me know.
> 
> 
> #!/usr/bin/python
> 
> import sys
> import gzip
> 
> myList = []
> 
> # For now I am not bothering with the argument part, as I am testing with a
> # test log file
> #fileIn = gzip.open(sys.argv[1])
> 
> fileIn = gzip.open('big_log_file.gz', 'r')
> fileOut = open('outputFile', 'a')
> 
> for line in fileIn:
>     while line != 'blank_line':
>         if line == 'foo':
>             # Somehow re-initialise myList
>             break
>         else:
>             myList.append(line)
>     fileOut.writelines(myList)
> 
> 
> # Somehow rename outputFile to big_log_file.gz
> 
> fileIn.close()
> fileOut.close()
> 
> -------------------------------------------------------------
> 
> The log file will be filled with entries like these:
> 
> 
> Tue Nov 17 16:11:47 GMT 2009
> 	bladi bladi bla
> 	tarila ri la
> 	patatin pataton
> 	tatati tatata
> 
> Tue Nov 17 16:12:58 GMT 2009
> 	bladi bladi bla
> 	tarila ri la
> 	patatin pataton
> 	foo
> 	tatati tatata
> 
> Tue Nov 17 16:13:42 GMT 2009
> 	bladi bladi bla
> 	tarila ri la
> 	patatin pataton
> 	tatati tatata
> 
> 
> etc., etc., etc.
> ..............................................................
> 
> Again, thank you.
> 
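On the 'blank line' question in the quoted message, here is a small sketch of
one common test. The helper name is_blank and the sample lines are only for
illustration. The point is that a line read from a file still carries its
trailing newline, so comparing it against a fixed string such as 'blank_line'
or '' never matches.

```python
def is_blank(line):
    """Return True for a line that holds only whitespace (or nothing)."""
    # strip() removes the newline, tabs and spaces; an empty result is falsy.
    return not line.strip()


# Lines as they arrive from a file still end in '\n'.
samples = ['\n', '   \t\n', '\tfoo\n', 'Tue Nov 17 16:11:47 GMT 2009\n']
blank_flags = [is_blank(s) for s in samples]
```

str.isspace() gives the same answer for any line that still has its newline;
the two only differ for the empty string, where ''.isspace() is False.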
This is how, with your help, I finally wrote the script.
The way I compress the file at the end of the script, by opening the files
again, didn't feel right, but I couldn't make it work any other way, and I
was very tired at the time.
On a first test it seems to work, but I will test it more thoroughly tomorrow.
Thank you all.
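On the compression step, one possible tidier form is sketched below. It
assumes Python 2.7 or later (where gzip files work in a 'with' block), and the
function name compress_file is made up for the example. shutil.copyfileobj
does the copying, and the 'with' blocks close both files even if something
fails halfway.

```python
import gzip
import shutil


def compress_file(plain_path, gz_path):
    """Copy the contents of plain_path into a gzip-compressed gz_path."""
    # Binary mode on both sides, so the bytes are copied unchanged.
    with open(plain_path, 'rb') as f_in:
        with gzip.open(gz_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
```

The plain file has to be closed (or at least flushed) by whoever wrote it
before this is called, otherwise the last buffered lines may be missing from
the compressed copy.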

#!/usr/bin/python

# Importing modules that I'm going to need
import sys
import gzip
import os

# Only one argument is passed to the script, the gzip file.
# Check this first, before sys.argv[1] is used anywhere.
if len(sys.argv) != 2:
    print 'Usage: log_exercise01.py <gzip logfile>'
    sys.exit(1)

# Initialising paragraph list
paragraph = []
# Flag to signal which paragraphs to keep and which ones to discard.
keep = True

# Opening argument file, and a temporary output file. Mode 'w' so that
# nothing left over from a previous run ends up in the new log.
fileIn = gzip.open(sys.argv[1])
fileOut = open('outputFile', 'w')

# Main loop
for line in fileIn:
    # If a blank line in log file (end of paragraph)
    if line.isspace():
        # I append a blank line to list to keep the format
        paragraph.append('\n')
        # If true write paragraph to file, keeping formatting with the
        # "".join trick
        if keep:
            fileOut.write("".join(paragraph))
        # Re-initialise list and flag
        paragraph = []
        keep = True
    # Else append line to paragraph list and, if the line stripped of
    # the initial tab is 'foo', set flag keep to False to discard
    # the paragraph.
    else:
        paragraph.append(line)
        if line.strip() == 'foo':
            keep = False

# Write the last paragraph if the log does not end with a blank line.
if keep and paragraph:
    fileOut.write("".join(paragraph))

# Close both files, so the output is flushed before it is read back.
fileIn.close()
fileOut.close()

# Compressing the file that has been created, over the original log.
f_in = open('outputFile', 'r')
f_out = gzip.open(sys.argv[1], 'w')
f_out.writelines(f_in)
f_in.close()
f_out.close()

# Deleting the temporary file created by the script.
os.remove('outputFile')
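
The same keep/discard idea can also be sketched as a function that works on
any sequence of lines, which makes it easy to try out without a real log
file. This is only a sketch: the name kept_paragraphs and the marker parameter
are made up for the example. Unlike the script above, it collapses runs of
several blank lines into one, and it also yields a final block that has no
blank line after it.

```python
def kept_paragraphs(lines, marker='foo'):
    """Yield each blank-line-separated block with no line equal to marker."""
    paragraph = []
    keep = True
    for line in lines:
        if line.isspace():
            # End of block: emit it with a blank separator, then start afresh.
            if keep and paragraph:
                yield ''.join(paragraph) + '\n'
            paragraph = []
            keep = True
        else:
            paragraph.append(line)
            if line.strip() == marker:
                keep = False
    # A final block with no blank line after it would otherwise be lost.
    if keep and paragraph:
        yield ''.join(paragraph)


sample = [
    'Tue Nov 17 16:11:47 GMT 2009\n', '\tbladi bladi bla\n', '\n',
    'Tue Nov 17 16:12:58 GMT 2009\n', '\tfoo\n', '\ttatati tatata\n', '\n',
    'Tue Nov 17 16:13:42 GMT 2009\n', '\tpatatin pataton\n',
]
result = list(kept_paragraphs(sample))
```

With a real log the lines would come from the gzip file instead of a list. On
Python 3 that means opening it with gzip.open(path, 'rt') so that the lines
are text rather than bytes.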

-- 
-----------------------------
Antonio de la Fuente Martínez
E-mail: toni at muybien.org
-----------------------------

In a hierarchical organization, the higher the level, the greater the
confusion.
		-- Dow's Law.

