[Tutor] Introduction - log exercise
Antonio de la Fuente
toni at muybien.org
Thu Nov 19 01:01:19 CET 2009
* Antonio de la Fuente <toni at muybien.org> [2009-11-17 16:58:08 +0000]:
> Date: Tue, 17 Nov 2009 16:58:08 +0000
> From: Antonio de la Fuente <toni at muybien.org>
> To: Python Tutor mailing list <tutor at python.org>
> Subject: [Tutor] Introduction - log exercise
> Organization: (muybien.org)
> User-Agent: Mutt/1.5.20 (2009-06-14)
> Message-ID: <20091117165639.GA3411 at cateto>
>
> Hi everybody,
>
> This is my first post here. I have started learning Python and I am new to
> programming; I have only done a little bash scripting.
> Thank you for the kind support and help that you provide on this list.
>
> This is my problem: I have a log file that is filling up very quickly. The
> log file is made of blocks separated by a blank line, and some of these
> blocks contain a line "foo". I want to discard the blocks that contain that
> line and create a new log file without them, which will drastically reduce
> the size of the log file.
>
> The log file is gzipped, so I am going to use the gzip module, and I am going
> to pass the log file as an argument, so the sys module is required as well.
>
> I will read lines from the file with a 'for' loop, and then check them for
> 'foo' matches with a 'while' loop. If a line matches, I (somehow) re-initialise
> the list; if there is no match for foo, I append the line to the list. When I
> get to a blank line (end of block), I write myList to an external file and
> start with the next line.
>
> I am stuck on defining 'blank line': I can't get through the while
> loop. I would really appreciate any hint here.
> I don't expect the solution, as I think this is a great exercise to get my
> feet wet with Python, but if anyone thinks that this is the wrong way of
> solving the problem, please let me know.
>
>
> #!/usr/bin/python
>
> import sys
> import gzip
>
> myList = []
>
> # At the moment I am not bothering with the argument part, as I am
> # testing with a test log file
> #fileIn = gzip.open(sys.argv[1])
>
> fileIn = gzip.open('big_log_file.gz', 'r')
> fileOut = open('outputFile', 'a')
>
> for line in fileIn:
>     while line != 'blank_line':
>         if line == 'foo':
>             # Somehow re-initialise myList
>             break
>         else:
>             myList.append(line)
>     fileOut.writelines(myList)
>
>
> # Somehow rename outputFile with big_log_file.gz
>
> fileIn.close()
> fileOut.close()
>
> -------------------------------------------------------------
>
> The log file is filled with blocks like these:
>
>
> Tue Nov 17 16:11:47 GMT 2009
> bladi bladi bla
> tarila ri la
> patatin pataton
> tatati tatata
>
> Tue Nov 17 16:12:58 GMT 2009
> bladi bladi bla
> tarila ri la
> patatin pataton
> foo
> tatati tatata
>
> Tue Nov 17 16:13:42 GMT 2009
> bladi bladi bla
> tarila ri la
> patatin pataton
> tatati tatata
>
>
> etc., etc., etc.
> ..............................................................
>
> Again, thank you.
>
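On the 'blank line' question in the quoted message: a line read from a file keeps its trailing newline, so a blank line arrives as '\n' (possibly with other whitespace in front), and a line containing foo arrives as 'foo\n', never as plain 'foo'. A minimal sketch of the two tests follows. The helper names is_blank and is_marker are made up for illustration, and the sample lines are taken from the log excerpt above.

```python
def is_blank(line):
    # True for '\n', '  \n', '\t\n' and for the empty string.
    return line.strip() == ''


def is_marker(line, marker='foo'):
    # Compare after removing surrounding whitespace and the newline.
    return line.strip() == marker


sample = ['bladi bladi bla\n', 'foo\n', '\n', 'tatati tatata\n']
blank_flags = [is_blank(line) for line in sample]
marker_flags = [is_marker(line) for line in sample]
print(blank_flags)
print(marker_flags)
```

With tests like these a plain 'if' inside the 'for' loop is enough; a 'while' loop on the same line never ends, because the line does not change inside it.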
This is how, with your help, I finally wrote the script.
The way I compress the file at the end of the script, by opening the files
again, didn't feel right, but I couldn't make it work any other way, and I
was very tired at the time.
A first test seems to work, but I will test it more thoroughly tomorrow.
Thank you all.
#!/usr/bin/python
# Importing modules that I'm going to need
import sys
import gzip
import os

# Only one argument passed to the script: the gzip file.
# Check this before touching sys.argv[1].
if len(sys.argv) != 2:
    print 'Usage: log_exercise01.py <gzip logfile>'
    sys.exit(1)

# Initialising paragraph list
paragraph = []
# Flag to signal which paragraphs to keep and which ones to discard.
keep = True

# Opening argument file and the temporary output file.
# 'w' rather than 'a', so a leftover file from an earlier run is not reused.
fileIn = gzip.open(sys.argv[1])
fileOut = open('outputFile', 'w')

# Main loop
for line in fileIn:
    # If a blank line in log file
    if line.isspace():
        # I append a blank line to the list to keep the format
        paragraph.append('\n')
        # If keep is true, write the paragraph to the file, keeping
        # the formatting with the "".join trick
        if keep:
            fileOut.write("".join(paragraph))
        # Re-initialise list
        paragraph = []
        keep = True
    # Else append the line to the paragraph list, and if the line
    # stripped of surrounding whitespace is 'foo' then set the flag
    # keep to false, to discard the paragraph.
    else:
        paragraph.append(line)
        if line.strip() == 'foo':
            keep = False

# The last paragraph may not be followed by a blank line.
if keep and paragraph:
    fileOut.write("".join(paragraph))

# Close both files before the original is overwritten.
fileIn.close()
fileOut.close()

# Compressing the file that has been created, under the original name.
f_in = open('outputFile', 'r')
f_out = gzip.open(sys.argv[1], 'w')
f_out.writelines(f_in)
f_in.close()
f_out.close()

# Deleting the temporary file created by the script.
os.remove('outputFile')
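
One possible way to avoid reopening the files at the end is to write the filtered output straight into a second gzip file and then move it over the original. The following is only a sketch under some assumptions: it targets Python 3 (the script above is Python 2), it treats the log as text in the default encoding, and the names filter_blocks and filter_gzip_log are illustrative, not part of any library.

```python
import gzip
import os
import tempfile


def filter_blocks(lines, marker='foo'):
    """Yield the lines of every block that has no line equal to marker."""
    block, keep = [], True
    for line in lines:
        if line.isspace():
            # Blank line: the current block is complete.
            if keep:
                yield from block
                yield '\n'
            block, keep = [], True
        else:
            block.append(line)
            if line.strip() == marker:
                keep = False
    # Last block, if the input does not end with a blank line.
    if keep and block:
        yield from block


def filter_gzip_log(path, marker='foo'):
    """Rewrite the gzip file at path without the blocks containing marker."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory as the log file.
    fd, tmp_path = tempfile.mkstemp(suffix='.gz', dir=directory)
    os.close(fd)
    try:
        with gzip.open(path, 'rt') as src, gzip.open(tmp_path, 'wt') as dst:
            dst.writelines(filter_blocks(src, marker))
        # Move the filtered file over the original in one step.
        os.replace(tmp_path, path)
    except Exception:
        os.remove(tmp_path)
        raise
```

It would be called as filter_gzip_log(sys.argv[1]). Keeping the block logic in a generator means only one block is held in memory at a time, and the original file is left untouched if anything fails before the final move.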
--
-----------------------------
Antonio de la Fuente Martínez
E-mail: toni at muybien.org
-----------------------------
In a hierarchical organization, the higher the level, the greater the
confusion.
-- Dow's Law.