[Tutor] Introduction - log exercise

bob gailer bgailer at gmail.com
Wed Nov 18 02:24:37 CET 2009


Antonio de la Fuente wrote:
> * bob gailer <bgailer at gmail.com> [2009-11-17 15:26:20 -0500]:
>
>   
>> Antonio de la Fuente wrote:
>>     
>>> Hi everybody,
>>>
>>> This is my first post here. I have started learning Python and I am new to
>>> programming; I have done just some bash scripting, not much. Thank you for the
>>> kind support and help that you provide on this list.
>>>
>>> This is my problem: I've got a log file that is filling up very quickly. The
>>> log file is made of blocks separated by a blank line, and some of these blocks
>>> contain a line "foo". I want to discard the blocks that contain that line and
>>> create a new log file without them, which will drastically reduce the size of
>>> the log file.
>>>
>>> The log file is gzipped, so I am going to use the gzip module, and I am going
>>> to pass the log file as an argument, so the sys module is required as well.
>>>
>>> I will read lines from the file with a 'for loop', and then I will check them
>>> for 'foo' matches with a 'while loop'. If a line matches, I (somehow)
>>> re-initialise the list, and if there is no match for foo, I will append the
>>> line to the list. When I get to a blank line (end of block), I write myList to
>>> an external file and start with another line.
>>>
>>> I am stuck on defining 'blank line'; I can't manage to get through the while
>>> loop. I would really appreciate any hint here.
>>> I don't expect the solution, as I think this is a great exercise to get my
>>> feet wet with Python, but if anyone thinks that this is the wrong way of
>>> solving the problem, please let me know.
>>>
>>>
>>> #!/usr/bin/python
>>>
>>> import sys
>>> import gzip
>>>
>>> myList = []
>>>
>>> # At the moment not bother with argument part as I am testing it with a
>>> # testing log file
>>> #fileIn = gzip.open(sys.argv[1])
>>>
>>> fileIn = gzip.open('big_log_file.gz', 'r')
>>> fileOut = open('outputFile', 'a')
>>>
>>> for line in fileIn:
>>>    while line != 'blank_line':
>>>        if line == 'foo':
>>>            Somehow re-initialise myList
>>> 	    break
>>>        else:
>>>            myList.append(line)
>>>    fileOut.writelines(myList)
>>>       
>> Observations:
>> 0 - The other responses did not understand your desire to drop any
>> paragraph containing 'foo'.
>>     
>
> Yes, paragraph == block, that's it
>
>   
>> 1 - The while loop will run forever, as it keeps processing the same line.
>>     
>
> Because of the tabs in the line with foo?!
>   

No - because within the loop there is nothing reading the next line of 
the file!
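
To make that concrete, here is a minimal sketch. It assumes Python 3, and io.StringIO with made-up text stands in for your opened log file. The point is only that the for loop itself fetches the next line on every pass, so the body should deal with the current line once and then fall through, with no inner while loop waiting on the same line:

```python
import io

# Hypothetical three-line stand-in for the real log file.
sample = io.StringIO("first\n\tfoo\n\n")

seen = []
for line in sample:      # each pass of the for loop reads the next line
    seen.append(line)    # handle the current line once, then let the loop move on

print(seen)
```

A `while line != ...` inside that loop can never finish unless something in its body changes `line`.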
>   
>> 2 - In your sample log file the line with 'foo' starts with a tab.
>> line == 'foo' will always be false.
>>     
>
> So I need first to get rid of those tabs, right? I can do that with
> line.strip(), but then I need the same formatting for the fileOut.
>
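
You do not have to give up the formatting. As a small sketch (Python 3 assumed, and the sample line is made up): strip() returns a new string and leaves the original alone, so you can strip only for the comparison and still write the untouched line to the output.

```python
line = "\tfoo\n"                  # made-up sample, as it would come from the file

is_foo = line.strip() == "foo"    # strip() builds a new string just for the test
print(is_foo)
print(repr(line))                 # the original line still has its tab and newline
```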
>   
>> 3 - Is the first line in the file Tue Nov 17 16:11:47 GMT 2009 or blank?
>>     
>
> First line is Tue Nov 17 16:11:47 GMT 2009
>
>   
>> 4 - Is the last line blank?
>>     
>
> last line is blank.
>
>   
>> Better logic:
>>
>>     
> I would never have thought of this way of solving the problem. Interesting.
>   
>> # open files
>> paragraph = []
>> keep = True
>> for line in fileIn:
>>  if line.isspace(): # end of paragraph 
>>     
>
> Aha! finding the blank line
>
>   
>>    if keep:
>>      outFile.writelines(paragraph)
>>    paragraph = []
>>     
>
> This is what I called re-initialising the list.
>
>   
>>    keep = True
>>  else:
>>    if keep:
>>      if line.strip() == 'foo':  # each line still carries its tab and newline
>>        keep = False
>>      else:
>>        paragraph.append(line)
>> # anticipating last line not blank, write last paragraph
>> if keep:
>>   outFile.writelines(paragraph)
>>
>> # use shutil to rename
>>
>>     
> Thank you.
>
>   
>> -- 
>> Bob Gailer
>> Chapel Hill NC
>> 919-636-4239
>>     
>
>   
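
Pulling the pieces of the outline above together, here is one way it might look as a complete script. Treat it as a sketch under some assumptions rather than the finished answer: it is written for Python 3 (text-mode gzip.open), the names drop_paragraphs and filter_log and the ".tmp" suffix are my own inventions, the output is written gzipped like the input, and unlike the outline it also writes the blank separator line after each paragraph it keeps.

```python
import gzip
import shutil


def drop_paragraphs(lines, marker="foo"):
    """Yield the lines of every blank-line-separated paragraph without marker."""
    paragraph = []
    keep = True
    for line in lines:
        if line.isspace():              # blank line: end of paragraph
            if keep:
                yield from paragraph
                yield line              # keep the separator too (my addition)
            paragraph = []              # re-initialise for the next paragraph
            keep = True
        elif keep:
            if line.strip() == marker:  # compare without tab and newline
                keep = False
            else:
                paragraph.append(line)  # store the line with its formatting
    if keep:                            # in case the last line is not blank
        yield from paragraph


def filter_log(path, marker="foo"):
    """Rewrite the gzipped log at path without the marked paragraphs."""
    tmp_path = path + ".tmp"            # hypothetical temporary file name
    with gzip.open(path, "rt") as file_in, gzip.open(tmp_path, "wt") as file_out:
        file_out.writelines(drop_paragraphs(file_in, marker))
    shutil.move(tmp_path, path)         # replace the original with the new file
```

Calling filter_log('big_log_file.gz') would then replace the log in place. Keeping the paragraph logic in its own generator means you can try it on a plain list of strings before pointing it at the real file.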


-- 
Bob Gailer
Chapel Hill NC
919-636-4239

