Sorting Apache Log Files

Nick Perkins nperkins7 at home.com
Mon Jun 18 22:42:21 EDT 2001


If you are merging really big logfiles, you probably don't need to read them
all into memory in order to do this.  Since you are merging 2 files that are
already sorted, you don't really need the full power of the sort() function.

You could just open both input files, as well as the output file, then read
one line from each input file, decide which one comes first, and write that
line to the output. Then read the next line from the file you just wrote
from, and keep going until one file is done. At that point, copy the rest of
the other file over.
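
Here is a rough sketch of that loop in present-day Python 3. It is only one
way to do it, under some assumptions: the function names (apache_timestamp,
merge_two) are made up for illustration, the timestamp is assumed to be the
first bracketed field in the usual "[18/Jun/2001:22:42:21 -0400]" form, and
both inputs are assumed to be sorted by time already. The StringIO objects
stand in for real files opened with open().

```python
import io
from datetime import datetime

def apache_timestamp(line):
    # Assumes the timestamp is the first [...] field of the line, in the
    # form 18/Jun/2001:22:42:21 -0400. Note that %b only matches English
    # month abbreviations under an English or C locale.
    stamp = line[line.find("[") + 1:line.find("]")]
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")

def merge_two(in_a, in_b, out):
    """Merge two already-sorted logs, holding one line of each in memory."""
    line_a = in_a.readline()
    line_b = in_b.readline()
    while line_a and line_b:
        if apache_timestamp(line_a) <= apache_timestamp(line_b):
            out.write(line_a)
            line_a = in_a.readline()
        else:
            out.write(line_b)
            line_b = in_b.readline()
    # One input is exhausted: write the pending line from the other input,
    # then copy whatever is left of that input.
    out.write(line_a or line_b)
    out.writelines(in_a if line_a else in_b)

# In-memory stand-ins for two sorted log files and the output file.
log_a = io.StringIO(
    'h1 - - [30/Jun/2001:23:59:58 -0400] "GET /a HTTP/1.0" 200 10\n'
    'h1 - - [01/Jul/2001:00:00:02 -0400] "GET /b HTTP/1.0" 200 10\n'
)
log_b = io.StringIO(
    'h2 - - [30/Jun/2001:23:59:59 -0400] "GET /c HTTP/1.0" 200 10\n'
)
merged = io.StringIO()
merge_two(log_a, log_b, merged)
```

The timestamps are parsed rather than compared as strings because the Apache
format puts the day of the month first, so "30/Jun/2001" would compare as
later than "01/Jul/2001" if left as text.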

With a bit more logic, you could generalize this to merge more than 2 files,
but then you can always do that anyway, by just merging them 2 at a time,
and 'accumulating' the result.
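
For what it's worth, the standard library in current Python versions can do
the many-file merge for you: heapq.merge() takes any number of sorted
iterables and yields their items in order, reading lazily, and it accepts a
key function from Python 3.5 on. This is a different route from merging 2 at
a time, not a sketch of it. The names apache_timestamp and merge_sorted_logs
are hypothetical, and the timestamp format is assumed to be the usual
"[18/Jun/2001:22:42:21 -0400]".

```python
import heapq
from datetime import datetime

def apache_timestamp(line):
    # Assumes the timestamp is the first [...] field of the line, in the
    # form 18/Jun/2001:22:42:21 -0400 (English month abbreviations).
    stamp = line[line.find("[") + 1:line.find("]")]
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")

def merge_sorted_logs(inputs):
    """Lazily yield lines from any number of already-sorted inputs."""
    # Each input may be an open file or any other iterable of lines.
    return heapq.merge(*inputs, key=apache_timestamp)
```

With real files, something like
out.writelines(merge_sorted_logs([file_a, file_b, file_c])) would write the
merged result, where file_a, file_b, file_c and out are file objects you have
opened yourself.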



"Lenny Self" <lenny at squiggie.com> wrote in message
news:rfyX6.597$Qp2.488157 at news.uswest.net...
> Thanks for your help.  This is what I ended up doing... It seems to work
> quite nicely and seems fast enough.  Although, I'm not sure how fast it's
> going to be with 20MB of logs :)
>
> #!/usr/bin/python
>
> import string
>
> # Reading file into list
> list = open("d:/work/access.log","r").readlines()
> def compare (line1,line2):
>     # Nicely sucks out the apache date stamp
>     datestamp1 = line1[string.find(line1,"[") + 1:string.rfind(line1,"]")]
>     datestamp2 = line2[string.find(line2,"[") + 1:string.rfind(line2,"]")]
>     # Compare the date stamps and return appropriate value
>     if datestamp1 < datestamp2:
>             return -1
>     elif datestamp2 < datestamp1:
>             return 1
>     else:
>             return 0
> list.sort(compare)
> # Writing sorted list to new file
> open("d:/work/newfile.txt","w").writelines(list)
>
> Thanks.
>
>     -- Lenny
>
>
> "Sheila King" <sheila at spamcop.net> wrote in message
> news:td3titkav6amrrfjimkjkf4kngp7u4ahpg at 4ax.com...
> > On 18 Jun 2001 15:55:53 -0700, lenny.self at qsent.com (Lenny) wrote in
> > comp.lang.python in article
> > <b1aa9ab6.0106181455.681ef924 at posting.google.com>:
> >
> > : I was planning on loading each of the log files
> > :into a list and then sorting the list.  Unfortunately, I am unaware of
> > :how to do that when the value I wish to search on isn't at the
> > :beginning of the line.  I need to search on Apache's date string.
> >
> > How about this? Create a list of tuples, where the tuple is:
> >
> > (datestamp, full_line)
> >
> > So, as you put each line from the log into the list, grab the datestamp
> > from the line, make a tuple and then sort the list on the first element
> > of each tuple?
> >
> > --
> > Sheila King
> > http://www.thinkspot.net/sheila/
> > http://www.k12groups.org/
> >
> >
>
>
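
The (datestamp, full_line) tuple idea quoted above might look roughly like
this in present-day Python 3. It is only a sketch: apache_timestamp and
sort_log_lines are made-up names, and the timestamp is assumed to be the
first bracketed field in the form "[18/Jun/2001:22:42:21 -0400]". It parses
the datestamp into a datetime before sorting, since comparing the raw strings
(as the quoted script does) orders by day of the month first and goes wrong
once a log spans more than one month.

```python
from datetime import datetime

def apache_timestamp(line):
    # Assumes the timestamp is the first [...] field of the line, in the
    # form 18/Jun/2001:22:42:21 -0400 (English month abbreviations).
    stamp = line[line.find("[") + 1:line.find("]")]
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")

def sort_log_lines(lines):
    """Sort log lines by time using (datestamp, full_line) tuples."""
    # Decorate: pair each line with its parsed datestamp.
    decorated = [(apache_timestamp(line), line) for line in lines]
    # Sort: tuples compare on the datestamp first, then on the line text.
    decorated.sort()
    # Undecorate: keep only the original lines.
    return [line for _, line in decorated]
```

In current Python, sorted(lines, key=apache_timestamp) gives much the same
result without building the tuples by hand. Either way the whole log is held
in memory, which is what the merge approach avoids.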




