[Tutor] Logfile Manipulation
Wayne Werner
waynejwerner at gmail.com
Mon Nov 9 16:15:37 CET 2009
On Mon, Nov 9, 2009 at 7:46 AM, Stephen Nelson-Smith <sanelson at gmail.com> wrote:
> And the problem I have with the below is that I've discovered that the
> input logfiles aren't strictly ordered - ie there is variance by a
> second or so in some of the entries.
>
Within a given set of 10 lines, are the first and last lines "in order"? i.e., something like:

1 2 4 3 5 8 7 6 9 10
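One way to answer that question for a real logfile is to measure how far the most out-of-place entry sits from where it belongs. The helper below is a hypothetical sketch (the name `max_displacement` and its `key` parameter are illustrative, not from any library); it loads the records into memory, so it is meant for checking a sample of the file. If the maximum is d, a sliding window that compares d + 1 lines at a time should be enough.

```python
def max_displacement(records, key=lambda r: r):
    """Return how far (in lines) the most out-of-place record sits from the
    position it would occupy after a stable sort by `key`."""
    records = list(records)
    # order[j] is the original index of the record that belongs at position j.
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    return max((abs(i - j) for j, i in enumerate(order)), default=0)


# The out-of-order example from this message:
print(max_displacement([1, 2, 4, 3, 5, 8, 7, 6, 9, 10]))  # 2
```

For log lines you would pass a `key` that extracts the timestamp, assuming each line carries one in a sortable form.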
> I can sort the biggest logfile (800M) using unix sort in about 1.5
> mins on my workstation. That's not really fast enough, with
> potentially 12 other files....
>
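If that's the case, then I'm pretty sure you can build a sort of queue system instead of sorting the whole file, and it should cut down on the sorting time: keep a small window of lines sorted and always emit the smallest one. This only works as long as no entry is more than about 10 lines away from where it belongs. Python's list.sort() uses Timsort, and re-sorting a fixed window of k = 10 items costs at most O(k log k), which is a constant per line, so the whole pass stays linear in the size of the file. Something like this: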
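
import itertools

log_generator = iter(logdata)  # logdata: any iterable of log lines
mylist = list(itertools.islice(log_generator, 10))  # first ten values
while mylist:
    mylist.sort()
    nextdata = mylist.pop(0)
    # Do something with nextdata
    try:
        mylist.append(next(log_generator))
    except StopIteration:
        pass  # input exhausted; keep draining what is left in mylist
print('done')
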
Or, now that I look, Python has a priority queue (the heapq module, http://docs.python.org/library/heapq.html ) that you could use instead. Seed the heap with some initial quantity of lines, 10 or so; then for each new line, push it onto the heap and pop one off (heapq.heappushpop does both in one call). The heap always gives you back the smallest value it currently holds.
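A minimal sketch of the heapq version, assuming each line's sort key can be pulled out by a `key` function (for log lines, perhaps a leading timestamp); the name `window_sort` and its parameters are hypothetical. It streams the input, so only `window` lines are held in memory at once.

```python
import heapq
from itertools import islice


def window_sort(lines, window=10, key=lambda line: line):
    """Yield `lines` in sorted order using a heap of `window` entries.

    Assumes no line arrives more than `window` positions after the place it
    belongs (lines that arrive early are fine; the heap just holds them).
    """
    # Decorate as (sort key, arrival index, line): the index breaks ties, so
    # equal keys keep their input order and lines are never compared directly.
    decorated = ((key(line), n, line) for n, line in enumerate(lines))
    heap = list(islice(decorated, window))  # seed with the first `window` lines
    heapq.heapify(heap)
    for entry in decorated:
        # Push the newest line and pop the smallest one in a single step.
        yield heapq.heappushpop(heap, entry)[2]
    while heap:  # input exhausted: drain what is left, smallest first
        yield heapq.heappop(heap)[2]


example = [1, 2, 4, 3, 5, 8, 7, 6, 9, 10]
print(list(window_sort(example, window=2)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

A logfile could be passed straight in as an open file object. Since each output is then a sorted stream, several such streams could be combined lazily with heapq.merge, which merges already-sorted iterables (its `key` argument needs Python 3.5 or later).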
HTH,
Wayne