[Tutor] read file and match string

Jim Ragsdale overlord@netdoor.com
Thu, 31 Jan 2002 17:41:05 -0600


[Kalle Svensson]
Ok, I'm searching a log file from an Unreal Tournament server. It generates
some output, but I'm really interested in one line:
ScriptLog: [PLAYER_JOIN] {RS}]V{AD_l)OG_3 ip.address.omitted:1040 2600

Each line of interest will have ScriptLog: [PLAYER_JOIN], a name, an ip:port,
and another number. The other lines are just different server events and other
things of interest. I just want the IP address. The logs can get big. Right now
it is at 1.5 MB and has been running for almost 2 days. The log resets when the
server does, so I want to save the IPs to a different file so I can keep
them through server restarts, crashes, and such.
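
A minimal sketch of the extraction described above, written for present-day
Python 3 rather than the 2.2 used in this thread. The pattern, the names
extract_ips and save_ips, and the sample address are assumptions for
illustration; the real log layout may differ from the one example line shown.

```python
import re

# Assumed layout: "ScriptLog: [PLAYER_JOIN] <name> <ip>:<port> <number>".
# Player names can contain odd characters, so the pattern is anchored on
# the end of the line instead of trying to describe the name.
JOIN_RE = re.compile(
    r"^ScriptLog: \[PLAYER_JOIN\] .* (\d{1,3}(?:\.\d{1,3}){3}):\d+ \d+\s*$"
)

def extract_ips(lines):
    """Yield the IP address from every PLAYER_JOIN line."""
    for line in lines:
        match = JOIN_RE.match(line)
        if match:
            yield match.group(1)

def save_ips(log_path, out_path):
    """Append the IPs found in log_path to out_path, one per line."""
    # Append mode keeps addresses collected before a server restart.
    with open(log_path, errors="replace") as log, open(out_path, "a") as out:
        for ip in extract_ips(log):
            out.write(ip + "\n")
```

Reading the file line by line keeps memory use flat however large the log
gets, and opening the output in append mode is what preserves earlier results.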

I used a regular expression because that is what I found on the net. Messed
with it until I got it to match :)  And I just upgraded to Python 2.2. Using
xreadlines on a 1.56 MB file on my 400 MHz Celeron laptop takes about 0.4 sec.
(I ran start = time.clock() before and finish = time.clock() after and took the
difference.) I also used the localtime function to write a date/time stamp in
the log file name (iplog.1.31.2002.5.35.log).
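
The timing and file naming above could look roughly like this in Python 3.
This is a sketch: time.clock() no longer exists in current Python, so
time.perf_counter() stands in for it, and the field order
(month.day.year.hour.minute) is inferred from the single example file name.
The helper names stamped_name and timed are made up for illustration.

```python
import time

def stamped_name(prefix="iplog", when=None):
    """Build a name like iplog.1.31.2002.5.35.log from a local time."""
    t = time.localtime(when)  # when=None means "now"
    # Assumed field order: month.day.year.hour.minute, no zero padding.
    return "%s.%d.%d.%d.%d.%d.log" % (
        prefix, t.tm_mon, t.tm_mday, t.tm_year, t.tm_hour, t.tm_min)

def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Wrapping the measurement in a helper keeps the start/finish bookkeeping out of
the search code itself.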

Right now I'm trying to clean it up, use functions and stuff. Do it right :)
I would like to add some functionality as I go along. Say, maybe determine if
the log file has been reset and, if it hasn't, start from where it left off
last time. If it has been reset, start fresh and maybe start a new
output file.
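
One possible way to pick up where the last run left off, sketched in Python 3.
It assumes the byte offset is kept in a small state file and treats "the log
is now shorter than the saved offset" as the sign of a reset. Both the state
file and that heuristic are assumptions, and a reset log that has already
grown past the old offset would go unnoticed. The name read_new_lines is
hypothetical.

```python
import os

def read_new_lines(log_path, state_path):
    """Return (lines, was_reset) for text added since the last call."""
    try:
        with open(state_path) as state:
            offset = int(state.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        offset = 0  # first run, or unreadable state: start at the top

    # A log shorter than the saved offset must have been started over.
    was_reset = os.path.getsize(log_path) < offset
    if was_reset:
        offset = 0

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()

    # Keep only complete lines; a half-written last line waits for next time.
    end = data.rfind(b"\n") + 1
    offset += end

    with open(state_path, "w") as state:
        state.write(str(offset))

    return data[:end].decode("latin-1").splitlines(), was_reset
```

The caller can use was_reset to decide whether to open a fresh output file
before writing the new addresses.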

Thanks for your input! I'll have to say that Python is my favorite language:
easy to understand (or at least relatively :) ) and powerful. If I ever
figure out OOP I'm sure it will be even better!

Hope everyone is able to follow the email. I tried to give it some semblance
of order but don't know if I succeeded :)



----- Original Message -----
From: "Kalle Svensson" <kalle@gnupung.net>
To: "Python Tutor Mailing List" <tutor@python.org>
Sent: Thursday, January 31, 2002 4:59 PM
Subject: Re: [Tutor] read file and match string


> [Jim Ragsdale]
> > import mmap, re
> > def search(filename, rx):
> >     f = open(filename, 'r+')
> >     mem = mmap.mmap(f.fileno(), 0)
> >     for match in rx.finditer(mem):
> >         print match.group(0)
> >     mem.close()
> >     f.close()
> [...]
> > Can someone explain the top snippet to me? looks like a function
> > that takes a filename argument and what is the rx?
>
> A regular expression object, like re.compile("foo").
>
> > Is this what is needed for what i am doing or is it slightly
> > different?
>
> I believe it's slightly different.  The regular expression in the new
> function should match to the end of the line.  If you had
> p = re.compile("foo")
> you want
> rx = re.compile(".*foo.*")
> now (I think).
> Also, it prints the results to standard output, instead of writing them
> to a result file.
>
> > The mem line looks like it opens the file like xreadlines.
>
> The mem line maps the file into memory, thereby making access to it
> faster.  It might be a bad idea if your file is very large, say as
> large as your RAM.
>
> > Any help would be appreciated. Thanks!
>
> If the string you're searching for is simple, it might be faster to
> use the string find method instead of regular expressions.  Also, if
> you're using an old version of python (1.5.2 or 2.0), try upgrading to
> 2.1.2 or 2.2, I think the file reading stuff (with xreadlines, like
> you used first) has been optimized a bit in those newer versions.
>
> Also, a warning:  I don't use mmap or re very much, and might be
> totally wrong.  I hope somebody will correct me in that case.
>
> Peace,
>   Kalle
> --
> Kalle Svensson (kalle@gnupung.net) - Laziness, impatience, hubris: Pick two!
> English: http://www.gnupung.net/  Svenska: http://www.lysator.liu.se/~kalle/
> Stuff: ["http://www.%s.org/" % x for x in "gnu debian python emacs".split()]