[Tutor] Fw: File handling: open a file at specified byte?
Alan Gauld
alan.gauld at freenet.co.uk
Mon Feb 20 09:01:31 CET 2006
Forwarding for list visibility
----- Original Message -----
From: "Brian Gustin" <brian at daviesinc.com>
To: "Alan Gauld" <alan.gauld at freenet.co.uk>
Sent: Monday, February 20, 2006 2:23 AM
Subject: Re: [Tutor] File handling: open a file at specified byte?
>
> > look at the file tell() and seek() methods.
> >
> > They will tell you the current location and allow you to move to a
> > specific location.
>
>
> OK. I did try using seek and tell, but couldn't get working code to do
> what I needed. However, it did lead me to discover the fileinput
> module. I've tested it on my test file, and it works quite well. I'd
> like to see if you can offer any better suggestions, keeping in mind that
> a log file can grow to as large as 3 GB, so memory management will be
> important, as will execution time (I will need this parser to process
> a file as large as 3 - 4 GB in under 10 minutes, ideally in less than
> 1 minute).
>
> Code follows:
> ##START CODE ##########
> #!/usr/bin/python
> #for testing of tux parser
> # read "live" log file and parse it into separate domain files
> import string
> import re
> import fileinput
>
> myfiles={}
> line=1
> last=0
> try:
>     bkmk = open('bookmark','r')
>     last = bkmk.readline()
>     bkmk.close()
> except:
>     pass
> for outputdata in fileinput.input('./testfile.tuxlog'):
>     #sourcelist.sort()
>     #print outputdata
>     if fileinput.filelineno() < int(last):
>         continue
>     else:
>         info = re.search('(?<=GET )([a-zA-Z0-9\-\.]+)', outputdata)
>         try:
>             namecheck = info.group(0)
>         except AttributeError:
>             continue
>         try:
>             namecheck=namecheck.replace('www.','')
>             check = re.search('(\.[a-z]+$)',namecheck)
>             if check == None:
>                 domain = 'Errors'
>             else:
>                 res = re.search('(\ (301|404|403|302)\ 0)',outputdata)
>                 if res == None:
>                     domain = namecheck
>                 else:
>                     domain = '404_301errors'
>             outputdata=outputdata.replace(' '+domain+'/',' /')
>             if myfiles.has_key(domain):
>                 domhandle = myfiles.get(domain)
>             else:
>                 domhandle=open('/var/log/tuxp/'+domain+'-access.log.1','w+')
>                 myfiles[domain] = domhandle
>
>             domhandle.write(outputdata)
>         except:
>             continue
>         # get the last line no handled. could this instead be run just
>         # before closing the handle?
>         bookmark = fileinput.lineno()
>         rel = open('./bookmark','w')
>         rel.write(str(bookmark))
>         rel.close()
>         #print "BOOKMARK: %s"%bookmark,
>         #print domain+' - ',
>         #print namecheck,
>         # line +=1
>         #print str(line)+"\n"
>         #print fileinput.filelineno()
> fileinput.close()
>
>
> ############ END CODE############
>
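
For comparison, here is a rough sketch of the seek()-based approach
suggested earlier in the thread. It stores a byte offset in the bookmark
file instead of a line number, so a later run can jump straight to the new
data rather than reading and skipping every earlier line. It is only an
illustration under some assumptions, not a drop-in replacement for the
script above: the names process_new_lines and handle_line are made up for
this example, it is written for current Python, and it assumes the log is
only ever appended to (or truncated when rotated).

```python
import os


def process_new_lines(log_path, bookmark_path, handle_line):
    """Pass each complete line added since the last run to handle_line.

    Returns the byte offset that was saved to the bookmark file.
    All names here are hypothetical, chosen for this sketch only.
    """
    # Load the saved byte offset; start from 0 if there is no usable bookmark.
    try:
        with open(bookmark_path) as bkmk:
            offset = int(bkmk.read().strip() or 0)
    except (IOError, ValueError):
        offset = 0

    # Assumption: a log smaller than the bookmark means it was rotated.
    if offset > os.path.getsize(log_path):
        offset = 0

    # Binary mode keeps offsets in plain bytes, so seek() is exact.
    with open(log_path, 'rb') as log:
        log.seek(offset)
        for line in iter(log.readline, b''):
            if not line.endswith(b'\n'):
                break  # partial line still being written; pick it up next run
            handle_line(line)
            # In binary mode this running total matches log.tell().
            offset += len(line)

    # Save the bookmark once, after the whole pass, not once per line.
    with open(bookmark_path, 'w') as bkmk:
        bkmk.write(str(offset))
    return offset
```

Only one line is held in memory at a time, and handle_line would be where
the per-domain regex matching and writing from the script above goes. Note
that lines arrive as bytes here, so they would need decoding before being
matched against str patterns.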