<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">


<HTML><HEAD><TITLE></TITLE>


<META http-equiv=Content-Type content="text/html; charset=us-ascii">


<META content="MSHTML 6.00.2900.3492" name=GENERATOR></HEAD>


<BODY><!-- Converted from text/plain format -->


<P><FONT size=2><FONT face=Arial><FONT color=#0000ff>Hi 


Terry,<BR></FONT><BR>-----Original Message-----<BR>From: Terry Reedy [</FONT><A 


href="mailto:tjreedy@udel.edu"><FONT 


face=Arial>mailto:tjreedy@udel.edu</FONT></A><FONT face=Arial>]<BR>Sent: 


Wednesday, January 14, 2009 01:57<BR>To: python-list@python.org<BR>Subject: Re: 


Could you suggest optimisations ?<BR><BR>Barak, Ron wrote:<BR>> 


Hi,<BR>><BR>> In the attached script, the longest time is spent in the 


following<BR>> functions (verified by psyco log):<BR><BR>I cannot help but 


wonder why and if you really need all the rigmarole with file pointers, 


offsets, and tells instead of<BR><BR>for line in open(...):<BR>   do 


your processing.</FONT></FONT></P>
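<P><FONT face=Arial size=2>A minimal sketch of the plain-iteration approach suggested above, assuming Python 3 rather than the Python 2 of the original script. The names matching_lines, sample, and pattern are hypothetical. It swaps the script's regex.match plus re.findall pair for a single match call and takes the captured text from groups(), which may not return exactly what findall would for every pattern.</FONT></P>
<PRE>
```python
import io
import re


def matching_lines(lines, regex):
    """Yield (line, groups) for each line that regex matches at its start."""
    for line in lines:
        match = regex.match(line)  # one regex call per line
        if match:
            yield line.rstrip("\n"), match.groups()


# Hypothetical log content standing in for open("some.log").
sample = io.StringIO("start id=1\nnoise\nstart id=2\n")
pattern = re.compile(r"start id=(\d+)")

found = list(matching_lines(sample, pattern))
print(found)
```
</PRE>
<P><FONT face=Arial size=2>Any iterable of lines works here, so the same function could be fed an open file object directly.</FONT></P>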


<P><FONT size=2><FONT face=Arial><FONT color=#0000ff>I'm building a database of 
the events found in the logs (the records between the matches of the first and 
last regexes in regex_array).<BR>The user should then be able to navigate among 
these events (among other functionality).<BR>This is why I need the tells and 
offsets: so that I know where in the logs each event starts and 
ends.</FONT></FONT></FONT></P>
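<P><FONT face=Arial size=2>One way the start and end positions could be kept without calling tell() for every line is to iterate over the file in binary mode and add up the line lengths. The sketch below is only an illustration under assumptions, not the script's actual method. The function index_events, the BEGIN and END patterns, and the sample data are hypothetical, and Python 3 is assumed.</FONT></P>
<PRE>
```python
import io
import re


def index_events(stream, start_regex, end_regex):
    """Return a list of (start_offset, end_offset) byte ranges, one per event.

    An event runs from a line matching start_regex through the next line
    matching end_regex. Offsets come from summing line lengths, so no
    tell() call is needed inside the loop.
    """
    events = []
    offset = 0          # byte offset of the current line's first byte
    event_start = None  # offset where the currently open event began
    for line in stream:
        next_offset = offset + len(line)
        if event_start is None:
            if start_regex.match(line):
                event_start = offset
        elif end_regex.match(line):
            events.append((event_start, next_offset))
            event_start = None
        offset = next_offset
    return events


# Hypothetical log data; a real run would use open(path, "rb") instead.
data = b"junk\nBEGIN a\nbody\nEND a\njunk\nBEGIN b\nEND b\n"
log = io.BytesIO(data)
events = index_events(log, re.compile(rb"BEGIN"), re.compile(rb"END"))

# Navigate to the second event by seeking to its recorded offsets.
start, end = events[1]
log.seek(start)
second_event = log.read(end - start)
print(events, second_event)
```
</PRE>
<P><FONT face=Arial size=2>Binary mode matters in this sketch because the summed lengths are only valid seek() positions when each line is counted in bytes. An event that starts and ends on the same line is not handled.</FONT></P>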


<P><FONT face=Arial color=#0000ff size=2>Bye,<BR>Ron.</FONT><FONT 


face=Arial><BR><BR><FONT size=2>><BR>>     def 


match_generator(self,regex):<BR>>         


"""<BR>>         Generate the next 


line of self.input_file 


that<BR>>         matches 


regex.<BR>>         


"""<BR>>         generator_ = 


self.line_generator()<BR>>         


while 


True:<BR>>             


self.file_pointer = 


self.input_file.tell()<BR>>             


if self.file_pointer != 


0:<BR>>                 


self.file_pointer -= 


1<BR>>             


if (self.file_pointer + 2) >= 


self.last_line_offset:<BR>>                 


break<BR>>             


line_ = 


generator_.next()<BR>>             


print "%.2f%%   \r" % (((self.last_line_offset -<BR>> 


self.input_file.tell()) / (self.last_line_offset * 1.0)) * 


100.0),<BR>>             


if not 


line_:<BR>>                 


break<BR>>             


else:<BR>>                 


match_ = 


regex.match(line_)<BR>>                 


groups_ = 


re.findall(regex,line_)<BR>>                 


if 


match_:<BR>>                     


yield line_.strip("\n"), groups_<BR>> <BR>>     


def 


get_matching_records_by_regex_extremes(self,regex_array):<BR>>         


"""<BR>>         Function 


will:<BR>>         Find the record 


matching the first item of 


regex_array.<BR>>         Will save 


all records until the last item of 


regex_array.<BR>>         Will save 


the last line.<BR>>         Will 


remember the position of the beginning of the next line 


in<BR>>         


self.input_file.<BR>>         


"""<BR>>         start_regex = 


regex_array[0]<BR>>         end_regex 


= regex_array[-1]<BR>> <BR>>         


all_recs = []<BR>>         generator_ 


= 


self.match_generator<BR>> <BR>>         


try:<BR>>             


match_start,groups_ = 


generator_(start_regex).next()<BR>>         


except 


StopIteration:<BR>>             


return(None)<BR>> <BR>>         


if match_start is not 


None:<BR>>             


all_recs.append([match_start,groups_])<BR>> <BR>>             


line_ = 


self.line_generator().next()<BR>>             


while 


line_:<BR>>                 


match_ = 


end_regex.match(line_)<BR>>                 


groups_ = 


re.findall(end_regex,line_)<BR>>                 


if match_ is not 


None:<BR>>                     


all_recs.append([line_,groups_])<BR>>                     


return(all_recs)<BR>>                 


else:<BR>>                     


all_recs.append([line_,[]])<BR>>                     


line_ = 


self.line_generator().next()<BR>> <BR>>     def 


line_generator(self):<BR>>         


"""<BR>>         Generate the next 


line of self.input_file, and 


update<BR>>         self.file_pointer 


to the beginning of that 


line.<BR>>         


"""<BR>>         while 


self.input_file.tell() <= 


self.last_line_offset:<BR>>             


self.file_pointer = 


self.input_file.tell()<BR>>             


line_ = 


self.input_file.readline()<BR>>             


if not 


line_:<BR>>                 


break<BR>>             


yield line_.strip("\n")<BR>><BR>> I was trying to think of optimisations, 


so I could cut down on<BR>> processing time, but got no inspiration.<BR>> 


(I need the "print "%.2f%%   \r" ..." line for user's 


feedback).<BR>><BR>> Could you suggest any optimisations ?<BR>> 


Thanks,<BR>> Ron.<BR>> <BR>> <BR>> P.S.: Examples of 


processing times 


are:<BR>><BR>>         * 


2m42.782s  on two files with combined size of    792544 


bytes<BR>>           (no 


matches found).<BR>>         * 


28m39.497s on two files with combined size of 4139320 


bytes<BR>>           (783 


matches found).<BR>><BR>>     These times are quite 


unacceptable, as a normal input to the program<BR>>     


would be ten files with combined size of ~17MB.</FONT></FONT></P></BODY></HTML>