[Tutor] Increase performance of the script
Peter Otten
__peter__ at web.de
Sun Dec 9 15:17:53 EST 2018
Asad wrote:
> Hi All ,
>
> I have the following code to search for an error and prin the
> solution .
>
> /A/B/file1.log size may vary from 5MB -5 GB
>
> f4 = open (r" /A/B/file1.log ", 'r' )
> string2=f4.readlines()
Do not read the complete file into memory; readlines() builds a list of
every line, which for a 5 GB log means the whole file. Read one line at a
time and keep around only those lines that you may have to look at again.
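As a minimal sketch of reading lazily: an open file is an iterator over its lines, so a plain for loop holds only the current line and stops reading as soon as it returns. The helper name first_match and the sample data are made up for illustration, and io.StringIO stands in for the real log file.

```python
import io


def first_match(lines, needle):
    """Return the first line containing needle, or None.

    `lines` may be any iterable of strings, including an open file,
    which hands out one line at a time instead of a complete list.
    """
    for line in lines:
        if needle in line:
            return line
    return None


# Hypothetical sample data; a real script would pass an open file instead.
sample = io.StringIO(u"alpha\nCalling rdbms/admin/x.sql\nomega\n")
found = first_match(sample, "Calling rdbms/admin")

# The loop stopped at the match, so the rest of the stream is still unread.
remaining = sample.read()
```

The same function works unchanged on `open(filename)`, because the loop never asks for more than the next line.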
> for i in range(len(string2)):
>     position=i
>     lastposition =position+1
>     while True:
>         if re.search('Calling rdbms/admin',string2[lastposition]):
>             break
>         elif lastposition==len(string2)-1:
>             break
>         else:
>             lastposition += 1
You are trying to find a group of lines. For a file with the structure

    foo
    bar
    baz
    end-of-group-1
    ham
    spam
    end-of-group-2

the way you do it finds the groups

    foo
    bar
    baz
    end-of-group-1

    bar
    baz
    end-of-group-1

    baz
    end-of-group-1

    ham
    spam
    end-of-group-2

    spam
    end-of-group-2

That looks like a lot of redundancy which you can probably avoid. But
wait...
>     errorcheck=string2[position:lastposition]
>     for i in range ( len ( errorcheck ) ):
>         if re.search ( r'"error(.)*13?"', errorcheck[i] ):
>             print "Reason of error \n", errorcheck[i]
>             print "script \n" , string2[position]
>             print "block of code \n"
>             print errorcheck[i-3]
>             print errorcheck[i-2]
>             print errorcheck[i-1]
>             print errorcheck[i]
>             print "Solution :\n"
>             print "Verify the list of objects belonging to Database "
>             break
>     else:
>         continue
>     break
you throw away almost all of that hard work just to find the group
containing those four lines? It looks like you only need the
"error...13" line, the three lines that precede it, and the last
"Calling..." line occurring before the "error...13".
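To make the redundancy concrete, here is a rough sketch that reproduces the overlapping groups from the foo/bar illustration and compares them with a single pass. Both function names and the is_end helper are hypothetical, and the first function only mimics the effect of starting a new search at every line; it is not the original script.

```python
def overlapping_groups(lines, is_end):
    """Build one group per start line, as the nested search effectively does."""
    groups = []
    for start, first in enumerate(lines):
        if is_end(first):
            continue  # the illustration does not start groups at end markers
        group = []
        for line in lines[start:]:
            group.append(line)
            if is_end(line):
                break
        groups.append(group)
    return groups


def split_groups(lines, is_end):
    """Single pass: every line ends up in exactly one group."""
    groups, current = [], []
    for line in lines:
        current.append(line)
        if is_end(line):
            groups.append(current)
            current = []
    if current:  # trailing lines without an end marker
        groups.append(current)
    return groups


def is_end(line):
    return line.startswith("end-of-group")


lines = ["foo", "bar", "baz", "end-of-group-1", "ham", "spam", "end-of-group-2"]
redundant = overlapping_groups(lines, is_end)
single = split_groups(lines, is_end)

# Total number of lines touched by each approach.
redundant_work = sum(len(group) for group in redundant)
single_work = sum(len(group) for group in single)
```

For these seven lines the overlapping approach builds five groups and touches 14 lines, while the single pass builds two groups and touches each line once. The gap widens as the groups get longer.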
> The problem I am facing in performance issue it takes some minutes to
> print out the solution . Please advice if there can be performance
> enhancements to this script .
If you want to learn the Python way you should try hard to write your
scripts without a single

    for i in range(...):
        ...

loop. This style should be the last resort. It may work for small datasets,
but as soon as you have to deal with large files performance dives.
Even worse, these loops tend to make your code hard to debug.
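A small sketch of what the index-free style can look like, using a made-up list of lines. enumerate() and zip() are builtins; everything else here is illustrative only.

```python
lines = ["first\n", "second\n", "third\n"]

# Index-based style: every access goes through lines[i].
indexed = []
for i in range(len(lines)):
    indexed.append((i, lines[i].rstrip("\n")))

# Direct iteration: enumerate() supplies the index when it is really needed
# and works on any iterable, including an open file.
direct = [(i, line.rstrip("\n")) for i, line in enumerate(lines)]

# zip() pairs each line with its successor without index arithmetic.
pairs = list(zip(lines, lines[1:]))
```

Both loops build the same list, but only the second form would also work on a file that is read one line at a time, since a file object cannot be indexed.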
Below is a suggestion for an implementation of what your code seems to be
doing. It only remembers the four most recent lines and works with a single
loop. If that saves you some time, use that time to clean the scripts you
have lying around of occurrences of "for i in range(....): ..." ;)

    from __future__ import print_function

    import re
    import sys
    from collections import deque


    def show(prompt, *values):
        # Print a heading followed by the indented values.
        print(prompt)
        for value in values:
            print(" {}".format(value.rstrip("\n")))


    def process(filename):
        tail = deque(maxlen=4)  # the last four lines
        script = None  # the most recent "Calling..." line, if any
        with open(filename) as instream:
            for line in instream:
                tail.append(line)
                if "Calling rdbms/admin" in line:
                    script = line
                elif re.search('"error(.)*13?"', line) is not None:
                    show("Reason of error:", tail[-1])
                    # script is still None if no "Calling..." line came first
                    show("Script:", script or "(none found)")
                    show("Block of code:", *tail)
                    show(
                        "Solution",
                        "Verify the list of objects belonging to Database"
                    )
                    break


    if __name__ == "__main__":
        filename = sys.argv[1]
        process(filename)

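The script relies on deque(maxlen=4) to forget old lines automatically. A short sketch of that behaviour, with made-up line contents:

```python
from collections import deque

tail = deque(maxlen=4)  # holds at most four items
for number in range(1, 8):
    # Once the deque is full, each append drops the oldest item.
    tail.append("line {}".format(number))

window = list(tail)  # the four most recent items, oldest first
newest = tail[-1]    # the item appended last
```

After seven appends only "line 4" through "line 7" remain, so memory use stays constant no matter how long the file is.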