[Tutor] Increase performance of the script
Steven D'Aprano
steve at pearwood.info
Sun Dec 9 19:00:58 EST 2018
On Sun, Dec 09, 2018 at 03:45:07PM +0530, Asad wrote:
> Hi All ,
>
> I have the following code to search for an error and print the
> solution.
Please tidy your code before asking for help optimizing it. We're
volunteers, not being paid to work on your problem, and your code is too
hard to understand.
Some comments:
> f4 = open (r" /A/B/file1.log ", 'r' )
> string2=f4.readlines()
You have a variable "f4". Where are f1, f2 and f3?
You have a variable "string2", which is a lie, because it is not a
string, it is a list.
I will be very surprised if the file name you show is correct. It has a
leading space, and two trailing spaces.
> for i in range(len(string2)):
>     position=i
Poor style. In Python, you almost never need to write code that iterates
over the indexes (this is not Pascal). You don't need the assignment
position=i. Better:
    for position, line in enumerate(lines):
        ...
>     lastposition =position+1
Poorly named variable. You call it "last position", but it is actually
the NEXT position.
>     while True:
>         if re.search('Calling rdbms/admin',string2[lastposition]):
Unnecessary use of regex, which will be slow. Better:
    if 'Calling rdbms/admin' in line:
        break
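To check that claim on your own machine, a rough timing sketch along these lines should do. The sample line is made up, and the numbers will vary from system to system, so treat it as an illustration rather than a benchmark:

```python
import re
import timeit

# Made-up sample line, not taken from the real log file.
line = 'some text Calling rdbms/admin/script.sql more text'
target = 'Calling rdbms/admin'

# For a fixed string with no special regex characters,
# both tests give the same answer.
assert bool(re.search(target, line)) == (target in line)

# Time each approach over the same number of repetitions.
regex_time = timeit.timeit(lambda: re.search(target, line), number=100000)
substring_time = timeit.timeit(lambda: target in line, number=100000)

print('regex:    ', regex_time)
print('substring:', substring_time)
```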
>             break
>         elif lastposition==len(string2)-1:
>             break
If you iterate over the lines, you don't need to check for the end of
the list yourself.
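Putting those two suggestions together (iterate over the lines directly, and use a plain substring test), a minimal sketch might look like this. The function name find_marker and the sample lines are my own inventions for illustration, not part of your script:

```python
def find_marker(lines, start, marker='Calling rdbms/admin'):
    """Return the index of the first line at or after `start` that
    contains `marker`, or None if there is no such line."""
    # enumerate's second argument keeps the indexes aligned with `lines`.
    for position, line in enumerate(lines[start:], start):
        if marker in line:
            return position
    # The loop simply runs out of lines; no manual end-of-list check needed.
    return None


# Made-up sample data standing in for the contents of the log file.
sample = [
    'some error line',
    'unrelated line',
    'Calling rdbms/admin/script.sql',
    'more output',
]
print(find_marker(sample, 1))
print(find_marker(sample, 3))
```

The for loop ends by itself when the lines are exhausted, so there is no "while True" and no comparison against len(...) - 1.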
A better solution is to use the *accumulator* design pattern to collect
a block of lines for further analysis:
    # Untested.
    with open(filename, 'r') as f:
        block = []
        inside_block = False
        for line in f:
            line = line.strip()
            if inside_block:
                if line == "End of block":
                    inside_block = False
                    process(block)
                    block = []  # Reset to collect the next block.
                else:
                    block.append(line)
            elif line == "Start of block":
                inside_block = True
        # At the end of the loop, we might have a partial block.
        if block:
            process(block)
Your process() function takes a single argument, the list of lines which
makes up the block you care about.
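For something you can actually run, here is a small self-contained sketch of the same pattern. It reads from an in-memory io.StringIO instead of a real file, and both the process() body and the sample log text are hypothetical stand-ins; your real markers and analysis will differ:

```python
import io


def process(block):
    """Hypothetical example: return the lines of the block that mention an error."""
    return [line for line in block if 'error' in line.lower()]


def scan(f):
    """Run the accumulator loop over a file-like object.

    Returns a list with one process() result per block.
    """
    results = []
    block = []
    inside_block = False
    for line in f:
        line = line.strip()
        if inside_block:
            if line == "End of block":
                inside_block = False
                results.append(process(block))
                block = []  # Reset to collect the next block.
            else:
                block.append(line)
        elif line == "Start of block":
            inside_block = True
    # At the end of the loop, we might have a partial block.
    if block:
        results.append(process(block))
    return results


# Made-up log text: one complete block, then one block cut off by end of file.
sample_log = io.StringIO(
    "noise\n"
    "Start of block\n"
    "step 1 ok\n"
    "Error: something failed\n"
    "End of block\n"
    "noise\n"
    "Start of block\n"
    "step 2 ok\n"
)
print(scan(sample_log))
```

Lines outside a block are ignored, and the trailing partial block is still passed to process() once the loop finishes.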
If you need to know the line numbers, it is easy to adapt:
    for line in f:
becomes:
    # The start argument (available since Python 2.6) makes the
    # line numbers begin at 1 instead of 0.
    for linenumber, line in enumerate(f, 1):
and:
    block.append(line)
becomes
    block.append((linenumber, line))
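With that change, each item in the block is a (linenumber, line) pair, so process() has to unpack it. A minimal sketch, where the error test and the sample block are assumptions made up for illustration:

```python
def process(block):
    """Hypothetical example: report the error lines in a block of
    (linenumber, line) pairs, keeping the line numbers."""
    report = []
    for linenumber, line in block:
        if 'error' in line.lower():
            report.append('line %d: %s' % (linenumber, line))
    return report


# Made-up block, in the shape the adapted loop would collect.
sample_block = [(12, 'step 1 ok'), (13, 'Error: something failed')]
print(process(sample_block))
```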
If you re-write your code using this accumulator pattern, using ordinary
substring matching and equality instead of regular expressions whenever
possible, I expect you will see greatly improved performance, and code
that is much, much easier to understand and maintain.
--
Steve