[Tutor] Increase performance of the script

Sun Dec 9 19:00:58 EST 2018

On Sun, Dec 09, 2018 at 03:45:07PM +0530, Asad wrote:
> Hi All ,
> 
>           I have the following code to search for an error and print the
> solution.

Please tidy your code before asking for help optimizing it. We're 
volunteers, not being paid to work on your problem, and your code is too 
hard to understand.

Some comments:

> f4 = open (r" /A/B/file1.log  ", 'r' )
> string2=f4.readlines()

You have a variable "f4". Where are f1, f2 and f3?

You have a variable "string2", which is a lie, because it is not a 
string, it is a list.

I will be very surprised if the file name you show is correct. It has a 
leading space, and two trailing spaces.
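A minimal sketch of those first two lines, tidied up. It assumes the real path has no stray spaces. A temporary file stands in for /A/B/file1.log so it runs anywhere, and `read_lines` is a made-up helper name. The `with` statement closes the file for you, and the name `lines` says what the variable actually holds:

```python
import os
import tempfile

def read_lines(path):
    # Hypothetical helper: the with statement closes the file even if
    # an exception is raised while reading.
    with open(path, 'r') as f:
        return f.readlines()  # A list of strings, one per line.

# Demo only: a temporary file stands in for /A/B/file1.log.
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False) as tmp:
    tmp.write("first\nsecond\n")

lines = read_lines(tmp.name)
os.remove(tmp.name)
print(type(lines), lines)  # a list, not a string
```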

> for i in range(len(string2)):
>     position=i

Poor style. In Python, you almost never need to write code that iterates 
over the indexes (this is not Pascal). You don't need the assignment 
position=i. Better:

for position, line in enumerate(lines):
    ...
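To see that the two loops are equivalent, here is a small comparison. The sample lines are invented for illustration. `enumerate` hands you the index and the item together, so there is no `lines[i]` lookup and no extra `position=i` assignment:

```python
# Invented sample data standing in for the lines of the log file.
lines = ["alpha\n", "Calling rdbms/admin/x.sql\n", "omega\n"]

# Pascal style: loop over the indexes, then look each item up.
by_index = []
for i in range(len(lines)):
    by_index.append((i, lines[i]))

# Python style: enumerate yields (index, item) pairs directly.
by_enumerate = []
for position, line in enumerate(lines):
    by_enumerate.append((position, line))

print(by_index == by_enumerate)  # → True
```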

>     lastposition =position+1

Poorly named variable. You call it "last position", but it is actually 
the NEXT position.

>     while True:
>          if re.search('Calling rdbms/admin',string2[lastposition]):

Unnecessary use of a regex, which is slower than a plain substring test. Better:

    if 'Calling rdbms/admin' in line:
        break
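For a fixed search string with no regex metacharacters, `in` gives the same answer as `re.search`. The sample log line below is made up. The timing part is only a rough sketch: the numbers depend on your machine and Python version, so none are claimed here, but you can run it and compare for yourself:

```python
import re
import timeit

needle = 'Calling rdbms/admin'
# Made-up sample line for illustration.
line = 'some log text Calling rdbms/admin/catalog.sql\n'

# The two tests agree here because the needle contains no regex
# metacharacters such as . * ? + [ ] ( ) or backslashes.
assert bool(re.search(needle, line)) == (needle in line)

# Rough timing comparison; results vary from machine to machine.
regex_time = timeit.timeit(lambda: re.search(needle, line), number=100_000)
substr_time = timeit.timeit(lambda: needle in line, number=100_000)
print(f"re.search: {regex_time:.3f}s   in: {substr_time:.3f}s")
```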

>           break
>          elif lastposition==len(string2)-1:
>           break

If you iterate over the lines, you don't need to check for the end of 
the list yourself.
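As an illustration, here is a sketch of the "scan forward to the next marker" step written as a plain `for` loop. `find_next` is a hypothetical name, and unlike your code it returns None when there is no match, rather than the last index. The loop simply stops at the end of the list, so there is no `len(...)-1` test anywhere:

```python
def find_next(lines, start, needle):
    """Hypothetical helper: return the index of the first line after
    `start` that contains `needle`, or None if there is no such line."""
    # Slice off the lines after `start`, and tell enumerate to keep
    # counting from start + 1 so the indexes match the original list.
    for position, line in enumerate(lines[start + 1:], start + 1):
        if needle in line:
            return position
    return None  # The loop ended by itself: no match found.

lines = ["a\n", "b\n", "Calling rdbms/admin/x.sql\n", "c\n"]
print(find_next(lines, 0, 'Calling rdbms/admin'))  # → 2
print(find_next(lines, 2, 'Calling rdbms/admin'))  # → None
```

The slice makes a copy of the remaining lines. For very large files, `itertools.islice(lines, start + 1, None)` avoids the copy, but the accumulator approach in the next paragraph avoids rescanning altogether.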

A better solution is to use the *accumulator* design pattern to collect 
a block of lines for further analysis:

# Untested.
with open(filename, 'r') as f:
    block = []
    inside_block = False
    for line in f:
        line = line.strip()
        if inside_block:
            if line == "End of block":
                inside_block = False
                process(block)
                block = []  # Reset to collect the next block.
            else:
                block.append(line)
        elif line == "Start of block":
            inside_block = True
    # At the end of the loop, we might have a partial block.
    if block:
        process(block)

Your process() function takes a single argument, the list of lines which 
makes up the block you care about.
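Here is a self-contained version you can actually run. It is a sketch: the marker strings are the placeholder ones from the example above (substitute whatever your log really uses), the name `collect_blocks` is made up, and instead of calling process() it returns the blocks so you can see them. An `io.StringIO` object stands in for a real file, since both can be iterated line by line:

```python
import io

def collect_blocks(f, start="Start of block", end="End of block"):
    """Accumulator pattern: return a list of blocks, each block being the
    stripped lines found between a start marker and an end marker."""
    blocks = []
    block = []
    inside_block = False
    for line in f:
        line = line.strip()
        if inside_block:
            if line == end:
                inside_block = False
                blocks.append(block)
                block = []  # Reset to collect the next block.
            else:
                block.append(line)
        elif line == start:
            inside_block = True
    # A file that ends mid-block leaves a partial block behind.
    if block:
        blocks.append(block)
    return blocks

# io.StringIO stands in for a file opened with open(filename).
sample = io.StringIO(
    "noise\n"
    "Start of block\n"
    "error one\n"
    "fix one\n"
    "End of block\n"
    "more noise\n"
    "Start of block\n"
    "error two\n"
)
print(collect_blocks(sample))
# → [['error one', 'fix one'], ['error two']]
```

Each line is read once, so the work grows linearly with the size of the file.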

If you need to know the line numbers, it is easy to adapt:

    for line in f:

becomes:

    for linenumber, line in enumerate(f, start=1):  # Number lines from 1, not 0.

and:

    block.append(line)

becomes 

    block.append((linenumber, line))
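A quick check of the numbering, using a small in-memory file in place of the real log. `enumerate` has accepted a `start` argument since Python 2.6, and in both Python 2 and 3 it counts from 0 unless you pass one:

```python
import io

# io.StringIO stands in for the real log file.
f = io.StringIO("Start of block\nerror one\nEnd of block\n")

numbered = []
# start=1 begins counting at 1, so no manual "linenumber += 1" is needed.
for linenumber, line in enumerate(f, start=1):
    numbered.append((linenumber, line.strip()))

print(numbered)
# → [(1, 'Start of block'), (2, 'error one'), (3, 'End of block')]
```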

If you rewrite your code using this accumulator pattern, and use ordinary
substring matching and equality instead of regular expressions whenever
possible, I expect you will see greatly improved performance. The code
will also be much, much easier to understand and maintain.

-- 
Steve