[Tutor] Increase performance of the script
Asad
asad.hasan2004 at gmail.com
Tue Dec 11 10:37:58 EST 2018
Hi All,
I used your solution, but found a strange issue with deque.
I am using Python 2.6.6:
>>> import collections
>>> d = collections.deque('abcdefg')
>>> print 'Deque:', d
File "<stdin>", line 1
print 'Deque:', d
^
SyntaxError: invalid syntax
>>> print ('Deque:', d)
Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
>>> print d
File "<stdin>", line 1
print d
^
SyntaxError: invalid syntax
>>> print (d)
deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
In Python 2.6 the print statement works as print "Solution";
however, after importing collections I have to call print as print("Solution").
Is this a known issue?
Please let me know.
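
A possible explanation, offered as an assumption rather than a confirmed
diagnosis: importing collections does not change how print works. The script
suggested in the quoted digest below starts with
"from __future__ import print_function", and once that import has run in a
session, print is a function and the statement form becomes a syntax error. The
same happens if the interpreter is actually Python 3. The output of
print ('Deque:', d) above is not shown as a tuple, which is consistent with
print being a function. The sketch below checks both forms with compile(); the
helper name compiles is made up for illustration.

```python
# No-op on Python 3; on Python 2.6+ it turns print into a function.
from __future__ import print_function

from collections import deque


def compiles(source):
    """Return True if `source` is valid syntax while print is a function."""
    try:
        # compile() inherits the __future__ imports of the calling module.
        compile(source, "<stdin>", "exec")
        return True
    except SyntaxError:
        return False


d = deque("abcdefg")
statement_ok = compiles("print 'Deque:', d")  # statement form
function_ok = compiles("print('Deque:', d)")  # function form
print("statement form compiles:", statement_ok)
print("function form compiles:", function_ok)
print("Deque:", d)
```

If this is the cause, starting a fresh interpreter without the __future__
import should bring the print statement back on Python 2.6.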
Thanks,
On Mon, Dec 10, 2018 at 10:30 PM <tutor-request at python.org> wrote:
> Today's Topics:
>
> 1. Re: Increase performance of the script (Peter Otten)
> 2. Re: Increase performance of the script (Steven D'Aprano)
> 3. Re: Increase performance of the script (Steven D'Aprano)
>
>
>
> ---------- Forwarded message ----------
> From: Peter Otten <__peter__ at web.de>
> To: tutor at python.org
> Date: Sun, 09 Dec 2018 21:17:53 +0100
> Subject: Re: [Tutor] Increase performance of the script
> Asad wrote:
>
> > Hi All,
> >
> > I have the following code to search for an error and print the
> > solution.
> >
> > The size of /A/B/file1.log may vary from 5 MB to 5 GB.
> >
> > f4 = open (r" /A/B/file1.log ", 'r' )
> > string2=f4.readlines()
>
> Do not read the complete file into memory. Read one line at a time and keep
> only those lines around that you may have to look at again.
>
> > for i in range(len(string2)):
> >     position=i
> >     lastposition =position+1
> >     while True:
> >         if re.search('Calling rdbms/admin',string2[lastposition]):
> >             break
> >         elif lastposition==len(string2)-1:
> >             break
> >         else:
> >             lastposition += 1
>
> You are trying to find a group of lines. The way you do it, for a file with
> the structure
>
> foo
> bar
> baz
> end-of-group-1
> ham
> spam
> end-of-group-2
>
> you find the groups
>
> foo
> bar
> baz
> end-of-group-1
>
> bar
> baz
> end-of-group-1
>
> baz
> end-of-group-1
>
> ham
> spam
> end-of-group-2
>
> spam
> end-of-group-2
>
> That looks like a lot of redundancy which you can probably avoid. But wait...
>
>
> >     errorcheck=string2[position:lastposition]
> >     for i in range ( len ( errorcheck ) ):
> >         if re.search ( r'"error(.)*13?"', errorcheck[i] ):
> >             print "Reason of error \n", errorcheck[i]
> >             print "script \n" , string2[position]
> >             print "block of code \n"
> >             print errorcheck[i-3]
> >             print errorcheck[i-2]
> >             print errorcheck[i-1]
> >             print errorcheck[i]
> >             print "Solution :\n"
> >             print "Verify the list of objects belonging to Database "
> >             break
> >     else:
> >         continue
> >     break
>
> you throw away almost all the hard work just to look for the group containing
> those four lines? It looks like you only need the "error...13" line, the three
> lines that precede it, and the last "Calling..." line occurring before the
> "error...13".
>
> > The problem I am facing is a performance issue: it takes some minutes to
> > print out the solution. Please advise if there can be performance
> > enhancements to this script.
>
> If you want to learn the Python way you should try hard to write your
> scripts without a single
>
>     for i in range(...):
>         ...
>
> loop. This style is usually the last resort; it may work for small datasets,
> but as soon as you have to deal with large files, performance dives.
> Even worse, these loops tend to make your code hard to debug.
>
> Below is a suggestion for an implementation of what your code seems to be
> doing that only remembers the four most recent lines and works with a single
> loop. If that saves you some time, use that time to clean the scripts you
> have lying around of occurrences of "for i in range(....): ..." ;)
>
>
> from __future__ import print_function
>
> import re
> import sys
> from collections import deque
>
>
> def show(prompt, *values):
>     print(prompt)
>     for value in values:
>         print(" {}".format(value.rstrip("\n")))
>
>
> def process(filename):
>     tail = deque(maxlen=4)  # the last four lines
>     script = None
>     with open(filename) as instream:
>         for line in instream:
>             tail.append(line)
>             if "Calling rdbms/admin" in line:
>                 script = line
>             elif re.search('"error(.)*13?"', line) is not None:
>                 show("Reason of error:", tail[-1])
>                 show("Script:", script)
>                 show("Block of code:", *tail)
>                 show(
>                     "Solution",
>                     "Verify the list of objects belonging to Database"
>                 )
>                 break
>
>
> if __name__ == "__main__":
>     filename = sys.argv[1]
>     process(filename)
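
The script above relies on deque(maxlen=4) to remember only the most recent
lines. A minimal sketch of that behaviour follows, assuming made-up sample
lines in place of the real log (sample_lines and last_four are illustrative
names, not part of the original script). It also uses "{0}" in the format
string, because auto-numbered "{}" fields need Python 2.7 or later and would
fail on 2.6.

```python
from __future__ import print_function

from collections import deque

# Hypothetical sample lines standing in for the real log file.
sample_lines = [
    "Calling rdbms/admin/script1.sql\n",
    "line a\n",
    "line b\n",
    "line c\n",
    "line d\n",
    '"error 13"\n',
]

tail = deque(maxlen=4)  # holds at most the four most recent lines
for line in sample_lines:
    tail.append(line)  # once full, the oldest line is dropped automatically

last_four = [line.rstrip("\n") for line in tail]
for line in last_four:
    # "{0}" rather than "{}": auto-numbered fields need Python 2.7 or later.
    print("  {0}".format(line))
```

Because the deque never grows past four entries, memory use stays constant no
matter how large the log file is.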
>
>
>
>
>
>
> ---------- Forwarded message ----------
> From: "Steven D'Aprano" <steve at pearwood.info>
> To: tutor at python.org
> Date: Mon, 10 Dec 2018 09:43:20 +1100
> Subject: Re: [Tutor] Increase performance of the script
> On Sun, Dec 09, 2018 at 03:45:07PM +0530, Asad wrote:
> > Hi All ,
> >
> > I have the following code to search for an error and print the
> > solution.
> >
> > The size of /A/B/file1.log may vary from 5 MB to 5 GB.
> [...]
>
> > The problem I am facing is a performance issue: it takes some minutes to
> > print out the solution. Please advise if there can be performance
> > enhancements to this script.
>
> How many minutes is "some"? If it takes 2 minutes to analyse a 5GB file,
> that's not bad performance. If it takes 2 minutes to analyse a 5MB file,
> that's not so good.
>
>
>
> --
> Steve
>
>
>
>
> ---------- Forwarded message ----------
> From: "Steven D'Aprano" <steve at pearwood.info>
> To: tutor at python.org
> Date: Mon, 10 Dec 2018 11:00:58 +1100
> Subject: Re: [Tutor] Increase performance of the script
> On Sun, Dec 09, 2018 at 03:45:07PM +0530, Asad wrote:
> > Hi All ,
> >
> > I have the following code to search for an error and print the
> > solution.
>
> Please tidy your code before asking for help optimizing it. We're
> volunteers, not being paid to work on your problem, and your code is too
> hard to understand.
>
> Some comments:
>
>
> > f4 = open (r" /A/B/file1.log ", 'r' )
> > string2=f4.readlines()
>
> You have a variable "f4". Where are f1, f2 and f3?
>
> You have a variable "string2", which is a lie, because it is not a
> string, it is a list.
>
> I will be very surprised if the file name you show is correct. It has a
> leading space, and two trailing spaces.
>
>
> > for i in range(len(string2)):
> >     position=i
>
> Poor style. In Python, you almost never need to write code that iterates
> over the indexes (this is not Pascal). You don't need the assignment
> position=i. Better:
>
>     for position, line in enumerate(lines):
>         ...
>
>
> > lastposition =position+1
>
> Poorly named variable. You call it "last position", but it is actually
> the NEXT position.
>
>
> > while True:
> > if re.search('Calling rdbms/admin',string2[lastposition]):
>
> Unnecessary use of regex, which will be slow. Better:
>
>     if 'Calling rdbms/admin' in line:
>         break
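
The claim above about regex cost can be checked with a rough timing sketch.
The log line here is invented for illustration, and the numbers depend on the
machine and Python version, so treat this as a way to measure rather than a
result.

```python
from __future__ import print_function

import re
import timeit

# Hypothetical log line; the real file contents are not shown in the thread.
line = "2018-12-09 10:00:00 Calling rdbms/admin/catalog.sql\n"
needle = "Calling rdbms/admin"

# Both tests agree for a plain literal like this one.
found_with_regex = re.search(needle, line) is not None
found_with_in = needle in line

# Time 100000 repetitions of each test.
regex_seconds = timeit.timeit(lambda: re.search(needle, line), number=100000)
in_seconds = timeit.timeit(lambda: needle in line, number=100000)
print("re.search: {0:.3f}s  substring: {1:.3f}s".format(regex_seconds, in_seconds))
```

The substring test only works because the pattern contains no regex
metacharacters; a pattern such as "error(.)*13?" still needs the re module.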
>
>
> > break
> > elif lastposition==len(string2)-1:
> > break
>
> If you iterate over the lines, you don't need to check for the end of
> the list yourself.
>
>
> A better solution is to use the *accumulator* design pattern to collect
> a block of lines for further analysis:
>
> # Untested.
> with open(filename, 'r') as f:
>     block = []
>     inside_block = False
>     for line in f:
>         line = line.strip()
>         if inside_block:
>             if line == "End of block":
>                 inside_block = False
>                 process(block)
>                 block = []  # Reset to collect the next block.
>             else:
>                 block.append(line)
>         elif line == "Start of block":
>             inside_block = True
>     # At the end of the loop, we might have a partial block.
>     if block:
>         process(block)
>
>
> Your process() function takes a single argument, the list of lines which
> makes up the block you care about.
>
> If you need to know the line numbers, it is easy to adapt:
>
> for line in f:
>
> becomes:
>
>     for linenumber, line in enumerate(f):
>         # The next line is not needed if you call enumerate(f, 1) instead
>         # (the start argument is available from Python 2.6).
>         linenumber += 1  # Adjust to start line numbers at 1 instead of 0
>
> and:
>
> block.append(line)
>
> becomes
>
> block.append((linenumber, line))
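
Putting the accumulator and the line-number adaptation together, a runnable
sketch might look like the following. The input text, the collected list, and
the body of process() are assumptions made for illustration, and
"Start of block" / "End of block" are the placeholder markers from the snippet
above, not markers from the real log.

```python
from __future__ import print_function

import io

# Hypothetical input standing in for the log file.
sample = io.StringIO(
    u"noise\n"
    u"Start of block\n"
    u"first\n"
    u"second\n"
    u"End of block\n"
    u"Start of block\n"
    u"partial\n"
)

collected = []


def process(block):
    """Hypothetical stand-in: just record each finished block."""
    collected.append(block)


block = []
inside_block = False
for linenumber, line in enumerate(sample, 1):  # start counting at 1
    line = line.strip()
    if inside_block:
        if line == "End of block":
            inside_block = False
            process(block)
            block = []  # reset to collect the next block
        else:
            block.append((linenumber, line))
    elif line == "Start of block":
        inside_block = True
if block:  # a partial block left over at end of input
    process(block)

print(collected)
```

Only one block is held in memory at a time, so the approach scales with the
size of the largest block rather than the size of the file.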
>
>
> If you re-write your code using this accumulator pattern, using ordinary
> substring matching and equality instead of regular expressions whenever
> possible, I expect you will see greatly improved performance (as well as
> being much, much easier to understand and maintain).
>
>
>
> --
> Steve
>
>
--
Asad Hasan
+91 9582111698