Python Grep (was: writing (Gnu)MAKE in Python)
Eric Hagemann
ehagemann at home.com
Sat Jun 17 19:42:27 EDT 2000
If your file fits in memory, I believe you will find that reading the
whole thing in with readlines() and then searching over the lines is a bit
faster. I was doing something similar with files in the 5 MB range, and the
run time dropped from 5 s to about 2 s (I was doing more than the grep, but
...).
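A minimal Python 3 sketch of the read-everything-first approach, assuming the file fits comfortably in memory. The function name grep_in_memory is hypothetical, and the `in` operator stands in for the older string.find(line, s) != -1 test:

```python
def grep_in_memory(needle, path):
    """Return every line of `path` that contains the substring `needle`.

    Reads the whole file at once with readlines(), then scans the list,
    so it assumes the file fits comfortably in memory.
    """
    with open(path, "r") as f:
        lines = f.readlines()
    # Plain substring test; equivalent to string.find(line, needle) != -1.
    return [line for line in lines if needle in line]
```

Whether this beats a readline() loop depends on file size and Python version; timing both on your own data is the only reliable check.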
You might also have better luck with regular expressions (the re module)
rather than string.find(), precompiling the pattern once before the loop.
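A sketch of the precompiled-regex variant in Python 3, with a hypothetical function name grep_regex. The pattern is compiled once with re.compile() and its search method is bound to a local name before the loop; this version iterates the file object directly instead of calling readlines():

```python
import re


def grep_regex(pattern, path):
    """Return the lines of `path` that match the regular expression `pattern`."""
    # Compile once, then bind the compiled pattern's search method to a
    # local name so the loop does no repeated attribute lookups.
    search = re.compile(pattern).search
    with open(path, "r") as f:
        return [line for line in f if search(line)]
```

For a plain literal such as "26086594" a substring test may be just as fast as a regex, so it is worth measuring before switching; the regex pays off when the pattern is more than a fixed string.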
Cheers
"Doug Stanfield" <DOUGS at oceanic.com> wrote in message
news:8457258D741DD411BD3D0050DA62365907A23A at huina.oceanic.com...
> [John said:]
> > > ...I would like to have Bash, sed, and grep as Python
> > > programs rather than compiled C.
> >
> [and Corageous asked:]
> > Why?
>
> Consider this an attempt at translation.
>
> There have been posts in the past about having a Python shell. Perhaps
> that's what John wants. Not that I'd like that. It may be that John wants
> the functionality of sed and grep more easily accessible in Python. That is
> something I care about (thus the self-serving attempt at manipulating this
> thread ;-)
>
> Grep functions in particular are something that I wonder about. The
> following is an experiment:
>
> #!/usr/bin/python
> #
> # pygrep.py
> #
> #
> """ An attempt to compare the use of the grep
> command with a 'Pythonic' method of finding
> all lines in a file that contain a string.
> """
> import string, os
>
> def pygrep(the_string,the_file):
>     """ Search for and print all lines
>     in the_file that contain the_string.
>     This is simplistic and unprotected, but its
>     purpose is only to learn how to make it fast. """
>
>     find = string.find   # local alias: no module lookup on every line
>
>     holder = open(the_file,'r')
>     while 1:
>         line = holder.readline()
>         if not line:
>             break
>         if find(line,the_string) <> -1:
>             print line
>     holder.close()
>
> def mygrep(the_string,the_file):
>     """ This is usually what I do when I need this. """
>
>     # Hand the search to the system grep and print its output.
>     command = 'grep %s %s' % (the_string,the_file)
>     response = os.popen(command,'r')
>     lines = response.read()
>     response.close()
>     print lines
>
> if __name__ == '__main__':
>     import time
>
>     test_file = "/usr/local/devices/motor.10"
>     test_string = "26086594"
>
>     first = time.time()
>     pygrep(test_string,test_file)
>     second = time.time()
>     mygrep(test_string,test_file)
>     third = time.time()
>
>     print "Python: %s" % ((second - first),)
>     print "OS : %s" % ((third - second),)
>
> I run this and get:
>
> $ ./pygrep.py
> Punc,202,26086594,SG6,1,2,30days,-9,53,732970,358877,0,67,176761,39326,1,24.94.89.251,
>
> Punc,202,26086594,SG6,1,2,30days,-9,53,751771,371322,0,68,179118,40654,1,24.94.89.251,
>
> Punc,202,26086594,SG6,1,2,30days,-9,53,787679,393889,0,68,184387,42710,1,24.94.89.251,
>
> Punc,202,26086594,SG6,1,2,30days,-9,53,732970,358877,0,67,176761,39326,1,24.94.89.251,
> Punc,202,26086594,SG6,1,2,30days,-9,53,751771,371322,0,68,179118,40654,1,24.94.89.251,
> Punc,202,26086594,SG6,1,2,30days,-9,53,787679,393889,0,68,184387,42710,1,24.94.89.251,
>
> Python: 6.7979799509
> OS : 0.0741490125656
>
> Am I missing a Pydiom that would narrow the gap, or is using the OS
> function the best way?
>
> -Doug-
>
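Doug's mygrep() builds a shell command by string formatting and hands it to os.popen, so a search string containing spaces or shell metacharacters would be split or interpreted by the shell. Below is a sketch of the same call in modern Python using subprocess.run with an argument list (no shell involved), assuming a POSIX grep is on PATH; grep_external is a hypothetical name:

```python
import subprocess


def grep_external(needle, path):
    """Run the system grep and return its matching lines as a list.

    -F treats `needle` as a fixed string rather than a regex, and `--`
    stops grep from reading `needle` as an option if it starts with '-'.
    """
    result = subprocess.run(
        ["grep", "-F", "--", needle, path],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout to str
    )
    # grep exits 0 when lines matched, 1 when none matched, >1 on error.
    if result.returncode > 1:
        raise RuntimeError("grep failed with status %d" % result.returncode)
    return result.stdout.splitlines(keepends=True)
```

Because grep reports "no match" through exit status 1, only statuses above 1 are treated as failures; an empty list means nothing matched.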
More information about the Python-list mailing list