sgrep.so

Darrell news at dorb.com
Wed May 5 13:31:28 EDT 1999


I SWIGed sgrep into Python. This is very much a first crack to see
how it would work.
I modified sgrep a bit to avoid reading files from disk, and fixed a memory leak
and a GP fault.

The test file was 4.7 MB of SGML. These tests aren't very scientific, but if
I waited until everything was perfect, that might be a while.

Source with VC6 workspace and Linux Makefile:
http://www.dorb.com/darrell/

Be sure to look at the sgrep README for new features:
http://www.cs.helsinki.fi/~jjaakkol/sgrep/README.txt

--Darrell

##### Test sgrep. Used 31 MB. The results are stored in an array class.
>>> import time
>>> from sgrep import *
>>> args=['','"Auto"']
>>> t1= time.time()
>>> buf=open('dism.dsr').read()
>>> l=[]
>>> for x in range(100):
...     l.append(sgrepArgs(args,buf))
...
>>> print time.time()-t1
31.4170000553
>>>
##### Test re. Used 63 MB. The results are stored in a Python list of Python objects.
##### I believe that accounts for the difference in memory and partly for the difference in performance.
>>> l=[]
>>> import time, re
>>> from sgrep import *
>>> args=['','"Auto"']
>>> t1= time.time()
>>> buf=open('dism.dsr').read()
>>> l=[]
>>> for x in range(100):
...     l.append(re.findall("Auto",buf))
...
>>> print time.time()-t1
47.5
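
The two timing loops above follow the same pattern, so they could be wrapped in a small helper. This is only a sketch in current Python 3, not the code that produced the numbers in this post. The name time_repeated and the synthetic buf are made up for illustration, and it times re.findall only, since the sgrep module may not be installed.

```python
import re
import time


def time_repeated(func, repeats=100):
    """Call func() `repeats` times; return (elapsed_seconds, results)."""
    results = []
    start = time.perf_counter()
    for _ in range(repeats):
        results.append(func())
    return time.perf_counter() - start, results


# Hypothetical stand-in for the 4.7 MB SGML test file.
buf = "<para>Auto</para>\n" * 1000

elapsed, results = time_repeated(lambda: re.findall("Auto", buf), repeats=10)
print("%d matches per pass, %.4f s total" % (len(results[0]), elapsed))
```

Keeping every result in the list, as the original loops do, means the memory figure includes all 100 result sets, not just one.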
##### Test sgrep with slightly more complicated queries.
>>> import time
>>> from sgrep import *
>>> args=['','("AutoTagger".."/AutoTagger") containing "para"']
>>> t1= time.time()
>>> cc=sgrepArgs(args,'now is the now time')
>>> buf=open('dism.dsr').read()
>>> l=[]
>>> for x in range(100):
...     l.append(sgrepArgs(args,buf))
...
>>> print time.time()-t1
33.1720000505
>>>
>>>
>>> import time
>>> from sgrep import *
>>> args=['','"para" not in ("AutoTagger".."/AutoTagger")']
>>> t1= time.time()
>>> buf=open('dism.dsr').read()
>>> l=[]
>>> for x in range(100):
...     l.append(sgrepArgs(args,buf))
...
>>> print time.time()-t1
33.2180000544
>>> print len(l[0])
6424
>>>
>>>
>>> import time
>>> from sgrep import *
>>> args=['','("AutoTagger".."/AutoTagger") not containing "para"']
>>> t1= time.time()
>>> buf=open('dism.dsr').read()
>>> l=[]
>>> for x in range(100):
...     l.append(sgrepArgs(args,buf))
...
>>> print time.time()-t1
33.1720000505
>>> print len(l[0])
295
>>>
>>>
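
For comparison, the three region queries above can be roughly approximated with plain re. This is a sketch under assumptions, not how sgrep works internally: it assumes "A".."B" means the shortest span from each A to the next B, and the helper names (regions, containing, not_containing, not_in) and the sample text are invented here. Real sgrep semantics may differ, so counts on real data need not match.

```python
import re


def regions(text, start, end):
    """Spans running from each `start` to the next `end` (assumed meaning of ..)."""
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    return [m.span() for m in pattern.finditer(text)]


def containing(text, spans, needle):
    """Keep only the spans whose text includes `needle`."""
    return [(b, e) for (b, e) in spans if needle in text[b:e]]


def not_containing(text, spans, needle):
    """Keep only the spans whose text does not include `needle`."""
    return [(b, e) for (b, e) in spans if needle not in text[b:e]]


def not_in(text, needle, spans):
    """Start offsets of `needle` occurrences lying outside every span."""
    hits = [m.start() for m in re.finditer(re.escape(needle), text)]
    size = len(needle)
    return [h for h in hits
            if not any(b <= h and h + size <= e for (b, e) in spans)]


# Made-up sample; the original test used a 4.7 MB SGML file.
sample = ("<AutoTagger><para>a</para></AutoTagger> "
          "<para>b</para> "
          "<AutoTagger>c</AutoTagger>")

spans = regions(sample, "AutoTagger", "/AutoTagger")
print(len(containing(sample, spans, "para")))
print(len(not_containing(sample, spans, "para")))
print(len(not_in(sample, "para", spans)))
```

The not_in helper checks every hit against every span, which is fine for a small sample but would be slow on a file the size of the one tested here.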