[Moderator's note: "sgrep (structured grep) is a tool for searching text files and filtering text streams using structural criteria. The data model of sgrep is based on regions, which are non-empty substrings of text."]

I wrapped sgrep for Python using SWIG. This is very much a first crack to see how it would work. I modified sgrep a bit to avoid reading files from disk, and fixed a memory leak and a GP fault.

The test file was 4.7 MB of SGML. These tests aren't very scientific, but if I waited until everything was perfect, that might be a while.

Source with VC6 workspace and Linux Makefile:
http://www.dorb.com/darrell/

Be sure to look at this for new features:
http://www.cs.helsinki.fi/~jjaakkol/sgrep/README.txt

--Darrell

<P><A HREF="http://www.dorb.com/darrell/">sgrep wrapper</A> - module to use the <A HREF="http://www.cs.helsinki.fi/~jjaakkol/sgrep/README.txt">sgrep</A> structured text/*ML search tool from within Python. (06-May-99)

##### Test sgrep
Used 31 MB. The results are stored in an array class.

    >>> import time
    >>> from sgrep import *
    >>> args = ['', '"Auto"']
    >>> t1 = time.time()
    >>> buf = open('dism.dsr').read()
    >>> l = []
    >>> for x in range(100):
    ...     l.append(sgrepArgs(args, buf))
    ...
    >>> print time.time()-t1
    31.4170000553

##### Test re
Used 63 MB. The results are stored in a Python list of Python objects.
I believe that's the difference in memory, and partly in performance.
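To illustrate why a list of Python objects can cost more memory than a packed array, here is a rough sketch in modern Python 3 (not the interpreter used for the timings in this post). It does not use the sgrep wrapper. The `(start, end)` offset pairs, the helper names `list_footprint` and `array_footprint`, and the choice of `array('l')` are assumptions made for illustration only, since the post does not say what the wrapper's array class actually stores.

```python
import sys
from array import array


def list_footprint(pairs):
    """Approximate bytes held by a list of (start, end) tuples.

    Counts the list itself, every tuple, and every int object inside.
    """
    total = sys.getsizeof(pairs)
    for pair in pairs:
        total += sys.getsizeof(pair)
        total += sum(sys.getsizeof(n) for n in pair)
    return total


def array_footprint(flat):
    """Bytes held by a packed array (header plus its raw item buffer)."""
    return sys.getsizeof(flat)


# Hypothetical match regions: (start, end) offsets into a text buffer.
n = 100000
pairs = [(i * 50 + 1000, i * 50 + 1004) for i in range(n)]

# The same numbers packed as machine integers: start0, end0, start1, end1, ...
flat = array('l')
for start, end in pairs:
    flat.extend((start, end))

print("list of tuples:", list_footprint(pairs), "bytes")
print("packed array:  ", array_footprint(flat), "bytes")
```

Every entry in the list carries per-object overhead, while the array stores only the raw integers, so the packed form comes out several times smaller. The absolute numbers depend on the Python version and platform and are not comparable to the 31 MB and 63 MB figures reported here.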
    >>> l = []
    >>> import time, re
    >>> from sgrep import *
    >>> args = ['', '"Auto"']
    >>> t1 = time.time()
    >>> buf = open('dism.dsr').read()
    >>> l = []
    >>> for x in range(100):
    ...     l.append(re.findall("Auto", buf))
    ...
    >>> print time.time()-t1
    47.5

###### Test sgrep with a slightly more complicated query

    >>> import time
    >>> from sgrep import *
    >>> args = ['', '("AutoTagger".."/AutoTagger") containing "para"']
    >>> t1 = time.time()
    >>> cc = sgrepArgs(args, 'now is the now time')
    >>> buf = open('dism.dsr').read()
    >>> l = []
    >>> for x in range(100):
    ...     l.append(sgrepArgs(args, buf))
    ...
    >>> print time.time()-t1
    33.1720000505

    >>> import time
    >>> from sgrep import *
    >>> args = ['', '"para" not in ("AutoTagger".."/AutoTagger")']
    >>> t1 = time.time()
    >>> buf = open('dism.dsr').read()
    >>> l = []
    >>> for x in range(100):
    ...     l.append(sgrepArgs(args, buf))
    ...
    >>> print time.time()-t1
    33.2180000544
    >>> print len(l[0])
    6424

    >>> import time
    >>> from sgrep import *
    >>> args = ['', '("AutoTagger".."/AutoTagger") not containing "para"']
    >>> t1 = time.time()
    >>> buf = open('dism.dsr').read()
    >>> l = []
    >>> for x in range(100):
    ...     l.append(sgrepArgs(args, buf))
    ...
    >>> print time.time()-t1
    33.1720000505
    >>> print len(l[0])
    295
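
For readers without the wrapper installed, the three region queries used in these tests can be approximated in plain Python 3 with the standard `re` module. This is only a sketch of my reading of the operators: `between`, `containing`, `not_containing` and `not_in` are hypothetical helper names, the sample text is invented, and sgrep's real semantics for `..` and `in` may differ in edge cases (nesting, overlapping matches). It is written for clarity, not speed.

```python
import re


def between(start, end, text):
    """Regions from a `start` phrase to the closest following `end` phrase.

    Returns non-overlapping (begin, stop) offsets, stop exclusive.
    """
    starts = [m.start() for m in re.finditer(re.escape(start), text)]
    out = []
    floor = 0  # a region may not begin before the previous one stopped
    for m in re.finditer(re.escape(end), text):
        # Starts that lie after the previous region and finish before this end.
        cands = [s for s in starts
                 if floor <= s and s + len(start) <= m.start()]
        if cands:
            out.append((cands[-1], m.end()))  # nearest start wins
            floor = m.end()
    return out


def containing(regs, phrase, text):
    """Keep regions that fully contain at least one occurrence of phrase."""
    return [(b, e) for b, e in regs if text.find(phrase, b, e) != -1]


def not_containing(regs, phrase, text):
    """Keep regions that contain no occurrence of phrase."""
    return [(b, e) for b, e in regs if text.find(phrase, b, e) == -1]


def not_in(phrase, regs, text):
    """Occurrences of phrase that lie outside every region in regs."""
    hits = [(m.start(), m.end())
            for m in re.finditer(re.escape(phrase), text)]
    return [(b, e) for b, e in hits
            if not any(rb <= b and e <= rstop for rb, rstop in regs)]


# Invented sample text standing in for the SGML test file.
text = ("<AutoTagger><para>x</para></AutoTagger> "
        "<para>y</para> "
        "<AutoTagger>z</AutoTagger>")

tagged = between("AutoTagger", "/AutoTagger", text)
print([text[b:e] for b, e in containing(tagged, "para", text)])
print([text[b:e] for b, e in not_containing(tagged, "para", text)])
print(len(not_in("para", tagged, text)))
```

Note that the phrase "AutoTagger" also occurs inside "/AutoTagger"; the `floor` check keeps that occurrence from opening a new region.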

    --
    ----------- comp.lang.python.announce (moderated) ----------
    Article Submission Address:  python-announce@python.org
    Python Language Home Page:   http://www.python.org/
    Python Quick Help Index:     http://www.python.org/Help.html
    ------------------------------------------------------------