[Tutor] about counting files

Bob Gailer bgailer@alum.rpi.edu
Tue Apr 29 12:18:25 2003


At 09:33 PM 4/28/2003 -0700, Abdirizak abdi wrote:
>[snip]
>import glob, getopt
>import fileinput,re,shelve,linecache,sys
>#from TextSplitter import TextSplitter
>
>aword =re.compile (r'<[^<>]*>|\b[\w-]+\b') #using xml as well.
>index={}
>
># Generate an index in file indexFileName
>def genIndex(indexFileName, extension):
>
>    fname='*.'+extension
>
>    for line in fileinput.input(glob.glob(fname)):
>       location = fileinput.filename(), fileinput.filelineno()
>       for word in aword.findall(line.lower()):
>          if word[0] != '<':
>             index.setdefault(word,[]).append(location)
>
>    print index  # for testing
>
>    shelf = shelve.open(indexFileName,'n')
>    for word in index:
>       shelf[word] = index[word]
>    shelf.close()
>
>if __name__ == '__main__':
>     import sys
>     for arg in sys.argv[1:]:
>            genIndex(arg,'txt')

From the manual for the fileinput module:

"input([files[, inplace[, backup]]])
Create an instance of the FileInput class. The instance will be used as 
global state for the functions of this module, and is also returned to use 
during iteration. The parameters to this function will be passed along to 
the constructor of the FileInput class.
The following functions use the global state created by input()"
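
A minimal sketch of what that passage means in practice, written for Python 3 (the quoted program is Python 2). The helper name compare_global_and_instance is made up for illustration. It records each line's location twice, once through the module-level functions and once through the instance returned by input(), so the two can be compared:

```python
import fileinput


def compare_global_and_instance(paths):
    """Collect (filename, line number) for every line, read two ways.

    `paths` is assumed to be a non-empty list of readable text files.
    """
    module_level = []    # via fileinput.filename() / fileinput.filelineno()
    instance_level = []  # via the FileInput instance returned by input()
    reader = fileinput.input(files=paths)
    try:
        for _line in reader:
            module_level.append((fileinput.filename(), fileinput.filelineno()))
            instance_level.append((reader.filename(), reader.filelineno()))
    finally:
        # Closing through the module also resets its global state.
        fileinput.close()
    return module_level, instance_level
```

Both lists should come out identical, since the module-level functions simply delegate to the instance that input() created.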

To apply this to your program:
    fileinputInstance = fileinput.input(glob.glob(fname))
    for line in fileinputInstance:
       location = fileinputInstance.filename(), fileinputInstance.filelineno()
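
For reference, here is a self-contained Python 3 sketch of the indexing loop using an explicit instance. It is not a drop-in replacement for your program: the names WORD and build_index are made up for this example, the shelve step is left out, and the guard for an empty file list is my addition (with no files, fileinput falls back to reading standard input).

```python
import fileinput
import glob
import re

# Same pattern as in the quoted program: XML-style tags or hyphenated words.
WORD = re.compile(r'<[^<>]*>|\b[\w-]+\b')


def build_index(pattern):
    """Map each lowercased word to a list of (filename, line number) pairs.

    `pattern` is a glob pattern such as '*.txt'.
    """
    index = {}
    paths = sorted(glob.glob(pattern))
    if not paths:
        # An empty file list would make fileinput read from stdin instead.
        return index
    # FileInput instances can be used as context managers (Python 3.2+).
    with fileinput.input(files=paths) as reader:
        for line in reader:
            location = (reader.filename(), reader.filelineno())
            for word in WORD.findall(line.lower()):
                if not word.startswith('<'):  # skip the tags themselves
                    index.setdefault(word, []).append(location)
    return index
```

Calling build_index('*.txt') in a directory of text files returns the dictionary; len(index) then gives the number of distinct words found.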


Bob Gailer
bgailer@alum.rpi.edu
303 442 2625

