[Tutor] Extracting words..
Nicole Seitz
nicole.seitz@urz.uni-hd.de
Sun, 24 Mar 2002 21:39:18 +0100
Hi there!
I'm trying to write a little script that extracts word creations like
"StarOffice", "CompuServe", "PalmPilot", etc. from a text file.
import re

# Capital letter, lowercase run, capital letter, lowercase run (e.g. "StarOffice")
reg = re.compile(r"\b[A-Z][a-z]+[A-Z][a-z]+\b")
with open("heise_klein.txt") as f:
    txt = f.read()
result = reg.findall(txt)
This returns a list containing such words, e.g.
['JavaScript', 'MacWeek', 'MacWeek', 'CompuServe', 'CompuServe',
'CompuServe', 'CompuServe', 'CompuServe', 'SysOps', 'SysOps', 'CompuServe',
'CompuServe', 'CompuServe', 'InterBus', 'NeuroVisionen', 'NeuroVisionen',
'InterBus']
My first question:
What do I have to do so that each word appears only once in the list, i.e. so
that duplicates are dropped?
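One possible approach, sketched here under assumptions: keep a set of words already seen and append a word to the output list only the first time it turns up, so the order of first appearance is preserved. The function name unique_words and the sample string are made up for illustration; only the pattern comes from the script above.

```python
import re

# Same pattern as in the script above.
reg = re.compile(r"\b[A-Z][a-z]+[A-Z][a-z]+\b")

def unique_words(text):
    """Return each matching word once, in order of first appearance.

    unique_words is a hypothetical helper name, not part of the original script.
    """
    seen = set()      # words already added to the result
    unique = []       # result list, keeps first-appearance order
    for word in reg.findall(text):
        if word not in seen:
            seen.add(word)
            unique.append(word)
    return unique

# Made-up sample text standing in for the contents of the file.
sample = "JavaScript and MacWeek; MacWeek again, then CompuServe and JavaScript."
print(unique_words(sample))  # ['JavaScript', 'MacWeek', 'CompuServe']
```

If the order does not matter, wrapping the findall() result in set() and sorting it would probably be enough.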
Second question:
I might want to have output like this:
line 3: StarOffice
line 34: CompuServe
line 42: PalmPilot
Would I then have to use match() or search() or whatever instead of findall(),
and scan the file line by line (file.readline())?
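A minimal sketch of one way this might work, assuming findall() can stay and is simply applied to each line in turn, with enumerate() supplying the line number. The helper name words_with_line_numbers and the sample lines are hypothetical; a real run would pass the open file object instead of the list.

```python
import re

# Same pattern as in the script above.
reg = re.compile(r"\b[A-Z][a-z]+[A-Z][a-z]+\b")

def words_with_line_numbers(lines):
    """Return (line_number, word) pairs; line numbers start at 1.

    `lines` can be any iterable of strings, e.g. an open text file.
    words_with_line_numbers is a hypothetical helper name.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for word in reg.findall(line):
            hits.append((lineno, word))
    return hits

# Made-up sample lines standing in for the contents of the text file.
sample_lines = [
    "Nothing of interest here.\n",
    "StarOffice runs on several platforms.\n",
    "The PalmPilot syncs with StarOffice.\n",
]
for lineno, word in words_with_line_numbers(sample_lines):
    print("line %d: %s" % (lineno, word))
```

With the real file, something like words_with_line_numbers(open("heise_klein.txt")) should give the same kind of pairs, since iterating over a file object yields one line at a time.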
Thanx in advance!
Nicole