[Tutor] Extracting words..

Nicole Seitz nicole.seitz@urz.uni-hd.de
Sun, 24 Mar 2002 21:39:18 +0100


Hi there!

I'm trying to write a little script that extracts coined compound words like
"StarOffice", "CompuServe", "PalmPilot", etc. from a text file.

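import re

# CamelCase words such as "StarOffice": a capital followed by lowercase
# letters, then a second capital followed by lowercase letters.
reg = re.compile(r"\b[A-Z][a-z]+[A-Z][a-z]+\b")

with open("heise_klein.txt") as f:
    txt = f.read()
result = reg.findall(txt)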


This returns a list containing such words, e.g.:

['JavaScript', 'MacWeek', 'MacWeek', 'CompuServe', 'CompuServe', 
'CompuServe', 'CompuServe', 'CompuServe', 'SysOps', 'SysOps', 'CompuServe', 
'CompuServe', 'CompuServe', 'InterBus', 'NeuroVisionen', 'NeuroVisionen', 
'InterBus']

My first question:

What do I have to do so that each word appears only once in the list, i.e.
is reported only once?
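One common way to do this, sketched here as a suggestion rather than the only option: keep a set of words already seen and append a word to the output list only the first time it turns up. A plain `set(result)` also removes duplicates but does not preserve the order of first appearance. The function name `unique_words` and the sample text are made up for illustration.

```python
import re

# Same pattern as in the original script.
reg = re.compile(r"\b[A-Z][a-z]+[A-Z][a-z]+\b")

def unique_words(text):
    """Return each matching word once, in order of first appearance."""
    seen = set()
    unique = []
    for word in reg.findall(text):
        if word not in seen:
            seen.add(word)
            unique.append(word)
    return unique

sample = "JavaScript and MacWeek; MacWeek covers CompuServe, CompuServe SysOps."
print(unique_words(sample))  # ['JavaScript', 'MacWeek', 'CompuServe', 'SysOps']
```

If the order does not matter, `sorted(set(result))` gives an alphabetical list of distinct words in one line.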

Second question:

I might also want output like this:

line 3: StarOffice
line 34: CompuServe
line 42: PalmPilot

Would I then have to use match() or search() or something else instead of
findall(), and scan the file line by line (file.readline())?
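A possible approach, offered as a sketch under the assumption that each word should be reported once at the first line it appears on: loop over the lines with `enumerate(..., start=1)` and keep using `findall()` on each line. `search()` would return only the first match on a line, and `match()` only tries the start of the line, so neither is a better fit here. Iterating over the open file object already yields one line at a time, so an explicit `readline()` loop is not needed. The names `first_occurrences` and `report` and the sample lines are hypothetical.

```python
import re

# Same CamelCase pattern as in the original script.
reg = re.compile(r"\b[A-Z][a-z]+[A-Z][a-z]+\b")

def first_occurrences(lines):
    """Return (line_number, word) pairs, one per distinct word,
    using the first line on which each word appears (counting from 1)."""
    seen = set()
    found = []
    for lineno, line in enumerate(lines, start=1):
        # findall() still works on a single line; search() would stop at
        # the first hit and match() only tries the start of the line.
        for word in reg.findall(line):
            if word not in seen:
                seen.add(word)
                found.append((lineno, word))
    return found

def report(path):
    # Iterating over the open file yields one line at a time.
    with open(path) as f:
        for lineno, word in first_occurrences(f):
            print("line %d: %s" % (lineno, word))

sample = ["We use StarOffice.", "CompuServe news", "CompuServe and PalmPilot."]
for lineno, word in first_occurrences(sample):
    print("line %d: %s" % (lineno, word))
```

Calling `report("heise_klein.txt")` would print the same kind of listing for the real file. To list every occurrence instead of only the first, drop the `seen` check.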

Thanks in advance!

Nicole