[Tutor] about Regular Expression in Class function
Abdirizak abdi
a_abdi406@yahoo.com
Wed Apr 30 17:08:13 2003
hi,

I have a class that indexes some text. The class has a method called readline() that reads a line from a file. What I need to do is apply an RE (regular expression) so that only normal words are extracted, because the input file (i.e. the file to be read) contains some XML tags. For example, the file has this:

    <S ID='S-0'> <W>Similarity-Based</W> <W>Estimation</W> <W>of</W> <W>Word</W> <W>Cooccurrence</W> <W>Probabilities</W> </S>

and I want to extract ['Similarity based', 'Estimation'.....]. I think the problem is that it doesn't accept the class instance attributes, such as:

    aword = re.compile(r'<W>([^<]+)</W>')
    self.line = aword.findall(self.line)

Can anyone suggest how I can incorporate this RE? Here is the function:

    class text_word_iterator:
        ......
        def readline(self):
            #aword = re.compile(r'<[^<>]*>|\b[\w-]+\b')#|<W>([^<]+)</W>')  # added now
            aword = re.compile(r'<W>([^<]+)</W>')     # added now
            self.line = self.file.readline()          # already there
            self.line = aword.findall(self.line)      # added now for xml -- should work
            self.line = ' '.join(self.line)           # this is an extra line added -- works
            print self.line
            if self.line == '':
                self.words = None
            else:
                self.words = filter(None, re.split(r'\W+', self.line))  # mod from \W -> \s+
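
For reference, a minimal sketch of how the pattern could sit inside readline(), written for Python 3 rather than the Python 2 used above. The class name TextWordIterator, its constructor, and the module-level WORD_TAG are assumptions made for illustration and are not the original class. The sketch also checks for end of file on the raw line before the tags are stripped, since a line containing only <S> tags would otherwise join to '' and look like end of file.

```python
import io
import re

# Compiled once; the capture group keeps only the text inside <W>...</W>.
WORD_TAG = re.compile(r'<W>([^<]+)</W>')


class TextWordIterator:
    """Hypothetical stand-in for text_word_iterator; only readline() is sketched."""

    def __init__(self, file):
        self.file = file
        self.line = ''
        self.words = None

    def readline(self):
        raw = self.file.readline()
        if raw == '':
            # Only a truly empty read means end of file.
            self.line = ''
            self.words = None
            return
        # findall() returns the captured groups, one string per <W> element.
        self.line = ' '.join(WORD_TAG.findall(raw))
        # Split on non-word characters as in the original, dropping empty strings.
        self.words = [w for w in re.split(r'\W+', self.line) if w]


sample = ("<S ID='S-0'> <W>Similarity-Based</W> <W>Estimation</W> <W>of</W> "
          "<W>Word</W> <W>Cooccurrence</W> <W>Probabilities</W> </S>\n")
it = TextWordIterator(io.StringIO(sample))
it.readline()
print(it.line)
print(it.words)
```

Note that splitting on \W+ breaks 'Similarity-Based' into two words at the hyphen; splitting on \s+ instead would keep it as a single token.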

Thanks in advance.