[Tutor] about RE + python

Abdirizak abdi a_abdi406@yahoo.com
Thu Mar 20 09:15:08 2003


Thanks, jenssen, for your contribution.

I tried including the full stop in my character set, as follows:

buf = re.compile(r"[-a-zA-Z0-9\.]+")
te = buf.findall(test)
print te

This is the result:

>>> import re
>>> test = 'Data sparseness is an inherent problem in statistical methods for natural language processing.'
>>> buf = re.compile(r"[-a-zA-Z0-9.]+")
>>> te = buf.findall(test)
>>> print te


['Data', 'sparseness', 'is', 'an', 'inherent', 'problem', 'in', 'statistical', 'methods', 'for', 'natural', 'language', 'processing.']
>>>

Look at the last item: 'processing' is followed by the full stop, but I want to have

 ...'processing', '.' ] 

i.e. a quoted 'processing' followed by a quoted full stop, which means treating the full stop as a separate token.
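
To make the desired output concrete, here is a minimal sketch of one way it might be produced. It assumes that taking the full stop out of the character class and matching it as its own alternative is acceptable; the names token_re and tokens are only illustrative, and there may well be a better approach.

```python
import re

test = 'Data sparseness is an inherent problem in statistical methods for natural language processing.'

# Two alternatives, tried left to right at each position:
#   [-a-zA-Z0-9]+  a run of letters, digits or hyphens (a word token)
#   \.             a single full stop, returned as a token of its own
# token_re and tokens are hypothetical names used for this sketch only.
token_re = re.compile(r"[-a-zA-Z0-9]+|\.")
tokens = token_re.findall(test)

# The last two items are now 'processing' and '.'
print(tokens)
```

Because the full stop is no longer inside the character class, a word can never absorb it, so findall returns it separately. Other punctuation marks would need to be added to the second alternative in the same way.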


Thanks in advance.


