Nlp, Python and period
Paul Boddie
paul at boddie.org.uk
Mon Aug 4 06:21:23 EDT 2008
On 4 Aug, 11:59, Fred Mangusta <a... at bbb.it> wrote:
> Hi,
>
> are you aware of any nlp packages or algorithms in Python to spot
> whether a '.' represents an end of sentence or rather something else (eg
> Mr., f... at home.co.uk, etc)?
I wouldn't mind finding out about such packages, either. I see that
NLTK offers a few options, with the following tokeniser being
interesting if you don't mind training the software:
http://nltk.org/doc/guides/tokenize.html#punkt-tokenizer
There was also discussion of this topic on Ned Batchelder's blog a
while back:
http://nedbatchelder.com/blog/200804/separating_sentences.html
My comment there (that I'm using a regular expression with some
postprocessing) still stands.
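To give a rough idea of what "a regular expression with some postprocessing" can mean, here is a minimal sketch. It is not the code referred to above. The names ABBREVIATIONS, CANDIDATE and split_sentences are made up for this example, and the abbreviation list is a small, assumed sample. The regular expression proposes candidate boundaries (punctuation followed by whitespace), and the postprocessing step rejects candidates that follow a known abbreviation or precede a lowercase letter.

```python
import re

# Hypothetical, deliberately short abbreviation list (an assumption for
# this sketch); stored lowercase and without the trailing period.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "eg", "e.g", "i.e", "etc"}

# Candidate boundary: one or more of . ! ? followed by whitespace.
# Dots inside "home.co.uk" are never candidates, as no whitespace follows.
CANDIDATE = re.compile(r"[.!?]+\s+")


def split_sentences(text):
    """Split text into sentences using a regex plus simple postprocessing."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        # Postprocessing step 1: look at the word just before the punctuation.
        preceding = text[start:match.start()].split()
        last_word = preceding[-1].lstrip("(\"'").lower() if preceding else ""
        if last_word in ABBREVIATIONS:
            continue  # "Mr. Smith": the period belongs to the abbreviation
        # Postprocessing step 2: a lowercase letter next suggests no boundary.
        following = text[match.end():match.end() + 1]
        if following.islower():
            continue
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    # Whatever remains after the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = ("Mr. Smith wrote to fred@home.co.uk yesterday. "
              "He got no reply! Was the address wrong?")
    for sentence in split_sentences(sample):
        print(sentence)
```

A fixed abbreviation list is the obvious weakness: a sentence that really does end in "etc." will not be split, which is the kind of case a trained tokeniser such as Punkt is meant to handle.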
Paul