Nlp, Python and period

Paul Boddie paul at boddie.org.uk
Mon Aug 4 12:21:23 CEST 2008


On 4 Aug, 11:59, Fred Mangusta <a... at bbb.it> wrote:
> Hi,
>
> are you aware of any NLP packages or algorithms in Python to spot
> whether a '.' represents the end of a sentence or rather something else
> (e.g. Mr., f... at home.co.uk, etc.)?

I wouldn't mind finding out about such packages, either. I see that
NLTK offers a few options, with the following tokeniser being
interesting if you don't mind training the software:

http://nltk.org/doc/guides/tokenize.html#punkt-tokenizer

There was also discussion of this topic on Ned Batchelder's blog a
while back:

http://nedbatchelder.com/blog/200804/separating_sentences.html

My comment on there (that I'm using a regular expression with some
postprocessing) still stands.
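
As a rough illustration of what "a regular expression with some
postprocessing" could look like, here is a minimal sketch. It is not the
code referred to above: the abbreviation list, the boundary pattern and
the names ABBREVIATIONS, CANDIDATE and split_sentences are all
assumptions made up for the example, and a real abbreviation list would
need to be much longer.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (stored without the
# trailing period, lowercase). This is an assumption for the sketch.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "e.g", "i.e"}

# Candidate boundary: '.', '!' or '?' followed by whitespace and then
# something that looks like the start of a sentence (optional opening
# quote or bracket, then an uppercase letter or a digit).
CANDIDATE = re.compile(r'([.!?])\s+(?=["\'(]?[A-Z0-9])')


def split_sentences(text):
    """Split text at candidate boundaries, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        end = match.end(1)  # position just after the punctuation mark
        # Postprocessing step: look at the word carrying the period.
        words = text[start:end].split()
        last_word = words[-1] if words else ""
        stem = last_word[:-1].lower().strip("(\"'")
        if match.group(1) == "." and stem in ABBREVIATIONS:
            continue  # e.g. "Mr." or "Dr.": not an end of sentence
        sentences.append(text[start:end].strip())
        start = match.end()  # resume after the whitespace
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = ("Mr. Smith wrote to fred@home.co.uk yesterday. "
              "He got no reply! Did Dr. Jones see it?")
    for sentence in split_sentences(sample):
        print(sentence)
```

Periods inside the e-mail address are never candidates because no
whitespace follows them, while "Mr." and "Dr." are caught by the
abbreviation check, so the sample yields three sentences. A sentence
that genuinely ends in an abbreviation would be missed by this sketch,
which is the kind of case a trained tokeniser is meant to handle.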

Paul


