Sentence splitter module?

Mickel Grönroos mickel at csc.fi
Fri Mar 19 11:35:15 CET 2004


On Fri, 19 Mar 2004, Josiah Carlson wrote:

> > Is anyone aware of a freely available sentence splitter module for Python?
> > It does not have to be anything fancy, but it should be able to split
> > written text in most European languages on regular punctuation mark
> > combinations followed by whitespace and possibly supply a list of
> > abbreviations that override the plain punctuation rules.
> >
> > All advice appreciated.
>
> Check out this thread:
> http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&safe=off&threadm=c2d741%24m3t%241%40news.service.uci.edu&rnum=2&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DISO-8859-1%26safe%3Doff%26q%3Dsentence%2Bjosiah%2Bgroup%253Acomp.lang.python.*%26btnG%3DGoogle%2BSearch%26meta%3Dgroup%253Dcomp.lang.python.*

Thanks Josiah. I didn't have anything more important to do, so I
implemented a SentenceSplitter class from scratch:

http://staff.csc.fi/mickel/code/python/SentenceSplitter.py

I have only done some preliminary testing on Linux with Python 2.3, but it
seems to work OK.
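For readers who only want the gist, the rule described in the question can be sketched in a few lines: split on sentence-final punctuation (optionally followed by a closing quote or bracket) when whitespace follows, unless the token ending there is in an abbreviation list. This is a minimal sketch, not the code at the URL above. The function name `split_sentences` and the default abbreviation set are hypothetical.

```python
import re

# Hypothetical default abbreviation list; real use would need one per language.
DEFAULT_ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr.", "Mrs.", "vs."}

# Candidate boundary: one or more of . ! ? optionally followed by closing
# quotes or brackets, and only if whitespace comes next (lookahead).
_BOUNDARY = re.compile(r"[.!?]+[\"')\]]*(?=\s)")


def split_sentences(text, abbreviations=DEFAULT_ABBREVIATIONS):
    """Split text on punctuation followed by whitespace, except where the
    token ending at the candidate boundary is a known abbreviation."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        end = match.end()
        # Last whitespace-separated token before the boundary, e.g. "Dr.".
        token = text[start:end].split()[-1]
        if token in abbreviations:
            continue  # abbreviation overrides the punctuation rule
        sentence = text[start:end].strip()
        if sentence:
            sentences.append(sentence)
        start = end
    # Whatever follows the last boundary (including a final "." at the very
    # end of the text, which has no whitespace after it) is the last sentence.
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


print(split_sentences("Dr. Smith arrived. He said hello! Did you see e.g. the cat? Yes."))
```

Passing a different set through `abbreviations` is how this sketch would handle other European languages; the splitting rule itself stays the same.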

/Mickel

--
Mickel Grönroos, application specialist, linguistics, Research support, CSC
PL 405 (Tekniikantie 15 a D), 02101 Espoo, Finland, phone +358-9-4572237
CSC is the Finnish IT center for science, www.csc.fi

More information about the Python-list mailing list