summarize text
Tim Chase
python.list at tim.thechases.com
Mon May 29 09:13:19 EDT 2006
> does anyone know of a library which permits to summarise text?
> i've been looking at nltk but haven't found anything yet. any
> help would be very welcome.
Well, summarizing text is one of those things that generally
takes a brain cell or two to do. Automating the process means
either doing it smartly (some sort of neural-net/NLP/Markov-chain
technique, which is a non-trivial task, the kind of thing one
might brave in the third or fourth year of a university
computer-science program) or doing it fairly dumbly. As an
example of a "dumb" solution, you can use a regexp to keep the
first few words and the last few words, drop everything in
between, and call that a "summary":
>>> import re
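>>> # group 1: at least 8 leading chars, extended to the end of a word
>>> # group 2: 8 chars starting at a word boundary, as late as possible
>>> r = re.compile(r'^(.{8}.*?\b)\s.*\s(\b.{8}.*?)', re.DOTALL)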
>>> s = """This is the first line
... and it has a second line
... and a third line
... and the last line is the fourth line."""
>>> result = r.sub(r"\1...\2",s.strip())
>>> result
'This is the...fourth line.'
You can adjust the "{8}" portions for more or fewer
leading/trailing context characters.
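If you'd rather not edit the pattern by hand each time, the two
widths can be turned into parameters. This is only a sketch: the
function name dumb_summary and its lead/tail arguments are made up
for illustration, and it builds the same regexp as above with the
numbers filled in.

```python
import re

def dumb_summary(text, lead=8, tail=8):
    """Keep roughly `lead` leading and `tail` trailing characters.

    Hypothetical helper: the same regexp as in the session above,
    with the two {8} counts replaced by parameters.
    """
    pattern = re.compile(
        r'^(.{%d}.*?\b)\s.*\s(\b.{%d}.*?)' % (lead, tail),
        re.DOTALL,
    )
    # sub() only replaces the matched part, so whatever follows
    # group 2 (the rest of the last word) is left in place.
    return pattern.sub(r'\1...\2', text.strip())

s = """This is the first line
and it has a second line
and a third line
and the last line is the fourth line."""

print(dumb_summary(s))        # same widths as the original example
print(dumb_summary(s, 4, 4))  # less context on both ends
```

With the default widths this gives the same result as the session
above; smaller numbers keep fewer words on each end.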
The regexp might need a bit of tweaking for somewhat short
strings (anything too short to match comes back from sub()
unchanged), but if they're fairly short, one might not need to
summarize them ;)
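One possible tweak, again just a sketch with a made-up function
name (summarize_or_keep): run the substitution, and hand back the
original text whenever the "summary" isn't actually shorter. That
covers both strings the pattern can't match and borderline ones
where the "..." saves nothing.

```python
import re

r = re.compile(r'^(.{8}.*?\b)\s.*\s(\b.{8}.*?)', re.DOTALL)

def summarize_or_keep(text):
    """Return the regexp summary only if it is shorter than the input."""
    text = text.strip()
    # If the pattern does not match, sub() returns text unchanged.
    summary = r.sub(r'\1...\2', text)
    return summary if len(summary) < len(text) else text

print(summarize_or_keep("Hello world"))            # too short to match
print(summarize_or_keep("aaaa bbbb x dddd eeee"))  # matches, saves nothing
```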
-tkc