summarize text

Tim Chase python.list at tim.thechases.com
Mon May 29 09:13:19 EDT 2006


> does anyone know of a library which permits to summarise text?
> i've been looking at nltk but haven't found anything yet. any
> help would be very welcome.

Well, summarizing text is one of those things that generally 
takes a brain-cell or two to do.  Automating the process would 
require doing it either smartly (some sort of 
neural-net/NLP/Markov-chain technology, which is a non-trivial 
task--something one might consider braving in the 3rd or 4th year 
of a university computer-science program), or doing it fairly 
dumbly.  As an example of a "dumb" solution, you can use a regexp 
to keep the first few words and the last few words, drop 
everything in between, and call that a "summary":

 >>> import re
 >>> r = re.compile(r'^(.{8}.*?\b)\s.*\s(\b.{8}.*?)', re.DOTALL)
 >>> s = """This is the first line
... and it has a second line
... and a third line
... and the last line is the fourth line."""
 >>> result = r.sub(r"\1...\2",s.strip())
 >>> result
'This is the...fourth line.'

You can adjust the "{8}" portions for more or fewer 
leading/trailing context characters.  Each count is a minimum: 
the kept text is extended to the nearest word boundary.
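
If you'd rather not edit the pattern by hand each time, here is a 
minimal sketch that wraps the same regexp in a function.  The 
function name "summarize" and the "lead"/"trail" parameters are 
my own illustrative names, not part of any library; they simply 
stand in for the two "{8}" counts above.

```python
import re

def summarize(text, lead=8, trail=8):
    """Keep roughly `lead` leading and `trail` trailing characters.

    `lead` and `trail` are hypothetical parameter names that
    replace the two hard-coded {8} counts in the pattern.
    """
    # Same pattern as in the session above, with the counts filled in.
    pattern = re.compile(
        r'^(.{%d}.*?\b)\s.*\s(\b.{%d}.*?)' % (lead, trail),
        re.DOTALL)
    # Only the matched part is replaced; anything after the match
    # (the tail of the last word) is left in place by sub().
    return pattern.sub(r'\1...\2', text.strip())
```

With the four-line string from the session, summarize(s) gives 
the same result as before, and summarize(s, lead=4, trail=4) 
gives 'This...line.'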

The regexp might need a bit of tweaking for somewhat short 
strings, but if they're fairly short, one might not need to 
summarize them ;)
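
One possible tweak, as a sketch only: when the pattern does not 
match, re.sub() hands the string back unchanged, so very short 
input is already safe.  The awkward case is a string that is 
just long enough to match, where swapping the middle for "..." 
may not make it any shorter.  The guard below assumes a 
made-up "min_length" cutoff (40 is an arbitrary choice) and 
keeps the original whenever the "summary" is not actually 
shorter.

```python
import re

# Same pattern as in the session above.
PATTERN = re.compile(r'^(.{8}.*?\b)\s.*\s(\b.{8}.*?)', re.DOTALL)

def summarize_or_keep(text, min_length=40):
    """Summarize only when it actually shortens the text.

    `min_length` is a hypothetical cutoff; 40 is arbitrary.
    """
    text = text.strip()
    # Short strings are returned as they are.
    if len(text) < min_length:
        return text
    result = PATTERN.sub(r'\1...\2', text)
    # Fall back to the original if nothing was gained.
    return result if len(result) < len(text) else text
```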

-tkc