get word base

Bengt Richter bokr at
Fri Jun 28 17:12:58 EDT 2002

On Fri, 28 Jun 2002 15:14:50 -0500, John Hunter <jdhunter at> wrote:

>I would like to be able to get the root/base of a word by stripping
>off plurals, gerund endings, participle endings, etc.  Here is a
>totally naive first attempt that gets it right sometimes:
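>
>import re
>
># Shortest leading run of word characters that is followed by a suffix.
>rgx = re.compile(r'(\w+?)(?:ing|ed|es|s)')
>
>def get_base(word):
>    m = rgx.match(word)
>    if m:
>        return m.group(1)  # the captured stem, without the suffix
>    else:
>        return word
>
>words = ['hello', 'taxes', 'thoughts', 'walked', 'rakes']
>for word in words:
>    print word, get_base(word)
>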
>Produces the following output:
>> python
>hello hello
>taxes tax
>thoughts thought
>walked walk
>rakes rak
>I can think of a few things to do to refine this, but before I forge
>ahead, I wanted to solicit advice.

Google for python stemmer ;-)
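
For a rough idea of what a stemmer does differently from the regex above, here is a minimal sketch, written for Python 3. The rule table is made up for illustration and is an assumption, not the Porter algorithm or any published stemmer: it tries ordered suffix rules against the whole word and insists on a minimum stem length, so short words such as "sing" or "red" are left alone.

```python
import re

# Hypothetical rule set for illustration only; a real stemmer has many
# more rules and conditions. Rules are tried in order, first match wins.
RULES = [
    (re.compile(r'(\w+(?:x|s|z|ch|sh))es'), r'\1'),  # taxes -> tax
    (re.compile(r'(\w{2,})ies'), r'\1y'),            # ponies -> pony
    (re.compile(r'(\w{3,})ing'), r'\1'),             # walking -> walk
    (re.compile(r'(\w{3,})ed'), r'\1'),              # walked -> walk
    (re.compile(r'(\w{2,}[^s])s'), r'\1'),           # rakes -> rake
]

def get_base(word):
    """Return a crude base form of word, or word itself if no rule applies."""
    for rgx, template in RULES:
        # fullmatch ties the suffix to the end of the word, so a suffix
        # appearing in the middle of a word is never stripped.
        m = rgx.fullmatch(word)
        if m:
            return m.expand(template)
    return word

if __name__ == '__main__':
    for word in ['hello', 'taxes', 'thoughts', 'walked', 'rakes']:
        print(word, get_base(word))
```

This still gets plenty wrong ("making" comes out as "mak", "stopped" as "stopp"), which is why a real stemmer is the better route.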

Bengt Richter
