[Tutor] Textparsing, a better way?

Zak Arntson zak@harlekin-maus.com
Tue May 6 12:42:03 2003


I'm working on my text adventure text parser (think Zork), and have
created the following code to turn a sentence into a list of words and
punctuation. E.g.: "Sailor, throw me the bottle. Get bottle" ->
['sailor',',','throw','me','the','bottle','.','get','bottle']

Here's my current code, but I can't help thinking there are areas for
improvement. Any suggestions/comments? I couldn't find a way for a regular
expression to create a list of all of its matches. I'd love to do
something like re.compile (r'(\w+)|([\.,:;])') and have that drive
something to make a list of all occurring blocks of that regular expression.

###
import re

def textparse (rawSentence):
    sentence = []

    reWord = re.compile (r'([\.,:;])')
    # first get rid of whitespace
    for chunk in re.compile (r'\s').split (rawSentence.strip ().lower ()):
        # now separate punctuation from words
        for word in reWord.split (chunk):
            if word:
                sentence.append (word)

    return sentence
###
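
For comparison, here is a minimal sketch of the "list of all matches" idea using re.findall from the standard library. The names TOKEN and textparse_findall are made up for illustration, and the pattern assumes that only word characters and the four punctuation marks above matter. This is one possible approach, not a drop-in replacement for the function above.

```python
import re

# One pattern, two alternatives: a run of word characters, or a single
# punctuation mark. There are no capturing groups, so findall returns
# a flat list of strings rather than a list of tuples.
TOKEN = re.compile(r"\w+|[.,:;]")

def textparse_findall(raw_sentence):
    # Lowercase first, then collect every non-overlapping match in order.
    return TOKEN.findall(raw_sentence.lower())

print(textparse_findall("Sailor, throw me the bottle. Get bottle"))
# prints ['sailor', ',', 'throw', 'me', 'the', 'bottle', '.', 'get', 'bottle']
```

The two versions are not identical in behavior. The split-based function keeps any character that is not whitespace or listed punctuation inside its word, while this sketch silently drops anything the pattern does not match (for example, an apostrophe or "!"). With the grouped pattern r'(\w+)|([\.,:;])', findall would return a list of 2-tuples instead of plain strings, which is why the groups are left out here.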

-- 
Zak Arntson
www.harlekin-maus.com - Games - Lots of 'em