[Tutor] Textparsing, a better way?

Zak Arntson zak@harlekin-maus.com
Tue May 6 15:03:01 2003


> At 09:41 2003-05-06 -0700, Zak Arntson wrote:
>>I'm working on my text adventure text parser (think Zork), and have
>> created the following code to turn a sentence into a list of words and
>> punctuation. E.g.: "Sailor, throw me the bottle. Get bottle" ->
>>['sailor',',','throw','me','the','bottle','.','get','bottle']
>
> Is this what you want? (I threw in support for ?, ! and - as well.)
>
>  >>> import re
>  >>> t = "Sailor, throw me the bottle. Get bottle"
>  >>> b = re.compile(r'\S+?\b|[\.,:;\-\?!]')
>  >>> b.findall(t.lower())
> ['sailor', ',', 'throw', 'me', 'the', 'bottle', '.', 'get', 'bottle']

Oh man. I completely missed the findall method, even _after_ going through
the help documentation and dir(). Thank you tons!

And thanks for throwing in the extra functionality, to boot! And here I was
lamenting the absence of a 'get a list of matches' function. :)
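
For anyone reading along later, here is a minimal sketch of how the quoted
pattern could be wrapped up for reuse. The names TOKEN_RE and tokenize are
just my placeholders, not anything from the code posted so far.

```python
import re

# Pattern quoted earlier in this thread: a lazy run of non-whitespace up to
# the next word boundary, or a single punctuation mark.
TOKEN_RE = re.compile(r'\S+?\b|[\.,:;\-\?!]')

def tokenize(sentence):
    """Return the lowercased words and punctuation of sentence, in order."""
    # findall returns every non-overlapping match as a list of strings.
    return TOKEN_RE.findall(sentence.lower())

print(tokenize("Sailor, throw me the bottle. Get bottle"))
# ['sailor', ',', 'throw', 'me', 'the', 'bottle', '.', 'get', 'bottle']
```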

Note, for anyone who'll be following my text adventure code in the future:
I need '-' to be part of a word (like 'fixed-width' or 'blue-green'), so
I'm going to change the above expression to: r'[\S\-]+?\b|[\.,:;\-\?!]'
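
A caveat worth checking before relying on that: as far as I can tell, \S
already matches '-', so adding it to the character class changes nothing,
and the lazy +? still stops at the first word boundary, which falls just
before the hyphen. The sketch below compares the two. HYPHEN_RE is only one
possible alternative and rests on my assumption that a word is a run of \w
characters joined by single hyphens.

```python
import re

# Lazy form: each match ends at the first word boundary, so a hyphenated
# word comes back as three tokens.
LAZY_RE = re.compile(r'[\S\-]+?\b|[\.,:;\-\?!]')

# Hypothetical alternative: word characters joined by single hyphens,
# or one punctuation mark.
HYPHEN_RE = re.compile(r'\w+(?:-\w+)*|[.,:;\-?!]')

text = "get the blue-green bottle."

print(LAZY_RE.findall(text))
# ['get', 'the', 'blue', '-', 'green', 'bottle', '.']

print(HYPHEN_RE.findall(text))
# ['get', 'the', 'blue-green', 'bottle', '.']
```

One trade-off: the alternative silently skips any character that is neither
a word character nor one of the listed punctuation marks (an apostrophe, for
instance), so the punctuation class may need extending.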

Thanks again!

-- 
Zak Arntson
www.harlekin-maus.com - Games - Lots of 'em