aligning a set of word substrings to sentence
steven.bethard at gmail.com
Thu Dec 1 22:56:22 CET 2005
Fredrik Lundh wrote:
> Steven Bethard wrote:
>> I feel like there should be a simpler solution (maybe with the re
>> module?) but I can't figure one out. Any suggestions?
> using the finditer pattern I just posted in another thread:
> tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
> text = '''\
> She's gonna write
> a book?'''
> import re
> tokens.sort() # lexical order
> tokens.reverse() # look for longest match first
> pattern = "|".join(map(re.escape, tokens))
> pattern = re.compile(pattern)
> I get
> print([m.span() for m in pattern.finditer(text)])
> [(0, 3), (3, 5), (6, 9), (9, 11), (12, 17), (18, 19), (20, 24), (24, 25)]
> which seems to match your version pretty well.
That's what I was looking for. Thanks!
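For reference, here is a minimal Python 3 sketch of the same finditer approach. The function name token_spans is hypothetical. It uses sorted() rather than the in-place tokens.sort() and tokens.reverse() so the caller's token list is not modified. The reverse lexical order matters because a token that is a prefix of another, such as "gon" and a hypothetical "gonna", always sorts before it. Reversing the order makes the regex alternation try the longer token first.

```python
import re

def token_spans(tokens, text):
    """Return (start, end) offsets of every token occurrence found in text.

    Hypothetical helper sketching the finditer approach from the thread.
    Reverse lexical order puts any token ahead of its own prefixes, so the
    alternation prefers the longest candidate at each position.
    """
    alternatives = sorted(tokens, reverse=True)  # new list; input untouched
    pattern = re.compile("|".join(map(re.escape, alternatives)))
    return [m.span() for m in pattern.finditer(text)]

tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
text = "She's gonna write\na book?"

spans = token_spans(tokens, text)
print(spans)
# [(0, 3), (3, 5), (6, 9), (9, 11), (12, 17), (18, 19), (20, 24), (24, 25)]
print([text[start:end] for start, end in spans])
# ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
```

The regex scans left to right and reports non-overlapping matches of any token. It does not check that the tokens occur in their original order. As a result, a token that also appears inside some unrelated word of the text would produce an extra span.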