aligning text with space-normalized text
Peter Otten
__peter__ at web.de
Thu Jun 30 03:07:04 EDT 2005
Steven Bethard wrote:
> I have a string with a bunch of whitespace in it, and a series of chunks
> of that string whose indices I need to find. However, the chunks have
> been whitespace-normalized, so that multiple spaces and newlines have
> been converted to single spaces as if by ' '.join(chunk.split()). Some
If you are willing to get your hands dirty with regexps:
import re

_reLump = re.compile(r"\S+")

def indices(text, chunks):
    # Each lump is one run of non-whitespace characters in the original text.
    lumps = _reLump.finditer(text)
    for chunk in chunks:
        # Consume one lump per word of the normalized chunk.
        lump = [next(lumps) for _ in chunk.split()]
        yield lump[0].start(), lump[-1].end()

def main():
    # The irregular spacing is what the expected indices below depend on.
    text = """\
   aaa  bb ccc
dd eee.  fff gggg
hh   i.
   jjj kk.
"""
    chunks = ['aaa bb', 'ccc dd eee.', 'fff gggg hh i.', 'jjj', 'kk.']
    assert list(indices(text, chunks)) == [
        (3, 10), (11, 22), (24, 40), (44, 47), (48, 51)]

if __name__ == "__main__":
    main()

Not tested beyond what you see.
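
Since the approach above only counts words and never compares them, a chunk that does not actually match the text goes unnoticed. A minimal sketch of a stricter variant follows, assuming the chunks appear in text order with nothing skipped between them; the name checked_indices and the ValueError behaviour are illustrative choices, not part of the original code.

```python
import re

_word = re.compile(r"\S+")

def checked_indices(text, chunks):
    """Yield (start, end) spans of whitespace-normalized chunks in text.

    Hypothetical variant: raises ValueError when a chunk's words do not
    line up with the next words of the text.
    """
    words = _word.finditer(text)
    for chunk in chunks:
        matches = []
        for expected in chunk.split():
            match = next(words, None)  # None once the text is exhausted
            if match is None or match.group() != expected:
                raise ValueError("chunk %r does not align with text" % chunk)
            matches.append(match)
        if matches:  # skip chunks that are empty or whitespace-only
            yield matches[0].start(), matches[-1].end()

text = "  spam   eggs\nham\t jam  "
print(list(checked_indices(text, ["spam eggs", "ham", "jam"])))
# prints [(2, 13), (14, 17), (19, 22)]
```

Passing a default to next() also keeps an exhausted iterator from raising StopIteration inside the generator, which newer Python versions turn into a RuntimeError.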
Peter