Can python read up to where a certain pattern is matched?

F. Petitjean littlejohn.75 at news.noos.fr
Sun Mar 7 23:35:55 CET 2004


On Fri, 5 Mar 2004, Anthony Liu <antonyliu2002 at yahoo.com> wrote:
> I am kinda new to Python, but not new to programming. 
> 
> I don't want to read line after line, neither do I
> want to read the whole file all at once.  Thus none of
> read(), readline(), readlines() is what I want. I want
> to read a text file sentence by sentence. 
> 
> A sentence by definition is roughly the part between a
> full stop and another full stop or !, ?
> 
> So, for example, for the following text:
> 
> "Some words here, and some other words. Then another
> segment follows, and more. This is a question, a junk
> question, followed by a question mark?"
> 
> It has 3 sentences (2 full stops and 1 question mark),
>  snip
> How can I achieve this?  Do we have a readsentence()
> function?
> 
> Please give a hint.  Thank you!
> 
The hint:
import itertools
help(itertools.takewhile)

# not tested (no python 2.3 on Debian gateway at home)

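import itertools

def readsentence(iterable, ends=(".", "!", "?"), yield_fn=''.join):
    """Generator function which yields sentences terminated by ends.

    If ends is callable, it is used directly as the takewhile predicate:
    it must return True for every item that does not end a sentence.
    """
    end_pred = ends
    if not callable(ends):
        end_pred = lambda c: c not in ends
    it = iter(iterable)
    # Pull the first item with a for loop so that the generator stops
    # once the input is exhausted (a bare "while True" never ends).
    for first in it:
        if not end_pred(first):
            continue  # a terminator with nothing before it: skip it
        sentence = [first]
        sentence.extend(itertools.takewhile(end_pred, it))
        # takewhile has already consumed and dropped the terminator.
        # How to get the skipped item back?
        t = tuple(sentence)
        if callable(yield_fn):
            t = yield_fn(t)
        yield t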

text = """\
Some words here, and some other words. Then another
segment follows, and more. This is a question, a junk
question, followed by a question mark?"""

for sentence in readsentence(text):
    print(sentence)
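
On the open question in the comment inside readsentence (the item skipped by takewhile): as far as I know takewhile gives no way to recover the item that made the predicate fail, so one option is to drop takewhile and loop by hand. A rough sketch only, assuming Python 3; the name readsentence_keep is made up for illustration and is not part of any library:

```python
def readsentence_keep(iterable, ends=(".", "!", "?")):
    """Yield sentences from an iterable of characters, terminator included."""
    sentence = []
    for c in iterable:
        sentence.append(c)
        if c in ends:
            # Terminator reached: emit the sentence and start a new one.
            yield "".join(sentence).strip()
            sentence = []
    # Emit any trailing text that has no terminator.
    tail = "".join(sentence).strip()
    if tail:
        yield tail


text = "Some words here. Then more! A question?"
for s in readsentence_keep(text):
    print(s)
```

The strip() call only trims the whitespace left between sentences; remove it if the exact spacing matters.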


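Since the original question was about a file rather than a string: readsentence accepts any iterable of characters, so the file can be fed to it through a small generator that reads a block at a time instead of a line or the whole file. Again only a sketch, assuming Python 3 and a file opened in text mode; chars and chunk_size are hypothetical names, and io.StringIO stands in for a real file here:

```python
import io
import itertools


def chars(fileobj, chunk_size=4096):
    """Yield single characters from a text file object, read in blocks."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            return  # read() returned an empty string: end of file
        for c in chunk:
            yield c


f = io.StringIO("First one. Second one? Third")
stream = chars(f, chunk_size=8)
# Same idea as the hint: take characters until a terminator shows up.
first = "".join(itertools.takewhile(lambda c: c not in ".!?", stream))
print(first)
```

With a real file, something like readsentence(chars(open(path))) should do the same job, with path being whatever file you want to read.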
