Can python read up to where a certain pattern is matched?
F. Petitjean
littlejohn.75 at news.noos.fr
Sun Mar 7 17:35:55 EST 2004
On Fri, 5 Mar 2004, Anthony Liu <antonyliu2002 at yahoo.com> wrote:
> I am kinda new to Python, but not new to programming.
>
> I don't want to read line after line, neither do I
> want to read the whole file all at once. Thus none of
> read(), readline(), readlines() is what I want. I want
> to read a text file sentence by sentence.
>
> A sentence by definition is roughly the part between a
> full stop and another full stop or !, ?
>
> So, for example, for the following text:
>
> "Some words here, and some other words. Then another
> segment follows, and more. This is a question, a junk
> question, followed by a question mark?"
>
> It has 3 sentences (2 full stops and 1 question mark),
> snip
> How can I achieve this? Do we have a readsentence()
> function?
>
> Please give a hint. Thank you!
>
the hint :
import itertools
help(itertools.takewhile)
# not tested (no python 2.3 on Debian gateway at home)
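import itertools

def readsentence(iterable, ends=(".", "!", "?"), yield_fn=''.join):
    """generator function which yields sentences terminated by ends

    If ends is callable it is used directly as the takewhile predicate,
    so it must return True for as long as the sentence continues.
    """
    end_pred = ends
    if not callable(ends):
        end_pred = lambda c: c not in ends
    last = []  # holds the terminator that takewhile consumes and drops
    def keep_going(c):
        # Remember the item that stops takewhile so it can be re-attached.
        if end_pred(c):
            return True
        last.append(c)
        return False
    it = iter(iterable)
    while True:
        sentence = list(itertools.takewhile(keep_going, it))
        if not sentence and not last:
            return  # input exhausted, stop the generator
        sentence.extend(last)  # put the terminator back on the sentence
        del last[:]
        t = tuple(sentence)
        if callable(yield_fn):
            t = yield_fn(t)
        yield t
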
text = """\
Some words here, and some other words. Then another
segment follows, and more. This is a question, a junk
question, followed by a question mark?"""
for sentence in readsentence(text):
    print(sentence)
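
The original question was about a file rather than a string, without
reading it line by line or all at once. One possible way to do that,
sketched here for current Python 3 and not taken from the post above, is
to read fixed-size chunks into a buffer and cut sentences out of it with
a regular expression instead of itertools.takewhile. The names
iter_sentences, SENTENCE_RE and chunk_size are made up for this
illustration, and the pattern uses the same rough definition of a
sentence as the question (text up to a run of ".", "!" or "?").

```python
import io
import re

# Hypothetical pattern: any text followed by one or more terminators.
SENTENCE_RE = re.compile(r"[^.!?]*[.!?]+")

def iter_sentences(fileobj, chunk_size=4096):
    """Yield sentences from a text file object, reading it in chunks."""
    buf = ""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break  # end of file
        buf += chunk
        consumed = 0
        for m in SENTENCE_RE.finditer(buf):
            if m.end() == len(buf):
                # The run of terminators may continue in the next chunk
                # (think of "..."), so keep this match in the buffer.
                break
            yield m.group().strip()
            consumed = m.end()
        buf = buf[consumed:]
    # Whatever is left is either a held-back sentence or unterminated text.
    tail = buf.strip()
    if tail:
        yield tail

if __name__ == "__main__":
    sample = io.StringIO("Some words here. Then more! A question?")
    for sentence in iter_sentences(sample, chunk_size=8):
        print(sentence)
```

Only the buffer for the sentence in progress is kept in memory, so the
chunk size can be tuned freely. Surrounding whitespace is stripped from
each sentence, but line breaks inside a sentence are left as they are.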