Refactor a buffered class...
Michael Spencer
mahs at telcopartners.com
Wed Sep 6 17:35:03 EDT 2006
lh84777 at yahoo.fr wrote:
> actually, for the example I used only one sentry condition, but there
> are many more and they are more complex. Also, I need to work on a huge
> amount of data (each word is a line with many features, read from a file).
An open (text) file is a line-based iterator that can be fed directly to
'chunker'. As for different sentry conditions, I imagine they can be coded in
either model. How much is a 'huge amount' of data?
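As a rough sketch of both points (not the code from this thread): the generator below takes lines from any iterator, such as an open file, and lets the caller decide what counts as a sentry through a predicate. The names `window_chunks`, `is_sentry` and `ends_sentence`, and the tab-separated "word, tag" line format, are assumptions made for illustration; `io.StringIO` stands in for a real file.

```python
import io
from collections import deque


def window_chunks(lines, is_sentry, chunk_size=3):
    """Yield sliding windows of `chunk_size` sentry-terminated groups.

    `lines` is any iterable of strings (an open text file works).
    `is_sentry` is a hypothetical predicate deciding where a group ends.
    """
    groups = deque(maxlen=chunk_size)  # the oldest group drops off automatically
    current = []
    for line in lines:
        item = line.rstrip("\n")
        current.append(item)
        if is_sentry(item):
            groups.append(current)
            current = []
            if len(groups) == chunk_size:
                # Flatten into a fresh list so callers can safely keep it.
                yield [x for group in groups for x in group]


def ends_sentence(item):
    # Assumed line format: "word<TAB>tag"; a group ends at a PUNCT tag.
    return item.split("\t")[1] == "PUNCT"


data = io.StringIO(
    "a\tX\n.\tPUNCT\nb\tX\n!\tPUNCT\nc\tX\n?\tPUNCT\nd\tX\n.\tPUNCT\n"
)
windows = list(window_chunks(data, ends_sentence, chunk_size=2))
```

Only `chunk_size` groups are held in memory at any time, so the size of the file matters much less than the size of a single window.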
> oops
>
>> to have:
>>
>> this .
>> this . is a .
>> this . is a . test to .
>> is a . test to . check if it .
>> test to . check if it . works .
>> check if it . works . well .
>> works . well . it looks like .
> well . it looks like .
> it looks like .
>
Here's a small update to the generator that allows optional handling of the head
and the tail:
def chunker(s, chunk_size=3, sentry=".", keep_first=False, keep_last=False):
    buffer = []
    sentry_count = 0

    for item in s:
        buffer.append(item)
        if item == sentry:
            sentry_count += 1
            if sentry_count < chunk_size:
                # Head: the window is not full yet.
                if keep_first:
                    yield buffer
            else:
                # Full window: yield it, then drop its oldest group.
                yield buffer
                del buffer[:buffer.index(sentry)+1]

    if keep_last:
        # Tail: keep shrinking the window until nothing is left.
        while buffer:
            yield buffer
            del buffer[:buffer.index(sentry)+1]
>>> for p in chunker(s.split(), keep_first=True, keep_last=True): print " ".join(p)
...
this .
this . is a .
this . is a . test to .
is a . test to . check if it .
test to . check if it . works .
check if it . works . well .
works . well . it looks like .
well . it looks like .
it looks like .
>>>
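One caveat worth checking before reusing this: the generator yields the same list object every time and then mutates it. That is fine when each chunk is printed immediately, as in the session above, but not when the chunks are stored (for example with list(...)). The sketch below is a simplified stand-in, not a replacement for the code above: the name `chunker_copies` is hypothetical, and it leaves out the keep_first/keep_last options to stay short. It yields a copy of the window instead.

```python
def chunker_copies(s, chunk_size=3, sentry="."):
    """Sliding window over sentry-terminated groups, yielding copies."""
    buffer = []
    sentry_count = 0
    for item in s:
        buffer.append(item)
        if item == sentry:
            sentry_count += 1
            if sentry_count >= chunk_size:
                # Copy the window so the delete below cannot change
                # a chunk the caller has already received.
                yield list(buffer)
                del buffer[:buffer.index(sentry) + 1]


text = "this . is a . test to . check if it ."
chunks = [" ".join(c) for c in chunker_copies(text.split())]
stored = list(chunker_copies(text.split()))
```

The copy costs one extra list per window, which is usually negligible next to the cost of reading the data.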