Refactor a buffered class...

George Sakkis george.sakkis at
Fri Sep 8 00:40:14 CEST 2006

Michael Spencer wrote:

> I think the two versions below each give the 'correct' output wrt the OP's
> single test case.  I measure chunkerMS2 to be faster than chunkerGS2 across all
> chunk sizes, but this is all about the joins.
> I conclude that chunkerGS's deque beats chunkerMS's list for large chunk_size (~
>  >100).  But for joined output, chunkerMS2 beats chunkerGS2 because it does less
> joining.

Although I speculate that the OP is really concerned about the chunking
algorithm rather than an exact output format, chunkerGS2 can do better
even when the chunks must be joined. Joining each chunk can (and
should) be done only once, not every time the chunk is yielded.
chunkerGS3 outperforms chunkerMS2 even more than the original versions:

* chunk_size=3
chunkerGS3: 1.17 seconds
chunkerMS2: 1.56 seconds
* chunk_size=30
chunkerGS3: 1.26 seconds
chunkerMS2: 6.35 seconds
* chunk_size=300
chunkerGS3: 2.20 seconds
chunkerMS2: 54.51 seconds

from collections import deque
from itertools import islice

def chunkerGS3(seq, sentry='.', chunk_size=3, keep_first=False,
               keep_last=False):
    # itersplit(seq, sentry) yields the runs of items between
    # occurrences of sentry
    iterchunks = itersplit(seq, sentry)
    buf = deque()
    join = ' '.join
    def append(chunk):
        buf.append(join(chunk))   # each chunk is joined exactly once
    for chunk in islice(iterchunks, chunk_size-1):
        append(chunk)
        if keep_first:
            yield join(buf)       # growing partial windows at the start
    for chunk in iterchunks:
        append(chunk)
        yield join(buf)           # full sliding windows
        buf.popleft()
    if keep_last:
        while buf:
            yield join(buf)       # shrinking partial windows at the end
            buf.popleft()
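For reference, here is a self-contained sketch of the whole pipeline. itersplit is not shown in this excerpt, so the definition below is my assumption of the helper's behavior (split an iterable on a sentry value, dropping the sentries):

```python
from collections import deque
from itertools import islice

def itersplit(seq, sentry):
    # assumed behavior: yield the runs of items between sentries
    chunk = []
    for item in seq:
        if item == sentry:
            yield chunk
            chunk = []
        else:
            chunk.append(item)
    if chunk:
        yield chunk

def chunkerGS3(seq, sentry='.', chunk_size=3, keep_first=False,
               keep_last=False):
    iterchunks = itersplit(seq, sentry)
    buf = deque()
    join = ' '.join
    def append(chunk):
        buf.append(join(chunk))   # each chunk is joined exactly once
    for chunk in islice(iterchunks, chunk_size - 1):
        append(chunk)
        if keep_first:
            yield join(buf)       # growing partial windows at the start
    for chunk in iterchunks:
        append(chunk)
        yield join(buf)           # full sliding windows
        buf.popleft()
    if keep_last:
        while buf:
            yield join(buf)       # shrinking partial windows at the end
            buf.popleft()

words = "a b . c d . e f . g h".split()
print(list(chunkerGS3(words, chunk_size=2)))
# -> ['a b c d', 'c d e f', 'e f g h']
```

The point of the `append` helper is that each chunk enters the deque already joined, so sliding the window forward never re-joins chunks that were yielded before.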

>  > if you're going to profile something, better use the
>  > standard timeit module
> ...
> OT: I will when timeit grows a capability for testing live objects rather than
> 'small code snippets'.  Requiring source code input and passing arguments by
> string substitution makes it too painful for interactive work.  The need to
> specify the number of repeats is an additional annoyance.

timeit is indeed somewhat cumbersome, but having a robust, bug-free
timing function is worth the inconvenience IMO.
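(Editor's note: later Python versions grew exactly the capability asked for above; timeit.Timer accepts a callable directly since Python 2.6, so live objects can be timed without source strings. A minimal sketch, with a made-up workload function:)

```python
import timeit

def work():
    # a live callable to time; no code snippets or string substitution
    return sum(i * i for i in range(1000))

# Timer accepts the callable itself (Python 2.6+)
t = timeit.Timer(work)
elapsed = t.timeit(number=1000)
print("%.4f seconds for 1000 calls" % elapsed)
```

The number of repeats must still be chosen by hand here, which remains the other half of the complaint.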


More information about the Python-list mailing list