Refactor a buffered class...
george.sakkis at gmail.com
Fri Sep 8 00:40:14 CEST 2006
Michael Spencer wrote:
> I think the two versions below each give the 'correct' output wrt the OP's
> single test case. I measure chunkerMS2 to be faster than chunkerGS2 across all
> chunk sizes, but this is all about the joins.
> I conclude that chunkerGS's deque beats chunkerMS's list for large chunk_size
> (~>100). But for joined output, chunkerMS2 beats chunkerGS2 because it does less
Although I speculate that the OP is really concerned about the chunking
algorithm rather than an exact output format, chunkerGS2 can do better
even when the chunks must be joined. Joining each chunk can (and
should) be done only once, not every time the chunk is yielded.
chunkerGS3 outperforms chunkerMS2 by an even wider margin than the original
versions did:
chunkerGS3: 1.17 seconds
chunkerMS2: 1.56 seconds
chunkerGS3: 1.26 seconds
chunkerMS2: 6.35 seconds
chunkerGS3: 2.20 seconds
chunkerMS2: 54.51 seconds
def chunkerGS3(seq, sentry='.', chunk_size=3, keep_first=False,
iterchunks = itersplit(seq,sentry)
buf = deque()
join = ' '.join
for chunk in islice(iterchunks, chunk_size-1):
for chunk in iterchunks:
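
The join-once idea can be sketched as a self-contained generator. This is only an illustration, not the chunkerGS3 posted above: itersplit here is a hypothetical stand-in for the helper of the same name, sliding_chunks is an invented name, the sentry is assumed to be dropped from the output, and the keep_first handling is left out. Each piece is joined when it enters the buffer, so every yield only joins chunk_size short strings.

```python
from collections import deque
from itertools import islice


def itersplit(seq, sentry='.'):
    """Hypothetical helper: yield lists of items found between sentries."""
    piece = []
    for item in seq:
        if item == sentry:
            yield piece
            piece = []
        else:
            piece.append(item)
    if piece:
        yield piece  # trailing items with no closing sentry


def sliding_chunks(seq, sentry='.', chunk_size=3):
    """Yield every window of chunk_size consecutive pieces as one string."""
    join = ' '.join
    pieces = itersplit(seq, sentry)
    # Join each piece exactly once, at the moment it enters the buffer.
    buf = deque((join(p) for p in islice(pieces, chunk_size - 1)),
                maxlen=chunk_size)
    for piece in pieces:
        buf.append(join(piece))  # maxlen discards the oldest piece
        yield join(buf)          # joins chunk_size strings, not every word


words = "a b . c . d e . f .".split()
windows = list(sliding_chunks(words, chunk_size=2))
```

With chunk_size=2 the four pieces above give three overlapping windows. The bounded deque replaces an explicit popleft(), which keeps the loop body to two lines.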
> > if you're going to profile something, better use the
> > standard timeit module
> OT: I will when timeit grows a capability for testing live objects rather than
> 'small code snippets'. Requiring source code input and passing arguments by
> string substitution makes it too painful for interactive work. The need to
> specify the number of repeats is an additional annoyance.
timeit is indeed somewhat cumbersome, but having a robust, bug-free
timing function is worth the inconvenience IMO.
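
For what it's worth, the live-object complaint can be worked around in Python versions whose timeit accepts a callable in place of a source string (2.6 and later). A minimal sketch under that assumption follows; work is a made-up stand-in for whatever is being measured, and the repeat and number values are arbitrary.

```python
import timeit
from functools import partial


def work(n):
    """Stand-in workload; replace with the function being measured."""
    return sum(i * i for i in range(n))


# Bind the arguments to a live function object instead of building a
# source string with substituted values.
timer = timeit.Timer(partial(work, 1000))

# repeat() returns one total per run; the minimum is the usual figure to
# report because it is the run least disturbed by other processes.
number = 100
best = min(timer.repeat(repeat=3, number=number))
per_call = best / number
```

The number of loops still has to be chosen by hand here, so this addresses the string-substitution pain but not the repeat-count annoyance.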