[Tutor] Execute on 200 line segments of CSV

Danny Yoo dyoo at hashcollision.org
Fri Apr 10 20:46:32 CEST 2015


>> Python newbie here; I'm trying to figure out how to get python to
>> execute code on 200-line chunks of a 5000 line CSV. I haven't a clue
>> where to start on this. So the code would execute on lines 1-200,
>> 201-400, etc.


Peter's suggestion is the right one: reuse the csv parser that's
available in the standard library.  This will give you an "iterable",
an object that we can use to march down each row of the file, using
the technique that Peter describes.
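
As a rough sketch of what that iterable looks like (assuming Python 3;
the sample data and column names are made up, and an in-memory string
stands in for the real file):

```python
import csv
import io

# Hypothetical sample data.  A real script would instead use
# something like open(path, newline='') on the actual file.
sample = io.StringIO("name,score\nalice,90\nbob,85\n")

reader = csv.reader(sample)

# Marching down the reader hands back one row at a time,
# each row being a list of strings.
rows = [row for row in reader]
print(rows)
```

Note that every field comes back as a string, so numeric columns need
an explicit int() or float() before doing arithmetic on them.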

---

The following is for intermediate programmers.  The "chunking" logic
can be split off from the processing of those chunks, if we take
advantage of Python's generators.  Something like this:

##################################
def chunk(iterable, chunkSize):
    """Takes an iterable, and "chunks" it into blocks."""
    currentChunk = []
    for item in iterable:
        currentChunk.append(item)
        # Hand back a full chunk and start collecting the next one.
        if len(currentChunk) >= chunkSize:
            yield currentChunk
            currentChunk = []
    # Hand back any leftover items as a final, shorter chunk.
    if currentChunk:
        yield currentChunk
##################################


If we have a utility like this, then we can write a natural loop on
the chunks, such as:

###################################################################
>>> blocks = chunk('thisisatestoftheemergencybroadcastsystem', 5)
>>> for b in blocks:
...     print(b)
...
['t', 'h', 'i', 's', 'i']
['s', 'a', 't', 'e', 's']
['t', 'o', 'f', 't', 'h']
['e', 'e', 'm', 'e', 'r']
['g', 'e', 'n', 'c', 'y']
['b', 'r', 'o', 'a', 'd']
['c', 'a', 's', 't', 's']
['y', 's', 't', 'e', 'm']
###################################################################

Splitting off the chunking logic like this lets us separate one
concern, the chunking, from the primary concern, the processing of
each chunk.
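
To tie this back to the original question, here is a rough sketch of
running some code on 200-row chunks of a 5000-row CSV, assuming
Python 3.  The in-memory sample data and the process() function are
made up for illustration; chunk() is repeated so that the snippet runs
on its own.

```python
import csv
import io

def chunk(iterable, chunkSize):
    """Takes an iterable, and "chunks" it into blocks."""
    currentChunk = []
    for item in iterable:
        currentChunk.append(item)
        if len(currentChunk) >= chunkSize:
            yield currentChunk
            currentChunk = []
    if currentChunk:
        yield currentChunk

def process(rows):
    """Hypothetical per-chunk work: here we only count the rows."""
    return len(rows)

# Hypothetical stand-in for the 5000-line CSV file.  A real script
# would pass an opened file to csv.reader instead.
sample = io.StringIO("".join("%d,value%d\n" % (i, i) for i in range(5000)))

# The loop only ever sees whole chunks; it doesn't care how they
# were put together.
sizes = [process(block) for block in chunk(csv.reader(sample), 200)]
print(len(sizes))
```

Since chunk() is a generator and csv.reader reads lazily, only one
chunk of rows needs to be held in memory at a time.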

