[Tutor] Processing CSV files

Dave Angel davea at davea.name
Tue Oct 8 23:28:17 CEST 2013


On 8/10/2013 16:46, Leena Gupta wrote:

> Hello,
>
> Looking for some inputs on Python's csv processing feature.
>
> I need to process a large csv file every 5-10 minutes. The file could
> contain 3 million to 10 million rows and size could be 6MB to 10MB(+). As part of
> the processing, I need to sum up a number value by grouping on certain
> attributes and store the output in a datastore. I wanted to know if Python
> is recommended and can it be used for processing data in csv files of this
> size? Any issues that we need to be aware of? I believe Python has a csv
> library as well.
>
> Thanks!
>

Please post plain-text messages here, not HTML.  HTML not only wastes
space, but frequently messes up the formatting.

Python's csv module should have no problem dealing with a file of 10
million rows.  As long as you're not trying to keep all 10 million of
them in some internal data structure, the csv reader will hand you one
row at a time, so you only need to hold the running totals in memory.
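
As a rough sketch of the row-at-a-time approach, something like the
following should work.  The column names (region, product, amount) and
the sample data are made up for illustration, since the original post
doesn't describe the file layout, and sum_by_group is just a name I
picked for the helper.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def sum_by_group(rows, key_columns, value_column):
    """Sum value_column for each distinct combination of key_columns.

    rows is any iterable of dicts, e.g. a csv.DictReader.  Only the
    running totals are kept in memory, never the rows themselves.
    """
    totals = defaultdict(Decimal)  # Decimal() starts each group at 0
    for row in rows:
        key = tuple(row[name] for name in key_columns)
        totals[key] += Decimal(row[value_column])
    return dict(totals)


# Tiny in-memory stand-in for the real file (hypothetical columns).
sample = io.StringIO(
    "region,product,amount\n"
    "east,widget,10.50\n"
    "west,gadget,7.25\n"
    "east,widget,2.50\n"
)

result = sum_by_group(csv.DictReader(sample), ["region", "product"], "amount")
for key, total in sorted(result.items()):
    print(key, total)
```

For a real file you would pass csv.DictReader(f) for a file opened with
open(path, newline="") instead of the StringIO object.  Memory use then
depends on the number of distinct groups, not on the number of rows.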

Just make sure the particular datastore you require is supported in
Python.
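
Since the original post doesn't say which datastore is involved, here is
a minimal sketch using sqlite3 from the standard library purely as a
stand-in.  The table layout, the store_totals helper and the sample
totals are all assumptions for illustration; a different datastore would
need its own driver and its own insert syntax.

```python
import sqlite3

# Hypothetical grouped totals, keyed by (region, product).
totals = {
    ("east", "widget"): 13.0,
    ("west", "gadget"): 7.25,
}


def store_totals(conn, totals):
    """Write one row per group, replacing any earlier total for that group."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "CREATE TABLE IF NOT EXISTS totals ("
            " region TEXT, product TEXT, total REAL,"
            " PRIMARY KEY (region, product))"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO totals VALUES (?, ?, ?)",
            [(region, product, total)
             for (region, product), total in totals.items()],
        )


conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
store_totals(conn, totals)
stored = conn.execute(
    "SELECT region, product, total FROM totals ORDER BY region"
).fetchall()
print(stored)
```

Writing all the groups in a single executemany call inside one
transaction is usually much faster than committing after every row.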


-- 
DaveA




