CSV writer question

Jason Swails jason.swails at gmail.com
Mon Oct 24 01:18:46 EDT 2011


Hello,

I have a question about a csv.writer instance.  I have a utility that needs
to write a full CSV file from a large amount of data, but due to performance
(and memory) considerations, there's no way I can write the data sequentially.
Therefore, I write the data in chunks to temporary files, then combine them
all at the end.  For convenience, I create each writer instance via a
statement like

my_csv = csv.writer(open('temp.1.csv', 'wb'))

so the open file object isn't bound to any explicit reference, and I don't
know how to reach it through the writer object (the documentation doesn't
say, unless I've missed the obvious).  Thus, the only way I can think of to
make sure that all of the data is written before I start copying these files
sequentially into the final file is to open them unbuffered, so the above
statement is changed to

my_csv = csv.writer(open('temp.1.csv', 'wb', 0))

unless, of course, I add an explicit reference to track the open file object
and manually close or flush it (but I'd like to avoid that if possible).  My
question is two-fold.  First, is there a way to flush or close the underlying
file directly via the CSV API, or is unbuffered output the only way without
binding the open file object to another reference?  Second, if these files
are potentially very large (anywhere from ~1 KB to 20 GB depending on the
amount of data present), what kind of performance hit will I be looking at
by disabling buffering on files of that size?
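
For reference, the alternative mentioned above (keeping an explicit
reference to each file) might look roughly like the sketch below.  It is
written for Python 3, so the files are opened in text mode with newline=''
instead of 'wb'; Python 3 also refuses unbuffered text-mode files, so the
buffering=0 trick would not carry over there.  The helper names write_chunk
and combine_chunks are made up for illustration and are not part of the csv
module.

```python
import csv
import os
import shutil
import tempfile


def write_chunk(path, rows):
    """Write one chunk of rows to its own temporary CSV file (hypothetical helper)."""
    # Leaving the with-block closes the file, which also flushes it,
    # so the file does not need to be opened unbuffered.
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


def combine_chunks(chunk_paths, final_path):
    """Concatenate the chunk files, in the given order, into the final file."""
    with open(final_path, "wb") as dest:
        for path in chunk_paths:
            with open(path, "rb") as src:
                # Copies in fixed-size blocks, so a large chunk file is
                # never read into memory all at once.
                shutil.copyfileobj(src, dest)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    first = os.path.join(workdir, "temp.1.csv")
    second = os.path.join(workdir, "temp.2.csv")
    write_chunk(first, [["a", 1], ["b", 2]])
    write_chunk(second, [["c", 3]])
    combine_chunks([first, second], os.path.join(workdir, "final.csv"))
```

Since every chunk file is closed before combine_chunks runs, normal
buffering can stay enabled while the data is being written.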

Tips, answers, comments, and/or suggestions are all welcome.

Thanks a lot!
Jason

As an afterthought, I suppose I could always wrap csv.writer in a small
class of my own and add the reference I want to that, which I may do if
there's no other convenient solution.
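
A minimal sketch of that afterthought, in Python 3.  Because csv.writer is
a factory function rather than a class, the sketch holds a writer as an
attribute instead of inheriting from it.  The class name TrackedCsvWriter
and its fileobj attribute are invented for illustration; nothing like them
exists in the csv module.

```python
import csv


class TrackedCsvWriter(object):
    """Hypothetical wrapper pairing a csv writer with the file it writes to."""

    def __init__(self, path):
        # Keep the file object so it can be flushed or closed later.
        self.fileobj = open(path, "w", newline="")
        self._writer = csv.writer(self.fileobj)

    def writerow(self, row):
        self._writer.writerow(row)

    def writerows(self, rows):
        self._writer.writerows(rows)

    def flush(self):
        # Push any buffered data out to the operating system.
        self.fileobj.flush()

    def close(self):
        # Closing flushes as well, so the file is complete afterwards.
        self.fileobj.close()
```

With this, my_csv = TrackedCsvWriter('temp.1.csv') can be used like the
plain writer, and my_csv.close() can be called before the temporary files
are combined.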