CSV writer question
Peter Otten
__peter__ at web.de
Mon Oct 24 17:09:16 EDT 2011
Jason Swails wrote:
> Hello,
>
> I have a question about a csv.writer instance. I have a utility that I
> want to write a full CSV file from lots of data, but due to performance
> (and memory) considerations, there's no way I can write the data
> sequentially. Therefore, I write the data in chunks to temporary files,
> then combine them all at the end. For convenience, I declare each writer
> instance via a statement like
>
> my_csv = csv.writer(open('temp.1.csv', 'wb'))
>
> so the open file object isn't bound to any explicit reference, and I don't
> know how to reference it inside the writer class (the documentation doesn't
> say, unless I've missed the obvious). Thus, the only way I can think of
> to make sure that all of the data is written before I start copying these
> files sequentially into the final file is to unbuffer them so the above
> command is changed to
>
> my_csv = csv.writer(open('temp.1.csv', 'wb', 0))
>
> unless, of course, I add an explicit reference to track the open file
> object and manually close or flush it (but I'd like to avoid it if
> possible). My question is 2-fold. Is there a way to do that directly via
> the CSV API, or is the approach I'm taking the only way without binding
> the open file object to another reference? Secondly, if these files are
> potentially very large
> (anywhere from ~1KB to 20 GB depending on the amount of data present),
> what kind of performance hit will I be looking at by disabling buffering
> on these types of files?
>
> Tips, answers, comments, and/or suggestions are all welcome.
>
> Thanks a lot!
> Jason
>
> As an afterthought, I suppose I could always subclass the csv.writer class
> and add the reference I want to that, which I may do if there's no other
> convenient solution.
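
A context manager might help. The file is closed, and therefore flushed,
when the with block exits, so you don't need to disable buffering or keep a
separate reference to the file object:
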
import csv
from contextlib import contextmanager

@contextmanager
def filewriter(filename):
    with open(filename, "wb") as outstream:
        yield csv.writer(outstream)

if __name__ == "__main__":
    with filewriter("tmp.csv") as writer:
        writer.writerows([
            ["alpha", "beta"],
            ["gamma", "delta"]])