Is anyone happy with csv module?
massimo s.
devicerandom at gmail.com
Wed Dec 12 05:22:25 EST 2007
Thanks to everyone in this thread. As always on this newsgroup, I
learned very much.
I'm also quite embarrassed by my ignorance. The only excuse I have is
that I learned programming and Python by myself, with no formal (or
informal) education in programming. So I am often clumsy.
On Dec 12, 1:29 am, Bruno Desthuilliers
<bdesth.quelquech... at free.quelquepart.fr> wrote:
> > I'm just trying to use the CSV module
> > and I mostly can get it working. I just think its interface is much
> > less than perfect. I'd like something I can, say, give a whole
> > dictionary in input and obtain a CSV file in output, with each key of
> > the dictionary being a column in the CSV file. Or a row, if I prefer.
> > Something like:
>
> > dict={'First':[1,2,3,4],'Second':[10,20,30,40],'Third':
> > [100,200,300,400]}
>
> <ot>
> you're shadowing the builtin 'dict' type here, which is usually a bad idea
> </ot>
Yes, this I know, I just overlooked it when improvising the example.
> > f = open('test.csv', 'w')
> > try:
> >     csv_write_dict(f, dict, keys='columns', delimiter=',')
> > finally:
> >     f.close()
>
> > and obtaining:
> > First,Second,Third
> > 1,10,100
> > 2,20,200
> > 3,30,300
> > 4,40,400
>
> Doing the needed transformation (from a column:rows dict to the required
> format) is close to trivial. So you could actually implement it
> yourself, monkeypatch the relevant csv class, and submit a patch to the
> maintainer of the module.
>
> FWIW, I never had data structured that way to pass to the csv module -
> to be true, I think I never had a case where tabular data were
> structured by columns.
FWIW, I never had data structured by row. At most, I had data
structured by *both* row and column.
Vive la différence. :)
> > Doing the same thing with the current csv module is much more
> > cumbersome: see this example from http://www.oreillynet.com/onlamp/blog/2007/08/pymotw_csv.html
>
> > f = open(sys.argv[1], 'wt')
> > try:
> >     fieldnames = ('Title 1', 'Title 2', 'Title 3')
> >     writer = csv.DictWriter(f, fieldnames=fieldnames)
> >     headers = {}
> >     for n in fieldnames:
> >         headers[n] = n
> >     writer.writerow(headers)
>
> # same as the 4 lines above
> writer.writerow(dict((item, item) for item in fieldnames))
>
> >     for i in range(10):
> >         writer.writerow({'Title 1': i + 1,
> >                          'Title 2': chr(ord('a') + i),
> >                          'Title 3': '08/%02d/07' % (i + 1),
> >                          })
>
> This one looks so totally unrealistic to me - I mean, wrt/ to real-life
> use cases - that I won't even propose a rewrite.
I can frankly think of a lot of cases where this kind of pattern makes
a lot of sense, but in that case it was just for the example purpose.
> > finally:
> >     f.close()
>
> A bit of a WTF, indeed. But most of the problem is with this example
> code, not with the csv module (apologies to whoever wrote this snippet).
Thank you. Let me say it was the *best* tutorial I found online - much
better than the official docs, IMHO. Maybe that is why I felt dizzy
when trying to use csv.
> FWIW, here's a function doing what you want, at least for your first use case:
>
> def csv_write_cols(writer, data):
>     keys = data.keys()
>     writer.writerow(dict(zip(keys, keys)))
>     for row in zip(*data.values()):
>         writer.writerow(dict(zip(keys, row)))
Thanks!
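For anyone finding this in the archives, here is Bruno's helper wired up
to an actual csv.DictWriter. I use an in-memory io.StringIO instead of a
real file just to keep the sketch self-contained; the column order is
whatever the dict yields its keys in:

```python
import csv
import io

def csv_write_cols(writer, data):
    # Bruno's helper: emit the dict keys as a header row, then zip the
    # column lists together so each tuple becomes one output row.
    keys = list(data.keys())
    writer.writerow(dict(zip(keys, keys)))
    for row in zip(*data.values()):
        writer.writerow(dict(zip(keys, row)))

columns = {'First': [1, 2, 3, 4],
           'Second': [10, 20, 30, 40],
           'Third': [100, 200, 300, 400]}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(columns.keys()))
csv_write_cols(writer, columns)
print(buf.getvalue())
```

which prints exactly the First/Second/Third table from my original example.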
> Now you do what you want, but as far as I'm concerned, I wouldn't start
> a total rewrite of an otherwise working (and non-trivial) module just
> for a trivial four-line function.
I fully agree. I would like to add a few other trivial functions, but
this is a *clear* example of csv writer usage, which I had not found
anywhere.
> Also, have you considered that your columns may as well be rows, ie:
>
> First, 1, 2, 3, 4
> Second, 10, 20, 30, 40
> Third, 100, 200, 300, 400
Doesn't play well with my data for a number of reasons. For example,
the column vs. row limits in spreadsheets.
> > Another unrelated quirk I've found is that iterating the rows read by
> > a csv reader object seems to erase the rows themselves; I have to copy
> > them in another list to use them.
>
> It's not a "quirk", Sir, it's a feature !-)
>
> The csv reader object - like file objects and a couple of others - is an
> iterator. In this case, it means the csv reader is smart enough not to
> read the whole file into memory - which is not necessarily what you
> want, especially for huge files - but instead yields rows as you
> ask for them.
>
> Note that if you need the whole thing in memory, "copying" the rows in a
> list is a no-brainer:
> rows = list(reader)
I know. I just thought it odd that this was undocumented. But it's
self-evident now that I had missed how iterators work.
I'll look into the issue.
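To spell out what bit me, in case it helps someone else searching the
archives (I feed the reader from an in-memory buffer here, just so the
snippet is self-contained):

```python
import csv
import io

data = "a,b\n1,2\n3,4\n"
reader = csv.reader(io.StringIO(data))

rows = list(reader)   # materialize all rows; this consumes the iterator
print(rows)           # the three parsed rows, as lists of strings

# A second pass over the same reader yields nothing -- it is exhausted,
# just like a file object that has been read to the end.
print(list(reader))   # []
```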
> > Probably it's me not being a professional programmer,
>
> <ot>
> Not sure the professional status is key here - I mean, it just means
> you're getting paid for it, but says nothing about your competence.
> </ot>
In the meaning that I have no formal training in it and the like.
> > so I don't
> > understand that somehow the csv module *has* to be done this way. If
> > it's so, I'd like to know about it so I can learn something.
>
> As about why it's sometimes better to not read a whole file into memory
> at once, try with multi-gigabytes and watch your system crawl to a halt.
> wrt/ csv being 'row-oriented', fact is that 1/ it's by far the most
> common use case for tabular data and 2/ it's a simple mapping from lines
> to rows (and back) - which is important wrt/ perfs and maintainability.
> Try to read a csv file "by columns", and you'll find out that you'll
> either need to read it all in memory, parse it line by line, then turn
> lines into columns (the inverse operation of my small function above),
> or to rearrange your data the way I suggested above. And let's not talk
> about writing...
Yes, I understand. I just didn't think about such usage cases. :)
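For completeness, the "turn lines into columns" operation Bruno mentions
(the inverse of his small function) is the same zip(*...) trick, at the
cost of reading the whole file into memory first. A quick sketch, again
with an in-memory buffer standing in for a file:

```python
import csv
import io

data = "First,Second,Third\n1,10,100\n2,20,200\n3,30,300\n4,40,400\n"
rows = list(csv.reader(io.StringIO(data)))   # whole file in memory

header, body = rows[0], rows[1:]
# zip(*body) transposes the row-major list of rows into columns
columns = dict(zip(header, zip(*body)))
print(columns['Second'])   # one column -- note the values are still strings
```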
Thanks to everyone,
Massimo