Is anyone happy with csv module?
massimo s.
devicerandom at gmail.com
Wed Dec 12 05:22:25 EST 2007
Thanks to everyone in this thread. As always on this newsgroup, I
learned very much.
I'm also quite embarrassed by my ignorance. The only excuse I have is
that I learned programming and Python by myself, with no formal (or
informal) education in programming. So I am often clumsy.
On Dec 12, 1:29 am, Bruno Desthuilliers
<bdesth.quelquech... at free.quelquepart.fr> wrote:
> > I'm just trying to use the CSV module
> > and I mostly can get it working. I just think its interface is much
> > less than perfect. I'd like something I can, say, give a whole
> > dictionary in input and obtain a CSV file in output, with each key of
> > the dictionary being a column in the CSV file. Or a row, if I prefer.
> > Something like:
>
> > dict={'First':[1,2,3,4],'Second':[10,20,30,40],'Third':
> > [100,200,300,400]}
>
> <ot>
> you're shadowing the builtin 'dict' type here, which is usually a bad idea
> </ot>
Yes, this I know, I just overlooked it when improvising the example.
> > f = open('test.csv', 'w')
> > try:
> >     csv_write_dict(f, dict, keys='columns', delimiter=',')
> > finally:
> >     f.close()
>
> > and obtaining:
> > First,Second,Third
> > 1,10,100
> > 2,20,200
> > 3,30,300
> > 4,40,400
>
> Doing the needed transformation (from a column:rows dict to the required
> format) is close to trivial. So you could actually implement it
> yourself, monkeypatch the relevant csv class, and submit a patch to the
> maintainer of the module.
>
> FWIW, I never had data structured that way to pass to the csv module -
> to be true, I think I never had a case where tabular data were
> structured by columns.
FWIW, I never had data structured by row. At most, I had data
structured by *both* row and column.
Vive la différence. :)
> > Doing the same thing with the current csv module is much more
> > cumbersome: see this example from http://www.oreillynet.com/onlamp/blog/2007/08/pymotw_csv.html
>
> > f = open(sys.argv[1], 'wt')
> > try:
> >     fieldnames = ('Title 1', 'Title 2', 'Title 3')
> >     writer = csv.DictWriter(f, fieldnames=fieldnames)
> >     headers = {}
> >     for n in fieldnames:
> >         headers[n] = n
> >     writer.writerow(headers)
>
> # same as the 4 lines above
> writer.writerow(dict((item, item) for item in fieldnames))
>
> >     for i in range(10):
> >         writer.writerow({'Title 1': i + 1,
> >                          'Title 2': chr(ord('a') + i),
> >                          'Title 3': '08/%02d/07' % (i + 1),
> >                          })
>
> This one looks so totally unrealistic to me - I mean, wrt/ to real-life
> use cases - that I won't even propose a rewrite.
I can frankly think of a lot of cases where this kind of pattern makes
a lot of sense, but in that case it was just for the example purpose.
> > finally:
> >     f.close()
>
> A bit of a WTF, indeed. But most of the problem is with this example
> code, not with the csv module (apologies to whoever wrote this snippet).
Thank you. Let me say it was the *best* tutorial I found online - much
better than the official docs, IMHO. Maybe that is why I felt dizzy
when trying to use csv.
> FWIW, here's a function doing what you want, at least for your first use case:
>
> def csv_write_cols(writer, data):
>     keys = data.keys()
>     writer.writerow(dict(zip(keys, keys)))
>     for row in zip(*data.values()):
>         writer.writerow(dict(zip(keys, row)))
Thanks!
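For anyone finding this in the archives, here is Bruno's helper wired up
to an actual csv.DictWriter. I use an in-memory io.StringIO instead of a
real file just to keep the sketch self-contained; the column order is
whatever the dict yields its keys in:

```python
import csv
import io

def csv_write_cols(writer, data):
    # Bruno's helper: emit the dict keys as a header row, then zip the
    # column lists together so each tuple becomes one output row.
    keys = list(data.keys())
    writer.writerow(dict(zip(keys, keys)))
    for row in zip(*data.values()):
        writer.writerow(dict(zip(keys, row)))

columns = {'First': [1, 2, 3, 4],
           'Second': [10, 20, 30, 40],
           'Third': [100, 200, 300, 400]}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(columns.keys()))
csv_write_cols(writer, columns)
print(buf.getvalue())
```

which prints exactly the First/Second/Third table from my original example.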
> Now you do what you want, but as far as I'm concerned, I wouldn't start
> a total rewrite of an otherwise working (and non-trivial) module just
> for a trivial four-line function.
I fully agree. I would like to add a few other trivial functions, but
this is a *clear* example of csv writer usage, which I had not found
anywhere.
> Also, have you considered that your columns may as well be rows, ie:
>
> First, 1, 2, 3, 4
> Second, 10, 20, 30, 40
> Third, 100, 200, 300, 400
Doesn't play well with my data for a number of reasons. For example,
the column vs. row limits in spreadsheets.
> > Another unrelated quirk I've found is that iterating the rows read by
> > a csv reader object seems to erase the rows themselves; I have to copy
> > them in another list to use them.
>
> It's not a "quirk", Sir, it's a feature !-)
>
> The csv reader object - like file objects and a couple of others - is an
> iterator. In this case, it means the csv reader is smart enough not to
> read the whole file into memory - which is not necessarily what you
> want, especially for huge files - but instead yields rows as you
> ask for them.
>
> Note that if you need the whole thing in memory, "copying" the rows in a
> list is a no-brainer:
> rows = list(reader)
I know. I just thought it odd that this was undocumented. But it's
self-evident now that I had missed how iterators work.
I'll look into the issue.
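To spell out what bit me, in case it helps someone else searching the
archives (I feed the reader from an in-memory buffer here, just so the
snippet is self-contained):

```python
import csv
import io

data = "a,b\n1,2\n3,4\n"
reader = csv.reader(io.StringIO(data))

rows = list(reader)   # materialize all rows; this consumes the iterator
print(rows)           # the three parsed rows, as lists of strings

# A second pass over the same reader yields nothing -- it is exhausted,
# just like a file object that has been read to the end.
print(list(reader))   # []
```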
> > Probably it's me not being a professional programmer,
>
> <ot>
> Not sure the professional status is key here - I mean, it just means
> you're getting paid for it, but says nothing about your competence.
> </ot>
In the meaning that I have no formal training in it and the like.
> > so I don't
> > understand that somehow the csv module *has* to be done this way. If
> > it's so, I'd like to know about it so I can learn something.
>
> As about why it's sometimes better to not read a whole file into memory
> at once, try with multi-gigabytes and watch your system crawl to a halt.
> wrt/ csv being 'row-oriented', fact is that 1/ it's by far the most
> common use case for tabular data and 2/ it's a simple mapping from lines
> to rows (and back) - which is important wrt/ perfs and maintainability.
> Try to read a csv file "by columns", and you'll find out that you'll
> either need to read it all in memory, parse it line by line, then turn
> lines into columns (the inverse operation of my small function above),
> or to rearrange your data the way I suggested above. And let's not talk
> about writing...
Yes, I understand. I just didn't think about such usage cases. :)
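For completeness, the "turn lines into columns" operation Bruno mentions
(the inverse of his small function) is the same zip(*...) trick, at the
cost of reading the whole file into memory first. A quick sketch, again
with an in-memory buffer standing in for a file:

```python
import csv
import io

data = "First,Second,Third\n1,10,100\n2,20,200\n3,30,300\n4,40,400\n"
rows = list(csv.reader(io.StringIO(data)))   # whole file in memory

header, body = rows[0], rows[1:]
# zip(*body) transposes the row-major list of rows into columns
columns = dict(zip(header, zip(*body)))
print(columns['Second'])   # one column -- note the values are still strings
```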
Thanks to everyone,
Massimo