Is anyone happy with csv module?
Bruno Desthuilliers
bdesth.quelquechose at free.quelquepart.fr
Tue Dec 11 19:29:15 EST 2007
massimo s. wrote:
> On 11 Dic, 20:24, "Guilherme Polo" <ggp... at gmail.com> wrote:
>
>
>>Post your actual problem so you can get more accurate help.
>
>
> Hi Guilhermo,
> I don't have an actual problem.
Yes you do - even if you don't realize it yet !-)
> I'm just trying to use the CSV module
> and I can mostly get it working. I just think its interface is much
> less than perfect. I'd like something I can, say, give a whole
> dictionary in input and obtain a CSV file in output, with each key of
> the dictionary being a column in the CSV file. Or a row, if I prefer.
> Something like:
>
> dict={'First':[1,2,3,4],'Second':[10,20,30,40],'Third':
> [100,200,300,400]}
<ot>
you're shadowing the builtin 'dict' type here, which is usually a bad idea
</ot>
> f = open('test.csv', 'w')
> try:
>     csv_write_dict(f, dict, keys='columns', delimiter=',')
> finally:
>     f.close()
>
> and obtaining:
> First,Second,Third
> 1,10,100
> 2,20,200
> 3,30,300
> 4,40,400
Doing the needed transformation (from a column:rows dict to the required
format) is close to trivial. So you could actually implement it
yourself, monkeypatch the relevant csv class, and submit a patch to the
maintainer of the module.
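FWIW, you don't even need to monkeypatch anything - subclassing works just as well. A standalone sketch (note that ColsDictWriter and its writecols method are names I'm making up here, not part of the csv module, and io.StringIO stands in for a real file):

```python
import csv
import io

class ColsDictWriter(csv.DictWriter):
    """Hypothetical extension of DictWriter for column-structured data."""
    def writecols(self, data):
        # Write a column-name -> values dict as a header plus data rows.
        keys = list(data)
        self.writerow(dict(zip(keys, keys)))    # header row
        for row in zip(*data.values()):         # transpose columns to rows
            self.writerow(dict(zip(keys, row)))

out = io.StringIO()
w = ColsDictWriter(out, fieldnames=['First', 'Second', 'Third'])
w.writecols({'First': [1, 2], 'Second': [10, 20], 'Third': [100, 200]})
```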
FWIW, I've never had data structured that way to pass to the csv module -
truth be told, I don't think I've ever had a case where tabular data were
structured by columns.
> Doing the same thing with the current csv module is much more
> cumbersome: see this example from http://www.oreillynet.com/onlamp/blog/2007/08/pymotw_csv.html
>
> f = open(sys.argv[1], 'wt')
> try:
>     fieldnames = ('Title 1', 'Title 2', 'Title 3')
>     writer = csv.DictWriter(f, fieldnames=fieldnames)
>     headers = {}
>     for n in fieldnames:
>         headers[n] = n
>     writer.writerow(headers)
# same as the 4 lines above
writer.writerow(dict((item, item) for item in fieldnames))
>     for i in range(10):
>         writer.writerow({'Title 1': i+1,
>                          'Title 2': chr(ord('a') + i),
>                          'Title 3': '08/%02d/07' % (i+1),
>                          })
This one looks so totally unrealistic to me - I mean, wrt/ real-life
use cases - that I won't even propose a rewrite.
> finally:
>     f.close()
A bit of a WTF, indeed. But most of the problem is with this example
code, not with the csv module (apologies to whoever wrote this snippet).
FWIW, here's a function that does what you want, at least for your first use case:
def csv_write_cols(writer, data):
    keys = data.keys()
    writer.writerow(dict(zip(keys, keys)))
    for row in zip(*data.values()):
        writer.writerow(dict(zip(keys, row)))
Now you can do what you want, but as far as I'm concerned, I wouldn't
start a total rewrite of an otherwise working (and non-trivial) module
just for a trivial four-line function.
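In use, it would look like this (the function repeated so the snippet runs on its own, and io.StringIO standing in for a real file):

```python
import csv
import io

# The four-line function again, so this snippet is self-contained.
def csv_write_cols(writer, data):
    keys = list(data)
    writer.writerow(dict(zip(keys, keys)))     # header row
    for row in zip(*data.values()):            # transpose columns to rows
        writer.writerow(dict(zip(keys, row)))

data = {'First': [1, 2, 3, 4], 'Second': [10, 20, 30, 40],
        'Third': [100, 200, 300, 400]}
out = io.StringIO()
csv_write_cols(csv.DictWriter(out, fieldnames=list(data)), data)
```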
Also, have you considered that your columns may as well be rows, ie:
First, 1, 2, 3, 4
Second, 10, 20, 30, 40
Third, 100, 200, 300, 400
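That layout needs nothing more than the plain csv.writer - each key becomes the first cell of its own row (again with io.StringIO standing in for a real file):

```python
import csv
import io

data = {'First': [1, 2, 3, 4], 'Second': [10, 20, 30, 40],
        'Third': [100, 200, 300, 400]}
out = io.StringIO()
writer = csv.writer(out)
for key, values in data.items():
    writer.writerow([key] + values)   # the key becomes the row label
```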
>
> Another unrelated quirk I've found is that iterating the rows read by
> a csv reader object seems to erase the rows themselves; I have to copy
> them in another list to use them.
It's not a "quirk", Sir, it's a feature !-)
The csv reader object - like file objects and a couple of others - is an
iterator. In this case, it means the csv reader is smart enough not to
read the whole file into memory - which is not necessarily what you
want, especially for huge files - but instead yields rows as you ask
for them.
Note that if you need the whole thing in memory, "copying" the rows in a
list is a no-brainer:
rows = list(reader)
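You can see the iterator behaviour directly - once the rows have been consumed, a second pass yields nothing (io.StringIO standing in for a file):

```python
import csv
import io

# The reader is an iterator: it yields each row exactly once.
f = io.StringIO("a,b\n1,2\n3,4\n")
reader = csv.reader(f)
rows = list(reader)        # materialise every remaining row in memory
leftover = list(reader)    # nothing left - the iterator is exhausted
```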
> Probably it's me not being a professional programmer,
<ot>
Not sure the professional status is key here - I mean, it just means
you're getting paid for it, but it says nothing about your competence.
</ot>
> so I don't
> understand that somehow the csv module *has* to be done this way. If
> it's so, I'd like to know about it so I can learn something.
As about why it's sometimes better to not read a whole file into memory
at once, try with multi-gigabytes and watch your system crawl to a halt.
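With the iterator interface, processing an arbitrarily large file costs only one row's worth of memory at a time - e.g. summing a column (a small StringIO here, but the point is that nothing grows with file size):

```python
import csv
import io

# Stream rows one at a time; only the current row is ever held in memory.
f = io.StringIO("value\n" + "".join("%d\n" % i for i in range(1000)))
reader = csv.reader(f)
next(reader)                                # skip the header row
total = sum(int(row[0]) for row in reader)  # accumulate without storing rows
```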
wrt/ csv being 'row-oriented', the fact is that 1/ it's by far the most
common use case for tabular data and 2/ it's a simple mapping from lines
to rows (and back) - which is important wrt/ performance and maintainability.
Try to read a csv file "by columns", and you'll find that you either
need to read it all into memory, parse it line by line, then turn the
lines into columns (the inverse of my small function above), or to
rearrange your data the way I suggested above. And let's not talk
about writing...
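For the record, that inverse transformation looks like this - note that the whole file ends up in memory, transposed (io.StringIO standing in for a real file):

```python
import csv
import io

# Reading a csv file "by columns": load everything, then transpose
# rows into columns - the inverse of writing column-structured data.
f = io.StringIO("First,Second,Third\n1,10,100\n2,20,200\n")
reader = csv.reader(f)
header = next(reader)
columns = dict(zip(header, zip(*reader)))   # whole file in memory at once
```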
Now I don't mean there's no room for improvement in the csv module -
there almost always is - but given the usefulness of this module in a
programmer's daily life, it would probably have been superseded by
something better if it wasn't at least perceived as good enough by its
users.
HTH
More information about the Python-list mailing list