continue vs. pass in this IO reading and writing

kbtyo ahlusar.ahluwalia at gmail.com
Thu Sep 3 18:35:11 CEST 2015


On Thursday, September 3, 2015 at 12:12:04 PM UTC-4, Chris Angelico wrote:
> On Fri, Sep 4, 2015 at 1:57 AM, kbtyo <ahlusar.ahluwalia at gmail.com> wrote:
> > I have used CSV and collections. For some reason when I apply this algorithm, all of my files are not added (the output is ridiculously small considering how much goes in - think KB output vs MB input):
> >
> > from glob import iglob
> > import csv
> > from collections import OrderedDict
> >
> > files = sorted(iglob('*.csv'))
> > header = OrderedDict()
> > data = []
> >
> > for filename in files:
> >     with open(filename, 'r') as fin:
> >         csvin = csv.DictReader(fin)
> >         header.update(OrderedDict.fromkeys(csvin.fieldnames))
> >         data.append(next(csvin))
> >
> > with open('output_filename_version2.csv', 'w') as fout:
> >     csvout = csv.DictWriter(fout, fieldnames=list(header))
> >     csvout.writeheader()
> >     csvout.writerows(data)
> 
> You're collecting up just one row from each file. Since you say your
> input is measured in MB (not GB or anything bigger), the simplest
> approach is probably fine: instead of "data.append(next(csvin))", just
> use "data.extend(csvin)", which should grab them all. That'll store
> all your input data in memory, which should be fine if it's only a few
> meg, and probably not a problem for anything under a few hundred meg.
> 
> ChrisA

Hmm, good point. I may have to deal with larger files later on, but thank you for the tip.
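
For the larger-file case, one possible approach is to read the inputs twice: once to collect the headers, and once to copy the rows straight to the output so that nothing accumulates in memory. The following is only a rough sketch under that assumption; the function name merge_csv_files is made up for illustration and is not part of the csv module.

```python
import csv
from collections import OrderedDict


def merge_csv_files(filenames, output_filename):
    """Merge CSV files that may have different headers, one row at a time.

    Hypothetical helper: returns the number of data rows written.
    """
    # Pass 1: read only the header of each file and build the union of
    # column names, keeping the order in which they are first seen.
    header = OrderedDict()
    for filename in filenames:
        with open(filename, 'r', newline='') as fin:
            fieldnames = csv.DictReader(fin).fieldnames
            # fieldnames is None for a completely empty file.
            header.update(OrderedDict.fromkeys(fieldnames or []))

    # Pass 2: stream every row of every file to the output, so only one
    # row is held in memory at any time.
    rows_written = 0
    with open(output_filename, 'w', newline='') as fout:
        csvout = csv.DictWriter(fout, fieldnames=list(header))
        csvout.writeheader()
        for filename in filenames:
            with open(filename, 'r', newline='') as fin:
                for row in csv.DictReader(fin):
                    # Columns missing from this file are written as ''.
                    csvout.writerow(row)
                    rows_written += 1
    return rows_written
```

It could be called as merge_csv_files(sorted(iglob('*.csv')), 'output_filename_version2.csv'), with iglob imported from glob as before. The output file should be kept out of the input list (or written to another directory) so that a second run does not read its own output.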

I am also wondering, given your point that I am only "collecting up just one row from each file": am I otherwise fulfilling this requirement?

"I have files that may have different headers. If they are different, they should be appended (along with their values) into the output. If there are duplicate headers, then their values should just be added sequentially."

I am also wondering whether DictReader skipping empty rows by default is what is happening here, and whether that could be affecting the other rows as well.
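
A quick way to check the empty-row question is a small experiment on made-up data (the sample text below is invented for illustration, not taken from my files). As far as I can tell, DictReader only drops rows that have no fields at all, and it keeps reading the rows that follow them:

```python
import csv
import io

# Made-up sample: a header, a blank line after the first data row, and a
# row that consists only of a delimiter (two empty fields).
sample = "a,b\n1,2\n\n3,4\n,\n5,6\n"

rows = list(csv.DictReader(io.StringIO(sample)))

for row in rows:
    print(dict(row))

# The completely blank line is skipped; the "," row is kept as a row
# whose values are empty strings.
print(len(rows))
```

So a blank line on its own should not cause the remaining rows of a file to be lost.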

