[Tutor] Parsing large files

Greg Gent GGent at healthcare-automation.com
Fri Feb 6 17:00:34 EST 2004


Working from an example in the python-lists archives:

Here's my simple csv file:

ID,NAME,G1,G2,G3,SERIES
243,Darren,159,183,171,513
196,Greg,136,156,183,475
198,Chris D,153,168,151,472
35,Mike,210,187,197,594
200,Chris J,232,180,168,580

Here's my py file:

#cvsread.py
# Python 2 syntax (print statement, binary-mode open for csv).
import csv

dicts = []

inputFile = open("test.csv", "rb")
parser = csv.reader(inputFile)
firstRec = True
for fields in parser:
    if firstRec:
        # The first record holds the column names.
        fieldNames = fields
        firstRec = False
    else:
        # Build one dict per data row, keyed by column name.
        dicts.append({})
        for i, f in enumerate(fields):
            dicts[-1][fieldNames[i]] = f

print "Send this to file #1"
for i, row in enumerate(dicts):
    print row["ID"], row["NAME"], row["SERIES"]

print "Send this to file #2"
for i, row in enumerate(dicts):
    print row["ID"], row["NAME"], row["G1"], row["G2"], row["G3"]
#EOF

This will display:

>>> 
Send this to file #1
243 Darren 513
196 Greg 475
198 Chris D 472
35 Mike 594
200 Chris J 580
Send this to file #2
243 Darren 159 183 171
196 Greg 136 156 183
198 Chris D 153 168 151
35 Mike 210 187 197
200 Chris J 232 180 168
>>> 

This should get you pointed in the right direction...
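
The script above only prints the two column subsets. As a rough sketch of actually writing them to two files, here is one way it might look in Python 3 using csv.DictReader and csv.DictWriter. The function name split_by_field_names and the output file names are made up for illustration, and it assumes the first line of the input holds the field names, as in test.csv:

```python
import csv


def split_by_field_names(src_path, outputs):
    """Copy selected named columns of a CSV file into several files.

    outputs maps an output path to the list of column names it should get.
    (Hypothetical helper, not part of the csv module.)
    """
    with open(src_path, newline="") as src:
        # DictReader takes the field names from the first line of the file.
        rows = list(csv.DictReader(src))

    for out_path, names in outputs.items():
        with open(out_path, "w", newline="") as out:
            # extrasaction="ignore" silently drops columns not listed in names.
            writer = csv.DictWriter(out, fieldnames=names, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
```

Calling split_by_field_names("test.csv", {"series.csv": ["ID", "NAME", "SERIES"], "games.csv": ["ID", "NAME", "G1", "G2", "G3"]}) should produce one file per column subset, each with the same number of rows as the input. It reads the whole input into memory, which seems fine for a file of a thousand or so rows.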

HTH,
Greg



-----Original Message-----
From: Andrew Eidson [mailto:abeidson at sbcglobal.net]
Sent: Friday, February 06, 2004 3:40 PM
To: tutor at python.org
Subject: RE: [Tutor] Parsing large files


Ok.. I have the file being read by csvreader.. but it seems that csvwriter
can only write rows.. the file does not have field names so I am having
difficulty copying specific columns to the separate files. any suggestions
on a place for documentation.. everything I am finding has no information on
writing individual columns.
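
For a file without field names, as described above, columns can be picked by position instead: csv.writer does write whole rows, but each row can be a list built from just the wanted fields. A minimal Python 3 sketch, assuming a tab-delimited input in which every row has all the columns; split_by_index and the file names are hypothetical:

```python
import csv


def split_by_index(src_path, outputs, delimiter="\t"):
    """Copy selected columns, chosen by 0-based index, into several files.

    outputs maps an output path to the list of column indexes it should get.
    (Hypothetical helper; assumes no row is shorter than the largest index.)
    """
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src, delimiter=delimiter))

    for out_path, indexes in outputs.items():
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out, delimiter=delimiter)
            for row in rows:
                # Build a shorter row holding only the requested columns.
                writer.writerow([row[i] for i in indexes])
```

For the 321-column file, something like split_by_index("input.txt", {"first.txt": range(0, 160), "second.txt": range(160, 321)}) would put roughly half the columns in each output while keeping every row.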

Greg Gent <GGent at healthcare-automation.com> wrote: 
RTFQ.

The OP actually stated that he was writing a subset of the columns into each of
the two resulting files. Each resulting file would have the same number of
rows as the original, not half the rows each (and since it was stated MORE THAN
1000 rows...your suggestion of 500 wouldn't give the same number of rows
in each file either).

As already stated the csv module seems appropriate.

BTW,

The unix split command will not simplify this task. It may split the file
into N 500 line pieces (if you tell it to use -l 500). However, that is not
what was asked.


> -----Original Message-----
> From: Rick Pasotto [mailto:rick at niof.net]
> Sent: Friday, February 06, 2004 1:46 PM
> To: tutor at python.org
> Subject: Re: [Tutor] Parsing large files
> 
> 
> On Fri, Feb 06, 2004 at 12:33:10PM -0500, Andrew Eidson wrote:
> > I have a text file that is tab delimited.. it has 321 columns with
> > over 1000 rows.. I need to parse this out to 2 text files with roughly
> > half the columns and the same number of rows. Just looking for some
> > guidance on the best way of doing this (been working on a GUI so much
> > lately my brain is toast)
> 
> Why does it matter how many columns there are? Are you rearranging them
> or using a subset? If not just write the first 500 lines to one file and
> then the rest to another. The unix 'split' command will do this.
> 
> Don't make things more complicated than necessary.
> 
> -- 
> "All progress is based upon the universal innate desire
> on the part of every organism to live beyond its income."
> -- Samuel Butler *KH*
> Rick Pasotto rick at niof.net http://www.niof.net
> 
> _______________________________________________
> Tutor maillist - Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
> 
