Reading a large CSV file
Mag Gam
magawake at gmail.com
Wed Jun 24 07:38:11 EDT 2009
Sorry for the delayed response. I was trying to figure this problem
out. The OS is Linux, BTW
Here is some code I have:
import gzip
import sys

import numpy as np
import h5py
src = sys.argv[1]
filename = src.split("/")[-1]

# Get YYYY/MM/DD from a filename beginning with YYYYMMDD
base = filename.rsplit(".", 2)[0]
YYYY = base[0:4]
MM = base[4:6]
DD = base[6:8]
f = h5py.File('/tmp/test_foo/FE.hdf5', 'w')

grp = "/" + YYYY
try:
    f.create_group(grp)
except ValueError:
    print "Year group already exists"

grp = grp + "/" + MM
try:
    f.create_group(grp)
except ValueError:
    print "Month group already exists"

grp = grp + "/" + DD
try:
    group = f.create_group(grp)
except ValueError:
    print "Day group already exists"
str_type = h5py.new_vlen(str)  # variable-length string type (not used below)
mydescriptor = {'names': ('gender', 'age', 'weight'),
                'formats': ('S1', 'f4', 'f4')}

print "Filename is: ", src
fs = gzip.open(src)

# Empty, growable 1-D dataset of (gender, age, weight) records
dset = f.create_dataset('Foo', shape=(0,), maxshape=(None,),
                        dtype=np.dtype(mydescriptor), compression='gzip')
s = 0

# Takes the longest here
for y in fs:
    a = y.split(',')
    s = s + 1
    dset.resize(s, axis=0)
fs.close()
f.close()
This works but just takes a VERY long time.
Any way to optimize this?
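(One change that is likely to help, sketched below: grow the dataset in
large blocks instead of once per row, and write each parsed block in a
single slice assignment. This assumes the columns really are gender,
age, weight in that order; the helper name load_csv and the block size
of 10000 are arbitrary choices, not anything from the code above.

import gzip
import numpy as np

def load_csv(src, dset, chunk=10000):
    # Parse rows into a list, then append them to the dataset one
    # block at a time; each resize/slice-write touches the file once
    # per block rather than once per row.
    buf = []
    n = 0
    fs = gzip.open(src)
    for line in fs:
        g, age, weight = line.strip().split(',')
        buf.append((g, float(age), float(weight)))
        if len(buf) == chunk:
            dset.resize(n + len(buf), axis=0)
            dset[n:n + len(buf)] = np.array(buf, dtype=dset.dtype)
            n += len(buf)
            buf = []
    if buf:  # flush the final partial block
        dset.resize(n + len(buf), axis=0)
        dset[n:n + len(buf)] = np.array(buf, dtype=dset.dtype)
    fs.close()

Called as load_csv(src, dset) in place of the read loop above.)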
TIA
On Wed, Jun 24, 2009 at 12:13 AM, Chris Withers<chris at simplistix.co.uk> wrote:
> Terry Reedy wrote:
>>
>> Mag Gam wrote:
>>>
>>> Yes, the system has 64Gig of physical memory.
>>
>> drool ;-).
>
> Well, except that, dependent on what OS he's using, the size of one process
> may well still be limited to 2GB...
>
> Chris
>
> --
> Simplistix - Content Management, Zope & Python Consulting
> - http://www.simplistix.co.uk
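As a side note on the 2GB point above: a quick way to check whether a
given Python build is 64-bit, and so can address more than 2 GB in a
single process:

import platform
import sys

print platform.architecture()[0]   # '64bit' on a 64-bit build
print sys.maxint                   # 9223372036854775807 on 64-bit Linux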