writing results to array
Matimus
mccredie at gmail.com
Mon Dec 3 17:39:39 EST 2007
On Dec 3, 12:45 pm, Bevan Jenkins <beva... at gmail.com> wrote:
> Hello,
>
> I have recently discovered the python language and am having a lot of
> fun getting head around the basics of it.
> However, I have run into a stumbling block that I have not been able
> to overcome, so I thought I would ask for help.
> <Overview>
> I am trying to import a text file that has the following format:
> 02/01/2000 @ 00:00:00 0.983896 Q10 T2
> 03/01/2000 @ 00:00:00 0.557377 Q10 T2
> 04/01/2000 @ 00:00:00 0.508871 Q10 T2
> 05/01/2000 @ 00:00:00 0.583196 Q10 T2
> 06/01/2000 @ 00:00:00 0.518281 Q10 T2
> when there is missing data:
> 12/09/2000 @ 00:00:00 Q151 T2
> 13/09/2000 @ 00:00:00 Q151 T2
>
> I have cobbled together some code which imports the data. The next
> step is to create an array in which each column contains a years worth
> of values. Thus, if i have 6 years of data (2001-2006 inclusive),
> there will be six columns, with 365 rows (not all years have a full
> data set and may only have say 340 days of data.
> <The question>
> In the code below
> print answer[j,1] is giving me the right answer but i can't write it
> to an array.
> any suggestions welcomed.
>
> This is what I have:
> flow=[]
> flowdate=[]
> yeardate=[]
> uniqueyear=[]
> #flow_order=
> flow_rank=[]
> icount=[]
> p=[]
>
> filename=r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"
> linesep ="\n"
>
> # read in whole file
> tempdata = open( filename).read()
> # break into lines
> tempdata = string.split( tempdata, linesep )
> # for each record, get the field values
> for i in range( len( tempdata)):
>     # split into the lines
>     fields = string.split( tempdata[i])
>     if len(fields)>5:
>         flowdate.append(fields[0])
>         list =string.split(fields[0],"/")
>         yeardate.append(list[2])
>         flow.append(float(fields[3]))
>         answer=column_stack((flowdate,flow))
>
> for rows in yeardate:
>     if rows not in uniqueyear:
>         uniqueyear.append(rows)
>
> #print answer[:,0] #date
> flow_order=empty((0,0),dtype=float)
> #for yr in enumerate(uniqueyear):
> for iyr,yr in enumerate(uniqueyear):
>     for j, val, in enumerate (answer[:,0]):
>         flowyr=string.split(val,"/")
>         if int(flowyr[2])==int(yr):
>             print answer[j,1]
> #flow_order =
I'm not sure what you mean by `write it to an array'. `answer' is
already an array. Perhaps you could show an example that has the bad
behavior you are observing, or at least an example of what you expect
to get.
Also, just a couple of pointers:
this:
> tempdata = open( filename).read()
> # break into lines
> tempdata = string.split( tempdata, linesep )
> # for each record, get the field values
> for i in range( len( tempdata)):
>     # split into the lines
>     fields = string.split( tempdata[i])
is better written (and usually written) in python like this:
for line in open(filename):
    fields = line.split()
Don't use the string module, use the methods of the strings
themselves.
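As a small illustration of the string-method style, here is a sketch that parses one of the sample lines from your post. The helper name `parse_line` is made up for this example, and it assumes that a complete record has six whitespace-separated fields while a record with missing data has only five, as in your samples.

```python
def parse_line(line):
    """Return (date, year, flow) for a complete record, or None if the flow is missing."""
    # str.split() with no argument splits on any run of whitespace.
    fields = line.split()
    # Records with a missing flow value have only five fields.
    if len(fields) <= 5:
        return None
    date = fields[0]
    # The date is dd/mm/yyyy, so the year is the last "/"-separated part.
    year = date.split("/")[2]
    return date, year, float(fields[3])


complete = parse_line("02/01/2000 @ 00:00:00 0.983896 Q10 T2")
missing = parse_line("12/09/2000 @ 00:00:00 Q151 T2")
```

Here `complete` holds the date string, the year string and the flow as a float, and `missing` is `None`.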
Don't use built-in type names as variable names, as seen on this line:
> list =string.split(fields[0],"/") # list is a built-in type
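To see why this matters, here is a minimal sketch of what goes wrong once the name is shadowed. The function name `shadowing_demo` is hypothetical, and the shadowing is kept inside a function so that it does not affect the rest of the program.

```python
def shadowing_demo():
    """Show that rebinding the name `list` hides the built-in type."""
    # From here on, `list` refers to this particular list object,
    # not to the built-in type.
    list = "02/01/2000".split("/")
    try:
        # This would normally build ['a', 'b', 'c'], but now it tries
        # to call a list object, which is not callable.
        list("abc")
    except TypeError as exc:
        return str(exc)
    return "no error"


message = shadowing_demo()
```

A name such as `date_parts` avoids the problem entirely.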
You only need to use enumerate if you actually want the index. If you
don't need the index, just iterate over the sequence. eg. use this:
> for yr in uniqueyear:
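The two loop styles side by side, as a short sketch. The year values here are invented purely for illustration.

```python
uniqueyear = ["2000", "2001", "2002"]  # made-up values for illustration

# Index not needed: iterate over the values directly.
years_seen = []
for yr in uniqueyear:
    years_seen.append(int(yr))

# Index needed (for example, to pick a column number):
# enumerate() yields (index, value) pairs.
column_of_year = {}
for iyr, yr in enumerate(uniqueyear):
    column_of_year[yr] = iyr
```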
You don't need to re-create the column-stack each time you get a value
from the file. It is very inefficient. Build it once, after the loop
has finished.
eg. change this:
> for i in range( len( tempdata)):
>     # split into the lines
>     fields = string.split( tempdata[i])
>     if len(fields)>5:
>         flowdate.append(fields[0])
>         list =string.split(fields[0],"/")
>         yeardate.append(list[2])
>         flow.append(float(fields[3]))
>         answer=column_stack((flowdate,flow))
to this:
> for i in range( len( tempdata)):
>     # split into the lines
>     fields = string.split( tempdata[i])
>     if len(fields)>5:
>         flowdate.append(fields[0])
>         list =string.split(fields[0],"/")
>         yeardate.append(list[2])
>         flow.append(float(fields[3]))
> answer=column_stack((flowdate,flow))
or, with the other suggested changes:
> for line in open(filename):
>     # split the line into fields
>     fields = line.split()
>     if len(fields) > 5:
>         flowdate.append(fields[0])
>         year = fields[0].split("/")[2]
>         yeardate.append(year)
>         flow.append(float(fields[3]))
> answer=column_stack((flowdate,flow))
If I was doing this though, I would use a dictionary (dict) where the
keys are the year and the values are lists of flows for that year.
Something like this:
[code]
filename=r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"
year2flows = {}
fin = open(filename)
for line in fin:
    # split the line into fields
    fields = line.split()
    if len(fields)>5:
        date = fields[0]
        # the year is kept as a string, e.g. "2004"
        year = date.split("/")[-1]
        flow = float(fields[3])
        year2flows.setdefault(year, []).append((date, flow))
fin.close()
# This does what you were doing.
for yr in sorted(year2flows.keys()):
    for date, flow in year2flows[yr]:
        print(flow)
# If you just wanted one year though you could do something like this
# (the keys are strings, so use "2004", not 2004):
for date, flow in year2flows["2004"]:
    print(flow)
[/code]
The above code is untested, so I make no guarantees. If you are using
python 2.5, you might look into using defaultdict (in the collections
module). It will simplify the code a bit.
from this:
year2flows = {}
# bunch of stuff...
year2flows.setdefault(year, []).append((date, flow))
to this:
from collections import defaultdict
year2flows = defaultdict(list)
# bunch of stuff...
year2flows[year].append((date, flow))
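Since you said you eventually want one column per year, here is a rough sketch of how the dictionary could be turned into rows with one column per year, using only the standard library. The function names `flows_by_year` and `as_columns` are made up for this example, it stores only the flow values rather than (date, flow) pairs, and the 2001 line in the sample data is invented so that there is a second year to show. Shorter years are padded with `None`.

```python
from collections import defaultdict
from itertools import zip_longest

# The 2000 lines are copied from the original post; the 2001 line is
# invented here only to provide a second year.
SAMPLE = """\
02/01/2000 @ 00:00:00 0.983896 Q10 T2
03/01/2000 @ 00:00:00 0.557377 Q10 T2
12/09/2000 @ 00:00:00 Q151 T2
02/01/2001 @ 00:00:00 0.508871 Q10 T2
"""


def flows_by_year(lines):
    """Group flow values by year (a string key), skipping incomplete records."""
    year2flows = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) > 5:
            year = fields[0].split("/")[-1]
            year2flows[year].append(float(fields[3]))
    return year2flows


def as_columns(year2flows, fill=None):
    """Return (years, rows): one column per year, short years padded with `fill`."""
    years = sorted(year2flows)
    columns = [year2flows[year] for year in years]
    # zip_longest transposes the columns into rows and pads the short ones.
    rows = [list(row) for row in zip_longest(*columns, fillvalue=fill)]
    return years, rows


year2flows = flows_by_year(SAMPLE.splitlines())
years, rows = as_columns(year2flows)
```

Note that this lines the values up by their position within each year, not by calendar day, so a gap in the middle of a year shifts the later values up. If the rows need to correspond to days of the year, you would have to key each value by its date instead.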
Matt