[Tutor] regex and parsing through a semi-csv file
Mina Nozar
nozarm at triumf.ca
Wed Oct 5 20:12:00 CEST 2011
Hi everyone,
I am post-processing the output of a simulation of the activities of various radionuclides produced in a reaction, at
different times.
I have already combined the information from 13 files (containing the calculated activities and errors for 13 different
times) into a single semi-CSV file with the following format:
Each isotope starts with a line giving the element's name, its isotope (mass) number, and its atomic number, followed by
13 lines of the form:
activity time 1, activity 1, error in activity 1
...
...
Here is what the input file looks like for two isotopes:
AC,225,89
3.6000e+03,1.6625e-07,2.4555e-09
8.6400e+04,0.0000e+00,-1.1455e-23
2.5920e+05,3.1615e-07,4.6695e-09
8.6400e+05,3.6457e-05,5.3847e-07
1.8000e+06,5.5137e-04,8.1437e-06
1.8036e+06,5.5047e-04,8.1304e-06
1.8864e+06,5.3279e-04,7.8693e-06
2.6640e+06,6.9672e-04,1.0291e-05
4.3920e+06,3.2737e-03,4.8353e-05
1.0440e+07,2.3830e-02,3.5197e-04
2.7720e+07,9.2184e-02,1.3616e-03
8.8200e+07,9.2184e-02,1.3616e-03
1.7460e+08,6.7440e-01,9.9609e-03
AG,111,47
3.6000e+03,1.7936e+07,3.1191e+05
8.6400e+04,7.9538e+08,1.3800e+07
2.5920e+05,2.2201e+09,3.8519e+07
8.6400e+05,5.5546e+09,9.6372e+07
1.8000e+06,7.8612e+09,1.3639e+08
1.8036e+06,7.8484e+09,1.3617e+08
1.8864e+06,7.1836e+09,1.2464e+08
2.6640e+06,3.1095e+09,5.3950e+07
4.3920e+06,4.8368e+08,8.3918e+06
1.0440e+07,7.1793e+05,1.2456e+04
2.7720e+07,5.9531e-03,1.0329e-04
8.8200e+07,5.9531e-03,1.0329e-04
1.7460e+08,0.0000e+00,0.0000e+00
Now I would like to parse this file, fill three lists, 1) activity_time, 2) activity, and 3) error, and plot the
activities as a function of time using matplotlib. My question is specifically how to parse the lines containing the
data (activity time, activity, error) for a given isotope, stopping before the next isotope's header line. The test I am
trying in the following snippet is not working:
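A minimal sketch of the stop condition for a single isotope, using float() in place of a regular expression: once the requested header has been seen, each line is split on commas and converted to three floats, and the first line that fails to convert (normally the next header, or a blank line) ends the loop. The function name read_isotope is hypothetical, and the mass number is passed as a string so that it can be compared with the header field directly:

```python
def read_isotope(lines, name, mass):
    """Return (times, activities, errors) for one isotope as lists of floats.

    Scanning starts after the header whose first two fields equal `name` and
    `mass` (both strings, e.g. "AC" and "225") and stops at the first line
    that is not three numbers, which is normally the next isotope's header.
    """
    times, activities, errors = [], [], []
    found = False
    for line in lines:
        fields = [x.strip() for x in line.strip().split(",")]
        if not found:
            # Still looking for the requested header line.
            found = fields[:2] == [name, mass]
            continue
        try:
            t, a, e = (float(x) for x in fields)
        except ValueError:
            break  # next header (or a blank/malformed line): this block is done
        times.append(t)
        activities.append(a)
        errors.append(e)
    return times, activities, errors
```

With the real file this could be called as "with open(args.fname) as f: activity_time, activity, activity_err = read_isotope(f, 'AC', '225')". The loop stops on content rather than on a line count, so it works however many activity times a run produces.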
import re

found_isotope = False
activity_time = []
activity = []
activity_err = []
f = open(args.fname, 'r')
for line in f.readlines():
    line = line.strip()
    # header line for the requested isotope, e.g. "AC,225,89"
    if isotope_name in line and isotope_A in line:
        print isotope_name, isotope_A
        found_isotope = True
        continue
    if found_isotope:
        print line
        found = re.search(r'(\d+\.[eE][\+\-]\d+),(\d+\.[eE][\+\-]\d+),(\d+\.[eE][\+\-]\d+)', line, re.I)
        print found
        if found:
            print found.group(1), found.group(2), found.group(3)
            activity_time.append(found.group(1))
            activity.append(found.group(2))
            activity_err.append(found.group(3))
            continue
        else:
            # a non-data line should mean we have reached the next isotope's header
            break
f.close()
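A side issue with the header test in this snippet: "isotope_name in line and isotope_A in line" checks for substrings, so it can also fire on the wrong header (for example, a mass number of 11 is found inside "AG,111,47"). A small illustration in Python 3, with a hypothetical helper matches_header that compares whole comma-separated fields instead:

```python
def matches_header(line, name, mass):
    """Return True only if `line` is the header for exactly this isotope."""
    fields = [x.strip() for x in line.strip().split(",")]
    return len(fields) == 3 and fields[0] == name and fields[1] == mass


header = "AG,111,47"
print("AG" in header and "11" in header)    # True: "11" is a substring of "111"
print(matches_header(header, "AG", "11"))   # False
print(matches_header(header, "AG", "111"))  # True
```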
If I run the code with isotope_name = 'AC' and isotope_A = '225', I get the following:
AC 225
3.6000e+03,1.6625e-07,2.4555e-09
None
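The None comes from the regular expression: '\d+\.[eE]' requires the exponent marker immediately after the decimal point, so the digits after the point (the "6000" in "3.6000e+03") cannot match. re.search therefore returns None, and the else: break branch ends the loop after the first data line. The numbers also need an optional sign, because the second AC line has a negative error (-1.1455e-23). A sketch of a corrected pattern, assuming Python 3.4 or later for fullmatch (NUM and ROW are illustrative names):

```python
import re

# One number in e-notation: optional sign, digits, point, optional digits,
# then an exponent such as e+03 or e-23.
NUM = r'([+-]?\d+\.\d*[eE][+-]\d+)'
# A data line is exactly three such numbers separated by commas.
ROW = re.compile(NUM + ',' + NUM + ',' + NUM)

m = ROW.fullmatch("8.6400e+04,0.0000e+00,-1.1455e-23")
print(m.groups())  # ('8.6400e+04', '0.0000e+00', '-1.1455e-23')
print(ROW.fullmatch("AG,111,47"))  # None: a header line, so stop here
```

The groups are still strings; converting them with float() before appending them to the lists makes them ready for plotting.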
Note that the length of the lists will vary with the number of activity times in a given simulation run, so I don't
want to hard-code 13 as the number of lines to read after the line containing isotope_name.
If there is a more elegant way of doing this, please let me know as well. I am new to Python...
Thank you very much,
Mina