[Tutor] regex and parsing through a semi-csv file

Wed Oct 5 20:12:00 CEST 2011

Hi everyone,

I am post-processing the output of a simulation that gives the activities of various radionuclides produced in a reaction at 
different times.

I have already combined the information from 13 files (containing calculated activities and errors for 13 different 
times).  The format of this combined, semi-csv file is the following:

A line with an element's name, its mass number, and its atomic number, followed by 13 lines, each containing
activation time 1, activity 1, error in activity 1
...
...

So here is what the input file looks like for two isotopes:

AC,225,89
3.6000e+03,1.6625e-07,2.4555e-09
8.6400e+04,0.0000e+00,-1.1455e-23
2.5920e+05,3.1615e-07,4.6695e-09
8.6400e+05,3.6457e-05,5.3847e-07
1.8000e+06,5.5137e-04,8.1437e-06
1.8036e+06,5.5047e-04,8.1304e-06
1.8864e+06,5.3279e-04,7.8693e-06
2.6640e+06,6.9672e-04,1.0291e-05
4.3920e+06,3.2737e-03,4.8353e-05
1.0440e+07,2.3830e-02,3.5197e-04
2.7720e+07,9.2184e-02,1.3616e-03
8.8200e+07,9.2184e-02,1.3616e-03
1.7460e+08,6.7440e-01,9.9609e-03
AG,111,47
3.6000e+03,1.7936e+07,3.1191e+05
8.6400e+04,7.9538e+08,1.3800e+07
2.5920e+05,2.2201e+09,3.8519e+07
8.6400e+05,5.5546e+09,9.6372e+07
1.8000e+06,7.8612e+09,1.3639e+08
1.8036e+06,7.8484e+09,1.3617e+08
1.8864e+06,7.1836e+09,1.2464e+08
2.6640e+06,3.1095e+09,5.3950e+07
4.3920e+06,4.8368e+08,8.3918e+06
1.0440e+07,7.1793e+05,1.2456e+04
2.7720e+07,5.9531e-03,1.0329e-04
8.8200e+07,5.9531e-03,1.0329e-04
1.7460e+08,0.0000e+00,0.0000e+00

Now, I would like to parse this file and fill 3 lists: 1) activity_time, 2) activity, 3) error, and plot the 
activities as a function of time using matplotlib.  My question is specifically how to parse the lines 
containing the data (activity time, activity, error) for a given isotope, stopping before reaching the next isotope's 
info.  The test I am trying in the following snippet is not working.

found_isotope = False
activity_time = []
activity = []
activity_err = []

f = open(args.fname, 'r')
for line in f.readlines():
    line = line.strip()
    if isotope_name in line and isotope_A in line:
        print isotope_name, isotope_A
        found_isotope = True
        continue

    if found_isotope:
        print line
        found = re.search(r'(\d+\.[eE][\+\-]\d+),(\d+\.[eE][\+\-]\d+),(\d+\.[eE][\+\-]\d+)', line, re.I)
        print found
        if found:
            print found.group(1), found.group(2), found.group(3)
            activity_time.append(found.group(1))
            activity.append(found.group(2))
            activity_err.append(found.group(3))
            continue
        else:
            break
f.close()

If I run the code for isotope_name: AC and isotope_A: 225, I get the following:
AC 225
3.6000e+03,1.6625e-07,2.4555e-09
None
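
A note on the None above: the pattern `\d+\.[eE]` expects the exponent marker directly after the decimal point, so it has no room for the digits in `3.6000e+03`. It also has no room for a leading minus sign, which appears in the error column of the sample data. The sketch below compares the original single-number pattern with an adjusted one. The names BROKEN, NUMBER and ROW are made up for this illustration, and the adjusted pattern assumes every value is written as digits, a decimal point, digits, and a signed exponent.

```python
import re

# Single-number pattern as used in the snippet: nothing is allowed
# between the decimal point and the 'e'.
BROKEN = r'(\d+\.[eE][\+\-]\d+)'

# Adjusted pattern (an assumption about the number format): optional sign,
# digits after the decimal point, then a signed exponent.
NUMBER = r'([+-]?\d+\.\d+[eE][+-]\d+)'

# A data line is three such numbers separated by commas, and nothing else.
ROW = re.compile('^' + ','.join([NUMBER] * 3) + '$')

line = '3.6000e+03,1.6625e-07,2.4555e-09'
broken_match = re.search(BROKEN, line)    # no match, as in the output above
fixed_match = ROW.match(line)             # matches all three fields
header_match = ROW.match('AG,111,47')     # isotope header: no match
negative_match = ROW.match('8.6400e+04,0.0000e+00,-1.1455e-23')
```

Because an isotope header such as `AG,111,47` does not match ROW, the existing `else: break` branch would still stop the loop at the next isotope. The re.I flag is not needed here since the pattern already lists both `e` and `E`.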

Note that the size of the lists will change depending on the number of activities for a given run of the simulation, so I 
don't want to hard-code '13' as the number of lines to read in after the line containing isotope_name, etc.
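
One way to avoid hard-coding the count, sketched here without a regex: split each line on commas, convert the fields with float(), and stop at the first line after the header that does not convert. The function name read_isotope is hypothetical, and the sketch assumes that isotope_A is passed as a string and that every data line holds exactly three numeric fields.

```python
def read_isotope(lines, isotope_name, isotope_A):
    """Return (activity_time, activity, activity_err) for one isotope.

    `lines` is any iterable of text lines, e.g. an open file.
    """
    activity_time, activity, activity_err = [], [], []
    found_isotope = False
    for line in lines:
        fields = line.strip().split(',')
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        if not found_isotope:
            # Headers look like NAME,A,Z; compare whole fields, not substrings.
            found_isotope = (fields[0] == isotope_name and fields[1] == isotope_A)
            continue
        try:
            t, a, err = [float(x) for x in fields]
        except ValueError:
            break  # a non-numeric line: the next isotope's header
        activity_time.append(t)
        activity.append(a)
        activity_err.append(err)
    return activity_time, activity, activity_err


# Shortened sample in the same layout as the input file shown earlier.
sample = """AC,225,89
3.6000e+03,1.6625e-07,2.4555e-09
8.6400e+04,0.0000e+00,-1.1455e-23
AG,111,47
3.6000e+03,1.7936e+07,3.1191e+05
""".splitlines()

times, acts, errs = read_isotope(sample, 'AC', '225')
```

With the real file, the same call could take the open file object in place of `sample`. The lists hold floats rather than strings, which is what a plotting call such as matplotlib's `errorbar(times, acts, yerr=errs)` would need.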

If there is a more graceful way of doing this, please let me know as well.  I am new to Python...
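
As one possible alternative layout, the whole file could be read in a single pass into a dictionary keyed by (name, mass number), so that any isotope can be looked up afterwards. This is only a sketch: read_all_isotopes and the dictionary keys 'time', 'activity' and 'error' are invented names, and any three-field line whose fields are not all numeric is assumed to be an isotope header.

```python
def read_all_isotopes(lines):
    """Map (name, A) to {'time': [...], 'activity': [...], 'error': [...]}."""
    isotopes = {}
    current = None
    for line in lines:
        fields = line.strip().split(',')
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        try:
            t, a, err = [float(x) for x in fields]
        except ValueError:
            # Not three numbers: treat the line as a NAME,A,Z header
            # and start a new record for that isotope.
            current = {'time': [], 'activity': [], 'error': []}
            isotopes[(fields[0], fields[1])] = current
            continue
        if current is not None:
            current['time'].append(t)
            current['activity'].append(a)
            current['error'].append(err)
    return isotopes


# Shortened sample in the same layout as the input file shown earlier.
sample = """AC,225,89
3.6000e+03,1.6625e-07,2.4555e-09
8.6400e+04,0.0000e+00,-1.1455e-23
AG,111,47
3.6000e+03,1.7936e+07,3.1191e+05
""".splitlines()

data = read_all_isotopes(sample)
ac225 = data[('AC', '225')]
```

Each record grows to whatever number of data lines follows its header, so the count of 13 never appears in the code.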

Thank you very much,
Mina