[Tutor] regex and parsing through a semi-csv file
Mina Nozar
nozarm at triumf.ca
Wed Oct 19 19:53:25 CEST 2011
Hello Wayne,
Thank you for your help and sorry for the delay in the response. I was caught up with other simulation jobs and didn't
get around to testing what you suggested until yesterday.
On 11-10-05 01:24 PM, Wayne Werner wrote:
> On Wed, Oct 5, 2011 at 1:12 PM, Mina Nozar <nozarm at triumf.ca> wrote:
> I just glanced through your email, but my initial thought would be to just use regex to collect the entire segment
> that you're looking for, and then string methods to split it up:
>
> pat = re.compile('({name},{number}.*?)[A-Z]{{1,2}}'.format(name='AC', number='225'), re.DOTALL)
>
> raw_data = re.search(pat, f.read())
> if raw_data is None:
>     # we didn't find the isotope, so take appropriate actions, quit or tell the user
> else:
>     raw_data = raw_data.string.strip().split('\n')
>
> Then it depends on how you want to process your data, but you could easily use list comprehensions/generator expressions.
>
> The most terse syntax I know of:
>
> data = [[float(x) for x in d.split(',')] for d in raw_data if d[0].isdigit()]
> <snip>
> data will then contain a list of 3-element lists of floating point values.
> If you want to "rotate" the list, you can do data = list(zip(*data)). To illustrate:
> >>> d = [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
> >>> d = list(zip(*d))
> >>> d
> [('a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b'), ('c', 'c', 'c', 'c')]
> HTH,
> Wayne
I tried what you suggested above, but it doesn't work. The search doesn't start at the right place (the information
following the isotope chosen by the user) and doesn't stop at the end of the information for that isotope. So data
seems to get filled with the data for all isotopes in the file. I made a small test file to verify this.
Can you please explain what the statement assigned to pat actually does?
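As a rough illustration of the question, the sketch below prints what the pattern expands to after format() and compares the captured group with the .string attribute of the match. The sample string and the variable names are made up for this sketch; the isotope is hard-coded to AG-111.

```python
import re

# Same construction as in the script, with the isotope hard-coded for illustration.
# The doubled braces {{1,2}} survive format() as the regex quantifier {1,2}.
pattern_text = '({name},{number}.*?)[A-Z]{{1,2}}'.format(name='AG', number='111')
print(pattern_text)  # (AG,111.*?)[A-Z]{1,2}

pat = re.compile(pattern_text, re.DOTALL)

# Hypothetical miniature input in the same layout as test.csv.
sample = ("AC,225,89\n1.0,2.0,3.0\n"
          "AG,111,47\n4.0,5.0,6.0\n"
          "AG,112,47\n7.0,8.0,9.0\n")

match = pat.search(sample)

captured = match.group(1)    # text matched by the parenthesised group only
whole_input = match.string   # the complete string that was passed to search()

print(repr(captured))
print(whole_input == sample)
```

Read this way, the pattern means: the literal text "AG,111", then as few characters as possible (newlines included, because of re.DOTALL), up to the next one or two uppercase letters. The .string attribute of a match object is the whole searched string, not the matched part.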
At the end, I should be getting three lists, one containing the times (column 1), one containing the activities (column
2), and one containing the error in activities (column 3) for a specific isotope requested.
Thank you and best wishes,
Mina
Here is what I tried:
#! /usr/bin/env python
import re
import argparse
parser = argparse.ArgumentParser(description='Plot activities for a given isotope')
parser.add_argument('-f', action="store", dest="fname", help='The csv file name containing activities')
parser.add_argument('-i', action="store", dest="isotope", help='Isotope to plot activities for, eg. U-238')
args=parser.parse_args()
print 'file name:', args.fname
print 'isotope:', args.isotope
isotope_name,isotope_A = args.isotope.split('-')
print isotope_name, isotope_A
f = open(args.fname, 'r')
pat = re.compile('({name},{number}.*?)[A-Z]{{1,2}}'.format(name=isotope_name, number=isotope_A), re.DOTALL)
result = re.search(pat, f.read())
print result.string
f.close()
if result is None:
    exit(args.fname+' does not contain info on '+args.isotope)
else:
    result = result.string.strip().split('\n')
    data = [[float(x) for x in d.split(',')] for d in result if d[0].isdigit()]
    data = list(zip(*data))
    for i in range(0, len(data)):
        print data[i]
Input file: test.csv
# element, z, isotope, activity_time, activity, error
AC,225,89
3.6000e+03,1.6625e-07,2.4555e-09
8.6400e+04,0.0000e+00,-1.1455e-23
2.5920e+05,3.1615e-07,4.6695e-09
8.6400e+05,3.6457e-05,5.3847e-07
1.8000e+06,5.5137e-04,8.1437e-06
AG,111,47
3.6000e+03,1.7936e+07,3.1191e+05
8.6400e+04,7.9538e+08,1.3800e+07
2.5920e+05,2.2201e+09,3.8519e+07
8.6400e+05,5.5546e+09,9.6372e+07
1.8000e+06,7.8612e+09,1.3639e+08
AG,112,47
3.6000e+03,2.7591e+07,4.9498e+05
8.6400e+04,3.8637e+09,6.9315e+07
2.5920e+05,7.3492e+09,1.3184e+08
8.6400e+05,8.2493e+09,1.4799e+08
1.8000e+06,8.2528e+09,1.4806e+08
And here is what I get when I run the code with: python ActivityPlots.py -f test.csv -i AG-111
file name: test.csv
isotope: AG-111
AG 111
# element, z, isotope, activity_time, activity, error
AC,225,89
3.6000e+03,1.6625e-07,2.4555e-09
8.6400e+04,0.0000e+00,-1.1455e-23
2.5920e+05,3.1615e-07,4.6695e-09
8.6400e+05,3.6457e-05,5.3847e-07
1.8000e+06,5.5137e-04,8.1437e-06
AG,111,47
3.6000e+03,1.7936e+07,3.1191e+05
8.6400e+04,7.9538e+08,1.3800e+07
2.5920e+05,2.2201e+09,3.8519e+07
8.6400e+05,5.5546e+09,9.6372e+07
1.8000e+06,7.8612e+09,1.3639e+08
AG,112,47
3.6000e+03,2.7591e+07,4.9498e+05
8.6400e+04,3.8637e+09,6.9315e+07
2.5920e+05,7.3492e+09,1.3184e+08
8.6400e+05,8.2493e+09,1.4799e+08
1.8000e+06,8.2528e+09,1.4806e+08
(3600.0, 86400.0, 259200.0, 864000.0, 1800000.0, 3600.0, 86400.0, 259200.0, 864000.0, 1800000.0, 3600.0, 86400.0,
259200.0, 864000.0, 1800000.0)
(1.6625e-07, 0.0, 3.1615e-07, 3.6457e-05, 0.00055137, 17936000.0, 795380000.0, 2220100000.0, 5554600000.0, 7861200000.0,
27591000.0, 3863700000.0, 7349200000.0, 8249300000.0, 8252800000.0)
(2.4555e-09, -1.1455e-23, 4.6695e-09, 5.3847e-07, 8.1437e-06, 311910.0, 13800000.0, 38519000.0, 96372000.0, 136390000.0,
494980.0, 69315000.0, 131840000.0, 147990000.0, 148060000.0)
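The output above is consistent with result.string being the entire file contents rather than the matched block, so every numeric line ends up in data. A minimal sketch of one way to get the three lists for a single isotope is shown below. It is not the approach from the quoted reply: the function name read_isotope, the anchored pattern, and the shortened sample text are all assumptions made for illustration, and it assumes each isotope block starts with a header line beginning with an uppercase letter.

```python
import re


def read_isotope(text, name, mass_number):
    """Return (times, activities, errors) for one isotope, or None if absent."""
    # Match the header line "NAME,NUMBER,..." at the start of a line, then
    # capture everything up to the next line that starts with an uppercase
    # letter (the next isotope header) or the end of the input.
    pat = re.compile(
        r'^{0},{1},[^\n]*\n(.*?)(?=^[A-Z]|\Z)'.format(
            re.escape(name), re.escape(mass_number)),
        re.DOTALL | re.MULTILINE)
    match = pat.search(text)
    if match is None:
        return None
    # Use the captured group, not match.string, which is the whole input.
    rows = [[float(x) for x in line.split(',')]
            for line in match.group(1).splitlines() if line.strip()]
    # Transpose the rows into columns: times, activities, errors.
    return tuple(zip(*rows))


# Shortened, hypothetical excerpt in the layout of test.csv.
text = (
    "# element, z, isotope, activity_time, activity, error\n"
    "AC,225,89\n"
    "3.6000e+03,1.6625e-07,2.4555e-09\n"
    "AG,111,47\n"
    "3.6000e+03,1.7936e+07,3.1191e+05\n"
    "8.6400e+04,7.9538e+08,1.3800e+07\n"
    "AG,112,47\n"
    "3.6000e+03,2.7591e+07,4.9498e+05\n"
)

times, activities, errors = read_isotope(text, 'AG', '111')
print(times)
print(activities)
print(errors)
```

Requiring a comma after the mass number keeps a request for AG-11 from matching the AG,111 header.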