Regular expression help

Nick Dumas drakonik at gmail.com
Fri Jul 18 10:35:12 EDT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I think you're over-complicating this. I'm assuming that you're going to
do a line graph of some sort, and that each new line of the file
contains a new set of data.

The problem you mentioned, with your regex returning a match object
rather than a string, is that p.match() returns a match object (or None
when nothing matches), never a string. re.findall() returns the matched
text itself and is what you want. That being said, here is working code
to mine data from your file.

[code]
import re

# One line of the data file, split across two literals for readability.
line = ('c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 '
        'emad=-3.597647 3pv=0')

# The capture group holds the number that follows 'etot='.
energypat = r'\betot=(-?\d*?[.]\d*)'

#Note: To change the data grabbed from the line, you can change the
#'etot' to 'afrac' or 'emad' or anything that doesn't contain a regex
#special character.

energypat = re.compile(energypat)

energypat.findall(line)  # returns the LIST ['-11.020107']

[/code]

re.findall() returns a list of strings, so take the first element and
convert it with float() (not int(), since the values have decimal
points). After that, you can data_points.append() to your heart's
content. Good luck with your work.
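
In case it helps, here is a rough sketch of how the findall() result
could be turned into floats and collected. It is only one way to do it.
The names ENERGY_PAT and sample are made up for illustration, and I have
assumed get_data_points() takes any iterable of lines rather than a
filename (an open file object works, since iterating over it yields
lines).

```python
import re

# Same pattern as in the reply; the capture group holds the number after 'etot='.
# ENERGY_PAT is a hypothetical name, not something from the original script.
ENERGY_PAT = re.compile(r'\betot=(-?\d*?[.]\d*)')

def get_total_energy(line):
    """Return etot from one line as a float, or None if it is absent."""
    matches = ENERGY_PAT.findall(line)   # list of captured strings
    if not matches:
        return None
    return float(matches[0])

def get_data_points(lines):
    """Collect the etot value from every line that has one."""
    data_points = []
    for line in lines:
        energy = get_total_energy(line)
        if energy is not None:
            data_points.append(energy)
    return data_points

# Made-up sample input: one real-looking line and one line without etot.
sample = [
    'c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0',
    'a line with no energy in it',
]
print(get_data_points(sample))  # prints [-11.020107]
```

One caveat: the pattern insists on a decimal point, so a value written
as a bare integer (for example etot=0) would be skipped rather than
matched.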

nclbndk759 at googlemail.com wrote:
> Hello,
> 
> I am new to Python, with a background in scientific computing. I'm
> trying to write a script that will take a file with lines like
> 
> c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647
> 3pv=0
> 
> extract the values of afrac and etot and plot them. I'm really
> struggling with getting the values of afrac and etot. So far I have
> come up with (small snippet of script just to get the energy, etot):
> 
> def get_data_points(filename):
>     file = open(filename,'r')
>     data_points = []
>     while 1:
>         line = file.readline()
>         if not line: break
>         energy = get_total_energy(line)
>         data_points.append(energy)
>     return data_points
> 
> def get_total_energy(line):
>     rawstr = r"""(?P<key>.*?)=(?P<value>.*?)\s"""
>     p = re.compile(rawstr)
>     return p.match(line,5)
> 
> What is being stored in energy is '<_sre.SRE_Match object at
> 0x2a955e4ed0>', not '-11.020107'. Why? I've been struggling with
> regular expressions for two days now, with no luck. Could someone
> please put me out of my misery and give me a clue as to what's going
> on? Apologies if it's blindingly obvious or if this question has been
> asked and answered before.
> 
> Thanks,
> 
> Nicole
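
As for why a match object was being stored: the quoted
get_total_energy() returns whatever p.match() gives back, and the text
has to be pulled out of that object with .group(). The sketch below
reuses the pattern from the quoted script. The variable names first and
found, and the \S+ pattern passed to re.search(), are my own choices for
illustration, not part of the original code.

```python
import re

line = 'c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0'

# Pattern copied from the quoted script.
p = re.compile(r"(?P<key>.*?)=(?P<value>.*?)\s")

# match() returns a match object. The 5 is a start position in the string,
# so matching begins in the middle of 'afrac' rather than at 'etot'.
first = p.match(line, 5)
print(first.group('key'), first.group('value'))  # prints: ac .7

# search() scans forward to the first place the pattern fits;
# group(1) is the text captured by the parentheses.
found = re.search(r'\betot=(\S+)', line)
print(found.group(1))  # prints: -11.020107
```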
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkiAqiAACgkQLMI5fndAv9h7HgCfU6a7v1nE5iLYcUPbXhC6sfU7
mpkAn1Q/DyOI4Zo7QJhF9zqfqCq6boXv
=L2VZ
-----END PGP SIGNATURE-----


