[Tutor] Python 3.6 Extract Floating Point Data from a Text File

Sun Apr 30 15:56:04 EDT 2017

On 04/30/2017 02:02 PM, Steven D'Aprano wrote:
> On Sun, Apr 30, 2017 at 06:09:12AM -0400, Stephen P. Molnar wrote:
> [...]
>> I have managed to extract input data from another calculation (not
>> a Python program) into the following text file.
>>
>> LOEWDIN ATOMIC CHARGES
>>   ----------------------
>>      0 C :   -0.780631
>>      1 H :    0.114577
>>      2 Br:    0.309802
>>      3 Cl:    0.357316
>>      4 F :   -0.001065
>>
>> What I need to do is extract the floating point numbers into a Python file
>
> I don't quite understand your question, but I'll take a guess. I'm going
> to assume you have a TEXT file containing literally this text:
>
> # ---- cut here ----
>
> LOEWDIN ATOMIC CHARGES
> ----------------------
>     0 C :   -0.780631
>     1 H :    0.114577
>     2 Br:    0.309802
>     3 Cl:    0.357316
>     4 F :   -0.001065
>
> # ---- cut here ----
>
>
> and you want to extract the atomic symbols (C, H, Br, Cl, F) and
> charges as floats. For the sake of the exercise, I'll extract them into
> a dictionary {'C': -0.780631, 'H': 0.114577, ... } then print them.
>
> Let me start by preparing the text file. Of course I could just use a
> text editor, but let's do it with Python:
>
>
> data = """LOEWDIN ATOMIC CHARGES
> ----------------------
>     0 C :   -0.780631
>     1 H :    0.114577
>     2 Br:    0.309802
>     3 Cl:    0.357316
>     4 F :   -0.001065
> """
>
> filename = 'datafile.txt'
> with open(filename, 'w') as f:
>      f.write(data)
>
>
> (Of course, in real life, it is silly to put your text into Python just
> to write it out to a file so you can read it back in. But as a
> programming exercise, it's fine.)
>
> Now let's re-read the file, processing each line, and extract the data
> we want.
>
> atomic_charges = {}
> filename = 'datafile.txt'
> with open(filename, 'r') as f:
>      # Skip lines until we reach a line made of nothing but ---
>      for line in f:
>          line = line.strip()  # ignore leading and trailing whitespace
>          if set(line) == set('-'):
>              break
>      # Continue reading lines from where we last got to.
>      for line in f:
>          line = line.strip()
>          if line == '':
>              # Skip blank lines.
>              continue
>          # We expect lines to look like:
>          #   1 C :   0.12345
>          # where there may or may not be a space between the
>          # letter and the colon. That makes it tricky to process,
>          # so let's force there to always be at least one space.
>          line = line.replace(':', ' :')
>          # Split on spaces.
>          try:
>              index, symbol, colon, number = line.split()
>          except ValueError as err:
>              print("failed to process line:", line)
>              print(err)
>              continue  # skip to the next line
>          assert colon == ':', 'expected a colon but found something else'
>          try:
>              number = float(number)
>          except ValueError:
>              # We expected a numeric string like -0.234 or 0.123, but got
>              # something else. We could skip this line, or replace it
>              # with an out-of-bounds value. I'm going to use an IEEE-754
>              # "Not A Number" value as the out-of-bounds value.
>              number = float("NaN")
>          atomic_charges[symbol] = number
>
> # Finished! Let's see what we have:
> for sym in sorted(atomic_charges):
>      print(sym, atomic_charges[sym])
>
>
>
>
>
> There may be more efficient ways to process the lines, for example by
> using a regular expression. But it's late, and I'm too tired to go
> messing about with regular expressions now. Perhaps somebody else will
> suggest one.
>
>
>
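
On the regular expression idea, here is a minimal sketch of what one might
look like. The pattern and the names CHARGE_LINE and parse_charges are
assumptions made for illustration; they are not taken from Orca or from any
library. The sketch assumes every data line has the form "index symbol :
number", with or without a space before the colon, and it returns the rows
in the order they appear.

```python
import re

# Hypothetical pattern: atom index, one- or two-letter symbol,
# optional spaces, a colon, then a signed decimal number.
CHARGE_LINE = re.compile(
    r"^\s*(\d+)\s+([A-Za-z]{1,2})\s*:\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*$"
)

def parse_charges(text):
    """Return a list of (index, symbol, charge) tuples in file order."""
    rows = []
    for line in text.splitlines():
        match = CHARGE_LINE.match(line)
        if match:  # the title, the dashes and blank lines do not match
            index, symbol, charge = match.groups()
            rows.append((int(index), symbol, float(charge)))
    return rows

sample = """LOEWDIN ATOMIC CHARGES
----------------------
    0 C :   -0.780631
    1 H :    0.114577
    2 Br:    0.309802
    3 Cl:    0.357316
    4 F :   -0.001065
"""

for row in parse_charges(sample):
    print(row)
```

Lines that do not match the pattern are silently ignored here, so a
malformed data line would go unnoticed; the try/except version above
reports such lines instead.
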
Steve

Thanks for your reply to my, unfortunately imprecisely worded, question.

Here are the results of applying your code to my data:

Br 0.309802
C -0.780631
Cl 0.357316
F -0.001065
H 0.114577

I should have mentioned that I already have the file; it is part of the
output from the Orca Quantum Chemistry Program.

As soon as I understand the code I'm going to have to get rid of the
atomic symbols and get the charges in the same order as they are in the 
original LOEWDIN ATOMIC CHARGES file.  The Molecular Transform suite of 
programs depends on the distances between pairs of bonded atoms, hence 
the order is important.
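
For reference, a minimal sketch of that change. The alphabetical order in
the results above comes from the call to sorted(), and a dictionary keyed
by symbol keeps only one entry per element, so a molecule with two carbon
atoms would lose a charge. Appending to a list avoids both problems. The
function name read_charges is hypothetical, and the sketch assumes the
charges end at the first blank line after the row of dashes.

```python
import os
import tempfile

def read_charges(filename):
    """Return the charges as a list of floats, in file order."""
    charges = []
    with open(filename, 'r') as f:
        # Skip everything up to and including the line of dashes.
        for line in f:
            stripped = line.strip()
            if stripped and set(stripped) == {'-'}:
                break
        # Continue reading from where the first loop stopped.
        for line in f:
            if not line.strip():
                break  # assumption: a blank line ends the table
            # The charge is whatever follows the colon.
            _, sep, value = line.partition(':')
            if not sep:
                continue  # no colon, so not a data line
            charges.append(float(value))
    return charges

sample = """LOEWDIN ATOMIC CHARGES
----------------------
    0 C :   -0.780631
    1 H :    0.114577
    2 Br:    0.309802
    3 Cl:    0.357316
    4 F :   -0.001065
"""

# Write the sample to a temporary file so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'datafile.txt')
    with open(path, 'w') as f:
        f.write(sample)
    charges = read_charges(path)

print(charges)
```

With the sample data this prints
[-0.780631, 0.114577, 0.309802, 0.357316, -0.001065], which is the order
of the atoms in the file.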

Again, many thanks for your help.

Regards,

	Steve

-- 
Stephen P. Molnar, Ph.D.		Life is a fuzzy set
www.molecular-modeling.net		Stochastic and multivariate
(614)312-7528 (c)
Skype: smolnar1