[Tutor] Python 3.6 Extract Floating Point Data from a Text File

Sun Apr 30 14:02:40 EDT 2017

On Sun, Apr 30, 2017 at 06:09:12AM -0400, Stephen P. Molnar wrote:
[...]
> I have managed to extract input data from another calculation (not 
> a Python program) into the following text file.
> 
> LOEWDIN ATOMIC CHARGES
>  ----------------------
>     0 C :   -0.780631
>     1 H :    0.114577
>     2 Br:    0.309802
>     3 Cl:    0.357316
>     4 F :   -0.001065
> 
> What I need to do is extract the floating point numbers into a Python file

I don't quite understand your question, but I'll take a guess. I'm going 
to assume you have a TEXT file containing literally this text:

# ---- cut here ----

LOEWDIN ATOMIC CHARGES
----------------------
   0 C :   -0.780631
   1 H :    0.114577
   2 Br:    0.309802
   3 Cl:    0.357316
   4 F :   -0.001065

# ---- cut here ----

and you want to extract the atomic symbols (C, H, Br, Cl, F) and 
charges as floats. For the sake of the exercise, I'll extract them into 
a dictionary {'C': -0.780631, 'H': 0.114577, ... } then print them.

Let me start by preparing the text file. Of course I could just use a 
text editor, but let's do it with Python:

data = """LOEWDIN ATOMIC CHARGES
----------------------
   0 C :   -0.780631
   1 H :    0.114577
   2 Br:    0.309802
   3 Cl:    0.357316
   4 F :   -0.001065
"""

filename = 'datafile.txt'
with open(filename, 'w') as f:
    f.write(data)

(Of course, in real life, it is silly to put your text into Python just 
to write it out to a file so you can read it back in. But as a 
programming exercise, it's fine.)
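As an aside, if all you want is to try out the parsing without touching the disk, the standard library's `io.StringIO` wraps a string in a file-like object. Iterating over it yields one line at a time, just as iterating over an open text file does. Here is a minimal sketch. The variable names are only for illustration:

```python
import io

sample = """LOEWDIN ATOMIC CHARGES
----------------------
   0 C :   -0.780631
   1 H :    0.114577
"""

# StringIO behaves like a text file opened for reading: iterating it
# yields one line at a time, each still ending in '\n'.
f = io.StringIO(sample)
lines = [line.rstrip('\n') for line in f]
print(lines[0])
print(len(lines))
```

Any loop written as `for line in f:` should work the same way on such an object.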

Now let's re-read the file, processing each line, and extract the data 
we want.

atomic_charges = {}
filename = 'datafile.txt'
with open(filename, 'r') as f:
    # Skip lines until we reach a line made of nothing but ---
    for line in f:
        line = line.strip()  # ignore leading and trailing whitespace
        if set(line) == set('-'):
            break
    # Continue reading lines from where we last got to.
    for line in f:
        line = line.strip()
        if line == '': 
            # Skip blank lines.
            continue
        # We expect lines to look like:
        #   1 C :   0.12345
        # where there may or may not be a space between the 
        # letter and the colon. That makes it tricky to process,
        # so let's force there to always be at least one space.
        line = line.replace(':', ' :')
        # Split on spaces.
        try:
            index, symbol, colon, charge = line.split()
        except ValueError as err:
            print("failed to process line:", line)
            print(err)
            continue  # skip to the next line
        assert colon == ':', 'expected a colon but found something else'
        try:
            charge = float(charge)
        except ValueError:
            # We expected a numeric string like -0.234 or 0.123, but got
            # something else. We could skip this line, or replace it
            # with an out-of-bounds value. I'm going to use an IEEE-754
            # "Not A Number" value as the out-of-bounds value.
            charge = float("NaN")
        atomic_charges[symbol] = charge

# Finished! Let's see what we have:
for sym in sorted(atomic_charges):
    print(sym, atomic_charges[sym])
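Two details in that loop are easy to miss, so here they are in isolation. First, a file object is an iterator, so the second `for` loop does not start over. It resumes with the line after the one where the first loop hit `break`. Second, inserting a space before each colon is what lets `2 Br:` and `0 C :` split into the same four fields. This small sketch substitutes a list iterator for the file. The data is just a slice of the sample above:

```python
lines = iter([
    "LOEWDIN ATOMIC CHARGES",
    "----------------------",
    "   0 C :   -0.780631",
    "   2 Br:    0.309802",
])

# First loop: consume lines up to and including the dashed rule.
for line in lines:
    if set(line.strip()) == set('-'):
        break

# Second loop: resumes right after the dashed rule, because the
# iterator remembers where the first loop stopped.
rows = []
for line in lines:
    # Without the replace, "2 Br:    0.309802".split() gives only
    # three fields: ['2', 'Br:', '0.309802'].
    fields = line.strip().replace(':', ' :').split()
    rows.append(fields)

print(rows)
```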

There may be more efficient ways to process the lines, for example by 
using a regular expression. But it's late, and I'm too tired to go 
messing about with regular expressions now. Perhaps somebody else will 
suggest one.
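For comparison, here is one possible regular-expression version. It is offered as a sketch rather than a tested drop-in replacement. It assumes every data line has the shape `index symbol : charge` as in the sample above, with optional whitespace around the colon. Lines that don't match (the title, the dashed rule, blank lines) are simply skipped. The function name `parse_charges` and the pattern are made up for this example:

```python
import re

# One data line: an integer index, an element symbol (a capital letter
# plus an optional lower-case one), optional spaces, a colon, then a
# signed decimal number. Leading/trailing whitespace is allowed.
LINE_RE = re.compile(r'^\s*(\d+)\s+([A-Z][a-z]?)\s*:\s*([-+]?\d+\.\d+)\s*$')

def parse_charges(lines):
    """Return {symbol: charge} for every line that matches LINE_RE.

    Lines that don't match (title, dashed rule, blank lines, malformed
    data) are silently skipped.
    """
    charges = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            _index, symbol, charge = match.groups()
            charges[symbol] = float(charge)
    return charges

sample = """LOEWDIN ATOMIC CHARGES
----------------------
   0 C :   -0.780631
   1 H :    0.114577
   2 Br:    0.309802
   3 Cl:    0.357316
   4 F :   -0.001065
"""

atomic_charges = parse_charges(sample.splitlines())
for sym in sorted(atomic_charges):
    print(sym, atomic_charges[sym])
```

With a real file you could pass the open file object straight in, inside a `with open('datafile.txt') as f:` block, since iterating a file yields lines. One behavioural difference from the loop above: a malformed charge such as `-0.78x` makes the whole line fail to match, so it is skipped rather than stored as NaN.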

-- 
Steve