[Tutor] Python 3.6 Extract Floating Point Data from a Text File
Steven D'Aprano
steve at pearwood.info
Sun Apr 30 14:02:40 EDT 2017
On Sun, Apr 30, 2017 at 06:09:12AM -0400, Stephen P. Molnar wrote:
[...]
> I would have managed to extract input data from another calculation (not
> a Python program) into the following text file.
>
> LOEWDIN ATOMIC CHARGES
> ----------------------
> 0 C : -0.780631
> 1 H : 0.114577
> 2 Br: 0.309802
> 3 Cl: 0.357316
> 4 F : -0.001065
>
> What I need to do is extract the floating point numbers into a Python file
I don't quite understand your question, but I'll take a guess. I'm going
to assume you have a TEXT file containing literally this text:
# ---- cut here ----
LOEWDIN ATOMIC CHARGES
----------------------
0 C : -0.780631
1 H : 0.114577
2 Br: 0.309802
3 Cl: 0.357316
4 F : -0.001065
# ---- cut here ----
and you want to extract the atomic symbols (C, H, Br, Cl, F) and
charges as floats. For the sake of the exercise, I'll extract them into
a dictionary {'C': -0.780631, 'H': 0.114577, ... } then print them.
Let me start by preparing the text file. Of course I could just use a
text editor, but let's do it with Python:
data = """LOEWDIN ATOMIC CHARGES
----------------------
0 C : -0.780631
1 H : 0.114577
2 Br: 0.309802
3 Cl: 0.357316
4 F : -0.001065
"""
filename = 'datafile.txt'
with open(filename, 'w') as f:
    f.write(data)
(Of course, in real life, it is silly to put your text into Python just
to write it out to a file so you can read it back in. But as a
programming exercise, it's fine.)
Now let's re-read the file, processing each line, and extract the data
we want.
atomic_charges = {}
filename = 'datafile.txt'
with open(filename, 'r') as f:
    # Skip lines until we reach a line made of nothing but ---
    for line in f:
        line = line.strip()  # ignore leading and trailing whitespace
        if set(line) == set('-'):
            break
    # Continue reading lines from where we last got to.
    for line in f:
        line = line.strip()
        if line == '':
            # Skip blank lines.
            continue
        # We expect lines to look like:
        #   1 C : 0.12345
        # where there may or may not be a space between the
        # letter and the colon. That makes it tricky to process,
        # so let's force there to always be at least one space.
        line = line.replace(':', ' :')
        # Split on spaces. The leading index is not needed afterwards.
        try:
            index, symbol, colon, number = line.split()
        except ValueError as err:
            print("failed to process line:", line)
            print(err)
            continue  # skip to the next line
        assert colon == ':', 'expected a colon but found something else'
        try:
            number = float(number)
        except ValueError:
            # We expected a numeric string like -0.234 or 0.123, but got
            # something else. We could skip this line, or replace it
            # with an out-of-bounds value. I'm going to use an IEEE-754
            # "Not A Number" value as the out-of-bounds value.
            number = float("NaN")
        atomic_charges[symbol] = number

# Finished! Let's see what we have:
for sym in sorted(atomic_charges):
    print(sym, atomic_charges[sym])
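One thing to watch with the NaN placeholder: a NaN never compares equal
to anything, not even to itself, so you cannot find the bad entries
afterwards with ==. Here is a small sketch of how you might check for
them using math.isnan. The dictionary below is made up for the
illustration, as if the charge for 'H' had failed to convert:

```python
import math

# Hypothetical result, as if the charge for 'H' had failed to parse.
atomic_charges = {'C': -0.780631, 'H': float("NaN"), 'Br': 0.309802}

# NaN is not equal to anything, including itself, so test with
# math.isnan() rather than with ==.
bad_symbols = [sym for sym, charge in atomic_charges.items()
               if math.isnan(charge)]
print("could not parse charges for:", bad_symbols)
```
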
There may be more efficient ways to process the lines, for example by
using a regular expression. But it's late, and I'm too tired to go
messing about with regular expressions now. Perhaps somebody else will
suggest one.
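For reference, a rough and only lightly considered sketch of what a
regular-expression version could look like. It assumes every data line
is an integer index, an element symbol, an optional space, a colon and
then a number; the names LINE_PATTERN and parse_charges are invented for
this example. Unlike the loop above, it does not look for the row of
dashes, and it silently ignores any line that doesn't match instead of
reporting it or storing a NaN:

```python
import re

# Assumed layout: index, element symbol, optional spaces, colon, number.
LINE_PATTERN = re.compile(
    r'^\s*\d+\s+([A-Za-z]+)\s*:\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*$'
)

def parse_charges(text):
    """Return a dict mapping atomic symbol to charge (as a float)."""
    charges = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            symbol, number = match.groups()
            charges[symbol] = float(number)
    return charges

sample = """LOEWDIN ATOMIC CHARGES
----------------------
0 C : -0.780631
1 H : 0.114577
2 Br: 0.309802
3 Cl: 0.357316
4 F : -0.001065
"""

charges = parse_charges(sample)
for sym in sorted(charges):
    print(sym, charges[sym])
```

Whether this is actually faster than splitting on spaces would need
measuring; for a file this small it makes no practical difference.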
--
Steve