[Tutor] Python 3.6 Extract Floating Point Data from a Text File
Stephen P. Molnar
s.molnar at sbcglobal.net
Sun Apr 30 15:56:04 EDT 2017
On 04/30/2017 02:02 PM, Steven D'Aprano wrote:
> On Sun, Apr 30, 2017 at 06:09:12AM -0400, Stephen P. Molnar wrote:
> [...]
>> I have managed to extract input data from another calculation (not
>> a Python program) into the following text file.
>>
>> LOEWDIN ATOMIC CHARGES
>> ----------------------
>> 0 C : -0.780631
>> 1 H : 0.114577
>> 2 Br: 0.309802
>> 3 Cl: 0.357316
>> 4 F : -0.001065
>>
>> What I need to do is extract the floating point numbers into a Python file
>
> I don't quite understand your question, but I'll take a guess. I'm going
> to assume you have a TEXT file containing literally this text:
>
> # ---- cut here ----
>
> LOEWDIN ATOMIC CHARGES
> ----------------------
> 0 C : -0.780631
> 1 H : 0.114577
> 2 Br: 0.309802
> 3 Cl: 0.357316
> 4 F : -0.001065
>
> # ---- cut here ----
>
>
> and you want to extract the atomic symbols (C, H, Br, Cl, F) and
> charges as floats. For the sake of the exercise, I'll extract them into
> a dictionary {'C': -0.780631, 'H': 0.114577, ... } then print them.
>
> Let me start by preparing the text file. Of course I could just use a
> text editor, but let's do it with Python:
>
>
> data = """LOEWDIN ATOMIC CHARGES
> ----------------------
> 0 C : -0.780631
> 1 H : 0.114577
> 2 Br: 0.309802
> 3 Cl: 0.357316
> 4 F : -0.001065
> """
>
> filename = 'datafile.txt'
> with open(filename, 'w') as f:
>     f.write(data)
>
>
> (Of course, in real life, it is silly to put your text into Python just
> to write it out to a file so you can read it back in. But as a
> programming exercise, it's fine.)
>
> Now let's re-read the file, processing each line, and extract the data
> we want.
>
> atomic_charges = {}
> filename = 'datafile.txt'
> with open(filename, 'r') as f:
>     # Skip lines until we reach a line made of nothing but ---
>     for line in f:
>         line = line.strip()  # ignore leading and trailing whitespace
>         if set(line) == set('-'):
>             break
>     # Continue reading lines from where we last got to.
>     for line in f:
>         line = line.strip()
>         if line == '':
>             # Skip blank lines.
>             continue
>         # We expect lines to look like:
>         #   1 C : 0.12345
>         # where there may or may not be a space between the
>         # letter and the colon. That makes it tricky to process,
>         # so let's force there to always be at least one space.
>         line = line.replace(':', ' :')
>         # Split on whitespace into atom index, symbol, colon and charge.
>         try:
>             index, symbol, colon, number = line.split()
>         except ValueError as err:
>             print("failed to process line:", line)
>             print(err)
>             continue  # skip to the next line
>         assert colon == ':', 'expected a colon but found something else'
>         try:
>             number = float(number)
>         except ValueError:
>             # We expected a numeric string like -0.234 or 0.123, but got
>             # something else. We could skip this line, or replace it
>             # with an out-of-bounds value. I'm going to use an IEEE-754
>             # "Not A Number" value as the out-of-bounds value.
>             number = float("NaN")
>         # Note: a repeated symbol (e.g. a second H atom) overwrites
>         # the earlier entry, because the dictionary is keyed by symbol.
>         atomic_charges[symbol] = number
>
> # Finished! Let's see what we have:
> for sym in sorted(atomic_charges):
>     print(sym, atomic_charges[sym])
>
>
>
>
>
> There may be more efficient ways to process the lines, for example by
> using a regular expression. But it's late, and I'm too tired to go
> messing about with regular expressions now. Perhaps somebody else will
> suggest one.
>
>
>
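Steven's closing paragraph leaves the regular-expression approach open. Below is a minimal sketch of one, assuming every data line has the shape "index symbol : charge" exactly as in the sample above, with an optional space before the colon. The pattern name `CHARGE_LINE` and the helper `parse_charges` are illustrative, not part of the original thread. Because it returns tuples in the order the lines appear, it also keeps the atom order.

```python
import re

# Matches lines such as "0 C : -0.780631" or "2 Br: 0.309802":
# an atom index, a one- or two-letter element symbol (an assumption),
# optional spaces, a colon, then a signed decimal number.
CHARGE_LINE = re.compile(
    r"^\s*(\d+)\s+([A-Za-z]{1,2})\s*:\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_charges(text):
    """Return a list of (index, symbol, charge) tuples in file order."""
    results = []
    for line in text.splitlines():
        match = CHARGE_LINE.match(line)
        if match:  # the header and the dashed line simply do not match
            index, symbol, charge = match.groups()
            results.append((int(index), symbol, float(charge)))
    return results


data = """LOEWDIN ATOMIC CHARGES
----------------------
0 C : -0.780631
1 H : 0.114577
2 Br: 0.309802
3 Cl: 0.357316
4 F : -0.001065
"""

for index, symbol, charge in parse_charges(data):
    print(index, symbol, charge)
```

Lines that do not match the pattern are silently skipped here; the loop-based version above reports them instead, which may be preferable when the input format is uncertain.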
Steve
Thanks for your reply to my, unfortunately imprecisely worded, question.
Here are the results of applying your code to my data:
Br 0.309802
C -0.780631
Cl 0.357316
F -0.001065
H 0.114577
I should have mentioned that I already have the file; it's part of the
output from the Orca Quantum Chemistry Program.
As soon as I understand the code, I'm going to have to get rid of the
atomic symbols and get the charges in the same order as they appear in the
original LOEWDIN ATOMIC CHARGES file. The Molecular Transform suite of
programs depends on the distances between pairs of bonded atoms, hence
the order is important.
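Since the data sits inside a larger Orca output file, and since a dictionary keyed by element symbol would merge repeated elements (a molecule with several H atoms would keep only the last one), a plain list in file order fits this requirement better. Here is a minimal sketch that returns only the charges, assuming the block layout matches the sample in this thread: the header line, a dashed underline, one "index symbol : charge" line per atom, then a blank line or any line without a colon. The function name `loewdin_charges` and the placeholder text around the block are hypothetical; check the layout against real Orca output before relying on it.

```python
def loewdin_charges(lines, header="LOEWDIN ATOMIC CHARGES"):
    """Return the charges from the block that starts at `header`, in file order.

    Assumed layout (taken from the sample in this thread, not from Orca docs):
    the header line, a line of dashes, then one "index symbol : charge" line
    per atom, ending at a blank line, a line without a colon, or end of input.
    """
    charges = []
    in_block = False
    for raw in lines:
        line = raw.strip()
        if not in_block:
            # Ignore everything before the header line.
            if line == header:
                in_block = True
            continue
        if not line:
            if charges:
                break  # a blank line after the data ends the block
            continue  # tolerate blank lines before the first atom
        if set(line) == {"-"}:
            continue  # skip the dashed underline beneath the header
        if ":" not in line:
            break  # some other text: we have left the block
        # Keep only what follows the colon; the index and symbol are
        # dropped, but appending preserves the original atom order.
        charges.append(float(line.rsplit(":", 1)[1]))
    return charges


# Hypothetical stand-in for a larger output file.
orca_output = """(other program output)
LOEWDIN ATOMIC CHARGES
----------------------
0 C : -0.780631
1 H : 0.114577
2 Br: 0.309802
3 Cl: 0.357316
4 F : -0.001065

(more program output)
"""

charges = loewdin_charges(orca_output.splitlines())
print(charges)  # [-0.780631, 0.114577, 0.309802, 0.357316, -0.001065]
```

With a real file, the open file object can be passed directly, since iterating over it yields lines: `with open('orca.out') as f: charges = loewdin_charges(f)`, where 'orca.out' is a placeholder name.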
Again, many thanks for your help.
Regards,
Steve
--
Stephen P. Molnar, Ph.D.
www.molecular-modeling.net
(614)312-7528 (c)
Skype: smolnar1
Life is a fuzzy set
Stochastic and multivariate