[Tutor] building table from textfile
Oscar Benjamin
oscar.j.benjamin at gmail.com
Thu Nov 28 11:32:10 CET 2013
On 27 November 2013 18:22, Ruben Guerrero <rudaguerman at gmail.com> wrote:
> Dear tutor.
>
> I am trying to build the matrix and get some other information from the
> following text files:
>
> file1:
> -O-
> 1 2 3 4 5 6
> 1 C 6.617775 -0.405794 0.371689 -0.212231 0.064402 0.064402
> 2 C -0.405794 6.617775 -0.212231 0.371689 -0.010799 -0.010799
> 3 C 0.371689 -0.212231 4.887381 -0.005309 0.263318 0.263318
> 4 C -0.212231 0.371689 -0.005309 4.887381 -0.005500 -0.005500
> 5 H 0.064402 -0.010799 0.263318 -0.005500 0.697750 -0.062986
> 6 H 0.064402 -0.010799 0.263318 -0.005500 -0.062986 0.697750
> 7 H 0.064402 -0.010799 0.263318 -0.005500 -0.062986 -0.062986
> 8 H -0.010799 0.064402 -0.005500 0.263318 0.003816 -0.001380
> 9 H -0.010799 0.064402 -0.005500 0.263318 -0.001380 -0.001380
> 10 H -0.010799 0.064402 -0.005500 0.263318 -0.001380 0.003816
> 7 8 9 10
> 1 C 0.064402 -0.010799 -0.010799 -0.010799
> 2 C -0.010799 0.064402 0.064402 0.064402
> 3 C 0.263318 -0.005500 -0.005500 -0.005500
> 4 C -0.005500 0.263318 0.263318 0.263318
> 5 H -0.062986 0.003816 -0.001380 -0.001380
> 6 H -0.062986 -0.001380 -0.001380 0.003816
> 7 H 0.697750 -0.001380 0.003816 -0.001380
> 8 H -0.001380 0.697750 -0.062986 -0.062986
> 9 H 0.003816 -0.062986 0.697750 -0.062986
> 10 H -0.001380 -0.062986 -0.062986 0.697750
I'm not sure what the problem with your code is but your method for
parsing the file is generally fragile. You should write a function
that parses the file and puts all the data into a convenient format
first *before* you do any processing with the data.
I've written a more robust parsing function for you:
#!/usr/bin/env python

import sys

import numpy as np


def read_file(inputfile):
    atomnames = []
    matrix = []
    nextcol = 1
    for n, line in enumerate(inputfile):
        # Used as the message in the assertions below
        lineno = '%s %r' % (n, line)
        words = line.split()

        # Skip blank lines and the '-O-' marker at the beginning
        # (the letter O, as it appears in your file)
        if not words or words == ['-O-']:
            continue

        # Either this line is column numbers
        # or it is a row. If it is a row then
        # the atom name is the second element.
        if all(w.isdigit() for w in words):
            # Column numbers, check them for consistency
            expected = range(nextcol, nextcol + len(words))
            expected = [str(e) for e in expected]
            assert words == expected, lineno
            nextcol += len(words)
        elif len(words) > 1 and words[1].isalpha():
            # A row
            index, atom, numbers = words[0], words[1], words[2:]
            index = int(index) - 1
            numbers = [float(s) for s in numbers]
            if index < len(atomnames):
                # We've already seen this index
                assert atomnames[index] == atom, lineno
                matrix[index].extend(numbers)
            elif index == len(atomnames):
                # First time we see the index
                atomnames.append(atom)
                matrix.append(numbers)
            else:
                assert False, lineno
        else:
            # Not a column or a row
            assert False, lineno

    matrix = np.array(matrix, float)
    return atomnames, matrix


filename = sys.argv[1]
with open(filename, 'r') as fin:
    atomnames, matrix = read_file(fin)

print(atomnames)
print(matrix)
With that you should find it easier to implement the rest of the script.
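
One way to convince yourself that this kind of parsing does what you expect, before pointing it at the real files, is to feed it a small in-memory sample. The sketch below is only an illustration: stitch_blocks is a hypothetical, stripped-down stand-in for the function above (plain lists, no numpy, much less error checking), and the 3x3 numbers are made up rather than taken from your file.

```python
import io


def stitch_blocks(lines):
    """Join the column blocks of a wrapped matrix into full rows.

    Hypothetical helper for illustration only.
    """
    atomnames = []
    rows = []
    for line in lines:
        words = line.split()
        # Skip blank lines, the '-O-' marker and column-number lines
        if not words or words == ['-O-'] or all(w.isdigit() for w in words):
            continue
        index, atom = int(words[0]) - 1, words[1]
        numbers = [float(s) for s in words[2:]]
        if index == len(atomnames):
            # First block: a new row starts here
            atomnames.append(atom)
            rows.append(numbers)
        else:
            # Later block: the row already exists, so extend it
            assert atomnames[index] == atom, line
            rows[index].extend(numbers)
    return atomnames, rows


# Made-up 3x3 sample, wrapped after two columns
sample = io.StringIO("""\
-O-
1 2
1 C 1.0 0.5
2 C 0.5 2.0
3 H 0.1 0.2
3
1 C 0.1
2 C 0.2
3 H 3.0
""")

atomnames, rows = stitch_blocks(sample)

# The stitched matrix should be square and symmetric
n = len(atomnames)
assert all(len(row) == n for row in rows)
assert all(rows[i][j] == rows[j][i] for i in range(n) for j in range(n))

# Example of later processing: diagonal entries of the hydrogen atoms
h_diagonal = [rows[i][i] for i, atom in enumerate(atomnames) if atom == 'H']
print(atomnames)
print(h_diagonal)
```

Once the data is in this shape, checks like the symmetry test and selections by atom name become one-liners, which is the main reason to parse first and process afterwards.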
Oscar