[Tutor] building table from textfile

Oscar Benjamin oscar.j.benjamin at gmail.com
Thu Nov 28 11:32:10 CET 2013


On 27 November 2013 18:22, Ruben Guerrero <rudaguerman at gmail.com> wrote:
> Dear tutor.
>
> I am trying to build the matrix and get some other information from the
> following text files:
>
> file1:
> -O-
>               1          2          3          4          5          6
>      1  C    6.617775  -0.405794   0.371689  -0.212231   0.064402   0.064402
>      2  C   -0.405794   6.617775  -0.212231   0.371689  -0.010799  -0.010799
>      3  C    0.371689  -0.212231   4.887381  -0.005309   0.263318   0.263318
>      4  C   -0.212231   0.371689  -0.005309   4.887381  -0.005500  -0.005500
>      5  H    0.064402  -0.010799   0.263318  -0.005500   0.697750  -0.062986
>      6  H    0.064402  -0.010799   0.263318  -0.005500  -0.062986   0.697750
>      7  H    0.064402  -0.010799   0.263318  -0.005500  -0.062986  -0.062986
>      8  H   -0.010799   0.064402  -0.005500   0.263318   0.003816  -0.001380
>      9  H   -0.010799   0.064402  -0.005500   0.263318  -0.001380  -0.001380
>     10  H   -0.010799   0.064402  -0.005500   0.263318  -0.001380   0.003816
>               7          8          9         10
>      1  C    0.064402  -0.010799  -0.010799  -0.010799
>      2  C   -0.010799   0.064402   0.064402   0.064402
>      3  C    0.263318  -0.005500  -0.005500  -0.005500
>      4  C   -0.005500   0.263318   0.263318   0.263318
>      5  H   -0.062986   0.003816  -0.001380  -0.001380
>      6  H   -0.062986  -0.001380  -0.001380   0.003816
>      7  H    0.697750  -0.001380   0.003816  -0.001380
>      8  H   -0.001380   0.697750  -0.062986  -0.062986
>      9  H    0.003816  -0.062986   0.697750  -0.062986
>     10  H   -0.001380  -0.062986  -0.062986   0.697750

I'm not sure what the problem with your code is, but your method for
parsing the file is generally fragile. You should write a function
that parses the file and puts all the data into a convenient format
first, *before* you do any processing with the data.

I've written a more robust parsing function for you:

#!/usr/bin/env python

import sys
import numpy as np

def read_file(inputfile):
    atomnames = []
    matrix = []
    rows = []
    nextcol = 1
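    for n, line in enumerate(inputfile, 1):
        lineno = '%s %r' % (n, line)
        words = line.split()
        # Skip blank lines and the marker line at the beginning.
        # The sample file shows '-O-' (letter O); '-0-' (zero) is
        # accepted as well in case the real file uses that.
        if not words or words in (['-O-'], ['-0-']):
            continue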
        # Either this line is column numbers
        # or it is a row. If it is a row then
        # the atom name is the second element.
        # Test every word so that a header holding a single
        # column number does not raise IndexError.
        if all(w.isdigit() for w in words):
            # Column numbers, check them for consistency
            expected = range(nextcol, nextcol+len(words))
            expected = [str(e) for e in expected]
            assert words == expected, lineno
            nextcol += len(words)
        elif len(words) > 1 and words[1].isalpha():
            # A row
            index, atom, numbers = words[0], words[1], words[2:]
            index = int(index) - 1
            numbers = [float(s) for s in numbers]
            if index < len(atomnames):
                # We've already seen this index
                assert atomnames[index] == atom, lineno
                matrix[index].extend(numbers)
            elif index == len(atomnames):
                # First time we see the index
                atomnames.append(atom)
                matrix.append(numbers)
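            else:
                # An index was skipped. Raise rather than assert so
                # the check still runs under "python -O".
                raise ValueError('unexpected row index: ' + lineno)
        else:
            # Not a column or a row
            raise ValueError('unrecognised line: ' + lineno)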
    matrix = np.array(matrix, float)
    return atomnames, matrix

filename = sys.argv[1]
with open(filename, 'r') as fin:
    atomnames, matrix = read_file(fin)

print(atomnames)
print(matrix)

With that you should find it easier to implement the rest of the script.
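
As a rough illustration of what that later processing might look like,
here is a sketch that works on the two values read_file() returns. The
sample data is just rows/columns 1, 2 and 5 of your file1 typed in by
hand, and the helper names is_symmetric and diagonal_by_element are
made up for this example; I don't know what your script actually needs
to compute. The helpers only use plain indexing, so they should work
on nested lists or on the numpy array alike.

```python
from collections import defaultdict

# Hypothetical sample standing in for the output of read_file():
# rows/columns 1, 2 and 5 of the matrix in file1.
atomnames = ['C', 'C', 'H']
matrix = [
    [6.617775, -0.405794, 0.064402],
    [-0.405794, 6.617775, -0.010799],
    [0.064402, -0.010799, 0.697750],
]

def is_symmetric(matrix, tol=1e-6):
    """Return True if matrix[i][j] equals matrix[j][i] within tol.

    Assumes the matrix is square.
    """
    n = len(matrix)
    return all(abs(matrix[i][j] - matrix[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))

def diagonal_by_element(atomnames, matrix):
    """Sum the diagonal entries separately for each atom name."""
    totals = defaultdict(float)
    for i, atom in enumerate(atomnames):
        totals[atom] += matrix[i][i]
    return dict(totals)

print(is_symmetric(matrix))
print(diagonal_by_element(atomnames, matrix))
```

With the real numpy array you could replace the symmetry check with
np.allclose(matrix, matrix.T), but the point is only that the
processing code never has to look at the text file again.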


Oscar

