[Tutor] Euclidean Distances between Atoms in a Molecule.

Matt Ruffalo mruffalo at cs.cmu.edu
Sun Apr 2 19:32:26 EDT 2017


Hi Stephen-

The `scipy.spatial.distance` module (part of the SciPy package) contains
what you will need. Specifically, the `scipy.spatial.distance.pdist`
function takes an m-by-n array of m observations in n-dimensional space
and returns a condensed distance matrix: a flat array holding the
m*(m-1)/2 pairwise distances (Euclidean by default), as described in
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html

This condensed distance matrix can be expanded into a full m by m
matrix with `scipy.spatial.distance.squareform` as follows:

"""
In [1]: import pandas as pd

In [2]: from io import StringIO

In [3]: s = StringIO('''
   ...:       MASS         X         Y         Z
   ...: 0   12.011 -3.265636  0.198894  0.090858
   ...: 1   12.011 -1.307161  1.522212  1.003463
   ...: 2   12.011  1.213336  0.948208 -0.033373
   ...: 3   14.007  3.238650  1.041523  1.301322
   ...: 4   12.011 -5.954489  0.650878  0.803379
   ...: 5   12.011  5.654476  0.480066  0.013757
   ...: 6   12.011  6.372043  2.731713 -1.662411
   ...: 7   12.011  7.655753  0.168393  2.096802
   ...: 8   12.011  5.563051 -1.990203 -1.511875
   ...: 9    1.008 -2.939469 -1.327967 -1.247635
   ...: 10   1.008 -1.460475  2.993912  2.415410
   ...: 11   1.008  1.218042  0.451815 -2.057439
   ...: 12   1.008 -6.255901  2.575035  1.496984
   ...: 13   1.008 -6.560562 -0.695722  2.248982
   ...: 14   1.008 -7.152500  0.390758 -0.864115
   ...: 15   1.008  4.959548  3.061356 -3.139100
   ...: 16   1.008  8.197613  2.429073 -2.588339
   ...: 17   1.008  6.503322  4.471092 -0.543939
   ...: 18   1.008  7.845274  1.892126  3.227577
   ...: 19   1.008  9.512371 -0.273198  1.291080
   ...: 20   1.008  7.147039 -1.365346  3.393778
   ...: 21   1.008  4.191488 -1.928466 -3.057804
   ...: 22   1.008  5.061650 -3.595015 -0.302810
   ...: 23   1.008  7.402586 -2.392148 -2.374554
   ...: ''')

In [4]: d = pd.read_table(s, sep=r'\s+', index_col=0)

In [5]: d.head()
Out[5]:
     MASS         X         Y         Z
0  12.011 -3.265636  0.198894  0.090858
1  12.011 -1.307161  1.522212  1.003463
2  12.011  1.213336  0.948208 -0.033373
3  14.007  3.238650  1.041523  1.301322
4  12.011 -5.954489  0.650878  0.803379

In [6]: points = d.loc[:, ['X', 'Y', 'Z']]

In [7]: import scipy.spatial.distance

In [8]: distances = scipy.spatial.distance.pdist(points)

In [9]: distances.shape
Out[9]: (276,)

In [10]: distances
Out[10]:
array([  2.53370139,   4.54291701,   6.6694065 ,   2.81813878,
         8.92487537,  10.11800281,  11.10411993,   9.23615791,
         2.05651475,   4.0588513 ,   4.97820424,   4.0700026 ,
         4.03910564,   4.0070559 ,   9.28870116,  11.98156386,
        10.68116021,  11.66869152,  12.84293061,  11.03539433,
         8.36949409,   9.15928011,  11.25178722,   2.78521357,
         4.58084922,   4.73253781,   7.10844399,   8.21826934,
         9.13028167,   8.11565138,   3.98188296,   2.04523847,

<remaining elements not shown here>

In [11]: scipy.spatial.distance.squareform(distances)
Out[11]:
array([[  0.        ,   2.53370139,   4.54291701,   6.6694065 ,
          2.81813878,   8.92487537,  10.11800281,  11.10411993,
          9.23615791,   2.05651475,   4.0588513 ,   4.97820424,
          4.0700026 ,   4.03910564,   4.0070559 ,   9.28870116,
         11.98156386,  10.68116021,  11.66869152,  12.84293061,
         11.03539433,   8.36949409,   9.15928011,  11.25178722],
       [  2.53370139,   0.        ,   2.78521357,   4.58084922,
          4.73253781,   7.10844399,   8.21826934,   9.13028167,
          8.11565138,   3.98188296,   2.04523847,   4.10992956,
          5.08350537,   5.83684597,   6.2398737 ,   7.66820932,
         10.2011846 ,   8.49081803,   9.42605887,  10.9712576 ,
          9.24797787,   7.65742836,   8.27370019,  10.12881562],

<remaining elements not shown here>
"""
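
If you would rather have this in a script than an interactive session,
the same two calls can be sketched as below. This is only a minimal
illustration, not a drop-in replacement for your program: it uses just
the first four atoms from your table to stay short, and the names
`coords`, `condensed`, `full` and `manual` are placeholders of my own.
The last step cross-checks one entry against the definition of
Euclidean distance, sqrt(dx^2 + dy^2 + dz^2).

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# First four atoms from the table (X, Y, Z columns only).
coords = np.array([
    [-3.265636, 0.198894,  0.090858],
    [-1.307161, 1.522212,  1.003463],
    [ 1.213336, 0.948208, -0.033373],
    [ 3.238650, 1.041523,  1.301322],
])

# Condensed form: one entry per unordered pair of atoms,
# m * (m - 1) / 2 entries in total.
condensed = pdist(coords)

# Full symmetric m-by-m matrix with zeros on the diagonal.
full = squareform(condensed)

# Cross-check the distance between atoms 0 and 1 by hand.
manual = np.sqrt(np.sum((coords[0] - coords[1]) ** 2))

print(condensed.shape)
print(full.shape)
print(np.isclose(full[0, 1], manual))
```

Here `full[i, j]` is the distance between atoms i and j, so the matrix
can be indexed directly in whatever calculation comes next.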

MMR...

On 2017-04-02 13:41, Stephen P. Molnar wrote:
> I am trying to port a program that I wrote in FORTRAN twenty years ago
> into Python 3 and am having a hard time trying to calculate the
> Euclidean distance between each atom in the molecule and every other
> atom in the molecule.
>
> Here is a typical table of coordinates:
>
>
>       MASS         X         Y         Z
> 0   12.011 -3.265636  0.198894  0.090858
> 1   12.011 -1.307161  1.522212  1.003463
> 2   12.011  1.213336  0.948208 -0.033373
> 3   14.007  3.238650  1.041523  1.301322
> 4   12.011 -5.954489  0.650878  0.803379
> 5   12.011  5.654476  0.480066  0.013757
> 6   12.011  6.372043  2.731713 -1.662411
> 7   12.011  7.655753  0.168393  2.096802
> 8   12.011  5.563051 -1.990203 -1.511875
> 9    1.008 -2.939469 -1.327967 -1.247635
> 10   1.008 -1.460475  2.993912  2.415410
> 11   1.008  1.218042  0.451815 -2.057439
> 12   1.008 -6.255901  2.575035  1.496984
> 13   1.008 -6.560562 -0.695722  2.248982
> 14   1.008 -7.152500  0.390758 -0.864115
> 15   1.008  4.959548  3.061356 -3.139100
> 16   1.008  8.197613  2.429073 -2.588339
> 17   1.008  6.503322  4.471092 -0.543939
> 18   1.008  7.845274  1.892126  3.227577
> 19   1.008  9.512371 -0.273198  1.291080
> 20   1.008  7.147039 -1.365346  3.393778
> 21   1.008  4.191488 -1.928466 -3.057804
> 22   1.008  5.061650 -3.595015 -0.302810
> 23   1.008  7.402586 -2.392148 -2.374554
>
> What I need for further calculation is a matrix of the Euclidean
> distances between the atoms.
>
> So far in searching the Python literature I have only managed to
> confuse myself and would greatly appreciate any pointers towards a
> solution.
>
> Thanks in advance.
>


