[Tutor] Euclidean Distances between Atoms in a Molecule.
Matt Ruffalo
mruffalo at cs.cmu.edu
Sun Apr 2 19:32:26 EDT 2017
Hi Stephen-
The `scipy.spatial.distance` module (part of the SciPy package) contains
what you will need, specifically the `scipy.spatial.distance.pdist`
function. It takes a matrix of m observations in n-dimensional space
and returns a condensed distance matrix (a flat array holding the
m*(m-1)/2 distinct pairwise distances), as described in
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html

This condensed distance matrix can be expanded into a full m by m
matrix with `scipy.spatial.distance.squareform` as follows:
"""
In [1]: import pandas as pd
In [2]: from io import StringIO
In [3]: s = StringIO('''
...: MASS X Y Z
...: 0 12.011 -3.265636 0.198894 0.090858
...: 1 12.011 -1.307161 1.522212 1.003463
...: 2 12.011 1.213336 0.948208 -0.033373
...: 3 14.007 3.238650 1.041523 1.301322
...: 4 12.011 -5.954489 0.650878 0.803379
...: 5 12.011 5.654476 0.480066 0.013757
...: 6 12.011 6.372043 2.731713 -1.662411
...: 7 12.011 7.655753 0.168393 2.096802
...: 8 12.011 5.563051 -1.990203 -1.511875
...: 9 1.008 -2.939469 -1.327967 -1.247635
...: 10 1.008 -1.460475 2.993912 2.415410
...: 11 1.008 1.218042 0.451815 -2.057439
...: 12 1.008 -6.255901 2.575035 1.496984
...: 13 1.008 -6.560562 -0.695722 2.248982
...: 14 1.008 -7.152500 0.390758 -0.864115
...: 15 1.008 4.959548 3.061356 -3.139100
...: 16 1.008 8.197613 2.429073 -2.588339
...: 17 1.008 6.503322 4.471092 -0.543939
...: 18 1.008 7.845274 1.892126 3.227577
...: 19 1.008 9.512371 -0.273198 1.291080
...: 20 1.008 7.147039 -1.365346 3.393778
...: 21 1.008 4.191488 -1.928466 -3.057804
...: 22 1.008 5.061650 -3.595015 -0.302810
...: 23 1.008 7.402586 -2.392148 -2.374554
...: ''')
In [4]: d = pd.read_table(s, sep='\\s+', index_col=0)
In [5]: d.head()
Out[5]:
MASS X Y Z
0 12.011 -3.265636 0.198894 0.090858
1 12.011 -1.307161 1.522212 1.003463
2 12.011 1.213336 0.948208 -0.033373
3 14.007 3.238650 1.041523 1.301322
4 12.011 -5.954489 0.650878 0.803379
In [6]: points = d.loc[:, ['X', 'Y', 'Z']]
In [7]: import scipy.spatial.distance
In [8]: distances = scipy.spatial.distance.pdist(points)
In [9]: distances.shape
Out[9]: (276,)
In [10]: distances
Out[10]:
array([ 2.53370139, 4.54291701, 6.6694065 , 2.81813878,
8.92487537, 10.11800281, 11.10411993, 9.23615791,
2.05651475, 4.0588513 , 4.97820424, 4.0700026 ,
4.03910564, 4.0070559 , 9.28870116, 11.98156386,
10.68116021, 11.66869152, 12.84293061, 11.03539433,
8.36949409, 9.15928011, 11.25178722, 2.78521357,
4.58084922, 4.73253781, 7.10844399, 8.21826934,
9.13028167, 8.11565138, 3.98188296, 2.04523847,
<remaining elements not shown here>
In [11]: scipy.spatial.distance.squareform(distances)
Out[11]:
array([[ 0. , 2.53370139, 4.54291701, 6.6694065 ,
2.81813878, 8.92487537, 10.11800281, 11.10411993,
9.23615791, 2.05651475, 4.0588513 , 4.97820424,
4.0700026 , 4.03910564, 4.0070559 , 9.28870116,
11.98156386, 10.68116021, 11.66869152, 12.84293061,
11.03539433, 8.36949409, 9.15928011, 11.25178722],
[ 2.53370139, 0. , 2.78521357, 4.58084922,
4.73253781, 7.10844399, 8.21826934, 9.13028167,
8.11565138, 3.98188296, 2.04523847, 4.10992956,
5.08350537, 5.83684597, 6.2398737 , 7.66820932,
10.2011846 , 8.49081803, 9.42605887, 10.9712576 ,
9.24797787, 7.65742836, 8.27370019, 10.12881562],
<remaining elements not shown here>
"""
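If you want to check the result or pull out a single pair without building the full matrix, something like the following may help. It is only a sketch: it uses just the first four atoms from your table to stay short, recomputes the distances with plain NumPy broadcasting as a cross-check, and defines `condensed_index`, which is a name I made up for illustration (it is not a SciPy function). The index formula in it is the one given in the `pdist` documentation.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# First four atoms from the table above (X, Y, Z columns only).
points = np.array([
    [-3.265636, 0.198894, 0.090858],
    [-1.307161, 1.522212, 1.003463],
    [1.213336, 0.948208, -0.033373],
    [3.238650, 1.041523, 1.301322],
])

m = len(points)
condensed = pdist(points)      # flat array of length m * (m - 1) / 2
full = squareform(condensed)   # m x m, symmetric, zeros on the diagonal

# Cross-check with NumPy broadcasting: subtract every point from every
# other point, then take the Euclidean norm along the coordinate axis.
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
by_hand = np.sqrt((diff ** 2).sum(axis=-1))


def condensed_index(i, j, m):
    """Position of the (i, j) distance in the condensed array, for i < j.

    Hypothetical helper; the formula is taken from the pdist docs.
    """
    return m * i + j - (i + 1) * (i + 2) // 2


# Distance between atoms 1 and 2, looked up both ways.
d12_condensed = condensed[condensed_index(1, 2, m)]
d12_full = full[1, 2]
```

The full matrix is the more convenient form for lookups like `full[i, j]`; the condensed form takes roughly half the memory, which only starts to matter for much larger molecules.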
MMR...
On 2017-04-02 13:41, Stephen P. Molnar wrote:
> I am trying to port a program that I wrote in FORTRAN twenty years ago
> into Python 3 and am having a hard time trying to calculate the
> Euclidean distance between each atom in the molecule and every other
> atom in the molecule.
>
> Here is a typical table of coordinates:
>
>
> MASS X Y Z
> 0 12.011 -3.265636 0.198894 0.090858
> 1 12.011 -1.307161 1.522212 1.003463
> 2 12.011 1.213336 0.948208 -0.033373
> 3 14.007 3.238650 1.041523 1.301322
> 4 12.011 -5.954489 0.650878 0.803379
> 5 12.011 5.654476 0.480066 0.013757
> 6 12.011 6.372043 2.731713 -1.662411
> 7 12.011 7.655753 0.168393 2.096802
> 8 12.011 5.563051 -1.990203 -1.511875
> 9 1.008 -2.939469 -1.327967 -1.247635
> 10 1.008 -1.460475 2.993912 2.415410
> 11 1.008 1.218042 0.451815 -2.057439
> 12 1.008 -6.255901 2.575035 1.496984
> 13 1.008 -6.560562 -0.695722 2.248982
> 14 1.008 -7.152500 0.390758 -0.864115
> 15 1.008 4.959548 3.061356 -3.139100
> 16 1.008 8.197613 2.429073 -2.588339
> 17 1.008 6.503322 4.471092 -0.543939
> 18 1.008 7.845274 1.892126 3.227577
> 19 1.008 9.512371 -0.273198 1.291080
> 20 1.008 7.147039 -1.365346 3.393778
> 21 1.008 4.191488 -1.928466 -3.057804
> 22 1.008 5.061650 -3.595015 -0.302810
> 23 1.008 7.402586 -2.392148 -2.374554
>
> What I need for further calculation is a matrix of the Euclidean
> distances between the atoms.
>
> So far in searching the Python literature I have only managed to
> confuse myself and would greatly appreciate any pointers towards a
> solution.
>
> Thanks in advance.
>