How To Avoid Quadratic Computation Resulting From Double 'for Loop' When Computing Distances Between Vectors
Given the following code that computes distances between vectors in list 'vect’: import numpy as np vect=([0.123, 0.345, 0.789], [0.234, 0.456, 0.567],[0.134, 0.246, 0.831]) def
Solution 1:
To avoid duplicate pairs nested loop should go upwards from the index of the outer loop, i.e.:
for i, v1 in enumerate(vect):
for j in xrange(i + 1, len(vect)):
a = np.array(v1)
b = np.array(vect[j])
space = np.linalg.norm(b - a)
print space
Or use a solution provided by the standard library:
import itertools
for v1, v2 in itertools.combinations(vect, 2):
a = np.array(v1)
b = np.array(v2)
space = np.linalg.norm(b - a)
print space
Solution 2:
Others have noted that you can't avoid the quadratic complexity. But if your concern is performance, you can speed things up considerably by using scipy.spatial.distance.pdist
, which will have all the looping and function calling happen in C, not Python. The following code returns a square, symmetrical array, but it only makes n *(n-1)/2
calculations:
from scipy.spatial.distance import pdist
pairwise_distances = pdist(vect, metric='euclidean', p=2)
If you want a flattened array with only the unique, off-diagonal values, you can use scipy.spatial.distance.squareform
:
from scipy.spatial.distance import pdist, squareform
pairwise_distances = squareform(pdist(vect, metric='euclidean', p=2))
In your case, pairwise_distances
would be a 3 element long array.
Post a Comment for "How To Avoid Quadratic Computation Resulting From Double 'for Loop' When Computing Distances Between Vectors"