How To Avoid Quadratic Computation Resulting From Double 'for Loop' When Computing Distances Between Vectors
Given the following code that computes distances between vectors in list 'vect’: import numpy as np vect=([0.123, 0.345, 0.789], [0.234, 0.456, 0.567],[0.134, 0.246, 0.831]) def
Solution 1:
To avoid duplicate pairs nested loop should go upwards from the index of the outer loop, i.e.:
for i, v1 in enumerate(vect):
for j in xrange(i + 1, len(vect)):
a = np.array(v1)
b = np.array(vect[j])
space = np.linalg.norm(b - a)
print space
Or use a solution provided by the standard library:
import itertools
for v1, v2 in itertools.combinations(vect, 2):
a = np.array(v1)
b = np.array(v2)
space = np.linalg.norm(b - a)
print space
Solution 2:
Others have noted that you can't avoid the quadratic complexity. But if your concern is performance, you can speed things up considerably by using scipy.spatial.distance.pdist, which will have all the looping and function calling happen in C, not Python. The following code returns a square, symmetrical array, but it only makes n *(n-1)/2 calculations:
from scipy.spatial.distance import pdist
pairwise_distances = pdist(vect, metric='euclidean', p=2)
If you want a flattened array with only the unique, off-diagonal values, you can use scipy.spatial.distance.squareform:
from scipy.spatial.distance import pdist, squareform
pairwise_distances = squareform(pdist(vect, metric='euclidean', p=2))
In your case, pairwise_distances would be a 3 element long array.
Post a Comment for "How To Avoid Quadratic Computation Resulting From Double 'for Loop' When Computing Distances Between Vectors"