What Is The Fastest Way To Compute A Sparse Gram Matrix In Python?
A Gram matrix is a matrix of the form X @ X.T, which is of course symmetric. When dealing with dense matrices, the numpy.dot product implementation is intelligent enough to recognize the self-multiplication and exploit the symmetry, thus speeding up the computation. With scipy.sparse matrices, however, X @ X.T shows no such effect. So what is the fastest way to compute a sparse Gram matrix in Python?
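(For dense arrays, this symmetric rank-k product is also exposed directly by BLAS. A minimal sketch to illustrate what "exploiting the symmetry" means, assuming scipy is built against a BLAS that provides dsyrk:)

# BLAS syrk computes only one triangle of X @ X.T; here the upper one.
import numpy as np
from scipy.linalg.blas import dsyrk

X = np.random.randn(500, 5000)
G_upper = dsyrk(1.0, X)  # only the upper triangle of the result is valid
G = np.triu(G_upper) + np.triu(G_upper, k=1).T  # mirror to the full matrix
assert np.allclose(G, X @ X.T)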
Solution 1:
Thanks to a comment from user CJR, I worked out a satisfying solution. I found a library on GitHub, sparse_dot_mkl, which wraps the MKL routine mkl_sparse_spmm for Python. This routine performs fast multiplication of two sparse matrices. So all I had to do was extend the library with a similar wrapper for mkl_sparse_syrk, and this is exactly what I did.
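To give an idea of what such a wrapper involves, here is a minimal ctypes sketch. This is a hypothetical illustration, not the actual sparse_dot_mkl code: it assumes MKL is available as libmkl_rt.so (LP64 interface, i.e. 32-bit indices) and a float64 CSR input; the function name syrk_upper is mine.

# Hypothetical sketch of a ctypes wrapper around mkl_sparse_syrk.
# Assumes libmkl_rt.so on the library path (LP64, 32-bit indices).
import ctypes

import numpy as np
import scipy.sparse

mkl = ctypes.CDLL('libmkl_rt.so')  # platform-dependent library name

SPARSE_INDEX_BASE_ZERO = 0
SPARSE_OPERATION_NON_TRANSPOSE = 10
SPARSE_STATUS_SUCCESS = 0

def syrk_upper(X):
    """Upper triangle of X @ X.T via mkl_sparse_syrk, as a CSR matrix."""
    X = scipy.sparse.csr_matrix(X)
    indptr = X.indptr.astype(np.int32)
    indices = X.indices.astype(np.int32)
    data = X.data.astype(np.float64)
    c_int_p = ctypes.POINTER(ctypes.c_int)
    c_dbl_p = ctypes.POINTER(ctypes.c_double)

    # Wrap the scipy CSR arrays in an MKL sparse matrix handle (no copy).
    A = ctypes.c_void_p()
    assert mkl.mkl_sparse_d_create_csr(
        ctypes.byref(A), SPARSE_INDEX_BASE_ZERO, X.shape[0], X.shape[1],
        indptr[:-1].ctypes.data_as(c_int_p), indptr[1:].ctypes.data_as(c_int_p),
        indices.ctypes.data_as(c_int_p), data.ctypes.data_as(c_dbl_p),
    ) == SPARSE_STATUS_SUCCESS

    # C := X @ X.T; MKL computes only the upper triangle of the result.
    C = ctypes.c_void_p()
    assert mkl.mkl_sparse_syrk(
        SPARSE_OPERATION_NON_TRANSPOSE, A, ctypes.byref(C),
    ) == SPARSE_STATUS_SUCCESS

    # Export the CSR arrays of the result back to scipy.
    base, rows, cols = ctypes.c_int(), ctypes.c_int(), ctypes.c_int()
    rows_start, rows_end, col_indx = c_int_p(), c_int_p(), c_int_p()
    values = c_dbl_p()
    assert mkl.mkl_sparse_d_export_csr(
        C, ctypes.byref(base), ctypes.byref(rows), ctypes.byref(cols),
        ctypes.byref(rows_start), ctypes.byref(rows_end),
        ctypes.byref(col_indx), ctypes.byref(values),
    ) == SPARSE_STATUS_SUCCESS

    n = rows.value
    starts = np.ctypeslib.as_array(rows_start, shape=(n,))
    nnz = int(np.ctypeslib.as_array(rows_end, shape=(n,))[-1])
    result = scipy.sparse.csr_matrix(
        (np.ctypeslib.as_array(values, shape=(nnz,)).copy(),
         np.ctypeslib.as_array(col_indx, shape=(nnz,)).copy(),
         np.append(starts, nnz)),  # assumes contiguous CSR rows
        shape=(n, cols.value))
    mkl.mkl_sparse_destroy(A)
    mkl.mkl_sparse_destroy(C)
    return result

The real wrapper needs more care around error handling, index dtypes, and memory ownership, but the call sequence (create_csr, then syrk, then export_csr) is the core of it.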
I still have to add some comments; afterwards I will submit a pull request to the original project.
Meanwhile, here are the performance results, which are quite impressive:
import scipy.sparse
import sparse_dot_mkl
from numpy import allclose, prod, random, tril_indices

# Build a random sparse test matrix: thresholding standard-normal draws
# at 0.8 zeroes out roughly 79 % of the entries.
random.seed(0)
X = random.randn(500, 5000)
X[X < 0.8] = 0
X = scipy.sparse.csr_matrix(X)
print(f'X sparsity: {100 * (1 - X.count_nonzero() / prod(X.shape)):5.2f} %')
# X sparsity: 78.80 %

# Reference result, plus its upper triangle (the syrk-based routine
# computes only the upper triangle of the symmetric result).
expected_result = (X @ X.T).toarray()
expected_result_triu = expected_result.copy()
expected_result_triu[tril_indices(expected_result.shape[0], k=-1)] = 0

mkl_result1 = sparse_dot_mkl.dot_product_mkl(X, X.T)
allclose(mkl_result1.toarray(), expected_result)
# True

mkl_result2 = sparse_dot_mkl.dot_product_transpose_mkl(X)
allclose(mkl_result2.toarray(), expected_result_triu)
# True
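Note that the syrk-based routine returns only the upper triangle. If you need the full symmetric matrix, you can mirror it cheaply; a small sketch (symmetrize_upper is my own helper, not part of the library):

def symmetrize_upper(U):
    # U holds the diagonal plus the strict upper triangle of a symmetric
    # matrix; add the transpose and subtract the doubled diagonal.
    return U + U.T - scipy.sparse.diags(U.diagonal())

allclose(symmetrize_upper(mkl_result2).toarray(), expected_result)
# True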
%timeit X @ X.T
# 197 ms ± 5.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit sparse_dot_mkl.dot_product_mkl(X, X.T)
# 70.6 ms ± 593 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit sparse_dot_mkl.dot_product_transpose_mkl(X)
# 34.2 ms ± 421 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Using the generic dot product from MKL instead of the dot product implementation from scipy yields a speed-up of roughly 2.8× (197 ms → 70.6 ms). Using the specialized product for the Gram matrix computation yields a speed-up of roughly 5.8× (197 ms → 34.2 ms). This is huge.