10x Faster Matrix Determination Using Numba Instead of Numpy

INFO

This function can be 10x faster with the exact result as Numpy.linalg.det()


Background

The speed of numpy.linalg.det() is slow, let’s have a test:

b = 256**2
k = 3
A = np.random.randn(b,k,k)
npDet = np.linalg.det(A)
print("The Numpy Determinant of A is", round(sum(npDet),9))

Output: The Numpy Determinant of A is 176.93690024

Test the speed:

%timeit npDet = np.linalg.det(A)

Output: 10.8 ms ± 90 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Code

@numba.jit(nopython=True)
def determinant(A):
    """A shape should be (B,n,n) return shape (B,)  """
    B,n,_ = A.shape
    AM = A.copy()
    # Section 2: Row ops on A to get in upper triangle form
    product = np.ones(B,dtype=A.dtype)
    for b in range(B):
        for fd in range(n): # A) fd stands for focus diagonal
            for i in range(fd+1,n): # B) only use rows below fd row
                if AM[b][fd][fd] == 0: # C) if diagonal is zero ...
                    AM[b][fd][fd] = 1.0e-18 # change to ~zero
                # D) cr stands for "current row"
                crScaler = AM[b][i][fd] / AM[b][fd][fd] 
                # E) cr - crScaler * fdRow, one element at a time
                for j in range(n): 
                    AM[b][i][j] = AM[b][i][j] - crScaler * AM[b][fd][j]
        # Section 3: Once AM is in upper triangle form ...
        for i in range(n):
            # ... product of diagonals is determinant
            product[b] *= AM[b][i][i] 
 
    return product

Test

b = 256**2
k = 3
A = np.random.randn(b,k,k)
Det = determinant(A)
npDet = np.linalg.det(A)
torch_Det = torch.linalg.det(torch.Tensor(A))
print("Determinant of A is", round(sum(Det),9))
print("The Numpy Determinant of A is", round(sum(npDet),9))
print("The Torch Determinant of A is", round(sum(npDet),9))
%timeit Det = determinant(A)
%timeit npDet = np.linalg.det(A)
%timeit torch_Det = torch.linalg.det(torch.Tensor(A))

Output:

Determinant of A is -397.275054658
The Numpy Determinant of A is -397.275054658
The Torch Determinant of A is -397.275054658
1.73 ms ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
10.8 ms ± 90 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.73 ms ± 381 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The new function is 10x faster than numpy and 2x faster than pytorch




If you found this useful, please cite this as:
Xiao, Zhihua (Jul 2024). 10x Faster Matrix Determination Using Numba Instead of Numpy. https://forkxz.github.io/blog/2024/Det/.

or as a BibTeX entry:

@article{xiao202410x-faster-matrix-determination-using-numba-instead-of-numpy,
  title   = {10x Faster Matrix Determination Using Numba Instead of Numpy},
  author  = {Xiao, Zhihua},
  year    = {2024},
  month   = {Jul},
  url     = {https://forkxz.github.io/blog/2024/Det/}
}



    Enjoy Reading This Article?

    Here are some more articles you might like to read next:

  • Triton DropConnect Kernel
  • Pytorch Implementation of Wasserstein Distance
  • What are Diffusion Models?