Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

I have two 2-D arrays with the same first-axis dimension. In Python, I would like to convolve the two matrices along the second axis only. I would like to get C below without computing the convolution along the first axis as well.

import numpy as np
import scipy.signal as sg

M, N, P = 4, 10, 20
A = np.random.randn(M, N)
B = np.random.randn(M, P)

C = sg.convolve(A, B, 'full')[(2*M-1)//2]

Is there a fast way?

  • @ShadowAetherOPM

    Original answer: (credit @Mercury)

    Late answer, but worth posting for reference. Quoting from the OP's comments:

    Each row in A is being filtered by the corresponding row in B. I could implement it like that, just thought there might be a faster way.

    A is on the order of 10s of gigabytes in size and I use overlap-add.

    Naive / Straightforward Approach

    import numpy as np
    import scipy.signal as sg
    
    M, N, P = 4, 10, 20
    A = np.random.randn(M, N) # (4, 10)
    B = np.random.randn(M, P) # (4, 20)
    
    C = np.vstack([sg.convolve(a, b, 'full') for a, b in zip(A, B)])
    
    >>> C.shape
    (4, 29)
    

    Each row in A is convolved with the corresponding row in B, essentially convolving M 1-D arrays/vectors.
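    Loop-Free SciPy Version

    Side note (not in the original answers): scipy's FFT-based convolutions can batch this directly. If I read the API correctly, sg.fftconvolve and its overlap-add sibling sg.oaconvolve (scipy >= 1.4, which also matches the OP's mention of overlap-add) take an axes argument; the non-convolved axes are then paired up elementwise, which is exactly the row-by-row pairing wanted here. A minimal sketch under that assumption:

    import numpy as np
    import scipy.signal as sg
    
    M, N, P = 4, 10, 20
    A = np.random.randn(M, N)
    B = np.random.randn(M, P)
    
    # Convolve along axis 1 only; rows along axis 0 are paired elementwise.
    C_fft = sg.fftconvolve(A, B, mode='full', axes=1)  # (4, 29)
    C_oa = sg.oaconvolve(A, B, mode='full', axes=1)    # same result via overlap-add
    
    >>> C_fft.shape
    (4, 29)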

    No Loop + CUDA Supported Version

    It is possible to replicate this operation by using PyTorch's F.conv1d. We have to imagine A as a 4-channel, 1D signal of length 10. We wish to convolve each channel in A with a specific kernel of length 20. This is a special case called a depthwise convolution, often used in deep learning.

    Note that torch’s conv is implemented as cross-correlation, so we need to flip B in advance to do actual convolution.
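    To see why, a quick scipy check (illustration only, not from the original answer): convolution reverses the kernel while cross-correlation does not, so correlating against a pre-flipped kernel is identical to convolving.

    import numpy as np
    import scipy.signal as sg
    
    a = np.random.randn(10)
    b = np.random.randn(20)
    
    # convolve flips b internally; correlate does not, so flipping b
    # by hand before correlating reproduces the convolution exactly
    >>> np.allclose(sg.convolve(a, b, 'full'), sg.correlate(a, b[::-1], 'full'))
    True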

    import torch
    import torch.nn.functional as F
    
    @torch.no_grad()
    def torch_conv(A, B):
        # A: (M, N) signals, B: (M, P) kernels -- one kernel per channel (groups=M)
        M, P = A.shape[0], B.shape[1]
        # padding=P-1 pads both ends so the output has the 'full' length N + P - 1
        C = F.conv1d(A[None], B[:, None, :], bias=None, stride=1, groups=M, padding=P - 1)
        return C[0].numpy()
    
    # Convert A and B to torch tensors + flip B
    
    X = torch.from_numpy(A) # (4, 10)
    W = torch.from_numpy(np.fliplr(B).copy()) # (4, 20); .copy() because torch.from_numpy rejects negative strides
    
    # Do grouped conv and get np array
    Y = torch_conv(X, W)
    
    >>> Y.shape
    (4, 29)
    
    >>> np.allclose(C, Y)
    True
    

    Advantages of using a depthwise convolution with torch:

    - No loops!
    - The above solution can also run on CUDA/GPU, which can really speed things up if A and B are very large matrices. (From OP's comment, this seems to be the case: A is on the order of 10s of GB.) See the sketch after this list.

    Disadvantages:

    - Overhead of converting from array to tensor (should be negligible)
    - Need to flip B once before the operation
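    A minimal sketch of the CUDA path (illustration only; assumes a CUDA-enabled torch build and reuses X and W from above):

    # run the same grouped conv on the GPU, then copy the result back to host memory
    if torch.cuda.is_available():
        M, P = X.shape[0], W.shape[1]
        with torch.no_grad():
            Yg = F.conv1d(X.cuda()[None], W.cuda()[:, None, :],
                          bias=None, stride=1, groups=M, padding=P - 1)
        Y = Yg[0].cpu().numpy()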

  • @ShadowAetherOPM

    NN libraries like PyTorch and TensorFlow can be repurposed for GPU-accelerated operations like convolution by setting up small networks with preset weights. Another option is to use FFT or wavelet libraries; for applications with many samples, FFT-based convolution can also be a good option.
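    For reference, the FFT route needs nothing beyond numpy; a sketch via the convolution theorem (zero-pad both inputs to the full output length, multiply the spectra, invert):

    import numpy as np
    
    M, N, P = 4, 10, 20
    A = np.random.randn(M, N)
    B = np.random.randn(M, P)
    
    L = N + P - 1  # length of the 'full' convolution
    # convolution theorem: pointwise product of spectra, row by row
    C_fft = np.fft.irfft(np.fft.rfft(A, L, axis=1) * np.fft.rfft(B, L, axis=1), L, axis=1)
    
    >>> C_fft.shape
    (4, 29)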