Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.
I have two 2-D arrays with the same first-axis dimension. In Python, I would like to convolve the two matrices along the second axis only. I would like to get C below without computing the convolution along the first axis as well.
import numpy as np
import scipy.signal as sg
M, N, P = 4, 10, 20
A = np.random.randn(M, N)
B = np.random.randn(M, P)
C = sg.convolve(A, B, 'full')[(2*M-1)//2]  # integer division: a plain / yields a float, which is not a valid index in Python 3
Is there a fast way?
Original answer (credit: @Mercury):
Late answer, but worth posting for reference.
Naive / Straightforward Approach
import numpy as np
import scipy.signal as sg

M, N, P = 4, 10, 20
A = np.random.randn(M, N)  # (4, 10)
B = np.random.randn(M, P)  # (4, 20)

C = np.vstack([sg.convolve(a, b, 'full') for a, b in zip(A, B)])

>>> C.shape
(4, 29)
Each row in A is convolved with the respective row in B; essentially, this performs M independent 1-D convolutions in a Python loop.
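If the loop overhead matters, a loop-free NumPy variant (a sketch, not part of the original answer) computes the same 'full' convolution along axis 1 via the convolution theorem, assuming real-valued inputs. Each row is zero-padded to the full output length, so the circular convolution implied by the FFT equals the linear one:

L = N + P - 1                     # length of a 'full' convolution
FA = np.fft.rfft(A, n=L, axis=1)  # zero-pads each row to length L
FB = np.fft.rfft(B, n=L, axis=1)
C_fft = np.fft.irfft(FA * FB, n=L, axis=1)

>>> np.allclose(C, C_fft)
True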
No Loop + CUDA-Supported Version

It is possible to replicate this operation by using PyTorch’s F.conv1d. We have to imagine A as a 4-channel, 1-D signal of length 10. We wish to convolve each channel in A with a specific kernel of length 20. This is a special case called a depthwise convolution, often used in deep learning.
Note that torch’s conv is implemented as cross-correlation, so we need to flip B in advance to do actual convolution.
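A quick NumPy check of that identity (illustrative only; real-valued 1-D signals assumed): cross-correlating with a flipped kernel reproduces convolution.

a = np.random.randn(5)
b = np.random.randn(3)

>>> np.allclose(np.convolve(a, b, 'full'), np.correlate(a, b[::-1], 'full'))
True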
import torch
import torch.nn.functional as F

@torch.no_grad()
def torch_conv(A, B):
    M, N, P = A.shape[0], A.shape[1], B.shape[1]
    # Output length is N + 2*padding - P + 1, so padding = P - 1 yields the
    # 'full' convolution length N + P - 1 for any N and P.
    C = F.conv1d(A, B[:, None, :], bias=None, stride=1, groups=M, padding=P - 1)
    return C.numpy()

# Convert A and B to torch tensors + flip B
X = torch.from_numpy(A)                    # (4, 10)
W = torch.from_numpy(np.fliplr(B).copy())  # (4, 20)

# Do grouped conv and get np array
Y = torch_conv(X, W)

>>> Y.shape
(4, 29)
>>> np.allclose(C, Y)
True
Advantages of using a depthwise convolution with torch:
- No loops!
- The above solution can also run on CUDA/GPU, which can really speed things up if A and B are very large matrices. (From the OP’s comments, this seems to be the case: A is 10 GB in size.) See the sketch after this list.

Disadvantages:

- Overhead of converting from array to tensor (should be negligible).
- Need to flip B once before the operation.
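For the CUDA point above, a minimal sketch of the GPU variant, reusing A, B, M, and P from earlier (the device-selection line is an assumption of mine, not part of the original answer; it requires a CUDA-enabled PyTorch build to actually hit the GPU):

device = "cuda" if torch.cuda.is_available() else "cpu"

X = torch.from_numpy(A).to(device)                    # (4, 10)
W = torch.from_numpy(np.fliplr(B).copy()).to(device)  # (4, 20)

with torch.no_grad():
    # Same grouped (depthwise) conv as torch_conv above, now on the device
    Y = F.conv1d(X, W[:, None, :], bias=None, stride=1, groups=M, padding=P - 1)

Y = Y.cpu().numpy()  # move the result back to host memory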