Varun Jampani is a Research Scientist at Nvidia.
Sparse High-Dimensional Neural Networks
The recent years have witnessed a dramatic increase in the adoption of convolutional neural networks (CNN) for a wide range of computer vision problems. ‘Convolutions’, which form the building blocks of most CNN architectures, are designed to process non-sparse data that lie on 1D, 2D or 3D grid. In this work, we propose a generalization of convolution operation to process sparse and high-dimensional data (for instance data that lies in 5D or 6D space). Specifically, we make use of permutohedral lattices, traditionally used for bilateral filtering, and hash table for efficient processing of sparse high-dimensional data while being able to learn generic N-dimensional filters. The ability to learn generic high-dimensional filters allows us to stack several parallel and sequential filters like in a CNN resulting in a new breed of neural networks which we call ‘Bilateral Neural Networks’ (BNN). We demonstrate the use of BNNs on several 2D, video and 3D vision tasks. Experiments on diverse datasets and tasks demonstrate the use of BNNs for a range of vision problems.