Vinamra Benara

I am currently a research intern at Carnegie Mellon University, Pittsburgh, where I work on FPGA acceleration for computer vision and AR/VR with James C. Hoe and Anthony Rowe.

I am a final-year Master's student in the Computer Systems Group (CSG) at IIIT Hyderabad, India, advised by Suresh Purini and Uday Bondhugula. I also completed my Bachelor's at IIIT Hyderabad.

Email  /  CV  /  Google Scholar  

Synthesizing Power and Area Efficient Image Processing Pipelines on FPGAs using Customized Bit-widths
Vinamra Benara, Ziaul Choudhury, Suresh Purini, Uday Bondhugula
Under review at ACM TACO, 2018

The power and area savings from performing arithmetic on fixed-point rather than floating-point data types are well known to be significant. PolyMage-HLS is the FPGA backend of the PolyMage DSL, in which data at each stage of a pipeline is stored using a fixed-point data type (alpha, beta), where alpha and beta denote the number of integral and fractional bits, respectively. In this work, we propose an interval-arithmetic based range analysis algorithm to estimate the number of bits required to store the integral part of the data at each stage of an image processing pipeline. The analysis algorithm uses the homogeneity of pixel signals at each stage to cluster them and perform a combined range analysis. We show that interval/affine arithmetic based techniques fail to take into account correlated computations across stages and hence can lead to poor range estimates. These errors accumulate across stages, especially for iterative programs such as Horn-Schunck Optical Flow, resulting in estimates that are nearly unusable in practice. We then propose a new range analysis technique using Satisfiability Modulo Theories (SMT) solvers, and show that the range estimates obtained through it are very close to the lower bounds obtained through profile-driven analysis. We evaluated our bit-width analysis algorithms on four image processing benchmarks, listed in order of increasing complexity: Unsharp Mask, Down-Up Sampling, Harris Corner Detection, and Horn-Schunck Optical Flow. The metrics considered are quality, power, and area. For example, on Optical Flow, the interval analysis based approach showed 1.4x and 1.14x improvements in area and power, respectively, over the floating-point representation, whereas the SMT solver based approach showed 2.49x and 1.58x improvements in area and power, respectively, over interval analysis.
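To make the range analysis idea concrete, here is a minimal sketch of interval arithmetic applied to a toy unsharp-mask-like stage. It is not the PolyMage-HLS implementation: the `Interval` class, the `integral_bits` helper, the 3-tap blur weights, and the 8-bit input range are all assumptions chosen for illustration. It shows how treating correlated values as independent inflates the estimated range, and with it the number of integral bits.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with the standard interval-arithmetic rules."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))


def const(c):
    """Interval holding a single constant (hypothetical helper)."""
    return Interval(c, c)


def integral_bits(r):
    """Smallest number of integer bits covering r (sign bit included if lo < 0)."""
    if r.lo >= 0:
        return max(1, math.floor(r.hi).bit_length())
    n = 1
    while not (-(2 ** (n - 1)) <= r.lo and r.hi < 2 ** (n - 1)):
        n += 1
    return n


pixel = Interval(0, 255)  # assumed 8-bit input range

# Stage 1: 3-tap blur with weights 0.25, 0.5, 0.25 (assumed for illustration).
blurred = pixel * const(0.25) + pixel * const(0.5) + pixel * const(0.25)

# Stage 2, naive: the centre pixel and the blur are treated as independent.
naive_detail = pixel - blurred

# Stage 2, simplified by hand: c - (0.25a + 0.5c + 0.25b) = 0.5c - 0.25a - 0.25b,
# so each input appears once and the interval result is tight.
tight_detail = pixel * const(0.5) - pixel * const(0.25) - pixel * const(0.25)

print(naive_detail, integral_bits(naive_detail))
print(tight_detail, integral_bits(tight_detail))
```

In this toy case the naive stage-by-stage analysis asks for one more integral bit than the algebraically simplified form, because the centre pixel appears on both sides of the subtraction. Over many stages or iterations this kind of overestimate compounds.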

Accurus: A Fast Convergence Technique for Accuracy Configurable Approximate Adder Circuits
Vinamra Benara, Suresh Purini

Approximate computing techniques offer substantial improvements in speed and power efficiency by trading off the accuracy of computations in inherently error-tolerant applications, such as those in the image and video processing domains. Accuracy requirements differ from one application to another. Even within the same application, different computations can have different accuracy requirements, which can vary over time and with user requirements. Accuracy-configurable arithmetic circuits are essential for these reasons. Such techniques proposed earlier in the literature (ACA) work by improving the accuracy over several pipeline stages. However, they suffer from the drawback that the corrections made in the initial pipeline stages are small in magnitude, because they are performed starting from the least significant bit position. In this paper, we propose a new correction technique, Accurus, in which we start from the most significant bit, resulting in fast convergence of the result towards the accurate one. We used our approximate adder circuit in a Gaussian Blur filter, which was then applied to an image. After one stage of correction, we achieved a peak signal-to-noise ratio of 40.90 dB, compared with 25.59 dB obtained using the previous well-known technique (ACA).
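The effect of correction order can be illustrated with a simplified arithmetic model. This is a sketch only, not the Accurus or ACA circuit: the function `approximate_sum`, the 16-bit width, the 4-bit blocks, and the operands are assumptions. The adder is modelled as independent blocks whose carry-outs are dropped, and a correction "stage" as restoring the dropped carry at one block boundary.

```python
def approximate_sum(a, b, width=16, block=4, corrected=()):
    """Add a and b block by block, dropping each block's carry-out
    unless that block index is listed in `corrected` (toy model)."""
    mask = (1 << block) - 1
    total = 0
    for j, shift in enumerate(range(0, width, block)):
        s = ((a >> shift) & mask) + ((b >> shift) & mask)
        total += (s & mask) << shift            # approximate block sum
        if j in corrected:
            total += (s >> block) << (shift + block)  # restore dropped carry
    return total


a, b = 0x3A9C, 0x2B77   # hypothetical operands; blocks 0, 1 and 2 each carry
exact = a + b

no_correction = approximate_sum(a, b)
lsb_first = approximate_sum(a, b, corrected={0})   # one stage, from the LSB side
msb_first = approximate_sum(a, b, corrected={2})   # one stage, from the MSB side

print("error without correction:", exact - no_correction)
print("error after one LSB-first stage:", exact - lsb_first)
print("error after one MSB-first stage:", exact - msb_first)
```

In this model, a single correction at the most significant boundary removes the largest share of the error, whereas a single correction at the least significant boundary barely changes it. That is the intuition behind starting from the most significant bit.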

Low latency AR/VR headset
Vinamra Benara, John Miller, James C. Hoe, Anthony Rowe
Ongoing at CMU, 2018-

I designed an FPGA-accelerated headset prototype that aims to solve the motion sickness problem of current AR/VR headsets. It runs at 240 FPS and currently has a motion-to-photon latency below 8 ms. To the best of our knowledge, it is the world's first prototype with such a low latency, and I am working towards reducing it to 4 ms. We achieve this by accelerating post-render warping (PRW) on an FPGA, which allows a very low latency end-to-end design. Our display is a DMD (Digital Micromirror Device), which I drive directly from the FPGA via HDMI. After a 360-degree scene is rendered, the orientation of the head is provided by the IMU (Inertial Measurement Unit). Based on that orientation and the field of view, the scene is cropped and warped. We use a Zynq board (Xilinx ZedBoard) for this purpose. The IMU communicates with the ARM core of the device via SPI, and Kalman filtering is done there. A new rotation matrix is obtained from the IMU at 100 Hz and is fed to the PRW pipeline in the FPGA. The setup is at a development stage and currently works with static scenes. Work is in progress to integrate the system with an OptiTrack tracker, which tracks the location of the user in three-dimensional space.
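The crop-and-warp step can be sketched in software as follows. This is a rough model under stated assumptions, not the FPGA pipeline: it assumes the 360-degree scene is stored as an equirectangular panorama, a camera looking along +z with x to the right and y down, a row-major 3x3 rotation matrix, and nearest-neighbour sampling. The function names `pixel_to_panorama` and `warp_view` are hypothetical.

```python
import math


def pixel_to_panorama(px, py, out_w, out_h, fov_deg, R, pano_w, pano_h):
    """Map an output pixel to (u, v) coordinates in an equirectangular panorama."""
    # Focal length in pixels for the requested horizontal field of view.
    f = (out_w / 2) / math.tan(math.radians(fov_deg) / 2)
    # Ray through the pixel centre in the camera frame.
    ray = (px + 0.5 - out_w / 2, py + 0.5 - out_h / 2, f)
    # Rotate the ray by the head orientation.
    dx, dy, dz = (sum(R[i][k] * ray[k] for k in range(3)) for i in range(3))
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    lon = math.atan2(dx, dz)        # longitude in [-pi, pi]
    lat = math.asin(dy / norm)      # latitude in [-pi/2, pi/2], positive is down
    u = (lon / (2 * math.pi) + 0.5) * pano_w
    v = (lat / math.pi + 0.5) * pano_h
    return u, v


def warp_view(pano, R, out_w, out_h, fov_deg):
    """Crop and warp the panorama into the current view (nearest neighbour)."""
    pano_h, pano_w = len(pano), len(pano[0])
    out = []
    for py in range(out_h):
        row = []
        for px in range(out_w):
            u, v = pixel_to_panorama(px, py, out_w, out_h, fov_deg,
                                     R, pano_w, pano_h)
            col = int(u) % pano_w                    # wrap around horizontally
            r = min(max(int(v), 0), pano_h - 1)      # clamp vertically
            row.append(pano[r][col])
        out.append(row)
    return out


# Tiny synthetic panorama: each pixel stores its own column index.
pano = [[c for c in range(8)] for _ in range(4)]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
yaw_90 = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]   # 90-degree turn about the y axis

view = warp_view(pano, identity, 3, 3, 90.0)
```

Because only the rotation matrix changes between frames, the same rendered scene can be re-warped with the latest orientation sample just before display, which is what keeps the motion-to-photon path short.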
