Publications
2025
- AdaDT: Learning robust control policies with adaptive feature crediting and pruning. Hemant Kumawat and Saibal Mukhopadhyay. Under review at AAMAS 2025, 2025.
Reinforcement learning (RL) can be viewed as a sequence modeling challenge, where the goal is to predict future actions based on past state-action-reward sequences. Traditional methods often rely on long trajectory sequences to capture environmental dynamics in offline RL scenarios. However, this can lead to a tendency to overemphasize the memorization of long-term representations, which hinders the models’ ability to prioritize trajectories and learned representations that are specifically relevant to the task at hand. In this study, we present AdaDT, a novel approach that conceptualizes trajectories as causal graphs derived from short-term action-reward-state sequences. Our model dynamically adapts its control policy by identifying and eliminating low-importance representations, focusing instead on those that are most pertinent to the downstream task. Experimental results show that policies based on AdaDT require shorter trajectory sequences and consistently outperform traditional methods in both offline reinforcement learning and imitation learning settings.
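As a rough, hypothetical illustration of the crediting-and-pruning idea (not the authors' implementation), the sketch below scores each feature dimension over a short trajectory window and masks out the lowest-scoring dimensions before the policy acts on them. The names `credit_scores` and `prune_mask`, the gradient-magnitude scoring rule, and the keep fraction are all assumptions for illustration.

```python
# Hypothetical sketch of adaptive feature crediting and pruning
# (illustrative only; not the paper's implementation).
import numpy as np

def credit_scores(features: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """Credit each feature dimension by |feature * gradient|,
    averaged over a short trajectory window."""
    return np.abs(features * grads).mean(axis=0)

def prune_mask(scores: np.ndarray, keep_frac: float = 0.9) -> np.ndarray:
    """Keep only the top `keep_frac` fraction of feature dimensions."""
    k = max(1, int(keep_frac * scores.size))
    keep = np.argsort(scores)[-k:]
    mask = np.zeros_like(scores)
    mask[keep] = 1.0
    return mask

# Toy usage: a window of 8 steps with 16-dimensional representations.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
grads = rng.normal(size=(8, 16))
mask = prune_mask(credit_scores(feats, grads), keep_frac=0.75)
pruned = feats * mask          # low-importance dimensions zeroed out
print(int(mask.sum()), "of", mask.size, "dimensions kept")
```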
2024
- RoboKoop: Efficient Control Conditioned Representations from Visual Input in Robotics using Koopman Operator. Hemant Kumawat, Biswadeep Chakraborty, and Saibal Mukhopadhyay. In 2024 Conference on Robot Learning, 2024.
Developing agents that can perform complex control tasks from high-dimensional observations is a core capability of autonomous agents; it requires robust task control policies and visual representations adapted to the task. Most existing policies require many training samples and treat this problem through the lens of two-stage learning, with a controller learned on top of pre-trained vision models. We approach this problem from the lens of Koopman theory and learn visual representations from robotic agents conditioned on specific downstream tasks in the context of learning stabilizing control for the agent. We introduce a Contrastive Spectral Koopman Embedding network that allows us to learn efficient linearized visual representations from the agent's visual data in a high-dimensional latent space and utilizes reinforcement learning to perform off-policy control on top of the extracted representations with a linear controller. Our method enhances stability and control in gradient dynamics over time, significantly outperforming existing approaches by improving efficiency and accuracy in learning task policies over extended horizons.
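The core Koopman premise, that a learned encoder z = phi(x) yields latent dynamics that are approximately linear, z_{t+1} ≈ A z_t + B u_t, so that a linear controller suffices, can be sketched minimally as below. The encoder, the dimensions, and the least-squares fit are stand-ins for exposition, not the paper's Contrastive Spectral Koopman Embedding network.

```python
# Minimal sketch of the Koopman idea behind control-conditioned
# embeddings: in a learned latent space z = phi(x), dynamics are
# approximately linear, z_{t+1} ~= A z_t + B u_t, so a linear
# controller can act on z. Encoder and sizes are placeholders.
import numpy as np

rng = np.random.default_rng(0)
d_z, d_u, T = 8, 2, 200

def phi(x):                          # stand-in for the learned encoder
    return np.tanh(x @ rng.normal(size=(x.shape[-1], d_z)))

X = rng.normal(size=(T + 1, 32))     # raw observations (e.g. visual features)
U = rng.normal(size=(T, d_u))        # control inputs
Z = phi(X)

# Fit [A | B] by least squares on encoded transitions:
# Z[1:] ~= Z[:-1] @ A.T + U @ B.T
ZU = np.hstack([Z[:-1], U])
AB, *_ = np.linalg.lstsq(ZU, Z[1:], rcond=None)
A, B = AB[:d_z].T, AB[d_z:].T

pred = Z[:-1] @ A.T + U @ B.T
print("one-step latent error:", float(np.abs(pred - Z[1:]).mean()))
```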
- ChirpNet: Noise-Resilient Sequential Chirp Based Radar Processing for Object Detection. Sudarshan Sharma*, Hemant Kumawat*, and Saibal Mukhopadhyay. In 2024 IEEE/MTT-S International Microwave Symposium (IMS 2024) (*equal contribution), 2024.
Radar-based object detection (OD) requires extensive pre-processing and complex Machine Learning (ML) pipelines. Previous approaches have attempted to address these challenges by processing raw radar data frames directly from the ADC or through FFT-based post-processing. However, the input data requirements and model complexity continue to impose significant computational overhead on the edge system. In this work, we introduce ChirpNet, a noise-resilient and efficient radar processing ML architecture for object detection. Diverging from previous approaches, we directly handle raw ADC data from multiple antennas per chirp using a sequential model, resulting in a substantial 15× reduction in complexity and a 3× reduction in latency, while maintaining competitive OD performance. Furthermore, our proposed scheme is robust to input noise variations compared to prior works.
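A hedged sketch of the "sequential model over raw ADC samples" idea: treat the samples of one chirp (with one channel per receive antenna) as a time series and predict from the final recurrent state. The GRU, the sizes, and the output head below are illustrative placeholders, not the ChirpNet architecture.

```python
# Illustrative per-chirp sequential model over raw ADC samples
# (assumed architecture for exposition, not ChirpNet itself).
import torch
import torch.nn as nn

class ChirpSeqModel(nn.Module):
    def __init__(self, n_antennas=4, hidden=64, n_outputs=3):
        super().__init__()
        # one "token" per ADC sample, one channel per antenna
        self.rnn = nn.GRU(input_size=n_antennas, hidden_size=hidden,
                          batch_first=True)
        self.head = nn.Linear(hidden, n_outputs)

    def forward(self, adc):            # adc: (batch, samples, antennas)
        _, h = self.rnn(adc)           # h: (1, batch, hidden)
        return self.head(h[-1])        # per-chirp prediction

model = ChirpSeqModel()
adc = torch.randn(2, 256, 4)           # 2 chirps, 256 samples, 4 antennas
print(model(adc).shape)                # torch.Size([2, 3])
```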
- STEMFold: Stochastic temporal manifold for multi-agent interactions in the presence of hidden agents. Hemant Kumawat, Biswadeep Chakraborty, and Saibal Mukhopadhyay. In 6th Annual Learning for Dynamics & Control Conference, University of Oxford, Oxford, UK, 15-17 July 2024.
Learning accurate, data-driven predictive models for multiple interacting agents following unknown dynamics is crucial in many real-world physical and social systems. In many scenarios, dynamics prediction must be performed under incomplete observations, i.e., only a subset of agents are known and observable from a larger topological system, while the behaviors of the unobserved agents and their interactions with the observed agents are not known. When only incomplete observations of a dynamical system are available, so that some states remain hidden, it is generally not possible to learn a closed-form model in these variables using either analytic or data-driven techniques. In this work, we propose STEMFold, a spatiotemporal attention-based generative model, to learn a stochastic manifold to predict the underlying unmeasured dynamics of the multi-agent system from observations of only visible agents. Our analytical results motivate the STEMFold design, which uses a spatiotemporal graph with time anchors to effectively map the observations of visible agents to a stochastic manifold with no prior information about interaction graph topology. We empirically evaluated our method on two simulations and two real-world datasets, where it outperformed existing networks in predicting complex multi-agent interactions, even with many unobserved agents.
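A rough sketch of the "time anchor" idea: irregular observations of the visible agents are attention-pooled onto a fixed grid of anchor times, yielding a latent state sequence even under partial observation. The shapes and the temporal-proximity similarity kernel below are illustrative assumptions, not the paper's attention mechanism.

```python
# Illustrative pooling of irregular agent observations onto time anchors
# (assumed similarity kernel; the paper's attention differs).
import numpy as np

rng = np.random.default_rng(0)
n_obs, d = 30, 16                            # (agent, time) observation embeddings
obs_t = np.sort(rng.uniform(0, 10, n_obs))   # observation timestamps
obs_e = rng.normal(size=(n_obs, d))          # observation embeddings
anchors = np.linspace(0, 10, 5)              # fixed time anchors

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# attention weight of each observation for each anchor, by temporal proximity
w = softmax(-np.abs(anchors[:, None] - obs_t[None, :]), axis=1)
latent = w @ obs_e                           # (anchors, d) pooled latent states
print(latent.shape)                          # (5, 16)
```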
- Cognitive Sensing for Energy-Efficient Edge Intelligence. Minah Lee, Sudarshan Sharma, Wei Chun Wang, and 3 more authors. In 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2024.
Edge platforms in autonomous systems integrate multiple sensors to interpret their environment. The high-resolution and high-bandwidth pixel arrays of these sensors improve sensing quality but also generate a vast, and arguably unnecessary, volume of real-time data. This challenge, often referred to as the analog data deluge, hinders the deployment of high-quality sensors in resource-constrained environments. This paper discusses the concept of cognitive sensing, which learns to extract low-dimensional features directly from high-dimensional analog signals, thereby reducing both digitization power and generated data volume. First, we discuss design methods for analog-to-feature extraction (AFE) using mixed-signal compute-in-memory. We then present examples of cognitive sensing, incorporating signal processing or machine learning, for various sensing modalities including vision, Radar, and Infrared. Subsequently, we discuss the reliability challenges in cognitive sensing, taking into account hardware and algorithmic properties of AFE. The paper concludes with discussions on future research directions in this emerging field of cognitive sensors.
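As a toy digital stand-in for analog-to-feature extraction (AFE): rather than digitizing a full high-resolution frame, project it onto a few fixed basis vectors and digitize only those feature values. The projection matrix and sizes below are purely illustrative, not a mixed-signal hardware specification from the paper.

```python
# Toy stand-in for analog-to-feature extraction: digitize 16 projected
# features instead of a 64x64 frame (sizes and basis are assumptions).
import numpy as np

rng = np.random.default_rng(0)
frame = rng.normal(size=64 * 64)           # "analog" high-dimensional signal
basis = rng.normal(size=(16, frame.size))  # 16 mixed-signal projections

features = basis @ frame                   # low-dimensional features
print(frame.size, "->", features.size, "values to digitize")
```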
2022
- A Methodology for Understanding the Origins of False Negatives in DNN Based Object Detectors. Kruttidipta Samal, Hemant Kumawat, Marilyn Wolf, and 1 more author. In 2022 International Joint Conference on Neural Networks (IJCNN), 2022.
In this paper, we present two novel complementary methods, namely gradient analysis and activation discrepancy analysis, to analyze the perception failures occurring inside DNN-based object detectors. The gradient analysis localizes the nodes within the network that fail consistently in a scenario, thus creating a 'signature' of False Negatives (FNs). This method traces a set of False Negatives through the network and finds sections of the network that contribute to this set. The signatures show that the location of the faulty nodes is sensitive to input conditions (such as darkness, glare, etc.), network architecture, training hyperparameters, object class, etc. Certain nodes of the network fail consistently throughout the training process, implying that some False Negatives occur due to the global optimization nature of Stochastic Gradient Descent (SGD) based training. This analysis requires knowledge of the False Negatives and can therefore be used for post-hoc diagnostic analysis. The activation discrepancy analysis, on the other hand, examines the discrepancy in the forward activations of a DNN. This method can be conducted online and shows that the pattern of the activation discrepancy is sensitive to input conditions and detection recall.
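A minimal sketch of the activation-discrepancy idea: run the same network on a clean and a degraded version of an input and measure, layer by layer, how far the forward activations drift. The layer names and the mean-absolute-difference metric are assumptions for illustration; the paper defines its own analysis.

```python
# Illustrative per-layer activation discrepancy between two input
# conditions (metric and layer names are assumed for exposition).
import numpy as np

def layer_discrepancy(acts_clean: dict, acts_degraded: dict) -> dict:
    """Mean absolute activation difference per layer."""
    return {name: float(np.abs(acts_clean[name] - acts_degraded[name]).mean())
            for name in acts_clean}

# Toy activations for two conditions (e.g. daylight vs. darkness).
rng = np.random.default_rng(0)
clean = {f"conv{i}": rng.normal(size=(8, 8)) for i in range(3)}
dark = {k: v + rng.normal(scale=0.5 * (i + 1), size=v.shape)
        for i, (k, v) in enumerate(clean.items())}
print(layer_discrepancy(clean, dark))   # discrepancy grows with depth here
```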
- Radar Guided Dynamic Visual Attention for Resource-Efficient RGB Object Detection. Hemant Kumawat and Saibal Mukhopadhyay. In 2022 International Joint Conference on Neural Networks (IJCNN), 2022.
An autonomous system's perception engine must provide an accurate understanding of the environment for it to make decisions. Deep learning based object detection networks suffer degraded performance and robustness for small and faraway objects because the object's feature map shrinks in the higher layers of the network. In this work, we propose a novel radar-guided spatial attention for RGB images to improve the perception quality of autonomous vehicles operating in a dynamic environment. In particular, our method improves the perception of small and long-range objects, which are often not detected by object detectors in RGB mode. The proposed method consists of two RGB object detectors, namely the Primary detector and a lightweight Secondary detector. The primary detector takes a full RGB image and generates primary detections. Next, the radar proposal framework creates regions of interest (ROIs) for object proposals by projecting the radar point cloud onto the 2D RGB image. These ROIs are cropped and fed to the secondary detector to generate secondary detections, which are then fused with the primary detections via non-maximum suppression. This method helps recover small objects by preserving their spatial features through an increase in their receptive field. We evaluate our fusion method on the challenging nuScenes dataset and show that our fusion method, with SSD-lite as the primary and secondary detector, improves the baseline primary YOLOv3 detector's recall by 14% while requiring three times fewer computational resources.
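The pipeline above lends itself to a short skeleton: primary detections on the full image, radar points projected to image-space ROIs, a secondary detector on the ROI crops, then NMS over the union. The detectors are passed in as callables, and the projection, crop clamping, and NMS details below are simplified placeholders, not the paper's exact implementation.

```python
# Simplified skeleton of radar-guided primary/secondary detection fusion
# (stub projection and NMS; not the paper's exact code).
import numpy as np

def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def nms(dets, thr=0.5):
    dets = sorted(dets, key=lambda d: -d["score"])
    kept = []
    for d in dets:
        if all(iou(d["box"], k["box"]) < thr for k in kept):
            kept.append(d)
    return kept

def radar_guided_detect(image, radar_uv, primary, secondary, roi=64):
    dets = list(primary(image))                  # full-frame primary detections
    h, w = image.shape[:2]
    for u, v in radar_uv:                        # radar points projected to pixels
        x1 = int(np.clip(u - roi // 2, 0, w - roi))
        y1 = int(np.clip(v - roi // 2, 0, h - roi))
        crop = image[y1:y1 + roi, x1:x1 + roi]   # ROI around the radar return
        for d in secondary(crop):                # secondary detections, shifted back
            bx1, by1, bx2, by2 = d["box"]
            dets.append({"box": (bx1 + x1, by1 + y1, bx2 + x1, by2 + y1),
                         "score": d["score"]})
    return nms(dets)                             # fuse primary + secondary

# Toy usage with stub detectors standing in for the real networks.
img = np.zeros((480, 640, 3))
stub = lambda im: [{"box": (10, 10, 50, 50), "score": 0.9}]
print(len(radar_guided_detect(img, [(320, 240)], stub, stub)))  # 2
```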
- Task-Driven RGB-Lidar Fusion for Object Tracking in Resource-Efficient Autonomous System. Kruttidipta Samal, Hemant Kumawat, Priyabrata Saha, and 2 more authors. IEEE Transactions on Intelligent Vehicles, 2022.
Autonomous mobile systems such as vehicles or robots are equipped with multiple sensor modalities including Lidar, RGB, and Radar. The fusion of multi-modal information can enhance task accuracy but indiscriminate sensing and fusion in all modalities increase demand on available system resources. This paper presents a task-driven approach to input fusion that minimizes the utilization of resource-heavy sensors and demonstrates its application to Visual-Lidar fusion for object tracking and path planning. The proposed spatiotemporal sampling algorithm activates Lidar only at regions-of-interest identified by analyzing visual input and reduces the Lidar ‘base frame rate’ according to the kinematic state of the system. This significantly reduces Lidar usage, in terms of data sensed/transferred and potentially power consumed, without a severe reduction in performance compared to both a baseline decision-level fusion and state-of-the-art deep multi-modal fusion.
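A hedged sketch of the task-driven sampling policy described above: Lidar is triggered only inside visually identified regions of interest, and the base frame rate scales with the platform's kinematic state. The thresholds and the linear rate law below are hypothetical illustrations, not the paper's algorithm.

```python
# Illustrative task-driven Lidar sampling (rate law and ROI-gated
# acquisition are assumptions for exposition).
def lidar_base_rate(speed_mps, min_hz=2.0, max_hz=10.0, v_max=20.0):
    """Faster ego-motion -> higher Lidar base frame rate (capped)."""
    return min(max_hz, min_hz + (max_hz - min_hz) * min(speed_mps, v_max) / v_max)

def lidar_points_for_frame(rois, sample_roi):
    """Sense Lidar only inside visual ROIs instead of the full sweep."""
    points = []
    for roi in rois:
        points.extend(sample_roi(roi))   # per-ROI Lidar acquisition
    return points

print(lidar_base_rate(5.0))   # 4.0 Hz at 5 m/s under these assumptions
```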