|
Himangi Mittal
I am a Ph.D. student in the Robotics Institute (RI) at Carnegie Mellon University (CMU), working with Prof. Shubham Tulsiani. My research focuses on building physics-grounded AI agents, world models, and differentiable simulation. I am currently collaborating with Microsoft on 4D asset generation research.
I graduated with a Master of Science in Robotics (MSR) from
the Robotics Institute at Carnegie Mellon University where I worked with Prof. Abhinav Gupta and collaborated with Prof. Pedro Morgado at UW-Madison. Before my Master's, I worked as a Research Assistant at CMU with
Prof. David Held at the R-Pad Lab, in collaboration with Pittsburgh-based autonomous driving company,
Argo AI.
During my Master's at CMU, I worked on self-supervised representation learning methods for multimodal audio-visual videos, and as a Research Assistant at CMU, I worked on self-supervised algorithms for real-world 3D LiDAR point clouds.
I have served on the organizing committees of WiCV@CVPR 2025, WiCV@CVPR 2024, and the DEI Social Event@CVPR 2024.
|
|
Neural Physics Simulation / 3D
Multi-Modal Representation Learning / Large Video-Language Models
|
|
Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
Himangi Mittal,
Nakul Agarwal,
Shao-Yuan Lo,
Kwonjoon Lee
[CVPR 2024]
Paper /
arXiv
We leverage a large video-language model to anticipate action sequences that are plausible in the real world. We instill an understanding of action-sequence plausibility in the model by introducing two objective functions: a counterfactual-based plausible action sequence learning loss and a long-horizon action repetition loss.
|
|
|
Learning State-Aware Visual Representations from Audible Interactions
Himangi Mittal,
Pedro Morgado,
Unnat Jain,
Abhinav Gupta
[NeurIPS 2022]
ECCV 2022 Workshop on Visual Object-oriented Learning meets Interaction (VOLI): Discovery, Representations, and Applications
Sight and Sound Workshop (CVPR 2023)
Paper /
arXiv /
Code /
Video
We propose a self-supervised algorithm to learn representations from untrimmed, egocentric videos containing audible interactions.
Our method uses the audio signals in two unique ways: (1) to identify moments in time that are conducive to better self-supervised learning
and (2) to learn representations that focus on the visual state changes caused by audible interactions.
|
Self-Supervised Learning / 3D Point Clouds
Past Research Works (on Graphs)
|
|
Interpreting Context of Images using Scene Graphs
Himangi Mittal,
Ajith Abraham,
Anuja Arora
[International Conference on Big Data Analytics (BDA), 2019]
Paper /
arXiv /
Code
Predicts action and spatial relationships between objects detected by YOLO in images by combining VGG-Net-based visual features with Word2Vec-based semantic features.
|
|
|
Anomaly Detection using Graph Neural Networks
Anshika Chaudhary,
Himangi Mittal,
Anuja Arora
[International Conference on Machine Learning, Big Data, Cloud and Parallel Computing, 2019]
Paper /
Code
A method to capture anomalous behavior in a social network based on the degree, betweenness, and closeness of graph nodes, using Graph Neural Networks (GNNs) in Keras.
|
|
|
STWalk: Learning Trajectory Representations in Temporal Graphs
Supriya Pandhre,
Himangi Mittal,
Manish Gupta,
Vineeth N. Balasubramanian
[ACM India Joint International Conference on Data Science and Management of Data (CoDS-COMAD), 2018]
Paper /
arXiv /
Code
Presents trajectory analysis of spatio-temporal graph nodes using the DeepWalk algorithm in NetworkX (Python) for classification, and detects changing points of interest using SVMs.
|
|
|
Harnessing emotions for depression detection
Sahana Prabhu Muraleedhara,
Himangi Mittal,
Rajesh Varagani,
Sweccha Jha,
Shivendra Singh
[Pattern Analysis and Applications Journal]
Paper
A method for multi-modal depression detection from audio, video, and textual modalities using LSTMs. This work leverages emotions to detect early indications of depression.
|
Academic Service/Volunteer Work
- Reviewer Service: ICCV 2021, AAAI 2022, WACV 2022, CVPR 2022, CVPR 2023 (+ Emergency reviewer), ICCV 2023, NeurIPS 2023, Pattern Recognition Journal, WACV 2024 (+ Emergency reviewer), ACCV 2024, CVPR 2024, ICLR 2024, ICML 2024, WACV 2025, CVPR 2025, ICLR 2025, ICML 2025, BMVC 2025, NeurIPS 2025.
- Workshop Service: Member of the organizing committee at WiCV@CVPR 2025, WiCV@CVPR 2024, the DEI Social Event, and the Challenges/Opportunities for ECRs in Fast-Paced AI Social Event.
- Meta Reviewer Service: WiCV@CVPR 2025, WiCV@CVPR 2024.
- Teaching Assistant: 16-720A: Computer Vision (Fall 2025), 16-824: Visual Learning and Recognition (Spring 2024), 16-825: Learning for 3D Vision (Spring 2023).
- Mentor at CMU AI Undergraduate Mentoring Program (Fall 2022, Spring 2023, Fall 2023, Spring 2024, Fall 2024, Spring 2025).
- Mentor at Spring 2023 CMU Research Mixer for undergraduate students organized by DPAC Undergraduate Research Working Group.
- Volunteer at NeurIPS 2022 High School Outreach Program.
|
|