Sanyam Mehta

I am a builder at heart, currently diving deep into the intersection of computer vision, generative AI, and robotic perception. My goal is to move beyond static datasets and build intelligent systems that can perceive, reason, and interact with the dynamic world around them.

🔬 Research

I am interested in the emergent capabilities of large generative models and how they can be grounded in physical reality.

ICLR 2026

Point Prompting: Counterfactual Tracking with Video Diffusion Models

Sanyam Mehta*, Ayush Shrivastava*, Daniel Geng, Andrew Owens

*Equal Contribution

Abstract: Trackers and video generators solve closely related problems: the former analyze motion, while the latter synthesize it. We show that this connection enables pretrained video diffusion models to perform zero-shot point tracking by simply prompting them to visually mark points as they move over time. We place a distinctively colored marker at the query point, then regenerate the rest of the video from an intermediate noise level. This propagates the marker across frames, tracing the point’s trajectory. To ensure that the marker remains visible in this counterfactual generation, despite such markers being unlikely in natural videos, we use the unedited initial frame as a negative prompt. Through experiments with multiple image-conditioned video diffusion models, we find that these “emergent” tracks outperform those of prior zero-shot methods and persist through occlusions, often obtaining performance that is competitive with specialized self-supervised models.

[Project Page] [Paper] [ICLR 2026]

Generative Priors for Vision-Language Navigation

Current research with Dr. Bernadette Bucher

Embodied agents often struggle with long-horizon instruction following. We are investigating how the rich spatio-temporal priors found in video diffusion models can be adapted to improve Vision-Language Navigation (VLN). By leveraging these generative priors, we aim to give robots a better understanding of scene dynamics and temporal correspondence.

💼 Experience

Computer Vision & ML Engineer @ Gather AI (2021-2024)

Before grad school, I spent 3+ years at Gather AI, a CMU spin-off building the world's first autonomous drone-based inventory system. I started as an ML Ops Engineer, but my curiosity led me to learn computer vision on the fly. I eventually took ownership of core perception architectures, leading to my promotion to ML Engineer.

Mentored by founders Dr. Daniel Maturana (inventor of VoxNet) and Dr. Sankalp Arora, I learned to think critically while shipping code that processed 1.2M+ samples in the real world.

✅ Built 3D Perception: Developed a neural network for 3D occupancy inference, classifying pallet locations with 97%+ accuracy.

✅ Scaled Infrastructure: Engineered data pipelines that handled data from 34K+ drone flights.

✅ Experimented: Benchmarked monocular SLAM systems to improve navigation in feature-poor warehouse environments.

Graduate Student Instructor @ University of Michigan (2024-Present)

I love deconstructing complex topics for others. As a GSI for ROB 550: Robotic Systems Laboratory, I guide graduate students through the "painful but rewarding" process of building autonomy from scratch, from programming robotic arms to implementing Particle Filter SLAM.

🛠️ Building & Exploring

I believe the best way to understand a concept is to build it from scratch.

Building a Motion Capture System

In my 3D Robot Perception class with Dr. Bernadette Bucher, I fell in love with the geometry of vision: fundamental matrices, homographies, and the Direct Linear Transform (DLT).

To put theory into practice, I teamed up with my friend Brandon to build a Motion Capture system from scratch. We are writing our own solvers to triangulate points from multiple camera feeds, turning raw 2D video into precise 3D trajectory data. It's a messy, challenging, and incredibly fun way to master the math behind the code.
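For readers curious what such a solver looks like, here is a minimal sketch of linear (DLT) triangulation in Python with NumPy. It is an illustration only, not the project's actual code: the function names, the camera matrices, and the test point are all made up for the example, and it assumes calibrated cameras with known 3x4 projection matrices.

```python
import numpy as np


def triangulate_dlt(projections, pixels):
    """Triangulate one 3D point from N >= 2 views with the linear DLT method.

    projections: list of 3x4 camera projection matrices (assumed known).
    pixels: list of (u, v) observations of the same point, one per camera.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view gives two linear constraints on the homogeneous point X:
        #   (u * P[2] - P[0]) . X = 0   and   (v * P[2] - P[1]) . X = 0
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)

    # The least-squares solution is the right singular vector associated
    # with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # homogeneous -> Euclidean coordinates


def project(P, point):
    """Project a 3D point to pixel coordinates with projection matrix P."""
    x = P @ np.append(point, 1.0)
    return x[:2] / x[2]


# Hypothetical two-camera rig: shared intrinsics, second camera shifted 0.5 m along x.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

point = np.array([0.2, -0.1, 4.0])  # made-up ground-truth 3D point
observations = [project(P1, point), project(P2, point)]
estimate = triangulate_dlt([P1, P2], observations)
print(estimate)
```

With noise-free observations the estimate matches the original point up to floating-point error. With real detections the linear solution is usually treated as an initial guess and refined by minimizing reprojection error.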

❤️ Why Robotics?

For People: My time at Gather AI opened my eyes to the real-world impact of robotics. I learned that slips, trips, and falls are a leading cause of accidental deaths in warehouses. Seeing our drones collect data from dangerous heights—keeping people safely on the ground—showed me that robotics isn't just about efficiency; it's about protecting lives.

For the Planet: I was heartbroken to learn about the devastation wildfires have caused in California, killing an estimated one-fifth of the world's large giant sequoias. I work towards a future where advanced robots can assist park rangers in monitoring and protecting these natural wonders.

For the Fun of It: And sometimes, the reason is simple: robots are just cool. I love the challenge of making a machine "see" and "think," and that curiosity is what gets me excited to learn every single day.