Shubham Shrivastava
Head of Machine Learning @ Kodiak Robotics | IEEE Senior Member
Kodiak Robotics, Mountain View, CA
Head of Machine Learning
September 2023 - Present
Leading a top-tier team of machine learning engineers building KodiakDriver, the world's safest autonomous driving system. This includes advanced neural networks for lane detection, object detection, and 3D perception, alongside foundation models trained on petabyte-scale data and refined with targeted human oversight. With my team, I am developing a machine learning engine capable of identifying and addressing rare and complex driving scenarios, allowing us to tackle long-tail cases.
Ford Greenfield Labs, Palo Alto, CA
September 2019 - September 2023
September 2022 - September 2023
Led a team of talented machine learning and robotics engineers building vision-centric 3D perception solutions at Ford Autonomy. My work included building an end-to-end 3D perception stack for Ford L2+ vehicles on the road and developing a flexible, scalable machine-learning framework for all ML tasks within Ford.
September 2019 - September 2022
My research included computer vision and advanced machine learning methods such as convolutional neural networks, generative adversarial networks, variational autoencoders, and 3D perception, with significant emphasis on object detection, semantics learning, 3D scene understanding, multi-view geometry, and visual odometry.
In addition to researching and developing novel methodology, I built the complete cloud-based MLOps pipeline, including intelligent data sourcing, auto-annotation, training, model optimization, and deployment (TensorRT C++), to put our prediction engine into production.
Two major projects for which I built end-to-end perception pipelines are (1) Ford Autonomous Shuttles and (2) Ford Factory of the Future: infrastructure-based autonomous vehicle marshaling through assembly plants.
[Topics of Research]
Monocular RGB camera and LiDAR-based 3D object detection, classification, and tracking in both indoor and outdoor environments.
Unsupervised and semi-supervised 6-DoF object pose estimation to reduce the cost of manual data annotation from millions of dollars to zero.
Multi-headed multi-task neural networks for scene understanding incorporating Sim2Real methods for zero-cost training of the networks.
Generative adversarial networks for realistic image generation with semantic and cycle consistency from simulation data, bridging the gap between the simulated and real worlds.
Object pose estimation combining computer vision with traditional methods, including non-linear optimization.
Automated global localization of multiple spatially distributed sensors within the infrastructure to a common coordinate frame using an autonomous robot.
Perception system for localizing objects of various classes to within 10 centimeters and sub-degree orientation accuracy within Ford’s Factory of the Future.
Renesas Electronics America, Inc.
Applications Engineer, Perception R&D - ADAS and Autonomous Driving
March 2017 - September 2019
I worked as part of a very small team building the ADAS and autonomous driving perception reference platform “Perception Quick Start”, which includes end-to-end solutions for camera- and LiDAR-based road feature and object detection.
Developed the complete lane detection pipeline from scratch. The pipeline includes lane pixel extraction using a combination of classical computer vision methods and deep learning, lane detection, polynomial fitting, noise suppression, lane tracking, lane smoothing, confidence computation, lane extrapolation, lane departure warning, lane offset, lane curvature, and lane types.
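As a hedged illustration of the polyfit, lane-offset, and curvature stages mentioned above (a minimal NumPy sketch; the quadratic lane model, function names, and meters-per-pixel scale are my assumptions, not the production Renesas implementation):

```python
import numpy as np

def fit_lane(xs, ys, y_eval, m_per_px=0.05):
    """Fit a quadratic x = f(y) to lane pixels; report offset and curvature.

    xs, ys   : pixel coordinates of detected lane points (image frame)
    y_eval   : row (e.g. image bottom) at which to evaluate the lane
    m_per_px : assumed meters-per-pixel scale (illustrative value)
    """
    # Least-squares quadratic fit: x = a*y^2 + b*y + c
    a, b, c = np.polyfit(ys, xs, deg=2)

    # Lane position at the evaluation row (feeds the lane-offset warning)
    x_at_eval = a * y_eval**2 + b * y_eval + c

    # Radius of curvature R = (1 + (dx/dy)^2)^1.5 / |d2x/dy2| at y_eval
    dxdy = 2 * a * y_eval + b
    radius_px = (1 + dxdy**2) ** 1.5 / max(abs(2 * a), 1e-9)

    return x_at_eval, radius_px * m_per_px

# Example: noisy points along a gently curving lane
ys = np.linspace(0, 480, 50)
xs = 300 + 0.0004 * (480 - ys) ** 2 + np.random.normal(0, 2, ys.shape)
x0, radius_m = fit_lane(xs, ys, y_eval=480)
print(f"lane x at bottom row: {x0:.1f} px, curvature radius: {radius_m:.1f} m")
```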
Developed a C-based computer vision library for basic image processing functions such as image read/write, Hough transforms, edge detection (Canny, horizontal, vertical), colorspace conversions, and image filtering (sharpen, Gaussian smooth, Sobel, emboss, edge). Created an advanced math library for functions such as least-squares polyfit and matrix operations.
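One building block of such a library, sketched here in Python purely for illustration (the original library was written in C): the filters listed above all reduce to a 2D convolution, shown with a Sobel edge kernel.

```python
import numpy as np

def convolve2d(img, kernel):
    """Naive 'valid' 2D convolution, the core primitive behind the
    sharpen/smooth/Sobel/emboss filters listed above."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * flipped)
    return out

# Sobel gradients and edge magnitude
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
img = np.random.rand(8, 8)
gx = convolve2d(img, SOBEL_X)
gy = convolve2d(img, SOBEL_X.T)
edges = np.hypot(gx, gy)
print(edges.shape)  # (6, 6) for a 'valid' convolution of an 8x8 image
```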
Stereo camera calibration, rectification, disparity map generation, flat-road free-space estimation, object detection using V-disparity and 3D density-based clustering, 3D point cloud rendering with 3D bounding boxes, and depth perception.
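The V-disparity idea above can be sketched as follows (illustrative NumPy only; names and sizes are assumptions): each row of the disparity map is histogrammed over disparity values, so a flat road projects to a slanted line and upright obstacles to near-vertical segments, turning free-space estimation into a line-fitting problem.

```python
import numpy as np

def v_disparity(disparity, max_d=64):
    """Build a V-disparity image: one disparity histogram per image row."""
    h, _ = disparity.shape
    vd = np.zeros((h, max_d), dtype=np.int32)
    for row in range(h):
        d = disparity[row]
        d = d[(d >= 0) & (d < max_d)].astype(int)
        vd[row] += np.bincount(d, minlength=max_d)
    return vd

# Toy disparity map: disparity grows toward the bottom (flat road), plus
# a constant-disparity blob standing in for an obstacle
h, w = 120, 160
road = np.tile(np.linspace(0, 40, h)[:, None], (1, w))
road[40:80, 60:100] = 30.0  # obstacle at fixed disparity
vd = v_disparity(road)
print(vd.shape, vd.max())
```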
Developed a dynamic image ROI stabilization module that corrects rotation and translation using angular pose/velocity data by computing and applying a homography at run-time. Developed a general-purpose positioning driver to bring GNSS/IMU data into the perception stack.
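For pure camera rotation, the run-time homography can be formed from the intrinsics K and the rotation R built from IMU angles, H = K R^T K^{-1}. A hedged sketch of that construction (the intrinsics and angles below are made-up values, not the shipped module):

```python
import numpy as np

def stabilization_homography(K, roll, pitch, yaw):
    """Homography that undoes a pure camera rotation: H = K @ R.T @ inv(K).

    roll/pitch/yaw are the rotation (radians) measured by the IMU since the
    reference frame; applying H warps the ROI back toward that reference.
    """
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    R = Rz @ Ry @ Rx
    return K @ R.T @ np.linalg.inv(K)

# Assumed intrinsics for a 1280x720 camera
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
H = stabilization_homography(K, roll=0.01, pitch=-0.005, yaw=0.002)
# Warp a pixel: p' = H @ [u, v, 1], then divide by the last coordinate
p = H @ np.array([640.0, 360.0, 1.0])
print(p[:2] / p[2])
```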
Optimized embedded implementations of algorithms for parallel computing on R-Car SoC hardware accelerators.
Developed the complete V2V solution from scratch for Renesas’ V2X platform, including a CAN framework, GPS/INS driver, GPS+IMU fusion for localization, concise path history computation, path prediction, CSV and KML logging modules, 360-degree lane-level target classification, basic safety applications, and a Qt-based HMI for displaying warnings, vehicle tracking, maps, and debug information.
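A simplified sketch of the concise path history idea (illustrative Python; the greedy error bound and names are my assumptions, in the spirit of the SAE J2735 path-history concept rather than the shipped code): downsample the host vehicle's trace so the retained points reproduce the path within a lateral error bound.

```python
import math

def concise_path_history(points, max_err_m=0.5):
    """Greedy error-bounded downsampling of a path (local x/y, meters).

    Walk the trace and drop intermediate points as long as every dropped
    point stays within max_err_m of the chord between the last kept point
    and the candidate point.
    """
    def chord_err(a, b, p):
        # Perpendicular distance from p to the chord a-b
        (ax, ay), (bx, by), (px, py) = a, b, p
        dx, dy = bx - ax, by - ay
        length = math.hypot(dx, dy)
        if length == 0:
            return math.hypot(px - ax, py - ay)
        return abs(dx * (ay - py) - dy * (ax - px)) / length

    kept = [points[0]]
    anchor = 0
    for i in range(2, len(points)):
        # If any intermediate point strays too far, keep the previous one
        if any(chord_err(points[anchor], points[i], points[j]) > max_err_m
               for j in range(anchor + 1, i)):
            kept.append(points[i - 1])
            anchor = i - 1
    kept.append(points[-1])
    return kept

# Straight segment then a turn: the straight part collapses to its endpoints
trace = [(x, 0.0) for x in range(20)] + [(20.0, y) for y in range(1, 10)]
print(len(trace), "->", len(concise_path_history(trace)))
```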
Changan US Research & Development Center, Inc.
Intelligent Vehicle Engineer (Connected Autonomous Vehicle Research Group)
August 2016 - March 2017
Worked within Changan's connected and autonomous vehicle research team to design and develop various vehicle safety models for 360-degree target classification and warning notifications, with and without line-of-sight requirements.
DELPHI (now known as APTIV)
Embedded Software Engineer
May 2016 - August 2016
Worked with application teams, forward systems algorithm group, and controller design groups to define the functionality, develop algorithms, and implement them in accordance with the V-Model Software Development Life Cycle.
BlackBerry QNX
Software Development Intern (Board Support Package)
January 2016 - May 2016
Developed the BSP (Board Support Package) for custom hardware with an i.MX6 Solo processor and several peripherals. Worked on low-level board bring-up and provided support for the following peripherals.
Support for RAM file system to manipulate files during runtime.
Support for SPI NOR Flash and Parallel NOR Flash mounted as a filesystem at startup.
Support for removable storage (SD, microSD, USB flash), including auto-detection and auto-mounting on attachment.
Support for USB OTG to be used for the Console Service, Mass-Storage Device, and USB-to-Ethernet Adapter attachments.
Added new features to the QNX OS for the BSP, including auto-detection of the attachment type and switching between the device stack (to provide console service), the host stack (to auto-mount mass-storage devices), and the host stack (to provide networking with a USB-to-Ethernet adapter).
The University of Texas at Arlington Research Institute
Research Intern
August 2015 - December 2015
Designed and developed the control GUI for a prosthetic system used to help rehabilitate post-stroke patients. An Arduino controller adaptively adjusted the air-bubble pressure to the desired PSI value at various points on the leg. The GUI lets the user enter the desired PSI value for each air bubble while measuring the current bubble pressure and displaying it in real time.
Used two Arduino UNOs: one sending signals to 32 solenoids controlling airflow from an Alicat mass flow controller into the respective air bubbles, and one receiving sense signals from the 32 corresponding air-pressure sensors. Signals were also sent to the solenoids to deflate the air bubbles when required.
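The control loop this describes amounts to per-bubble bang-bang regulation; a minimal sketch in Python for illustration (the original controllers were Arduinos; read_pressure, set_solenoid, and the tolerance are hypothetical names and values):

```python
def control_bubbles(targets_psi, read_pressure, set_solenoid, tol=0.2):
    """One bang-bang control pass over all 32 air bubbles.

    read_pressure(i)      -> current PSI of bubble i (hypothetical read)
    set_solenoid(i, mode) -> 'inflate' | 'deflate' | 'hold' (hypothetical)
    """
    for i, target in enumerate(targets_psi):
        current = read_pressure(i)
        if current < target - tol:
            set_solenoid(i, 'inflate')  # open flow from the mass flow controller
        elif current > target + tol:
            set_solenoid(i, 'deflate')  # vent the bubble
        else:
            set_solenoid(i, 'hold')     # within tolerance

# Toy demo with fake I/O
import random
log = []
control_bubbles(
    targets_psi=[2.0] * 32,
    read_pressure=lambda i: random.uniform(1.5, 2.5),
    set_solenoid=lambda i, mode: log.append((i, mode)),
)
print(log[:4])
```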
Indian Institute of Science (IISc)
Trainee Engineer
January 2014 - May 2014
Designed and developed a two-dimensional plotter (Smart XY Plotter) at the Mechatronics Lab, IISc, capable of plotting any 2D image with a pen controlled by a solenoid and two stepper motors (together responsible for x-, y-, and z-directional movement).
The control system was governed by an ARM processor (STM32F4 Discovery board), plotting images whose features were extracted using MATLAB.
Developed the GUI in MATLAB, which allows a user to either upload an image of their choice or select another plot type (arbitrary interpolated curves, text, shapes).
Used two timers to control and synchronize the parallel movement of the X and Y motors, producing curves of any desired slope (see the sketch after this list).
The solenoid setup was brought back to its initial position after every plot. Limit switches were used to detect its arrival at the desired reset position.
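The two-timer X/Y synchronization amounts to interleaving step pulses so the pen tracks an arbitrary slope. A Bresenham-style sketch of that step scheduling, written in Python purely for illustration (the actual firmware ran on the STM32F4):

```python
def xy_steps(dx, dy):
    """Bresenham-style interleaving of X and Y stepper pulses.

    Returns a list of 'X'/'Y' step events whose interleaving approximates
    a straight segment of slope dy/dx, the same scheduling the two
    hardware timers performed in firmware.
    """
    steps = []
    err = 0
    if dx >= dy:
        for _ in range(dx):
            steps.append('X')
            err += dy
            if 2 * err >= dx:
                steps.append('Y')
                err -= dx
    else:
        for _ in range(dy):
            steps.append('Y')
            err += dx
            if 2 * err >= dy:
                steps.append('X')
                err -= dy
    return steps

# A segment of slope 3/7: 7 X-steps with 3 Y-steps evenly interleaved
print(''.join(xy_steps(7, 3)))  # XXYXXYXXYX
```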
Keynotes and Media Coverage
In my talk at AutoAI 2024, I delved into the transformative power of early fusion of multiple modalities and introduced GigaFusionNet, our cutting-edge spatio-temporal multimodal fusion architecture at Kodiak. This innovation is designed to enhance perception capabilities by integrating diverse data streams seamlessly.
I also unveiled our Modular Cognitive Architecture (MCA), a robust approach to building our autonomy stack. MCA emphasizes redundancy, end-to-end learnability, interpretability, generalizability, and cost-effective validation, setting a new standard for autonomous driving systems.
Lastly, I emphasized the critical role of vision-language models (VLMs) in autonomous driving, exploring the complexities of data distribution and their impact on performance.
In my conversation on The Brave Technologist podcast, I had the pleasure of diving deep into the transformative power of AI in enabling self-driving trucks with Luke Mulks.
This podcast episode was a fantastic opportunity to talk about my journey in the AI space, the challenges we face, the regulatory landscape, and what the future holds for AI in autonomous vehicles. We’re not just talking about the technology; we’re living it, shaping it, and leading its development to revolutionize the transportation industry.
Catch the full discussion here: https://kite.link/ShubhamS
I participated in a panel debate with Davide Scaramuzza, Sebastian Scherer, Ayoung Kim, Michael Mangan, and Punarjay Chakravarty at IROS 2023.
I gave a keynote at the IROS 2023 workshop “It’s what you see, not where you are!: Localization through Perception Lens”. In my talk, I shed light on how KodiakDriver is trailblazing in the industry with its innovative approaches; notably, its design emphasizes localizing the way humans do, setting a new standard for autonomous systems.
Keynote talk @ Auto.AI USA 2023, delving into the dynamic world of vision-centric perception algorithms, breakthroughs, and bridging the academia-industry gap. 🌍✨ A key message to the research community: It's not just about building models that can do more, but rather empowering them to do "More with Less."
Panel discussion on Advanced Computer Vision Use Cases at Ai4 2023.
we.CONECT interview on the landscape of vision-centric perception for autonomous vehicles.
[Keynote Talk @In.Cabin Sensing Europe] From the outside to the inside – Rethinking the implications of autonomous driving for the communication with the driver (link)
[Panel Discussion @In.Cabin Sensing Europe] On the way to smart cabin – How to find the balance between in-cabin sensing, communication, privacy and user experience? (link)
Education
Stanford University
Graduate Program - Artificial Intelligence
GPA: 4+/4.0
Courses in Machine Learning, Meta-Learning, Multi-Task Learning, Deep Generative Models, Natural Language Processing, Computer Vision, and 3D Reconstruction
August 2020 - December 2022
The University of Texas at Arlington
Master of Science in Electrical Engineering
GPA: 4.0/4.0
August 2014 - August 2016
Visvesvaraya Technological University
Bachelor of Engineering in Electronics and Communication Engineering
GPA: 4.0/4.0, First Class with Distinction, Aggregate Percentage: 86%
Papers and Publications
DatasetEquity: Are All Samples Created Equal? In The Quest For Equity Within Datasets
Shubham Shrivastava, Xianling Zhang, Sushruth Nagesh, Armin Parchami
Propagating State Uncertainty Through Trajectory Forecasting
Boris Ivanovic, Yifeng Lin, Shubham Shrivastava, Punarjay Chakravarty, Marco Pavone
Category-Level Pose Retrieval with Contrastive Features Learnt with Occlusion Augmentation
Georgios Kouros, Shubham Shrivastava, Cédric Picron, Sushruth Nagesh, Punarjay Chakravarty, Tinne Tuytelaars
Deflating Dataset Bias Using Synthetic Data Augmentation
Nikita Jaipuria, Xianling Zhang, Rohan Bhasin, Mayar Arafa, Punarjay Chakravarty, Shubham Shrivastava, Sagar Manglani, Vidya N. Murali
CubifAE-3D: Monocular Camera Space Cubification for Auto-Encoder based 3D Object Detection
Shubham Shrivastava and Punarjay Chakravarty
QAGAN: Adversarial Approach To Learning Domain Invariant Language Features
Shubham Shrivastava and Kaiyue Wang
Meta-Regularization by Enforcing Mutual-Exclusiveness
Shubham Shrivastava, Edwin Pan, and Pankaj Rajak
An A* Curriculum Approach to Reinforcement Learning for RGBD Indoor Robot Navigation
Kaushik Balakrishnan, Punarjay Chakravarty, Shubham Shrivastava
S-BEV: Semantic Birds-Eye View Representation for Weather and Lighting Invariant 3-DoF Localization
Mokshith Voodarla, Shubham Shrivastava, Sagar Manglani, Ankit Vora, Siddharth Agarwal, Punarjay Chakravarty
Sim2Real for Self-Supervised Monocular Depth and Segmentation
Nithin Raghavan, Punarjay Chakravarty, Shubham Shrivastava
V2V Vehicle Safety Communication
Shubham Shrivastava
Patents
[11107228] S Shrivastava. “Realistic Image Perspective Transformation Using Neural Networks”. A system based on a deep neural network to synthesize multiple realistic perspectives of an image.
[20230419539] S Nagesh, S Shrivastava, P Chakravarty. “Vehicle Pose Management”. Pose estimation through unsupervised landmark estimation.
[11189049] P Chakravarty, and S Shrivastava. “Vehicle Neural Network Perception and Localization”. Using Map-Perception Disagreement for Robust Perception and Localization with Generative Models.
[11348278] P Chakravarty, S Shrivastava, G Pandey, and X Wong. “Object Detection”. Automatic Calibration of Automobile Cameras – In the Factory & On The Road.
[11482007] S Shrivastava. “Event-Based Vehicle Pose Estimation Using Monochromatic Imaging”. Event prediction derived 9-DOF Vehicle Pose Estimation in Garage-Like Space using Monocular Cameras.
[11562571] N Raghavan, S Shrivastava, and P Chakravarty. “Vehicle Neural Network”. Zero-Cost Training of Perception Tasks using a Sim-to-Real Architecture with Auxiliary Decoding.
[11619727] S Manglani, P Chakravarty, and S Shrivastava. “Determining Multi-Degree-Of-Freedom Pose For Sensor Calibration”. A robotic calibration device and a method of calculating a global multi-degree of freedom (MDF) pose of an array of cameras affixed to a structure.
[11670088] M Voodarla, P Chakravarty, and S Shrivastava. “Vehicle Neural Network Localization”. Semantic Birds-Eye View Representation for Weather and Lighting Invariant 3 DoF Localization.
[20230186587] S Shrivastava, P Chakravarty, G Pandey. “Three-Dimensional Object Detection”. A method of complete 3D Scene Understanding including Dynamic and Static Object Detection and Tracking from a single RGB camera image using an end-to-end Neural Network.
[11887317] B Ivanovic, Y Lin, S Shrivastava, P Chakravarty, M Pavone. “Object Trajectory Forecasting”. A method of agent trajectory prediction by propagating estimated state uncertainty through object perception.
[20230025152] S Shrivastava, G Pandey, and P Chakravarty. “Object Pose Estimation”. A method of end-to-end self-supervised method of 4-DoF vehicle pose estimation through 3D rendering engine.
[20230097584] P Chakravarty, and S Shrivastava. “Object Pose Estimation”. Automated stitching of multiple-camera feeds without requiring overlaps for unsupervised object pose estimation using 3D model rendering.
[11710254] S Shrivastava, P Chakravarty, and G Pandey. “Neural Network Object Detection”. Multi-Camera assisted Semi-Supervised Monocular 3D Object Detection.
[20230025152] S Shrivastava, G Pandey, and P Chakravarty. “Object Pose Estimation”. Unsupervised end-to-end pose estimation through differentiable rendering.
[Pending | US 17/932021] C Picron, T Tuytelaars, P Chakravarty, and S Shrivastava. “Object Detection with Images”. FFDet: a fast-converging, feature-based two-stage object detector.
[Pending | US 17/818447] S Shrivastava, B Ghadge, and P Chakravarty. “Vehicle Pose Management”. 6-DoF vehicle pose estimation using static monocular cameras utilizing 2D bounding-box and ray casting.
[20230252667] M Xu, S Garg, M Milford, P Chakravarty, and S Shrivastava. “Vehicle Localization”. A solution to the visual localization problem along repeated routes using automatic place-specific hashing of parameters.
[20230267640] P Chakravarty, S Mishra, A Parchami, G Pandey, and S Shrivastava. “Pose Estimation”. 6-DoF pose estimation of objects as viewed from a static fisheye camera utilizing geometric approach and neural networks trained on synthetic data.
[11827203] P Chakravarty, and S Shrivastava. “Multi-Degree-Of-Freedom Pose For Vehicle Navigation”. A weakly-supervised method of 6-DoF pose estimation for known objects by means of keypoints, visual tracking, and non-linear optimization.
[20230136871] P Chakravarty, S Shrivastava, B Ghadge, A Parchami, G Pandey. “Automated Camera Pose Estimation using Traffic Monitoring”. Automated perception node localization in an infrastructure system by monitoring traffic scenes.
[20220214692] P Chakravarty, K Balakrishnan, and S Shrivastava. “Vision-Based Navigation By Coupling Deep Reinforcement Learning And A Path Planning Algorithm”. Robot Navigation Using Vision Embeddings and A* for Improved Training of Deep-Reinforcement Learning Policies.
Peer Reviews
[6 Papers] The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) 2024
[2 Papers] The IEEE International Conference on Computer Vision (ICCV) 2023
The IEEE Robotics and Automation Letters (RA-L) 2023
The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023
The IEEE International Conference on Robotics and Automation (ICRA) 2023
The IEEE Robotics and Automation Letters (RA-L) 2022
[2 Papers] The IEEE International Conference on Robotics and Automation (ICRA) 2022
[2 Papers] The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022
[2 Papers] ASME Journal of Autonomous Vehicles and Systems 2021