San Francisco, CA
Available 2026

Bhavik Upadhyay

Applied AI Engineer

I am an end-to-end AI Engineer who builds across the stack, from autonomous AI agents to GPU kernels. I can design and architect new systems from scratch, but I am just as comfortable jumping into an existing codebase to optimize and maintain it.

I thrive in environments that demand velocity and rapid adaptation. Give me an ambiguous problem, and I will ship the solution.

01 Experience
Sep 2025 — Present

Backend & Systems Engineer

Easley Dunn Productions
  • Engineered a C# transaction subsystem in Unity governing card pack acquisition and player inventory state.
  • Implemented atomic purchase flows with rollback logic to guarantee economy consistency.
  • Reduced screen transition latency by 13% via a unified canvas architecture and decoupled event handling.
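
A minimal Python sketch of an atomic purchase with rollback, to illustrate the idea behind the flow above. It is not the Unity C# code: the `state` dictionary, `purchase_pack`, and the `draw_cards` callback are hypothetical names chosen for the example, and a snapshot-and-restore approach is assumed rather than confirmed.

```python
import copy


class InsufficientFunds(Exception):
    """Raised when the player cannot afford the pack (hypothetical error type)."""


def purchase_pack(state, pack_id, price, draw_cards):
    """Charge the player and add cards, or leave `state` exactly as it was.

    `state` is a dict with "currency" (int) and "inventory" (list).
    `draw_cards` is a caller-supplied function returning the cards for a pack.
    """
    snapshot = copy.deepcopy(state)  # taken before any mutation
    try:
        if state["currency"] < price:
            raise InsufficientFunds(pack_id)
        state["currency"] -= price
        state["inventory"].extend(draw_cards(pack_id))
        return True
    except Exception:
        # Roll back every partial change so currency and inventory stay consistent.
        state.clear()
        state.update(snapshot)
        return False
```

If `draw_cards` fails after the currency has been deducted, the snapshot restore returns the player to the pre-purchase state, so no currency is lost without cards being granted.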
Sep 2024 — May 2025

HPC Researcher

GRIDS @ University of Southern California
  • Shipped OTTER, a C++17 reverse-mode autodiff library with 16 operations.
  • Wrote both memory allocators from scratch: a CPU allocator backed by an mmap pool, and a CUDA allocator with pooled segments, eviction on OOM, and a cudaMalloc retry before the error is propagated.
  • Built the autograd engine: runtime graph construction, backward traversal, and gradient accumulation across 4 parallel workers with mutex-protected writes into shared leaf tensors.
  • Separated memory management and kernel dispatch into abstract interfaces: adding a backend requires no changes to Tensor or autograd code, proven across CPU and CUDA with full kernel parity.
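
The mutex-protected gradient accumulation mentioned above can be sketched in a few lines of Python. OTTER itself is C++17, so this only illustrates the pattern: `Leaf`, `backward_worker`, and `accumulate_parallel` are hypothetical names, and a scalar stands in for a tensor.

```python
import threading


class Leaf:
    """Hypothetical stand-in for a leaf tensor with a shared gradient slot."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0
        self.lock = threading.Lock()  # guards every write to self.grad

    def accumulate(self, g):
        # The read-modify-write on grad is not atomic, so it happens under the lock.
        with self.lock:
            self.grad += g


def backward_worker(leaf, upstream_grads):
    """One worker pushes its share of upstream gradients into the shared leaf."""
    for g in upstream_grads:
        leaf.accumulate(g)


def accumulate_parallel(leaf, grads, num_workers=4):
    """Split `grads` across workers and accumulate them into `leaf.grad`."""
    chunks = [grads[i::num_workers] for i in range(num_workers)]
    threads = [
        threading.Thread(target=backward_worker, args=(leaf, chunk))
        for chunk in chunks
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return leaf.grad
```

Without the lock, two workers could read the same old value of `grad` and one contribution would be lost; holding the lock for the whole update makes each contribution count exactly once.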
Oct 2023 — Dec 2024

Teaching Assistant

University of Southern California
  • Mentored a cohort of 850+ graduate students across Database Systems and NLP.
  • Collaborated with faculty to standardize grading criteria, resulting in a 7% improvement in student outcomes.
02 Education
Aug 2023 — May 2025

Master of Science

University of Southern California

Computer Science

  • CGPA: 3.91 / 4.00
  • Coursework: Machine Learning, Computer Vision, NLP, Robotics, Algorithms, Operating Systems.
Aug 2019 — May 2023

Bachelor of Technology

Jawaharlal Nehru Technological University

Computer Science & IT

  • CGPA: 3.74 / 4.00
  • Awarded the Department Gold Medal for academic excellence.
  • Coursework: Data Structures, Distributed Systems, Computer Architecture.
03 Projects

Triton GPU Kernels

Python · OpenAI Triton · CUDA

Developed custom OpenAI Triton kernels across 7 progressive phases: from elementwise ops and reductions to optimized matrix multiply, FFT, convolution, and Flash Attention v2.

  • Achieved cuBLAS parity (within 1%) on tiled matmul at N=4096.
  • Benchmarked every kernel against PyTorch/cuBLAS baselines using roofline-model analysis.
  • Applied SRAM tiling, L2-reuse group ordering, and memory coalescing.
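
As a rough illustration of the roofline-model analysis mentioned above, the sketch below computes the arithmetic intensity of a square matmul and the attainable throughput under a roofline bound. The function names are hypothetical, the byte count assumes ideal reuse (each matrix crosses DRAM once), and the hardware figures in the usage note are placeholders rather than measurements from this project.

```python
def matmul_arithmetic_intensity(n, bytes_per_element=4):
    """FLOPs per byte for an n x n by n x n matmul, assuming ideal data reuse.

    FLOPs: 2 * n^3 (one multiply and one add per inner-product term).
    Bytes: A and B are read once and C is written once, 3 * n^2 elements total.
    """
    flops = 2 * n ** 3
    bytes_moved = 3 * n * n * bytes_per_element
    return flops / bytes_moved


def attainable_flops(intensity, peak_flops, peak_bandwidth):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_flops, intensity * peak_bandwidth)
```

For example, with placeholder values of `peak_flops=10e12` and `peak_bandwidth=500e9`, `attainable_flops(matmul_arithmetic_intensity(4096), 10e12, 500e9)` hits the compute roof, which is why a well-tiled matmul at N=4096 is expected to be compute-bound rather than bandwidth-bound.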

Weenix OS Kernel

C · x86 · Systems
VFS · Virtual File System
VM · Virtual Memory
QEMU · Testing

Constructed core kernel primitives for process scheduling, thread synchronization, and signal handling. Tested end-to-end in QEMU, debugging concurrency hazards across scheduling, VFS, and VM subsystems.

  • Enforced process isolation and memory protection via VFS and VM subsystems.
  • Enabled kernel-level security guarantees for fork, mmap, and open.
  • Devised a split-debugging strategy in QEMU to isolate critical concurrency bugs across scheduling and VFS.

CIFAR DDPM Generator

DDPM · U-Net · CIFAR-10
U-Net · Backbone

Modular Denoising Diffusion Probabilistic Model trained on CIFAR-10. Implements a full U-Net architecture with custom noise scheduling, a variance-preserving forward process, and progressive denoising.

  • Built a configurable noise scheduler with linear and cosine beta schedules.
  • Implemented sinusoidal timestep embeddings and residual attention blocks.
  • Achieved class-conditional generation across all 10 CIFAR categories with FID 87.94.
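
A minimal sketch of linear and cosine beta schedules, written with the standard library only. The defaults (`1e-4` to `0.02` for linear, offset `s=0.008` and a `0.999` clip for cosine) are the values commonly used in the DDPM literature and are assumptions here; the function names are hypothetical and the project's actual settings are not stated above.

```python
import math


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Betas evenly spaced from beta_start to beta_end over T steps."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def cosine_betas(T, s=0.008, max_beta=0.999):
    """Betas derived from a squared-cosine cumulative signal level."""

    def alpha_bar(t):
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2

    # beta_t = 1 - alpha_bar(t + 1) / alpha_bar(t), clipped to avoid beta near 1.
    return [min(1 - alpha_bar(t + 1) / alpha_bar(t), max_beta) for t in range(T)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta), the signal fraction kept at each step."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out
```

In a variance-preserving forward process, a noisy sample at step t is `sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise`, so the `alpha_bars` list is what a scheduler would typically precompute once and index by timestep.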

Kinova Gen2 Grasping

ROS · AIKIDO · IKFast
BiRRT · Planning
TSR · Grasping
IKFast · IK Solver

End-to-end robotic manipulation pipeline for the Kinova Gen2 arm, integrating BiRRT motion planning, Task-Space Region grasp sampling, and IKFast inverse kinematics via the AIKIDO planning framework.

  • Implemented BiRRT planner with TSR-constrained grasp sampling for reliable object pick-and-place.
  • Integrated IKFast analytic IK solver for real-time joint-space trajectory generation.
  • Validated full pipeline on physical Kinova Gen2 hardware using ROS and AIKIDO.

Snake RL Agent

OpenAI Gym · Stable-Baselines3 · Deep RL

Comparative study of three deep RL agents (A2C, DQN, PPO) trained on a custom OpenAI Gym Snake environment, with reward shaping and curriculum scheduling to accelerate convergence.

  • Built a custom Gym-compatible environment with configurable grid sizes and rendering.
  • Benchmarked A2C, DQN, and PPO; PPO achieved a 60% reduction in training time to target score.
  • Reward shaping and curriculum scheduling pushed peak score to 16 points.

CoverageAgent

LangGraph · Gemini 2.5 · Braintrust · E2B
Braintrust · LLM Tracing
E2B · Sandbox

End-to-end AI system that identifies coverage gaps in real Python codebases and autonomously writes, validates, and commits passing tests.

  • Designed a 6-agent LangGraph pipeline spanning gap analysis, generation, eval, and auto-commit.
  • Added multi-stage output filtering before execution as a production-grade LLM reliability guard.
  • Sandboxed test runs in Firecracker micro-VMs with full Braintrust LLM call tracing.

Model Architecture Taxonomy

Torch JIT · GNN · Graph ML · NetworkX

Compute-graph pipeline that analyzes PyTorch model topology across five signals to recommend kernel optimizations and hardware placement. Recommendations are grounded in graph structure alone; no model name or documentation is required.

  • Each recommendation cites the graph evidence that triggered it and what contradicted it.
  • Arithmetic intensity profiled per op type and matched against chip compute ceilings.
  • Single trace per model; classification, technique matching, and hardware placement all run weight-free.

Stock Trading Platform

Full-Stack · REST API · Cloud
Angular · Frontend
Node / AWS · Backend
Android · Mobile

Full-stack stock research and paper trading platform: single-page web app, REST API, and native Android client, all deployed on AWS.

  • Built real-time stock search and data visualisation pipeline with an Angular SPA backed by a Node/Express REST API.
  • Implemented a simulated trading engine with watchlist management and portfolio P&L analytics.
  • Delivered a native Android client with feature parity to the web app, deployed across AWS EC2 and S3.
04 Archive
May 2025
Multi-Backbone Waste Classifier
9-class waste image classifier benchmarking four frozen ImageNet backbones (VGG16, ResNet101, ResNet50, EfficientNetB0) with augmentation and early stopping.
Oct 2024
Monte Carlo Localization
Particle filter–based robot localization in ROS, fusing lidar measurements with an EKF for position estimation.
Oct 2024
Robot Behavioral Cloning
Imitation learning agent trained via behavioral cloning on expert demonstrations in MuJoCo continuous-control environments.
Feb 2024
Traffic Shaper Simulator
Multithreaded traffic shaping simulator in C implementing token-bucket and leaky-bucket algorithms with mutex-guarded queues.
Dec 2023
Hinglish Language Detection
Code-switched Hinglish language detection using a BiLSTM model with FastText subword embeddings.
Oct 2023
HMM POS Tagger
Hidden Markov Model part-of-speech tagger with Viterbi decoding, built for the USC NLP course.
May 2023
SWINDetector
Deepfake detection pipeline using a SWIN Transformer backbone fine-tuned on FaceForensics++ via HuggingFace.
See more on GitHub
github.com/bhavikupadhyay
05 Skills
Languages
  • Python
  • C++ / C / C#
  • TypeScript / JavaScript
  • CUDA
  • R
ML & Frameworks
  • PyTorch / TensorFlow
  • OpenAI Triton
  • Hugging Face / LangChain
  • Scikit-Learn / OpenCV
  • Weights & Biases
Systems & Infra
  • Docker / Kubernetes
  • AWS / GCP
  • CMake / Ninja
  • GitHub Actions
  • PostgreSQL / MongoDB
  • Redis
Web
  • React / Next.js
  • Angular
  • Express / Django / FastAPI
  • Framer Motion / Lenis
  • Bootstrap / Tailwind
Robotics
  • ROS
  • OpenAI Gym / Stable-Baselines3
  • MuJoCo
  • AIKIDO
Data
  • NumPy / Pandas
  • Matplotlib
  • Jupyter
  • Pinecone / Chroma
  • PySpark
06 Certifications
2021

Machine Learning

Stanford University / Coursera

Andrew Ng's ML course covering supervised and unsupervised learning, SVMs, neural networks, and ML system design.

2021

Deep Learning Specialization

DeepLearning.AI / Coursera

Five-course series by Andrew Ng covering neural networks, hyperparameter tuning, CNNs, and sequence models (RNNs, LSTMs, Transformers).

07 Contact

Let's build something.

Open to full-time roles in ML infrastructure, systems engineering, and GPU optimization. If you have an interesting problem, I'd love to hear about it.

Email: bkupadhy@usc.edu
GitHub: bhavikupadhyay
LinkedIn: bhavik-upadhyay