San Francisco, CA | Available 2026

Bhavik Upadhyay

Applied AI Engineer

I am an end-to-end AI Engineer who builds across the stack, from autonomous AI agents to GPU kernels. I can design and architect new systems from scratch, but I am just as comfortable jumping into an existing codebase to optimize and maintain it.

I thrive in environments that demand velocity and rapid adaptation. Give me an ambiguous problem, and I will ship the solution.

01 Experience

Sep 2025 — Present

Backend & Systems Engineer

Easley Dunn Productions

Engineered a C# transaction subsystem in Unity governing card pack acquisition and player inventory state.
Implemented atomic purchase flows with rollback logic to guarantee economy consistency.
Reduced screen transition latency by 13% via a unified canvas architecture and decoupled event handling.

Sep 2024 — May 2025

HPC Researcher

GRIDS @ University of Southern California

Shipped OTTER, a C++17 reverse-mode autodiff library with 16 operations.
Wrote both memory allocators from scratch: CPU on an mmap pool, CUDA with pooled segments, OOM eviction, and cudaMalloc retry before propagating the error.
Built the autograd engine: runtime graph construction, backward traversal, and gradient accumulation across 4 parallel workers with mutex-protected writes into shared leaf tensors.
Separated memory management and kernel dispatch into abstract interfaces: adding a backend requires no changes to Tensor or autograd code, proven across CPU and CUDA with full kernel parity.

Oct 2023 — Dec 2024

Teaching Assistant

University of Southern California

Mentored a cohort of 850+ graduate students across Database Systems and NLP.
Collaborated with faculty to standardize grading criteria, resulting in a 7% improvement in student outcomes.

02 Education

Aug 2023 — May 2025

Master of Science

University of Southern California

Computer Science

CGPA: 3.91 / 4.00
Coursework: Machine Learning, Computer Vision, NLP, Robotics, Algorithms, Operating Systems.

Aug 2019 — May 2023

Bachelor of Technology

Jawaharlal Nehru Technological University

Computer Science & IT

CGPA: 3.74 / 4.00
Awarded the Department Gold Medal for academic excellence.
Coursework: Data Structures, Distributed Systems, Computer Architecture.

03 Projects

Triton GPU Kernels

Python, OpenAI Triton, CUDA


Developed custom OpenAI Triton kernels across 7 progressive phases: from elementwise ops and reductions to optimized matrix multiply, FFT, convolution, and Flash Attention v2.

Achieved cuBLAS-parity (within 1%) on tiled matmul at N=4096.
Benchmarked every kernel against PyTorch/cuBLAS baselines using roofline-model analysis.
Applied SRAM tiling, L2-reuse group ordering, and memory coalescing.
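For readers unfamiliar with the roofline model used in the benchmarks above, here is a minimal sketch of how it bounds a square matmul. The peak-FLOPs and bandwidth figures are placeholder assumptions, not measurements from this project, and the traffic model assumes each matrix crosses DRAM exactly once.

```python
def matmul_arithmetic_intensity(n: int, bytes_per_elem: int = 4) -> float:
    """FLOPs per byte for an n x n matmul, assuming A and B are read once and C is written once."""
    flops = 2 * n ** 3                        # one multiply and one add per inner-product term
    bytes_moved = 3 * n * n * bytes_per_elem  # A, B, and C each touched once
    return flops / bytes_moved


def roofline_bound(intensity: float, peak_flops: float, peak_bandwidth: float) -> float:
    """Attainable FLOP/s: the lower of the compute roof and the memory roof."""
    return min(peak_flops, intensity * peak_bandwidth)


# Hypothetical hardware numbers, chosen only for illustration.
PEAK_FLOPS = 19.5e12      # FLOP/s
PEAK_BANDWIDTH = 1.5e12   # bytes/s

ai = matmul_arithmetic_intensity(4096)
bound = roofline_bound(ai, PEAK_FLOPS, PEAK_BANDWIDTH)
print(f"intensity = {ai:.1f} FLOP/byte, bound = {bound:.3e} FLOP/s")
```

Under these assumed numbers the N=4096 matmul sits well to the right of the ridge point, so it is compute-bound and the compute roof is the relevant baseline.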

CoverageAgent

LangGraph, litellm, GitHub Actions


~70% Fewer LLM Calls

7 Model Providers

An AI agent pipeline that fills a codebase's missing tests by reading the surrounding repo. It gathers context, drafts a test for each uncovered gap, then verifies the test holds before keeping it. Ships as a CI-native GitHub Action.

Gated every test behind deterministic checks: 3-run flakiness detection plus a full-suite regression guard.
Clustered sibling branches into one agent conversation, cutting LLM calls by ~70% to keep runs inside free-tier limits.
Ran tests in the caller's own GitHub Actions runner across 7 BYOK model providers.
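The two deterministic gates described above could be sketched roughly as follows. This is an illustration, not the CoverageAgent source: `run_test` and `run_full_suite` are hypothetical callables that return True on a passing run.

```python
from typing import Callable


def passes_flakiness_gate(run_test: Callable[[], bool], runs: int = 3) -> bool:
    """Accept a candidate test only if it passes on every one of `runs` executions."""
    return all(run_test() for _ in range(runs))


def passes_regression_guard(run_full_suite: Callable[[], bool]) -> bool:
    """Accept only if the whole suite still passes with the new test included."""
    return run_full_suite()


def keep_test(run_test: Callable[[], bool],
              run_full_suite: Callable[[], bool],
              runs: int = 3) -> bool:
    # Cheap gate first, so the full suite only runs for stable candidates.
    return passes_flakiness_gate(run_test, runs) and passes_regression_guard(run_full_suite)


# A candidate that passes twice and then fails is rejected as flaky.
outcomes = iter([True, True, False])
flaky_result = keep_test(lambda: next(outcomes), lambda: True)
stable_result = keep_test(lambda: True, lambda: True)
print(flaky_result, stable_result)  # False True
```

Ordering the gates this way keeps the expensive full-suite run off the path for candidates that are already known to be unstable.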

Model Architecture Taxonomy

Torch JIT, GNN, Graph ML, NetworkX


Compute-graph pipeline that analyzes PyTorch model topology across five signals to recommend kernel optimizations and hardware placement — grounded in graph structure, no model name or documentation required.

Each recommendation cites the graph evidence that triggered it and what contradicted it.
Arithmetic intensity profiled per op type and matched against chip compute ceilings.
Single trace per model; classification, technique matching, and hardware placement all run weight-free.

Weenix OS Kernel

C, x86, Systems


VFS Virtual File System

VM Virtual Memory

QEMU Testing

Constructed core kernel primitives for process scheduling, thread synchronization, and signal handling. Tested end-to-end in QEMU, debugging concurrency hazards across scheduling, VFS, and VM subsystems.

Enforced process isolation and memory protection via VFS and VM subsystems.
Enabled kernel-level security guarantees for fork, mmap, and open.
Devised a split-debugging strategy in QEMU to isolate critical concurrency bugs across scheduling and VFS.

CIFAR DDPM Generator

DDPM, U-Net, CIFAR-10


U-Net Backbone

87.94 FID Score

Modular Denoising Diffusion Probabilistic Model trained on CIFAR-10. Implements a full UNet architecture with custom noise scheduling, variance-preserving forward process, and progressive denoising.

Built a configurable noise scheduler with linear and cosine beta schedules.
Implemented sinusoidal timestep embeddings and residual attention blocks.
Achieved class-conditional generation across all 10 CIFAR categories with FID 87.94.
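As an illustration of the two schedule types named above, here is a minimal pure-Python sketch. The default values (`beta_start`, `beta_end`, the offset `s`, and the clip at 0.999) are commonly used settings assumed for the example, not values confirmed for this project.

```python
import math


def linear_beta_schedule(timesteps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Betas spaced evenly between beta_start and beta_end."""
    if timesteps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]


def cosine_beta_schedule(timesteps: int, s: float = 0.008, max_beta: float = 0.999) -> list[float]:
    """Betas derived from a squared-cosine cumulative signal level alpha_bar(t)."""
    def alpha_bar(t: int) -> float:
        return math.cos((t / timesteps + s) / (1 + s) * math.pi / 2) ** 2

    # beta_t = 1 - alpha_bar(t + 1) / alpha_bar(t), clipped to avoid a singular final step.
    return [min(1 - alpha_bar(t + 1) / alpha_bar(t), max_beta) for t in range(timesteps)]


def cumulative_alphas(betas: list[float]) -> list[float]:
    """Running product of (1 - beta): the fraction of signal variance kept at each step."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


linear = linear_beta_schedule(1000)
cosine = cosine_beta_schedule(1000)
print(linear[0], linear[-1])
print(cumulative_alphas(cosine)[-1])
```

In the variance-preserving forward process, a sample at step t is formed as sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, so the cumulative product is the quantity the schedule ultimately controls.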

Kinova Gen2 Grasping

ROS, AIKIDO, IKFast


BiRRT Planning

TSR Grasping

IKFast IK Solver

End-to-end robotic manipulation pipeline for the Kinova Gen2 arm, integrating BiRRT motion planning, Task-Space Region grasp sampling, and IKFast inverse kinematics via the AIKIDO planning framework.

Implemented a BiRRT planner with TSR-constrained grasp sampling for reliable object pick-and-place.
Integrated the IKFast analytic IK solver for real-time joint-space trajectory generation.
Validated the full pipeline on physical Kinova Gen2 hardware using ROS and AIKIDO.

Snake RL Agent

OpenAI Gym, Stable-Baselines3, Deep RL


3 Algorithms

60% Training Speedup

16 Peak Score

Comparative study of three deep RL agents (A2C, DQN, PPO) trained on a custom OpenAI Gym Snake environment, with reward shaping and curriculum scheduling to accelerate convergence.

Built a custom Gym-compatible environment with configurable grid sizes and rendering.
Benchmarked A2C, DQN, and PPO; PPO achieved a 60% reduction in training time to target score.
Reward shaping and curriculum scheduling pushed peak score to 16 points.

Stock Trading Platform

Full-Stack, REST API, Cloud


Angular Frontend

Node / AWS Backend

Android Mobile

Full-stack stock research and paper trading platform: single-page web app, REST API, and native Android client, all deployed on AWS.

Built a real-time stock search and data visualisation pipeline with an Angular SPA backed by a Node/Express REST API.
Implemented a simulated trading engine with watchlist management and portfolio P&L analytics.
Delivered a native Android client with feature parity to the web app, deployed across AWS EC2 and S3.


04 Archive

May 2025

Multi-Backbone Waste Classifier

9-class waste image classifier benchmarking four frozen ImageNet backbones (VGG16, ResNet101, ResNet50, EfficientNetB0) with augmentation and early stopping.

Oct 2024

Monte Carlo Localization

Particle filter–based robot localization in ROS, fusing lidar measurements with an EKF for position estimation.

Oct 2024

Robot Behavioral Cloning

Imitation learning agent trained via behavioral cloning on expert demonstrations in MuJoCo continuous-control environments.

Feb 2024

Traffic Shaper Simulator

Multithreaded traffic shaping simulator in C implementing token-bucket and leaky-bucket algorithms with mutex-guarded queues.

Dec 2023

Hinglish Language Detection

Code-switched Hinglish language detection using a BiLSTM model with FastText subword embeddings.

Oct 2023

HMM POS Tagger

Hidden Markov Model part-of-speech tagger with Viterbi decoding, built for the USC NLP course.

May 2023

SWINDetector

Deepfake detection pipeline using a SWIN Transformer backbone fine-tuned on FaceForensics++ via HuggingFace.

See more on GitHub

github.com/bhavikupadhyay

05 Skills

Languages

Python
C++ / C / C#
TypeScript / JavaScript
CUDA
R

ML & Frameworks

PyTorch / TensorFlow
OpenAI Triton
Hugging Face / LangChain
Scikit-Learn / OpenCV
Weights & Biases

Systems & Infra

Docker / Kubernetes
AWS / GCP
CMake / Ninja
GitHub Actions
PostgreSQL / MongoDB
Redis

Web

React / Next.js
Angular
Express / Django / FastAPI
Framer Motion / Lenis
Bootstrap / Tailwind

Robotics

ROS
OpenAI Gym / Stable-Baselines3
MuJoCo
AIKIDO

Data

NumPy / Pandas
Matplotlib
Jupyter
Pinecone / Chroma
PySpark


06 Certifications

2021

Machine Learning

Stanford University / Coursera

Andrew Ng's ML course covering supervised and unsupervised learning, SVMs, neural networks, and ML system design.


2021

Deep Learning Specialization

DeepLearning.AI / Coursera

Five-course series by Andrew Ng covering neural networks, hyperparameter tuning, CNNs, and sequence models (RNNs, LSTMs, Transformers).


07 Contact

Let's build something.

Open to full-time roles in ML infrastructure, systems engineering, and GPU optimization. If you have an interesting problem, I'd love to hear about it.

Email: bkupadhy@usc.edu | GitHub: bhavikupadhyay | LinkedIn: bhavik-upadhyay

