Bridging foundational reinforcement learning research and production AI engineering at a scale that serves billions of machines every day.
I am a researcher and applied scientist (most people call me Shaw) with nearly a decade of experience building artificial intelligence at scale. My work sits at the intersection of deep reinforcement learning, multiagent coordination, and large language models, spanning theoretical contributions and production deployment.
Currently captivated by: how lessons from evolutionary multiagent RL transfer to the post-training of large language models. How should "exploration" adapt when the agent is a 200B-parameter reasoner?
At Microsoft, I lead research and engineering efforts that apply reinforcement learning to fine-tune large language models for cybersecurity and cloud defense, serving billions of Defender queries every day. A growing thread of this work involves reverse-engineering and fine-tuning large language models for security-specific reasoning, where the constraints of latency, scale, and adversarial robustness surface interesting research questions and engineering challenges. This work spans engineering, Responsible AI, and the full research-to-production lifecycle. Earlier, at Intel Labs, I applied reinforcement learning to accelerator-aware policy optimization, achieving up to 78% inference speed-up on BERT and ResNet workloads on Intel's NNP-I chip.
PhD in Robotics from Oregon State University, where I worked with Dr. Kagan Tumer on a long-standing question: how do you assign credit in multi-robot systems where success is collective and emergent? My thesis answered that question with memory-augmented architectures that let robots learn to cooperate without hand-engineered rewards.
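One classic formalization of that credit-assignment question is the difference reward: score each agent by the global reward with and without its contribution, so credit reflects marginal impact rather than shared outcome. A toy sketch (the coverage objective and function names here are illustrative assumptions, not results from the thesis):

```python
# Toy difference-reward credit assignment for a team sharing one global reward.
# D_i = G(joint actions) - G(joint actions with agent i removed).

def global_reward(actions):
    # Illustrative coupled objective: reward covering distinct targets.
    return len(set(actions))

def difference_rewards(actions):
    rewards = []
    for i in range(len(actions)):
        counterfactual = actions[:i] + actions[i + 1:]
        rewards.append(global_reward(actions) - global_reward(counterfactual))
    return rewards

# Two agents covering the same target earn no individual credit;
# the agent covering a unique target earns full credit.
print(difference_rewards(["A", "A", "B"]))  # -> [0, 0, 1]
```

The point of the counterfactual is alignment: an agent that improves its own difference reward necessarily improves the team's global reward, without any hand-engineered shaping.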
Off the clock, I like to paint, dive, and spend time thinking about why biological intelligence is so sample-efficient. The same curiosity that pulled me into physics and mathematics as an undergrad still runs the show.
Sample efficiency, exploration, and credit assignment: particularly the interplay between gradient-based optimization and population-based search. Foundational work on evolutionary RL and collaborative learner portfolios.
Post-training large language models with reinforcement learning: fine-tuning, reward modeling, and alignment for domain-specific reasoning, especially in adversarial and high-stakes settings like security.
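A common building block in that reward-modeling pipeline is a Bradley-Terry preference loss over pairs of responses. A minimal sketch (the function name and scalar inputs are illustrative assumptions, standing in for scores from a learned reward model):

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): near zero when the reward model
    scores the preferred response well above the rejected one."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Correctly ordered pairs with a wide margin incur little loss;
# misordered pairs are penalized heavily.
print(preference_loss(2.0, 0.0))  # small
print(preference_loss(0.0, 2.0))  # large
```

In practice the scalars come from a reward model head over (prompt, response) pairs, and the resulting reward signal drives the RL fine-tuning step.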
Real-world cybersecurity research that ships: applying learning systems to threat detection and cloud defense at a scale that affects billions of machines every day, where reliability, latency, and adversarial robustness are first-class constraints, not afterthoughts.
Coordination under sparse, coupled rewards; dynamic skill selection; and learning intrinsic reward structures that allow heterogeneous agents to cooperate without hand-engineered shaping.
Real-world multi-robot control with reinforcement learning: pursuing the long-running question of how embodied agents learn dexterous, generalizable behavior from limited supervision.