Bridging foundational reinforcement learning research and production AI engineering at a scale that serves billions of machines every day.
I am a researcher and applied scientist (most people call me Shaw) with nearly a decade of experience building artificial intelligence at scale. My work sits at the intersection of deep reinforcement learning, multiagent coordination, and large language models, spanning theoretical contributions and production deployment.
Currently captivated by: how lessons from evolutionary multiagent RL transfer to the post-training of large language models. How should "exploration" adapt when the agent is a 200B-parameter reasoner?
At Microsoft, I lead research and engineering efforts that apply reinforcement learning to fine-tune large language models for cybersecurity and cloud defense, serving billions of Defender queries every day. A growing thread of this work involves reverse-engineering and fine-tuning large language models for security-specific reasoning, where the constraints of latency, scale, and adversarial robustness surface interesting research questions and engineering challenges. This work spans engineering, Responsible AI, and the full research-to-production lifecycle. Earlier, at Intel Labs, I applied reinforcement learning to accelerator-aware policy optimization, achieving up to 78% inference speed-up on BERT and ResNet workloads on Intel's NNP-I chip.
PhD in Robotics from Oregon State University, where I worked with Dr. Kagan Tumer on a long-standing question: how do you assign credit in multi-robot systems where success is collective and emergent? My thesis answered that question with memory-augmented architectures that let robots learn to cooperate without hand-engineered rewards.
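One classic formalization of that credit-assignment question is the difference reward: score each agent by the global reward with and without its contribution, so credit reflects marginal impact rather than shared outcome. A toy sketch (the coverage objective and function names here are illustrative assumptions, not results from the thesis):

```python
# Toy difference-reward credit assignment for a team sharing one global reward.
# D_i = G(joint actions) - G(joint actions with agent i removed).

def global_reward(actions):
    # Illustrative coupled objective: reward covering distinct targets.
    return len(set(actions))

def difference_rewards(actions):
    rewards = []
    for i in range(len(actions)):
        counterfactual = actions[:i] + actions[i + 1:]
        rewards.append(global_reward(actions) - global_reward(counterfactual))
    return rewards

# Two agents covering the same target earn no individual credit;
# the agent covering a unique target earns full credit.
print(difference_rewards(["A", "A", "B"]))  # -> [0, 0, 1]
```

The point of the counterfactual is alignment: an agent that improves its own difference reward necessarily improves the team's global reward, without any hand-engineered shaping.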
Off the clock, I like to paint, dive, and spend time thinking about why biological intelligence is so sample-efficient. The same curiosity that pulled me into physics and mathematics as an undergrad still runs the show.
Sample efficiency, exploration, and credit assignment: particularly the interplay between gradient-based optimization and population-based search. Foundational work on evolutionary RL and collaborative learner portfolios.
Post-training large language models with reinforcement learning: fine-tuning, reward modeling, and alignment for domain-specific reasoning, especially in adversarial and high-stakes settings like security.
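A common building block in that reward-modeling pipeline is a Bradley-Terry preference loss over pairs of responses. A minimal sketch (the function name and scalar inputs are illustrative assumptions, standing in for scores from a learned reward model):

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): near zero when the reward model
    scores the preferred response well above the rejected one."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Correctly ordered pairs with a wide margin incur little loss;
# misordered pairs are penalized heavily.
print(preference_loss(2.0, 0.0))  # small
print(preference_loss(0.0, 2.0))  # large
```

In practice the scalars come from a reward model head over (prompt, response) pairs, and the resulting reward signal drives the RL fine-tuning step.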
Real-world cybersecurity research that ships: applying learning systems to threat detection and cloud defense at a scale that affects billions of machines every day, where reliability, latency, and adversarial robustness are first-class constraints, not afterthoughts.
Coordination under sparse, coupled rewards; dynamic skill selection; and learning intrinsic reward structures that allow heterogeneous agents to cooperate without hand-engineered shaping.
Real-world multi-robot control with reinforcement learning: pursuing the long-running question of how embodied agents learn dexterous, generalizable behavior from limited supervision.