I'm currently a partner research manager at Microsoft Research Asia (MSRA).
I received my Ph.D. in Computer Science and Technology from Tsinghua University in 2015 and my B.S. degree in Software Engineering from Beihang University in 2010.
I was a visiting student at UCLA from September 2013 to September 2014. In 2015, I joined MSRA.
ICML’26 Pull Requests as a Training Signal for Repo-Level Code Editing
ICML’26 Beyond Sunk Costs: Boosting LLM Pre-training Efficiency via Orthogonal Growth of Mixture-of-Experts
SoCC’26 SlimeMold: Scaling Distributed Hardware Load Balancers via a Unified Giant Connection Table
ACL’26 Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training
SIGMOD’26 CoddSpeed: Hardware Accelerated Query Processing in Microsoft Fabric
MLSys’26 Virtual Machine NUMA Placement at Scale: Learning the Norm, Shielding the Tail
FSE’26 TSGuard: Automated User-Centric Incident Diagnosis for AI Workloads in the Cloud
NSDI’26 SmartNIC-Enabled Live Migration for Storage-Optimized VMs with PYROCUMULUS
ASPLOS’26 MSCCL++: Rethinking GPU Communication Abstractions for AI Inference
ICML’25 Optimizing Large Language Model Training Using FP4 Quantization
ICLR’25 Integrative Decoding: Improve Factuality via Implicit Self-consistency
ICLR’25 Automated Proof Generation for Rust Code via Self-Evolution
AAAI’25 Enhancing Large Language Model Performance with Gradient-Based Parameter Selection
MICRO’24 NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
ATC’24 SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation
SOSP’23 SPFresh: Incremental In-Place Update for Billion-Scale Vector Search
MLSys’23 Tutel: Adaptive Mixture-of-Experts at Scale
NSDI’23 ARK: GPU-driven Code Execution for Distributed Deep Learning
ASPLOS’23 ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning
CoNEXT’22 PipeDevice: A Hardware-Software Co-Design Approach to Intra-Host Container Communication
NeurIPS’22 An Adaptive Deep RL Method for Non-Stationary Environments with Piecewise Stable Context
ICWS’22 RuleCache: Accelerating Web Application Firewalls by On-line Learning Traffic Patterns
ATC’22 PilotFish: Harvesting Free Cycles of Cloud Gaming with Deep Learning Training
INFOCOM’21 NFD: Using Behavior Models to Develop Cross-Platform Network Functions
ATC’20 NetKernel: Making Network Stack Part of the Virtualized Infrastructure
NSDI’19 Direct Universal Access: Making Data Center Resources Available to FPGA
ICNP’18 Micro-Burst in Data Centers: Observations, Analysis, and Mitigations
NSDI’18 Multi-Path Transport for RDMA in Datacenters
CoNEXT’17 Tagger: Practical PFC Deadlock Prevention in Data Center Networks
SIGCOMM’16 ClickNP: Highly Flexible and High-performance Network Processing with Reconfigurable Hardware
ATC’16 Fast and Cautious: Leveraging Multi-path Diversity for Transport Loss Recovery in Data Centers
EuroSys’16 TFC: Token Flow Control in Data Center Networks
NSDI’14 Catch the Whole Lot in an Action: Rapid Precise Packet Loss Notification in Data Center