Publications
Here is the link to my Google Scholar profile, which summarizes my latest publications and citations.
Conferences
Enhancing Large Language Model Performance with Gradient-Based Parameter Selection
Haoling Li, Xin Zhang, Xiao Liu, Yeyun Gong, Yifan Wang, Qi Chen, and Peng Cheng
AAAI 2025. (To Appear)
MINA: Fine-grained In-network Aggregation Resource Scheduling for Machine Learning Service
Shichen Dong, Zhixiong Niu, Mingchao Zhang, Zhiying Xu, Chuntao Hu, Pengzhi Zhu, Qingchun Song, Lei Qu, Peng Cheng, Cam-Tu Nguyen, Shaoling Sun, Xiaohu Xu, Yongqiang Xiong, Wei Wang, and Xiaoliang Wang
IEEE INFOCOM 2025. (To Appear)
NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
Zhe Zhou, Yiqi Chen, Tao Zhang, Yang Wang, Ran Shu, Shuotao Xu, Peng Cheng, Lei Qu, Yongqiang Xiong, and Guangyu Sun
MICRO 2024.
SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation
Yifan Xiong, Yuting Jiang, Ziyue Yang, Lei Qu, Guoshuai Zhao, Shuguang Liu, Dong Zhong, Boris Pinzur, Jie Zhang, Yang Wang, Jithin Jose, Hossein Pourreza, Jeff Baxter, Kushal Datta, Prabhat Ram, Luke Melton, Joe Chau, Peng Cheng, Yongqiang Xiong, and Lidong Zhou
USENIX ATC 2024. (Best Paper Award)
SPFresh: Incremental In-Place Update for Billion-Scale Vector Search
Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, Peng Cheng, and Mao Yang
SOSP 2023.
Tutel: Adaptive Mixture-of-Experts at Scale
Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Salas, Jithin Jose, Prabhat Ram, Hoyuen Chau, Peng Cheng, Fan Yang, Mao Yang, and Yongqiang Xiong
MLSys 2023.
ARK: GPU-driven Code Execution for Distributed Deep Learning
Changho Hwang, KyoungSoo Park, Ran Shu, Xinyuan Qu, Peng Cheng, and Yongqiang Xiong
NSDI 2023.
ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning
Diandian Gu, Yihao Zhao, Yinmin Zhong, Yifan Xiong, Zhenhua Han, Peng Cheng, Fan Yang, Gang Huang, Xin Jin, and Xuanzhe Liu
ASPLOS 2023.
PipeDevice: A Hardware-Software Co-Design Approach to Intra-Host Container Communication
Qiang Su, Chuanwen Wang, Zhixiong Niu, Ran Shu, Peng Cheng, Yongqiang Xiong, Dongsu Han, Chun Jason Xue, and Hong Xu
CoNEXT 2022.
An Adaptive Deep RL Method for Non-Stationary Environments with Piecewise Stable Context
Xiaoyu Chen, Xiangming Zhu, Yufeng Zheng, Pushi Zhang, Li Zhao, Wenxue Cheng, Peng Cheng, Yongqiang Xiong, Tao Qin, Jianyu Chen, and Tie-Yan Liu
NeurIPS 2022.
RuleCache: Accelerating Web Application Firewalls by On-line Learning Traffic Patterns
Xiaoyi Chen, Qingni Shen, Peng Cheng, Yongqiang Xiong, and Zhonghai Wu
IEEE ICWS 2022.
PilotFish: Harvesting Free Cycles of Cloud Gaming with Deep Learning Training
Wei Zhang, Binghao Chen, Zhenhua Han, Quan Chen, Peng Cheng, Fan Yang, Ran Shu, Yuqing Yang, and Minyi Guo
USENIX ATC 2022.
Moneo: Non-intrusive Fine-grained Monitor for AI Infrastructure
Yuting Jiang, Yifan Xiong, Lei Qu, Cheng Luo, Chen Tian, Peng Cheng, and Yongqiang Xiong
IEEE ICC 2022.
NFD: Using Behavior Models to Develop Cross-Platform Network Functions
Hongyi Huang, Wenfei Wu, Yongchao He, Bangwen Deng, Ying Zhang, Yongqiang Xiong, Guo Chen, Yong Cui, and Peng Cheng
IEEE INFOCOM 2021.
NetKernel: Making Network Stack Part of the Virtualized Infrastructure
Zhixiong Niu, Hong Xu, Peng Cheng, Qiang Su, Yongqiang Xiong, Tao Wang, Dongsu Han, and Keith Winstein
USENIX ATC 2020.
DLBooster: Boosting End-to-end Deep Learning Workflows with Offloading Data Preprocessing Pipelines
Yang Cheng, Dan Li, Zhiyuan Guo, Binyao Jiang, Jiaxin Lin, Xi Fan, Jinkun Geng, Xinyi Yu, Wei Bai, Lei Qu, Ran Shu, Peng Cheng, Yongqiang Xiong, and Jianping Wu
ICPP 2019.
Direct Universal Access: Making Data Center Resources Available to FPGA
Ran Shu, Peng Cheng, Guo Chen, Zhiyuan Guo, Lei Qu, Yongqiang Xiong, Derek Chiou, and Thomas Moscibroda
USENIX NSDI 2019.
Micro-Burst in Data Centers: Observations, Analysis, and Mitigations
Danfeng Shan, Fengyuan Ren, Peng Cheng, Ran Shu, and Chuanxiong Guo
IEEE ICNP 2018.
Multi-Path Transport for RDMA in Datacenters
Yuanwei Lu, Guo Chen, Bojie Li, Kun Tan, Yongqiang Xiong, Peng Cheng, Jiansong Zhang, Enhong Chen, and Thomas Moscibroda
USENIX NSDI 2018.
Tagger: Practical PFC Deadlock Prevention in Data Center Networks
Shuihai Hu, Yibo Zhu, Peng Cheng, Chuanxiong Guo, Kun Tan, Jitendra Padhye, and Kai Chen
ACM CoNEXT 2017.
Performance Analysis of Randomized Data Fetching in Cluster Computing
Tong Zhang, Peng Cheng, Wenxue Chen, Bo Wang, and Fengyuan Ren
IEEE IWQoS 2017.
ClickNP: Highly Flexible and High-performance Network Processing with Reconfigurable Hardware
Bojie Li, Kun Tan, Layong Larry Luo, Yanqing Peng, Renqian Luo, Ningyi Xu, Yongqiang Xiong, and Peng Cheng
ACM SIGCOMM 2016.
Fast and Cautious: Leveraging Multi-path Diversity for Transport Loss Recovery in Data Centers
Guo Chen, Yuanwei Lu, Yuan Meng, Bojie Li, Kun Tan, Dan Pei, Peng Cheng, Layong Larry Luo, Yongqiang Xiong, Xiaoliang Wang, and Youjian Zhao
USENIX ATC 2016.
TFC: Token Flow Control in Data Center Networks
Jiao Zhang, Fengyuan Ren, Ran Shu, and Peng Cheng
EuroSys 2016.
Slowing Little Quickens More: Improving DCTCP for Massive Concurrent Flows
Mao Miao, Peng Cheng, Fengyuan Ren, and Ran Shu
ICPP 2015.
Catch the Whole Lot in an Action: Rapid Precise Packet Loss Notification in Data Center
Peng Cheng, Fengyuan Ren, Ran Shu, and Chuang Lin
USENIX NSDI 2014.
Congestion Integrated Control in Virtualized Clouds
Yumeng Zhang, Fengyuan Ren, Peng Cheng, Mao Miao, and Chuang Lin
IEEE PIC 2014.
Ease the Queue Oscillation: Analysis and Enhancement of DCTCP
Wen Chen, Peng Cheng, Fengyuan Ren, Ran Shu, and Chuang Lin
IEEE ICDCS 2013.
Access Points Can Tell More for Wiser Selection
Shibo Xu, Fengyuan Ren, Yinsheng Xu, Peng Cheng, and Chuang Lin
IEEE ICCCN 2013.
Workshops
SmartNIC-enabled Live Migration for Storage Optimized VMs
Jiechen Zhao, Ran Shu, Lei Qu, Ziyue Yang, Natalie Enright Jerger, Derek Chiou, Peng Cheng, and Yongqiang Xiong
APSys 2024.
SegaNet: An Advanced IoT Cloud Gateway for Performant and Priority-Oriented Message Delivery
Yeonho Yoo, Zhixiong Niu, Chuck Yoo, Peng Cheng, and Yongqiang Xiong
APNet 2023.
SlimeMold: Hardware Load Balancer at Scale in Datacenter
Ziyuan Liu, Zhixiong Niu, Ran Shu, Liang Gao, Guohong Lai, Na Wang, Zongying He, Jacob Nelson, Dan R. K. Ports, Lihua Yuan, Peng Cheng, and Yongqiang Xiong
APNet 2023.
Query Processing on Gaming Consoles
Wei Cui, Qianxi Zhang, Spyros Blanas, Jesús Camacho-Rodríguez, Brandon Haynes, Yinan Li, Ravi Ramamurthy, Peng Cheng, Rathijit Sen, and Matteo Interlandi
DaMoN 2023.
OpenNetLab: Open Platform for RL-based Congestion Control for Real-Time Communications
Jeongyoon Eo, Zhixiong Niu, Wenxue Cheng, Francis Y. Yan, Rui Gao, Jorina Kardhashi, Scott Inglis, Michael Revow, Byung-Gon Chun, Peng Cheng, and Yongqiang Xiong
APNet 2022. (Best Paper Award)
A Disaggregate Data Collecting Approach for Loss-Tolerant Applications
Ziyuan Liu, Zhixiong Niu, Ran Shu, Wenxue Cheng, Peng Cheng, Yongqiang Xiong, Lihua Yuan, Jacob Nelson, and Dan R. K. Ports
APNet 2022.
Towards GPU-driven Code Execution for Distributed Deep Learning
Changho Hwang, KyoungSoo Park, Ran Shu, Xinyuan Qu, Peng Cheng, and Yongqiang Xiong
ISCA 2022 Workshop MLArchSys. (Best Paper Award)
Accelerating GNN Training with Locality-aware Partial Execution
Taehyun Kim, Changho Hwang, KyoungSoo Park, Zhiqi Lin, Peng Cheng, Youshan Miao, Lingxiao Ma, and Yongqiang Xiong
ACM APSys 2021. (Best Paper Award)
Enhanced Control Path for Repeated TCP Connections
Junho Lee, Gyeongsik Yang, Zhixiong Niu, Peng Cheng, Yongqiang Xiong, and Chuck Yoo
ACM APSys 2021.
Towards User-defined SLA in Cloud Flash Storage
Jinhao Fan, Ziyue Yang, Ran Shu, Peng Cheng, and Yongqiang Xiong
ACM APSys 2021.
Network Stack as a Service in the Cloud
Zhixiong Niu, Hong Xu, Dongsu Han, Peng Cheng, Yongqiang Xiong, Guo Chen, and Keith Winstein
ACM HotNets 2017.
Memory Efficient Loss Recovery for Hardware-based Transport in Datacenter
Yuanwei Lu, Guo Chen, Zhenyuan Ruan, Wencong Xiao, Bojie Li, Jiansong Zhang, Yongqiang Xiong, Peng Cheng, and Enhong Chen
APNet 2017.
The Feniks FPGA Operating System for Cloud Computing
Jiansong Zhang, Yongqiang Xiong, Ningyi Xu, Ran Shu, Bojie Li, Peng Cheng, Guo Chen, and Thomas Moscibroda
APSys 2017.
Deadlocks in Datacenter Networks: Why Do They Form, and How to Avoid Them
Shuihai Hu, Yibo Zhu, Peng Cheng, Chuanxiong Guo, Kun Tan, Jitendra Padhye, and Kai Chen
ACM HotNets 2016.
Journals
Intelligent Packet Processing for Performant Containers in IoT
Wonmi Choi, Yeonho Yoo, Kyungwoon Lee, Zhixiong Niu, Peng Cheng, Yongqiang Xiong, Gyeongsik Yang, and Chuck Yoo
IEEE Internet of Things Journal 2024.
Polaris: Enhancing CXL-based Memory Expanders with Memory-side Prefetching
Zhe Zhou, Shuotao Xu, Yiqi Chen, Tao Zhang, Ran Shu, Lei Qu, Peng Cheng, Yongqiang Xiong, and Guangyu Sun
Advanced Parallel Processing Technologies 2023.
Moneo: Non-intrusive Fine-grained Monitor for AI Infrastructure
Yuting Jiang, Yifan Xiong, Lei Qu, Cheng Luo, Chen Tian, Peng Cheng, and Yongqiang Xiong
ACM SIGOPS Operating Systems Review 2022.
NetKernel: Making Network Stack Part of the Virtualized Infrastructure
Zhixiong Niu, Qiang Su, Peng Cheng, Yongqiang Xiong, Dongsu Han, Keith Winstein, Chun Jason Xue, and Hong Xu
IEEE/ACM Transactions on Networking 2021.
Observing and Mitigating Micro-Burst Traffic in Data Center Networks
Danfeng Shan, Fengyuan Ren, Peng Cheng, Ran Shu, and Chuanxiong Guo
IEEE/ACM Transactions on Networking 2019.
MP-RDMA: Enabling RDMA With Multi-Path Transport in Datacenters
Guo Chen, Yuanwei Lu, Bojie Li, Kun Tan, Yongqiang Xiong, Peng Cheng, Jiansong Zhang, and Thomas Moscibroda
IEEE/ACM Transactions on Networking 2019.
Fuso: Fast Multi-path Loss Recovery for Data Center Networks
Guo Chen, Yuanwei Lu, Yuan Meng, Bojie Li, Kun Tan, Dan Pei, Peng Cheng, Layong Luo, Yongqiang Xiong, Xiaoliang Wang, and Youjian Zhao
IEEE/ACM Transactions on Networking 2018.
An Energy Efficiency Perspective on Rate Adaptation for 802.11n NIC
Chi-Yu Li, Chunyi Peng, Peng Cheng, Songwu Lu, Xinbing Wang, Fengyuan Ren, and Tao Wang
IEEE Transactions on Mobile Computing 2015.
Posters and Demos
Meili: Towards SmartNIC as a Service
Qiang Su, Shaofeng Wu, Zhixiong Niu, Ran Shu, Peng Cheng, Yongqiang Xiong, Chun Jason Xue, Zaoxing Liu, and Hong Xu
ACM SIGCOMM 2023 (Posters and Demos).
PipeDevice: A Hardware-Software Co-Design Approach to Intra-Host Container Communication
Qiang Su, Chuanwen Wang, Zhixiong Niu, Ran Shu, Peng Cheng, Yongqiang Xiong, Dongsu Han, Chun Jason Xue, and Hong Xu
ACM SIGCOMM 2022 (Posters and Demos).
Simulating Performance of ML Systems with Offline Profiling
Hongming Huang, Peng Cheng, Hong Xu, and Yongqiang Xiong
MLOps 2019 (Poster).
Reinforcement Learning for Bandwidth Estimation and Congestion Control in Real-time Communications
Joyce Fang, Martin Ellis, Bin Li, Siyao Liu, Yasaman Hosseinkashi, Michael Revow, Albert Sadovnikov, Ziyuan Liu, Peng Cheng, Sachin Ashok, David Zhao, Ross Cutler, Yan Lu, and Johannes Gehrke
NeurIPS 2019 Workshop on Machine Learning for Systems (Poster).
Patents
Mixture-of-experts layer with dynamic gating
Yifan Xiong, Changho Hwang, Wei Cui, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Omar Salas, Jithin Jose, Prabhat Ram, Ho-Yuen Chau, Peng Cheng, Fan Yang, Mao Yang, and Yongqiang Xiong
US Patent App. 18/054,451, 2024/05/16.
Mixture-of-experts layer with switchable parallel modes
Yifan Xiong, Changho Hwang, Wei Cui, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Omar Salas, Jithin Jose, Prabhat Ram, Ho-Yuen Chau, Peng Cheng, Fan Yang, Mao Yang, and Yongqiang Xiong
US Patent App. 18/054,446, 2024/05/16.
Collective communication phases at mixture-of-experts layer
Yifan Xiong, Changho Hwang, Wei Cui, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Omar Salas, Jithin Jose, Prabhat Ram, Ho-Yuen Chau, Peng Cheng, Fan Yang, Mao Yang, and Yongqiang Xiong
US Patent App. 18/054,452, 2024/05/16.
Multi-path RDMA Transmission
Guo Chen, Thomas Moscibroda, Peng Cheng, Yuanwei Lu, and Yongqiang Xiong
US Patent 11,308,024, 2022/04/19.
Communications for Field Programmable Gate Array Device
Peng Cheng, Ran Shu, Guo Chen, Yongqiang Xiong, Jiansong Zhang, Ningyi Xu, and Thomas Moscibroda
US Patent 11,042,497, 2022/02/08.
Communication Between Field Programmable Gate Arrays
Peng Cheng, Ran Shu, Guo Chen, Yongqiang Xiong, Jiansong Zhang, Ningyi Xu, and Thomas Moscibroda
US Patent 11,042,497, 2021/06/22.
Bot Behavior Detection
Yang Luo, Peng Cheng, Yongqiang Xiong, and Qian Li
US Patent App. 16/442,819, 2020/12/17.
Preprints
Meili: Enabling SmartNIC as a Service in the Cloud
Qiang Su, Shaofeng Wu, Zhixiong Niu, Ran Shu, Peng Cheng, Yongqiang Xiong, Chun Jason Xue, Zaoxing Liu, and Hong Xu
arXiv 2023.
FP8-LM: Training FP8 Large Language Models
Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, Ze Liu, Yifan Xiong, Ziyue Yang, Bolin Ni, Jingcheng Hu, Ruihang Li, Miaosen Zhang, Chen Li, Jia Ning, Ruizhe Wang, Zheng Zhang, Shuguang Liu, Joe Chau, Han Hu, and Peng Cheng
arXiv 2023.
CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner
Cheng Luo, Lei Qu, Youshan Miao, Peng Cheng, and Yongqiang Xiong
arXiv 2021.
BotGraph: Web Bot Detection Based on Sitemap
Yang Luo, Guozhen She, Peng Cheng, and Yongqiang Xiong
arXiv 2019.