Publications

My Google Scholar profile lists my latest publications and citations.

Conferences

  1. NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
    Zhe Zhou, Yiqi Chen, Tao Zhang, Yang Wang, Ran Shu, Shuotao Xu, Peng Cheng, Lei Qu, Yongqiang Xiong, and Guangyu Sun
    MICRO 2024. (To Appear)

  2. SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation
    Yifan Xiong, Yuting Jiang, Ziyue Yang, Lei Qu, Guoshuai Zhao, Shuguang Liu, Dong Zhong, Boris Pinzur, Jie Zhang, Yang Wang, Jithin Jose, Hossein Pourreza, Jeff Baxter, Kushal Datta, Prabhat Ram, Luke Melton, Joe Chau, Peng Cheng, Yongqiang Xiong, and Lidong Zhou
    USENIX ATC 2024. (Best Paper Award)

  3. SPFresh: Incremental In-Place Update for Billion-Scale Vector Search
    Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, Peng Cheng, and Mao Yang
    SOSP 2023.

  4. Tutel: Adaptive Mixture-of-Experts at Scale
    Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Salas, Jithin Jose, Prabhat Ram, Hoyuen Chau, Peng Cheng, Fan Yang, Mao Yang, and Yongqiang Xiong
    MLSys 2023.

  5. ARK: GPU-driven Code Execution for Distributed Deep Learning
    Changho Hwang, KyoungSoo Park, Ran Shu, Xinyuan Qu, Peng Cheng, and Yongqiang Xiong
    NSDI 2023.

  6. ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning
    Diandian Gu, Yihao Zhao, Yinmin Zhong, Yifan Xiong, Zhenhua Han, Peng Cheng, Fan Yang, Gang Huang, Xin Jin, and Xuanzhe Liu
    ASPLOS 2023.

  7. PipeDevice: A Hardware-Software Co-Design Approach to Intra-Host Container Communication
    Qiang Su, Chuanwen Wang, Zhixiong Niu, Ran Shu, Peng Cheng, Yongqiang Xiong, Dongsu Han, Chun Jason Xue, and Hong Xu
    CoNEXT 2022.

  8. An Adaptive Deep RL Method for Non-Stationary Environments with Piecewise Stable Context
    Xiaoyu Chen, Xiangming Zhu, Yufeng Zheng, Pushi Zhang, Li Zhao, Wenxue Cheng, Peng Cheng, Yongqiang Xiong, Tao Qin, Jianyu Chen, and Tie-Yan Liu
    NeurIPS 2022.

  9. RuleCache: Accelerating Web Application Firewalls by On-line Learning Traffic Patterns
    Xiaoyi Chen, Qingni Shen, Peng Cheng, Yongqiang Xiong, and Zhonghai Wu
    IEEE ICWS 2022.

  10. PilotFish: Harvesting Free Cycles of Cloud Gaming with Deep Learning Training
    Wei Zhang, Binghao Chen, Zhenhua Han, Quan Chen, Peng Cheng, Fan Yang, Ran Shu, Yuqing Yang, and Minyi Guo
    USENIX ATC 2022.

  11. Moneo: Non-intrusive Fine-grained Monitor for AI Infrastructure
    Yuting Jiang, Yifan Xiong, Lei Qu, Cheng Luo, Chen Tian, Peng Cheng, and Yongqiang Xiong
    IEEE ICC 2022.

  12. NFD: Using Behavior Models to Develop Cross-Platform Network Functions
    Hongyi Huang, Wenfei Wu, Yongchao He, Bangwen Deng, Ying Zhang, Yongqiang Xiong, Guo Chen, Yong Cui, and Peng Cheng
    IEEE INFOCOM 2021.

  13. NetKernel: Making Network Stack Part of the Virtualized Infrastructure
    Zhixiong Niu, Hong Xu, Peng Cheng, Qiang Su, Yongqiang Xiong, Tao Wang, Dongsu Han, and Keith Winstein
    USENIX ATC 2020.

  14. DLBooster: Boosting End-to-end Deep Learning Workflows with Offloading Data Preprocessing Pipelines
    Yang Cheng, Dan Li, Zhiyuan Guo, Binyao Jiang, Jiaxin Lin, Xi Fan, Jinkun Geng, Xinyi Yu, Wei Bai, Lei Qu, Ran Shu, Peng Cheng, Yongqiang Xiong, and Jianping Wu
    ICPP 2019.

  15. Direct Universal Access: Making Data Center Resources Available to FPGA
    Ran Shu, Peng Cheng, Guo Chen, Zhiyuan Guo, Lei Qu, Yongqiang Xiong, Derek Chiou, and Thomas Moscibroda
    USENIX NSDI 2019.

  16. Micro-Burst in Data Centers: Observations, Analysis, and Mitigations
    Danfeng Shan, Fengyuan Ren, Peng Cheng, Ran Shu, and Chuanxiong Guo
    IEEE ICNP 2018.

  17. Multi-Path Transport for RDMA in Datacenters
    Yuanwei Lu, Guo Chen, Bojie Li, Kun Tan, Yongqiang Xiong, Peng Cheng, Jiansong Zhang, Enhong Chen, and Thomas Moscibroda
    USENIX NSDI 2018.

  18. Tagger: Practical PFC Deadlock Prevention in Data Center Networks
    Shuihai Hu, Yibo Zhu, Peng Cheng, Chuanxiong Guo, Kun Tan, Jitendra Padhye, and Kai Chen
    ACM CoNEXT 2017.

  19. Performance Analysis of Randomized Data Fetching in Cluster Computing
    Tong Zhang, Peng Cheng, Wenxue Chen, Bo Wang, and Fengyuan Ren
    IEEE IWQoS 2017.

  20. ClickNP: Highly Flexible and High-Performance Network Processing with Reconfigurable Hardware
    Bojie Li, Kun Tan, Layong Larry Luo, Yanqing Peng, Renqian Luo, Ningyi Xu, Yongqiang Xiong, and Peng Cheng
    ACM SIGCOMM 2016.

  21. Fast and Cautious: Leveraging Multi-path Diversity for Transport Loss Recovery in Data Centers
    Guo Chen, Yuanwei Lu, Yuan Meng, Bojie Li, Kun Tan, Dan Pei, Peng Cheng, Layong Larry Luo, Yongqiang Xiong, Xiaoliang Wang, and Youjian Zhao
    USENIX ATC 2016.

  22. TFC: Token Flow Control in Data Center Networks
    Jiao Zhang, Fengyuan Ren, Ran Shu, and Peng Cheng
    EuroSys 2016.

  23. Slowing Little Quickens More: Improving DCTCP for Massive Concurrent Flows
    Mao Miao, Peng Cheng, Fengyuan Ren, and Ran Shu
    ICPP 2015.

  24. Catch the Whole Lot in an Action: Rapid Precise Packet Loss Notification in Data Center
    Peng Cheng, Fengyuan Ren, Ran Shu, and Chuang Lin
    USENIX NSDI 2014.

  25. Congestion Integrated Control in Virtualized Clouds
    Yumeng Zhang, Fengyuan Ren, Peng Cheng, Mao Miao, and Chuang Lin
    IEEE PIC 2014.

  26. Ease the Queue Oscillation: Analysis and Enhancement of DCTCP
    Wen Chen, Peng Cheng, Fengyuan Ren, Ran Shu, and Chuang Lin
    IEEE ICDCS 2013.

  27. Access Points Can Tell More for Wiser Selection
    Shibo Xu, Fengyuan Ren, Yinsheng Xu, Peng Cheng, and Chuang Lin
    IEEE ICCCN 2013.

Workshops

  1. SmartNIC-enabled Live Migration for Storage Optimized VMs
    Jiechen Zhao, Ran Shu, Lei Qu, Ziyue Yang, Natalie Enright Jerger, Derek Chiou, Peng Cheng, and Yongqiang Xiong
    APSys 2024.

  2. SegaNet: An Advanced IoT Cloud Gateway for Performant and Priority-Oriented Message Delivery
    Yeonho Yoo, Zhixiong Niu, Chuck Yoo, Peng Cheng, and Yongqiang Xiong
    APNet 2023.

  3. SlimeMold: Hardware Load Balancer at Scale in Datacenter
    Ziyuan Liu, Zhixiong Niu, Ran Shu, Liang Gao, Guohong Lai, Na Wang, Zongying He, Jacob Nelson, Dan R. K. Ports, Lihua Yuan, Peng Cheng, and Yongqiang Xiong
    APNet 2023.

  4. Query Processing on Gaming Consoles
    Wei Cui, Qianxi Zhang, Spyros Blanas, Jesús Camacho-Rodríguez, Brandon Haynes, Yinan Li, Ravi Ramamurthy, Peng Cheng, Rathijit Sen, and Matteo Interlandi
    DAMON 2023.

  5. OpenNetLab: Open Platform for RL-based Congestion Control for Real-Time Communications
    Jeongyoon Eo, Zhixiong Niu, Wenxue Cheng, Francis Y. Yan, Rui Gao, Jorina Kardhashi, Scott Inglis, Michael Revow, Byung-Gon Chun, Peng Cheng, and Yongqiang Xiong
    APNet 2022. (Best Paper Award)

  6. A Disaggregate Data Collecting Approach for Loss-Tolerant Applications
    Ziyuan Liu, Zhixiong Niu, Ran Shu, Wenxue Cheng, Peng Cheng, Yongqiang Xiong, Lihua Yuan, Jacob Nelson, and Dan R. K. Ports
    APNet 2022.

  7. Towards GPU-driven Code Execution for Distributed Deep Learning
    Changho Hwang, KyoungSoo Park, Ran Shu, Xinyuan Qu, Peng Cheng, and Yongqiang Xiong
    ISCA 2022 Workshop MLArchSys. (Best Paper Award)

  8. Accelerating GNN Training with Locality-aware Partial Execution
    Taehyun Kim, Changho Hwang, KyoungSoo Park, Zhiqi Lin, Peng Cheng, Youshan Miao, Lingxiao Ma, and Yongqiang Xiong
    ACM APSys 2021. (Best Paper Award)

  9. Enhanced Control Path for Repeated TCP Connections
    Junho Lee, Gyeongsik Yang, Zhixiong Niu, Peng Cheng, Yongqiang Xiong, and Chuck Yoo
    ACM APSys 2021.

  10. Towards User-defined SLA in Cloud Flash Storage
    Jinhao Fan, Ziyue Yang, Ran Shu, Peng Cheng, and Yongqiang Xiong
    ACM APSys 2021.

  11. Network Stack as a Service in the Cloud
    Zhixiong Niu, Hong Xu, Dongsu Han, Peng Cheng, Yongqiang Xiong, Guo Chen, and Keith Winstein
    ACM HotNets 2017.

  12. Memory Efficient Loss Recovery for Hardware-based Transport in Datacenter
    Yuanwei Lu, Guo Chen, Zhenyuan Ruan, Wencong Xiao, Bojie Li, Jiansong Zhang, Yongqiang Xiong, Peng Cheng, and Enhong Chen
    APNet 2017.

  13. The Feniks FPGA Operating System for Cloud Computing
    Jiansong Zhang, Yongqiang Xiong, Ningyi Xu, Ran Shu, Bojie Li, Peng Cheng, Guo Chen, and Thomas Moscibroda
    APSys 2017.

  14. Deadlocks in Datacenter Networks: Why Do They Form, and How to Avoid Them
    Shuihai Hu, Yibo Zhu, Peng Cheng, Chuanxiong Guo, Kun Tan, Jitendra Padhye, and Kai Chen
    ACM HotNets 2016.

Journals

  1. Intelligent Packet Processing for Performant Containers in IoT
    Wonmi Choi, Yeonho Yoo, Kyungwoon Lee, Zhixiong Niu, Peng Cheng, Yongqiang Xiong, Gyeongsik Yang, and Chuck Yoo
    IEEE Internet of Things Journal 2024.

  2. Polaris: Enhancing CXL-based Memory Expanders with Memory-side Prefetching
    Zhe Zhou, Shuotao Xu, Yiqi Chen, Tao Zhang, Ran Shu, Lei Qu, Peng Cheng, Yongqiang Xiong, and Guangyu Sun
    Advanced Parallel Processing Technologies 2023.

  3. Moneo: Non-intrusive Fine-grained Monitor for AI Infrastructure
    Yuting Jiang, Yifan Xiong, Lei Qu, Cheng Luo, Chen Tian, Peng Cheng, and Yongqiang Xiong
    ACM SIGOPS Operating Systems Review 2022.

  4. NetKernel: Making Network Stack Part of the Virtualized Infrastructure
    Zhixiong Niu, Qiang Su, Peng Cheng, Yongqiang Xiong, Dongsu Han, Keith Winstein, Chun Jason Xue, and Hong Xu
    IEEE/ACM Transactions on Networking 2021.

  5. Observing and Mitigating Micro-Burst Traffic in Data Center Networks
    Danfeng Shan, Fengyuan Ren, Peng Cheng, Ran Shu, and Chuanxiong Guo
    IEEE/ACM Transactions on Networking 2019.

  6. MP-RDMA: Enabling RDMA With Multi-Path Transport in Datacenters
    Guo Chen, Yuanwei Lu, Bojie Li, Kun Tan, Yongqiang Xiong, Peng Cheng, Jiansong Zhang, and Thomas Moscibroda
    IEEE/ACM Transactions on Networking 2019.

  7. Fuso: Fast Multi-path Loss Recovery for Data Center Networks
    Guo Chen, Yuanwei Lu, Yuan Meng, Bojie Li, Kun Tan, Dan Pei, Peng Cheng, Layong Luo, Yongqiang Xiong, Xiaoliang Wang, and Youjian Zhao
    IEEE/ACM Transactions on Networking 2018.

  8. An Energy Efficiency Perspective on Rate Adaptation for 802.11n NIC
    Chi-Yu Li, Chunyi Peng, Peng Cheng, Songwu Lu, Xinbing Wang, Fengyuan Ren, and Tao Wang
    IEEE Transactions on Mobile Computing 2015.

Posters and Demos

  1. Meili: Towards SmartNIC as a Service
    Qiang Su, Shaofeng Wu, Zhixiong Niu, Ran Shu, Peng Cheng, Yongqiang Xiong, Chun Jason Xue, Zaoxing Liu, and Hong Xu
    ACM SIGCOMM 2023 (Posters and Demos).

  2. PipeDevice: A Hardware-Software Co-Design Approach to Intra-Host Container Communication
    Qiang Su, Chuanwen Wang, Zhixiong Niu, Ran Shu, Peng Cheng, Yongqiang Xiong, Dongsu Han, Chun Jason Xue, and Hong Xu
    ACM SIGCOMM 2022 (Posters and Demos).

  3. Simulating Performance of ML Systems with Offline Profiling
    Hongming Huang, Peng Cheng, Hong Xu, and Yongqiang Xiong
    MLOps 2019 (Poster).

  4. Reinforcement Learning for Bandwidth Estimation and Congestion Control in Real-time Communications
    Joyce Fang, Martin Ellis, Bin Li, Siyao Liu, Yasaman Hosseinkashi, Michael Revow, Albert Sadovnikov, Ziyuan Liu, Peng Cheng, Sachin Ashok, David Zhao, Ross Cutler, Yan Lu, and Johannes Gehrke
    NeurIPS 2019 Workshop on Machine Learning for Systems (Poster).

Patents

  1. Mixture-of-experts layer with dynamic gating
    Yifan Xiong, Changho Hwang, Wei Cui, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Omar Salas, Jithin Jose, Prabhat Ram, Ho-Yuen Chau, Peng Cheng, Fan Yang, Mao Yang, and Yongqiang Xiong
    US Patent App. 18/054,451, 2024/05/16.

  2. Mixture-of-experts layer with switchable parallel modes
    Yifan Xiong, Changho Hwang, Wei Cui, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Omar Salas, Jithin Jose, Prabhat Ram, Ho-Yuen Chau, Peng Cheng, Fan Yang, Mao Yang, and Yongqiang Xiong
    US Patent App. 18/054,446, 2024/05/16.

  3. Collective communication phases at mixture-of-experts layer
    Yifan Xiong, Changho Hwang, Wei Cui, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Omar Salas, Jithin Jose, Prabhat Ram, Ho-Yuen Chau, Peng Cheng, Fan Yang, Mao Yang, and Yongqiang Xiong
    US Patent App. 18/054,452, 2024/05/16.

  4. Multi-path RDMA Transmission
    Guo Chen, Thomas Moscibroda, Peng Cheng, Yuanwei Lu, and Yongqiang Xiong
    US Patent 11,308,024, 2022/04/19.

  5. Communications for Field Programmable Gate Array Device
    Peng Cheng, Ran Shu, Guo Chen, Yongqiang Xiong, Jiansong Zhang, Ningyi Xu, and Thomas Moscibroda
    US Patent 11,042,497, 2022/02/08.

  6. Communication Between Field Programmable Gate Arrays
    Peng Cheng, Ran Shu, Guo Chen, Yongqiang Xiong, Jiansong Zhang, Ningyi Xu, and Thomas Moscibroda
    US Patent 11,042,497, 2021/06/22.

  7. Bot Behavior Detection
    Yang Luo, Peng Cheng, Yongqiang Xiong, and Qian Li
    US Patent App. 16/442,819, 2020/12/17.

Preprints

  1. Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance
    Haoling Li, Xin Zhang, Xiao Liu, Yeyun Gong, Yifan Wang, Yujiu Yang, Qi Chen, and Peng Cheng
    arXiv 2024.

  2. Meili: Enabling SmartNIC as a Service in the Cloud
    Qiang Su, Shaofeng Wu, Zhixiong Niu, Ran Shu, Peng Cheng, Yongqiang Xiong, Chun Jason Xue, Zaoxing Liu, and Hong Xu
    arXiv 2023.

  3. FP8-LM: Training FP8 Large Language Models
    Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, Ze Liu, Yifan Xiong, Ziyue Yang, Bolin Ni, Jingcheng Hu, Ruihang Li, Miaosen Zhang, Chen Li, Jia Ning, Ruizhe Wang, Zheng Zhang, Shuguang Liu, Joe Chau, Han Hu, and Peng Cheng
    arXiv 2023.

  4. CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner
    Cheng Luo, Lei Qu, Youshan Miao, Peng Cheng, and Yongqiang Xiong
    arXiv 2021.

  5. BotGraph: Web Bot Detection Based on Sitemap
    Yang Luo, Guozhen She, Peng Cheng, and Yongqiang Xiong
    arXiv 2019.