Xinhao Cheng


I’m a second-year Ph.D. student in the Computer Science Department at Carnegie Mellon University, affiliated with the Catalyst Group and the Parallel Data Lab. I’m fortunate to be advised by Zhihao Jia. Before this, I received my master’s degree from Carnegie Mellon University, also advised by Zhihao Jia. Before CMU, I received my B.S. degree from Dalian University of Technology.

I am interested in building efficient and scalable systems for machine learning applications.

Gates Hillman Centers, 6003

4902 Forbes Ave, Pittsburgh, PA 15213

Email: xinhaoc@cs.cmu.edu

Projects


FlexFlow Serve is a high-performance serving system that accelerates LLM inference with speculative decoding and tree-based verification.

Mirage is a superoptimizer that automatically discovers highly optimized GPU kernels for machine learning applications.

Publications


Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs
[Preprint] Xinhao Cheng*, Zhihao Zhang*, Yu Zhou*, Jianan Ji*, Jinchen Jiang, Zepeng Zhao, Ziruo Xiao, Zihao Ye, Yingyi Huang, Ruihang Lai, Hongyi Jin, Bohan Hou, Mengdi Wu, Yixin Dong, Anthony Yip, Songting Wang, Wenqin Yang, Xupeng Miao, Tianqi Chen, and Zhihao Jia

AdaServe: Accelerating Multi-SLO LLM Serving with SLO-Customized Speculative Decoding
[EuroSys 2026] Zikun Li*, Zhuofu Chen*, Remi Delacourt, Gabriele Oliaro, Zeyu Wang, Qinghan Chen, Shuhuai Lin, April Yang, Zhihao Zhang, Zhuoming Chen, Sean Lai, Xinhao Cheng, Xupeng Miao, and Zhihao Jia

Mirage: A Multi-Level Superoptimizer for Tensor Programs
[OSDI 2025] Mengdi Wu, Xinhao Cheng, Shengyu Liu, Chunan Shi, Jianan Ji, Kit Ao, Praveen Velliengiri, Xupeng Miao, Oded Padon, and Zhihao Jia

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
[NSDI 2026] Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Mengdi Wu, Colin Unger, and Zhihao Jia

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
[ACM Computing Surveys 2025] Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, and Zhihao Jia

SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification
[ASPLOS 2024] Xupeng Miao*, Gabriele Oliaro*, Zhihao Zhang*, Xinhao Cheng*, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, and Zhihao Jia