Xinhao Cheng
I’m a first-year Ph.D. student in the Computer Science Department at Carnegie Mellon University, affiliated with the Catalyst Group and the Parallel Data Lab. I’m fortunate to be advised by Zhihao Jia. Before starting my Ph.D., I received my Master’s degree from Carnegie Mellon University, also advised by Zhihao Jia, and before that my B.S. degree from Dalian University of Technology.
I am interested in building efficient and scalable systems for machine learning applications.
Gates Hillman Centers, 6003
4902 Forbes Ave, Pittsburgh, PA 15213
Email: xinhaoc@cs.cmu.edu
Projects
• FlexFlow Serve is a high-performance serving system that accelerates LLM inference with speculative decoding and tree-based verification.
• Mirage is a superoptimizer that automatically discovers highly optimized GPU kernels for machine learning applications.
Publications
SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification
[ASPLOS 2024] Xupeng Miao*, Gabriele Oliaro*, Zhihao Zhang*, Xinhao Cheng*, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, and Zhihao Jia
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
[Preprint] Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, and Zhihao Jia
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
[Preprint] Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Mengdi Wu, Colin Unger, and Zhihao Jia
A Multi-Level Superoptimizer for Tensor Programs
[Preprint] Mengdi Wu, Xinhao Cheng, Oded Padon, and Zhihao Jia