Xinhao Cheng


I’m a first-year Ph.D. student in the Computer Science Department of Carnegie Mellon University and a member of the Catalyst Group. I’m fortunate to be advised by Zhihao Jia. Before this, I received my Master’s degree from Carnegie Mellon University, also advised by Zhihao Jia. Before CMU, I received my B.S. degree from Dalian University of Technology.

I am interested in building efficient and scalable systems for machine learning applications.

Gates Hillman Centers, 6003

4902 Forbes Ave, Pittsburgh, PA 15213

Email: xinhaoc@cs.cmu.edu

Projects


FlexFlow Serve is a high-performance serving system that accelerates LLM inference with speculative decoding and tree-based verification.

Mirage is a superoptimizer that automatically discovers highly optimized GPU kernels for machine learning applications.

Publications


SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification
[ASPLOS 2024] Xupeng Miao*, Gabriele Oliaro*, Zhihao Zhang*, Xinhao Cheng*, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, and Zhihao Jia

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
[Preprint] Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, and Zhihao Jia

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
[Preprint] Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Mengdi Wu, Colin Unger, and Zhihao Jia

A Multi-Level Superoptimizer for Tensor Programs
[Preprint] Mengdi Wu, Xinhao Cheng, Oded Padon, and Zhihao Jia