Yitao YANG, Ph.D. student
I am currently a third-year Ph.D. student in the Department of Computer Science and Engineering at The Chinese University of Hong Kong, supervised by Prof. Hong Xu and co-supervised by Prof. Baochun Li. Previously, I received my B.Eng. from The Northwestern Polytechnical University (NWPU) in 2023. My research interests lie primarily in AI agents for network and AI infrastructure, and in machine learning systems (MLSys).
AI Agent for Network and AI Infrastructure
Efficient Distributed Training & Inference Systems
Ph.D. Computer Science and Engineering, The Chinese University of Hong Kong, Aug. 2023 - Present
B.Eng. Computer Science and Engineering, The Northwestern Polytechnical University, Sep. 2019 - Jul. 2023
TSGuard: Automated Customer-Centric Incident Diagnosis for AI Workloads in the Cloud
Yitao Yang, Yangtao Deng, Yifan Xiong, Baochun Li, Hong Xu, Peng Cheng
ACM FSE, 2026
Towards End-to-End Optimization of LLM-based Applications with Ayo
Xin Tan, Yimin Jiang, Yitao Yang, Hong Xu
ACM ASPLOS, 2025
Arlo: Serving Transformer-based Language Models with Dynamic Input Lengths
Xin Tan, Jiamin Li, Yitao Yang, Jingzong Li, Hong Xu
ACM ICPP, 2024
Adaptive Gating in Mixture-of-Experts based Language Models
Jiamin Li, Qiang Su, Yitao Yang, Yimin Jiang, Cong Wang, Hong Xu
Empirical Methods in Natural Language Processing (EMNLP), 2023
NetOpsBench: Open Arena for Agentic NetOps in AI Infrastructure
GitHub
Website
Zhihu
NetOpsBench is an open arena for evaluating AI agents in network operations, with a focus on realistic data center network troubleshooting. Unlike benchmarks based on static questions or human-written descriptions, NetOpsBench provides an interactive environment where agents reason over live network state, use diagnostic tools, and perform closed-loop troubleshooting. It includes 109 built-in failure scenarios across four topology sizes, and offers a live arena for experimenting with, comparing, and certifying agentic NetOps capabilities in reproducible settings. We hope to advance AI-driven network operations together with the community.
We warmly welcome everyone to use and contribute to NetOpsBench.
Intern, ByteDance Network Observability, Mar. 2025 - Nov. 2025
Mentors: Shixian Guo, Zhuo Jiang
Designed and deployed OneNet Agent, a unified observability and intelligent network operations platform for ByteDance's global network infrastructure, supporting natural language search, one-click fault diagnosis, and multi-dimensional telemetry analysis.
Research Intern, MSRA Network Infrastructure Group, Apr. 2024 - Jan. 2025
Mentor: Yifan Xiong
Studied root causes of AI workload incidents in Microsoft Azure and developed TSGuard, an LLM-agent tool for automated incident diagnosis in AI infrastructure; the resulting paper was accepted to ACM FSE 2026.
Visiting Student, The Chinese University of Hong Kong, Hong Kong SAR, Oct. 2022 - Jun. 2023
Topic: Efficient MoE Training Systems
CSCI4430 Computer Networks, Teaching Assistant, Fall 2023 and Spring 2025
CSCI3170 Introduction to Database Systems, Teaching Assistant, Spring 2024
CSCI3150 Introduction to Operating Systems, Teaching Assistant, Fall 2024
CSCI2100 Data Structures, Teaching Assistant, Fall 2025
ENGG2020 Digital Logic and Systems, Teaching Assistant, Spring 2026
Full Postgraduate Scholarship, The Chinese University of Hong Kong, 2023 - 2027
SAMSUNG Scholarship, SAMSUNG, 2020 - 2021
Huawei Scholarship, 2021
First Prize, the Fifth National Student Computer System Capability Challenge (NSCSCC), 2021
National Scholarship, Ministry of Education, 2019 - 2020
Pacemaker to Merit Student, The Northwestern Polytechnical University, 2019 - 2020