Yutao Xie  谢宇涛

I am an incoming Ph.D. student at UC Santa Barbara, where I will be advised by Prof. Shiyu Chang. I received my M.S. in Computer Science from UC San Diego, where I worked with Prof. Xiaolong Wang and Prof. Zhiting Hu. Prior to that, I received my B.Eng. in Software Engineering from Tianjin University.

Research

My work sits at the intersection of reinforcement learning and large language models, and roughly alternates between two modes:

Going forward, I am also drawn to world models and planning: learning dynamics predictors from interaction and coupling them with RL for search, planning, and data-efficient control.

Publications and preprints

Papers sorted by recency. Representative papers are highlighted.

IsoCompute Playbook: Optimally Scaling Sampling Compute for RL Training of LLMs
Zhoujun Cheng*, Yutao Xie*, Yuxiao Qu*, Amrith Setlur*, Shibo Hao, Varad Pimpalkhute, Tongtong Liang, Feng Yao, Zhengzhong Liu, Eric Xing, Virginia Smith, Ruslan Salakhutdinov, Zhiting Hu, Taylor Killian, Aviral Kumar
International Conference on Machine Learning (ICML), 2026
blog / arXiv / bibtex
A compute-optimal workflow for scaling on-policy RL sampling in LLM post-training.

TIPS: Turn-Level Information–Potential Reward Shaping for Search-Augmented LLMs
Yutao Xie, Nathaniel Thomas, Nicklas Hansen, Yang Fu, Erran Li, Xiaolong Wang
International Conference on Learning Representations (ICLR), 2026
arXiv / code / bibtex
A self-distillation method that uses a mutual-information-based reward for finer-grained credit assignment in RLVR.

K2-Think: A Parameter-Efficient Reasoning System
Institute of Foundation Models, MBZUAI (31 authors, including Yutao Xie)
arXiv Preprint, 2025 (Tech Report / Model Release)
project / models / arXiv / bibtex
K2-Think is a 32B model that rivals much larger ones, built on six pillars of training and inference techniques.

Revisiting Reinforcement Learning for LLM Reasoning from a Cross-Domain Perspective
Zhoujun Cheng, Shibo Hao, Tianyang Liu, Fan Zhou, Yutao Xie, Feng Yao, Yuexin Bian, Yonghao Zhuang, Nilabjo Dey, Yuheng Zha, Yi Gu, Kun Zhou, Yuqi Wang, Yuan Li, Richard Fan, Jianshu She, Chengqian Gao, Abulhair Saparov, Haonan Li, Taylor W. Killian, Mikhail Yurochkin, Zhengzhong Liu, Eric P. Xing, Zhiting Hu
Conference on Neural Information Processing Systems (NeurIPS) Datasets & Benchmarks, 2025
arXiv / website / bibtex
A cross-domain RL dataset with analysis showing domain-dependent RL gains across six reasoning domains.

Sound Identification of Abnormal Pig Vocalizations: Enhancing Livestock Welfare Monitoring on Smart Farms
Yutao Xie, Jun Wang, Cheng Chen, Taixin Yin, Shiyu Yang, Zhiyuan Li, Ye Zhang, Juyang Ke, Le Song, Lin Gan
Information Processing & Management (IPM), 2024
journal / bibtex
A DDP inference system running an AST-based detector to support welfare monitoring on smart farms.

Misc

Outside research, the same habits follow me into everything else I spend time with. I am simply drawn to what people make—and what making things reveals about them.

Away from the desk, I lift weights, boulder badly as a beginner, and spend more time than is reasonable dialing in pour-over coffee.

What follows is not a full list, and it is not in any special order. I just wrote down a few things that came to mind while I was putting this page together.

A few favorites

Magic Realism

One Hundred Years of Solitude and most of García Márquez's works

Existentialism

Steppenwolf

Lyrical Introspection

Klingsor's Last Summer

Historical Narrative

Hamilton

Prog-rock Folk

冀西南林路行

Post-rock

Lost in 21st Century and most of Wangwen's (惘闻) work

Atmospheric Black Metal

Gu Yan · Zuriaake

Psychedelic Concept

The Dark Side of the Moon

Environmental Storytelling ARPG

Dark Souls / Elden Ring

Philosophical CRPG

Disco Elysium / Planescape: Torment

Historical Mystery

Pentiment

Contact

You are welcome to contact me regarding my research! You can reach me at yux076@ucsd.edu or yutaoxie@ucsb.edu.