Hi, I’m Wenming Tu (涂文明). I am currently a first-year Ph.D. student in Computer Science and Technology at Shanghai Jiao Tong University (SJTU). I am jointly supervised by Prof. Xie Chen from the X-LANCE Lab and Dr. Zilong Zheng from the Beijing Institute for General Artificial Intelligence (BIGAI), NLCo Group. I am passionate about advancing the frontiers of AI research and developing innovative solutions in this rapidly evolving field.
My research interests primarily focus on speech and audio processing and multimodal large language models. I aim to explore how these technologies can enhance human-computer interaction, improve speech synthesis and recognition systems, and advance AI capabilities in multimodal environments.
Think bold! Work hard!
🔥 News
- 2026.06: 🎉🎉 2 papers were accepted by INTERSPEECH 2026!
- 2026.05: 🎉🎉 1 paper was accepted by ICML 2026!
- 2026.02: 🎉🎉 We won 🥈 (2nd place) in the Agent Track of the Interspeech 2026 Audio Reasoning Challenge. See the Leaderboard and the official Challenge Report.
- 2025.09: 🎉🎉 1 paper was accepted by NeurIPS 2025!
- 2025.05: 🎉🎉 1 paper was accepted by ACL 2025!
- 2025.01: 🎉🎉 1 paper was accepted by EuroSys 2025!
📝 Publications
(* denotes co-first authors, # denotes corresponding authors)
2026

MMAE: A Massive Multitask Audio Editing Benchmark. Preprint.
Ziyang Ma, Ruiqi Yan, Ruiyang Xu, Jie Fang, Zhikang Niu, Yi-Wen Chao, Wenming Tu, Tianrui Wang, Auden, Qi Chen, Wenxi Chen, Jiaying Chi, Yanru Huo, Zixuan Jiang, Xiquan Li, Yalin Li, Junxi Liu, Minghao Liu, Binghao Qiang, Yijia Shan, Zheshu Song, Tian Tan, Zixiang Wang, Zeyu Xie, Zhifei Xie, Xiaoyu Xing, Qixiang Xu, Chen Yang, Guanrou Yang, Shan Yang, Yifan Yang, Steve Yves, Haotian Zhang, Haina Zhu, Kai Yu, Liefeng Bo, Eng-Siong Chng, Xie Chen. (As a Core Contributor)

VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track. INTERSPEECH 2026.
Wenming Tu, Jian Gao, Yanru Huo, Yixuan Wang, Jing Peng, Bohan Li, Ziyang Ma, Tao Liu, Shuai Fan, Kai Yu, Xie Chen, Zilong Zheng.

A Unified and Reproducible Experimentation Framework for Speech Understanding. INTERSPEECH 2026.
Jing Peng, Junhao Du, Chenghao Wang, Hanqi Li, Yi Yang, Yixuan Wang, Xiaoyu Gu, Guanyu Chen, Yucheng Wang, Jiang Li, Zhangjie Zhao, Haoran Wang, Wenming Tu, Haoyu Li, Duo Ma, Lirong Qian, Yu Xi, Wen Wen, Jiaqi Guo, Hui Zhang, Shuai Fan, Wenbin Jiang, Shuai Wang, Kai Yu.

Audio-Mind: An Auditable Agentic Framework for Audio Understanding. Preprint.
Yucheng Wang, Jing Peng, Hanqi Li, Chenghao Wang, Wenming Tu, Yu Xi, Zhaokai Sun, Kai Yu, Shuai Wang.

AVI-Bench: Toward Human-like Audio-Visual Intelligence of Omni-MLLMs. ICML 2026 (CCF-A).
Yaoting Wang, Ziyi Zhang, Wenming Tu, Shaoxuan Xu, Wenjie Du, Cheng Liang, Weijun Wang, Yuanchao Li, Guangyao Li, Hao Fei, Yuanchun Li, Henghui Ding, Yunxin Liu.

MOVA: Towards Scalable and Synchronized Video-Audio Generation. Preprint.
As a core contributor, in collaboration with the SII-OpenMOSS Team.
2025

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models. Preprint.
Wenming Tu, Guanrou Yang, Ruiqi Yan, Wenxi Chen, Ziyang Ma, Yipeng Kang, Kai Yu, Xie Chen#, Zilong Zheng#.

Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia. NeurIPS DB Track 2025 (CCF-A).
In collaboration with the DeepMind Concordia Team.

Are the Values of LLMs Structurally Aligned with Humans? A Causal Perspective. ACL 2025 Findings (CCF-A).
Yipeng Kang, Junqi Wang, Yexin Li, Mengmeng Wang, Wenming Tu, Quansen Wang, Hengli Li, Tingjun Wu, Xue Feng, Fangwei Zhong, Zilong Zheng#.

Empower Vision Applications with LoRA LMM. EuroSys 2025 (CCF-A, Fall round (30/367=8.17%)).
Liang Mi*, Weijun Wang*#, Wenming Tu, Qingfeng He, Rui Kong, Xinyu Fang, Yazhu Dong, Yikang Zhang, Yuanchun Li, Meng Li, Haipeng Dai, Guihai Chen, Yunxin Liu.
🎖 Honors and Awards
- 2025.06 Outstanding Graduate of CUMTB. 🎓
- 2024.03 Merit Student Award of Beijing. 🏅
- 2023.10 Xiaomi Scholarship. 🎖
📖 Education
- 2025.09 - 2030.06 (expected): Computer Science and Technology, School of Computer Science, Shanghai Jiao Tong University (SJTU)
- 2021.09 - 2025.06: Computer Science and Technology, School of Artificial Intelligence, China University of Mining and Technology-Beijing (CUMTB)
💻 Internships
- 2026.04 - Present, Tencent Hunyuan, Shanghai, China.
- 2025.10 - 2026.03, SII & OpenMOSS, Shanghai, China.
- 2024.10 - 2025.09, Beijing Institute for General Artificial Intelligence (BIGAI), NLCo Group, Beijing, China. Co-supervised by Dr. Zilong Zheng and Dr. Yipeng Kang.
- 2023.12 - 2024.09, Institute for AI Industry Research (AIR), Tsinghua University, AIoT Group, Beijing, China. Co-supervised by Dr. Weijun Wang and Prof. Yuanchun Li.