English | 中文

I’m Haoyu Li (李浩宇), an incoming Ph.D. student at Nanjing University (starting September 2026), where I will be supervised by Prof. Shuai Wang, a Tenure-Track Associate Professor. I received my M.S. in Computer Science and Technology from Shanghai Jiao Tong University under the supervision of Prof. Kai Yu.

My research focuses on Target Speaker Extraction (TSE), Automatic Speech Recognition (ASR), and Speech Large Language Models (Speech LLMs). I aim to develop robust speech interaction systems capable of operating in noisy, multi-talker real-world environments.


Research Interests

My work centers on TSE front-ends and multi-talker ASR, spanning front-end signal processing and speech understanding:

  • Speech Separation, including TSE and Blind Source Separation (BSS)
  • Speaker-Attributed ASR (SA-ASR)
  • Keyword Spotting (KWS) for resource-constrained edge devices

Research Experience

My recent work spans both academic labs and industry research:

  • Text-Guided Speech Separation and Robust Keyword Spotting (AISpeech, Suzhou)
    I developed a text-guided speech separation system that reduced the false rejection rate to 4.3% and cut the false wake-up rate to 20% of the baseline in real-world multi-talker scenarios. Concurrently, I engineered an end-to-end keyword spotting algorithm for low-SNR environments, integrating robust streaming decoding with WFST optimization; this work yielded two papers accepted at ICASSP 2025.

  • Speaker-Adaptive Alignment for Flow-Matching TTS (Alibaba, Beijing)
    I proposed a dual temporal and hierarchical adaptive scheme that dynamically modulates supervision strength during denoising and assigns layer-specific alignment objectives, significantly enhancing timbre consistency in zero-shot voice cloning. This work has been submitted to Interspeech 2026.

  • A Novel Paradigm for Keyword-Guided Target Speaker Extraction (Nanjing University / Collaborative Research)
    I proposed a three-stage Detect-Attend-Extract framework that, using only partial text cues, achieves extraction performance superior to conventional speech-enrollment baselines. This work has been submitted to IJCAI 2026.


Publications (Selected)

The full list is available on Google Scholar.

* indicates equal contribution.

  • Text-aware Speech Separation for Multi-talker Keyword Spotting
    Haoyu Li, Baochen Yang, Yu Xi, Linfeng Yu, Tian Tan, Hao Li, Kai Yu
    Interspeech 2024.
    paper link

  • Detect, Attend and Extract: Keyword Guided Target Speaker Extraction
    Haoyu Li*, Yu Xi*, Yidi Jiang, Shuai Wang, Kate Knill, Mark Gales, Haizhou Li, Kai Yu
    arXiv:2602.07977. Submitted to IJCAI-ECAI 2026.
    paper link

  • Time-Layer Adaptive Alignment for Speaker Similarity in Flow-Matching Based Zero-Shot TTS
    Haoyu Li*, Mingyang Han*, Yu Xi, Dongxiao Wang, Hankun Wang, Haoxiang Shi, Boyu Li, Jun Song, Bo Zheng, Shuai Wang, Kai Yu
    arXiv:2511.09995. Submitted to Interspeech 2026.
    paper link

  • Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency
    Yu Xi*, Haoyu Li*, Xiaoyu Gu, Hao Li, Yidi Jiang, Kai Yu
    ICASSP 2025.
    paper link

  • NTC-KWS: Noise-aware CTC for Robust Keyword Spotting
    Yu Xi, Haoyu Li, Hao Li, Jiaqi Guo, Xu Li, Wen Ding, Kai Yu
    ICASSP 2025.
    paper link

  • MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding
    Yu Xi, Haoyu Li, Xiaoyu Gu, Yidi Jiang, Kai Yu
    TASLP.
    paper link

  • G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition
    Jing Peng*, Ziyi Chen*, Haoyu Li*, Yucheng Wang, Duo Ma, Mengtian Li, Yunfan Du, Dezhu Xu, Kai Yu, Shuai Wang
    arXiv:2603.10468. Submitted to Interspeech 2026.
    paper link


Contact Information

I am happy to chat and collaborate on the topics above. You can reach me via: