2024

Object-Centric Representation Learning for Video Scene Understanding.

Yi Zhou

Hui Zhang

Seung-In Park

ByungIn Yoo

Xiaojuan Qi

IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Accented Text-to-Speech Synthesis With Limited Data.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

RefXVC: Cross-Lingual Voice Conversion With Enhanced Reference Leveraging.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Multi-Scale Accent Modeling with Disentangling for Multi-Speaker Multi-Accent TTS Synthesis.

CoRR, 2024

Team Samsung-RAL: Technical Report for 2024 RoboDrive Challenge-Robust Map Segmentation Track.

CoRR, 2024

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition.

CoRR, 2024

Is Your HD Map Constructor Reliable under Sensor Corruptions?

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

MBFusion: A New Multi-modal BEV Feature Fusion Method for HD Map Construction.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Mel-S3R: Combining Mel-spectrogram and self-supervised speech representation with VQ-VAE for any-to-any voice conversion.

Jichen Yang

Yi Zhou

Hao Huang

Speech Commun., June, 2023

Optimization of Cross-Lingual Voice Conversion With Linguistics Losses to Reduce Foreign Accents.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

TTS-Guided Training for Accent Conversion Without Parallel Data.

IEEE Signal Process. Lett., 2023

Zero-shot multi-speaker accent TTS with limited accent data.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation.

Yi Zhou

Xiaohai Tian

Haizhou Li

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Transfer Learning From Speech Synthesis to Voice Conversion With Non-Parallel Training Data.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020

Multi-Task WaveRNN With an Integrated Architecture for Cross-Lingual Voice Conversion.

Yi Zhou

Xiaohai Tian

Haizhou Li

IEEE Signal Process. Lett., 2020

Personalized Singing Voice Generation Using WaveRNN.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

The NUS & NWPU system for Voice Conversion Challenge 2020.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

NUS-HLT System for Blizzard Challenge 2020.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

2019

Cross-lingual Voice Conversion with Bilingual Phonetic Posteriorgram and Average Modeling.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

A Modularized Neural Network with Language-Specific Output Layers for Cross-Lingual Voice Conversion.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Many-to-many Cross-lingual Voice Conversion with a Jointly Trained Speaker Embedding Network.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Speaker-independent Spectral Mapping for Speech-to-Singing Conversion.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2005

A probabilistic model for robust face alignment in videos.

Proceedings of the 2005 International Conference on Image Processing, 2005

A Bayesian Mixture Model for Multi-View Face Alignment.

Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005