Minsu Kim
ORCID: 0000-0002-6514-0018
Affiliations:
- Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, Integrated Vision and Language Laboratory, Daejeon, South Korea
- Yonsei University, School of Electrical and Electronic Engineering, Seoul, South Korea
According to our database, Minsu Kim authored at least 38 papers between 2020 and 2025.
Bibliography
2025
Prompt Tuning of Deep Neural Networks for Speaker-Adaptive Visual Speech Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., February, 2025
2024
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model.
IEEE Trans. Multim., 2024
Textless Unit-to-Unit Training for Many-to-Many Multilingual Speech-to-Speech Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages.
CoRR, 2024
Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units.
CoRR, 2024
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
CoRR, 2023
Visual Speech Recognition for Low-resource Languages with Automatic Labels From Whisper Model.
CoRR, 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation.
CoRR, 2023
CoRR, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
IEEE Trans. Multim., 2022
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection.
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the 2021 IEEE International Conference on Image Processing, 2021
Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
2020
Unsupervised Disentangling of Viewpoint and Residues Variations by Substituting Representations for Robust Face Recognition.
Proceedings of the 25th International Conference on Pattern Recognition, 2020
Proceedings of the IEEE International Conference on Image Processing, 2020
Proceedings of the IEEE International Conference on Image Processing, 2020