Minsu Kim

ORCID: 0000-0002-6514-0018

Affiliations:
  • Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, Integrated Vision and Language Laboratory, Daejeon, South Korea
  • Yonsei University, School of Electrical and Electronic Engineering, Seoul, South Korea


According to our database, Minsu Kim authored at least 38 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of five.

Timeline

Publications per year, 2020 to 2025 (chart not reproduced). Publication types: book, in proceedings, article, PhD thesis, dataset, other.


Bibliography

2025
Prompt Tuning of Deep Neural Networks for Speaker-Adaptive Visual Speech Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., February, 2025

2024
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model.
IEEE Trans. Multim., 2024

Textless Unit-to-Unit Training for Many-to-Many Multilingual Speech-to-Speech Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages.
CoRR, 2024

Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units.
CoRR, 2024

Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Exploring Phonetic Context-Aware Lip-Sync for Talking Face Generation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Text-Driven Talking Face Synthesis by Reprogramming Audio-Driven Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion.
CoRR, 2023

Visual Speech Recognition for Low-resource Languages with Automatic Labels From Whisper Model.
CoRR, 2023

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation.
CoRR, 2023

Reprogramming Audio-driven Talking Face Synthesis into Text-driven.
CoRR, 2023

Exploring Phonetic Context in Lip Movement for Authentic Talking Face Generation.
CoRR, 2023

Intelligible Lip-to-Speech Synthesis with Speech Units.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Multi-Temporal Lip-Audio Memory for Visual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Lip-to-Speech Synthesis in the Wild with Multi-Task Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
CroMM-VSR: Cross-Modal Memory Augmented Visual Speech Recognition.
IEEE Trans. Multim., 2022

Meta Input: How to Leverage Off-the-Shelf Deep Neural Networks.
CoRR, 2022

Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speaker-Adaptive Lip Reading with User-Dependent Padding.
Proceedings of the Computer Vision - ECCV 2022, 2022

VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection.
Proceedings of the Computer Vision - ECCV 2022, 2022

SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Speech Reconstruction With Reminiscent Sound Via Visual Voice Memory.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Lip to Speech Synthesis with Visual Context Attentional GAN.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Interpretation of Lesional Detection via Counterfactual Generation.
Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Unsupervised Disentangling of Viewpoint and Residues Variations by Substituting Representations for Robust Face Recognition.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Robust Video Facial Authentication With Unsupervised Mode Disentanglement.
Proceedings of the IEEE International Conference on Image Processing, 2020

Learning Style Correlation for Elaborate Few-Shot Classification.
Proceedings of the IEEE International Conference on Image Processing, 2020

