Joon Son Chung

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Multimodal Learning of Speech and Speaker Representations.

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Speech Guided Masked Image Modeling for Visually Grounded Speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

FreGrad: Lightweight and Fast Frequency-Aware Diffusion Vocoder.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

VoiceLDM: Text-to-Speech with Environmental Context.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Seeing Through The Conversation: Audio-Visual Speech Separation Based on Diffusion Model.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

VoxMM: Rich Transcription of Conversations in the Wild.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

SlowFast Network for Continuous Sign Language Recognition.

Junseok Ahn

Youngjoon Jang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Automated Movie Trailer Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Scaling Up Video Summarization Pretraining with Large Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Let There Be Sound: Reconstructing High Quality Speech from Silent Videos.

Ji-Hoon Kim

Jaehun Kim

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge.

CoRR, 2023

That's What I Said: Fully-Controllable Talking Face Generation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Disentangled Representation Learning for Multilingual Speaker Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Curriculum Learning for Self-supervised Speaker Verification.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

FlexiAST: Flexibility is What AST Needs.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Sound Source Localization is All about Cross-Modal Alignment.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

MarginNCE: Robust Sound Localization with a Negative Margin.

Sooyoung Park

Arda Senocak

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Imaginary Voice: Face-Styled Diffusion Model for Text-to-Speech.

Jiyoung Lee

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Advancing the Dimensionality Reduction of Speaker Embeddings for Speaker Diarisation: Disentangling Noise and Informing Speech Activity.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Metric Learning for User-Defined Keyword Spotting.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

In Search of Strong Embedding Extractors for Speaker Diarisation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Self-Sufficient Framework for Continuous Sign Language Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

Deep Audio-Visual Speech Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Augmentation Adversarial Training for Self-Supervised Speaker Representation Learning.

IEEE J. Sel. Top. Signal Process., 2022

Disentangled representation learning for multilingual speaker recognition.

CoRR, 2022

Large-scale learning of generalised representations for speaker recognition.

CoRR, 2022

VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge.

CoRR, 2022

Baseline Systems for the First Spoofing-Aware Speaker Verification Challenge: Score and Embedding Fusion.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Pushing the limits of raw waveform speaker recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multi-Scale Speaker Embedding-Based Graph Attention Networks For Speaker Diarisation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Spell My Name: Keyword Boosted Speech Recognition.

Namkyu Jung

Geonmin Kim

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition.

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021

Disentangled dimensionality reduction for noise-robust speaker diarisation.

CoRR, 2021

Cross Attentive Pooling for Speaker Verification.

Seong Min Kye

Yoohwan Kwon

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Supervised Attention for Speaker Recognition.

Seong Min Kye

Hoirin Kim

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Look Who's Not Talking.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Metric Learning for Keyword Spotting.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Adapting Speaker Embeddings for Speaker Diarisation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Look Who's Talking: Active Speaker Detection in the Wild.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Three-Class Overlapped Speech Detection Using a Convolutional Recurrent Neural Network.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

The ins and outs of speaker recognition: lessons from VoxSRC 2020.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Graph Attention Networks for Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Playing a Part: Speaker Verification at the movies.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval.

Hong-Goo Kang

IEEE J. Sel. Top. Signal Process., 2020

VoxCeleb: Large-scale speaker verification in the wild.

Comput. Speech Lang., 2020

VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge.

CoRR, 2020

Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020.

CoRR, 2020

Augmentation adversarial training for unsupervised speaker recognition.

CoRR, 2020

Delving into VoxCeleb: Environment Invariant Speaker Recognition.

Jaesung Huh

Seongkyu Mun

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision.

Hong-Goo Kang

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spot the Conversation: Speaker Diarisation in the Wild.

Jaesung Huh

Arsha Nagrani

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

In Defence of Metric Learning for Speaker Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

FaceFilter: Audio-Visual Speech Separation Using Still Images.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Now You're Speaking My Language: Visual Language Identification.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

The Sound of My Voice: Speaker Representation Loss for Target Voice Separation.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

ASR is All You Need: Cross-Modal Distillation for Lip Reading.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

BSL-1K: Scaling Up Co-articulated Sign Language Recognition Using Mouthing Cues.

Samuel Albanie

Gül Varol

Liliane Momeni

Neil Fox

Proceedings of the Computer Vision - ECCV 2020, 2020

Self-supervised Learning of Audio-Visual Objects from Video.

Andrew Owens

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

You Said That?: Synthesising Talking Faces from Audio.

Amir Jamaludin

Int. J. Comput. Vis., 2019

VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge.

CoRR, 2019

Naver at ActivityNet Challenge 2019 - Task B Active Speaker Detection (AVA).

CoRR, 2019

Who Said That?: Audio-Visual Speaker Diarisation of Real-World Meetings.

Bong-Jin Lee

Icksang Han

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Utterance-level Aggregation for Speaker Recognition in the Wild.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Perfect Match: Improved Cross-modal Embeddings for Audio-visual Synchronisation.

Hong-Goo Kang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018

Learning to lip read words by watching videos.

Comput. Vis. Image Underst., 2018

LRS3-TED: a large-scale dataset for visual speech recognition.

CoRR, 2018

VoxCeleb2: Deep Speaker Recognition.

Arsha Nagrani

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Lip Reading: A Comparison of Models and an Online Application.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

The Conversation: Deep Audio-Visual Speech Enhancement.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Visual recognition of human communication.

PhD thesis, 2017

VoxCeleb: A Large-Scale Speaker Identification Dataset.

Arsha Nagrani

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Lip Reading Sentences in the Wild.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Lip Reading in Profile.

Proceedings of the British Machine Vision Conference 2017, 2017

You said that?

Amir Jamaludin

Proceedings of the British Machine Vision Conference 2017, 2017

2016

Signs in time: Encoding human motion as a temporal image.

CoRR, 2016

Out of Time: Automated Lip Sync in the Wild.

Proceedings of the Computer Vision - ACCV 2016 Workshops, 2016

Lip Reading in the Wild.
