Joon Son Chung

ORCID: 0000-0001-7741-7275

According to our database, Joon Son Chung authored at least 100 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
The VoxCeleb Speaker Recognition Challenge: A Retrospective.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Bridging the Gap Between Audio and Text Using Parallel-Attention for User-Defined Keyword Spotting.
IEEE Signal Process. Lett., 2024

Audio Mamba: Bidirectional State Space Model for Audio Representation Learning.
IEEE Signal Process. Lett., 2024

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild.
CoRR, 2024

Text-To-Speech Synthesis In The Wild.
CoRR, 2024

Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment.
CoRR, 2024

ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions.
CoRR, 2024

Disentangled Representation Learning for Environment-agnostic Speaker Recognition.
CoRR, 2024

Lightweight Audio Segmentation for Long-form Speech Translation.
CoRR, 2024

FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching.
CoRR, 2024

To what extent can ASV systems naturally defend against spoofing attacks?
CoRR, 2024

Can CLIP Help Sound Source Localization?
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Multimodal Learning of Speech and Speaker Representations.
Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Speech Guided Masked Image Modeling for Visually Grounded Speech.
Proceedings of the IEEE International Conference on Acoustics, 2024

Fregrad: Lightweight and Fast Frequency-Aware Diffusion Vocoder.
Proceedings of the IEEE International Conference on Acoustics, 2024

VoiceLDM: Text-to-Speech with Environmental Context.
Proceedings of the IEEE International Conference on Acoustics, 2024

Seeing Through The Conversation: Audio-Visual Speech Separation Based on Diffusion Model.
Proceedings of the IEEE International Conference on Acoustics, 2024

VoxMM: Rich Transcription of Conversations in the Wild.
Proceedings of the IEEE International Conference on Acoustics, 2024

TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, 2024

Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2024

From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers.
Proceedings of the IEEE International Conference on Acoustics, 2024

Slowfast Network for Continuous Sign Language Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Automated Movie Trailer Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Scaling Up Video Summarization Pretraining with Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Let There Be Sound: Reconstructing High Quality Speech from Silent Videos.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge.
CoRR, 2023

That's What I Said: Fully-Controllable Talking Face Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Disentangled Representation Learning for Multilingual Speaker Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Curriculum Learning for Self-supervised Speaker Verification.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

FlexiAST: Flexibility is What AST Needs.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Sound Source Localization is All about Cross-Modal Alignment.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples.
Proceedings of the IEEE International Conference on Acoustics, 2023

MarginNCE: Robust Sound Localization with a Negative Margin.
Proceedings of the IEEE International Conference on Acoustics, 2023

Imaginary Voice: Face-Styled Diffusion Model for Text-to-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2023

Advancing the Dimensionality Reduction of Speaker Embeddings for Speaker Diarisation: Disentangling Noise and Informing Speech Activity.
Proceedings of the IEEE International Conference on Acoustics, 2023

Metric Learning for User-Defined Keyword Spotting.
Proceedings of the IEEE International Conference on Acoustics, 2023

In Search of Strong Embedding Extractors for Speaker Diarisation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Self-Sufficient Framework for Continuous Sign Language Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Deep Audio-Visual Speech Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Augmentation Adversarial Training for Self-Supervised Speaker Representation Learning.
IEEE J. Sel. Top. Signal Process., 2022

Disentangled representation learning for multilingual speaker recognition.
CoRR, 2022

Large-scale learning of generalised representations for speaker recognition.
CoRR, 2022

VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge.
CoRR, 2022

Baseline Systems for the First Spoofing-Aware Speaker Verification Challenge: Score and Embedding Fusion.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Pushing the limits of raw waveform speaker recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multi-Scale Speaker Embedding-Based Graph Attention Networks For Speaker Diarisation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Spell My Name: Keyword Boosted Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks.
Proceedings of the IEEE International Conference on Acoustics, 2022

Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
Disentangled dimensionality reduction for noise-robust speaker diarisation.
CoRR, 2021

Cross Attentive Pooling for Speaker Verification.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Supervised Attention for Speaker Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Look Who's Not Talking.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Metric Learning for Keyword Spotting.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Adapting Speaker Embeddings for Speaker Diarisation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Look Who's Talking: Active Speaker Detection in the Wild.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Three-Class Overlapped Speech Detection Using a Convolutional Recurrent Neural Network.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

The ins and outs of speaker recognition: lessons from VoxSRC 2020.
Proceedings of the IEEE International Conference on Acoustics, 2021

Graph Attention Networks for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

Playing a Part: Speaker Verification at the movies.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval.
IEEE J. Sel. Top. Signal Process., 2020

VoxCeleb: Large-scale speaker verification in the wild.
Comput. Speech Lang., 2020

VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge.
CoRR, 2020

Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020.
CoRR, 2020

Augmentation adversarial training for unsupervised speaker recognition.
CoRR, 2020

Delving into VoxCeleb: Environment Invariant Speaker Recognition.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spot the Conversation: Speaker Diarisation in the Wild.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

In Defence of Metric Learning for Speaker Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

FaceFilter: Audio-Visual Speech Separation Using Still Images.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Now You're Speaking My Language: Visual Language Identification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Disentangled Speech Embeddings Using Cross-Modal Self-Supervision.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

The Sound of My Voice: Speaker Representation Loss for Target Voice Separation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

ASR is All You Need: Cross-Modal Distillation for Lip Reading.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

BSL-1K: Scaling Up Co-articulated Sign Language Recognition Using Mouthing Cues.
Proceedings of the Computer Vision - ECCV 2020, 2020

Self-supervised Learning of Audio-Visual Objects from Video.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
You Said That?: Synthesising Talking Faces from Audio.
Int. J. Comput. Vis., 2019

VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge.
CoRR, 2019

Naver at ActivityNet Challenge 2019 - Task B Active Speaker Detection (AVA).
CoRR, 2019

Who Said That?: Audio-Visual Speaker Diarisation of Real-World Meetings.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Utterance-level Aggregation for Speaker Recognition in the Wild.
Proceedings of the IEEE International Conference on Acoustics, 2019

Perfect Match: Improved Cross-modal Embeddings for Audio-visual Synchronisation.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Learning to lip read words by watching videos.
Comput. Vis. Image Underst., 2018

LRS3-TED: a large-scale dataset for visual speech recognition.
CoRR, 2018

VoxCeleb2: Deep Speaker Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Lip Reading: A Comparison of Models and an Online Application.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

The Conversation: Deep Audio-Visual Speech Enhancement.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017
Visual recognition of human communication.
PhD thesis, 2017

VoxCeleb: A Large-Scale Speaker Identification Dataset.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Lip Reading Sentences in the Wild.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Lip Reading in Profile.
Proceedings of the British Machine Vision Conference 2017, 2017

You said that?
Proceedings of the British Machine Vision Conference 2017, 2017

2016
Signs in time: Encoding human motion as a temporal image.
CoRR, 2016

Out of Time: Automated Lip Sync in the Wild.
Proceedings of the Computer Vision - ACCV 2016 Workshops, 2016

Lip Reading in the Wild.
Proceedings of the Computer Vision - ACCV 2016, 2016

2014
Re-presentations of Art Collections.
Proceedings of the Computer Vision - ECCV 2014 Workshops, 2014
