Jesús Villalba

Comput. Biol. Medicine, March, 2024

End-to-End Neural Speaker Diarization With Non-Autoregressive Attractors.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Time-Domain Speech Super-Resolution With GAN Based Modeling for Telephony Speaker Verification.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Slowness Regularized Contrastive Predictive Coding for Acoustic Unit Discovery.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Clean Label Attacks against SLU Systems.

CoRR, 2024

Unraveling Adversarial Examples against Speaker Identification - Techniques for Attack Detection and Victim Model Classification.

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Discovering Invariant Patterns of Cognitive Decline Via an Automated Analysis of the Cookie Thief Picture Description Task.

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Towards Speech Processing Robust to Adversarial Deceptions.

Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

2023

Interpretable speech features vs. DNN embeddings: What to use in the automatic assessment of Parkinson's disease in multi-lingual scenarios.

Comput. Biol. Medicine, November, 2023

Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning.

CoRR, 2023

Stabilized training of joint energy-based models and their practical applications.

CoRR, 2023

DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Self-FiLM: Conditioning GANs with self-supervised representations for bandwidth extension based speaker recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Do Phonatory Features Display Robustness to Characterize Parkinsonian Speech Across Corpora?

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Segmental SpeechCLIP: Utilizing Pretrained Image-text Models for Audio-Visual Learning.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Advances in Language Recognition in Low Resource African Languages: The JHU-MIT Submission for NIST LRE22.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Clustering Unsupervised Representations as Defense Against Poisoning Attacks on Speech Commands Classification System.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Energy-Based Model for Robust Speech Classification System Against Dirty-Label Backdoor Poisoning Attacks.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Model-Based Fairness Metric for Speaker Verification.

Maliha Jahan

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Unsupervised Speech Segmentation and Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Non-Contrastive Self-Supervised Learning for Utterance-Level Information Extraction From Speech.

IEEE J. Sel. Top. Signal Process., 2022

Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser.

CoRR, 2022

A Multi-Modal Array of Interpretable Features to Evaluate Language and Speech Patterns in Different Neurological Disorders.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Vsameter: Evaluation of a New Open-Source Tool to Measure Vowel Space Area and Related Metrics.

Tianyu Cao

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Advances in Cross-Lingual and Cross-Source Audio-Visual Speaker Recognition: The JHU-MIT System for NIST SRE21.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Advances in Speaker Recognition for Multilingual Conversational Telephone Speech: The JHU-MIT System for NIST SRE20 CTS Challenge.

Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Chunking Defense for Adversarial Attacks on ASR.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

End-to-End Neural Speaker Diarization with an Iterative Refinement of Non-Autoregressive Attention-based Attractors.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Non-contrastive self-supervised learning of utterance-level speech representations.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Study of Pre-Processing Defenses Against Adversarial Attacks on State-of-the-Art Speaker Recognition Systems.

Sonal Joshi

IEEE Trans. Inf. Forensics Secur., 2021

Non-Autoregressive Transformer for Speech Recognition.

IEEE Signal Process. Lett., 2021

The JHU submission to VoxSRC-21: Track 3.

CoRR, 2021

Adversarial Attacks and Defenses for Speech Recognition Systems.

CoRR, 2021

Adversarial Attacks and Defenses for Speaker Identification Systems.

Sonal Joshi

CoRR, 2021

Invariant Representation Learning for Robust Far-Field Speaker Recognition.

Proceedings of the Statistical Language and Speech Processing, 2021

Representation Learning to Classify and Detect Adversarial Attacks Against Speaker and Speech Recognition Systems.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Spine2Net: SpineNet with Res2Net and Time-Squeeze-and-Excitation Blocks for Speaker Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Automatic Detection and Assessment of Alzheimer Disease Using Speech and Language Technologies in Low-Resource Scenarios.

Sonal Joshi

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Deep Feature CycleGANs: Speaker Identity Preserving Non-Parallel Microphone-Telephone Domain Adaptation for Speaker Verification.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Align-Denoise: Single-Pass Non-Autoregressive Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

CopyPaste: An Augmentation Method for Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Perceptual Loss Based Speech Denoising with an Ensemble of Audio Pattern Recognition and Self-Supervised Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Improving Reconstruction Loss Based Speaker Embedding in Unsupervised and Semi-Supervised Scenarios.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Focus on the Present: A Regularization Method for the ASR Source-Target Attention Layer.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Beyond Isolated Utterances: Conversational Emotion Recognition.

Leibny Paola García-Perera

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and Speakers in the Wild evaluations.

Fred Richardson

Réda Dehak

Comput. Speech Lang., 2020

Frustratingly Easy Noise-aware Training of Acoustic Models.

CoRR, 2020

Single Channel Far Field Feature Enhancement For Speaker Verification In The Wild.

CoRR, 2020

Advances in Speaker Recognition for Telephone and Audio-Visual Data: the JHU-MIT Submission for NIST SRE19.

Leibny Paola García-Perera

Pedro Torres-Carrasquillo

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Analysis of Deep Feature Loss Based Enhancement for Speaker Verification.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Speaker Detection in the Wild: Lessons Learned from JSALT 2019.

Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Black-Box Attacks on Spoofing Countermeasures Using Transferability of Adversarial Examples.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

x-Vectors Meet Adversarial Attacks: Benchmarking Adversarial Robustness in Speaker Verification.

Yuekai Zhang

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Learning Speaker Embedding from Text-to-Speech.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

X-Vectors Meet Emotions: A Study On Dependencies Between Emotion and Speaker Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Unsupervised Feature Enhancement for Speaker Verification.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Using X-Vectors to Automatically Detect Parkinson's Disease from Speech.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Feature Enhancement with Deep Feature Losses for Speaker Verification.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition.

CoRR, 2019

A forced gaussians based methodology for the differential evaluation of Parkinson's Disease by means of speech processing.

Jorge Andrés Gómez García

Juan Ignacio Godino-Llorente

Stefanie Shattuck-Hufnagel

Jan Rusz

Leibny Paola García-Perera

Biomed. Signal Process. Control., 2019

State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18.

Daniel Povey

Sanjeev Khudanpur

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The JHU Speaker Recognition System for the VOiCES 2019 Challenge.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual Networks.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Tied Mixture of Factor Analyzers Layer to Combine Frame Level Representations in Neural Speaker Embeddings.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cycle-GANs for Domain Adaptation of Acoustic Features for Speaker Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Investigation on Neural Bandwidth Extension of Telephone Speech for Improved Speaker Recognition.

Vicente Iglesias

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition.

Shinji Watanabe

Takaaki Hori

Murali Karthick Baskar

Hirofumi Inaguma

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

LSTM Siamese Network for Parkinson's Disease Detection from Speech.

Proceedings of the 2019 IEEE Global Conference on Signal and Information Processing, 2019

Bottom-Up Unsupervised Word Discovery via Acoustic Units.

Proceedings of the 2019 IEEE Global Conference on Signal and Information Processing, 2019

Hierarchical Transformers for Long Document Classification.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Low-Resource Domain Adaptation for Speaker Recognition Using Cycle-GANs.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Analysis of speaker recognition methodologies and the influence of kinetic changes to automatically detect Parkinson's Disease.

Jorge Andrés Gómez García

Juan Ignacio Godino-Llorente

Juan Rafael Orozco-Arroyave

Appl. Soft Comput., 2018

The MIT Lincoln Laboratory / JHU / EPITA-LSE LRE17 System.

Fred Richardson

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

End-to-End versus Embedding Neural Networks for Language Recognition in Mismatched Conditions.

Niko Brümmer

Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Investigation on Bandwidth Extension for Speaker Recognition.

Cheng-I Lai

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

End-to-end Deep Neural Network Age Estimation.

Pegah Ghahremani

Peter Sibbern Frederiksen

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Effectiveness of Single-Channel BLSTM Enhancement for Language Identification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Neural Networks for Emotion Recognition Combining Audio and Transcripts.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

An Investigation of Non-linear i-vectors for Speaker Verification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Joint Verification-Identification in end-to-end Multi-Scale CNN Framework for Topic Identification.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Measuring Uncertainty in Deep Regression Models: The Case of Age Estimation from Speech.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

JHU Diarization System Description.

Zili Huang

Daniel Povey

Proceedings of the Fourth International Conference, 2018

2017

Domain Adaptation of PLDA Models in Broadcast Diarization by Means of Unsupervised Speaker Clustering.

Ignacio Viñals

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Tied Variational Autoencoder Backends for i-Vector Speaker Recognition.

Niko Brümmer

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Bayesian Networks to Model the Variability of Speaker Verification Scores in Adverse Environments.

IEEE ACM Trans. Audio Speech Lang. Process., 2016

Analysis of speech quality measures for the task of estimating the reliability of speaker verification decisions.

Speech Commun., 2016

Bottleneck Based Front-End for Diarization Systems.

Ignacio Viñals

Proceedings of the Advances in Speech and Language Technologies for Iberian Languages, 2016

2015

Spoofing detection with DNN and one-class SVM for the ASVspoof 2015 challenge.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Variational Bayesian PLDA for speaker diarization in the MGB challenge.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Factor analysis with sampling methods for text dependent speaker recognition.

Carlos Vaquero

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Unsupervised adaptation of PLDA by using variational Bayes methods.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Unsupervised Training of PLDA with Variational Bayes.

Proceedings of the Advances in Speech and Language Technologies for Iberian Languages, 2014

Unsupervised Accent Modeling for Language Identification.

Eduardo Lleida-Solano

Alfonso Ortega Giménez

Proceedings of the Advances in Speech and Language Technologies for Iberian Languages, 2014

2013

Handling recordings acquired simultaneously over multiple channels with PLDA.

Mireia Díez

Amparo Varona

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

The I3a speaker recognition system for NIST SRE12: post-evaluation analysis.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A new Bayesian network to assess the reliability of speaker verification decisions.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Handling i-vectors from different recording conditions using multi-channel simplified PLDA in speaker recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Segmentation-by-classification system based on factor analysis.

Diego Castán

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012

Bayesian adaptation of PLDA based speaker recognition to domains with scarce development data.

Luis Javier Rodríguez-Fuentes

Proceedings of the Odyssey 2012: The Speaker and Language Recognition Workshop, 2012

The BLZ Submission to the NIST 2011 LRE: Data Collection, System Development and Performance.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Reliability Estimation of the Speaker Verification Decisions Using Bayesian Networks to Combine Information from Multiple Speech Quality Measures.

Proceedings of the Advances in Speech and Language Technologies for Iberian Languages, 2012

Voice Pathology Detection on the Saarbrücken Voice Database with Calibration and Fusion of Scores Using MultiFocal Toolkit.

Carlos Vaquero Avilés Casco

Proceedings of the Advances in Speech and Language Technologies for Iberian Languages, 2012

2011

Speaker Verification On Summed-Channel Conditions With Confidence Measures.

Alfonso Ortega Giménez

Eduardo Lleida-Solano

Computación y Sistemas, 2011

Towards Fully Bayesian Speaker Recognition: Integrating Out the Between-Speaker Covariance.

Niko Brümmer

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

I3A Language Recognition System for Albayzin 2010 LRE.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Hierarchical Audio Segmentation with HMM and Factor Analysis in Broadcast News Domain.

Diego Castán

Carlos Vaquero

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Preventing replay attacks on speaker verification systems.

Proceedings of the International Carnahan Conference on Security Technology, 2011

Detecting Replay Attacks from Far-Field Recordings on Speaker Verification Systems.

Proceedings of the Biometrics and ID Management, 2011

Multi-site heterogeneous system fusions for the Albayzin 2010 Language Recognition Evaluation.

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

Confidence measures for speaker segmentation and their relation to speaker verification.

Carlos Vaquero

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Speaker Verification in Noisy Environment Using Missing Feature Approach.

Dayana Ribas