L. Paola García-Perera

ORCID: 0000-0002-7449-5726

Affiliations:
  • Johns Hopkins University, Center for Language and Speech Processing, Baltimore, MD, USA
  • Nuance Communications, Inc. (former)
  • Agnitio S.L., Madrid, Spain (former)
  • University of Zaragoza, Spain (PhD 2014)
  • Monterrey Institute of Technology and Higher Education (ITESM), Computer Science Department, Monterrey, Mexico


According to our database, L. Paola García-Perera authored at least 96 papers between 2004 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
HLTCOE JHU Submission to the Voice Privacy Challenge 2024.
CoRR, 2024

Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization.
CoRR, 2024

The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization.
CoRR, 2024

Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model.
CoRR, 2024

On Speaker Attribution with SURT.
Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Odyssey 2024 - Speech Emotion Recognition Challenge: Dataset, Baseline Framework, and Results.
Proceedings of the Odyssey 2024: The Speaker and Language Recognition Workshop, 2024

Where are you from? Geolocating Speech and Applications to Language Identification.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Enhancing Code-Switching Speech Recognition With Interactive Language Biases.
Proceedings of the IEEE International Conference on Acoustics, 2024

Unidirectional Brain-Computer Interface: Artificial Neural Network Encoding Natural Images to fMRI Response in the Visual Cortex.
Proceedings of the IEEE International Conference on Acoustics, 2024

Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

ConEC: Earnings Call Dataset with Real-world Contexts for Benchmarking Contextual Speech Recognition.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors.
CoRR, 2023

The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios.
CoRR, 2023

Genre Classification of Books on Spanish.
IEEE Access, 2023

Investigating model performance in language identification: beyond simple error statistics.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Advances in Language Recognition in Low Resource African Languages: The JHU-MIT Submission for NIST LRE22.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extracters.
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023), 2023

Crosslingual Handwritten Text Generation Using GANs.
Proceedings of the Document Analysis and Recognition - ICDAR 2023 Workshops, 2023

A New Approach to Extract Fetal Electrocardiogram Using Affine Combination of Adaptive Filters.
Proceedings of the IEEE International Conference on Acoustics, 2023

Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Reducing Language Confusion for Code-Switching Speech Recognition with Token-Level Language Diarization.
Proceedings of the IEEE International Conference on Acoustics, 2023

PQLM - Multilingual Decentralized Portable Quantum Language Model.
Proceedings of the IEEE International Conference on Acoustics, 2023

Building Keyword Search System from End-To-End ASR Systems.
Proceedings of the IEEE International Conference on Acoustics, 2023

Adapting Self-Supervised Models to Multi-Talker Speech Recognition Using Speaker Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2023

EURO: ESPnet Unsupervised ASR Open-Source Toolkit.
Proceedings of the IEEE International Conference on Acoustics, 2023

Learning From Flawed Data: Weakly Supervised Automatic Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Synthetic Data Augmentation for ASR with Domain Filtering.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
Encoder-Decoder Based Attractors for End-to-End Neural Diarization.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Efficient Self-Supervised Learning Representations for Spoken Language Identification.
IEEE J. Sel. Top. Signal Process., 2022

Joint speaker diarization and speech recognition based on region proposal networks.
Comput. Speech Lang., 2022

PQLM - Multilingual Decentralized Portable Quantum Language Model for Privacy Protection.
CoRR, 2022

Investigating self-supervised learning for lyrics recognition.
CoRR, 2022

Online Neural Diarization of Unlimited Numbers of Speakers.
CoRR, 2022

Enhance Language Identification using Dual-mode Model with Knowledge Distillation.
CoRR, 2022

On Compressing Sequences for Self-Supervised Speech Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Advances in Cross-Lingual and Cross-Source Audio-Visual Speaker Recognition: The JHU-MIT System for NIST SRE21.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Enhancing Language Identification Using Dual-Mode Model with Knowledge Distillation.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Investigating Self-Supervised Learning for Speech Enhancement and Separation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Multi-Channel End-To-End Neural Diarization with Distributed Microphones.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Encoder-Decoder Based Attractor Calculation for End-to-End Neural Diarization.
CoRR, 2021

The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap.
CoRR, 2021

Online End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers.
CoRR, 2021

Online End-To-End Neural Diarization with Speaker-Tracing Buffer.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Training Hybrid Models on Noisy Transliterated Transcripts for Code-Switched Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Semi-Supervised Training with Pseudo-Labeling for End-To-End Neural Diarization.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Language Diarization for Bilingual Code-Switching Speech.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-To-End Speaker Diarization as Post-Processing.
Proceedings of the IEEE International Conference on Acoustics, 2021

The CLIR-CLSP System for the IberSPEECH-RTVE 2020 Speaker Diarization and Identity Assignment Challenge.
Proceedings of the Fifth International Conference, 2021

Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and Speakers in the Wild evaluations.
Comput. Speech Lang., 2020

DNN Speaker Tracking with Embeddings.
CoRR, 2020

The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge.
CoRR, 2020

Single Channel Far Field Feature Enhancement For Speaker Verification In The Wild.
CoRR, 2020

Advances in Speaker Recognition for Telephone and Audio-Visual Data: the JHU-MIT Submission for NIST SRE19.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020


End-to-End Domain-Adversarial Voice Activity Detection.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Unsupervised Feature Enhancement for Speaker Verification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Feature Enhancement with Deep Feature Losses for Speaker Verification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker Diarization with Region Proposal Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Overlap-Aware Diarization: Resegmentation Using Neural End-to-End Overlapped Speech Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Analysis of Robustness of Deep Single-Channel Speech Separation Using Corpora Constructed From Multiple Domains.
Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019

Multi-PLDA Diarization on Children's Speech.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Optical Character Recognition with Chinese and Korean Character Decomposition.
Proceedings of the Second International Workshop on Machine Learning, 2019

Using ASR Methods for OCR.
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

2018
Building Corpora for Single-Channel Speech Separation Across Multiple Domains.
CoRR, 2018

JHU Diarization System Description.
Proceedings of the Fourth International Conference, 2018

2017
Analysis and Description of ABC Submission to NIST SRE 2016.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

DNN Bottleneck Features for Speaker Clustering.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016
Analysis of the Impact of the Audio Database Characteristics in the Accuracy of a Speaker Clustering System.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

2015
Context-Aware Communicator for All.
Proceedings of the Universal Access in Human-Computer Interaction. Access to Today's Technologies, 2015

2013
Ensemble approach in speaker verification.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Optimization of the DET curve in speaker verification under noisy conditions.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
Optimization of the DET curve in speaker verification.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

2011
Speaker Verification in Different Database Scenarios.
Computación y Sistemas, 2011

2010
Speech Magnitude-Spectrum Information-Entropy (MSIE) for Automatic Speech Recognition in Noisy Environments.
Proceedings of the 20th International Conference on Pattern Recognition, 2010


2008
Enhancing acoustic models for robust speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Robust Automatic Speech Recognition Using PD-MEEMLIN.
Proceedings of the Pattern Recognition and Image Analysis, Third Iberian Conference, 2007

2006
Using PCA to Improve the Generation of Speech Keys.
Proceedings of the MICAI 2006: Advances in Artificial Intelligence, 2006

2005
Parameter Optimization in a Text-Dependent Cryptographic-Speech-Key Generation Task.
Proceedings of the Nonlinear Analyses and Algorithms for Speech Processing, 2005

Cryptographic-Speech-Key Generation Architecture Improvements.
Proceedings of the Pattern Recognition and Image Analysis, Second Iberian Conference, 2005

Phoneme Spotting for Speech-Based Crypto-key Generation.
Proceedings of the Progress in Pattern Recognition, 2005

Multi-speaker voice cryptographic key generation.
Proceedings of the 2005 ACS / IEEE International Conference on Computer Systems and Applications (AICCSA 2005), 2005

2004
Cryptographic-Speech-Key Generation Using the SVM Technique over the lp-Cepstral Speech Space.
Proceedings of the Nonlinear Speech Modeling and Applications, 2004

SVM Applied to the Generation of Biometric Speech Key.
Proceedings of the Progress in Pattern Recognition, 2004

