2024
Maximum Gaussianality training for deep speaker vector normalization.
Pattern Recognit., January, 2024
On evaluation trials in speaker verification.
Appl. Intell., January, 2024
Keyword Guided Target Speech Recognition.
IEEE Signal Process. Lett., 2024
AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition.
CoRR, 2024
Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective.
CoRR, 2024
Full-text Error Correction for Chinese Speech Recognition with Large Language Model.
CoRR, 2024
Few-Shot Keyword Spotting from Mixed Speech.
CoRR, 2024
Serialized Output Training by Learned Dominance.
CoRR, 2024
Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models.
CoRR, 2024
CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge.
CoRR, 2024
Zero-Shot Fake Video Detection by Audio-Visual Consistency.
CoRR, 2024
An Investigation of Distribution Alignment in Multi-Genre Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024
How Phonemes Contribute to Deep Speaker Models?
Proceedings of the IEEE International Conference on Acoustics, 2024
2023
Random Cycle Loss and Its Application to Voice Conversion.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2023
A Glance is Enough: Extract Target Sentence By Looking at A keyword.
CoRR, 2023
Zero-shot Mispronunciation Detection by Knowledge-based Data Augmentation.
Proceedings of the 26th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2023
CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Visualizing Data Augmentation in Deep Speaker Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Spot Keywords From Very Noisy and Mixed Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Adversarial Data Augmentation for Robust Speaker Verification.
Proceedings of the 9th International Conference on Communication and Information Processing, 2023
CN-CVS: A Mandarin Audio-Visual Dataset for Large Vocabulary Continuous Visual to Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023
2022
A Principle Solution for Enroll-Test Mismatch in Speaker Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
CN-Celeb: Multi-genre speaker recognition.
Speech Commun., 2022
Pay Attention to Hard Trials.
CoRR, 2022
Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion.
CoRR, 2022
Cycleflow: Purify Information Factors by Cycle Loss.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
C-P Map: A Novel Evaluation Toolkit for Speaker Verification.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Oriental Language Recognition (OLR) 2021: Summary and Analysis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Reliable Visualization for Deep Speaker Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Real Additive Margin Softmax for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
Deep Normalization for Speaker Vectors.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Can We Trust Deep Speech Prior?
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
M2ASR-MONGO: A Free Mongolian Speech Database and Accompanied Baselines.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021
KeSpeech: An Open Source Speech Dataset of Mandarin and Its Eight Subdialects.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021
Oriental Language Recognition (OLR) 2020: Summary and Analysis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Squeezing Value of Cross-Domain Labels: A Decoupled Scoring Approach for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021
A Study on Decoupled Probabilistic Linear Discriminant Analysis.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
OLR 2021 Challenge: Datasets, Rules and Baselines.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
An MAP Estimation for Between-Class Variance.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
2020
Deep Speaker Vector Normalization with Maximum Gaussianality Training.
CoRR, 2020
Deep generative factorization for speech signal.
CoRR, 2020
Deep Normalization for Speaker Vectors.
CoRR, 2020
Neural Discriminant Analysis for Deep Speaker Embedding.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Domain-Invariant Speaker Vector Projection by Model-Agnostic Meta-Learning.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
ASR-Free Pronunciation Assessment.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
A Robust Audio-Visual Speech Enhancement Model.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
CN-Celeb: A Challenging Chinese Speaker Recognition Dataset.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
AP20-OLR Challenge: Three Tasks and Their Baselines.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020
2019
On Investigation of Unsupervised Speech Factorization Based on Normalization Flow.
CoRR, 2019
VAE-Based Regularization for Deep Speaker Embedding.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Gaussian-constrained Training for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2019
Structure Growth for Small-Footprint Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
VAE-based Domain Adaptation for Speaker Verification.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
AP19-OLR Challenge: Three Tasks and Their Baselines.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Phonetic-Attention Scoring for Deep Speaker Features in Speaker Verification.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Question Mark Prediction By Bert.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
2018
Phonetic Temporal Neural Model for Language Identification.
IEEE ACM Trans. Audio Speech Lang. Process., 2018
Chinese Poetry Generation with Flexible Styles.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Human and Machine Speaker Recognition Based on Short Trivial Events.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Deep Factorization for Speech Signal.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Full-Info Training for Deep Speaker Feature Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
VV-Couplet: An open source Chinese couplet generation system.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
RACORN-K: Risk-Aversion Pattern Matching-based Portfolio Selection.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
AP18-OLR Challenge: Three Tasks and Their Baselines.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
Map and Relabel: Towards Almost-Zero Resource Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
2017
Collaborative Joint Training With Multitask Recurrent Model for Speech and Speaker Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
Medical Diagnosis From Laboratory Tests by Combining Generative and Discriminative Learning.
CoRR, 2017
Full-info Training for Deep Speaker Feature Learning.
CoRR, 2017
Deep Factorization for Speech Signal.
CoRR, 2017
M2ASR: Ambitions and first year progress.
Proceedings of the 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment, 2017
Phone-aware neural language identification.
Proceedings of the 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment, 2017
A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Deep Speaker Feature Learning for Text-Independent Speaker Verification.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Memory visualization for gated recurrent neural networks in speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Memory-augmented Neural Machine Translation.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017
Memory-augmented Chinese-Uyghur neural machine translation.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Speaker recognition with cough, laugh and "Wei".
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
AP17-OLR challenge: Data, plan, and baseline.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
A free Kazakh speech database and a speech recognition baseline.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Enhanced neural machine translation by learning from draft.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Cross-lingual speaker verification with deep feature learning.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Deep speaker verification: Do we need end to end?
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
Flexible and Creative Chinese Poetry Generation Using Neural Memory.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017
2016
Similar Word Model for Unfrequent Word Enhancement in Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
Improving Short Utterance Speaker Recognition by Modeling Speech Unit Classes.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
Local Training for PLDA in Speaker Verification.
CoRR, 2016
OC16-CE80: A Chinese-English Mixlingual Database and A Speech Recognition Baseline.
CoRR, 2016
System Combination for Short Utterance Speaker Recognition.
CoRR, 2016
Collaborative Learning for Language and Speaker Recognition.
CoRR, 2016
Weakly Supervised PLDA Training.
CoRR, 2016
Relation Classification: CNN or RNN?
Proceedings of the Natural Language Understanding and Intelligent Applications, 2016
Learning from LDA Using Deep Neural Networks.
Proceedings of the Natural Language Understanding and Intelligent Applications, 2016
Binary speaker embedding.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Max-margin metric learning for speaker recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Chinese Song Iambics Generation with Neural Attention-Based Model.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016
Recurrent neural network training with dark knowledge transfer.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Can Machine Generate Traditional Chinese Poetry? A Feigenbaum Test.
Proceedings of the Advances in Brain Inspired Cognitive Systems, 2016
AP16-OL7: A multilingual database for oriental languages and a language recognition baseline.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Multi-task recurrent model for true multilingual speech recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Multi-task recurrent model for speech and speaker recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Feature transformation for speaker verification under speaking rate mismatch condition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Learning ordered word representations with γ-decay dropout.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
System combination for short utterance speaker recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
2015
Detection and reconstruction of clipped speech for speaker recognition.
Speech Commun., 2015
Noisy training for deep neural networks in speech recognition.
EURASIP J. Audio Speech Music. Process., 2015
Relation Classification via Recurrent Neural Network.
CoRR, 2015
Learning from LDA using Deep Neural Networks.
CoRR, 2015
Recurrent Neural Network Training with Dark Knowledge Transfer.
CoRR, 2015
Knowledge Transfer Pre-training.
CoRR, 2015
Deep Speaker Vectors for Semi Text-independent Speaker Verification.
CoRR, 2015
An open/free database and Benchmark for Uyghur speaker recognition.
Proceedings of the 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015
Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015
Learning speech rate in speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Recognize foreign low-frequency words with similar pairs.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Lasso-based reverberation suppression in automatic speech Recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Stochastic Top-k ListNet.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015
Cross-lingual speaker verification based on linear transform.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015
Music removal by convolutional denoising autoencoder in speech recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
Transfer learning for speech and language processing.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
Document classification with spherical word vectors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
Improved deep speaker feature learning for text-dependent speaker recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
Joint Semantic Relevance Learning with Text Data and Graph Knowledge.
Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, 2015
2014
Feature analysis for discriminative confidence estimation in spoken term detection.
Comput. Speech Lang., 2014
Research on generalization property of time-varying Fbank-weighted MFCC for i-vector based speaker verification.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Document classification based on c.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Research on truncated speech in speaker verification.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Pruning deep neural networks by optimal brain damage.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
ATVS-CSLT-HCTLab System for NIST 2013 Open Keyword Search Evaluation.
Proceedings of the Advances in Speech and Language Technologies for Iberian Languages, 2014
Noisy training for deep neural networks.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014
Block-wise training for i-vector.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014
Document classification with distributions of word vectors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014
Discriminative scoring for speaker recognition based on I-vectors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014
2013
Online Non-Negative Convolutive Pattern Learning for Speech Signals.
IEEE Trans. Signal Process., 2013
Evolutionary discriminative confidence estimation for spoken term detection.
Multim. Tools Appl., 2013
Auditory features based on Gammatone filters for robust speech recognition.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013
Sequential model adaptation for speaker verification.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Bottleneck features based on gammatone frequency cepstral coefficients.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Subspace models for bottleneck features.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Sequential UBM adaptation for speaker verification.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013
Emotional speaker verification with linear adaptation.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013
Emotional adaptive training for speaker verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
2012
Direct posterior confidence for out-of-vocabulary spoken term detection.
ACM Trans. Inf. Syst., 2012
A Comparative Study of Bottom-Up and Top-Down Approaches to Speaker Diarization.
IEEE Trans. Speech Audio Process., 2012
Term-Dependent Confidence Normalisation for Out-of-Vocabulary Spoken Term Detection.
J. Comput. Sci. Technol., 2012
Heterogeneous Convolutive Non-Negative Sparse Coding.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
N-gram FST Indexing for Spoken Term Detection.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Speech overlap detection and attribution using convolutive non-negative sparse coding.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
2011
Stochastic Pronunciation Modeling for Out-of-Vocabulary Spoken Term Detection.
IEEE Trans. Speech Audio Process., 2011
Letter-to-Sound Pronunciation Prediction Using Conditional Random Fields.
IEEE Signal Process. Lett., 2011
Parallel and Hierarchical Decision Making for Sparse Coding in Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Online Pattern Learning for Non-Negative Convolutive Sparse Coding.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Handling overlaps in spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2011
Linguistic influences on bottom-up and top-down clustering for speaker diarization.
Proceedings of the IEEE International Conference on Acoustics, 2011
An evolutionary confidence measurement for spoken term detection.
Proceedings of the 9th International Workshop on Content-Based Multimedia Indexing, 2011
2010
An Evolutionary Confidence Measure for Spotting Words in Speech Recognition.
Proceedings of the Trends in Practical Applications of Agents and Multiagent Systems, 2010
Direct posterior confidence for out-of-vocabulary spoken term detection.
Proceedings of the 2010 International Workshop on Searching Spontaneous Conversational Speech, 2010
CRF-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Augmented set of features for confidence estimation in spoken term detection.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
An integrated top-down/bottom-up approach to speaker diarization.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Stochastic pronunciation modelling and soft match for out-of-vocabulary spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2010
2009
Term-dependent confidence for out-of-vocabulary term detection.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Stochastic pronunciation modelling for spoken term detection.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
A posterior probability-based system hybridisation and combination for spoken term detection.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Posterior-based confidence measures for spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2009
2008
A comparison of grapheme and phoneme-based units for Spanish spoken term detection.
Speech Commun., 2008
A posterior approach for microphone array based speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Growing bottleneck features for tandem ASR.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
A comparison of phone and grapheme-based spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2008