Dong Wang

ORCID: 0000-0002-1286-0644

Affiliations:
  • Tsinghua University, Center for Speech and Language Technologies, Beijing, China
  • Nuance Communications, Aachen, Germany (2011 - 2012)
  • EURECOM, Department of Multimedia Communications, Sophia Antipolis, France (2010 - 2011)
  • University of Edinburgh, CSTR, UK (PhD 2010)


According to our database, Dong Wang authored at least 170 papers between 2008 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Maximum Gaussianality training for deep speaker vector normalization.
Pattern Recognit., January, 2024

On evaluation trials in speaker verification.
Appl. Intell., January, 2024

Keyword Guided Target Speech Recognition.
IEEE Signal Process. Lett., 2024

AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition.
CoRR, 2024

Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective.
CoRR, 2024

Full-text Error Correction for Chinese Speech Recognition with Large Language Model.
CoRR, 2024

Few-Shot Keyword Spotting from Mixed Speech.
CoRR, 2024

Serialized Output Training by Learned Dominance.
CoRR, 2024

Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models.
CoRR, 2024

CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge.
CoRR, 2024

Zero-Shot Fake Video Detection by Audio-Visual Consistency.
CoRR, 2024

An Investigation of Distribution Alignment in Multi-Genre Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

How Phonemes Contribute to Deep Speaker Models?
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Random Cycle Loss and Its Application to Voice Conversion.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2023

A Glance is Enough: Extract Target Sentence By Looking at A keyword.
CoRR, 2023

Zero-shot Mispronunciation Detection by Knowledge-based Data Augmentation.
Proceedings of the 26th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2023

CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Visualizing Data Augmentation in Deep Speaker Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Spot Keywords From Very Noisy and Mixed Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adversarial Data Augmentation for Robust Speaker Verification.
Proceedings of the 9th International Conference on Communication and Information Processing, 2023

CN-CVS: A Mandarin Audio-Visual Dataset for Large Vocabulary Continuous Visual to Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
A Principle Solution for Enroll-Test Mismatch in Speaker Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

CN-Celeb: Multi-genre speaker recognition.
Speech Commun., 2022

Pay Attention to Hard Trials.
CoRR, 2022

Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion.
CoRR, 2022

Cycleflow: Purify Information Factors by Cycle Loss.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

C-P Map: A Novel Evaluation Toolkit for Speaker Verification.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Oriental Language Recognition (OLR) 2021: Summary and Analysis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Reliable Visualization for Deep Speaker Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Real Additive Margin Softmax for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Deep Normalization for Speaker Vectors.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Can We Trust Deep Speech Prior?
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

M2ASR-MONGO: A Free Mongolian Speech Database and Accompanied Baselines.
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

KeSpeech: An Open Source Speech Dataset of Mandarin and Its Eight Subdialects.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Oriental Language Recognition (OLR) 2020: Summary and Analysis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Squeezing Value of Cross-Domain Labels: A Decoupled Scoring Approach for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Study on Decoupled Probabilistic Linear Discriminant Analysis.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

OLR 2021 Challenge: Datasets, Rules and Baselines.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

An MAP Estimation for Between-Class Variance.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
Deep Speaker Vector Normalization with Maximum Gaussianality Training.
CoRR, 2020

Deep generative LDA.
CoRR, 2020

Deep generative factorization for speech signal.
CoRR, 2020

Deep Normalization for Speaker Vectors.
CoRR, 2020

Neural Discriminant Analysis for Deep Speaker Embedding.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Domain-Invariant Speaker Vector Projection by Model-Agnostic Meta-Learning.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

ASR-Free Pronunciation Assessment.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Robust Audio-Visual Speech Enhancement Model.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

CN-Celeb: A Challenging Chinese Speaker Recognition Dataset.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

AP20-OLR Challenge: Three Tasks and Their Baselines.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
On Investigation of Unsupervised Speech Factorization Based on Normalization Flow.
CoRR, 2019

VAE-Based Regularization for Deep Speaker Embedding.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Gaussian-constrained Training for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2019

Structure Growth for Small-Footprint Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

VAE-based Domain Adaptation for Speaker Verification.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

AP19-OLR Challenge: Three Tasks and Their Baselines.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Phonetic-Attention Scoring for Deep Speaker Features in Speaker Verification.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Question Mark Prediction By Bert.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Phonetic Temporal Neural Model for Language Identification.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Chinese Poetry Generation with Flexible Styles.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Human and Machine Speaker Recognition Based on Short Trivial Events.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Deep Factorization for Speech Signal.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Full-Info Training for Deep Speaker Feature Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

VV-Couplet: An open source Chinese couplet generation system.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

RACORN-K: Risk-Aversion Pattern Matching-based Portfolio Selection.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

AP18-OLR Challenge: Three Tasks and Their Baselines.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Map and Relabel: Towards Almost-Zero Resource Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
Collaborative Joint Training With Multitask Recurrent Model for Speech and Speaker Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Medical Diagnosis From Laboratory Tests by Combining Generative and Discriminative Learning.
CoRR, 2017

Full-info Training for Deep Speaker Feature Learning.
CoRR, 2017

Deep Factorization for Speech Signal.
CoRR, 2017

M2ASR: Ambitions and first year progress.
Proceedings of the 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment, 2017

Phone-aware neural language identification.
Proceedings of the 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment, 2017

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Deep Speaker Feature Learning for Text-Independent Speaker Verification.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Memory visualization for gated recurrent neural networks in speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Memory-augmented Neural Machine Translation.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Memory-augmented Chinese-Uyghur neural machine translation.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Speaker recognition with cough, laugh and "Wei".
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

AP17-OLR challenge: Data, plan, and baseline.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

A free Kazakh speech database and a speech recognition baseline.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Enhanced neural machine translation by learning from draft.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Cross-lingual speaker verification with deep feature learning.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Deep speaker verification: Do we need end to end?
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Flexible and Creative Chinese Poetry Generation Using Neural Memory.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
Similar Word Model for Unfrequent Word Enhancement in Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Improving Short Utterance Speaker Recognition by Modeling Speech Unit Classes.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Local Training for PLDA in Speaker Verification.
CoRR, 2016

OC16-CE80: A Chinese-English Mixlingual Database and A Speech Recognition Baseline.
CoRR, 2016

System Combination for Short Utterance Speaker Recognition.
CoRR, 2016

Collaborative Learning for Language and Speaker Recognition.
CoRR, 2016

Weakly Supervised PLDA Training.
CoRR, 2016

Relation Classification: CNN or RNN?
Proceedings of the Natural Language Understanding and Intelligent Applications, 2016

Learning from LDA Using Deep Neural Networks.
Proceedings of the Natural Language Understanding and Intelligent Applications, 2016

Binary speaker embedding.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Max-margin metric learning for speaker recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Chinese Song Iambics Generation with Neural Attention-Based Model.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Recurrent neural network training with dark knowledge transfer.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Can Machine Generate Traditional Chinese Poetry? A Feigenbaum Test.
Proceedings of the Advances in Brain Inspired Cognitive Systems, 2016

AP16-OL7: A multilingual database for oriental languages and a language recognition baseline.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Multi-task recurrent model for true multilingual speech recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Multi-task recurrent model for speech and speaker recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Feature transformation for speaker verification under speaking rate mismatch condition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Learning ordered word representations with γ-decay dropout.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

System combination for short utterance speaker recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
Detection and reconstruction of clipped speech for speaker recognition.
Speech Commun., 2015

Noisy training for deep neural networks in speech recognition.
EURASIP J. Audio Speech Music. Process., 2015

Relation Classification via Recurrent Neural Network.
CoRR, 2015

Learning from LDA using Deep Neural Networks.
CoRR, 2015

Recurrent Neural Network Training with Dark Knowledge Transfer.
CoRR, 2015

Knowledge Transfer Pre-training.
CoRR, 2015

Deep Speaker Vectors for Semi Text-independent Speaker Verification.
CoRR, 2015

An open/free database and Benchmark for Uyghur speaker recognition.
Proceedings of the 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015

Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Learning speech rate in speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Recognize foreign low-frequency words with similar pairs.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Lasso-based reverberation suppression in automatic speech Recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Stochastic Top-k ListNet.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Cross-lingual speaker verification based on linear transform.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

Music removal by convolutional denoising autoencoder in speech recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Transfer learning for speech and language processing.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Document classification with spherical word vectors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Improved deep speaker feature learning for text-dependent speaker recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Joint Semantic Relevance Learning with Text Data and Graph Knowledge.
Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, 2015

2014
Feature analysis for discriminative confidence estimation in spoken term detection.
Comput. Speech Lang., 2014

Research on generalization property of time-varying Fbank-weighted MFCC for i-vector based speaker verification.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Document classification based on c.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Research on truncated speech in speaker verification.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Pruning deep neural networks by optimal brain damage.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

ATVS-CSLT-HCTLab System for NIST 2013 Open Keyword Search Evaluation.
Proceedings of the Advances in Speech and Language Technologies for Iberian Languages, 2014

Noisy training for deep neural networks.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014

Block-wise training for i-vector.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014

Document classification with distributions of word vectors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

Discriminative scoring for speaker recognition based on I-vectors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013
Online Non-Negative Convolutive Pattern Learning for Speech Signals.
IEEE Trans. Signal Process., 2013

Evolutionary discriminative confidence estimation for spoken term detection.
Multim. Tools Appl., 2013

Auditory features based on Gammatone filters for robust speech recognition.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Sequential model adaptation for speaker verification.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Bottleneck features based on gammatone frequency cepstral coefficients.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Subspace models for bottleneck features.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Sequential UBM adaptation for speaker verification.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Emotional speaker verification with linear adaptation.
Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing, 2013

Emotional adaptive training for speaker verification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012
Direct posterior confidence for out-of-vocabulary spoken term detection.
ACM Trans. Inf. Syst., 2012

A Comparative Study of Bottom-Up and Top-Down Approaches to Speaker Diarization.
IEEE Trans. Speech Audio Process., 2012

Term-Dependent Confidence Normalisation for Out-of-Vocabulary Spoken Term Detection.
J. Comput. Sci. Technol., 2012

Heterogeneous Convolutive Non-Negative Sparse Coding.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

N-gram FST Indexing for Spoken Term Detection.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Speech overlap detection and attribution using convolutive non-negative sparse coding.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Stochastic Pronunciation Modeling for Out-of-Vocabulary Spoken Term Detection.
IEEE Trans. Speech Audio Process., 2011

Letter-to-Sound Pronunciation Prediction Using Conditional Random Fields.
IEEE Signal Process. Lett., 2011

Parallel and Hierarchical Decision Making for Sparse Coding in Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Online Pattern Learning for Non-Negative Convolutive Sparse Coding.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Handling overlaps in spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2011

Linguistic influences on bottom-up and top-down clustering for speaker diarization.
Proceedings of the IEEE International Conference on Acoustics, 2011

An evolutionary confidence measurement for spoken term detection.
Proceedings of the 9th International Workshop on Content-Based Multimedia Indexing, 2011

2010
An Evolutionary Confidence Measure for Spotting Words in Speech Recognition.
Proceedings of the Trends in Practical Applications of Agents and Multiagent Systems, 2010

Direct posterior confidence for out-of-vocabulary spoken term detection.
Proceedings of the 2010 International Workshop on Searching Spontaneous Conversational Speech, 2010

CRF-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Augmented set of features for confidence estimation in spoken term detection.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

An integrated top-down/bottom-up approach to speaker diarization.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Stochastic pronunciation modelling and soft match for out-of-vocabulary spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Term-dependent confidence for out-of-vocabulary term detection.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Stochastic pronunciation modelling for spoken term detection.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A posterior probability-based system hybridisation and combination for spoken term detection.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Posterior-based confidence measures for spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
A comparison of grapheme and phoneme-based units for Spanish spoken term detection.
Speech Commun., 2008

A posterior approach for microphone array based speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Growing bottleneck features for tandem ASR.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

A comparison of phone and grapheme-based spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2008

