2025
Skeleton and Font Generation Network for Zero-shot Chinese Character Generation.
CoRR, January, 2025
2024
DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions.
CoRR, 2024
SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding.
CoRR, 2024
UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
2023
Joint optimization for attention-based generation and recognition of chinese characters using tree position embedding.
Pattern Recognit., August, 2023
QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
2021
A Multiple-Integration Encoder for Multi-Turn Text-to-SQL Semantic Parsing.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Robustness of Speech Spoofing Detectors Against Adversarial Post-Processing of Voice Conversion.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Correlating subword articulation with lip shapes for embedding aware audio-visual speech enhancement.
Neural Networks, 2021
Adversarial Voice Conversion Against Neural Spoofing Detectors.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Tracking Interaction States for Multi-Turn Text-to-SQL Semantic Parsing.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention.
CoRR, 2020
Adversarial Post-Processing of Voice Conversion against Spoofing Detection.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020
2019
Knowledge Base Question Answering With Attentive Pooling for Question Representation.
IEEE Access, 2019
2017
Nonrecurrent Neural Structure for Long-Term Dependence.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
Towards human-like and transhuman perception in AI 2.0: a review.
Frontiers Inf. Technol. Electron. Eng., 2017
Cause-Effect Knowledge Acquisition and Neural Association Model for Solving A Set of Winograd Schema Problems.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017
Combing Context and Commonsense Knowledge Through Neural Networks for Solving Winograd Schema Problems.
Proceedings of the 2017 AAAI Spring Symposia, 2017
2016
Part-of-Speech Relevance Weights for Learning Word Embeddings.
CoRR, 2016
Probabilistic Reasoning via Deep Learning: Neural Association Models.
CoRR, 2016
Intra-Topic Variability Normalization based on Linear Projection for Topic Classification.
Proceedings of the NAACL HLT 2016, 2016
Modulation spectrum compensation for HMM-based speech synthesis using line spectral pairs.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Exploring Semantic Representation in Brain Activity Using Word Embeddings.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016
2015
State-Clustering Based Multiple Deep Neural Networks Modeling Approach for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2015
Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency.
CoRR, 2015
Keynote speech 1: Artificial intelligence needs a language cognitive revolution.
Proceedings of the 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015
Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015
2012
Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMs in acoustic modeling.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
2011
Trust Region-Based Optimization for Maximum Mutual Information Estimation of HMMs in Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2011
Boosted Mixture Learning of Gaussian Mixture Hidden Markov Models Based on Maximum Likelihood for Speech Recognition.
IEEE Trans. Speech Audio Process., 2011
The USTC System for Blizzard Challenge 2011.
Proceedings of the Blizzard Challenge 2011, Turin, Italy, September 2, 2011, 2011
2010
Robust pronunciation evaluation in adverse environments.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Phonetic clustering based confidence measure for embedded speech recognition.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Boosted mixture learning of Gaussian mixture HMMs for speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
A bounded trust region optimization for discriminative training of HMMs in speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010
HMM-based pseudo-clean speech synthesis for splice algorithm.
Proceedings of the IEEE International Conference on Acoustics, 2010
The USTC System for Blizzard Challenge 2010.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010
2009
A new method for mispronunciation detection using Support Vector Machine based on Pronunciation Space Models.
Speech Commun., 2009
A trust region based optimization for maximum mutual information estimation of HMMs in speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009
The USTC System for Blizzard Challenge 2009.
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009
2008
Investigation on Adaptation Using Different Discriminative Training Criteria Based Linear Regression and MAP.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Pronunciation Space Models for Pronunciation Evaluation.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Exploiting Non-Target Region Information for Confidence Measure Based on Bayesian Information Criterion.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Evaluation of a Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model on Aurora2, Aurora3, and Aurora4 Tasks.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
An Improvement for Training Efficiency of Semi-Tied Covariance.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Heteronym Verification for Mandarin Speech Synthesis.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Minimum word classification error training of HMMs for automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008
Heteroscedastic discriminant analysis with two-dimensional constraints.
Proceedings of the IEEE International Conference on Acoustics, 2008
2006
The Application of Phone Weight in Putonghua Pronunciation Quality Assessment.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
An HMM Compensation Approach Using Unscented Transformation for Noisy Speech Recognition.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
A Comparative Study on Confidence Measure in Mandarin Command Word Recognition.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
Automatic Mandarin pronunciation scoring for native learners with dialect accent.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
2005
A Novel Source Analysis Method by Matching Spectral Characters of LF Model with STRAIGHT Spectrum.
Proceedings of the Affective Computing and Intelligent Interaction, 2005
2004
Modeling glottal effect on the spectral envelop of STRAIGHT using mixture of Gaussians.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Hearer model based stress prediction for Chinese TTS system.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Compression of speech database by feature separation and pattern clustering using STRAIGHT.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Polynomial regression model for duration prediction in Mandarin.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
2002
Decision tree based unit pre-selection in Mandarin Chinese synthesis.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002
A new method of building decision tree based on target information.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
A miniature Chinese TTS system based on tailored corpus.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
2000
Automatic Segmentation and Labeling of Speech Corpus Based on HMM With Adaptation.
Proceedings of the 2000 International Symposium on Chinese Spoken Language Processing, 2000
Prosody generation in Chinese synthesis using the template of quantified prosodic unit and base intonation contour.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
KD2000 Chinese Text-To-Speech System.
Proceedings of the Advances in Multimodal Interfaces, 2000